[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Shalin Shekhar Mangar (JIRA) Fri, 26 Jun 2009 03:59:35 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724471#action_12724471
 ]


Shalin Shekhar Mangar commented on SOLR-1229:
---------------------------------------------

Erik, in my experience the most common use case is that a table's primary key
differs from the uniqueKey in Solr (think of multiple tables, each with its own
root entity). Yes, the pk can be transformed (or aliased in SQL), but since this
is the most common use case, I feel the 'pk' attribute should be kept as-is.
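For the configuration in the issue description, such a transformation amounts to re-applying the uniqueKey template to each row returned by deletedPkQuery, so that a board_id of 42 becomes the document id board-42. A minimal sketch, assuming variables are written as ${entity.column} and resolved from a flat map of values; TemplateKey and expand() are hypothetical names, not DIH's TemplateTransformer:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not DIH's TemplateTransformer: expands ${entity.column}
// placeholders in a uniqueKey template using values from one deletedPkQuery row.
final class TemplateKey {
    // Matches ${...} and captures the variable name, e.g. "test.board_id".
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    private TemplateKey() {}

    static String expand(String template, Map<String, Object> vars) {
        Matcher m = VAR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Object value = vars.get(m.group(1));
            if (value == null) {
                // Fail loudly instead of silently producing an id that matches nothing.
                throw new IllegalArgumentException("Unresolved variable: " + m.group(1));
            }
            // quoteReplacement stops '$' or '\' in the value being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Deleting by the raw value 42 would miss the document indexed as board-42, which is the failure reported in this issue.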

Let me list the possible cases:
# The name of the table's primary key is different from Solr's uniqueKey name and the deletedPkQuery returns only one column (most common use case)
# The name of the table's primary key is different from Solr's uniqueKey name and the deletedPkQuery returns multiple columns
# The name of the table's primary key is the same as Solr's uniqueKey name and the deletedPkQuery returns only one column
# The name of the table's primary key is the same as Solr's uniqueKey name and the deletedPkQuery returns multiple columns

For #1, 'pk' does not matter because we can use the single column returned by deletedPkQuery.
For #2, 'pk' is required; otherwise the user is forced to use a transformer (or an alias). For non-database use cases (there are none right now), aliasing is not available, so the user must write a transformer.
For #3, neither 'pk' nor 'uniqueKey' matters.
For #4, we can use Solr's uniqueKey name (I guess this is your use case?). I think this case is rare.
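The case analysis above can be read as a small decision procedure. A minimal sketch, assuming each deletedPkQuery row arrives as a column-name-to-value map; DeletedPkResolver and resolve() are hypothetical names, not DataImportHandler code:

```java
import java.util.Map;

// Hypothetical sketch of the four cases; not DataImportHandler's actual code.
final class DeletedPkResolver {
    private DeletedPkResolver() {}

    /**
     * Picks the value that identifies a deleted row.
     *
     * @param row       one row returned by deletedPkQuery (column name -> value)
     * @param pk        the entity's 'pk' attribute, or null if it is not configured
     * @param uniqueKey the name of Solr's uniqueKey field
     */
    static Object resolve(Map<String, Object> row, String pk, String uniqueKey) {
        // Cases #1 and #3: only one column comes back, so its name does not matter.
        if (row.size() == 1) {
            return row.values().iterator().next();
        }
        // Case #2: several columns and the pk name differs from the uniqueKey,
        // so the 'pk' attribute says which column to read.
        if (pk != null && row.containsKey(pk)) {
            return row.get(pk);
        }
        // Case #4: several columns, one of them named like Solr's uniqueKey.
        if (row.containsKey(uniqueKey)) {
            return row.get(uniqueKey);
        }
        throw new IllegalArgumentException(
            "Cannot tell which deletedPkQuery column identifies the document: " + row.keySet());
    }
}
```

The order of the checks mirrors the list: a single column wins outright, then 'pk', then a column named after the uniqueKey.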

If we do decide to go with uniqueKey only, the right way would be to use the
corresponding column mapping to look up the unique key. In the example below, if
solr-id is the uniqueKey in Solr, we should use "db-id" as the key when looking
up the map returned by deletedPkQuery:
{code:xml}
<field column="db-id" name="solr-id" />
{code}
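A minimal sketch of that lookup, assuming the entity's field elements are available as (column, name) pairs. FieldMapping, UniqueKeyLookup and their methods are hypothetical, not DIH's API, and the sketch assumes that a field without a name attribute is indexed under its column name:

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of a <field column="..." name="..."/> element; not DIH's API.
record FieldMapping(String column, String name) {}

final class UniqueKeyLookup {
    private UniqueKeyLookup() {}

    /** Returns the source column mapped to Solr's uniqueKey, e.g. "db-id" for "solr-id". */
    static String columnFor(List<FieldMapping> fields, String uniqueKey) {
        for (FieldMapping f : fields) {
            // Assumption: a field without a name is indexed under its column name.
            String solrName = f.name() != null ? f.name() : f.column();
            if (solrName.equals(uniqueKey)) {
                return f.column();
            }
        }
        // No mapping found: assume the column carries the uniqueKey's own name.
        return uniqueKey;
    }

    /** Reads the uniqueKey value out of one row returned by deletedPkQuery. */
    static Object deletedKey(Map<String, Object> deletedRow, List<FieldMapping> fields, String uniqueKey) {
        return deletedRow.get(columnFor(fields, uniqueKey));
    }
}
```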

However, even though this is the 'right' approach, it is tricky and hard to
explain to users. Also, multiple columns could be mapped to the same Solr key
(think of a template that builds the unique key for different 'types' of
documents based on a flag column). This could be very error-prone.
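To illustrate the ambiguity, consider a hypothetical entity where a flag column decides whether the uniqueKey "id" is filled from post_id or from comment_id. A reverse lookup from the uniqueKey to its source column then has no single answer. A sketch of detecting that, assuming mappings are modelled as (column, name) pairs with the name always set; ColumnMapping, UniqueKeyMappingCheck and the column names are all hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical (column, Solr field name) pair; the name is assumed to be non-null.
record ColumnMapping(String column, String name) {}

final class UniqueKeyMappingCheck {
    private UniqueKeyMappingCheck() {}

    /** Collects every source column that feeds the given Solr field. */
    static List<String> columnsFor(List<ColumnMapping> mappings, String uniqueKey) {
        List<String> columns = new ArrayList<>();
        for (ColumnMapping m : mappings) {
            if (m.name().equals(uniqueKey)) {
                columns.add(m.column());
            }
        }
        return columns;
    }

    /** Returns the single source column for the uniqueKey, or fails if it is ambiguous or missing. */
    static String requireSingleColumn(List<ColumnMapping> mappings, String uniqueKey) {
        List<String> columns = columnsFor(mappings, uniqueKey);
        if (columns.size() != 1) {
            throw new IllegalStateException(
                "Expected exactly one column mapped to '" + uniqueKey + "' but found " + columns);
        }
        return columns.get(0);
    }
}
```

Failing with an explicit error is one option; silently picking the first match is exactly the kind of error-prone behaviour described above.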

What do you think?

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import in which records marked as "deleted" in the
> database should be removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field, and its value is built from
> the template board-<PK>.
> I noticed that the javadoc comment in DocBuilder#collectDelta says "Note: In
> our definition, unique key of Solr document is the primary key of the top
> level entity". This, of course, isn't an appropriate assumption.

