[
https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724471#action_12724471
]
Shalin Shekhar Mangar commented on SOLR-1229:
---------------------------------------------
Erik, the most common use-case I have seen is one where the primary key in the
tables differs from the uniqueKey in Solr (think of multiple tables, each with
its own root entity). Yes, the pk can be transformed (or aliased in SQL), but
since this is the most common use-case, I feel pk should be kept as-is.
Let me give a few possible cases:
# The name of the table's primary key is different from Solr's unique key name
and the deletedPkQuery returns only one column (most common use-case)
# The name of the table's primary key is different from Solr's unique key name
and the deletedPkQuery returns multiple columns
# The name of the table's primary key is the same as Solr's unique key name and
the deletedPkQuery returns only one column
# The name of the table's primary key is the same as Solr's unique key name and
the deletedPkQuery returns multiple columns
For #1, 'pk' does not matter because we can use the single column coming back
from deletedPkQuery.
For #2, 'pk' is required; otherwise the user is forced to use a transformer (or
an alias). For non-database use-cases (there are none right now), there is no
aliasing, so the user must write a transformer.
For #3, neither 'pk' nor 'uniqueKey' matters.
For #4, we can use Solr's uniqueKey name (I guess this is your use-case?). I
think that this is a rare use-case.
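To make the four cases concrete, here is a minimal sketch of how the value to
delete by could be picked from one row returned by deletedPkQuery. The class
and method names are hypothetical and this is not DataImportHandler's actual
code; it only illustrates the precedence described above.

```java
import java.util.Map;

// Hypothetical helper, for illustration only.
final class DeleteKeyResolver {

    /**
     * Picks the delete key from a single deletedPkQuery row.
     * Returns null if no rule yields a value.
     */
    static Object resolve(Map<String, Object> row, String pk, String uniqueKey) {
        if (row.size() == 1) {
            // Cases 1 and 3: a single column is unambiguous, names do not matter.
            return row.values().iterator().next();
        }
        if (pk != null && row.containsKey(pk)) {
            // Case 2: several columns, so 'pk' names the one to use.
            return row.get(pk);
        }
        // Case 4: several columns and no usable 'pk'; fall back to the
        // Solr uniqueKey name.
        return row.get(uniqueKey);
    }
}
```

With a row such as {title, board_id, id} and pk="board_id", this returns the
board_id value; without a pk it falls back to the "id" column.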
If we do decide to go with uniqueKey only, the right way to do that would be to
use the corresponding column mapping to look up the unique key. For the example
below, we should use "db-id" to look up the value in the map returned by
deletedPkQuery if solr-id is the uniqueKey in Solr:
{code:xml}
<field column="db-id" name="solr-id" />
{code}
However, even though the above approach is the 'right' one, it is very tricky
and hard to explain to users. Also, there could be multiple columns mapped to
the same Solr key (think of a template that builds the unique key for different
'types' of documents based on a flag column). This may be very error-prone.
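A rough sketch of that column-mapping lookup, including the ambiguity when more
than one column maps to the uniqueKey, might look like the following. All names
are hypothetical; the column-to-name map stands in for the <field> mappings and
is not how DataImportHandler represents them internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class UniqueKeyColumnLookup {

    /** Returns every column whose field mapping targets the given Solr field. */
    static List<String> columnsFor(Map<String, String> columnToName, String uniqueKey) {
        List<String> columns = new ArrayList<>();
        for (Map.Entry<String, String> e : columnToName.entrySet()) {
            if (uniqueKey.equals(e.getValue())) {
                columns.add(e.getKey());
            }
        }
        return columns;
    }

    /** Looks up the delete key in a deletedPkQuery row via the column mapping. */
    static Object lookup(Map<String, Object> row,
                         Map<String, String> columnToName,
                         String uniqueKey) {
        List<String> columns = columnsFor(columnToName, uniqueKey);
        if (columns.size() != 1) {
            // Zero or several candidate columns: the lookup is ambiguous.
            throw new IllegalStateException(
                "Expected exactly one column mapped to " + uniqueKey + ", found " + columns);
        }
        return row.get(columns.get(0));
    }
}
```

With the mapping db-id -> solr-id, a row containing "db-id" resolves cleanly; as
soon as a second column is mapped to solr-id the sketch has to give up, which is
the error-prone situation described above.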
What do you think?
> deletedPkQuery feature does not work when pk and uniqueKey field do not have
> the same value
> -------------------------------------------------------------------------------------------
>
> Key: SOLR-1229
> URL: https://issues.apache.org/jira/browse/SOLR-1229
> Project: Solr
> Issue Type: Bug
> Components: contrib - DataImportHandler
> Affects Versions: 1.4
> Reporter: Erik Hatcher
> Assignee: Noble Paul
> Fix For: 1.4
>
> Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the
> database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>   <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>   <document name="tests">
>     <entity name="test"
>             pk="board_id"
>             transformer="TemplateTransformer"
>             deletedPkQuery="select board_id from boards where deleted = 'Y'"
>             query="select * from boards where deleted = 'N'"
>             deltaImportQuery="select * from boards where deleted = 'N'"
>             deltaQuery="select * from boards where deleted = 'N'"
>             preImportDeleteQuery="datasource:board">
>       <field column="id" template="board-${test.board_id}"/>
>       <field column="datasource" template="board"/>
>       <field column="title" />
>     </entity>
>   </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field. And its value is a
> template board-<PK>.
> The javadoc comment on DocBuilder#collectDelta says "Note: In
> our definition, unique key of Solr document is the primary key of the top
> level entity". This of course isn't really an appropriate assumption.