Hi all,

The database from which I populate the SOLR index is refreshed
"partially": subsets of the data are deleted and re-added for a certain
group identifier. Is it possible to do something similar in a (delta)
import of the DataImportHandler?

Example:
SOLR-Index:
groupID: 1, PK: 1, refreshDate: [before last_index_time]
groupID: 1, PK: 2, refreshDate: [before last_index_time]
groupID: 1, PK: 3, refreshDate: [before last_index_time]

Refreshed DB:
groupID: 1, PK: 1, refreshDate: [after last_index_time]
groupID: 1, PK: 5, refreshDate: [after last_index_time]
groupID: 1, PK: 30, refreshDate: [after last_index_time]
(PK 2 and 3 are no longer there. PK is unique across all groupIDs.)

deleteQuery="groupID:1"
(An attribute of the entity element that the DocBuilder (1.3) reads and
sends, unchanged, to the SOLR writer as a delete query, once, before the
delta import.)

After that, the delta import loads data with groupID=1 from the DB.

Could I plug into SOLR, maybe with a custom processor, to achieve
something along the lines of:

deleteInput="select FIELD_VALUE from TABLE where CHANGED_DATE >
'${dataimporter.last_index_time}' group by FIELD_VALUE"
deleteQuery="field:${my_entity.FIELD_VALUE}"

FIELD_VALUE is not the primary key, and the "deleteInput" query can
return multiple rows.
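
To make the intended semantics concrete, here is a rough sketch of the same
steps done from outside the DataImportHandler: build one delete-by-query
message per distinct FIELD_VALUE, post them to the update handler, then
trigger the delta import once. This is only an illustration, not something
DIH provides. The base URL, the handler paths (/update, /dataimport) and all
function names are assumptions about a typical setup, and the values would
come from whatever the "deleteInput" query returns.

```python
import urllib.request
from xml.sax.saxutils import escape

SOLR_URL = "http://localhost:8983/solr"  # assumption: adjust to your setup


def quote_phrase(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_delete_messages(field, values):
    """Return one <delete> message per distinct value, in first-seen order."""
    distinct = dict.fromkeys(values)  # drops duplicates, keeps order
    return [
        "<delete><query>%s</query></delete>"
        % escape("%s:%s" % (field, quote_phrase(v)))
        for v in distinct
    ]


def delete_then_delta_import(field, values, solr_url=SOLR_URL):
    """POST every delete message, then trigger the delta import once."""
    for message in build_delete_messages(field, values):
        request = urllib.request.Request(
            solr_url + "/update",
            data=message.encode("utf-8"),
            headers={"Content-Type": "text/xml; charset=utf-8"},
        )
        urllib.request.urlopen(request).close()
    # No explicit commit here: this assumes the import's own commit
    # makes the deletes visible together with the re-added documents.
    urllib.request.urlopen(solr_url + "/dataimport?command=delta-import").close()
```

For the example above, delete_then_delta_import("groupID", [1]) would send a
single delete for groupID 1 and then start the delta import. What I am
looking for is a way to get this behaviour inside DIH itself.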


I am aware of SOLR-1060 and SOLR-1059, but I am not sure that those will
help me. In those cases it looks like the delete is run per entity. I
want the delete to run once, before the (delta) import.
If that impression is wrong, I'll happily switch to 1.4, of course.

Cheers!
Chantal


--
Chantal Ackermann

