Hello

We've just started using Solr to provide search functionality for our
application, with the DataImportHandler performing a delta-import every 1
fired by crontab. This works great; however, it does occasionally miss
records that are added to the database while the delta-import is running.

Our data-config.xml has the following queries in its root entity:

query="SELECT id, date_published, date_created, publish_flag FROM Item
       WHERE id > 0
       AND record_type_id=0
       ORDER BY id DESC"
preImportDeleteQuery="SELECT item_id AS Id FROM gnpd_production.item_deletions"
deletedPkQuery="SELECT item_id AS id FROM gnpd_production.item_deletions
                WHERE deletion_date >=
                SUBDATE('${dataimporter.last_index_time}', INTERVAL 5 MINUTE)"
deltaImportQuery="SELECT id, date_published, date_created, publish_flag FROM Item
                  WHERE id > 0
                  AND record_type_id=0
                  AND id=${dataimporter.delta.id}
                  ORDER BY id DESC"
deltaQuery="SELECT id, date_published, date_created, publish_flag FROM Item
            WHERE id > 0
            AND record_type_id=0
            AND sys_time_stamp >=
            SUBDATE('${dataimporter.last_index_time}', INTERVAL 1 MINUTE)
            ORDER BY id DESC">

I think the problem I'm having comes from the way Solr stores the
last_index_time in conf/dataimport.properties. The wiki states:

"When delta-import command is executed, it reads the start time stored in
conf/dataimport.properties. It uses that timestamp to run delta queries and
after completion, updates the timestamp in conf/dataimport.properties."

To me, this seems to indicate that any records with a timestamp between
when the dataimport starts and when it ends will be missed, as the
last_index_time is set to when the import completes.

This doesn't seem quite right to me. I would have expected the
last_index_time to refer to when the dataimport was last STARTED, so that
there were no gaps in the timestamps covered.
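
To illustrate the gap I mean, here is a toy sketch in Python. It models
only my reading of the wiki text, not Solr's actual code: the function
name, the run times and the record times are all made up, times are plain
seconds, and I assume the delta query sees every row written up to the
moment the import starts.

```python
def missed_records(record_times, runs, stamp_at):
    """Return the record timestamps that no delta run ever picks up.

    record_times: times at which rows were written (their sys_time_stamp).
    runs: list of (start, end) times of consecutive delta-imports.
    stamp_at: "end" saves last_index_time at completion (my reading of
              the wiki), "start" saves it when the import begins.
    """
    last_index_time = 0
    seen = set()
    for start, end in runs:
        # Assumption: the delta query runs at `start` and matches rows
        # with last_index_time <= sys_time_stamp <= start.
        for t in record_times:
            if last_index_time <= t <= start:
                seen.add(t)
        last_index_time = end if stamp_at == "end" else start
    return sorted(t for t in record_times if t not in seen)


# Hypothetical schedule: three imports of 40 seconds each, one per minute.
runs = [(0, 40), (60, 100), (120, 160)]
# Hypothetical rows, some written while an import was running.
records = [10, 30, 70, 110]

print(missed_records(records, runs, "end"))    # [10, 30, 70]
print(missed_records(records, runs, "start"))  # []
```

With the timestamp saved at completion, every row written while an import
is running falls before the next window; with it saved at the start, the
windows join up and nothing is skipped.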

I changed the deltaQuery of our config to include the SUBDATE by INTERVAL 1
MINUTE statement to alleviate this problem, but it only covers cases
where the delta-import takes less than a minute.
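
The arithmetic behind that limitation, as a small Python sketch. Again it
assumes last_index_time is the completion time of the previous import; the
function name and the example date are hypothetical.

```python
from datetime import datetime, timedelta


def lookback_covers_run(run_start, run_end, lookback):
    """True if subtracting `lookback` from the saved timestamp reaches
    back to the start of the previous import.

    Assumption: last_index_time == run_end, so the next delta window
    begins at run_end - lookback (what SUBDATE computes in the query).
    """
    window_start = run_end - lookback
    return window_start <= run_start


start = datetime(2011, 1, 20, 12, 0, 0)  # made-up import start time
one_minute = timedelta(minutes=1)

# A 45 second import is fully covered by a 1 minute lookback.
print(lookback_covers_run(start, start + timedelta(seconds=45), one_minute))  # True
# A 90 second import leaves the first 30 seconds uncovered.
print(lookback_covers_run(start, start + timedelta(seconds=90), one_minute))  # False
```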

Any ideas as to how this can be overcome, other than increasing the
INTERVAL to something larger?

Regards

Barry Tucker