The timestamp approach is not perfect. Instead, you can run a search
against Solr and find the latest timestamp in the index. SOLR-1499
allows you to search against Solr from within the DataImportHandler.
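
For illustration, here is a rough sketch of the "find the latest timestamp
in the index" step, done from an external script rather than through the
SOLR-1499 patch. It assumes each indexed document carries the row's
sys_time_stamp in an indexed date field of the same name (that field name
and the base URL in the usage note are assumptions, not something from your
config). It only builds the query URL and parses the JSON response; sending
the HTTP request is left out.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumption: the Solr schema has an indexed date field with this name.
TIMESTAMP_FIELD = "sys_time_stamp"


def latest_timestamp_query(base_url):
    """Build a select URL that returns only the newest timestamp in the index."""
    params = {
        "q": "*:*",                          # match every document
        "sort": TIMESTAMP_FIELD + " desc",   # newest first
        "rows": 1,                           # only the top hit is needed
        "fl": TIMESTAMP_FIELD,               # return just the timestamp field
        "wt": "json",
    }
    return base_url + "/select?" + urlencode(params)


def parse_latest_timestamp(response_body):
    """Return the newest indexed timestamp (UTC), or None for an empty index."""
    docs = json.loads(response_body)["response"]["docs"]
    if not docs:
        return None
    value = docs[0][TIMESTAMP_FIELD]
    # Solr date strings end in Z and may or may not carry fractional seconds.
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in value else "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
```

You would fetch latest_timestamp_query("http://localhost:8983/solr") with
any HTTP client and pass the body to parse_latest_timestamp(). The result
could then serve as the lower bound of the delta query in place of
last_index_time, since it reflects what actually reached the index.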

On Fri, Jan 21, 2011 at 2:27 AM, btucker <btuc...@mintel.com> wrote:
>
> Hello
>
> We've just started using solr to provide search functionality for our
> application with the DataImportHandler performing a delta-import every 1
> fired by crontab, which works great, however it does occasionally miss
> records that are added to the database while the delta-import is running.
>
> Our data-config.xml has the following queries in its root entity:
>
> query="SELECT id, date_published, date_created, publish_flag FROM Item
>        WHERE id > 0
>        AND record_type_id=0
>        ORDER BY id DESC"
> preImportDeleteQuery="SELECT item_id AS Id FROM
>        gnpd_production.item_deletions"
> deletedPkQuery="SELECT item_id AS id FROM gnpd_production.item_deletions
>        WHERE deletion_date >=
>        SUBDATE('${dataimporter.last_index_time}', INTERVAL 5 MINUTE)"
> deltaImportQuery="SELECT id, date_published, date_created, publish_flag
>        FROM Item
>        WHERE id > 0
>        AND record_type_id=0
>        AND id=${dataimporter.delta.id}
>        ORDER BY id DESC"
> deltaQuery="SELECT id, date_published, date_created, publish_flag FROM Item
>        WHERE id > 0
>        AND record_type_id=0
>        AND sys_time_stamp >=
>        SUBDATE('${dataimporter.last_index_time}', INTERVAL 1 MINUTE)
>        ORDER BY id DESC">
>
> I think the problem I'm having comes from the way Solr stores the
> last_index_time in conf/dataimport.properties. The wiki states:
>
> "When delta-import command is executed, it reads the start time stored in
> conf/dataimport.properties. It uses that timestamp to run delta queries and
> after completion, updates the timestamp in conf/dataimport.properties."
>
> Which to me seems to indicate that any records with a time-stamp between
> when the dataimport starts and ends will be missed as the last_index_time is
> set to when it completes the import.
>
> This doesn't seem quite right to me. I would have expected the
> last_index_time to refer to when the dataimport was last STARTED, so that
> there were no gaps in the time range covered.
>
> I changed the deltaQuery of our config to include the SUBDATE by INTERVAL 1
> MINUTE statement to alleviate this problem, but that only covers cases
> where the delta-import takes less than a minute.
>
> Any ideas as to how this can be overcome, other than increasing the
> INTERVAL to something larger?
>
> Regards
>
> Barry Tucker
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Delta-Import-occasionally-missing-records-tp2300877p2300877.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Lance Norskog
goks...@gmail.com
