Re: Delta Import occasionally missing records.

2011-01-24 Thread btucker

Thank you for your response.

In what way is 'timestamp' not perfect?

I've looked into the SolrEntityProcessor and added a timestamp field to our
index. However, I'm struggling to work out a query to get the max value of the
timestamp field. Also, does the SolrEntityProcessor entity appear before the
root entity, or does it wrap around the root entity?

On 22 January 2011 07:24, Lance Norskog-2 [via Lucene] wrote:

 The timestamp thing is not perfect. You can instead do a search
 against Solr and find the latest timestamp in the index. SOLR-1499
 allows you to search against Solr in the DataImportHandler.
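
 One way the "latest timestamp in the index" lookup might be expressed is a
 match-all query sorted on the timestamp field in descending order, asking for
 a single row. The sketch below only builds the request URL and reads the
 value out of a parsed JSON response. The field name "timestamp", the host and
 the /select path are assumptions for illustration, and the field is assumed
 to be indexed and single-valued so that it can be sorted on.

```python
from urllib.parse import urlencode


def latest_timestamp_url(select_url, field="timestamp"):
    """Build a select URL that returns only the newest value of `field`."""
    params = {
        "q": "*:*",               # match every document
        "sort": field + " desc",  # newest timestamp first
        "rows": 1,                # only the top document is needed
        "fl": field,              # return just the timestamp field
        "wt": "json",             # ask for a JSON response
    }
    return select_url + "?" + urlencode(params)


def extract_latest(response, field="timestamp"):
    """Read the timestamp from a parsed JSON response; None if the index is empty."""
    docs = response["response"]["docs"]
    return docs[0][field] if docs else None


# Hypothetical Solr location; adjust to the actual core.
url = latest_timestamp_url("http://localhost:8983/solr/select")
print(url)
```

 The returned value could then stand in for ${dataimporter.last_index_time} in
 the delta queries, provided the timestamp written to the index comes from the
 database row rather than from the time of indexing.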

 --
 Lance Norskog






-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Delta-Import-occasionally-missing-records-tp2300877p2318572.html
Sent from the Solr - User mailing list archive at Nabble.com.


Delta Import occasionally missing records.

2011-01-21 Thread btucker

Hello

We've just started using Solr to provide search functionality for our
application, with the DataImportHandler performing a delta-import every 1
fired by crontab. This works great; however, it occasionally misses
records that are added to the database while the delta-import is running.

Our data-config.xml has the following queries in its root entity:

query=SELECT id, date_published, date_created, publish_flag FROM Item
    WHERE id > 0
    AND record_type_id=0
    ORDER BY id DESC
preImportDeleteQuery=SELECT item_id AS Id FROM gnpd_production.item_deletions
deletedPkQuery=SELECT item_id AS id FROM gnpd_production.item_deletions
    WHERE deletion_date >=
    SUBDATE('${dataimporter.last_index_time}', INTERVAL 5 MINUTE)
deltaImportQuery=SELECT id, date_published, date_created, publish_flag FROM Item
    WHERE id > 0
    AND record_type_id=0
    AND id=${dataimporter.delta.id}
    ORDER BY id DESC
deltaQuery=SELECT id, date_published, date_created, publish_flag FROM Item
    WHERE id > 0
    AND record_type_id=0
    AND sys_time_stamp >=
    SUBDATE('${dataimporter.last_index_time}', INTERVAL 1 MINUTE)
    ORDER BY id DESC

I think the problem I'm having comes from the way Solr stores the
last_index_time in conf/dataimport.properties. The wiki states:

"When delta-import command is executed, it reads the start time stored in
conf/dataimport.properties. It uses that timestamp to run delta queries and
after completion, updates the timestamp in conf/dataimport.properties."

To me, this seems to indicate that any records with a timestamp between
when the dataimport starts and when it ends will be missed, as the
last_index_time is set to when the import completes.

This doesn't seem quite right to me. I would have expected the
last_index_time to refer to when the dataimport was last STARTED, so that
there were no gaps in the time range covered.
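
To make the suspected gap concrete, here is a toy model of two consecutive
delta-import runs. It is not DataImportHandler code, only a sketch of the
behaviour as I read the wiki text above; the function name, the run times and
the record timestamps are all made up for illustration. It compares storing
the completion time against storing the start time as last_index_time.

```python
from datetime import datetime, timedelta


def delta_import(records, last_index_time, started, finished, store_start_time):
    """Toy model of one delta-import run.

    Picks up records stamped at or after last_index_time that already exist
    when the run starts, then returns (imported ids, new last_index_time).
    """
    imported = [rid for rid, ts in records.items()
                if last_index_time <= ts <= started]
    new_last_index_time = started if store_start_time else finished
    return imported, new_last_index_time


t0 = datetime(2011, 1, 21, 10, 0, 0)
records = {
    1: t0 + timedelta(seconds=10),  # exists before the first run starts
    2: t0 + timedelta(seconds=75),  # inserted while the first run is in progress
}
run1 = (t0 + timedelta(seconds=60), t0 + timedelta(seconds=90))
run2 = (t0 + timedelta(seconds=120), t0 + timedelta(seconds=150))

missed_by_policy = {}
for store_start_time in (False, True):
    seen, last = [], t0
    for started, finished in (run1, run2):
        imported, last = delta_import(records, last, started, finished,
                                      store_start_time)
        seen.extend(imported)
    missed_by_policy[store_start_time] = sorted(set(records) - set(seen))

print("completion time stored, missed ids:", missed_by_policy[False])
print("start time stored, missed ids:", missed_by_policy[True])
```

In this model, record 2 is stamped during the first run, so it falls before
the stored completion time and is never picked up by the second run. With the
start time stored instead, the second run's window reaches back far enough to
include it.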

I changed the deltaQuery of our config to include the SUBDATE by INTERVAL 1
MINUTE statement to alleviate this problem, but it only covers cases
where the delta-import takes less than a minute.
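
The limitation can be shown with a little date arithmetic. The sketch below
assumes, as above, that last_index_time holds the completion time of the
previous run; the helper names and the example times are hypothetical.

```python
from datetime import datetime, timedelta


def delta_window_start(last_index_time, overlap):
    """Lower bound used by the deltaQuery: last_index_time minus the overlap."""
    return last_index_time - overlap


def is_picked_up(record_time, last_index_time, overlap):
    """True if a record's sys_time_stamp falls inside the next delta window."""
    return record_time >= delta_window_start(last_index_time, overlap)


finished = datetime(2011, 1, 21, 10, 5, 0)  # assumed stored as last_index_time
overlap = timedelta(minutes=1)              # matches INTERVAL 1 MINUTE

# A record added 30 seconds before the previous run finished.
short_run_record = finished - timedelta(seconds=30)
# A record added 2 minutes before the previous run finished (a longer run).
long_run_record = finished - timedelta(minutes=2)

covered_short = is_picked_up(short_run_record, finished, overlap)
covered_long = is_picked_up(long_run_record, finished, overlap)
print(covered_short, covered_long)
```

The first record lands inside the one-minute overlap and is re-read by the
next run; the second lies outside it and is skipped, which is the case I am
worried about whenever an import runs longer than the interval.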

Any ideas as to how this can be overcome, other than increasing the
INTERVAL to something larger?

Regards

Barry Tucker
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Delta-Import-occasionally-missing-records-tp2300877p2300877.html
Sent from the Solr - User mailing list archive at Nabble.com.