Hi - I am new to Solr and would like some advice on the best
strategy for updating the index in an environment where content is
added and searches are executed 24/7. We would also like to have the
option of doing a full re-index on an as-needed basis.

I initially looked into using the SolrJ client in conjunction with
Hibernate event listeners and annotations on the entities. I would
process entities carrying special search annotations, generate the
documents, and send them to the Solr server using SolrJ. But when I
looked at how to do the full re-index, it started to feel too complex:
the Solr server would have to ask the app for data, and the app would
respond with documents based on some query and annotation processing.
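
As a rough illustration of the annotation-processing step described
above, a minimal Java sketch might look like the following. The @Indexed
annotation, the Article entity and the toDocument method are hypothetical
names made up for this example; they are not part of Solr, SolrJ or
Hibernate. The sketch stops at a plain field map and leaves out the
Hibernate listener wiring and the actual call to the Solr server.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from Solr, SolrJ or Hibernate.
final class SearchDocumentBuilder {

    /** Hypothetical marker for entity fields that should end up in the index. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Indexed {
        /** Index field name; empty means "use the Java field name". */
        String name() default "";
    }

    /** Example entity, standing in for a Hibernate-mapped class. */
    static final class Article {
        @Indexed(name = "id") private final long id;
        @Indexed private final String title;
        private final String internalNote; // not annotated, so never indexed

        Article(long id, String title, String internalNote) {
            this.id = id;
            this.title = title;
            this.internalNote = internalNote;
        }
    }

    /** Collects the @Indexed fields of an entity into a name-to-value map. */
    static Map<String, Object> toDocument(Object entity) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Field field : entity.getClass().getDeclaredFields()) {
            Indexed indexed = field.getAnnotation(Indexed.class);
            if (indexed == null) {
                continue; // skip fields without the search annotation
            }
            field.setAccessible(true);
            String name = indexed.name().isEmpty() ? field.getName() : indexed.name();
            try {
                doc.put(name, field.get(entity));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = toDocument(new Article(42L, "Hello Solr", "draft"));
        // In a real listener each entry would be copied into a SolrJ
        // SolrInputDocument with addField(name, value) and sent to the server.
        System.out.println(doc);
    }
}
```

The same toDocument method could in principle serve both the event
listener (one entity at a time) and a full re-index (a loop over a
query result), which is where the complexity mentioned above comes in.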

So I started to look at the DataImportHandler, which runs queries
directly against the database, circumventing any integration with
Hibernate. Our requirement is to keep the index updated as close to
real time as possible (max 5 min lag). The DataImportHandler would
need to be triggered by some type of scheduler, which seems
easy to set up on the master server. But I have seen comments on the
mailing lists saying that running an update every 5 min could be
excessive. Is that a problem? I assume it depends on how many updates
there are in the timeframe; we anticipate at most 100 updates
in any 5 min span.
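
For concreteness, the scheduler mentioned above could be sketched in
Java roughly as follows. This is only a sketch under assumptions: the
host, port and handler path in DIH_URL are placeholders, the class and
method names are made up, and it relies on java.net.http.HttpClient
(Java 11 or later). It also assumes the handler is configured with a
delta query, so that the delta-import command only picks up rows
changed since the last run.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a 5-minute trigger for the DataImportHandler.
final class DeltaImportScheduler {

    // Assumed location of the DIH endpoint on the master; adjust as needed.
    static final String DIH_URL = "http://localhost:8983/solr/dataimport";

    /** Builds the URL for a DIH command such as "delta-import" or "full-import". */
    static URI commandUri(String baseUrl, String command) {
        return URI.create(baseUrl + "?command=" + command);
    }

    /** Sends one GET request and logs the HTTP status; never throws. */
    static void trigger(HttpClient client, URI uri) {
        try {
            HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(uri + " -> HTTP " + response.statusCode());
        } catch (IOException e) {
            // Log and carry on: an exception escaping the task would
            // cancel all later runs of the scheduled job.
            System.err.println("Import trigger failed: " + e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
    }

    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        URI delta = commandUri(DIH_URL, "delta-import");
        // Fixed delay: the next run starts 5 minutes after the previous
        // request returned, so trigger calls never overlap.
        scheduler.scheduleWithFixedDelay(() -> trigger(client, delta), 0, 5, TimeUnit.MINUTES);
    }
}
```

A full re-index on demand would use the same helper with
commandUri(DIH_URL, "full-import"), triggered manually rather than
from the schedule.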

Is the DataImportHandler the best approach in this case? Or are there
other approaches to consider?

Thanks for your time,

Roger Kjensrud
