Hi - I am a newbie to Solr and would like some advice on the best strategy for updating the index in an environment where content is added and searches are executed 24/7. We would also like to have the option of doing a full re-index on an as-needed basis.
I was initially looking into using the SolrJ client in conjunction with Hibernate event listeners and annotations on the entities. I would process entities carrying special search annotations, generate the documents, and send them to the Solr server using SolrJ. But when looking at how to do the full re-index, I felt it started to become too complex: the Solr server would have to ask the app for data, and the app would respond with documents based on some query and annotation processing.

So I started to look at the DataImportHandler, which runs queries directly against the database, circumventing any integration with Hibernate. Our requirement is to keep the index updated as close to real time as possible (max 5 min. lag). With the DataImportHandler we would need to trigger it with some type of scheduler, which seems easy to set up on the master server. But I have seen comments on the mailing lists saying that running an update every 5 min could be excessive. Is that a problem? I assume it depends on how many updates there are in the timeframe; we anticipate at most 100 updates in a 5 min span.

Is the DataImportHandler the best approach in this case? Or are there other approaches to consider?

Thanks for your time,
Roger Kjensrud
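
P.S. To make the scheduling idea concrete, here is a rough sketch of the kind of trigger I have in mind. It is not a tested setup. It assumes the DataImportHandler is registered under the path /dataimport in solrconfig.xml and that Solr is reachable at a base URL such as http://localhost:8983/solr; both are assumptions, and the class and method names (DeltaImportScheduler, buildImportUri, scheduleDeltaImport) are made up for illustration. It uses only the JDK (Java 11 or later), not SolrJ.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DeltaImportScheduler {

    // Builds the DataImportHandler request URI.
    // Assumption: the handler is registered as "/dataimport" in solrconfig.xml.
    // command is "delta-import" or "full-import"; clean=true wipes the index first.
    static URI buildImportUri(String solrBaseUrl, String command, boolean clean) {
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        return URI.create(base + "/dataimport?command=" + command
                + "&clean=" + clean + "&commit=true");
    }

    // Requests a delta-import at a fixed delay (e.g. every 5 minutes).
    // The caller should shut the returned scheduler down when done.
    static ScheduledExecutorService scheduleDeltaImport(String solrBaseUrl, long periodMinutes) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest
                .newBuilder(buildImportUri(solrBaseUrl, "delta-import", false))
                .GET()
                .build();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                System.out.println("delta-import requested, HTTP " + response.statusCode());
            } catch (Exception e) {
                // Catch everything: an uncaught exception would cancel all future runs.
                System.err.println("delta-import request failed: " + e.getMessage());
            }
        }, 0, periodMinutes, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

Calling DeltaImportScheduler.scheduleDeltaImport("http://localhost:8983/solr", 5) would request a delta-import every 5 minutes, and the same URI builder with "full-import" and clean=true would cover the as-needed full re-index. Whether this polling interval is reasonable is exactly what I am unsure about.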