Re: Query regarding incremental index replication
On Thu, Sep 10, 2009 at 7:08 AM, Silent Surfer silentsurfe...@yahoo.com wrote:

> Hi,
>
> Currently we are using Solr 1.3 and we have the following requirement. As we need to process very high volumes of documents (of the order of 400 GB per day), we are planning to separate indexer(s) and searcher(s), so that there won't be a performance hit. Our idea is to have a set of servers used only as indexers for index creation and then, every 5 mins or so, the index will be copied to the searchers (a set of Solr servers used only for querying). For this we tried to use snapshooter, rsync etc. But the problem with this approach is that the same index is present on both the indexer and the searcher, and hence occupies a large amount of file system space.

Set of servers used only for indexers? Solr replication currently supports only a single master. If you have a dedicated master then why do you care about the index occupying too much disk space?

> What we need is a mechanism wherein the indexer contains only the index for the past 5 mins (the last indexing cycle before snapshooter is run) and the searcher has the accumulated (total) index, i.e. every 5 mins we should be able to move the entire index from indexer to searcher, and so on. The above scenario is slightly different from the master/slave implementation, as on the master we want only the latest (WIP) index and the slave should contain the entire index.

If you commit but do not optimize then rsync will transfer only the new segment files, which should be possible within 5 minutes. So I'd suggest optimizing less frequently (once or twice a day).

However, if for some reason you still want to go with your design, there is a new MergeIndexes feature in Solr 1.4 which can help (assuming that you have only additions or replacements and no deletes). However, that is not used by the Solr 1.4 Java replication. You may be able to modify the snappuller and snapinstaller scripts to use the merge indexes command though. Something like that can also work with multiple servers creating indexes (again assuming no deletes are needed).

http://wiki.apache.org/solr/MergingSolrIndexes

--
Regards,
Shalin Shekhar Mangar.
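For anyone following the MergeIndexes suggestion above, here is a minimal sketch of how the merge request could be assembled. It only builds the CoreAdmin `mergeindexes` URL in the form shown on the wiki page linked above and does not send it. The host, port, core name and index directories are hypothetical placeholders, and the sketch assumes a multicore Solr 1.4 setup where the target core lives on the searcher.

```python
from typing import List
from urllib.parse import urlencode


def merge_indexes_url(base_url: str, target_core: str, index_dirs: List[str]) -> str:
    """Build a CoreAdmin mergeindexes URL (it is only built here, not sent).

    base_url    -- Solr root URL, e.g. "http://localhost:8983/solr" (hypothetical)
    target_core -- core that receives the merged index (hypothetical name)
    index_dirs  -- index directories to merge into the target core
    """
    params = [("action", "mergeindexes"), ("core", target_core)]
    # One indexDir parameter per source index directory.
    params += [("indexDir", d) for d in index_dirs]
    # safe="/" keeps the directory paths readable in the query string.
    return base_url.rstrip("/") + "/admin/cores?" + urlencode(params, safe="/")


if __name__ == "__main__":
    print(merge_indexes_url(
        "http://localhost:8983/solr",
        "core0",
        ["/opt/solr/core1/data/index", "/opt/solr/core2/data/index"],
    ))
```

A modified snapinstaller could issue a request like this after each pull, pointing `indexDir` at the freshly copied 5-minute index. As noted above, this only makes sense if the indexers produce additions and no deletes.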
Re: Query regarding incremental index replication
There is only one index. The index has newer segments which represent new records and deletes to old records (sort of). Incremental replication copies new segments; putting the new segments together with the previous index makes the new index.

Incremental replication under rsync does work; perhaps it did not work for you.

If you do not want to store the full index on the indexer, that is a problem. You will not be able to optimize the index on the indexer and ship the new index to the slaves.

This has more on large-volume Solr installation design:

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr

On 9/9/09, Silent Surfer silentsurfe...@yahoo.com wrote:

> Hi,
>
> Currently we are using Solr 1.3 and we have the following requirement. As we need to process very high volumes of documents (of the order of 400 GB per day), we are planning to separate indexer(s) and searcher(s), so that there won't be a performance hit. Our idea is to have a set of servers used only as indexers for index creation and then, every 5 mins or so, the index will be copied to the searchers (a set of Solr servers used only for querying). For this we tried to use snapshooter, rsync etc. But the problem with this approach is that the same index is present on both the indexer and the searcher, and hence occupies a large amount of file system space.
>
> What we need is a mechanism wherein the indexer contains only the index for the past 5 mins (the last indexing cycle before snapshooter is run) and the searcher has the accumulated (total) index, i.e. every 5 mins we should be able to move the entire index from indexer to searcher, and so on. The above scenario is slightly different from the master/slave implementation, as on the master we want only the latest (WIP) index and the slave should contain the entire index.
>
> Appreciate if anyone can throw some light on how to achieve this.
>
> Thanks,
> sS

--
Lance Norskog
goks...@gmail.com
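To illustrate why incremental replication stays cheap when the index is not optimized, here is a rough sketch of the comparison an rsync-style copy effectively performs. It is not how snappuller or rsync are implemented; the function name `plan_sync` and the file names and sizes are made up for illustration, and real rsync also looks at modification times.

```python
from typing import Dict, List, Tuple


def plan_sync(master: Dict[str, int], slave: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Compare two index directory listings (file name -> size in bytes).

    Returns (files to copy to the slave, files to delete on the slave).
    Segment files that already exist on the slave with the same size are
    skipped, which is what keeps each replication cycle small.
    """
    to_copy = sorted(name for name, size in master.items() if slave.get(name) != size)
    to_delete = sorted(name for name in slave if name not in master)
    return to_copy, to_delete


if __name__ == "__main__":
    # Hypothetical listings: one new segment was committed since the last pull.
    master = {"_0.cfs": 100, "_1.cfs": 20, "segments_3": 1}
    slave = {"_0.cfs": 100, "segments_2": 1}
    print(plan_sync(master, slave))
```

After an optimize, all segments are rewritten into new files, so nearly everything lands in the copy list. That is the reason to optimize only rarely when replicating every few minutes.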
Query regarding incremental index replication
Hi,

Currently we are using Solr 1.3 and we have the following requirement. As we need to process very high volumes of documents (of the order of 400 GB per day), we are planning to separate indexer(s) and searcher(s), so that there won't be a performance hit. Our idea is to have a set of servers used only as indexers for index creation and then, every 5 mins or so, the index will be copied to the searchers (a set of Solr servers used only for querying). For this we tried to use snapshooter, rsync etc. But the problem with this approach is that the same index is present on both the indexer and the searcher, and hence occupies a large amount of file system space.

What we need is a mechanism wherein the indexer contains only the index for the past 5 mins (the last indexing cycle before snapshooter is run) and the searcher has the accumulated (total) index, i.e. every 5 mins we should be able to move the entire index from indexer to searcher, and so on. The above scenario is slightly different from the master/slave implementation, as on the master we want only the latest (WIP) index and the slave should contain the entire index.

Appreciate if anyone can throw some light on how to achieve this.

Thanks,
sS