Re: Query regarding incremental index replication

2009-09-11 Thread Shalin Shekhar Mangar
On Thu, Sep 10, 2009 at 7:08 AM, Silent Surfer silentsurfe...@yahoo.com wrote:

 Hi ,

 Currently we are using Solr 1.3 and we have the following requirement.

 As we need to process very high volumes of documents (of the order of 400
 GB per day), we are planning to separate indexer(s) and searcher(s), so that
 there won't be a performance hit.

 Our idea is to have a set of servers used only as indexers for index
 creation; every 5 minutes or so, the index will be copied to the searchers
 (a set of Solr servers used only for querying). For this we tried to use
 snapshooter, rsync, etc.

 But the problem with this approach is that the same index is present on both
 the indexer and the searcher, and hence occupies a lot of file system space.


Set of servers used only for indexers? Solr replication currently supports
only a single master.

If you have a dedicated master, then why do you care about the index
occupying too much disk space?


 What we need is a mechanism wherein the indexer contains only the index
 for the past 5 minutes (the last indexing cycle before snapshooter is run)
 and the searcher has the accumulated (total) index, i.e., every 5 minutes we
 should be able to move the entire index from the indexer to the searcher,
 and so on.

 The above scenario is slightly different from a master/slave implementation,
 as on the master we want only the latest (WIP) index and the slave should
 contain the entire index.


If you commit but do not optimize, then rsync will transfer only the new
segment files, which should be possible within 5 minutes. So I'd suggest
optimizing less frequently (once or twice a day).
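
As a rough illustration of why commit-only cycles keep each transfer small:
Lucene segment files are written once and not modified afterwards, so a
file-level sync only has to copy the files the searcher does not have yet. The
sketch below is Python, and the file names and sizes (in MB) are made up for
illustration. It models the comparison only, not rsync or the snapshot scripts.

```python
def transfer_plan(indexer, searcher):
    """Files (name -> size in MB) on the indexer that the searcher lacks."""
    return {name: size for name, size in indexer.items() if name not in searcher}

# Hypothetical index contents; names and sizes are assumptions.
searcher = {"_0.cfs": 900, "_1.cfs": 40, "segments_2": 1}

# After a commit without optimize: old segments stay, one new segment appears.
after_commit = {"_0.cfs": 900, "_1.cfs": 40, "_2.cfs": 35, "segments_3": 1}

# After an optimize: everything is rewritten into a single new segment.
after_optimize = {"_3.cfs": 975, "segments_4": 1}

commit_mb = sum(transfer_plan(after_commit, searcher).values())
optimize_mb = sum(transfer_plan(after_optimize, searcher).values())

print("after commit:", commit_mb, "MB to copy")
print("after optimize:", optimize_mb, "MB to copy")
```

With these made-up numbers the commit-only cycle copies just the new segment
and the new segments file, while the cycle after an optimize copies the whole
index again.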

However, if for some reason you still want to go with your design, there is
a new MergeIndexes feature in Solr 1.4 which can help (assuming that you
have only additions or replacements and no deletes). However, that is not
used by the Solr 1.4 Java replication. You may be able to modify the
snappuller and snapinstaller scripts to use the merge indexes command, though.
Something like that can also work with multiple servers creating indexes
(again assuming no deletes are needed).

http://wiki.apache.org/solr/MergingSolrIndexes
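
A minimal sketch of what calling the merge from a script might look like,
assuming the CoreAdmin form of the command (action=mergeindexes with one
indexDir parameter per source index) described on the wiki page above. The
host, core name, and directory paths are placeholders, and the helper
merge_indexes_url is a hypothetical name. The code only builds the request
URL and does not contact Solr.

```python
from urllib.parse import urlencode

def merge_indexes_url(solr_base, target_core, index_dirs):
    """Build a CoreAdmin mergeindexes URL (hypothetical helper)."""
    params = [("action", "mergeindexes"), ("core", target_core)]
    # One indexDir parameter per source index directory to merge in.
    params += [("indexDir", d) for d in index_dirs]
    return solr_base.rstrip("/") + "/admin/cores?" + urlencode(params)

# Placeholder values, not taken from any real installation.
url = merge_indexes_url(
    "http://localhost:8983/solr",
    "core0",
    ["/opt/solr/core1/data/index", "/opt/solr/core2/data/index"],
)
print(url)
```

A modified snapinstaller could request such a URL on the searcher after each
pull. Please check the wiki page for the exact preconditions, for example
whether a commit on the target core is needed afterwards.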

-- 
Regards,
Shalin Shekhar Mangar.


Re: Query regarding incremental index replication

2009-09-10 Thread Lance Norskog
There is only one index. The index has newer segments which represent new
records and deletes to old records (sort of). Incremental replication copies
new segments; putting the new segments together with the previous index
makes the new index.

Incremental replication under rsync does work; perhaps it did not work for
you.

If you do not want to store the full index on the indexer, that is a
problem. You will not be able to optimize the index on the indexer and ship
the new index to the slaves.

This has more on large-volume Solr installation design:

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr

-- 
Lance Norskog
goks...@gmail.com


Query regarding incremental index replication

2009-09-09 Thread Silent Surfer
Hi ,

Currently we are using Solr 1.3 and we have the following requirement.

As we need to process very high volumes of documents (of the order of 400 GB
per day), we are planning to separate indexer(s) and searcher(s), so that there
won't be a performance hit.

Our idea is to have a set of servers used only as indexers for index creation;
every 5 minutes or so, the index will be copied to the searchers (a set of Solr
servers used only for querying). For this we tried to use snapshooter, rsync,
etc.

But the problem with this approach is that the same index is present on both
the indexer and the searcher, and hence occupies a lot of file system space.

What we need is a mechanism wherein the indexer contains only the index for
the past 5 minutes (the last indexing cycle before snapshooter is run) and the
searcher has the accumulated (total) index, i.e., every 5 minutes we should
be able to move the entire index from the indexer to the searcher, and so on.

The above scenario is slightly different from a master/slave implementation,
as on the master we want only the latest (WIP) index and the slave should
contain the entire index.

I would appreciate it if anyone could shed some light on how to achieve this.

Thanks,
sS