@Shawn: Cool table, thanks!

@Dan:
Just to throw a different spin on it, if you migrate to SolrCloud, then
this question becomes moot, as the raw documents are sent to each of the
replicas, so you very rarely have to copy the full index. It's a tradeoff
between constant load, because you're sending the raw documents around
whenever you index, and peak usage when the index replicates.
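
As a rough illustration of the peak-usage side of that tradeoff, here is a
minimal sketch of how long one full index copy might take. The 48 MB/s rate
is the rsync figure Dan quotes below; the 20 GB index size and the function
name are made up for illustration, not numbers from anyone's actual setup.

```python
def full_copy_seconds(index_bytes: float, throughput_bytes_per_s: float) -> float:
    """Estimate the time to copy a whole index at a sustained throughput."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return index_bytes / throughput_bytes_per_s


MB = 1_000_000
GB = 1_000 * MB

RSYNC_RATE = 48 * MB   # rsync over GigE, as quoted in the thread
INDEX_SIZE = 20 * GB   # hypothetical index size, assumed for illustration

seconds = full_copy_seconds(INDEX_SIZE, RSYNC_RATE)
print(f"Full copy: about {seconds / 60:.1f} minutes")
```

Plug in your own index size and measured throughput; the point is only that
a full copy is a burst of network and disk I/O, where SolrCloud spreads the
cost across every update.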

There are a bunch of other reasons to go to SolrCloud, but you know your
problem space best.

FWIW,
Erick

On Sun, Jan 25, 2015 at 9:26 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 1/24/2015 10:56 PM, Dan Davis wrote:
> > When I polled the various projects already using Solr at my organization,
> > I was greatly surprised that none of them were using Solr replication,
> > because they had talked about "replicating" the data.
> >
> > But we are not Pinterest, and do not expect to be taking in changes one
> > post at a time (at least the engineers don't - just wait until it's used
> > for a CRUD app that wants full-text search on a description field!).
> > Still, rsync can be very, very fast with the right options (-W for gigabit
> > ethernet, and maybe -S for sparse files).  I've clocked it at 48 MB/s
> > over GigE previously.
> >
> > Does anyone have any numbers for how fast Solr replication goes, and what
> > to do to tune it?
> >
> > I'm not enthusiastic about giving up recently tested cluster stability for
> > a home-grown mess, but I am interested in numbers that are out there.
>
> Numbers are included on the Solr replication wiki page, both in graph
> and numeric form.  Gathering these numbers must have been pretty easy --
> before the HTTP replication made it into Solr, Solr used to contain an
> rsync-based implementation.
>
> http://wiki.apache.org/solr/SolrReplication#Performance_numbers
>
> Other data on that wiki page discusses the replication config.  There's
> not a lot to tune.
>
> I run a redundant non-SolrCloud index myself through a different method
> -- my indexing program indexes each index copy completely independently.
> There is no replication.  This separation allows me to upgrade any
> component, or change any part of solrconfig or schema, on either copy of
> the index without affecting the other copy at all.  With replication, if
> something is changed on the master or the slave, you might find that the
> slave no longer works, because it will be handling an index created by
> different software or a different config.
>
> Thanks,
> Shawn
>
>
