@Erick,

Our problem space is not constant indexing.  I thought SolrCloud replicas
worked by replication, but you imply each replica indexes in parallel.
Good to know.

On Sunday, January 25, 2015, Erick Erickson <erickerick...@gmail.com> wrote:

> @Shawn: Cool table, thanks!
>
> @Dan:
> Just to throw a different spin on it, if you migrate to SolrCloud, then
> this question becomes moot, as the raw documents are sent to each of the
> replicas, so you very rarely have to copy the full index. It's a tradeoff
> between the constant load of sending the raw documents around whenever
> you index and the peak usage when the index replicates.
>
> There are a bunch of other reasons to go to SolrCloud, but you know your
> problem space best.
>
> FWIW,
> Erick
>
> On Sun, Jan 25, 2015 at 9:26 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
> > On 1/24/2015 10:56 PM, Dan Davis wrote:
> > > > When I polled the various projects already using Solr at my
> > > > organization, I was greatly surprised that none of them were using
> > > > Solr replication, because they had talked about "replicating" the data.
> > > >
> > > > But we are not Pinterest, and do not expect to be taking in changes one
> > > > post at a time (at least the engineers don't - just wait until it's
> > > > used for a CRUD app that wants full-text search on a description
> > > > field!).  Still, rsync can be very, very fast with the right options
> > > > (-W for gigabit ethernet, and maybe -S for sparse files).  I've clocked
> > > > it at 48 MB/s over GigE previously.
> > >
> > > > Does anyone have any numbers for how fast Solr replication goes, and
> > > > what to do to tune it?
> > > >
> > > > I'm not eager to give up recently tested cluster stability for a
> > > > home-grown mess, but I am interested in numbers that are out there.
> >
> > Numbers are included on the Solr replication wiki page, both in graph
> > and numeric form.  Gathering these numbers must have been pretty easy --
> > before the HTTP replication made it into Solr, Solr used to contain an
> > rsync-based implementation.
> >
> > http://wiki.apache.org/solr/SolrReplication#Performance_numbers
> >
> > Other data on that wiki page discusses the replication config.  There's
> > not a lot to tune.
> >
> > I run a redundant non-SolrCloud index myself through a different method
> > -- my indexing program indexes each index copy completely independently.
> > > There is no replication.  This separation allows me to upgrade any
> > component, or change any part of solrconfig or schema, on either copy of
> > the index without affecting the other copy at all.  With replication, if
> > something is changed on the master or the slave, you might find that the
> > slave no longer works, because it will be handling an index created by
> > different software or a different config.
> >
> > Thanks,
> > Shawn
> >
> >
>
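
For illustration only, the approach Shawn describes above (the indexing
program writes to each index copy independently, with no replication between
them) could be sketched roughly as follows. This is not Shawn's actual
program, which is not shown in the thread. The function names, the example
URLs, and the choice to commit on every batch are all assumptions; the only
Solr-specific piece is a POST of a JSON array of documents to a core's
/update handler.

```python
import json
import urllib.request


def build_update_request(base_url, docs):
    """Build a POST to the JSON update handler of one index copy.

    base_url is a hypothetical core URL such as
    http://localhost:8983/solr/core1 (an assumption, not from the thread).
    """
    return urllib.request.Request(
        base_url.rstrip("/") + "/update?commit=true",  # commit per batch: assumption
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_over_http(base_url, docs):
    """Send one batch to one copy; raises on network or HTTP errors."""
    request = build_update_request(base_url, docs)
    with urllib.request.urlopen(request, timeout=30) as response:
        response.read()


def index_all_copies(copy_urls, docs, send=send_over_http):
    """Index the same batch into every copy, each one independently.

    A failure on one copy is recorded and does not stop the others.
    Returns a dict mapping each URL to None (success) or the error text.
    """
    results = {}
    for url in copy_urls:
        try:
            send(url, docs)
            results[url] = None
        except Exception as exc:  # keep going so the other copy still gets the batch
            results[url] = str(exc)
    return results
```

Because each copy receives the raw documents and builds its own index, one
copy can run a different Solr version or schema than the other. The cost is
that the indexing program has to notice and retry a copy that fell behind,
which the per-URL result above is meant to make visible.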
