Re: Indexing vs Search node

2018-11-14 Thread Fernando Otero
Thanks, everyone. This gave me great arguments for migrating to Solr 7 :D

Re: Indexing vs Search node

2018-11-09 Thread Shawn Heisey

On 11/9/2018 1:58 PM, David Hastings wrote:
> I personally like standalone Solr for this reason: I can tune the indexing
> "master" for doing nothing but taking in documents, and that way the slaves
> don't battle for resources in the process.


SolrCloud can be set up much like this if you're running 7.5.
You set things up so each collection has two TLOG replicas and the rest
are PULL.


SolrCloud doesn't have master and slave in the same way as the old
architecture.  There are no single points of failure if the hardware is
set up correctly.  But because PULL replicas cannot become leader, they
are a lot like slaves.  In Solr 7.5 and later, you can set a preference
for replica types at query time (the shards.preference request
parameter).  So with the setup described above, you tell it to prefer
PULL replicas.  If all the PULL replicas were to die, SolrCloud would
use whatever is left.
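
As a minimal sketch of the query-time preference described above: the
shards.preference parameter with the value replica.type:PULL asks
SolrCloud to route a query to PULL replicas when any are available. The
host, port, and collection name below are placeholders, not taken from
this thread, and the snippet only builds the URL rather than sending it.

```python
from urllib.parse import urlencode

# Hypothetical values for illustration; substitute your own host and collection.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "mycollection"


def build_query_url(q, preferred_type="PULL"):
    """Build a select URL that prefers replicas of the given type."""
    params = {
        "q": q,
        # A preference, not a hard filter: other replica types are still
        # used if no replica of the preferred type is available.
        "shards.preference": f"replica.type:{preferred_type}",
    }
    return f"{SOLR_BASE}/{COLLECTION}/select?{urlencode(params)}"


url = build_query_url("*:*")
print(url)
```

The parameter can be sent per request like this, so indexing clients and
query clients can share the same cluster while queries land on the PULL
machines.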


Let's say that you set up a collection so it has two TLOG replicas and
four PULL replicas.  You could have the TLOG replicas live on a pair of
servers with SSDs and less memory than the other four servers that
have PULL replicas, which could be running standard hard drives.
Queries love memory, indexing loves fast disks.  The preference for
PULL replicas would keep queries running only on the four machines
with more memory.
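
A rough sketch of creating such a collection through the Collections
API, using the tlogReplicas and pullReplicas parameters of the CREATE
action. The host and collection name are made up for illustration, the
counts apply per shard, and pinning the replicas to particular servers
(SSD versus high-memory machines) is not shown here.

```python
from urllib.parse import urlencode

# Hypothetical host for illustration only.
SOLR_BASE = "http://localhost:8983/solr"


def build_create_url(name, num_shards=1, tlog=2, pull=4):
    """Build a Collections API CREATE URL with explicit replica-type counts."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "tlogReplicas": tlog,  # leader candidates; the leader does the indexing
        "pullReplicas": pull,  # query-serving copies; never become leader
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


create_url = build_create_url("mycollection")
print(create_url)
```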


The reason that you want two TLOG replicas instead of one is so that if 
the current leader dies, there is another TLOG replica available to 
become leader.


Thanks,
Shawn



Re: Indexing vs Search node

2018-11-09 Thread David Hastings
I personally like standalone Solr for this reason: I can tune the indexing
"master" for doing nothing but taking in documents, and that way the slaves
don't battle for resources in the process.


Re: Indexing vs Search node

2018-11-09 Thread Erick Erickson
Fernando:

I'd phrase it more strongly than Shawn. Prior to 7.0,
all replicas both indexed and searched (they were NRT replicas),
so there wasn't any choice but to index and search on
every replica.

If you have a very high indexing throughput, you _might_
want to use TLOG and/or PULL replicas.

But TANSTAAFL (There Ain't No Such Thing As A Free Lunch).
TLOG/PULL replicas copy index segments around, which
may be up to 5GB each (default TieredMergePolicy cap on individual
segment sizes), whereas NRT replicas just get the raw document.

So with TLOG/PULL replicas, you'll get bursts of network traffic,
but each replica has less CPU load because all the replicas but one
for each shard do not have to index the doc.

In the NRT case, the raw documents are forwarded so the
network is less bursty, but all of the replicas spend CPU
cycles indexing.
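
The CPU side of that trade-off can be put as a simple count. This is
only a toy model under the assumptions stated in the comments (the
function name and the numbers are invented for illustration), and it
deliberately ignores the network cost of copying segments.

```python
def indexing_operations(docs, replicas_per_shard, layout):
    """Count how many times documents get indexed within one shard.

    layout: "NRT" means every replica indexes each document;
            "TLOG_PULL" means only the shard leader indexes, and the
            other replicas copy finished segments from it.
    """
    if layout == "NRT":
        return docs * replicas_per_shard
    if layout == "TLOG_PULL":
        return docs
    raise ValueError(f"unknown layout: {layout}")


# Hypothetical shard with 6 replicas receiving 1,000,000 documents.
print(indexing_operations(1_000_000, 6, "NRT"))        # 6000000
print(indexing_operations(1_000_000, 6, "TLOG_PULL"))  # 1000000
```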

So I wouldn't worry about it unless you're running into performance
problems; _then_ I'd investigate TLOG/PULL replicas.

Best,
Erick

Re: Indexing vs Search node

2018-11-09 Thread Shawn Heisey

On 11/9/2018 12:13 PM, Fernando Otero wrote:
> I read in several blog posts that it's never a good idea to index and
> search on the same node. I wonder how that can be achieved in Solr Cloud or
> if it happens automatically.


I would disagree with that blanket assertion.

Indexing does put extra load on a server that can interfere with query 
performance.  Whether that will be a real problem pretty much depends on 
exactly how much indexing you're doing, and what kind of query load you 
need to handle.  For extreme scaling, it can be a good idea to separate 
indexing and searching.


With a master/slave architecture, any version of Solr can separate 
indexing and querying.


Before 7.x, it wasn't possible to separate indexing and querying with 
SolrCloud.  With previous major versions, ALL replicas do the same 
indexing.  With 7.x, that's still the default behavior, but 7.x has new 
replica types that make it possible for indexing to only take place on 
shard leaders. The latest version of Solr 7.x has a way to prefer 
certain replica types, which is how the separation can be achieved.
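
For an existing collection, one way to introduce the new replica types
is the Collections API ADDREPLICA action, which accepts a type of nrt,
tlog, or pull. The sketch below only builds the request URL; the host,
collection, and shard names are placeholders rather than anything from
this thread.

```python
from urllib.parse import urlencode

# Hypothetical host for illustration only.
SOLR_BASE = "http://localhost:8983/solr"
VALID_TYPES = {"nrt", "tlog", "pull"}


def build_add_replica_url(collection, shard, replica_type):
    """Build an ADDREPLICA URL that adds one replica of the given type."""
    if replica_type not in VALID_TYPES:
        raise ValueError(f"replica_type must be one of {sorted(VALID_TYPES)}")
    params = {
        "action": "ADDREPLICA",
        "collection": collection,
        "shard": shard,
        "type": replica_type,
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


add_url = build_add_replica_url("mycollection", "shard1", "pull")
print(add_url)
```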


Thanks,
Shawn