Re: Indexing vs Search node
Thanks everyone, this gave me great arguments for migrating to Solr 7 :D

On Fri, Nov 9, 2018 at 7:50 PM Shawn Heisey wrote:

> On 11/9/2018 1:58 PM, David Hastings wrote:
> > I personally like standalone Solr for this reason: I can tune the
> > indexing "master" for doing nothing but taking in documents, and
> > that way the slaves don't battle for resources in the process.
>
> SolrCloud can be set up pretty similar to this if you're running 7.5.
> You set things up so each collection has two TLOG replicas and the
> rest of them are PULL.
>
> SolrCloud doesn't have master and slave in the same way as the old
> architecture. There are no single points of failure if the hardware
> is set up correctly. But because PULL replicas cannot become leader,
> they are a lot like slaves. Solr 7.5 and later can configure a
> preference for different replica types at query time. So with the
> setup described above, you tell it to prefer PULL replicas. If all
> the PULL replicas were to die, then SolrCloud would use whatever is
> left.
>
> Let's say that you set up a collection so it has two TLOG replicas
> and four PULL replicas. You could have the TLOG replicas live on a
> pair of servers with SSD drives and less memory than the other four
> servers that have PULL replicas, which could be running standard hard
> drives. Queries love memory, indexing loves fast disks. The
> preference that indicates PULL replicas would keep the queries so
> they are running only on the four machines with more memory.
>
> The reason that you want two TLOG replicas instead of one is so that
> if the current leader dies, there is another TLOG replica available
> to become leader.
>
> Thanks,
> Shawn

--
Fernando Otero
Sr Engineering Manager, Panamera
Buenos Aires - Argentina
Mobile: +54 911 67697108
Email: fernando.ot...@olx.com
Re: Indexing vs Search node
On 11/9/2018 1:58 PM, David Hastings wrote:
> I personally like standalone Solr for this reason: I can tune the
> indexing "master" for doing nothing but taking in documents, and that
> way the slaves don't battle for resources in the process.

SolrCloud can be set up pretty similar to this if you're running 7.5.
You set things up so each collection has two TLOG replicas and the rest
of them are PULL.

SolrCloud doesn't have master and slave in the same way as the old
architecture. There are no single points of failure if the hardware is
set up correctly. But because PULL replicas cannot become leader, they
are a lot like slaves. Solr 7.5 and later can configure a preference
for different replica types at query time. So with the setup described
above, you tell it to prefer PULL replicas. If all the PULL replicas
were to die, then SolrCloud would use whatever is left.

Let's say that you set up a collection so it has two TLOG replicas and
four PULL replicas. You could have the TLOG replicas live on a pair of
servers with SSD drives and less memory than the other four servers that
have PULL replicas, which could be running standard hard drives.
Queries love memory, indexing loves fast disks. The preference that
indicates PULL replicas would keep the queries so they are running only
on the four machines with more memory.

The reason that you want two TLOG replicas instead of one is so that if
the current leader dies, there is another TLOG replica available to
become leader.

Thanks,
Shawn
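For readers who want to try the layout Shawn describes, here is a minimal sketch that only builds the two request URLs; it does not contact a server. It assumes the Collections API CREATE parameters `nrtReplicas`, `tlogReplicas` and `pullReplicas`, and the query parameter `shards.preference=replica.type:PULL`, as documented in the Solr 7.x Reference Guide. The base URL and the collection name `products` are made up for illustration. Placing replicas on particular machines (SSD versus large-memory hosts) is not covered here.

```python
from urllib.parse import urlencode

# Assumed base URL of one Solr node; adjust for a real cluster.
SOLR = "http://localhost:8983/solr"


def create_collection_url(name: str, shards: int = 1) -> str:
    """Build a Collections API CREATE call: 2 TLOG + 4 PULL replicas per shard."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": shards,
        "nrtReplicas": 0,   # no NRT replicas, so only shard leaders index
        "tlogReplicas": 2,  # leader plus one stand-by that can take over
        "pullReplicas": 4,  # query-only copies that can never become leader
    }
    return f"{SOLR}/admin/collections?{urlencode(params)}"


def query_url(collection: str, q: str) -> str:
    """Build a select request that prefers PULL replicas when any are alive."""
    params = {"q": q, "shards.preference": "replica.type:PULL"}
    return f"{SOLR}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(create_collection_url("products"))
    print(query_url("products", "*:*"))
```

The preference is only a preference: as Shawn notes, if every PULL replica is down, SolrCloud falls back to whatever replicas remain.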
Re: Indexing vs Search node
I personally like standalone Solr for this reason: I can tune the
indexing "master" for doing nothing but taking in documents, and that
way the slaves don't battle for resources in the process.

On Fri, Nov 9, 2018 at 3:10 PM Erick Erickson wrote:

> Fernando:
>
> I'd phrase it more strongly than Shawn. Prior to 7.0
> all replicas both indexed and searched (they were NRT replicas),
> so there wasn't any choice but to index and search on
> every replica.
>
> It's one of those things that if you have very high
> throughput (indexing) situations, you _might_
> want to use TLOG and/or PULL replicas.
>
> But TANSTAAFL (There Ain't No Such Thing As A Free Lunch).
> TLOG/PULL replicas copy index segments around, which
> may be up to 5G each (default TieredMergePolicy cap on individual
> segment sizes), whereas NRT replicas just get the raw document.
>
> So in the TLOG/PULL situations, you'll get bursts of network traffic
> but each replica has less CPU load because all the replicas but one
> for each shard do not have to index the doc.
>
> In the NRT case, the raw documents are forwarded so the
> network is less bursty, but all of the replicas spend CPU
> cycles indexing.
>
> So I wouldn't worry about it unless you're running into performance
> problems, _then_ I'd investigate TLOG/PULL replicas.
>
> Best,
> Erick
>
> On Fri, Nov 9, 2018 at 11:37 AM Shawn Heisey wrote:
> >
> > On 11/9/2018 12:13 PM, Fernando Otero wrote:
> > > I read in several blog posts that it's never a good idea to
> > > index and search on the same node. I wonder how that can be
> > > achieved in Solr Cloud or if it happens automatically.
> >
> > I would disagree with that blanket assertion.
> >
> > Indexing does put extra load on a server that can interfere with
> > query performance. Whether that will be a real problem pretty much
> > depends on exactly how much indexing you're doing, and what kind of
> > query load you need to handle. For extreme scaling, it can be a
> > good idea to separate indexing and searching.
> >
> > With a master/slave architecture, any version of Solr can separate
> > indexing and querying.
> >
> > Before 7.x, it wasn't possible to separate indexing and querying
> > with SolrCloud. With previous major versions, ALL replicas do the
> > same indexing. With 7.x, that's still the default behavior, but
> > 7.x has new replica types that make it possible for indexing to
> > only take place on shard leaders. The latest version of Solr 7.x
> > has a way to prefer certain replica types, which is how the
> > separation can be achieved.
> >
> > Thanks,
> > Shawn
Re: Indexing vs Search node
Fernando:

I'd phrase it more strongly than Shawn. Prior to 7.0
all replicas both indexed and searched (they were NRT replicas),
so there wasn't any choice but to index and search on
every replica.

It's one of those things that if you have very high
throughput (indexing) situations, you _might_
want to use TLOG and/or PULL replicas.

But TANSTAAFL (There Ain't No Such Thing As A Free Lunch).
TLOG/PULL replicas copy index segments around, which
may be up to 5G each (default TieredMergePolicy cap on individual
segment sizes), whereas NRT replicas just get the raw document.

So in the TLOG/PULL situations, you'll get bursts of network traffic
but each replica has less CPU load because all the replicas but one
for each shard do not have to index the doc.

In the NRT case, the raw documents are forwarded so the
network is less bursty, but all of the replicas spend CPU
cycles indexing.

So I wouldn't worry about it unless you're running into performance
problems, _then_ I'd investigate TLOG/PULL replicas.

Best,
Erick

On Fri, Nov 9, 2018 at 11:37 AM Shawn Heisey wrote:
>
> On 11/9/2018 12:13 PM, Fernando Otero wrote:
> > I read in several blog posts that it's never a good idea to index
> > and search on the same node. I wonder how that can be achieved in
> > Solr Cloud or if it happens automatically.
>
> I would disagree with that blanket assertion.
>
> Indexing does put extra load on a server that can interfere with
> query performance. Whether that will be a real problem pretty much
> depends on exactly how much indexing you're doing, and what kind of
> query load you need to handle. For extreme scaling, it can be a good
> idea to separate indexing and searching.
>
> With a master/slave architecture, any version of Solr can separate
> indexing and querying.
>
> Before 7.x, it wasn't possible to separate indexing and querying with
> SolrCloud. With previous major versions, ALL replicas do the same
> indexing. With 7.x, that's still the default behavior, but 7.x has
> new replica types that make it possible for indexing to only take
> place on shard leaders. The latest version of Solr 7.x has a way to
> prefer certain replica types, which is how the separation can be
> achieved.
>
> Thanks,
> Shawn
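Erick's CPU-versus-network trade-off can be put into rough numbers. The following is a back-of-the-envelope sketch, not a model of Solr internals: it assumes one shard, treats the "up to 5G" segment cap as 5,000 MB, and counts the worst case where every non-leader replica fetches one maximum-size segment. The function names are invented for illustration.

```python
# The "up to 5G" per-segment cap mentioned in the message, taken as 5,000 MB.
MAX_SEGMENT_MB = 5_000


def index_operations(docs: int, replicas_per_shard: int, all_nrt: bool) -> int:
    """Count how many times documents get indexed across one shard's replicas.

    All-NRT: every replica indexes every document.
    TLOG/PULL: only the shard leader indexes; the others copy segments.
    """
    if replicas_per_shard < 1:
        raise ValueError("a shard needs at least one replica")
    indexing_replicas = replicas_per_shard if all_nrt else 1
    return docs * indexing_replicas


def worst_case_segment_copy_mb(replicas_per_shard: int) -> int:
    """MB moved when each non-leader replica fetches one maximum-size segment."""
    if replicas_per_shard < 1:
        raise ValueError("a shard needs at least one replica")
    return (replicas_per_shard - 1) * MAX_SEGMENT_MB


if __name__ == "__main__":
    docs, replicas = 1_000_000, 6
    print("NRT index operations:      ", index_operations(docs, replicas, True))
    print("TLOG/PULL index operations:", index_operations(docs, replicas, False))
    print("Worst-case burst (MB):     ", worst_case_segment_copy_mb(replicas))
```

With six replicas, the all-NRT layout does six times the indexing work, while the TLOG/PULL layout does it once but may move several large segments over the network in a burst after a big merge.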
Re: Indexing vs Search node
On 11/9/2018 12:13 PM, Fernando Otero wrote:
> I read in several blog posts that it's never a good idea to index and
> search on the same node. I wonder how that can be achieved in Solr
> Cloud or if it happens automatically.

I would disagree with that blanket assertion.

Indexing does put extra load on a server that can interfere with query
performance. Whether that will be a real problem pretty much depends on
exactly how much indexing you're doing, and what kind of query load you
need to handle. For extreme scaling, it can be a good idea to separate
indexing and searching.

With a master/slave architecture, any version of Solr can separate
indexing and querying.

Before 7.x, it wasn't possible to separate indexing and querying with
SolrCloud. With previous major versions, ALL replicas do the same
indexing. With 7.x, that's still the default behavior, but 7.x has new
replica types that make it possible for indexing to only take place on
shard leaders. The latest version of Solr 7.x has a way to prefer
certain replica types, which is how the separation can be achieved.

Thanks,
Shawn
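Since NRT remains the default replica type in 7.x, an existing collection gets the new types only when replicas are added with an explicit type. The sketch below builds such a request, assuming the Collections API ADDREPLICA action and its `type` parameter (`nrt`, `tlog` or `pull`) as documented for Solr 7.x. The base URL, collection name and shard name are placeholders, and the helper itself is hypothetical; nothing is sent to a server.

```python
from urllib.parse import urlencode

# Replica types introduced with Solr 7.x; NRT is the default.
REPLICA_TYPES = {"nrt", "tlog", "pull"}


def add_replica_url(base: str, collection: str, shard: str, replica_type: str) -> str:
    """Build a Collections API ADDREPLICA call for the given replica type."""
    if replica_type not in REPLICA_TYPES:
        raise ValueError(f"unknown replica type: {replica_type!r}")
    params = {
        "action": "ADDREPLICA",
        "collection": collection,
        "shard": shard,
        "type": replica_type,
    }
    return f"{base}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder names: add one query-only replica to shard1 of "products".
    print(add_replica_url("http://localhost:8983/solr", "products", "shard1", "pull"))
```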