You could even run a separate Solr on the node just to redistribute the queries. But if I were going to do that, I’d run a copy of nginx as a load balancer instead.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Mar 10, 2021, at 4:51 PM, Mike Drob <md...@mdrob.com> wrote:
>
>>> 2) a single "extra" solr node in the cluster can be used as a "self
>>> configuring" load balancer
>
> I’ve thought about this a bunch before, are there mechanisms to instruct
> Solr to not host shards for this purpose? Maybe it deserves its own
> discussion.
>
> On Wed, Mar 10, 2021 at 5:14 PM Chris Hostetter <hossman_luc...@fucit.org>
> wrote:
>
>>
>> : > that seems... dangerous. you could easily wind up in a situation where
>> : > nodes just keep trying to forward forever?
>> :
>> : There is some special http parameter being added when forwarding
>> : requests, so I'm sure each node will be able to decide whether it should
>> : act as LB or if it is supposed to be the final destination. Or we can
>> : add such a param. Of course, if SolrJ on the client side has already
>> : selected a replica, the receiving node should not discard that and do
>> : its own balancing. So there is some state to get right here.
>>
>> "Forever" wasn't really what I meant to say ... I'm concerned more about how
>> you would implement this to work well in the 'general case' -- ie:
>> multiple nodes, multiple collections, multiple shards, multiple replicas
>> per shard -- w/o doing "too much" forwarding.
>>
>> If nodeA gets a request, when exactly should it decide "I *COULD* handle
>> this request for collection1 using local core, but I'll go ahead and
>> forward it to nodeB instead." ? ... should it be based on what percentage
>> of collection1's total replica list are located on nodeA, or based on what
>> percentage of nodeA is dedicated to collection1? ... should nodeB be more
>> or less likely than nodeC to get the request based on how many total cores
>> each node has for collection1, or how many unique shards each one has?
>>
>> Also bear in mind that even if you assumed everything was nice and evenly
>> distributed, a "simple" round robin based approach would have some pretty
>> significant impacts on the number of intra-node network requests....
>>
>> Say you have a 5 node cluster, hosting a 1shard/5replica collection such
>> that each node has 1 replica: today any node can process the request
>> locally; but if we did a round robin proxy of the request, that means we'd
>> only handle it locally 1/5th the time, and 4/5ths of the time you add an
>> extra network hop and the associated network IO involved (plus the
>> original node has a thread tied up waiting to proxy the response) .. so
>> you'd go from needing 0 "internal" network requests/IO to having internal
>> traffic of 80% of the amount of external traffic received.
>>
>> If those 5 nodes host a collection with 2 shards/5replicas each, spread
>> evenly over the 5 nodes: today any given request typically causes 2
>> intra-cluster network requests to get the per-shard data; but if we round
>> robin proxy the initial request to a different node 4/5ths of the time we
>> now typically need 2.8 internal requests for each external request...
>>
>> It just seems like adding more forwarding/proxy logic -- that isn't
>> strictly necessary to compute complete results -- could introduce a lot
>> of complexity risk for a problem that already has multiple solutions:
>>
>> 1) client (or external load balancer) can round robin over live nodes (and
>> given that cluster state and metrics are available via HTTP, a client can
>> make very sophisticated choices)
>>
>> 2) a single "extra" solr node in the cluster can be used as a "self
>> configuring" load balancer that will automatically know when new nodes are
>> added to the cluster, or when replicas get moved/added, etc...
>>
>> :
>> : Jan
>> :
>> : > On 10 Mar 2021, at 19:32, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>> : >
>> : > : Is there any way whatsoever to solve this on the Solr side only?
>> : > :
>> : > : The only way I can think of is to send all requests to a 3rd node in the
>> : > : cluster that does not have a core for the collection, then it will balance
>> : > : between the two :)
>> : >
>> : > correct -- you can create a Solr node w/o any cores that will act as a
>> : > "load balancer" to other solr nodes.
>> : >
>> : > : Or create a new, empty collection on the node, which acts as a routing
>> : > : collection only to the target collection?
>> : >
>> : > no -- this won't work, because the request your remote client sends will
>> : > need to specify the actual collection you want to query, and when the node
>> : > gets this it will hand it to the local core for that collection -- it
>> : > won't care that there is another local collection that's unrelated.
>> : >
>> : > : Sounds like there should be a way to explicitly disable the
>> : > : "optimization" of always handling the request locally in single-shard
>> : > : collections, i.e. always try to balance unless shards.preference=local?
>> : >
>> : > that seems... dangerous. you could easily wind up in a situation where
>> : > nodes just keep trying to forward forever?
>> : >
>> : > :
>> : > : Jan
>> : > :
>> : > : > On 10 Mar 2021, at 19:06, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>> : > : >
>> : > : > : Ah, I missed "single shard" ... this looks relevant:
>> : > : > : https://issues.apache.org/jira/browse/SOLR-12217
>> : > : >
>> : > : > That improvement still isn't going to impact Jan's situation where the
>> : > : > *client* isn't SolrJ ... as the description says:
>> : > : >
>> : > : >>> NOTE: This Jira doesn't cover the single-sharded collections cases when
>> : > : >>> not using the CloudSolrClient or Streaming Expressions (i.e. if you do
>> : > : >>> a non-streaming curl request to a random node in the cluster, the
>> : > : >>> shards.preference parameter is not considered in the case of single
>> : > : >>> shards collections).
>> : > : >
>> : > : > :
>> : > : > : On Wed, Mar 10, 2021 at 12:43 PM Jan Høydahl <jan....@cominvent.com> wrote:
>> : > : > :
>> : > : > : > We have not set any shards.preference, and I also think preferLocal
>> : > : > : > defaults to false, i.e. random
>> : > : > : >
>> : > : > : > Earlier we had 2 shards for the same collection (both existed on both
>> : > : > : > nodes) and then requests were distributed to both nodes. That’s why, when
>> : > : > : > we went to 1 shard, I was wondering if the “single-shard” code path perhaps
>> : > : > : > never attempts to utilize replicas?? But have not looked in code yet.
>> : > : > : >
>> : > : > : > Guess next step is to set up a small local test cluster and see what
>> : > : > : > happens.
>> : > : > : >
>> : > : > : > Jan Høydahl
>> : > : > : >
>> : > : > : > > On 10 Mar 2021, at 15:46, Michael Gibney <mich...@michaelgibney.net> wrote:
>> : > : > : > >
>> : > : > : > > You say not "anything fancy" -- depending on how you define "fancy", if you
>> : > : > : > > have an explicit `shards.preference` param, based on the version you're
>> : > : > : > > running (8.4) you might also take a look at
>> : > : > : > > https://issues.apache.org/jira/browse/SOLR-14471
>> : > : > : > > (If SOLR-14471 is the problem, removing the explicit `shards.preference`
>> : > : > : > > param should restore default "shuffling" routing).
>> : > : > : > >
>> : > : > : > > I haven't dug too deep, but it looks like for 8.4 preferLocalShards
>> : > : > : > > actually defaults to false? I might be missing something though:
>> : > : > : > >
>> : > : > : > > https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.4.1/solr/solrj/src/java/org/apache/solr/client/solrj/routing/RequestReplicaListTransformerGenerator.java#L85
>> : > : > : > >
>> : > : > : > >> On Wed, Mar 10, 2021 at 9:10 AM Houston Putman <houstonput...@gmail.com>
>> : > : > : > >> wrote:
>> : > : > : > >>
>> : > : > : > >> I could be wrong, but I don't think preferLocalShards is the default in
>> : > : > : > >> multi-shard use cases.
>> : > : > : > >>
>> : > : > : > >>> On Wed, Mar 10, 2021 at 9:07 AM Mike Drob <md...@mdrob.com> wrote:
>> : > : > : > >>>
>> : > : > : > >>> I believe a server will always try to prefer local cores. Can you do an
>> : > : > : > >>> experiment with 3 nodes, and send http queries to the node not hosting any
>> : > : > : > >>> replicas? That should confirm the balanced distribution.
>> : > : > : > >>>
>> : > : > : > >>> If you have multiple shards, the receiving server will forward the requests
>> : > : > : > >>> for shards it doesn’t have, but would still prefer local shards when they
>> : > : > : > >>> are available.
>> : > : > : > >>>
>> : > : > : > >>> On Wed, Mar 10, 2021 at 8:00 AM Jan Høydahl <jan....@cominvent.com> wrote:
>> : > : > : > >>>
>> : > : > : > >>>> Hi,
>> : > : > : > >>>>
>> : > : > : > >>>> A client has a SolrCloud 8.4 setup with two nodes, and one collection with
>> : > : > : > >>>> one shard and replicationFactor=2.
>> : > : > : > >>>> Of course we want search traffic to be evenly distributed between the two
>> : > : > : > >>>> replicas.
>> : > : > : > >>>> The client is using plain HTTP requests, no SolrJ or anything fancy, and
>> : > : > : > >>>> sends all requests to one of the two nodes.
>> : > : > : > >>>> I was expecting Solr to forward about 50% of those requests to the other
>> : > : > : > >>>> replica, but it is serving them all locally.
>> : > : > : > >>>>
>> : > : > : > >>>> I know we can set up an LB in front or re-program the client to do round
>> : > : > : > >>>> robin, but that is not my question.
>> : > : > : > >>>> Is the select-random-replica logic only active when we have a sharded
>> : > : > : > >>>> collection, and not for a single-shard?
>> : > : > : > >>>>
>> : > : > : > >>>> Jan
>> : > : > : > >>>
>> : > : > : > >>
>> : > : > : >
>> : > : > :
>> : > : >
>> : > : > -Hoss
>> : > : > http://www.lucidworks.com/
>> : > :
>> : > :
>> : >
>> : > -Hoss
>> : > http://www.lucidworks.com/
>> :
>>
>> -Hoss
>> http://www.lucidworks.com/
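
For anyone who wants to check the round-robin proxy arithmetic quoted above (80% extra internal traffic for the single-shard case, 2.8 internal requests for the two-shard case), here is a rough sketch of one way to model it. It assumes the receiving node picks uniformly among all nodes, itself included, and that each forward costs exactly one extra hop. The function name and its parameters are made up for illustration and are not part of Solr.

```python
from fractions import Fraction


def internal_requests_per_external(nodes: int, shard_requests: int) -> Fraction:
    """Expected intra-cluster requests caused by one external request.

    Assumed model, following the quoted arithmetic: the receiving node
    proxies round robin over all `nodes`, itself included, so it forwards
    with probability (nodes - 1) / nodes. Each forward adds one hop on top
    of the `shard_requests` that are already needed today.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    forward_probability = Fraction(nodes - 1, nodes)
    return shard_requests + forward_probability


# 1 shard, 5 replicas, one per node: no internal requests are needed today.
single_shard = internal_requests_per_external(nodes=5, shard_requests=0)

# 2 shards on the same 5 nodes: the quoted message counts 2 per-shard
# internal requests today.
two_shards = internal_requests_per_external(nodes=5, shard_requests=2)

print(float(single_shard))  # 0.8
print(float(two_shards))    # 2.8
```

Under these assumptions the overhead term depends only on the node count, so it approaches one extra hop per request as the cluster grows.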