Re: Solr not distributing search requests among replicas

Walter Underwood Wed, 10 Mar 2021 16:58:33 -0800

You could even run a separate Solr on the node just to redistribute the queries.
But if I were going to do that, I’d run a copy of nginx as a load balancer
instead.


wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 10, 2021, at 4:51 PM, Mike Drob <md...@mdrob.com> wrote:
> 
>>> 2) a single "extra" solr node in the cluster can be used as a "self
>>> configuring" load balancer
> 
> I’ve thought about this a bunch before. Are there mechanisms to instruct
> Solr not to host shards for this purpose? Maybe it deserves its own
> discussion.
> 
> On Wed, Mar 10, 2021 at 5:14 PM Chris Hostetter <hossman_luc...@fucit.org>
> wrote:
> 
>> 
>> : > that seems... dangerous.  you could easily wind up in a situation
>> : > where nodes just keep trying to forward forever?
>> :
>> : There is some special http parameter being added when forwarding
>> : requests, so I'm sure each node will be able to decide whether it should
>> : act as LB or if it is supposed to be the final destination. Or we can
>> : add such a param. Of course, if SolrJ on the client side has already
>> : selected a replica, the receiving node should not discard that and do
>> : its own balancing. So there is some state to get right here.
>> 
>> "Forever" wasn't really what I meant to say ... I'm concerned more about how
>> you would implement this to work well in the 'general case' -- ie:
>> multiple nodes, multiple collections, multiple shards, multiple replicas
>> per shard -- w/o doing "too much" forwarding.
>> 
>> 
>> If nodeA gets a request, when exactly should it decide "i *COULD* handle
>> this request for collection1 using local core, but I'll go ahead and
>> forward it to nodeB instead." ? ... should it be based on what percentage
>> of collection1's total replica list are located on nodeA, or based on what
>> percentage of nodeA is dedicated to collection1? ... should nodeB be more
>> or less likely than nodeC to get the request based on how many total cores
>> each node has for collection1, or how many unique shards each one has?
>> 
>> 
>> Also bear in mind that even if you assumed everything was nice and evenly
>> distributed, a "simple" round robin based approach would have some pretty
>> significant impacts on the number of inter-node network requests....
>> 
>> Say you have a 5 node cluster, hosting a 1shard/5replica collection such
>> that each node has 1 replica:  today any node can process the request
>> locally; but if we did a round robin proxy of the request, that means we'd
>> only handle it locally 1/5th the time, and 4/5ths of the time you add an
>> extra network hop and the associated network IO involved (plus the
>> original node has a thread tied up waiting to proxy the response) .. so
>> you'd go from needing 0 "internal" network requests/IO to having internal
>> traffic of 80% of the amount of external traffic received.
>> 
>> If those 5 nodes host a collection with 2 shards/5replicas each, spread
>> evenly over the 5 nodes: today any given request typically causes 2
>> intra-cluster network requests to get the per-shard data; but if we round
>> robin proxy the initial request to a different node 4/5ths of the time we
>> now typically need 2.8 (2 + 4/5) internal requests for each external
>> request...
>> 
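The arithmetic behind the 80% and 2.8 figures above can be checked with a short sketch. It assumes replicas are spread evenly, that the proxy target is picked uniformly from all nodes (the receiving node included), and that each proxied request costs exactly one extra hop. The function name and parameters are made up for illustration and are not anything in Solr.

```python
from fractions import Fraction


def internal_requests_per_external(nodes, shard_requests, round_robin_proxy):
    """Expected intra-cluster requests caused by one external request.

    nodes: number of nodes, each assumed to hold an equal share of replicas.
    shard_requests: per-shard requests a query already needs today
        (0 for the 1-shard example, 2 for the 2-shard example).
    round_robin_proxy: if True, the receiving node hands the request to a
        node chosen uniformly from all nodes, itself included.
    """
    # The proxy hop only costs a network request when another node is chosen.
    proxy_hop = Fraction(nodes - 1, nodes) if round_robin_proxy else Fraction(0)
    return shard_requests + proxy_hop


# 5 nodes, 1 shard / 5 replicas: served locally today, proxied 4/5 of the time.
print(float(internal_requests_per_external(5, 0, False)))  # 0.0
print(float(internal_requests_per_external(5, 0, True)))   # 0.8

# 5 nodes, 2 shards: 2 per-shard requests today, plus the proxy hop.
print(float(internal_requests_per_external(5, 2, False)))  # 2.0
print(float(internal_requests_per_external(5, 2, True)))   # 2.8
```

Under these assumptions the overhead is always (N-1)/N of external traffic, so it approaches one extra internal request per query as the cluster grows.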
>> 
>> It just seems like adding more forwarding/proxy logic -- that isn't
>> strictly necessary to compute complete results -- could introduce a lot
>> of complexity risk for a problem that already has multiple solutions:
>> 
>> 1) client (or external load balancer) can round robin over live nodes (and
>> given that cluster state and metrics are available via HTTP, a client can
>> make very sophisticated choices)
>> 
>> 2) a single "extra" solr node in the cluster can be used as a "self
>> configuring" load balancer that will automatically know when new nodes are
>> added to the cluster, or when replicas get moved/added, etc...
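Option 1 could look roughly like the sketch below on the client side. It assumes the client has already fetched and JSON-decoded the Collections API CLUSTERSTATUS response, and that live node names have the usual host:port_context form. The host names, the collection name and both function names are hypothetical, and no HTTP request is made here.

```python
from itertools import cycle


def base_urls(clusterstatus, scheme="http"):
    """Turn live node names such as 'solr1:8983_solr' into base URLs."""
    urls = []
    for name in sorted(clusterstatus["cluster"]["live_nodes"]):
        # Assumed node name layout: '<host>:<port>_<context>'.
        host_port, _, context = name.partition("_")
        urls.append(f"{scheme}://{host_port}/{context}")
    return urls


def select_urls(clusterstatus, collection):
    """Yield /select URLs for the collection, rotating over live nodes."""
    for base in cycle(base_urls(clusterstatus)):
        yield f"{base}/{collection}/select"


# Hypothetical, already-decoded CLUSTERSTATUS response with two live nodes.
sample = {"cluster": {"live_nodes": ["solr2:8983_solr", "solr1:8983_solr"]}}
targets = select_urls(sample, "collection1")
print([next(targets) for _ in range(3)])
```

This rotates over every live node without checking which ones host a replica of the collection; a more careful client could filter on the replica list in the same response, and would need to refresh the node list as the cluster changes.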
>> 
>> 
>> 
>> 
>> 
>> 
>> :
>> : Jan
>> :
>> : > On 10 Mar 2021, at 19:32, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>> : >
>> : >
>> : > : Is there any way whatsoever to solve this on the Solr side only?
>> : > :
>> : > : The only thing I can think of is to send all requests to a 3rd node
>> : > : in the cluster that does not have a core for the collection, then it
>> : > : will balance between the two :)
>> : >
>> : > correct -- you can create a Solr node w/o any cores that will act as a
>> : > "load balancer" to other solr nodes.
>> : >
>> : > : Or create a new, empty collection on the node, which acts as a
>> : > : routing collection only to the target collection?
>> : >
>> : > no -- this won't work, because the request your remote client sends
>> : > will need to specify the actual collection you want to query, and when
>> : > the node gets this it will hand it to the local core for that
>> : > collection -- it won't care that there is another local collection
>> : > that's unrelated.
>> : >
>> : > : Sounds like there should be a way to explicitly disable the
>> : > : "optimization" of always handling the request locally in
>> : > : single-shard collections, i.e. always try to balance unless
>> : > : shards.preference=local?
>> : >
>> : > that seems... dangerous.  you could easily wind up in a situation
>> : > where nodes just keep trying to forward forever?
>> : >
>> : >
>> : >
>> : > :
>> : > : Jan
>> : > :
>> : > : > On 10 Mar 2021, at 19:06, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>> : > : >
>> : > : >
>> : > : > : Ah, I missed "single shard" ... this looks relevant:
>> : > : > : https://issues.apache.org/jira/browse/SOLR-12217
>> : > : >
>> : > : > That improvement still isn't going to impact Jan's situation where
>> : > : > the *client* isn't SolrJ ... as the description says:
>> : > : >
>> : > : >>> NOTE: This Jira doesn't cover the single-sharded collections cases
>> : > : >>> when not using the CloudSolrClient or Streaming Expressions (i.e.
>> : > : >>> if you do a non-streaming curl request to a random node in the
>> : > : >>> cluster, the shards.preference parameter is not considered in the
>> : > : >>> case of single shards collections).
>> : > : >
>> : > : >
>> : > : > :
>> : > : > : On Wed, Mar 10, 2021 at 12:43 PM Jan Høydahl <jan....@cominvent.com> wrote:
>> : > : > :
>> : > : > : > We have not set any shards.preference, and I also think
>> : > : > : > preferLocal defaults to false, i.e. random
>> : > : > : >
>> : > : > : > Earlier we had 2 shards for the same collection (both existed on
>> : > : > : > both nodes) and then requests were distributed to both nodes.
>> : > : > : > That’s why, when we went to 1 shard, I was wondering if the
>> : > : > : > “single-shard” code path perhaps never attempts to utilize
>> : > : > : > replicas? But I have not looked at the code yet.
>> : > : > : >
>> : > : > : > Guess the next step is to set up a small local test cluster and
>> : > : > : > see what happens.
>> : > : > : >
>> : > : > : > Jan Høydahl
>> : > : > : >
>> : > : > : > > On 10 Mar 2021, at 15:46, Michael Gibney <mich...@michaelgibney.net> wrote:
>> : > : > : > >
>> : > : > : > > You say not "anything fancy" -- depending on how you define
>> : > : > : > > "fancy", if you have an explicit `shards.preference` param, based
>> : > : > : > > on the version you're running (8.4) you might also take a look at
>> : > : > : > > https://issues.apache.org/jira/browse/SOLR-14471. (If SOLR-14471
>> : > : > : > > is the problem, removing the explicit `shards.preference` param
>> : > : > : > > should restore default "shuffling" routing).
>> : > : > : > >
>> : > : > : > > I haven't dug too deep, but it looks like for 8.4
>> : > : > : > > preferLocalShards actually defaults to false? I might be missing
>> : > : > : > > something though:
>> : > : > : > >
>> : > : > : > > https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.4.1/solr/solrj/src/java/org/apache/solr/client/solrj/routing/RequestReplicaListTransformerGenerator.java#L85
>> : > : > : > >
>> : > : > : > >
>> : > : > : > >
>> : > : > : > >> On Wed, Mar 10, 2021 at 9:10 AM Houston Putman <houstonput...@gmail.com>
>> : > : > : > >> wrote:
>> : > : > : > >>
>> : > : > : > >> I could be wrong, but I don't think preferLocalShards is the
>> : > : > : > >> default in multi-shard use cases.
>> : > : > : > >>
>> : > : > : > >>> On Wed, Mar 10, 2021 at 9:07 AM Mike Drob <md...@mdrob.com> wrote:
>> : > : > : > >>>
>> : > : > : > >>> I believe a server will always try to prefer local cores. Can you
>> : > : > : > >>> do an experiment with 3 nodes, and send http queries to the node
>> : > : > : > >>> not hosting any replicas? That should confirm the balanced
>> : > : > : > >>> distribution.
>> : > : > : > >>>
>> : > : > : > >>> If you have multiple shards, the receiving server will forward
>> : > : > : > >>> the requests for shards it doesn’t have, but would still prefer
>> : > : > : > >>> local shards when they are available.
>> : > : > : > >>>
>> : > : > : > >>> On Wed, Mar 10, 2021 at 8:00 AM Jan Høydahl <jan....@cominvent.com>
>> : > : > : > >>> wrote:
>> : > : > : > >>>
>> : > : > : > >>>> Hi,
>> : > : > : > >>>>
>> : > : > : > >>>> A client has a SolrCloud 8.4 setup with two nodes, and one
>> : > : > : > >>>> collection with one shard and replicationFactor=2.
>> : > : > : > >>>> Of course we want search traffic to be evenly distributed between
>> : > : > : > >>>> the two replicas.
>> : > : > : > >>>> The client is using plain HTTP requests, no SolrJ or anything
>> : > : > : > >>>> fancy, and sends all requests to one of the two nodes.
>> : > : > : > >>>> I was expecting Solr to forward about 50% of those requests to the
>> : > : > : > >>>> other replica, but it is serving them all locally.
>> : > : > : > >>>>
>> : > : > : > >>>> I know we can set up an LB in front or re-program the client to do
>> : > : > : > >>>> round robin, but that is not my question.
>> : > : > : > >>>> Is the select-random-replica logic only active when we have a
>> : > : > : > >>>> sharded collection, and not for a single-shard one?
>> : > : > : > >>>>
>> : > : > : > >>>> Jan
>> : > : > : > >>>
>> : > : > : > >>
>> : > : > : >
>> : > : > :
>> : > : >
>> : > : > -Hoss
>> : > : > http://www.lucidworks.com/
>> : > :
>> : > :
>> : >
>> : > -Hoss
>> : > http://www.lucidworks.com/
>> :
>> 
>> -Hoss
>> http://www.lucidworks.com/
