Thanks Ere. I've taken a look at the discussion here:
This is how I was imagining TLOG & PULL replicas would work, so if this
functionality does get developed, it would be useful to me.
I still have 2 questions at the moment:
1. I am running the single shard scenario. I'm thinking of using a
dedicated HTTP load-balancer in front of the PULL replicas only with
read-only queries sent directly to the load-balancer. In this
situation, the healthy PULL replicas *should* handle the queries on the
node itself without a proxy hop (assuming state=active). New PULL replicas
added to the load-balancer will internally proxy queries to the other PULL
or TLOG replicas while in state=recovering until the switch to
state=active. Is my understanding correct?
2. Is it all worth it? Is there any advantage to running a cluster of 3
TLOGs + 10 PULL replicas vs running 13 TLOG replicas?
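
A minimal sketch of how the load-balancer pool from question 1 could be
derived, assuming the Collections API CLUSTERSTATUS response lists a "type"
and "state" for each replica. The collection name, hosts and core names
below are hypothetical, and the sample payload is hand-written rather than
real Solr output:

```python
from typing import Any, Dict, List


def active_pull_replicas(status: Dict[str, Any], collection: str) -> List[str]:
    """Return core URLs of replicas that are type PULL and state active."""
    shards = status["cluster"]["collections"][collection]["shards"]
    urls = []
    for shard in shards.values():
        for replica in shard["replicas"].values():
            # Skip TLOG replicas and anything still recovering or down.
            if replica.get("type") == "PULL" and replica.get("state") == "active":
                urls.append(replica["base_url"] + "/" + replica["core"])
    return sorted(urls)


# Hand-written sample shaped like a CLUSTERSTATUS response (assumed layout).
SAMPLE_STATUS = {
    "cluster": {
        "collections": {
            "products": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {
                                "core": "products_shard1_replica_t1",
                                "base_url": "http://solr-1:8983/solr",
                                "state": "active",
                                "type": "TLOG",
                                "leader": "true",
                            },
                            "core_node3": {
                                "core": "products_shard1_replica_p3",
                                "base_url": "http://solr-2:8983/solr",
                                "state": "active",
                                "type": "PULL",
                            },
                            "core_node5": {
                                "core": "products_shard1_replica_p5",
                                "base_url": "http://solr-3:8983/solr",
                                "state": "recovering",
                                "type": "PULL",
                            },
                        }
                    }
                }
            }
        }
    }
}

if __name__ == "__main__":
    for url in active_pull_replicas(SAMPLE_STATUS, "products"):
        print(url)
```

Keeping recovering replicas out of the pool this way would avoid the
internal proxy hop while they catch up; whether that is worth the extra
moving part is the same trade-off as question 2.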
On 12 February 2018 at 19:25, Ere Maijala <ere.maij...@helsinki.fi> wrote:
> Your question about directing queries to PULL replicas only has been
> discussed on the list. Look for topic "Limit search queries only to pull
> replicas". What I'd like to see is something similar to the
> preferLocalShards parameter. It could be something like
> "preferReplicaTypes=TLOG,PULL". Tomás mentioned previously that
> SOLR-10880 could be used as a base for such functionality, and I'm
> considering taking a stab at implementing it.
> Greg Roodt wrote on 12.2.2018 at 6.55:
>> Thank you both for your very detailed answers.
>> This is great to know. I knew that SolrJ had the cluster aware knowledge
>> (via zookeeper), but I was wondering what something like curl would do.
>> Great to know that internally the cluster will proxy queries to the
>> appropriate place regardless.
>> I am running the single shard scenario. I'm thinking of using a dedicated
>> HTTP load-balancer in front of the PULL replicas only with read-only
>> queries directed directly at the load-balancer. In this situation, the
>> healthy PULL replicas *should* handle the queries on the node itself
>> without a proxy hop (assuming state=active). New PULL replicas added to
>> load-balancer will internally proxy queries to the other PULL or TLOG
>> replicas while in state=recovering until the switch to state=active.
>> Is my understanding correct?
>> Is this sensible to do, or is it not worth it due to the smart proxying
>> that SolrCloud can do anyway?
>> If the TLOG and PULL replicas are so similar, is there any real advantage
>> to having a mixed cluster? I assume a bit less work is required across the
>> cluster to propagate writes if you only have 3 TLOG nodes vs 10+ PULL
>> nodes? Or would it be better to just have 13 TLOG nodes?
>> On 12 February 2018 at 15:24, Tomas Fernandez Lobbe <tflo...@apple.com>
>> wrote:
>>> On the last question:
>>> For Writes: Yes. Writes are going to be sent to the shard leader, and
>>> since PULL replicas can’t be leaders, it’s going to be a TLOG replica. If
>>> you are using CloudSolrClient, then this routing will be done directly by
>>> the client (since it will send the update to the leader), and if you are
>>> using some other HTTP client, then yes, the PULL replica will forward the
>>> update, the same way any non-leader node would.
>>> For reads: this won’t happen today, and any replica can respond to
>>> queries. I do believe there is value in this kind of routing logic:
>>> sometimes you simply don’t want the leader to handle any queries,
>>> especially when queries can be expensive. You could do this today if you
>>> want, by putting some load balancer in front and just directing your
>>> queries to the nodes you know are PULL, but keep in mind that this would
>>> only work in the single shard scenario, and only if you hit an active
>>> replica (otherwise, as you said, the query will be routed to any other
>>> node of the shard, regardless of the type). If you have multiple shards
>>> then you need to use the “shards” parameter and tell Solr exactly which
>>> nodes you want to hit for each shard (the “shards” approach can also be
>>> done in the single shard case, although you would be adding an extra hop
>>> I believe).
>>> On Feb 11, 2018, at 6:35 PM, Greg Roodt <gro...@gmail.com> wrote:
>>>> I have a question around how queries are routed and load-balanced in a
>>>> cluster of mixed TLOG and PULL replicas.
>>>> I thought that I might have to put a load-balancer in front of the PULL
>>>> replicas and direct queries at them manually as nodes are added and
>>>> removed as PULL replicas. However, it seems that SolrCloud handles this
>>>> automatically?
>>>> If I add a new PULL replica node, it goes into state="recovering" while
>>>> it pulls the core, as expected. What happens if queries are directed at
>>>> this node while in this state? From what I am observing, the query gets
>>>> routed to another node?
>>>> If SolrCloud is handling the routing of requests to active nodes, will
>>>> it automatically favour PULL replicas for read queries and TLOG replicas
>>>> for writes?
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland