Your question about directing queries to PULL replicas only has been discussed on the list. Look for the topic "Limit search queries only to pull replicas". What I'd like to see is something similar to the preferLocalShards parameter, for example "preferReplicaTypes=TLOG,PULL". Tomás mentioned previously that SOLR-10880 could be used as a base for such functionality, and I'm considering taking a stab at implementing it.

--Ere

Greg Roodt wrote on 12.2.2018 at 6.55:
Thank you both for your very detailed answers.

This is great to know. I knew that SolrJ had the cluster aware knowledge
(via zookeeper), but I was wondering what something like curl would do.
Great to know that internally the cluster will proxy queries to the
appropriate place regardless.

I am running the single shard scenario. I'm thinking of putting a dedicated
HTTP load-balancer in front of the PULL replicas only, with read-only
queries sent straight to the load-balancer. In this situation, the
healthy PULL replicas *should* handle the queries on the node itself
without a proxy hop (assuming state=active). New PULL replicas added to the
load-balancer will internally proxy queries to the other PULL or TLOG
replicas while in state=recovering, until they switch to state=active.

Is my understanding correct?
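
A rough sketch of how the load-balancer pool described above could be kept to active PULL replicas only. It assumes the JSON layout returned by the Collections API CLUSTERSTATUS action (replicas carrying "type", "state", "node_name" and "base_url", plus a "live_nodes" list); the collection name, hosts and the helper name are made up for illustration, and fetching the status over HTTP is left out.

```python
# Hypothetical sample shaped like a CLUSTERSTATUS response (an assumption,
# trimmed to the fields used here). Hosts and names are invented.
SAMPLE_STATUS = {
    "cluster": {
        "live_nodes": ["host1:8983_solr", "host2:8983_solr", "host3:8983_solr"],
        "collections": {
            "products": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {"type": "TLOG", "state": "active",
                                           "node_name": "host1:8983_solr",
                                           "base_url": "http://host1:8983/solr"},
                            "core_node2": {"type": "PULL", "state": "active",
                                           "node_name": "host2:8983_solr",
                                           "base_url": "http://host2:8983/solr"},
                            "core_node3": {"type": "PULL", "state": "recovering",
                                           "node_name": "host3:8983_solr",
                                           "base_url": "http://host3:8983/solr"},
                            "core_node4": {"type": "PULL", "state": "active",
                                           "node_name": "host4:8983_solr",
                                           "base_url": "http://host4:8983/solr"},
                        }
                    }
                }
            }
        },
    }
}


def active_pull_backends(cluster_status, collection):
    """Return sorted base URLs of nodes hosting an active PULL replica.

    A replica counts only if its state is "active" AND its node is listed
    in live_nodes, since the recorded state can be stale for a dead node.
    """
    cluster = cluster_status["cluster"]
    live = set(cluster.get("live_nodes", []))
    backends = set()
    for shard in cluster["collections"][collection]["shards"].values():
        for replica in shard["replicas"].values():
            if (replica.get("type") == "PULL"
                    and replica.get("state") == "active"
                    and replica.get("node_name") in live):
                backends.add(replica["base_url"])
    return sorted(backends)


if __name__ == "__main__":
    # host3 is still recovering and host4 is not live, so only host2 remains.
    print(active_pull_backends(SAMPLE_STATUS, "products"))
```

Re-running something like this periodically and regenerating the load-balancer backend list would keep recovering replicas out of the pool, which avoids the extra proxy hop mentioned above.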

Is this sensible to do, or is it not worth it due to the smart proxying
that SolrCloud can do anyway?

If the TLOG and PULL replicas are so similar, is there any real advantage
to having a mixed cluster? I assume a bit less work is required across the
cluster to propagate writes if you only have 3 TLOG nodes vs 10+ PULL
nodes? Or would it be better to just have 13 TLOG nodes?





On 12 February 2018 at 15:24, Tomas Fernandez Lobbe <tflo...@apple.com>
wrote:

On the last question:
For Writes: Yes. Writes are going to be sent to the shard leader, and
since PULL replicas can’t be leaders, it’s going to be a TLOG replica. If
you are using CloudSolrClient, then this routing will be done directly from
the client (since it will send the update to the leader), and if you are
using some other HTTP client, then yes, the PULL replica will forward the
update, the same way any non-leader node would.

For reads: this won’t happen today, and any replica can respond to
queries. I do believe there is value in this kind of routing logic;
sometimes you simply don’t want the leader to handle any queries,
especially when queries can be expensive. You could do this today if you
want, by putting a load balancer in front and directing your queries only
to the nodes you know are PULL. Keep in mind that this would only work in
the single shard scenario, and only if you hit an active replica
(otherwise, as you said, the query will be routed to any other node of the
shard, regardless of the type). If you have multiple shards, then you need
to use the “shards” parameter and tell Solr exactly which nodes you want to
hit for each shard. The “shards” approach can also be used in the single
shard case, although I believe you would be adding an extra hop.
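
To illustrate the “shards” approach, here is a minimal sketch that builds a value for the parameter from active PULL replicas only. It assumes the replica fields reported by the Collections API CLUSTERSTATUS action ("type", "state", "base_url", "core") and the documented “shards” syntax, where commas separate shards and "|" separates alternative replicas of the same shard. The collection, hosts, core names and the function name are hypothetical.

```python
# Hypothetical two-shard layout, reduced to the "shards" section of a
# CLUSTERSTATUS-style response (an assumption; names are invented).
SAMPLE_SHARDS = {
    "shard1": {"replicas": {
        "core_node1": {"type": "TLOG", "state": "active",
                       "base_url": "http://h1:8983/solr",
                       "core": "c_shard1_replica_t1"},
        "core_node2": {"type": "PULL", "state": "active",
                       "base_url": "http://h2:8983/solr",
                       "core": "c_shard1_replica_p2"},
        "core_node3": {"type": "PULL", "state": "recovering",
                       "base_url": "http://h3:8983/solr",
                       "core": "c_shard1_replica_p3"},
    }},
    "shard2": {"replicas": {
        "core_node4": {"type": "PULL", "state": "active",
                       "base_url": "http://h4:8983/solr",
                       "core": "c_shard2_replica_p1"},
        "core_node5": {"type": "PULL", "state": "active",
                       "base_url": "http://h5:8983/solr",
                       "core": "c_shard2_replica_p2"},
    }},
}


def pull_only_shards_param(shards):
    """Build a 'shards' parameter value that lists only active PULL cores.

    Raises ValueError if some shard has no active PULL replica, because
    silently leaving a shard out would return partial results.
    """
    groups = []
    for shard_name in sorted(shards):
        cores = []
        for replica in shards[shard_name]["replicas"].values():
            if replica.get("type") == "PULL" and replica.get("state") == "active":
                # Drop the scheme to match the host:port/solr/core form
                # used in the Solr documentation examples.
                base = replica["base_url"].split("://", 1)[-1]
                cores.append(f"{base}/{replica['core']}")
        if not cores:
            raise ValueError(f"no active PULL replica for {shard_name}")
        groups.append("|".join(sorted(cores)))
    return ",".join(groups)


if __name__ == "__main__":
    print(pull_only_shards_param(SAMPLE_SHARDS))
```

The resulting string would be passed as the shards request parameter of a normal select query. The node that receives the request still coordinates it, which is the extra hop mentioned above.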

Tomás
Sent from my iPhone

On Feb 11, 2018, at 6:35 PM, Greg Roodt <gro...@gmail.com> wrote:

Hi

I have a question around how queries are routed and load-balanced in a
cluster of mixed TLOG and PULL replicas.

I thought that I might have to put a load-balancer in front of the PULL
replicas and direct queries at them manually as nodes are added and removed
as PULL replicas. However, it seems that SolrCloud handles this
automatically?

If I add a new PULL replica node, it goes into state="recovering" while it
pulls the core. As expected. What happens if queries are directed at this
node while in this state? From what I am observing, the query gets directed
to another node?

If SolrCloud is handling the routing of requests to active nodes, will it
automatically favour PULL replicas for read queries and TLOG replicas for
writes?

Thanks
Greg



--
Ere Maijala
Kansalliskirjasto / The National Library of Finland
