In your setup, the load balancer prevents single points of failure.

Since you're sending every request to a single URL, what happens if that
node dies or is turned off? Your PHP program has no way of knowing what
to do, but the load balancer does.

Your understanding of Zookeeper's role shows a common misconception.

Zookeeper keeps track of the topology of the collections: which nodes are up,
which are down, etc. It does _not_ have anything to do with distributing queries
or updates. Imagine a 1,000 node collection. If each and every request had
to go through Zookeeper, that would be a bottleneck.

Instead, when a node's state changes, it informs Zookeeper, which in turn
informs all the other Solr nodes that care. It looks like this:
- node starts up.
- as each replica comes up, it informs Zookeeper that it is now "active".
- for each collection with any replica on that node, a "watch" is set on the
   collection's state.json node in Zookeeper
- every time that state.json node changes, Zookeeper notifies
   the node.
- eventually everything has started, all the state changes have been
  broadcast, and Zookeeper just sits there.
- periodically Zookeeper pings each Solr node, and if one has gone away
  it informs all the Solr nodes that this node is dead,
  and each Solr node updates its snapshot of the cluster's
  topology.
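
To make the watch idea concrete, here is a toy in-memory sketch in Python. It
is not the real Zookeeper or Solr API: ToyZookeeper, ToySolrNode and the
"products" collection are made-up names for illustration, and real Zookeeper
watches are one-shot and have to be re-registered, which this skips.

```python
from collections import defaultdict


class ToyZookeeper:
    """Stand-in for Zookeeper: stores data per path and notifies watchers."""

    def __init__(self):
        self.data = {}
        self.watchers = defaultdict(list)

    def watch(self, path, callback):
        # Register interest in a path and hand back whatever is stored now.
        self.watchers[path].append(callback)
        return self.data.get(path)

    def set_data(self, path, value):
        # Store the new state, then push it to every watcher of that path.
        self.data[path] = value
        for callback in list(self.watchers[path]):
            callback(path, value)


class ToySolrNode:
    """Stand-in for a Solr node keeping a local snapshot of the topology."""

    def __init__(self, name, zk):
        self.name = name
        self.zk = zk
        self.snapshot = {}

    def watch_collection(self, collection):
        path = "/collections/" + collection + "/state.json"
        current = self.zk.watch(path, self._on_change)
        if current is not None:
            self.snapshot[path] = dict(current)

    def _on_change(self, path, value):
        # Keep a private copy so the node can keep working from it later.
        self.snapshot[path] = dict(value)


zk = ToyZookeeper()
node_a = ToySolrNode("node_a", zk)
node_b = ToySolrNode("node_b", zk)
for node in (node_a, node_b):
    node.watch_collection("products")

STATE_PATH = "/collections/products/state.json"
zk.set_data(STATE_PATH, {"replica1": "active", "replica2": "active"})
zk.set_data(STATE_PATH, {"replica1": "active", "replica2": "down"})
print(node_a.snapshot[STATE_PATH])
```

The point is that each node ends up holding its own copy of the state, so
nothing has to ask Zookeeper on every request.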

A query comes in to a Solr node and this is what happens:
- the Solr node looks in its Zookeeper information to see
  where all the replicas for the collection are.
- Solr picks one replica from each shard and sends a
  subquery to each of them
- Solr assembles the response from the subrequests
- Solr sends the response to the client.
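
A rough Python sketch of those four steps, working only from a cached
snapshot. The SNAPSHOT contents, the URLs, and the send() callback are all
assumptions for illustration; Solr's real replica selection and result
merging are more involved than a random pick and a sort by score.

```python
import random

# Hypothetical cached topology: shard name -> list of (replica URL, state).
SNAPSHOT = {
    "shard1": [
        ("http://solr1:8983/solr/products_shard1_replica_n1", "active"),
        ("http://solr2:8983/solr/products_shard1_replica_n2", "down"),
    ],
    "shard2": [
        ("http://solr3:8983/solr/products_shard2_replica_n3", "active"),
        ("http://solr4:8983/solr/products_shard2_replica_n4", "active"),
    ],
}


def pick_replicas(snapshot, rng=random):
    """Choose one active replica per shard using only the local snapshot."""
    chosen = {}
    for shard, replicas in snapshot.items():
        active = [url for url, state in replicas if state == "active"]
        if not active:
            raise RuntimeError("no active replica for " + shard)
        chosen[shard] = rng.choice(active)
    return chosen


def distributed_query(snapshot, query, send):
    """Fan the query out to one replica per shard and merge the hits.

    send(url, query) is supplied by the caller and must return a list of
    (score, doc_id) tuples.
    """
    hits = []
    for url in pick_replicas(snapshot).values():
        hits.extend(send(url, query))
    # Highest score first, as a simple stand-in for assembling the response.
    return sorted(hits, key=lambda hit: hit[0], reverse=True)
```

Nothing in there talks to Zookeeper; the snapshot is all the node needs.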

Note that Zookeeper isn't involved at all. In fact, Zookeeper
can go away completely and each Solr node will work from its
last snapshot of the topology of the network and answer
_queries_. Updates will fail completely if Zookeeper falls
below quorum, but Zookeeper isn't handling the _update_.
It's still Solr knowing that Zookeeper is below quorum
and refusing to process an update.

There's more going on of course, but that's the general outline.

Since you're using PHP, your program doesn't know about Zookeeper; all it
has is a URL. So, as I mentioned above, if that node goes away your PHP
program has nowhere else to send the request, because it is not
Zookeeper-aware.
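
If you can't put a load balancer in front, the fallback is for the client to
hold a list of node URLs and try another one when a request fails. Here is a
minimal sketch of that idea, written in Python only for illustration (the
same logic ports to PHP). The function names and any URLs you pass in are
assumptions, and a real load balancer also does health checks, which this
does not.

```python
import random
import urllib.request


def http_get(url, timeout=2.0):
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def query_with_failover(nodes, path, fetch=http_get, rng=random):
    """Try the Solr nodes in random order until one of them answers.

    nodes is a list of base URLs and path is the request path including the
    query string. fetch can be swapped out, for example in tests.
    """
    candidates = list(nodes)
    rng.shuffle(candidates)
    last_error = None
    for base in candidates:
        try:
            return fetch(base + path)
        except OSError as err:
            # Connection refused, timeouts and URLError are all OSError.
            last_error = err
    raise RuntimeError("all Solr nodes failed") from last_error
```

You would call it with something like
query_with_failover(["http://solr1:8983", "http://solr2:8983"],
"/solr/<your_collection_name>/select?q=...").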

If you were using "CloudSolrClient" in SolrJ, it _is_ Zookeeper
aware and you would not need a load balancer. But again
that's because it knows the cluster topology (it registers its own
watchers) and can "do the right thing" if something goes away.
Zookeeper is still not directly involved in processing queries
or updates.

Best,
Erick

On Fri, Jun 29, 2018 at 7:31 PM, Sushant Vengurlekar
<svengurle...@curvolabs.com> wrote:
> Thanks for your reply. I have a follow up question. Why is a load balancer
> needed? Isn't that the job of zookeeper to loadbalance queries across solr
> nodes?
>
> I was under the impression that you send query to zookeeper and it handles
> the rest and sends the response back. Can you please enlighten .me on that
> one.
>
> Thank you
>
> On Fri, Jun 29, 2018 at 7:19 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>> You send your queries and updates directly to Solr's collection e.g.
>> http://host:port/solr/<your_collection_name>. You can use any Solr node
>> for
>> this request. If the node does not have the collection being queried then
>> the request will be forwarded internally to a Solr instance which has that
>> collection.
>>
>> ZooKeeper is used by Solr's Java client to look up the list of Solr nodes
>> having the collection being queried. But if you are using PHP then you can
>> probably keep a list of Solr nodes in configuration and randomly choose
>> one. A better implementation would be to setup a load balancer and put all
>> Solr nodes behind it and query the load balancer URL in your application.
>>
>> On Sat, Jun 30, 2018 at 7:31 AM Sushant Vengurlekar <
>> svengurle...@curvolabs.com> wrote:
>>
>> > I have a question regarding querying in solrcloud.
>> >
>> > I am working on php code to query solrcloud for search results. Do I send
>> > the query to zookeeper or send it to a particular solr node? How does the
>> > querying process work in general.
>> >
>> > Thank you
>> >
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
