Thanks for the detailed explanation, Erick. It really helped clear up my
understanding.

On Fri, Jun 29, 2018 at 8:04 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> In your setup, the load balancer prevents single points of failure.
>
> Since you're pinging a URL, what happens if that node dies or is turned
> off?
> Your PHP program has no way of knowing what to do, but the load
> balancer does.
>
> Your understanding of Zookeeper's role shows a common misconception.
>
> Zookeeper keeps track of the topology of the collections: which nodes
> are up, which are down, etc. It does _not_ have anything to do with
> distributing queries or updates. Imagine a 1,000 node collection. If
> each and every request had to go through Zookeeper, that would be a
> bottleneck.
>
> Instead, when each node's state changes, it informs Zookeeper which in turn
> informs all the other Solr nodes who care. It looks like this.
> - node starts up.
> - as each replica comes up, it informs Zookeeper that it is now "active".
> - for each collection with any replica on that node, a "watch" is set
>   on the collection's state.json node in Zookeeper
> - every time that state.json node changes, Zookeeper notifies
>    the node.
> - eventually everything has started, all the state changes have been
>   broadcast, and Zookeeper just sits there.
> - periodically Zookeeper pings each Solr node and if it has gone away
>   it informs all the Solr nodes that this node is dead
>   and each Solr node updates its snapshot of the cluster's
>   topology.
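
The watch-and-notify steps above can be sketched with a toy, in-memory model. This is only an illustration under stated assumptions: `StateStore`, `SolrNode`, and the collection and replica names are hypothetical, and none of this is ZooKeeper's or Solr's real API. The real mechanism has more detail than this.

```python
# Toy, in-memory model of the watch pattern; not ZooKeeper's real API.
class StateStore:
    """Stands in for Zookeeper: holds one state document per collection."""

    def __init__(self):
        self._state = {}      # collection name -> {replica name: status}
        self._watchers = {}   # collection name -> list of callbacks

    def watch(self, collection, callback):
        # A Solr node registers interest in one collection's state.
        self._watchers.setdefault(collection, []).append(callback)

    def set_replica_state(self, collection, replica, status):
        # A replica reports a state change; every watcher gets the new state.
        state = self._state.setdefault(collection, {})
        state[replica] = status
        for callback in self._watchers.get(collection, []):
            callback(collection, dict(state))


class SolrNode:
    """Keeps a local snapshot of the topology, refreshed by notifications."""

    def __init__(self, name):
        self.name = name
        self.snapshot = {}

    def on_change(self, collection, state):
        self.snapshot[collection] = state


store = StateStore()
node_a, node_b = SolrNode("node_a"), SolrNode("node_b")
store.watch("products", node_a.on_change)
store.watch("products", node_b.on_change)

store.set_replica_state("products", "shard1_replica1", "active")
store.set_replica_state("products", "shard1_replica2", "down")
print(node_a.snapshot)
```

After the two state changes, both nodes hold the same snapshot without ever having asked the store for it.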
>
> A query comes in to a Solr node and this is what happens:
> - the Solr node looks in its Zookeeper information to see
>   where all the replicas for the collection are.
> - Solr picks one replica from each shard and sends the
>    subquery to them
> - Solr assembles the response from the subrequests
> - Solr sends the response to the client.
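
The four query steps above could be modelled roughly as follows. This is a sketch, not Solr's implementation: the snapshot contents, the URLs, the fake per-shard results, and the function names are all made up for illustration, and the merge is reduced to sorting by score.

```python
import random

# Hypothetical snapshot of one collection: shard name -> replica URLs.
SNAPSHOT = {
    "shard1": ["http://solr1:8983/solr", "http://solr2:8983/solr"],
    "shard2": ["http://solr3:8983/solr", "http://solr4:8983/solr"],
}

# Fake per-shard results as (doc id, score); every replica of a shard
# is assumed to hold the same documents.
FAKE_RESULTS = {
    "shard1": [("doc1", 0.9), ("doc4", 0.2)],
    "shard2": [("doc7", 0.7), ("doc3", 0.5)],
}


def pick_replicas(snapshot):
    # One replica per shard, chosen from the node's local snapshot.
    return {shard: random.choice(urls) for shard, urls in snapshot.items()}


def run_subquery(shard, replica_url, query):
    # Stand-in for the subrequest a real node would send over HTTP.
    return FAKE_RESULTS[shard]


def distributed_query(snapshot, query, rows=3):
    chosen = pick_replicas(snapshot)
    hits = []
    for shard, url in chosen.items():
        hits.extend(run_subquery(shard, url, query))
    # Assemble the response: merge partial results, best score first.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:rows]


print(distributed_query(SNAPSHOT, "*:*"))
```

Nothing in `distributed_query` talks to a coordination service; everything it needs is already in the local snapshot.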
>
> Note that Zookeeper isn't involved at all. In fact, Zookeeper
> can go away completely and each Solr node will work on its
> last snapshot of the topology of the network and answer
> _queries_. Updates will fail completely if Zookeeper falls
> below quorum, but Zookeeper isn't handling the _update_.
> It's still Solr knowing that Zookeeper is below quorum
> and refusing to process an update.
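
The difference between queries and updates when Zookeeper is below quorum might be pictured like this. It is a deliberately tiny model: `SolrNodeModel`, `QuorumLostError`, and the `zk_has_quorum` flag are hypothetical names, not anything from Solr's code.

```python
class QuorumLostError(Exception):
    """Raised when an update arrives while Zookeeper has no quorum."""


class SolrNodeModel:
    def __init__(self, snapshot):
        self.snapshot = snapshot     # last topology received from Zookeeper
        self.zk_has_quorum = True    # assumed to track the Zookeeper connection

    def query(self, collection):
        # Queries only need the cached topology, so they keep working.
        # Returns the replicas the node would route the query to.
        return sorted(self.snapshot[collection])

    def update(self, collection, doc):
        # The node itself refuses; Zookeeper never sees the document.
        if not self.zk_has_quorum:
            raise QuorumLostError("Zookeeper is below quorum, update refused")
        return "accepted"


node = SolrNodeModel({"products": ["shard1_replica1", "shard2_replica1"]})
node.zk_has_quorum = False
print(node.query("products"))
```

With the flag set to `False`, `query` still answers from the snapshot while `update` raises `QuorumLostError`.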
>
> There's more going on of course, but that's the general outline.
>
> Since you're using PHP, your program doesn't know about Zookeeper. All
> it has is a URL so, as I mentioned above, if that node goes away your
> PHP program has no way of finding another one.
>
> If you were using "CloudSolrClient" in SolrJ, it _is_ Zookeeper
> aware and you would not need a load balancer. But again
> that's because it knows the cluster topology (it registers its own
> watchers) and can "do the right thing" if something goes away.
> Zookeeper is still not directly involved in processing queries
> or updates.
>
> Best,
> Erick
>
> On Fri, Jun 29, 2018 at 7:31 PM, Sushant Vengurlekar
> <svengurle...@curvolabs.com> wrote:
> > Thanks for your reply. I have a follow-up question. Why is a load
> > balancer needed? Isn't it the job of zookeeper to load-balance queries
> > across solr nodes?
> >
> > I was under the impression that you send the query to zookeeper and
> > it handles the rest and sends the response back. Can you please
> > enlighten me on that one?
> >
> > Thank you
> >
> > On Fri, Jun 29, 2018 at 7:19 PM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> >
> >> You send your queries and updates directly to Solr's collection, e.g.
> >> http://host:port/solr/<your_collection_name>. You can use any Solr node
> >> for this request. If the node does not have the collection being
> >> queried then the request will be forwarded internally to a Solr
> >> instance which has that collection.
> >>
> >> ZooKeeper is used by Solr's Java client to look up the list of Solr
> >> nodes having the collection being queried. But if you are using PHP
> >> then you can probably keep a list of Solr nodes in configuration and
> >> randomly choose one. A better implementation would be to set up a load
> >> balancer, put all Solr nodes behind it, and query the load balancer URL
> >> in your application.
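
The "keep a list of Solr nodes and randomly choose one" idea could look something like the sketch below, with a retry on the next node added so that one dead node does not fail the request. It is written in Python for illustration; the same logic should carry over to PHP. The host names, the `products` collection, and the function names are assumptions, not anything from this thread.

```python
import random
import urllib.parse
import urllib.request

# Hypothetical node list; in practice this would come from configuration.
SOLR_NODES = [
    "http://solr1:8983",
    "http://solr2:8983",
    "http://solr3:8983",
]


def http_get(url, timeout=2.0):
    # Plain HTTP GET; connection failures and timeouts raise OSError subclasses.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def query_any_node(nodes, collection, params, fetch=http_get):
    # Try the nodes in random order and return the first successful response.
    candidates = list(nodes)
    random.shuffle(candidates)
    query_string = urllib.parse.urlencode(params)
    last_error = None
    for base_url in candidates:
        url = f"{base_url}/solr/{collection}/select?{query_string}"
        try:
            return fetch(url)
        except OSError as error:
            last_error = error   # node unreachable, try the next one
    raise RuntimeError("no Solr node reachable") from last_error
```

A call such as `query_any_node(SOLR_NODES, "products", {"q": "*:*"})` would then return the raw response body from whichever node answered first. A load balancer does the same job outside the application, which is why it is the better option.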
> >>
> >> On Sat, Jun 30, 2018 at 7:31 AM Sushant Vengurlekar <
> >> svengurle...@curvolabs.com> wrote:
> >>
> >> > I have a question regarding querying in solrcloud.
> >> >
> >> > I am working on php code to query solrcloud for search results. Do I
> >> > send the query to zookeeper or send it to a particular solr node? How
> >> > does the querying process work in general?
> >> >
> >> > Thank you
> >> >
> >>
> >>
> >> --
> >> Regards,
> >> Shalin Shekhar Mangar.
> >>
>
