Re: CloudSolrClient getDocCollection

Hendrik Haddorp Fri, 08 Feb 2019 15:23:39 -0800

Hi Jason,

thanks for your answer. Yes, you would need one watch per state.json and thus one watch per collection. That should, however, not really be a problem for ZK. I would assume that the Solr server instances need to monitor those nodes to stay up to date on the cluster state. Using org.apache.solr.common.cloud.ZkStateReader.registerCollectionStateWatcher you can even add such a watch through the SolrJ API. At least for the currently watched collections, the client should therefore already have the correct information available, although accessing it would likely be a bit ugly.
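A minimal sketch of the per-collection watch idea. It assumes a hypothetical in-memory stand-in (`FakeZk`) instead of the real ZooKeeper or SolrJ APIs, and the class and method names are illustrative only: the client registers one watch per collection's state.json, and the callback updates a local map, so routing reads local state instead of asking ZooKeeper. Classic ZooKeeper watches fire once and must be re-registered after each event; the fake keeps them registered to stay short.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical in-memory stand-in for ZooKeeper (not the real ZooKeeper API).
class FakeZk {
    private final Map<String, String> nodes = new ConcurrentHashMap<>();
    private final Map<String, BiConsumer<String, String>> watchers = new ConcurrentHashMap<>();

    // Registers a watcher on one path. Classic ZooKeeper watches fire once and
    // must be re-registered; this fake keeps the watcher for brevity.
    void watch(String path, BiConsumer<String, String> onChange) {
        watchers.put(path, onChange);
    }

    String read(String path) {
        return nodes.get(path);
    }

    // Writes a node and pushes the new data to that path's watcher, if any.
    void write(String path, String data) {
        nodes.put(path, data);
        BiConsumer<String, String> watcher = watchers.get(path);
        if (watcher != null) {
            watcher.accept(path, data);
        }
    }
}

// Hypothetical client-side state cache kept current by one watch per collection.
class WatchedStateCache {
    private final Map<String, String> stateByCollection = new ConcurrentHashMap<>();
    private final FakeZk zk;

    WatchedStateCache(FakeZk zk) {
        this.zk = zk;
    }

    // One watch per collection, because each collection has its own state.json node.
    void track(String collection) {
        String path = "/collections/" + collection + "/state.json";
        zk.watch(path, (p, data) -> stateByCollection.put(collection, data));
    }

    // Routing reads local state: no ZooKeeper round trip and no expiry timer.
    String getState(String collection) {
        return stateByCollection.get(collection);
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        WatchedStateCache cache = new WatchedStateCache(zk);
        cache.track("books");
        zk.write("/collections/books/state.json", "{\"shards\":2}");
        System.out.println(cache.getState("books")); // prints {"shards":2}
    }
}
```

The trade-off is that the client holds one watch per tracked collection, which is exactly the scaling concern raised in the reply below.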

The CloudSolrClient also allows setting a watch on /collections using org.apache.solr.common.cloud.ZkStateReader.registerCloudCollectionsListener. This is actually another thing I just ran into. Since the code watches /collections, the listener is informed about a new collection as soon as the "directory" for the collection is created. If the listener then immediately tries to access the collection info via zkStateReader.getClusterState(), the DocCollection can be returned as null, because the DocCollection is built from the information stored in the state.json file, which might not exist yet. I'm trying to monitor the Solr cluster state and ran into this. I'm not sure whether I should open a Jira for that.
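One way a monitoring client could cope with this race, sketched with a hypothetical in-memory model (none of these classes or methods are SolrJ API): a null state for a newly listed collection is treated as "not ready yet" and retried later, rather than as an error. What triggers the retry (a state.json watch, a timer) is an assumption left open here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the relevant ZooKeeper nodes (not SolrJ API).
class ClusterModel {
    final Set<String> collectionDirs = new HashSet<>();    // children of /collections
    final Map<String, String> stateJson = new HashMap<>(); // /collections/<name>/state.json

    // Mirrors the symptom: null until the collection's state.json exists.
    String getCollectionState(String name) {
        return stateJson.get(name);
    }
}

// Hypothetical listener that defers collections whose state is not written yet.
class DeferringCollectionsListener {
    private final ClusterModel cluster;
    final Set<String> pending = new HashSet<>();
    final Map<String, String> known = new HashMap<>();

    DeferringCollectionsListener(ClusterModel cluster) {
        this.cluster = cluster;
    }

    // Invoked with the old and new children of /collections.
    void onCollectionsChanged(Set<String> oldNames, Set<String> newNames) {
        for (String name : newNames) {
            if (!oldNames.contains(name)) {
                tryResolve(name);
            }
        }
    }

    // Invoked for new names, and again later (e.g. from a state.json watch or a retry).
    void tryResolve(String name) {
        String state = cluster.getCollectionState(name);
        if (state == null) {
            pending.add(name); // directory exists, state.json not written yet
        } else {
            pending.remove(name);
            known.put(name, state);
        }
    }

    public static void main(String[] args) {
        ClusterModel cluster = new ClusterModel();
        DeferringCollectionsListener listener = new DeferringCollectionsListener(cluster);

        // Step 1: the collection "directory" appears before its state.json.
        cluster.collectionDirs.add("books");
        listener.onCollectionsChanged(new HashSet<>(), new HashSet<>(cluster.collectionDirs));
        System.out.println("pending=" + listener.pending); // prints pending=[books]

        // Step 2: state.json is written later, so resolving now succeeds.
        cluster.stateJson.put("books", "{\"shards\":1}");
        listener.tryResolve("books");
        System.out.println("pending=" + listener.pending); // prints pending=[]
    }
}
```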


regards,
Hendrik

On 08.02.2019 23:20, Jason Gerlowski wrote:

Hi Hendrik,

I'll try to answer, and let others correct me if I stray.  I wasn't
around when CloudSolrClient was written, so take this with a grain of
salt:

"Why does the client need that timeout?....Wouldn't it make sense to
use a watch?"

You could probably write a CloudSolrClient that uses watch(es) to keep
track of changing collection state.  But I suspect you'd need a
watch-per-collection, instead of just a single watch.

Modern versions of Solr store the state for each collection in
individual "state.json" ZK nodes
("/solr/collections/<collection_name>/state.json").  To catch changes
to all of these collections, you'd need to watch each of those nodes.
That wouldn't scale well for users who want lots of collections.  I
suspect this was one of the concerns that nudged the author(s) to use
a cache-based approach.

(Even when all collection state was stored in a single ZK node, a
watch-based CloudSolrClient would likely have scaling issues for the
many-collection use case.  The client would need to recalculate its
state information for _all_ collections any time that _any_ of the
collections changed, since it has no way to tell which collection was
changed.)
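The two layouts Jason contrasts can be compared with a hypothetical cost model: N collections, a sequence of changes, counting the watches the client holds and the collection states it re-parses. It assumes every re-parse costs the same and ignores network and ZooKeeper overhead; the class and method names are illustrative only.

```java
import java.util.List;

// Hypothetical cost model comparing the two state layouts described above.
class WatchLayoutCost {
    // Legacy layout: one ZK node holds every collection's state. Its watch only
    // says "something changed", so each change forces all N states to be re-parsed.
    static int reparsesWithSingleNode(List<String> collections, List<String> changes) {
        return collections.size() * changes.size();
    }

    // Per-collection layout: each state.json has its own watch, so a change only
    // re-parses the collection that changed (untracked collections fire nothing).
    static int reparsesWithPerCollectionNodes(List<String> collections, List<String> changes) {
        int parses = 0;
        for (String changed : changes) {
            if (collections.contains(changed)) {
                parses++;
            }
        }
        return parses;
    }

    static int watchesWithSingleNode() {
        return 1;
    }

    static int watchesWithPerCollectionNodes(List<String> collections) {
        return collections.size();
    }

    public static void main(String[] args) {
        List<String> collections = List.of("a", "b", "c", "d");
        List<String> changes = List.of("a", "a", "c");
        // prints: single node: watches=1, re-parses=12
        System.out.println("single node: watches=" + watchesWithSingleNode()
                + ", re-parses=" + reparsesWithSingleNode(collections, changes));
        // prints: per collection: watches=4, re-parses=3
        System.out.println("per collection: watches=" + watchesWithPerCollectionNodes(collections)
                + ", re-parses=" + reparsesWithPerCollectionNodes(collections, changes));
    }
}
```

Under these assumptions the single node is cheap in watches but re-parses grow with N on every change, while the per-collection layout keeps re-parses proportional to actual changes at the price of N watches.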

Best,

Jason

On Thu, Feb 7, 2019 at 11:44 AM Hendrik Haddorp <hendrik.hadd...@gmx.net> wrote:

Hi,

when I perform a query using the CloudSolrClient, the code first
retrieves the DocCollection to determine which instance the query
should be sent to [1]. getDocCollection [2] does a lookup in a cache, which
has a 60s expiration time [3]. When a DocCollection has to be reloaded,
this is guarded by a lock [4]. By default there are 3 locks, which can
cause some congestion. The main question, though, is why does the client
need that timeout? According to this [5] comment the code does not use a
watch. Wouldn't it make sense to use a watch? I thought the big
advantage of the CloudSolrClient is that it knows where to send requests,
so that no extra hop needs to be done on the server side. Having to
query ZooKeeper for the current state, however, takes away some of
that advantage.
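The lookup path described above (a cache with an expiration time in front of a ZooKeeper read, with reloads guarded by a small fixed set of locks) can be sketched roughly as follows. This is a hypothetical reconstruction of the general pattern, not CloudSolrClient's code: the 60s TTL and 3 locks are the values mentioned above, the `loader` function stands in for the ZooKeeper read, and the injectable clock exists only to make expiry testable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical TTL cache with striped reload locks (not CloudSolrClient's code).
class TtlStripedCache<V> {
    private static final class Entry<T> {
        final T value;
        final long loadedAtNanos;

        Entry(T value, long loadedAtNanos) {
            this.value = value;
            this.loadedAtNanos = loadedAtNanos;
        }
    }

    private final Map<String, Entry<V>> cache = new ConcurrentHashMap<>();
    private final Object[] locks;
    private final long ttlNanos;
    private final LongSupplier clockNanos;    // injectable so expiry can be tested
    private final Function<String, V> loader; // stands in for the ZooKeeper read
    final AtomicInteger loads = new AtomicInteger();

    TtlStripedCache(int lockCount, long ttl, TimeUnit unit,
                    LongSupplier clockNanos, Function<String, V> loader) {
        this.locks = new Object[lockCount];
        for (int i = 0; i < lockCount; i++) {
            locks[i] = new Object();
        }
        this.ttlNanos = unit.toNanos(ttl);
        this.clockNanos = clockNanos;
        this.loader = loader;
    }

    private boolean isFresh(Entry<V> e) {
        return e != null && clockNanos.getAsLong() - e.loadedAtNanos < ttlNanos;
    }

    V get(String key) {
        Entry<V> e = cache.get(key);
        if (isFresh(e)) {
            return e.value; // fast path: fresh entry, no lock taken
        }
        // All keys that hash to the same stripe share one lock, so with few
        // stripes, reloads of unrelated keys can wait on each other.
        Object lock = locks[Math.floorMod(key.hashCode(), locks.length)];
        synchronized (lock) {
            e = cache.get(key); // re-check: another thread may have reloaded meanwhile
            if (isFresh(e)) {
                return e.value;
            }
            V value = loader.apply(key);
            loads.incrementAndGet();
            cache.put(key, new Entry<>(value, clockNanos.getAsLong()));
            return value;
        }
    }

    public static void main(String[] args) {
        long[] now = {0L};
        TtlStripedCache<String> cache = new TtlStripedCache<>(
                3, 60, TimeUnit.SECONDS, () -> now[0], name -> "state-of-" + name);
        cache.get("books");                       // miss: loads
        now[0] = TimeUnit.SECONDS.toNanos(30);
        cache.get("books");                       // still fresh: no load
        now[0] = TimeUnit.SECONDS.toNanos(61);
        cache.get("books");                       // expired: loads again
        System.out.println("loads=" + cache.loads.get()); // prints loads=2
    }
}
```

Keys that land on the same stripe serialize their reloads even when unrelated, which is the congestion mentioned above; more stripes reduce that at the cost of more lock objects, while a watch-driven cache would avoid the periodic reload altogether.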

regards,
Hendrik

[1]
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L849
[2]
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L1180
[3]
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L162
[4]
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L1200
[5]
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L821
