I kind of think this might be "working as designed", but I'll be happy to
be corrected by others :)

We hit a similar issue, which we discovered by accident. We had 2 or 3
collections spread across some machines, and we accidentally sent indexing
requests to a node in the cloud that didn't have a replica of collection1
(though it did host other collections). We saw an instant jump in indexing
latency to ~5s, which, given that previous latencies had been ~20ms, was
rather obvious!

Querying seems to be fine with this kind of forwarding, but indexing
logically requires ZooKeeper information (to find the right shard for the
destination collection and the leader of that shard). So I'm wondering
whether a node in the cloud that hosts a replica of collection1 has that
information cached, whereas a node in the (same) cloud that only hosts a
collection2 replica caches only collection2 state, and has to go to ZK for
every forwarded request.
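
If you want to see for yourself what a forwarding node has to look up, you
can ask any node for the cluster state over HTTP. A minimal sketch, assuming
the Collections API CLUSTERSTATUS action is available in your version (I
believe it arrived in 4.8); the host, port and collection name are
placeholders for your own cloud:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Minimal sketch: dump the cluster state as seen through any node, so you
// can check which node is the leader of each shard of collection1.
// "node3", the port and the collection name are placeholders.
public class ClusterStatusCheck {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://node3:8983/solr/admin/collections"
                + "?action=CLUSTERSTATUS&collection=collection1&wt=json");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            // the leader replica is marked with "leader":"true" in the response
            System.out.println(line);
        }
        in.close();
    }
}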

I haven't checked the code recently, but that seems plausible to me. Would
you really want all your collection2 nodes to be running ZK watches on all
collection1 updates as well as their own collection2 watches? That would
clog them up processing updates that, in all honesty, they shouldn't have to
deal with. Every node in the cloud would need a watch on everything else,
which, if you have a lot of independent collections, would be an unnecessary
burden on each of them.

If you use SolrJ as a client, it will route requests to the correct node in
the cloud (which is what we ended up doing, via JNI, which was
"interesting"), but if you are indexing over plain HTTP, routing is
something your application has to take care of itself.
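
For completeness, a minimal SolrJ sketch with the 4.x API: CloudSolrServer
reads the cluster state from ZooKeeper and sends each update to the right
shard leader itself, so it doesn't matter which Solr node is local to your
indexing process. The ZK host string and collection name here are made up:

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Minimal sketch: index through SolrJ's ZK-aware client instead of a fixed
// http://localhost:8983/solr/... URL. The client watches the cluster state
// and routes each add() to the leader of the target shard.
public class CloudIndexer {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collectionA");  // placeholder collection name
        server.connect();

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "example");
        server.add(doc);
        server.commit();

        server.shutdown();
    }
}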

On 28 October 2014 19:29, Matt Hilt <matt.h...@numerica.us> wrote:

> I have three equal machines each running solr cloud (4.8). I have multiple
> collections that are replicated but not sharded. I also have document
> generation processes running on these nodes which involves querying the
> collection ~5 times per document generated.
>
> Node 1 has a replica of collection A and is running document generation
> code that pushes to the HTTP /update/json handler.
> Node 2 is the leader of collection A.
> Node 3 does not have a replica of collection A, but is running document
> generation code for collection A.
>
> The issue I see is that node 1 can push documents into Solr 3-5 times
> faster than node 3 when they both talk to the solr instance on their
> localhost. If either of them talk directly to the solr instance on node 2,
> the performance is excellent (on par with node 1). To me it seems that the
> only difference in these cases is the query/put request forwarding. Does
> this involve some slow zookeeper communication that should be avoided? Any
> other insights?
>
> Thanks
