Re: Slow forwarding requests to collection leader
Thanks for the info Daniel. I will go forth and make a better client.

On Oct 29, 2014, at 2:28 AM, Daniel Collins danwcoll...@gmail.com wrote:
If you use SolrJ as a client, that would route to a correct node in the cloud (which is what we ended up using through JNI which was interesting), but if you are using HTTP to index, that's something your application has to take care of.
Re: Slow forwarding requests to collection leader
Matt: You might want to look at SolrJ, in particular CloudSolrServer. The big benefit here is that it'll route the docs to the correct leader for each shard rather than relying on the nodes to communicate with each other.

Here's a SolrJ example. NOTE: it uses ConcurrentUpdateSolrServer, which you should replace with CloudSolrServer. Other than making the constructor work, that should be the only change you need as far as instantiating the right Solr server. The example connects to a DB and also parses files with Tika, but you should be able to remove all that without too much trouble.

https://lucidworks.com/blog/indexing-with-solrj/

Best,
Erick

On Thu, Oct 30, 2014 at 10:08 AM, Matt Hilt matt.h...@numerica.us wrote:
Thanks for the info Daniel. I will go forth and make a better client.
Re: Slow forwarding requests to collection leader
+1 for CloudSolrServer.

CloudSolrServer also has built-in fault tolerance (i.e. if the shard leader is not reachable, it adds to a replica) and much better error reporting than ConcurrentUpdateSolrServer. The only downside is the lack of built-in batching. As long as you add documents in decent-sized batches (you can also use multiple threads to add), you will get good indexing performance.

CP

On Thu, Oct 30, 2014 at 6:53 PM, Erick Erickson erickerick...@gmail.com wrote:
Matt: You might want to look at SolrJ, in particular with the use of CloudSolrServer. The big benefit here is that it'll route the docs to the correct leader for each shard rather than relying on the nodes to communicate with each other.
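CP's advice about adding documents in decent-sized batches can be sketched roughly as below. This is only an illustration using the Java standard library: the `Batches` class and its `partition` method are hypothetical names, not part of SolrJ, and the batch size is something you would have to tune yourself. Each resulting batch would then go to the client in a single add call instead of one call per document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of SolrJ): splits a list of documents into
// fixed-size batches so each batch can be sent in a single add request.
final class Batches {
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> docIds = Arrays.asList("doc1", "doc2", "doc3", "doc4", "doc5");
        for (List<String> batch : partition(docIds, 2)) {
            // In a real indexer this is where one add request per batch would be issued.
            System.out.println("would send batch of " + batch.size() + ": " + batch);
        }
    }
}
```

To use several threads, as CP suggests, the batches could be handed to a thread pool with each worker sending its own batch. That part is left out here to keep the sketch short.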
Re: Slow forwarding requests to collection leader
I kind of think this might be working as designed, but I'll be happy to be corrected by others :)

We had a similar issue, which we discovered by accident. We had 2 or 3 collections spread across some machines, and we accidentally tried to send an indexing request to a node in the cloud that didn't have a replica of collection1 (but it had other collections). We saw an instant jump in indexing latency to 5s, which, given that the previous latencies had been ~20ms, was rather obvious!

Querying seems to be fine with this kind of forwarding approach, but indexing would logically require ZK information (to find the right shard for the destination collection and the leader of that shard). So I'm wondering if a node in the cloud that has a replica of collection1 has that information cached, whereas a node in the (same) cloud that only has a collection2 replica only has collection2 information cached and has to go to ZK for every forwarding request. I haven't checked the code recently, but that seems plausible to me.

Would you really want all your collection2 nodes to be running ZK watches for all collection1 updates as well as their own collection2 watches? That would clog them up processing updates that, in all honesty, they shouldn't have to deal with. Every node in the cloud would have to have a watch on everything else, which, if you have a lot of independent collections, would be an unnecessary burden on each of them.

If you use SolrJ as a client, that would route to a correct node in the cloud (which is what we ended up using through JNI, which was interesting), but if you are using HTTP to index, that's something your application has to take care of.

On 28 October 2014 19:29, Matt Hilt matt.h...@numerica.us wrote:
To me it seems that the only difference in these cases is the query/put request forwarding. Does this involve some slow ZooKeeper communication that should be avoided? Any other insights?
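For a plain-HTTP indexer, the point that the application has to take care of routing might look something like the sketch below. It is a minimal illustration under stated assumptions, not how Solr or SolrJ does it: `UpdateRouter` is a hypothetical class, the host names, port 8983 and `/solr` context path are placeholders, and the table of leaders is filled in by hand, whereas a real client would have to keep it in sync with the cluster state. The only thing taken from the thread is the `/update/json` handler.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Solr or SolrJ): chooses which node an
// HTTP indexer should POST to, so updates go straight to a node that hosts
// the collection instead of being forwarded by an unrelated node.
final class UpdateRouter {
    // collection name -> base URL of a node assumed to host that collection's leader
    private final Map<String, String> leaderBaseUrls = new HashMap<>();
    // used when nothing is known about a collection (e.g. the local node)
    private final String fallbackBaseUrl;

    UpdateRouter(String fallbackBaseUrl) {
        this.fallbackBaseUrl = fallbackBaseUrl;
    }

    void setLeader(String collection, String baseUrl) {
        leaderBaseUrls.put(collection, baseUrl);
    }

    // Returns the URL to POST JSON updates to for the given collection.
    String updateUrl(String collection) {
        String base = leaderBaseUrls.getOrDefault(collection, fallbackBaseUrl);
        if (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        return base + "/" + collection + "/update/json";
    }

    public static void main(String[] args) {
        UpdateRouter router = new UpdateRouter("http://localhost:8983/solr");
        // Assumption for illustration: node2 is the leader of collectionA.
        router.setLeader("collectionA", "http://node2:8983/solr");
        System.out.println(router.updateUrl("collectionA"));
        System.out.println(router.updateUrl("collectionB"));
    }
}
```

In the setup from the original post, the generator on node 3 would then post to node 2 directly rather than to its own localhost, which is the case Matt reports as performing well.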
Slow forwarding requests to collection leader
I have three equal machines, each running SolrCloud (4.8). I have multiple collections that are replicated but not sharded. I also have document generation processes running on these nodes, which involve querying the collection ~5 times per document generated.

Node 1 has a replica of collection A and is running document generation code that pushes to the HTTP /update/json handler.
Node 2 is the leader of collection A.
Node 3 does not have a replica of collection A, but is running document generation code for collection A.

The issue I see is that node 1 can push documents into Solr 3-5 times faster than node 3 when they both talk to the Solr instance on their localhost. If either of them talks directly to the Solr instance on node 2, the performance is excellent (on par with node 1). To me it seems that the only difference in these cases is the query/put request forwarding. Does this involve some slow ZooKeeper communication that should be avoided? Any other insights?

Thanks