Re: Does ConcurrentUpdateSolrClient apply for SolrCloud ?
One comment to complicate Erick's already-good advice.

> If a doc that needs to go to shard2 is received by a replica on shard1, it
> must be forwarded to the leader of shard2, introducing an extra hop.

Definitely true, but I don't think that's the only factor in the relative performance of CUSC (ConcurrentUpdateSolrClient) vs CSC (CloudSolrClient). CUSC responds asynchronously when you're using it for updates, which lets users continue on to prepare the next set of docs while a CloudSolrClient might still be waiting to hear back from Solr. I benchmarked this recently and was surprised to see that ConcurrentUpdateSolrClient actually came out ahead in some setups.

Now I'm not trying to say that CUSC performs better than CSC, just that "It Depends" (Erick's TM) on the rest of your ETL code, on the topology of your SolrCloud cluster, etc.

Good luck!

Jason

On Wed, Oct 24, 2018 at 6:49 PM shamik wrote:
>
> Thanks Erick, appreciate your help
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
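To illustrate the asynchronous point, here is a minimal sketch of a "queue it and keep going" sender built only on the JDK. It is not how ConcurrentUpdateSolrClient is implemented; AsyncSenderSketch, addAsync and drain are hypothetical names, and the slowSender callback stands in for whatever actually talks to Solr.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in for an asynchronous update client (assumption, not SolrJ code).
class AsyncSenderSketch<T> {
    // One background thread sends batches in the order they were queued.
    private final ExecutorService background = Executors.newSingleThreadExecutor();
    private final Consumer<List<T>> slowSender;

    AsyncSenderSketch(Consumer<List<T>> slowSender) {
        this.slowSender = slowSender;
    }

    // Returns as soon as the batch is queued, so the caller can start
    // preparing the next batch while this one is still being sent.
    // Note: in this sketch a failed send is not reported back to the caller.
    void addAsync(List<T> batch) {
        background.execute(() -> slowSender.accept(batch));
    }

    // Stops accepting new batches and waits for the queued ones to finish.
    // Returns false if the timeout elapsed or the wait was interrupted.
    boolean drain(long timeout, TimeUnit unit) {
        background.shutdown();
        try {
            return background.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The trade-off this makes visible: the caller gains overlap between preparing and sending, but has to decide separately how failures and back-pressure are handled.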
Re: Does ConcurrentUpdateSolrClient apply for SolrCloud ?
Thanks Erick, appreciate your help.
Re: Does ConcurrentUpdateSolrClient apply for SolrCloud ?
No best practices as such, "whatever works" about covers it. That's not a huge query rate, especially if you have replicas per shard, so I wouldn't worry too much about it. If you rack 100 clients all driving Solr as hard as possible and people complain that query responses are bad, you'll know where to look first.

About batching, see: https://lucidworks.com/2015/10/05/really-batch-updates-solr-2/

YMMV of course. If I were going to give you a starting point for batching, it would be on the order of at least 100 per shard. So a 5 shard collection would have at least 500 Solr documents per call to cloudSolrClient.add(doclist).

Best,
Erick

On Wed, Oct 24, 2018 at 2:20 PM shamik wrote:
>
> Thanks Erick, that's extremely insightful. I'm not using batching and that's
> the reason I was exploring ConcurrentUpdateSolrClient. Currently, N threads
> are reusing the same CloudSolrClient to send data to Solr. Of course, the
> single point of failure was my biggest concern with
> ConcurrentUpdateSolrClient, thanks for clarifying my doubt.
>
> "You also want to be a little careful how hard you drive Solr if you're also
> serving queries at the same time, the more cycles you use for indexing the
> fewer are available to serve queries."
>
> Our Solr servers are also used to serve queries (50-100/minute). Our hard
> commit is set at 10 minutes while soft commit is disabled. Are there any best
> practices (I know it's too generic, but specifically around indexing) that I
> should follow?
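To make the batching arithmetic above concrete, here is a small sketch using only the JDK. BatchingSketch, batchSize and sendInBatches are hypothetical names; the sender callback is a placeholder for a call such as cloudSolrClient.add(doclist), which is deliberately left out so the example runs without a Solr cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper illustrating "at least 100 docs per shard per call".
class BatchingSketch {

    // Starting-point batch size: docsPerShard documents for every shard.
    static int batchSize(int numShards, int docsPerShard) {
        if (numShards <= 0 || docsPerShard <= 0) {
            throw new IllegalArgumentException("numShards and docsPerShard must be positive");
        }
        return numShards * docsPerShard;
    }

    // Splits docs into consecutive batches and hands each batch to the sender.
    // In real indexing code the sender would wrap the client's add(...) call (assumption).
    // Returns the number of batches that were sent.
    static <T> int sendInBatches(List<T> docs, int batchSize, Consumer<List<T>> sender) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int sent = 0;
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            // Copy the slice so the sender owns an independent list.
            sender.accept(new ArrayList<>(docs.subList(start, end)));
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 1200; i++) {
            docs.add(i);
        }
        int size = batchSize(5, 100); // 5 shards, 100 docs per shard
        sendInBatches(docs, size, batch -> System.out.println("sending " + batch.size() + " docs"));
    }
}
```

With 1,200 documents and a 5 shard collection, this sends batches of 500, 500 and 200. The 100-per-shard figure is only the starting point suggested above and would need tuning against a real cluster.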
Re: Does ConcurrentUpdateSolrClient apply for SolrCloud ?
Thanks Erick, that's extremely insightful. I'm not using batching and that's the reason I was exploring ConcurrentUpdateSolrClient. Currently, N threads are reusing the same CloudSolrClient to send data to Solr. Of course, the single point of failure was my biggest concern with ConcurrentUpdateSolrClient, thanks for clarifying my doubt.

"You also want to be a little careful how hard you drive Solr if you're also serving queries at the same time, the more cycles you use for indexing the fewer are available to serve queries."

Our Solr servers are also used to serve queries (50-100/minute). Our hard commit is set at 10 minutes while soft commit is disabled. Are there any best practices (I know it's too generic, but specifically around indexing) that I should follow?
Re: Does ConcurrentUpdateSolrClient apply for SolrCloud ?
I wouldn't use ConcurrentUpdateSolrClient for the following reasons:

1> If a doc that needs to go to shard2 is received by a replica on shard1, it must be forwarded to the leader of shard2, introducing an extra hop. CloudSolrClient subdivides the batch and sends the docs to the leader of the right shard automatically. You are batching, right? You should.

2> CloudSolrClient does the above in parallel _already_.

3> You put the load for routing docs entirely on the single Solr node you specify in the url.

4> You introduce a single point of failure (i.e. the node you specify in the url).

5> If your indexing throughput is not what you need, you can string together N SolrJ clients. Or you can create N threads in your indexing client and still get the advantages of CloudSolrClient routing docs correctly.

You also want to be a little careful how hard you drive Solr if you're also serving queries at the same time: the more cycles you use for indexing, the fewer are available to serve queries.

Best,
Erick

On Wed, Oct 24, 2018 at 1:01 PM Shamik Bandopadhyay wrote:
>
> Hi,
>
> I'm looking into the possibility of using ConcurrentUpdateSolrClient for
> indexing a large volume of data instead of CloudSolrClient. Having an
> async, batch API seems to be a better fit for us where we tend to index a
> lot of data periodically. As I'm looking into the API, I'm wondering if
> this can be used for SolrCloud.
>
> ConcurrentUpdateSolrClient client = new
> ConcurrentUpdateSolrClient.Builder(url).withThreadCount(100).withQueueSize(50).build();
>
> The Builder object only takes a single url, not sure what that would be in
> case of SolrCloud. For e.g. if I've two shards with a couple of replicas,
> then what will be the server url?
>
> I was not able to find any relevant document or example to clarify my
> doubt. Any pointers will be appreciated.
>
> Thanks
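The "N threads in your indexing client" option from point 5 can be sketched roughly as follows, using only the JDK. ParallelIndexingSketch and indexInParallel are hypothetical names, and sharedSender is a placeholder for a single client instance shared by all threads (for example a lambda wrapping that client's add call); no SolrJ classes are used here so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper: several worker threads pushing batches through one shared sender.
class ParallelIndexingSketch {

    // Submits every batch to a fixed pool of threadCount workers and waits for all of them.
    // Returns the total number of documents handed to the sender.
    static <T> int indexInParallel(List<List<T>> batches, int threadCount,
                                   Consumer<List<T>> sharedSender) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<T> batch : batches) {
                results.add(pool.submit(() -> {
                    sharedSender.accept(batch); // the same sender is used by every thread
                    return batch.size();
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // blocks until that batch has been sent
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed to send", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The thread count is the knob to watch if the same cluster is also serving queries: a smaller pool leaves more cycles for searches.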
Does ConcurrentUpdateSolrClient apply for SolrCloud ?
Hi,

I'm looking into the possibility of using ConcurrentUpdateSolrClient for indexing a large volume of data instead of CloudSolrClient. Having an async, batch API seems to be a better fit for us where we tend to index a lot of data periodically. As I'm looking into the API, I'm wondering if this can be used for SolrCloud.

    ConcurrentUpdateSolrClient client = new ConcurrentUpdateSolrClient.Builder(url).withThreadCount(100).withQueueSize(50).build();

The Builder object only takes a single url, not sure what that would be in case of SolrCloud. For e.g. if I've two shards with a couple of replicas, then what will be the server url?

I was not able to find any relevant document or example to clarify my doubt. Any pointers will be appreciated.

Thanks