Re: Solr4 update and query performance question
bq: There is no batching while updating/inserting documents in Solr3

Correct, but all the updates only went to the server you targeted them for. The batching you're seeing is the auto-distribution of the docs to the various shards, a whole different animal.

Keep an eye on: https://issues.apache.org/jira/browse/SOLR-4816. You might prompt Joel to see if this is testable. This JIRA routes the docs directly to the leader of the shard they should go to. IOW it does the routing on the client side. There will still be batching from the leader to the replicas, but this should help.

It is usually a Bad Thing to commit after every batch from the client, in either Solr 3 or Solr 4. I suspect you're right that the wait for all the searchers on all the shards is one of your problems. Try configuring autocommit (both hard and soft) in solrconfig.xml and forgetting the commit bits from the client. This is the usual pattern in Solr4.

Your soft commit (which may be commented out) controls when the documents are searchable. It is less expensive than hard commits with openSearcher=true and makes docs visible. Hard commit closes the current segment and opens a new one. So setting openSearcher=false for your hard commit and a soft commit interval of whatever latency you can stand would be my recommendation.

Final note: if you set your hard commit with openSearcher=false, do it fairly often since it truncates the transaction logs and is quite inexpensive. If you let your tlog grow huge and then kill your server and restart Solr, you get into a situation where Solr may replay the tlog. If it has a bazillion docs in it, that can take a very long time to start up.

Best
Erick

On Wed, Aug 14, 2013 at 4:39 PM, Joshi, Shital shital.jo...@gs.com wrote:

We didn't copy/paste Solr3 config to Solr4. We started with the Solr4 config and only updated the new searcher queries and a few other things.

There is no batching while updating/inserting documents in Solr3, is that correct? Committing 1000 documents in Solr3 takes 19 seconds while in Solr4 it takes about 3-4 minutes. We noticed in the Solr4 logs that commit only returns after a new searcher is created across all nodes. This is possibly because waitSearcher=true by default in Solr4. This was not the case with Solr3; commit would return without waiting for new searcher creation.

In order to improve performance with Solr4, we first changed from commit=true to commit=false in the update URL and added an autoHardCommit setting in solrconfig.xml. This improved performance from 3-4 minutes to 1-2 minutes, but that is not good enough.

Then we changed the maxBufferedAddsPerServer value in the SolrCmdDistributor class from 10 to 1000, deployed this class in the $JETTY_TEMP_FOLDER/solr-webapp/webapp/WEB-INF/classes folder and restarted the Solr4 nodes. But we still see the batch size of 10 being used. Did we change the correct variable/class?

Next, we will try using softCommit=true in the update URL and check if it gives us the desired performance.

Thanks for looking into this. Appreciate your help.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, August 13, 2013 8:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr4 update and query performance question

1 That's hard-coded at present. There's anecdotal evidence that there are throughput improvements with larger batch sizes, but no action yet.

2 Yep, all searchers are also re-opened, caches re-warmed, etc.

3 Odd. I'm assuming your Solr3 was a master/slave setup? Seeing the queries would help diagnose this. Also, did you try to copy/paste the configuration from your Solr3 to Solr4? I'd start with the Solr4 config and copy/paste only the parts needed from your Solr3 setup.

Best
Erick

On Mon, Aug 12, 2013 at 11:38 AM, Joshi, Shital shital.jo...@gs.com wrote:

Hi,

We have a SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes with about 450 mil documents (~90 mil per shard). We're loading 1000 or fewer documents in CSV format every few minutes. In Solr3, with 300 mil documents, it used to take 30 seconds to load 1000 documents, while in Solr4 it's taking up to 3 minutes to load 1000 documents. We're using custom sharding; we include the _shard_=shardid parameter in the update command.

Upon looking at the Solr4 log files we found that:

1. Documents are added in a batch of 10 records. How do we increase this batch size from 10 to 1000 documents?
2. We do a hard commit after loading 1000 documents. For every hard commit, it refreshes the searcher on all nodes. Are all caches also refreshed when a hard commit happens? We're planning to change to soft commit and do an auto hard commit every 10-15 minutes.
3. We're not seeing improved query performance compared to Solr3. Queries which took 3-5 seconds in Solr3 (300 mil docs) are taking 20 seconds with Solr4. We think this could be due to frequent hard commits and searcher refresh. Do you think when we change to soft commit and increase the batch size, we will see better query performance?

Thanks!
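
For reference, the autocommit pattern recommended in the reply above (frequent hard commits with openSearcher=false, plus a soft commit interval that sets visibility latency) might look roughly like the following solrconfig.xml fragment. This is a minimal sketch, not the poster's actual configuration: the two maxTime values are illustrative assumptions, and the other children of the update handler are omitted.

```xml
<!-- Sketch of the <updateHandler> section of solrconfig.xml.
     Interval values below are assumptions chosen for illustration only. -->
<updateHandler class="solr.DirectUpdateHandler2">

  <!-- Hard commit: flushes the current segment and truncates the tlog.
       openSearcher=false means it does not make new docs visible. -->
  <autoCommit>
    <maxTime>15000</maxTime>            <!-- assumed: every 15 seconds -->
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: controls when newly added docs become searchable. -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>            <!-- assumed: 60 seconds of latency -->
  </autoSoftCommit>

</updateHandler>
```

With a setup along these lines, the client would send updates without any commit parameters and leave visibility and durability to the server-side intervals.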
RE: Solr4 update and query performance question
We didn't copy/paste Solr3 config to Solr4. We started with the Solr4 config and only updated the new searcher queries and a few other things.

There is no batching while updating/inserting documents in Solr3, is that correct? Committing 1000 documents in Solr3 takes 19 seconds while in Solr4 it takes about 3-4 minutes. We noticed in the Solr4 logs that commit only returns after a new searcher is created across all nodes. This is possibly because waitSearcher=true by default in Solr4. This was not the case with Solr3; commit would return without waiting for new searcher creation.

In order to improve performance with Solr4, we first changed from commit=true to commit=false in the update URL and added an autoHardCommit setting in solrconfig.xml. This improved performance from 3-4 minutes to 1-2 minutes, but that is not good enough.

Then we changed the maxBufferedAddsPerServer value in the SolrCmdDistributor class from 10 to 1000, deployed this class in the $JETTY_TEMP_FOLDER/solr-webapp/webapp/WEB-INF/classes folder and restarted the Solr4 nodes. But we still see the batch size of 10 being used. Did we change the correct variable/class?

Next, we will try using softCommit=true in the update URL and check if it gives us the desired performance.

Thanks for looking into this. Appreciate your help.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, August 13, 2013 8:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr4 update and query performance question

1 That's hard-coded at present. There's anecdotal evidence that there are throughput improvements with larger batch sizes, but no action yet.

2 Yep, all searchers are also re-opened, caches re-warmed, etc.

3 Odd. I'm assuming your Solr3 was a master/slave setup? Seeing the queries would help diagnose this. Also, did you try to copy/paste the configuration from your Solr3 to Solr4? I'd start with the Solr4 config and copy/paste only the parts needed from your Solr3 setup.

Best
Erick

On Mon, Aug 12, 2013 at 11:38 AM, Joshi, Shital shital.jo...@gs.com wrote:

Hi,

We have a SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes with about 450 mil documents (~90 mil per shard). We're loading 1000 or fewer documents in CSV format every few minutes. In Solr3, with 300 mil documents, it used to take 30 seconds to load 1000 documents, while in Solr4 it's taking up to 3 minutes to load 1000 documents. We're using custom sharding; we include the _shard_=shardid parameter in the update command.

Upon looking at the Solr4 log files we found that:

1. Documents are added in a batch of 10 records. How do we increase this batch size from 10 to 1000 documents?
2. We do a hard commit after loading 1000 documents. For every hard commit, it refreshes the searcher on all nodes. Are all caches also refreshed when a hard commit happens? We're planning to change to soft commit and do an auto hard commit every 10-15 minutes.
3. We're not seeing improved query performance compared to Solr3. Queries which took 3-5 seconds in Solr3 (300 mil docs) are taking 20 seconds with Solr4. We think this could be due to frequent hard commits and searcher refresh. Do you think when we change to soft commit and increase the batch size, we will see better query performance?

Thanks!
Re: Solr4 update and query performance question
1 That's hard-coded at present. There's anecdotal evidence that there are throughput improvements with larger batch sizes, but no action yet.

2 Yep, all searchers are also re-opened, caches re-warmed, etc.

3 Odd. I'm assuming your Solr3 was a master/slave setup? Seeing the queries would help diagnose this. Also, did you try to copy/paste the configuration from your Solr3 to Solr4? I'd start with the Solr4 config and copy/paste only the parts needed from your Solr3 setup.

Best
Erick

On Mon, Aug 12, 2013 at 11:38 AM, Joshi, Shital shital.jo...@gs.com wrote:

Hi,

We have a SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes with about 450 mil documents (~90 mil per shard). We're loading 1000 or fewer documents in CSV format every few minutes. In Solr3, with 300 mil documents, it used to take 30 seconds to load 1000 documents, while in Solr4 it's taking up to 3 minutes to load 1000 documents. We're using custom sharding; we include the _shard_=shardid parameter in the update command.

Upon looking at the Solr4 log files we found that:

1. Documents are added in a batch of 10 records. How do we increase this batch size from 10 to 1000 documents?
2. We do a hard commit after loading 1000 documents. For every hard commit, it refreshes the searcher on all nodes. Are all caches also refreshed when a hard commit happens? We're planning to change to soft commit and do an auto hard commit every 10-15 minutes.
3. We're not seeing improved query performance compared to Solr3. Queries which took 3-5 seconds in Solr3 (300 mil docs) are taking 20 seconds with Solr4. We think this could be due to frequent hard commits and searcher refresh. Do you think when we change to soft commit and increase the batch size, we will see better query performance?

Thanks!