Re: Solr4 update and query performance question

2013-08-15 Thread Erick Erickson
bq: There is no batching while updating/inserting documents in Solr3

Correct, but in Solr 3 all the updates went only to the server you targeted.
The batching you're seeing is Solr 4 automatically distributing the docs to
the various shards, a whole different animal.

Keep an eye on: https://issues.apache.org/jira/browse/SOLR-4816. You might
prompt Joel to see if this is testable. This JIRA routes the docs directly
to the leader of the shard they should go to. IOW it does the routing on
the client side. There will still be batching from the leader to the
replicas, but this should help.

It is usually a Bad Thing to commit from the client after every batch, in
either Solr 3 or Solr 4. I suspect you're right that the wait for all the
searchers on all the shards is one of your problems. Try configuring
autocommit (both hard and soft) in solrconfig.xml and dropping the commit
bits from the client. This is the usual pattern in Solr4.

Your soft commit (which may be commented out) controls when the documents
are searchable. It is less expensive than a hard commit with
openSearcher=true and still makes docs visible. A hard commit closes the
current segment and opens a new one. So my recommendation would be to set
openSearcher=false for your hard commit and to use a soft commit interval of
whatever latency you can stand.

Final note: if you set your hard commit with openSearcher=false, do it
fairly often, since it truncates the transaction logs and is quite
inexpensive. If you let your tlog grow huge and then kill your server and
restart Solr, Solr may replay the tlog on startup. If it has a bazillion
docs in it, that can take a very long time.
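
As a minimal sketch of the setup described above, the fragment below shows
how the two autocommit settings might look inside the updateHandler section
of solrconfig.xml. The element names (autoCommit, autoSoftCommit, maxTime,
openSearcher) are the standard Solr 4 ones; the intervals are illustrative
assumptions, not values from this thread, and should be tuned to your own
load and latency needs.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Other updateHandler children (for example updateLog) are omitted here. -->

  <!-- Hard commit: flushes segments and truncates the transaction log.
       openSearcher=false keeps it cheap because no new searcher is opened.
       15000 ms is an assumed value; keep it fairly frequent. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: controls when new documents become searchable.
       60000 ms is an assumed value; use whatever latency you can stand. -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With something like this in place, the client would send its updates without
commit=true or softCommit=true and leave the commit timing to the server.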

Best
Erick





RE: Solr4 update and query performance question

2013-08-14 Thread Joshi, Shital
We didn't copy/paste the Solr3 config to Solr4. We started with the Solr4
config and only updated the new searcher queries and a few other things.

There is no batching while updating/inserting documents in Solr3, is that
correct? Committing 1000 documents in Solr3 takes 19 seconds, while in Solr4
it takes about 3-4 minutes. We noticed in the Solr4 logs that a commit only
returns after a new searcher is created across all nodes. This is possibly
because waitSearcher=true by default in Solr4. This was not the case with
Solr3, where a commit would return without waiting for new searcher creation.

In order to improve performance with Solr4, we first changed from commit=true
to commit=false in the update URL and added an autoHardCommit setting in
solrconfig.xml. This improved performance from 3-4 minutes to 1-2 minutes, but
that is not good enough.

Then we changed the maxBufferedAddsPerServer value in the SolrCmdDistributor
class from 10 to 1000, deployed this class in the
$JETTY_TEMP_FOLDER/solr-webapp/webapp/WEB-INF/classes folder, and restarted
the Solr4 nodes. But we still see the batch size of 10 being used. Did we
change the correct variable/class?

Next, we will try using softCommit=true in the update URL and check if it
gives us the desired performance.

Thanks for looking into this. Appreciate your help.






Re: Solr4 update and query performance question

2013-08-13 Thread Erick Erickson
1. That's hard-coded at present. There's anecdotal evidence that there
   are throughput improvements with larger batch sizes, but no action
   yet.
2. Yep, all searchers are also re-opened, caches re-warmed, etc.
3. Odd. I'm assuming your Solr3 was a master/slave setup? Seeing the
   queries would help diagnose this. Also, did you try to copy/paste
   the configuration from your Solr3 to Solr4? I'd start with the
   Solr4 config and copy/paste only the parts needed from your Solr3 setup.

Best
Erick


On Mon, Aug 12, 2013 at 11:38 AM, Joshi, Shital shital.jo...@gs.com wrote:

 Hi,

 We have a SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes
 with about 450 mil documents (~90 mil per shard). We're loading 1000 or
 fewer documents in CSV format every few minutes. In Solr3, with 300 mil
 documents, it used to take 30 seconds to load 1000 documents, while in
 Solr4 it's taking up to 3 minutes to load 1000 documents. We're using
 custom sharding; we include the _shard_=shardid parameter in the update
 command. Upon looking at the Solr4 log files we found that:

 1.  Documents are added in batches of 10 records. How do we increase
 this batch size from 10 to 1000 documents?

 2.  We do a hard commit after loading 1000 documents. Every hard
 commit refreshes the searcher on all nodes. Are all caches also refreshed
 when a hard commit happens? We're planning to change to soft commits and do
 an auto hard commit every 10-15 minutes.

 3.  We're not seeing improved query performance compared to Solr3.
 Queries which took 3-5 seconds in Solr3 (300 mil docs) are taking 20
 seconds with Solr4. We think this could be due to frequent hard commits and
 searcher refresh. Do you think that when we change to soft commits and
 increase the batch size, we will see better query performance?

 Thanks!