Re: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something

2014-02-12 Thread Mark Miller
Doing a standard commit after every document is a Solr anti-pattern.

commitWithin is a “near-realtime” commit in recent versions of Solr and not a 
standard commit.

https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching

- Mark

http://about.me/markrmiller

On Feb 12, 2014, at 9:52 AM, Pisarev, Vitaliy vitaliy.pisa...@hp.com wrote:

 I am running a very simple performance experiment in which I post 2000 documents 
 to my application, which in turn persists them to a relational DB and sends 
 them to Solr for indexing (synchronously, in the same request).
 I am testing 3 use cases:
 
  1.  No indexing at all: ~45 seconds to post 2000 documents
  2.  Indexing included, commit after each add: ~8 minutes (!) to post and 
      index 2000 documents
  3.  Indexing included, commitWithin 1 ms: ~55 seconds (!) to post and index 
      2000 documents

 The 3rd result does not make any sense to me; I would expect the behavior to be 
 similar to that of point 2. At first I thought that the documents were not 
 really committed, but I could actually see them being added by executing some 
 queries during the experiment (via the Solr web UI).
 I am worried that I am missing something very big. The code I use for point 2:

     SolrInputDocument doc = // get doc
     SolrServer solrConnection = // get connection
     solrConnection.add(doc);
     solrConnection.commit();

 Whereas the code for point 3:

     SolrInputDocument doc = // get doc
     SolrServer solrConnection = // get connection
     // According to the API documentation, I understand there is no need to
     // explicitly call commit with this API
     solrConnection.add(doc, 1);

 Is it possible that committing after each add will degrade performance by a 
 factor of 40?
 



Re: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something

2014-02-12 Thread Joel Bernstein
Yes, committing after each document will greatly degrade performance. I
typically use autoCommit and autoSoftCommit to set the time interval
between commits, but commitWithin should have a similar effect. I often
see indexing rates of 2000+ docs per second during loads using auto commits.
When you explicitly commit after each document, commits happen
too frequently and overwork the indexing process.
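
As a rough check on the numbers reported in this thread, the indexing overhead can be separated from the ~45 second no-indexing baseline. The sketch below only does that arithmetic. The class and method names are made up for illustration, and the timings are the approximate figures from the original post, so treat the results as ballpark values.

```java
/** Back-of-the-envelope arithmetic on the approximate timings reported in this thread. */
final class CommitOverhead {
    /** Average cost per document in milliseconds for a run lasting totalSeconds. */
    static double perDocMillis(double totalSeconds, int docs) {
        return totalSeconds * 1000.0 / docs;
    }

    /** Time attributable to indexing: total run time minus the no-indexing baseline. */
    static double indexingSeconds(double totalSeconds, double baselineSeconds) {
        return totalSeconds - baselineSeconds;
    }

    public static void main(String[] args) {
        int docs = 2000;
        double baseline = 45.0;        // case 1: no indexing
        double commitEachAdd = 8 * 60; // case 2: commit after each add (~8 minutes)
        double commitWithin = 55.0;    // case 3: commitWithin 1 ms

        double hardOverhead = indexingSeconds(commitEachAdd, baseline); // 435 s
        double softOverhead = indexingSeconds(commitWithin, baseline);  // 10 s

        System.out.printf("commit each add: %.1f ms of indexing per doc%n",
                perDocMillis(hardOverhead, docs));
        System.out.printf("commitWithin:    %.1f ms of indexing per doc%n",
                perDocMillis(softOverhead, docs));
        System.out.printf("overhead ratio:  %.1f%n", hardOverhead / softOverhead);
    }
}
```

End to end the slowdown is only about 8.7x (480 s versus 55 s), but once the baseline is subtracted the indexing overhead differs by roughly 43x, which is in line with the "factor of 40" in the question.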

Joel Bernstein
Search Engineer at Heliosearch






RE: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something

2014-02-12 Thread Pisarev, Vitaliy
I absolutely agree and I even read the NRT page before posting this question.

The thing that baffles me is this:

Doing a commit after each add kills the performance.
On the other hand, when I use commitWithin and specify an (absurd) 1 ms delay, 
I expect the behavior to be equivalent to making a commit, from a 
functional perspective.

Seeing that there is no magic in the world, I am trying to understand what 
price I am actually paying when using the commitWithin feature. On the one 
hand it commits almost immediately; on the other hand, it performs wonderfully. 
Where is the catch?


 



Re: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something

2014-02-12 Thread Dmitry Kan
Cross-posting my answer from SO:

According to this wiki:

https://wiki.apache.org/solr/NearRealtimeSearch

commitWithin is a soft commit by default. Soft commits are very
efficient at making the added documents immediately searchable.
But! The documents are not on disk yet; they are being
committed into RAM. In this setup you would use the updateLog to make the Solr
instance crash-tolerant.

What you do in point 2 is a hard commit, i.e. flushing the added documents to
disk. Doing this after each document add is very expensive. Instead,
post a bunch of documents and then issue a hard commit, or even have your
autoCommit set to some reasonable value, like 10 min or 1 hour (depending on
your user expectations).
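
The "post a bunch of documents, then commit" advice can be sketched roughly as follows. To keep the example runnable without a Solr server, it uses a hypothetical IndexClient interface as a stand-in for the two SolrServer calls shown in the question (add and commit); IndexClient, CountingClient and BatchIndexer are illustrative names, not part of SolrJ, and the batch size of 500 is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Solr client; it exposes only the two calls used in this thread. */
interface IndexClient {
    void add(String doc);
    void commit();
}

/** Test double that counts calls instead of talking to a server. */
final class CountingClient implements IndexClient {
    int adds = 0;
    int commits = 0;

    @Override public void add(String doc) { adds++; }
    @Override public void commit() { commits++; }
}

final class BatchIndexer {
    /** Adds every document, committing once per full batch and once more for any remainder. */
    static void indexAll(IndexClient client, List<String> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int pending = 0; // documents added since the last commit
        for (String doc : docs) {
            client.add(doc);
            pending++;
            if (pending == batchSize) {
                client.commit();
                pending = 0;
            }
        }
        if (pending > 0) {
            client.commit(); // flush the final partial batch
        }
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 2000; i++) {
            docs.add("doc-" + i);
        }
        CountingClient client = new CountingClient();
        indexAll(client, docs, 500);
        System.out.println(client.adds + " adds, " + client.commits + " commits");
    }
}
```

With 2000 documents and a batch size of 500, this issues 4 commits instead of the 2000 issued in point 2 of the question.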







-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan


Re: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something

2014-02-12 Thread Erick Erickson
Here's some additional background that may shed light on the
performance.

http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick





Re: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something

2014-02-12 Thread Jack Krupansky
An explicit commit delays your app until that commit completes, and Solr 
then sits idle until the response makes its way back to your app and your 
app's next request finds its way to Solr, maybe a few ms later, network 
latency included. That interval could well be more than enough for a 
short-interval autoCommit or commitWithin to run in the background, in 
parallel with the response returning to your app and your app submitting 
the subsequent request.


The magic of asynchronous operation in a parallel and distributed computing 
environment, coupled with multi-core processors and parallel threads.
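
One way to picture the background behaviour described above is a toy model in which an add only records a commit deadline and returns, while the commit itself fires later. The sketch below uses simulated time and hypothetical names (DeferredCommitModel, runDueCommit); it is a simplification for illustration and not Solr's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of deadline-based commits using simulated time; not Solr's implementation. */
final class DeferredCommitModel {
    private final long commitWithinMs;
    private long pendingDeadline = -1; // -1 means no commit is currently scheduled
    private final List<Long> commitTimes = new ArrayList<>();

    DeferredCommitModel(long commitWithinMs) {
        this.commitWithinMs = commitWithinMs;
    }

    /** Records an add at nowMs. The caller never waits for a commit here. */
    void add(long nowMs) {
        runDueCommit(nowMs);
        if (pendingDeadline < 0) {
            pendingDeadline = nowMs + commitWithinMs; // schedule a commit for this add
        }
        // Otherwise a commit is already pending and will cover this add as well.
    }

    /** Fires the scheduled commit if its deadline has been reached by nowMs. */
    void runDueCommit(long nowMs) {
        if (pendingDeadline >= 0 && pendingDeadline <= nowMs) {
            commitTimes.add(pendingDeadline);
            pendingDeadline = -1;
        }
    }

    /** Simulated times at which commits fired, in order. */
    List<Long> commitTimes() {
        return commitTimes;
    }

    public static void main(String[] args) {
        DeferredCommitModel model = new DeferredCommitModel(1);
        long[] arrivals = {0, 0, 0, 5, 5}; // simulated arrival times in ms
        for (long t : arrivals) {
            model.add(t);
        }
        model.runDueCommit(10);
        System.out.println("commits fired at " + model.commitTimes());
    }
}
```

In this model, adds that arrive while a commit is already pending share that commit. When adds arrive further apart than the commitWithin interval, as in the original experiment, each add still gets its own commit, so under these assumptions the saving comes from the commits being cheap and off the caller's path rather than from there being fewer of them.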


-- Jack Krupansky
