Re: No server response code on insert: how do I avoid this at high speed?

2008-09-18 Thread Paleo Tek

Otis Gospodnetic wrote:

Perhaps the container logs explain what happened?
How about just throttling to the point where the failure rate is 0%?  
Too slow?
  


Otis's questions about the dropped inserts sent me back to the drawing 
board.  The system had been tuned around a slower database, to optimize 
speed and accept a few drops.  When I migrated to a faster DB I didn't 
retune.  Here are the results of testing indexing performance for Tomcat 
and Jetty.  The DB speedup apparently moved the bottleneck from getting 
records out of the database (around 400 records per second) to cramming 
records into the servlet container.



System:      16 processors, 2.5 GHz, 64 GB memory
Index:       33 GB, freshly optimized, avg record size 1.4 KB
Insert load: 250,000 records

I calculate records/sec by dividing the number of successful inserts by 
the raw time.  The adjusted time is the estimated time it would take to 
insert the full 250,000 records with no errors: the raw time plus the 
additional time required to insert the dropped records, i.e., raw 
time * (1 + error-rate * 0.01), where error-rate is the % Error figure.  
Judging from processor/memory/IO utilization, the write speed of a 
single Java thread appears to dominate the Solr indexing speed, which 
makes sense.
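
As a quick check on the arithmetic described above, here is a minimal 
sketch of the two calculations.  The function name is made up for 
illustration, and it assumes the times are in seconds (which the 
records/sec figures imply).  It computes the error rate from the drop 
count instead of using the rounded % Error column, so the adjusted time 
can differ from the posted figure in the second decimal place.

```python
def insert_stats(total_records, drops, raw_time_s):
    """Return (records_per_sec, error_pct, adjusted_time_s) for one test run."""
    successes = total_records - drops
    records_per_sec = successes / raw_time_s
    error_pct = 100.0 * drops / total_records
    # Adjusted time: raw time scaled up by the share of records that
    # would have to be sent again.
    adjusted_time_s = raw_time_s * (1 + error_pct * 0.01)
    return records_per_sec, error_pct, adjusted_time_s


# 16-thread Tomcat run from this post: 533 s raw time, 17131 drops.
rps, err, adj = insert_stats(250_000, 17_131, 533)
print(f"{rps:.1f} records/sec, {err:.2f}% error, {adj:.2f} s adjusted")
```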



Takehome lessons:

  The speed limit is about 450 records per second in our environment.

  Three or four threads posting inserts max out speed.  More threads 
don't help.

  Jetty is significantly faster than Tomcat at sane thread counts in 
our environment.
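
For anyone who wants to cap the number of posting threads at the three 
or four suggested above, a fixed-size pool is one way to do it.  This 
is only a sketch: post_record is a hypothetical caller-supplied 
function (not part of Solr or of my scripts) that sends one record and 
returns True on success, or False when no response code came back.

```python
from concurrent.futures import ThreadPoolExecutor


def post_all(records, post_record, max_workers=4):
    """Post every record through a fixed-size thread pool.

    Returns the list of records whose post did not succeed, in input order.
    post_record is a hypothetical function: record -> bool.
    """
    records = list(records)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with records.
        results = list(pool.map(post_record, records))
    return [rec for rec, ok in zip(records, results) if not ok]
```

The pool size is the only knob here, so repeating a timing run at 
different max_workers values is straightforward.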


I hope this is useful. 


  -Jim

PS:  If you have formatting issues with this table, try viewing with a 
fixed width font
 

 
(Times are in seconds; each run inserts 250,000 records.)

          | Tomcat                                             | Jetty
# threads | Raw time  # Drops  % Error  Records/sec  Adj. time | Raw time  # Drops  % Error  Records/sec  Adj. time
----------+----------------------------------------------------+---------------------------------------------------
       16 |      533    17131     6.85        436.9     569.51 |      594    24222     9.69        380.1     651.55
       15 |      520    16878     6.75       448.31      555.1 |      518    28581    11.43       427.45     577.22
       14 |      547    16378     6.55        427.1     582.83 |      496    30047    12.02       443.45     555.61
       13 |      540    16638     6.65       432.15     575.91 |      495    27076    10.83       450.35     548.61
       12 |      545    15920     6.36        429.5     579.66 |      494    28785    11.51        447.8     550.88
       11 |      523    16192     6.47       447.05     556.84 |      484    26495     10.6       461.79     535.29
       10 |      540    15643     6.26       433.99      573.8 |      497    27190    10.88       448.31     551.05
        9 |      553    15543     6.21       423.97     587.34 |      494    25862    10.34       453.72      545.1
        8 |      541    14095     5.64       436.05     571.51 |      501    23482     9.39       452.13     548.06
        7 |      549    10735     4.29       435.82     572.55 |      499    24657     9.86       451.59     548.22
        6 |      566     9468     3.79       424.97     587.45 |      502    23074     9.23       452.04     548.33
        5 |      588     7754     3.10       411.98     606.23 |      527    20779     8.31       434.95      570.8
        4 |      577     4201     1.68       425.99     586.69 |      513    16608     6.64       454.96     547.08
        3 |      613        0        0       407.83        613 |      537     9503      3.8       447.85     557.41
        2 |      801        0        0       312.11        801 |      633        0        0       394.94        633
        1 |     1365        0        0       183.15       1365 |     1122        0        0       222.82       1122
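
One possible way to turn the table into a thread-count recommendation 
is to take the smallest thread count whose adjusted time is within some 
tolerance of the best adjusted time.  The sketch below does that with 
the Adj. time columns copied from the table.  The function name and the 
6% tolerance are my own arbitrary choices for illustration, not 
something measured.

```python
# Adjusted times in seconds, copied from the table, keyed by thread count.
TOMCAT = {1: 1365, 2: 801, 3: 613, 4: 586.69, 5: 606.23, 6: 587.45,
          7: 572.55, 8: 571.51, 9: 587.34, 10: 573.8, 11: 556.84,
          12: 579.66, 13: 575.91, 14: 582.83, 15: 555.1, 16: 569.51}
JETTY = {1: 1122, 2: 633, 3: 557.41, 4: 547.08, 5: 570.8, 6: 548.33,
         7: 548.22, 8: 548.06, 9: 545.1, 10: 551.05, 11: 535.29,
         12: 550.88, 13: 548.61, 14: 555.61, 15: 577.22, 16: 651.55}


def smallest_good_thread_count(adj_times, tolerance=0.06):
    """Smallest thread count whose adjusted time is within
    `tolerance` (a fraction) of the best adjusted time."""
    best = min(adj_times.values())
    limit = best * (1 + tolerance)
    return min(n for n, t in adj_times.items() if t <= limit)


print("Tomcat:", smallest_good_thread_count(TOMCAT))
print("Jetty: ", smallest_good_thread_count(JETTY))
```

With these figures and a 6% tolerance it selects 4 threads for Tomcat 
and 3 for Jetty.  A tighter tolerance pushes the Tomcat answer higher, 
so the cutoff is a judgment call.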





Re: No server response code on insert: how do I avoid this at high speed?

2008-09-15 Thread Paleo Tek
Good questions. 


Otis Gospodnetic wrote:

Perhaps the container logs explain what happened?

1)  I can't find anything interesting in the container logs.  To the 
best of my knowledge, neither of the containers notices the drop.  Jetty 
did show out-of-threads type errors before I tweaked the thread 
parameters.  Once it was tuned a bit, I stopped seeing these entries in 
the log, but did not stop getting the errors.



How about just throttling to the point where the failure rate is 0%?  Too slow?



2) Throttling to 0 errors really slows things down.  The last time I ran 
stats, performance scaled almost linearly with additional threads until 
we reached the approximate number of CPUs in the system.  Without 
throttling, anything above two threads shows progressively more errors.  
The churn I need to keep up with makes throttling that far undesirable.
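
For reference, the kind of throttling under discussion can be sketched 
as a simple pacing object shared by the posting threads.  This is an 
illustration under my own assumptions, not what my scripts do: the 
class name and the max_per_sec parameter are made up, and the clock and 
sleep arguments exist only so the pacing can be exercised without real 
delays.

```python
import threading
import time


class Throttle:
    """Space out calls so that at most max_per_sec proceed per second."""

    def __init__(self, max_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / max_per_sec
        self.clock = clock
        self.sleep = sleep
        self.lock = threading.Lock()
        self.next_slot = clock()

    def wait(self):
        # Holding the lock while sleeping serializes callers, which is
        # the point: each caller gets the next free time slot.
        with self.lock:
            now = self.clock()
            if now < self.next_slot:
                self.sleep(self.next_slot - now)
            self.next_slot = max(now, self.next_slot) + self.interval
```

Each posting thread would call wait() before sending a record; lowering 
max_per_sec until the drops stop gives the "0% failure" rate Otis asks 
about.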


I'll put together some stats on insert rates, number of threads, and 
error rates and post them here.  It's a classic trade-off: tolerating 
poor results that require additional processing in exchange for higher 
performance.  A set of heuristics for this situation might be useful, 
since I'm likely not the only one with an indexing bottleneck.


 -Jim



Re: No server response code on insert: how do I avoid this at high speed?

2008-09-15 Thread Yonik Seeley
On Mon, Sep 15, 2008 at 2:17 PM, Paleo Tek [EMAIL PROTECTED] wrote:
 1)  I can't find anything interesting in the container logs.

Is the client timing out the connection?
If Solr were encountering errors, they would be logged.

-Yonik


No server response code on insert: how do I avoid this at high speed?

2008-09-12 Thread Paleo Tek
I have a largish index with a lot of churn, and inserts that come in 
large bursts.  My server is a multiprocessor with plenty of memory, so I 
can multi-thread and stuff in about 1.6 million records per hour, going 
full speed.  I use a dozen or so threads to post curl inserts, and 
monitor the responses.


Using Jetty, there is a ~10% failure rate with no server response code 
received.  Switching to Tomcat reduces the error rate to around 2% 
(which makes me like Tomcat a lot, even though I'm a dog person...).  I 
suspect I'm overrunning the capacity of the servlet container.  Tweaking 
parameters in Jetty improved performance, and I can tune Tomcat.  But 
then I'll just be overrunning a tuned system, at a slightly faster rate.


My workaround is to keep track of which inserts fail, but I suspect 
there's a better approach.  Any suggestions on how I can balance maximum 
insert speed with a low error rate?  Thanks!


 -Jim
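
The workaround described in this message (keep track of which inserts 
fail and send them again) can be sketched roughly as follows.  
post_record is a hypothetical function that sends one record and 
returns True only when a success response came back; the function name 
and max_passes are assumptions for illustration.  It also assumes that 
re-sending a record is harmless, for example because the schema defines 
a unique key so a repeated add replaces the earlier copy.

```python
def insert_with_retries(records, post_record, max_passes=3):
    """Post records, re-sending failures for up to max_passes passes.

    Returns the records that still had not succeeded after the last pass.
    post_record is a hypothetical function: record -> bool.
    """
    pending = list(records)
    for _ in range(max_passes):
        if not pending:
            break
        # Keep only the records whose post failed on this pass.
        pending = [rec for rec in pending if not post_record(rec)]
    return pending
```

Anything still in the returned list after the last pass needs a closer 
look, since it failed every time.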


Re: No server response code on insert: how do I avoid this at high speed?

2008-09-12 Thread Otis Gospodnetic
Perhaps the container logs explain what happened?
How about just throttling to the point where the failure rate is 0%?  Too slow?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch






Re: No server response code on insert: how do I avoid this at high speed?

2008-09-12 Thread Yonik Seeley
On Fri, Sep 12, 2008 at 11:19 AM, Paleo Tek [EMAIL PROTECTED] wrote:
 Using jetty, there is ~10% failure rate with no server response code
 received.

What happened then?  Did the network connection just drop, or did the
server or client time it out?  How can you tell it failed?

-Yonik
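
One way to approach the "how can you tell it failed?" question on the 
client side is to record what kind of failure each post hit.  The 
sketch below assumes a Python client using urllib instead of the curl 
posts described in the thread; the function name and category labels 
are made up for illustration.  It sorts an exception into "the server 
answered with an error status", "the client timed out", "the connection 
was refused", or "the connection was dropped".

```python
import http.client
import socket
import urllib.error


def classify_failure(exc):
    """Map an exception raised while posting to a coarse failure label."""
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return f"http-{exc.code}"  # a response code did come back
    # URLError wraps the underlying cause in its .reason attribute.
    cause = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(cause, (socket.timeout, TimeoutError)):
        return "client-timeout"
    if isinstance(cause, ConnectionRefusedError):
        return "refused"
    if isinstance(cause, (ConnectionResetError, BrokenPipeError,
                          http.client.RemoteDisconnected)):
        return "connection-dropped"
    return "other"
```

A client would call this from the except clause around each post (made 
with an explicit timeout) and count the labels, which separates client 
timeouts from connections that were dropped or refused.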