[Manual Sharding] Solr distrib search cause thread exhaustion

2016-01-04 Thread Alessandro Benedetti
Hi guys,
this is the scenario we are studying:

Solr 4.10.2
16 shards, plus a Solr instance that aggregates the results by running a
distributed query with shards=. (all the shards).

Currently we are not using shards.tolerant=true, so we throw an exception
on error.

We are in a situation where one shard is too slow to respond (empty filter
cache, heavy load).
The shard does not answer within the timeout the shard handler expects,
and for this reason the whole request fails.

So far, everything is clear.
We need to improve the speed of the shards by properly managing
auto-warming, load balancing, etc.
We can also play with the tolerant setting and possibly tolerate errors.

But what actually happens is that the Solr aggregator, which runs the
queries against the shards, is exhausting its threads...
Looking into the code, this is what we get when we are not tolerant:

    // Was there an exception?
    if (srsp.getException() != null) {
      // If things are not tolerant, abort everything and rethrow
      if (!tolerant) {
        shardHandler1.cancelAll();   // <-- this call
        if (srsp.getException() instanceof SolrException) {
          throw (SolrException) srsp.getException();
        } else {
          throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
              srsp.getException());
        }


I would assume that cancelAll() is responsible for cleaning up the threads.
Any idea why the thread cleanup might not happen properly?
Could it be some Jetty misconfiguration?
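
One possible explanation, offered as a sketch rather than a diagnosis: if cancelAll() cancels each pending shard request through Future.cancel(true) (an assumption about HttpShardHandler, not verified here), the cancellation only interrupts the worker thread. A thread blocked in a classic java.net socket read does not react to interrupts, so it stays busy until the shard answers or a socket timeout fires. The class and method names below are hypothetical, and the blocking read is simulated with a latch wait that swallows interrupts.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CancelDemo {

    /** Returns how many pool threads are still busy after the task's Future was cancelled. */
    public static int activeThreadsAfterCancel() throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        CountDownLatch started = new CountDownLatch(1);
        // Stands in for "the slow shard finally answers".
        CountDownLatch release = new CountDownLatch(1);

        Future<?> pending = pool.submit(() -> {
            started.countDown();
            // Simulated blocking call that does not react to interrupts
            // (hypothetical stand-in for a blocking socket read).
            boolean done = false;
            while (!done) {
                try {
                    release.await();
                    done = true;
                } catch (InterruptedException ignored) {
                    // Interrupt swallowed: keep waiting for the "shard".
                }
            }
        });

        started.await();
        pending.cancel(true);   // the Future is now cancelled from the caller's view
        Thread.sleep(200);      // give the interrupt time to be delivered
        int active = pool.getActiveCount();

        // Clean up: let the simulated shard answer and stop the pool.
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return active;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("busy threads after cancel: " + activeThreadsAfterCancel());
    }
}
```

In this sketch the pool thread should still be counted as busy after the cancel, even though the caller already sees the Future as cancelled. If the same pattern applies to the aggregator, every failed distributed query could leave up to one worker per slow shard occupied.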

Cheers
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: [Manual Sharding] Solr distrib search cause thread exhaustion

2016-01-04 Thread Alessandro Benedetti
Yes Erick, our Jetty is configured with 10,000 threads.

Actually the puzzle got more complicated, as we realised that connTimeout
is set to 0 by default.
But we definitely get an error from one of the shards, and the aggregator
throws the exception because it is not tolerant.
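
To illustrate why a timeout of 0 matters: in java.net, a connect or read timeout of zero means "wait indefinitely". Whether Solr's connTimeout of 0 reaches the HTTP client unchanged is an assumption here. The sketch below uses a local ServerSocket that never answers as a stand-in for a slow shard; TimeoutDemo and readTimesOut are hypothetical names.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class TimeoutDemo {

    /**
     * Reads from a server that never answers.
     * Returns true if the read gave up with a timeout.
     * Do not pass 0: a zero timeout would block forever here.
     */
    public static boolean readTimesOut(int soTimeoutMillis) throws IOException {
        // Local stand-in for a slow shard: the OS completes the TCP handshake
        // from the listen backlog, but nothing is ever written back.
        try (ServerSocket silentShard =
                     new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {
            client.connect(new InetSocketAddress(
                    silentShard.getInetAddress(), silentShard.getLocalPort()), 1000);
            client.setSoTimeout(soTimeoutMillis);
            try {
                client.getInputStream().read();
                return false;               // data or end of stream arrived
            } catch (SocketTimeoutException e) {
                return true;                // the read timeout fired
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```

With a positive read timeout the client thread is released after the timeout expires. With 0 it would stay blocked for as long as the shard stays silent, which would fit threads piling up on the aggregator.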

The weird thing is that the shard logs an error that is a typical sign of
a client closing the HTTP connection.

*Jan 03 16:55:55 solr-a00.bug.example.com 
java[10661]: 37661057 [qtp15642-279052] ERROR
org.apache.solr.servlet.SolrDispatchFilter  –
null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: IOException occured when
talking to server at: http://solr10.bug *
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
...
*Jan 03 16:55:55 solr-a00.bug.example.com 
java[10661]: Caused by: org.apache.solr.client.solrj.SolrServerException:
IOException occured when talking to server at: http://solr10.bug
*
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:157)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
java.util.concurrent.FutureTask.run(FutureTask.java:262)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
java.util.concurrent.FutureTask.run(FutureTask.java:262)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: ... 1 more
*Jan 03 16:55:55 solr-a00.bug.example.com 
java[10661]: Caused by: java.net.SocketException: Connection reset*
...

*Shard Log*

*Jan 03 16:55:10 solr.bug.example.com 
java[21200]: 1214068 [qtp1018590076-595] ERROR
org.apache.solr.servlet.SolrDispatchFilter  –
null:org.eclipse.jetty.io.EofException*
Jan 03 16:55:10 solr.bug.example.com java[21200]: at
org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:142)
Jan 03 16:55:10 solr.bug.example.com java[21200]: at
org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
Jan 03 16:55:10 solr.bug.example.com java[21200]: at
org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:214)
Jan 03 16:55:10 solr.bug.example.com java[21200]: at
org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:207)
...
*Jan 03 16:55:10 solr.bug.example.com 
java[21200]: 1214073 [qtp1018590076-595] ERROR
org.apache.solr.servlet.SolrDispatchFilter  –
null:org.eclipse.jetty.io.EofException*
Jan 03 16:55:10 solr.bug.example.com java[21200]: at
org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:142)
Jan 03 16:55:10 solr.bug.example.com java[21200]: at
org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
Jan 03 16:55:10 solr.bug.example.com java[21200]: at
org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:214)
...
*Jan 03 16:55:10 solr.bug.example.com 
java[21200]: 1214074 [qtp1018590076-595] WARN
 org.eclipse.jetty.server.Response  – Committed before 500
{trace=org.eclipse.jetty.io.EofException*
Jan 03 16:55:10 solr.bug.example.com java[21200]: at
org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:142)
Jan 03 16:55:10 

Re: [Manual Sharding] Solr distrib search cause thread exhaustion

2016-01-04 Thread Erick Erickson
How many threads are you allocating for the servlet container? 10,000
is the "usual" number.

Best,
Erick
