Re: SolrCloud 4.x hangs under high update volume
)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1083)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:379)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1017)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:258)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:445)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:260)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:225)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:596)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:527)
    at java.lang.Thread.run(Thread.java:724)

On your live_nodes question, I don't have historical data on this from when the crash occurred, which I guess is what you're looking for. I could add this to our monitoring for future tests, however.

I'd be glad to continue further testing, but I think first more monitoring is needed to understand this further. Could we come up with a list of metrics that would be useful to see following another test and successful crash?

Metrics needed:
1) # of live_nodes.
2) Full stack traces.
3) CPU used by Solr's JVM specifically (instead of system-wide).
4) Solr's JVM thread count (already done)
5) ?

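For metrics 3 and 4 in the list above, here is a minimal sketch of one way to sample them per process on Linux by reading `/proc/<pid>/status` and `/proc/<pid>/stat`. It assumes a Linux host and that the Solr JVM's PID is already known. The function names (`thread_count`, `cpu_ticks`, `sample`) are hypothetical and not part of any monitoring setup described in this thread. The live_nodes count is not covered, since it would need a ZooKeeper client.

```python
def thread_count(status_text):
    """Return the value of the 'Threads:' field from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no 'Threads:' line found")


def cpu_ticks(stat_text):
    """Return utime + stime (in clock ticks) from /proc/<pid>/stat text.

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces, so split only what follows the last ')'.
    That remainder starts at field 3, so field k sits at index k - 3.
    """
    rest = stat_text[stat_text.rindex(")") + 1:].split()
    utime = int(rest[11])  # field 14: user-mode ticks
    stime = int(rest[12])  # field 15: kernel-mode ticks
    return utime + stime


def sample(pid):
    """Read both files for one process (assumed here to be the Solr JVM)."""
    with open(f"/proc/{pid}/status") as f:
        status = f.read()
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    return {"threads": thread_count(status), "cpu_ticks": cpu_ticks(stat)}
```

Taking two samples a known interval apart and dividing the tick difference by `os.sysconf("SC_CLK_TCK")` times the interval gives the fraction of one CPU used by that process alone, rather than a system-wide figure.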
Cheers,

Tim Vaillancourt

On 6 September 2013 13:11, Mark Miller markrmil...@gmail.com wrote:

Did you ever get to index that long before without hitting the deadlock?

There really isn't anything negative the patch could be introducing, other than allowing for some more threads to possibly run at once. If I had to guess, I would say it's likely this patch fixes the deadlock issue and you're seeing another issue - which looks like the system cannot keep up with the requests or something for some reason - perhaps due to some OS networking settings or something (more guessing). Connection refused happens generally when there is nothing listening on the port.

Do you see anything interesting change with the rest of the system? CPU usage spikes or something like that?

Clamping down further on the overall number of threads might help (which would require making something configurable).

How many nodes are listed in zk under live_nodes?

Mark

Sent from my iPhone

On Sep 6, 2013, at 12:02 PM, Tim Vaillancourt t...@elementspace.com wrote:

Hey guys,

(copy of my post to SOLR-5216)

We tested this patch and unfortunately encountered some serious issues after a few hours of 500 update-batches/sec. Our update batch is 10 docs, so we are writing about 5000 docs/sec total, using autoCommit to commit the updates (no explicit commits).

Our environment:
- Solr 4.3.1 w/SOLR-5216 patch.
- Jetty 9, Java 1.7.
- 3 solr instances, 1 per physical server.
- 1 collection.
- 3 shards.
- 2 replicas (each instance is a leader and a replica).
- Soft autoCommit is 1000ms.
- Hard autoCommit is 15000ms.

After about 6 hours of stress-testing this patch, we see many of these stalled transactions (below), and the Solr instances start to see each other as down, flooding our Solr logs with Connection Refused exceptions, and otherwise no obviously-useful logs that I could see. I did notice some stalled transactions on both /select and /update, however. This never occurred without this patch.

Stack /select seems stalled on: http://pastebin.com/Y1NCrXGC
Stack /update seems stalled on: http://pastebin.com/cFLbC8Y9

Lastly, I have a summary of the ERROR-severity logs from this 24-hour soak. My script normalizes the ERROR-severity stack traces and returns them in order of occurrence.

Summary of my solr.log: http://pastebin.com/pBdMAWeb

Thanks!

Tim Vaillancourt

On 6 September 2013 07:27, Markus Jelsma markus.jel...@openindex.io wrote:

Thanks!

-Original message-
From: Erick Erickson erickerick...@gmail.com
Sent: Friday 6th September 2013 16:20
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud 4.x hangs under high update volume

Markus:

See: https://issues.apache.org/jira/browse/SOLR-5216

On Wed, Sep 4, 2013 at 11:04 AM, Markus Jelsma markus.jel...@openindex.io wrote:

Hi Mark,

Got an issue to watch?

Thanks,
Markus
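The log summary Tim describes (normalizing ERROR-severity stack traces and listing them) could look roughly like the sketch below. This is not Tim's script, which is not shown in the thread. It assumes each log entry begins with its level as the first token on the line and that continuation lines (stack frames) follow until the next entry; the real Log4j layout may differ and the prefix check would need adjusting. Numbers and hex addresses are masked so that otherwise identical traces collapse into one bucket, and buckets are returned in order of first occurrence with a count.

```python
import re
from collections import OrderedDict

# Assumed level prefixes; adjust to the actual Log4j pattern in use.
LEVELS = ("ERROR", "WARN", "INFO", "DEBUG", "TRACE", "FATAL")


def normalize(line):
    """Mask hex addresses and numbers so equivalent traces compare equal."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<hex>", line)
    line = re.sub(r"\d+", "<n>", line)
    return line.strip()


def summarize_errors(log_text):
    """Return [(normalized_entry, count), ...] in order of first occurrence."""
    entries = []
    current = None  # lines of the ERROR entry being collected, else None
    for raw in log_text.splitlines():
        if raw.startswith(LEVELS):
            # A new log entry starts here; only ERROR entries are collected.
            current = [] if raw.startswith("ERROR") else None
            if current is not None:
                entries.append(current)
        if current is not None and raw.strip():
            current.append(normalize(raw))

    counts = OrderedDict()
    for entry in entries:
        key = "\n".join(entry)
        counts[key] = counts.get(key, 0) + 1
    return list(counts.items())
```

Sorting the result by count instead of first occurrence is a one-line change if the most frequent errors are of more interest than the earliest ones.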
-Original message-
From: Mark Miller markrmil...@gmail.com
Sent: Wednesday 4th September 2013 16:55
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud 4.x hangs under high update volume

I'm going to try and fix the root cause for 4.5 - I've suspected what it is since early this year, but it's never personally been an issue, so it's rolled along for a long time.

Mark

Sent from my iPhone

On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com wrote:

Hey guys,

I am looking into an issue we've been having with SolrCloud since the beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0 yet). I've noticed other users with this same issue, so I'd really like to get to the bottom of it.

Under a very, very high rate of updates (2000+/sec), after 1-12 hours we see stalled transactions that snowball to consume all Jetty threads in the JVM. This eventually causes the JVM to hang with most threads waiting on the condition/stack provided at the bottom of this message. At this point SolrCloud instances then start to see their neighbors (who also have all threads hung) as down w/Connection Refused, and the shards go into the down state. Sometimes a node or two survives and just returns 503 "no server hosting shard" errors.

As a workaround/experiment, we have tuned the number of threads sending updates to Solr, as well as the batch size (we batch updates from client -> Solr), and the Soft/Hard autoCommits, all to no avail. We also tried turning off client-to-Solr batching (1 update = 1 call to Solr), which did not help either. Certain combinations of update threads and batch sizes seem to mask/help the problem, but not resolve it entirely.

Our current environment is the following:
- 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
- 3 x Zookeeper instances, external Java 7 JVM.
- 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and a replica of 1 shard).
- Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good day.
- 5000 max jetty threads (well above what we use when we are healthy), Linux-user threads ulimit is 6000.
- Occurs under Jetty 8 or 9 (many versions).
- Occurs under Java 1.6 or 1.7 (several minor versions).
- Occurs under several JVM tunings.
- Everything seems to point to Solr itself, and not a Jetty or Java version (I hope I'm wrong).

The stack trace that is holding up all my Jetty QTP threads is the following, which seems to be waiting on a lock that I would very much like to understand further:

java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0007216e68d8> (a java.util.concurrent.Semaphore$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
    at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
    at org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
    at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
    at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368
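
The trace above shows request threads parked in `Semaphore.acquire`, reached from `SolrCmdDistributor.submit`. As a toy model of that symptom only, and not Solr's code, the sketch below shows how callers pile up once every permit of a counting semaphore is held and never returned. Why permits would not be returned in Solr is not established in this thread; the stuck holders here are simply an assumption of the model, and the function name and parameters are hypothetical.

```python
import threading


def count_stalled_submitters(permits, stuck_holders, submitters, timeout=0.2):
    """Count submitter threads that cannot obtain a permit within `timeout`.

    `stuck_holders` permits are taken up front and never released, standing
    in for in-flight work that never completes.
    """
    sem = threading.Semaphore(permits)
    for _ in range(min(stuck_holders, permits)):
        sem.acquire(blocking=False)  # taken and deliberately never released

    stalled = []
    lock = threading.Lock()

    def submit():
        # A real caller would block indefinitely; the timeout exists only so
        # this demonstration terminates.
        if sem.acquire(timeout=timeout):
            sem.release()  # healthy path: permit is handed back right away
        else:
            with lock:
                stalled.append(threading.current_thread().name)

    threads = [threading.Thread(target=submit) for _ in range(submitters)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(stalled)
```

With all permits stuck, every submitter stalls, which in a servlet container would mean every request thread entering this path is lost until the pool is exhausted. With even one permit still circulating, submitters get through one at a time.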
Re: SolrCloud 4.x hangs under high update volume
) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:260) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:225) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:596) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:527) at java.lang.Thread.run(Thread.java:724) On your live_nodes question, I don't have historical data on this from when the crash occurred, which I guess is what you're looking for. I could add this to our monitoring for future tests, however. I'd be glad to continue further testing, but I think first more monitoring is needed to understand this further. Could we come up with a list of metrics that would be useful to see following another test and successful crash? Metrics needed: 1) # of live_nodes. 2) Full stack traces. 3) CPU used by Solr's JVM specifically (instead of system-wide). 4) Solr's JVM thread count (already done) 5) ? Cheers, Tim Vaillancourt On 6 September 2013 13:11, Mark Miller markrmil...@gmail.com wrote: Did you ever get to index that long before without hitting the deadlock? There really isn't anything negative the patch could be introducing, other than allowing for some more threads to possibly run at once. If I had to guess, I would say its likely this patch fixes the deadlock issue and your seeing another issue - which looks like the system cannot keep up with the requests or something for some reason - perhaps due to some OS networking settings or something (more guessing). Connection refused happens generally when there is nothing listening on the port. Do you see anything interesting change with the rest of the system? CPU usage spikes or something like that? Clamping down further on the overall number of threads night help (which would require making something configurable). How many nodes are listed in zk under live_nodes? 
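For items 2 and 4 in the metrics list above, one low-effort option is to summarize a saved thread dump after each test. The following is a minimal sketch in Python, assuming the dump is plain `jstack`-style text in which thread blocks are separated by blank lines and each block carries a `java.lang.Thread.State:` line. The function name `summarize_thread_dump` and the default frame to look for are illustrative choices, not part of Solr or the JDK.

```python
import re
from collections import Counter

# Matches the state line found in each thread block of a jstack-style dump.
STATE_RE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def summarize_thread_dump(dump_text, frame="AdjustableSemaphore.acquire"):
    """Return (Counter of thread states, number of threads whose stack contains `frame`)."""
    states = Counter()
    stuck = 0
    # Assumption: thread blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", dump_text.strip()):
        match = STATE_RE.search(block)
        if not match:
            continue  # header or other non-thread section
        states[match.group(1)] += 1
        if frame in block:
            stuck += 1
    return states, stuck
```

Run against dumps taken at intervals, the second return value gives a rough count of how many threads are parked on the semaphore seen in the stack trace from this thread.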
Re: SolrCloud 4.x hangs under high update volume
Markus: See: https://issues.apache.org/jira/browse/SOLR-5216

On Wed, Sep 4, 2013 at 11:04 AM, Markus Jelsma markus.jel...@openindex.io wrote:

Hi Mark, Got an issue to watch?

Thanks,
Markus

-Original message-
From: Mark Miller markrmil...@gmail.com
Sent: Wednesday 4th September 2013 16:55
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud 4.x hangs under high update volume

I'm going to try and fix the root cause for 4.5 - I've suspected what it is since early this year, but it's never personally been an issue, so it's rolled along for a long time.

Mark

Sent from my iPhone

On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com wrote:

Hey guys,

I am looking into an issue we've been having with SolrCloud since the beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0 yet). I've noticed other users with this same issue, so I'd really like to get to the bottom of it.

Under a very, very high rate of updates (2000+/sec), after 1-12 hours we see stalled transactions that snowball to consume all Jetty threads in the JVM. This eventually causes the JVM to hang with most threads waiting on the condition/stack provided at the bottom of this message. At this point SolrCloud instances start to see their neighbors (who also have all threads hung) as down w/"Connection Refused", and the shards become "down" in state. Sometimes a node or two survives and just returns 503s with "no server hosting shard" errors.

As a workaround/experiment, we have tuned the number of threads sending updates to Solr, as well as the batch size (we batch updates from client to Solr), and the Soft/Hard autoCommits, all to no avail. We also tried turning off client-to-Solr batching (1 update = 1 call to Solr), which did not help either. Certain combinations of update threads and batch sizes seem to mask/help the problem, but not resolve it entirely.

Our current environment is the following:
- 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
- 3 x Zookeeper instances, external Java 7 JVM.
- 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and a replica of 1 shard).
- Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good day.
- 5000 max jetty threads (well above what we use when we are healthy), Linux-user threads ulimit is 6000.
- Occurs under Jetty 8 or 9 (many versions).
- Occurs under Java 1.6 or 1.7 (several minor versions).
- Occurs under several JVM tunings.
- Everything seems to point to Solr itself, and not a Jetty or Java version (I hope I'm wrong).

The stack trace that is holding up all my Jetty QTP threads is the following, which seems to be waiting on a lock that I would very much like to understand further:

java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x0007216e68d8 (a java.util.concurrent.Semaphore$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
    at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
    at org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
    at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
    at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
    at org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
    at org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1486
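The trace above shows request threads parked in a semaphore acquire while submitting distributed updates. The thread does not say why the permits run out, so the following is only a toy model in Python, not Solr's code. It assumes, purely for illustration, that every in-flight request already holds one permit and needs a second one before it can finish and release both. The function name `nested_acquires_granted` is hypothetical.

```python
import threading


def nested_acquires_granted(permits, in_flight, timeout=0.05):
    """Toy model: count how many in-flight requests obtain a second permit.

    Each request first takes one permit without blocking. It then needs a
    second permit to finish; only a finished request releases its permits.
    """
    sem = threading.Semaphore(permits)
    # Phase 1: every in-flight request grabs its first permit if one is free.
    holders = sum(1 for _ in range(in_flight) if sem.acquire(blocking=False))
    granted = 0
    # Phase 2: each holder waits (with a timeout) for its second permit.
    for _ in range(holders):
        if sem.acquire(timeout=timeout):
            granted += 1
            sem.release()  # request finished: give back the second permit
            sem.release()  # ...and the first one
    return granted
```

With one spare permit the whole queue drains (`nested_acquires_granted(4, 3)` grants all three), while with no spare permit nothing completes (`nested_acquires_granted(3, 3)` grants none). A cliff of this kind would be consistent with the report that some thread and batch-size combinations mask the problem without resolving it, though that link is speculation.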
Re: SolrCloud 4.x hangs under high update volume
Hey guys,

(copy of my post to SOLR-5216)

We tested this patch and unfortunately encountered some serious issues after a few hours of 500 update-batches/sec. Our update batch is 10 docs, so we are writing about 5000 docs/sec total, using autoCommit to commit the updates (no explicit commits).

Our environment:
- Solr 4.3.1 w/SOLR-5216 patch.
- Jetty 9, Java 1.7.
- 3 Solr instances, 1 per physical server.
- 1 collection.
- 3 shards.
- 2 replicas (each instance is a leader and a replica).
- Soft autoCommit is 1000ms.
- Hard autoCommit is 15000ms.

After about 6 hours of stress-testing this patch, we see many of these stalled transactions (below), and the Solr instances start to see each other as down, flooding our Solr logs with Connection Refused exceptions, and otherwise no obviously useful logs that I could see. I did notice some stalled transactions on both /select and /update, however. This never occurred without this patch.

Stack /select seems stalled on: http://pastebin.com/Y1NCrXGC
Stack /update seems stalled on: http://pastebin.com/cFLbC8Y9

Lastly, I have a summary of the ERROR-severity logs from this 24-hour soak. My script normalizes the ERROR-severity stack traces and returns them in order of occurrence.

Summary of my solr.log: http://pastebin.com/pBdMAWeb

Thanks!

Tim Vaillancourt

On 6 September 2013 07:27, Markus Jelsma markus.jel...@openindex.io wrote:

Thanks!

-Original message-
From: Erick Erickson erickerick...@gmail.com
Sent: Friday 6th September 2013 16:20
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud 4.x hangs under high update volume

Markus: See: https://issues.apache.org/jira/browse/SOLR-5216

On Wed, Sep 4, 2013 at 11:04 AM, Markus Jelsma markus.jel...@openindex.io wrote:

Hi Mark, Got an issue to watch?
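The message above mentions a script that normalizes ERROR-severity entries and returns them in order of occurrence. That script is not shown in the thread, so the following Python sketch is only one possible approach. It assumes each log entry's first line contains the token ` ERROR ` and that numbers and hex addresses are the volatile parts worth masking. It looks at single lines only and does not group multi-line stack traces.

```python
import re


def summarize_errors(log_lines):
    """Return [(normalized message, count)] for ERROR lines, in first-seen order."""
    counts = {}  # dicts preserve insertion order (Python 3.7+)
    for line in log_lines:
        if " ERROR " not in line:
            continue
        # Drop the timestamp/level prefix; keep everything after the level.
        msg = line.split(" ERROR ", 1)[1].strip()
        # Mask volatile details so repeated errors collapse into one key.
        msg = re.sub(r"0x[0-9a-fA-F]+", "<hex>", msg)
        msg = re.sub(r"\d+", "<n>", msg)
        counts[msg] = counts.get(msg, 0) + 1
    return list(counts.items())
```

Keeping first-seen order rather than sorting by count makes it easier to see which error appeared first when a cascade of Connection Refused exceptions floods the log.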
Re: SolrCloud 4.x hangs under high update volume
Did you ever get to index that long before without hitting the deadlock?

There really isn't anything negative the patch could be introducing, other than allowing for some more threads to possibly run at once. If I had to guess, I would say it's likely this patch fixes the deadlock issue and you're seeing another issue - which looks like the system cannot keep up with the requests for some reason - perhaps due to some OS networking settings or something (more guessing). Connection refused generally happens when there is nothing listening on the port.

Do you see anything interesting change with the rest of the system? CPU usage spikes or something like that?

Clamping down further on the overall number of threads might help (which would require making something configurable).

How many nodes are listed in zk under live_nodes?

Mark

Sent from my iPhone

On Sep 6, 2013, at 12:02 PM, Tim Vaillancourt t...@elementspace.com wrote:

Hey guys,

(copy of my post to SOLR-5216)

We tested this patch and unfortunately encountered some serious issues after a few hours of 500 update-batches/sec. Our update batch is 10 docs, so we are writing about 5000 docs/sec total, using autoCommit to commit the updates (no explicit commits).

Our environment:
- Solr 4.3.1 w/SOLR-5216 patch.
- Jetty 9, Java 1.7.
- 3 Solr instances, 1 per physical server.
- 1 collection.
- 3 shards.
- 2 replicas (each instance is a leader and a replica).
- Soft autoCommit is 1000ms.
- Hard autoCommit is 15000ms.

After about 6 hours of stress-testing this patch, we see many of these stalled transactions (below), and the Solr instances start to see each other as down, flooding our Solr logs with Connection Refused exceptions, and otherwise no obviously useful logs that I could see. I did notice some stalled transactions on both /select and /update, however. This never occurred without this patch.
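Mark's point that "connection refused" generally means nothing is listening on the port can be checked from any machine that can reach the Solr nodes. The following is a minimal sketch in Python using only the standard `socket` module. The function name `probe` and its return labels are illustrative, and the host and port passed in are whatever your Solr instances use.

```python
import socket


def probe(host, port, timeout=1.0):
    """Classify a TCP connect attempt as 'open', 'refused', or 'timeout'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer answered with a reset: typically no listener on that port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
```

Logging the result per node during a soak test would help separate "the Jetty listener is gone" (refused) from "the listener exists but is not answering" (timeout or a hung request).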
Re: SolrCloud 4.x hangs under high update volume
Update: It is a bit too soon to tell, but about 6 hours into testing there are no crashes with this patch. :)

We are pushing 500 batches of 10 updates per second to the 3-node, 3-shard cluster I mentioned above. 5000 updates per second total.

More tomorrow after a 24 hr soak!

Tim

On Wednesday, 4 September 2013, Tim Vaillancourt wrote:

Thanks so much for the explanation Mark, I owe you one (many)! We have this on our high TPS cluster and will run it through its paces tomorrow. I'll provide any feedback I can, more soon! :D

Cheers,

Tim
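To put the load figures above in context, here is a small worked calculation in Python. The batch rate, batch size, and soak duration come from this thread. The assumption that every document is written to two cores (leader plus one replica) and that load spreads evenly across three nodes is a simplification, and the function name `soak_numbers` is illustrative.

```python
def soak_numbers(batches_per_sec, docs_per_batch, hours, copies_per_shard=2, nodes=3):
    """Rough load figures for a soak test; assumes evenly distributed documents."""
    docs_per_sec = batches_per_sec * docs_per_batch
    index_writes_per_sec = docs_per_sec * copies_per_shard  # leader + replica
    return {
        "docs_per_sec": docs_per_sec,
        "docs_total": docs_per_sec * hours * 3600,
        "index_writes_per_sec": index_writes_per_sec,
        "index_writes_per_node_per_sec": index_writes_per_sec / nodes,
    }


six_hours = soak_numbers(500, 10, 6)
full_soak = soak_numbers(500, 10, 24)
```

Under these assumptions the 6-hour mark corresponds to 108,000,000 documents sent and the full 24-hour soak to 432,000,000, with roughly 3,333 index writes per second landing on each node.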
RE: SolrCloud 4.x hangs under high update volume
Tim, Take a look at http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html and https://issues.apache.org/jira/browse/SOLR-4816. I had the same issue that you're reporting for a while then I applied the patch from SOLR-4816 to my clients and the problems went away. If you don't feel like applying the patch it looks like it should be included in the release of version 4.5. Also note that the problem happens more frequently when the replication factor is greater than 1. Thanks, Greg -Original Message- From: Tim Vaillancourt [mailto:t...@elementspace.com] Sent: Tuesday, September 03, 2013 6:31 PM To: solr-user@lucene.apache.org Subject: SolrCloud 4.x hangs under high update volume Hey guys, I am looking into an issue we've been having with SolrCloud since the beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0 yet). I've noticed other users with this same issue, so I'd really like to get to the bottom of it. Under a very, very high rate of updates (2000+/sec), after 1-12 hours we see stalled transactions that snowball to consume all Jetty threads in the JVM. This eventually causes the JVM to hang with most threads waiting on the condition/stack provided at the bottom of this message. At this point SolrCloud instances then start to see their neighbors (who also have all threads hung) as down w/Connection Refused, and the shards become down in state. Sometimes a node or two survives and just returns 503s no server hosting shard errors. As a workaround/experiment, we have tuned the number of threads sending updates to Solr, as well as the batch size (we batch updates from client - solr), and the Soft/Hard autoCommits, all to no avail. Turning off Client-to-Solr batching (1 update = 1 call to Solr), which also did not help. Certain combinations of update threads and batch sizes seem to mask/help the problem, but not resolve it entirely. 
Re: SolrCloud 4.x hangs under high update volume
I'm going to try and fix the root cause for 4.5 - I've suspected what it is since early this year, but it's never personally been an issue, so it's rolled along for a long time. Mark Sent from my iPhone On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com wrote: Hey guys, I am looking into an issue we've been having with SolrCloud since the beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0 yet). I've noticed other users with this same issue, so I'd really like to get to the bottom of it. Under a very, very high rate of updates (2000+/sec), after 1-12 hours we see stalled transactions that snowball to consume all Jetty threads in the JVM. This eventually causes the JVM to hang with most threads waiting on the condition/stack provided at the bottom of this message. At this point SolrCloud instances then start to see their neighbors (who also have all threads hung) as down w/Connection Refused, and the shards become down in state. Sometimes a node or two survives and just returns 503s no server hosting shard errors. As a workaround/experiment, we have tuned the number of threads sending updates to Solr, as well as the batch size (we batch updates from client - solr), and the Soft/Hard autoCommits, all to no avail. Turning off Client-to-Solr batching (1 update = 1 call to Solr), which also did not help. Certain combinations of update threads and batch sizes seem to mask/help the problem, but not resolve it entirely. Our current environment is the following: - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7. - 3 x Zookeeper instances, external Java 7 JVM. - 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and a replica of 1 shard). - Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good day. - 5000 max jetty threads (well above what we use when we are healthy), Linux-user threads ulimit is 6000. - Occurs under Jetty 8 or 9 (many versions). - Occurs under Java 1.6 or 1.7 (several minor versions). 
Re: SolrCloud 4.x hangs under high update volume
I am having this issue as well. I did apply this patch. Unfortunately, it did not resolve the issue in my case.
On Wed, Sep 4, 2013 at 7:01 AM, Greg Walters gwalt...@sherpaanalytics.com wrote: Tim, Take a look at http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html and https://issues.apache.org/jira/browse/SOLR-4816. I had the same issue that you're reporting for a while then I applied the patch from SOLR-4816 to my clients and the problems went away. If you don't feel like applying the patch it looks like it should be included in the release of version 4.5. Also note that the problem happens more frequently when the replication factor is greater than 1. Thanks, Greg
-Original Message- From: Tim Vaillancourt [mailto:t...@elementspace.com] Sent: Tuesday, September 03, 2013 6:31 PM To: solr-user@lucene.apache.org Subject: SolrCloud 4.x hangs under high update volume Hey guys, I am looking into an issue we've been having with SolrCloud since the beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0 yet). I've noticed other users with this same issue, so I'd really like to get to the bottom of it. Under a very, very high rate of updates (2000+/sec), after 1-12 hours we see stalled transactions that snowball to consume all Jetty threads in the JVM. This eventually causes the JVM to hang with most threads waiting on the condition/stack provided at the bottom of this message. At this point SolrCloud instances then start to see their neighbors (who also have all threads hung) as down w/Connection Refused, and the shards become down in state. Sometimes a node or two survives and just returns 503s no server hosting shard errors. As a workaround/experiment, we have tuned the number of threads sending updates to Solr, as well as the batch size (we batch updates from client - solr), and the Soft/Hard autoCommits, all to no avail. Turning off Client-to-Solr batching (1 update = 1 call to Solr), which also did not help.
Re: SolrCloud 4.x hangs under high update volume
There is an issue if I remember right, but I can't find it right now. If anyone that has the problem could try this patch, that would be very helpful: http://pastebin.com/raw.php?i=aaRWwSGP - Mark
On Wed, Sep 4, 2013 at 8:04 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi Mark, Got an issue to watch? Thanks, Markus
-Original message- From: Mark Miller markrmil...@gmail.com Sent: Wednesday 4th September 2013 16:55 To: solr-user@lucene.apache.org Subject: Re: SolrCloud 4.x hangs under high update volume I'm going to try and fix the root cause for 4.5 - I've suspected what it is since early this year, but it's never personally been an issue, so it's rolled along for a long time. Mark Sent from my iPhone
On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com wrote: Hey guys, I am looking into an issue we've been having with SolrCloud since the beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0 yet). I've noticed other users with this same issue, so I'd really like to get to the bottom of it. Under a very, very high rate of updates (2000+/sec), after 1-12 hours we see stalled transactions that snowball to consume all Jetty threads in the JVM. This eventually causes the JVM to hang with most threads waiting on the condition/stack provided at the bottom of this message. At this point SolrCloud instances then start to see their neighbors (who also have all threads hung) as down w/Connection Refused, and the shards become down in state. Sometimes a node or two survives and just returns 503s no server hosting shard errors. As a workaround/experiment, we have tuned the number of threads sending updates to Solr, as well as the batch size (we batch updates from client - solr), and the Soft/Hard autoCommits, all to no avail. Turning off Client-to-Solr batching (1 update = 1 call to Solr), which also did not help. Certain combinations of update threads and batch sizes seem to mask/help the problem, but not resolve it entirely.
Re: SolrCloud 4.x hangs under high update volume
Thanks guys! :) Mark: this patch is much appreciated, I will try to test this shortly, hopefully today. For my curiosity/understanding, could someone explain to me quickly what locks SolrCloud takes on updates? Was I on to something that more shards decrease the chance for locking? Secondly, I was wondering if someone could summarize what this patch 'fixes'? I'm not too familiar with Java and the solr codebase (working on that though :D). Cheers, Tim On 4 September 2013 09:52, Mark Miller markrmil...@gmail.com wrote: There is an issue if I remember right, but I can't find it right now. If anyone that has the problem could try this patch, that would be very helpful: http://pastebin.com/raw.php?i=aaRWwSGP - Mark On Wed, Sep 4, 2013 at 8:04 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi Mark, Got an issue to watch? Thanks, Markus -Original message- From:Mark Miller markrmil...@gmail.com Sent: Wednesday 4th September 2013 16:55 To: solr-user@lucene.apache.org Subject: Re: SolrCloud 4.x hangs under high update volume I'm going to try and fix the root cause for 4.5 - I've suspected what it is since early this year, but it's never personally been an issue, so it's rolled along for a long time. Mark Sent from my iPhone On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com wrote: Hey guys, I am looking into an issue we've been having with SolrCloud since the beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0 yet). I've noticed other users with this same issue, so I'd really like to get to the bottom of it. Under a very, very high rate of updates (2000+/sec), after 1-12 hours we see stalled transactions that snowball to consume all Jetty threads in the JVM. This eventually causes the JVM to hang with most threads waiting on the condition/stack provided at the bottom of this message. 
Re: SolrCloud 4.x hangs under high update volume
The 'lock' or semaphore was added to cap the number of threads that would be used. Previously, the number of threads in use could spike to many, many thousands on heavy updates. A limit on the number of outstanding requests was put in place to keep this from happening, something like 16 * the number of hosts in the cluster.

I assume the deadlock comes from the fact that requests are of two kinds: forwards to the leader, and distrib updates from the leader to replicas. A forward to the leader actually waits for the leader to then distrib the updates to replicas before returning. I believe this is what can lead to deadlock.

This is likely why the patch for the CloudSolrServer can help the situation: it removes the need to forward to the leader because it sends to the correct leader to begin with. It is only useful if you are adding docs with CloudSolrServer though, and is more of a workaround than a fix.

The patch uses a separate 'limiting' semaphore for each of the two cases.

- Mark

On Sep 4, 2013, at 10:22 AM, Tim Vaillancourt t...@elementspace.com wrote: Thanks guys! :) Mark: this patch is much appreciated, I will try to test this shortly, hopefully today. For my curiosity/understanding, could someone explain to me quickly what locks SolrCloud takes on updates? Was I on to something that more shards decrease the chance for locking? Secondly, I was wondering if someone could summarize what this patch 'fixes'? I'm not too familiar with Java and the solr codebase (working on that though :D). Cheers, Tim
On 4 September 2013 09:52, Mark Miller markrmil...@gmail.com wrote: There is an issue if I remember right, but I can't find it right now. If anyone that has the problem could try this patch, that would be very helpful: http://pastebin.com/raw.php?i=aaRWwSGP - Mark
On Wed, Sep 4, 2013 at 8:04 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi Mark, Got an issue to watch?
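To make Mark's explanation of the semaphore a little more concrete, here is a minimal Java sketch of the suspected failure mode. It is a simplified single-process model, not the actual SolrCmdDistributor code and not the pastebin patch: the class name, the method `completedForwards`, and the permit count of 4 are all hypothetical, and the real cluster spreads the permits across several nodes. Each simulated "forward to leader" request holds a permit and then needs a second permit for the nested "distrib to replicas" step. With one shared semaphore, all permits can end up held by forwards, so no distrib can ever start; with a separate semaphore per request kind, the nested step always finds a permit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Simplified single-process model of the suspected deadlock; NOT real Solr code.
final class SemaphoreDeadlockSketch {

    /**
     * Runs `forwards` concurrent "forward to leader" requests and returns how
     * many of them managed to finish their nested "distrib to replicas" step.
     * forwardLimit must have at least `forwards` permits.
     */
    static int completedForwards(Semaphore forwardLimit, Semaphore distribLimit, int forwards) {
        ExecutorService pool = Executors.newFixedThreadPool(forwards);
        CountDownLatch allHoldForwardPermit = new CountDownLatch(forwards);
        CountDownLatch allTriedDistrib = new CountDownLatch(forwards);
        List<Future<Boolean>> results = new ArrayList<>();
        try {
            for (int i = 0; i < forwards; i++) {
                results.add(pool.submit(() -> {
                    forwardLimit.acquire(); // permit held for the whole forward request
                    try {
                        // Wait until every forward holds its permit (the high-load case).
                        allHoldForwardPermit.countDown();
                        allHoldForwardPermit.await();
                        // The forward only returns once the nested distrib has run.
                        // A plain acquire() would park forever; time out so the demo ends.
                        boolean ok = distribLimit.tryAcquire(200, TimeUnit.MILLISECONDS);
                        if (ok) {
                            distribLimit.release();
                        }
                        // Keep the forward permit until all attempts are over,
                        // so the outcome does not depend on timing.
                        allTriedDistrib.countDown();
                        allTriedDistrib.await();
                        return ok;
                    } finally {
                        forwardLimit.release();
                    }
                }));
            }
            int completed = 0;
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    completed++;
                }
            }
            return completed;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        int permits = 4; // hypothetical cap; the thread mentions something like 16 * hosts
        Semaphore shared = new Semaphore(permits);
        System.out.println("shared semaphore:    " + completedForwards(shared, shared, permits));
        System.out.println("separate semaphores: "
                + completedForwards(new Semaphore(permits), new Semaphore(permits), permits));
    }
}
```

In this model the shared-semaphore run completes 0 of 4 forwards and the separate-semaphore run completes 4 of 4. The timeout is only there so the demonstration terminates; a plain `acquire()`, as seen in the stack trace, would leave the threads parked indefinitely.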
RE: SolrCloud 4.x hangs under high update volume
Hi Mark, Got an issue to watch? Thanks, Markus -Original message- From:Mark Miller markrmil...@gmail.com Sent: Wednesday 4th September 2013 16:55 To: solr-user@lucene.apache.org Subject: Re: SolrCloud 4.x hangs under high update volume I'm going to try and fix the root cause for 4.5 - I've suspected what it is since early this year, but it's never personally been an issue, so it's rolled along for a long time. Mark Sent from my iPhone On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com wrote: Hey guys, I am looking into an issue we've been having with SolrCloud since the beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0 yet). I've noticed other users with this same issue, so I'd really like to get to the bottom of it. Under a very, very high rate of updates (2000+/sec), after 1-12 hours we see stalled transactions that snowball to consume all Jetty threads in the JVM. This eventually causes the JVM to hang with most threads waiting on the condition/stack provided at the bottom of this message. At this point SolrCloud instances then start to see their neighbors (who also have all threads hung) as down w/Connection Refused, and the shards become down in state. Sometimes a node or two survives and just returns 503s no server hosting shard errors. As a workaround/experiment, we have tuned the number of threads sending updates to Solr, as well as the batch size (we batch updates from client - solr), and the Soft/Hard autoCommits, all to no avail. Turning off Client-to-Solr batching (1 update = 1 call to Solr), which also did not help. Certain combinations of update threads and batch sizes seem to mask/help the problem, but not resolve it entirely. Our current environment is the following: - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7. - 3 x Zookeeper instances, external Java 7 JVM. - 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and a replica of 1 shard). - Log4j 1.2 for Solr logs, set to WARN. 
Re: SolrCloud 4.x hangs under high update volume
Thanks so much for the explanation Mark, I owe you one (many)! We have this on our high TPS cluster and will run it through its paces tomorrow. I'll provide any feedback I can, more soon! :D Cheers, Tim
SolrCloud 4.x hangs under high update volume
Hey guys, I am looking into an issue we've been having with SolrCloud since the beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0 yet). I've noticed other users with this same issue, so I'd really like to get to the bottom of it. Under a very, very high rate of updates (2000+/sec), after 1-12 hours we see stalled transactions that snowball to consume all Jetty threads in the JVM. This eventually causes the JVM to hang with most threads waiting on the condition/stack provided at the bottom of this message. At this point SolrCloud instances then start to see their neighbors (who also have all threads hung) as down w/Connection Refused, and the shards become down in state. Sometimes a node or two survives and just returns 503s no server hosting shard errors. As a workaround/experiment, we have tuned the number of threads sending updates to Solr, as well as the batch size (we batch updates from client - solr), and the Soft/Hard autoCommits, all to no avail. Turning off Client-to-Solr batching (1 update = 1 call to Solr), which also did not help. Certain combinations of update threads and batch sizes seem to mask/help the problem, but not resolve it entirely. Our current environment is the following: - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7. - 3 x Zookeeper instances, external Java 7 JVM. - 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and a replica of 1 shard). - Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good day. - 5000 max jetty threads (well above what we use when we are healthy), Linux-user threads ulimit is 6000. - Occurs under Jetty 8 or 9 (many versions). - Occurs under Java 1.6 or 1.7 (several minor versions). - Occurs under several JVM tunings. - Everything seems to point to Solr itself, and not a Jetty or Java version (I hope I'm wrong). 
The stack trace that is holding up all my Jetty QTP threads is the following, which seems to be waiting on a lock that I would very much like to understand further:

java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x0007216e68d8 (a java.util.concurrent.Semaphore$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
    at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
    at org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
    at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
    at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
    at org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
    at org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1486)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1096)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:432)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1030)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:201)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
    at