Kevin,

Take a look at http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html and https://issues.apache.org/jira/browse/SOLR-4816. I had the same issue you're reporting for a while; after I applied the patch from SOLR-4816 to my clients, the problems went away. If you don't feel like applying the patch, it looks like it should be included in the 4.5 release. Also note that the problem happens more frequently when the replication factor is greater than 1.

Thanks,
Greg

-----Original Message-----
From: kevin.osb...@cbsinteractive.com [mailto:kevin.osb...@cbsinteractive.com] On Behalf Of Kevin Osborn
Sent: Tuesday, September 03, 2013 4:16 PM
To: solr-user
Subject: Solr Cloud hangs when replicating updates

I was having problems updating SolrCloud with a large batch of records. The records come in bursts, with lulls between updates.

At first, I just tried large updates of 100,000 records at a time. Eventually, this caused Solr to hang. While it is hung, I can still query Solr, but I cannot do any deletes or other updates to the index.

At first, my updates were going in as SolrJ CSV posts. I have also tried local file updates and had similar results. I finally slowed things down to just use SolrJ's Update feature, which is basically just JavaBin. I am also sending over just 100 records at a time in 10 threads. Again, it eventually hung.

Sometimes, Solr hangs in the first couple of chunks. Other times, it hangs right away.

These are my commit settings:

    <autoCommit>
      <maxTime>15000</maxTime>
      <maxDocs>5000</maxDocs>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>30000</maxTime>
    </autoSoftCommit>

I have tried quite a few variations of these settings, as well as various JVM settings, all with the same results. The only thing that helps is reducing the cluster size from 2 to 1.

I also did a jstack trace. I did not see any explicit deadlocks, but I did see quite a few threads in WAITING or TIMED_WAITING.
It is typically something like this:

    java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x000000074039a450> (a java.util.concurrent.Semaphore$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
        at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
        at org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
        at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
        at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
        at org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
        at org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:139)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:474)
        at org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
        at org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
        at org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
        at org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
        at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)

It basically appears that Solr gets stuck while trying to acquire a semaphore that never becomes available.

Anyone have any ideas? This is definitely causing major problems for us.

--
*KEVIN OSBORN*
LEAD SOFTWARE ENGINEER
CNET Content Solutions
OFFICE 949.399.8714
CELL 949.310.4677
SKYPE osbornk
5 Park Plaza, Suite 600, Irvine, CA 92614
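
For anyone reading the jstack output above, here is a minimal, self-contained Java sketch of how a thread ends up in WAITING (parking) or TIMED_WAITING when it asks a semaphore for a permit that is never released. It is not Solr's code and does not reproduce SOLR-4816; the class name SemaphoreHangDemo, the method stateWhileBlocked, and the thread name are made up for illustration. It only uses java.util.concurrent classes from the JDK.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

// Hypothetical demo class; nothing here comes from the Solr code base.
class SemaphoreHangDemo {

    // Starts a worker that blocks on a semaphore with zero permits and
    // returns the thread state the worker is observed in.
    public static Thread.State stateWhileBlocked(boolean useTimeout) {
        Semaphore permits = new Semaphore(0); // no permits are ever released

        Thread worker = new Thread(() -> {
            try {
                if (useTimeout) {
                    // Bounded wait: the thread parks with a deadline.
                    permits.tryAcquire(30, TimeUnit.SECONDS);
                } else {
                    // Unbounded wait: the thread parks until a permit appears.
                    permits.acquire();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit quietly when interrupted
            }
        }, "demo-update-worker");
        worker.setDaemon(true);
        worker.start();

        Thread.State expected = useTimeout
                ? Thread.State.TIMED_WAITING
                : Thread.State.WAITING;

        // Poll for up to 5 seconds until the worker has parked.
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (worker.getState() != expected && System.nanoTime() < deadline) {
            LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(10));
        }

        Thread.State observed = worker.getState();
        worker.interrupt(); // unblock the worker so it can finish
        return observed;
    }

    public static void main(String[] args) {
        System.out.println("acquire():            " + stateWhileBlocked(false));
        System.out.println("tryAcquire(30s):      " + stateWhileBlocked(true));
    }
}
```

A thread stuck in the first form stays parked indefinitely unless something releases a permit or interrupts it, which matches the symptom of updates hanging while queries keep working. The timed form at least lets the caller give up and report an error, though whether that is appropriate for a given client is a separate design question.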