*Problem:* We periodically rebuild our Solr index from scratch. We have built a custom publisher that scales horizontally to increase write throughput. On a given rebuild, we have ~60 JVMs, each running 5 threads that actively publish to all Solr masters.
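As a rough sizing aid, the client-side fan-out implied by these figures, together with the per-thread settings given in this post (20 masters, QueueThreadSize 2), can be tallied as below. This is only a sketch: the class and method names are hypothetical, the JVM count is approximate, and the runner figure is an upper bound that assumes every runner thread is active at once.

```java
// Hypothetical helper: tallies the publisher fan-out from the figures in this post.
public class FanOut {

    // StreamingUpdateSolrServer instances held by one JVM (one per master per thread).
    static int serverInstancesPerJvm(int threadsPerJvm, int masters) {
        return threadsPerJvm * masters;
    }

    // StreamingUpdateSolrServer instances across the whole publisher fleet.
    static int serverInstancesTotal(int jvms, int threadsPerJvm, int masters) {
        return jvms * serverInstancesPerJvm(threadsPerJvm, masters);
    }

    // Upper bound on runner threads (and so open HTTP requests) aimed at one master.
    static int maxRunnersPerMaster(int jvms, int threadsPerJvm, int queueThreads) {
        return jvms * threadsPerJvm * queueThreads;
    }

    public static void main(String[] args) {
        int jvms = 60;          // approximate ("~60" in the post)
        int threadsPerJvm = 5;
        int masters = 20;
        int queueThreads = 2;   // QueueThreadSize

        System.out.println("servers per JVM:        " + serverInstancesPerJvm(threadsPerJvm, masters));
        System.out.println("servers in total:       " + serverInstancesTotal(jvms, threadsPerJvm, masters));
        System.out.println("max runners per master: " + maxRunnersPerMaster(jvms, threadsPerJvm, queueThreads));
    }
}
```

With the numbers above this gives 100 server instances per JVM, 6000 across the fleet, and at most 600 concurrent runner threads per master.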
For each thread, we instantiate one StreamingUpdateSolrServer (QueueSize: 100, QueueThreadSize: 2) for each master = 20 servers/thread. At the end of a publish cycle (we publish in smaller chunks of 5MM records), we execute server.blockUntilFinished() on each of the 20 servers on each thread (100 total per JVM). Before we applied a recent change, this would always execute to completion. There were a few hang-ups on publishes, but we consistently re-published our entire corpus in 6-7 hours.

The *problem* is that blockUntilFinished now hangs indefinitely. From the Java thread dumps, it appears that the loop in StreamingUpdateSolrServer thinks a runner thread is still active, so it blocks (as expected). The other notable point in the thread dump is that the active runner thread is always in exactly this state:

*Hung Runner Thread:*
"pool-1-thread-8" prio=3 tid=0x00000001084c0000 nid=0xfe runnable [0xffffffff5c7fe000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    - locked <0xfffffffe81dbcbe0> (a java.io.BufferedInputStream)
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
    at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
    at org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer$Runner.run(StreamingUpdateSolrServer.java:154)

Although the runner thread is reading from the socket, there is absolutely no activity on the Solr clients. Other than the thread waiting in blockUntilFinished, the client is basically sleeping.

*Recent Change:* We increased "maxFieldLength" from 10000 (the default) to 2147483647 (Integer.MAX_VALUE). Given that this change is server side, I don't know how it would impact adding a new document. I can see how it would increase commit times and index size, but I don't see the relationship to hanging client adds.

*Ingest Workflow:*
1) Pull artifacts from the relational database (PDF/TXT/Java bean).
2) Extract all searchable text fields. This is where we use Tika, independent of Solr.
3) Using the SolrJ client, publish an object that is serialized to XML and written to the master.
4) Execute "blockUntilFinished" for all 20 servers on each thread.
5) Autocommit is set on the servers at 30 minutes or 50k documents. During a republish, the 50k threshold is met first.

*Environment:*
Solr v3.5.0
20 masters
2 slaves/master = 40 slaves

*Corpus:* We have ~100MM records, ranging in size from 50MB PDFs to 1KB TXT files. Our schema has an unusually large number of fields (200). Our index size averages about 30GB/shard, totaling 600GB.

*Related Bugs:* My symptoms most closely match this bug, but we are not executing any deletes, so I have low confidence that it is the same issue:
https://issues.apache.org/jira/browse/SOLR-1990
Although we have similar stack traces, we are only ADDING docs.

Thanks in advance for any input/help!

-- Justin Babuscio