Chris, my response is below each of your paragraphs...
I don't have the means to try out this code right now... but I can't see any obvious problems with it (there may be somewhere that you are opening a stream or reader and not closing it, but I didn't see one). I notice you are running this client on the same machine as Solr (hence the localhost URLs). Did you by any chance try running the client on a separate machine to see if the number of updates before it hangs changes?
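On the stream-closing point, here is a minimal sketch of posting a single XML update so that nothing is left open. It assumes Java 9+ (for `readAllBytes`), UTF-8 XML, and an update URL such as `http://localhost:8080/solr/update` (port 8080 is a guess based on the "http-alt" entries in the lsof output). `SolrPost` and `postXml` are hypothetical names, not part of any Solr client API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class SolrPost {
    /**
     * POSTs one XML update message and returns the response body.
     * Both the request and response streams are closed by
     * try-with-resources, even if an exception is thrown.
     */
    public static String postXml(URL updateUrl, String xml) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) updateUrl.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");

        byte[] body = xml.getBytes(StandardCharsets.UTF_8);
        conn.setFixedLengthStreamingMode(body.length);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }

        // On 4xx/5xx, getInputStream() throws, so read the error stream instead.
        // getErrorStream() may return null; try-with-resources skips close() on null.
        int status = conn.getResponseCode();
        try (InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream()) {
            // Reading the response to the end lets the JDK reuse the
            // keep-alive socket instead of leaving it half-consumed.
            return in == null ? "" : new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

The point of the design is that every update reads the response to completion and closes both streams; a client that skips either step is a common way to accumulate sockets across thousands of requests.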
When I run the client locally and the Solr server on a slower, separate development box, the maximum number of updates drops to 3,219. So it's almost as if it's related to some sort of timeout problem, because the maximum number of updates drops considerably on a slower machine, but it's weird how consistent the number is: 6,144 locally, 5,000-something when I run it on the external server, and 3,219 when the client is separate from the server.

My money is still on a filehandle resource limit somewhere... If you are running on a system that has "lsof" (on some Unix/Linux installations you need sudo/su root permissions to run it), you can use "lsof -p ####" to look up what files and network connections are open for a given process. Try running it on both the client pid and the Solr server pid once it's hung. You'll probably see a lot of jar files in use by both, but if you see more than a few XML files open by the client, or more than one TCP connection open by either the client or the server, there's your culprit.
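As an in-process complement to lsof, here is a sketch that reports the JVM's own open file descriptor count while an update runs in a loop. It assumes a HotSpot/OpenJDK JVM on a Unix-like OS, where the platform bean implements `com.sun.management.UnixOperatingSystemMXBean`; elsewhere (e.g. Windows) the methods return -1. `FdMonitor`, `Update`, and `watch` are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdMonitor {
    /** One update call; allowed to throw, like most client I/O code. */
    public interface Update {
        void run() throws Exception;
    }

    /** Open file descriptors in this JVM, or -1 if the platform doesn't report them. */
    public static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    /** Per-process descriptor limit, or -1 if the platform doesn't report it. */
    public static long maxFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1;
    }

    /** Runs the update repeatedly, printing the descriptor count every reportEvery calls. */
    public static void watch(Update update, int times, int reportEvery) throws Exception {
        for (int i = 1; i <= times; i++) {
            update.run();
            if (i % reportEvery == 0) {
                System.out.println(i + " updates, " + openFileDescriptors()
                        + " of " + maxFileDescriptors() + " descriptors open");
            }
        }
    }
}
```

Assuming `GanjaUpdate.doUpdate` is static and takes the XML string, something like `FdMonitor.watch(() -> GanjaUpdate.doUpdate(xml), 10000, 500)` would show whether the count climbs steadily toward the limit before the hang, which would point to a leak in the client rather than in Solr.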
The only output I get from 'lsof -p' that pertains to TCP connections is the following... I'm not too sure how to interpret it, though:

    java 4104 sangraal 261u IPv6 0x5b060f0 0t0 TCP *:8009 (LISTEN)
    java 4104 sangraal 262u IPv6 0x55d59e8 0t0 TCP [::127.0.0.1]:8005 (LISTEN)
    java 4104 sangraal 263u IPv6 0x53cc0e0 0t0 TCP [::127.0.0.1]:http-alt->[::127.0.0.1]:51039 (ESTABLISHED)
    java 4104 sangraal 264u IPv6 0x5b059d0 0t0 TCP [::127.0.0.1]:51045->[::127.0.0.1]:http-alt (ESTABLISHED)
    java 4104 sangraal 265u IPv6 0x53cc9c8 0t0 TCP [::127.0.0.1]:http-alt->[::127.0.0.1]:51045 (ESTABLISHED)
    java 4104 sangraal 11u IPv6 0x5b04f20 0t0 TCP *:http-alt (LISTEN)
    java 4104 sangraal 12u IPv6 0x5b06d68 0t0 TCP localhost:51037->localhost:51036 (TIME_WAIT)

I'm not sure what Windows equivalent of lsof may exist.
Wait... I just had another thought. You are using InputStreamReader to deal with the InputStreams of your remote XML files, but you aren't specifying a charset, so it's using your system default, which may be different from the charset of the original XML files you are pulling from the URL. That (I *think*) means your InputStreamReader may in some cases fail to read all of the bytes of the stream, which might leave some dangling filehandles (I'm just guessing on that part; I'm not actually sure what happens in that case). What if you simplify your code (for the purposes of testing) and just put the post-transform version of ganja-full.xml in a big-ass String variable in your Java app and call GanjaUpdate.doUpdate(bigAssString) over and over again? Does that cause the same problem?
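For the charset point, here is a sketch of reading a remote XML file with an explicitly named charset and a reader that is always closed. It assumes the files are UTF-8; the real encoding should come from the XML declaration or the HTTP Content-Type header. `XmlFetch`, `readAll`, and `fetch` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class XmlFetch {
    /** Decodes the whole stream with the given charset; the stream is closed on exit. */
    public static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        // Closing the BufferedReader closes the InputStreamReader and the
        // underlying InputStream, even if read() throws.
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    /** Fetches a remote XML file, assuming (not detecting) UTF-8 encoding. */
    public static String fetch(URL url) throws IOException {
        return readAll(url.openStream(), StandardCharsets.UTF_8);
    }
}
```

Naming the charset removes the dependence on the platform default, and the try-with-resources block guarantees the connection's stream is released after every file, whatever the decoding outcome.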
In the code, I read the XML with a StringReader and then pass it to GanjaUpdate as a string anyway. I've output the String object and verified that it is in fact all there.

-Sangraal