Hi all,
Thanks so much for this list - I'm constantly lurking and learning things :)
I'm having trouble with our distributed cluster. Our setup is as follows:
We have a 'reader' node reading from the local FS and 15 'worker' nodes, each
running identical aggregates of analysis engines and consumers that connect to
an Oracle DB to store the final results. On each worker I have
multiple instances running, typically 32, so I have 15x32 connections to
Oracle. I have about 20,000,000 documents to process.
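To put rough numbers on this setup, here is a quick back-of-the-envelope sketch. The class and method names are only illustrative, and it assumes documents are spread evenly across connections, which may not hold in practice:

```java
// Back-of-the-envelope load figures for the setup described above.
// Hypothetical helper, not part of UIMA or Vinci.
class ClusterLoad {

    // Total concurrent Oracle connections: one per instance on every worker.
    static int totalConnections(int workers, int instancesPerWorker) {
        return workers * instancesPerWorker;
    }

    // Documents each connection must handle, rounded up,
    // assuming an even spread across connections.
    static long docsPerConnection(long totalDocs, int connections) {
        return (totalDocs + connections - 1) / connections;
    }

    public static void main(String[] args) {
        int connections = totalConnections(15, 32);
        long perConnection = docsPerConnection(20_000_000L, connections);
        System.out.println("connections = " + connections);
        System.out.println("docs per connection = " + perConnection);
    }
}
```

So the database sees 480 simultaneous connections, and each instance would handle roughly 41,667 documents over the full run.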
After a certain amount of time, I start to get Broken Pipe server socket
exceptions such as the following:
4/21/08 5:40:24 PM - 11: org.apache.uima.adapter.vinci.CASTransportable.toStream(288): WARNING: Broken pipe
java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:78)
at org.apache.vinci.transport.XTalkTransporter.writeInt(XTalkTransporter.java:508)
at org.apache.vinci.transport.XTalkTransporter.stringToBin(XTalkTransporter.java:446)
at org.apache.uima.adapter.vinci.CASTransportable$XTalkSerializer.startElement(CASTransportable.java:219)
at org.apache.uima.cas.impl.XCASSerializer$XCASDocSerializer.startElement(XCASSerializer.java:327)
at org.apache.uima.cas.impl.XCASSerializer$XCASDocSerializer.encodeFS(XCASSerializer.java:466)
at org.apache.uima.cas.impl.XCASSerializer$XCASDocSerializer.encodeIndexed(XCASSerializer.java:347)
at org.apache.uima.cas.impl.XCASSerializer$XCASDocSerializer.serialize(XCASSerializer.java:271)
at org.apache.uima.cas.impl.XCASSerializer$XCASDocSerializer.access$600(XCASSerializer.java:62)
at org.apache.uima.cas.impl.XCASSerializer.serialize(XCASSerializer.java:919)
at org.apache.uima.adapter.vinci.CASTransportable.toStream(CASTransportable.java:279)
at org.apache.vinci.transport.BaseServerRunnable.run(BaseServerRunnable.java:90)
at org.apache.vinci.transport.BaseServer$PooledThread.run(BaseServer.java:101)
and
org.apache.uima.collection.impl.base_cpm.container.ServiceConnectionException: The service did not complete a call within the specified time. (Thread Name: [Procesing Pipeline#172 Thread]::) Host: 192.168.3.52 Port: 11000 Exceeded Timeout Value: 600000
at org.apache.uima.collection.impl.cpm.container.deployer.VinciTAP.sendAndReceive(VinciTAP.java:533)
at org.apache.uima.collection.impl.cpm.container.deployer.VinciTAP.analyze(VinciTAP.java:927)
at org.apache.uima.collection.impl.cpm.container.NetworkCasProcessorImpl.process(NetworkCasProcessorImpl.java:198)
at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.processNext(ProcessingUnit.java:1071)
at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.run(ProcessingUnit.java:668)
org.apache.uima.resource.ResourceProcessException
at org.apache.uima.collection.impl.cpm.container.NetworkCasProcessorImpl.process(NetworkCasProcessorImpl.java:200)
at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.processNext(ProcessingUnit.java:1071)
at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.run(ProcessingUnit.java:668)
I've found that if I set the <timeout> for each casProcessor (and also my
maxConsecutiveRestarts) too low, I only get about 70,000 documents in before
the whole thing goes sour. If I raise everything obscenely high (say,
1,000,000,000 ms), then I get about 400,000 in. Then the whole system
freezes, and nothing gets into Oracle. I keep the timeout in my Vinci
descriptor for the VNS server at unlimited, and the serverSocketTimeout in
the Vinci descriptor for the CPE obscenely high as well.
I don't know if I'm explaining my problem adequately, but I'm trying to
figure out the best way to set my timeouts in both the CPE and the Vinci
descriptors.
My next attempt is to keep the timeouts on the CPE side very high, the
Vinci VNS descriptor unlimited, and the serverSocketTimeout at 30000 ms.
I guess, overall, I would like to give an AE ample time to work, but
not so long that it never returns. The timeout also has to cover
initialization: since I have 32x15 processing unit threads
(processingUnitThreadCount), it needs to be large enough for startup.
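The "ample time, but not forever" behaviour I'm after could be sketched in plain Java roughly like this. This is not UIMA or Vinci code: `BoundedCall` and `callWithTimeout` are hypothetical names, and the sketch only uses java.util.concurrent to illustrate waiting a bounded time for a result and then giving up:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; not how the CPM or Vinci implement timeouts.
class BoundedCall {

    // Runs the task on a separate thread and waits at most timeoutMs for it.
    // On timeout the task is interrupted and the fallback value is returned.
    // Exceptions thrown by the task itself propagate to the caller.
    static <T> T callWithTimeout(Callable<T> task, long timeoutMs, T fallback)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck task
            return fallback;
        } finally {
            pool.shutdownNow(); // never leave the worker thread behind
        }
    }

    public static void main(String[] args) throws Exception {
        // Finishes well within the limit.
        System.out.println(callWithTimeout(() -> "done", 1000, "timed out"));
        // Takes longer than the limit, so the fallback is returned instead.
        System.out.println(callWithTimeout(() -> {
            Thread.sleep(2000);
            return "done";
        }, 50, "timed out"));
    }
}
```

The point of the sketch is that the caller always gets control back within the limit, so a single hung call cannot stall the whole pipeline; choosing the limit itself is still the open question.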
Sorry for the rambling. Does anyone have any general guidelines or experiences
for this kind of setup?
Thanks!
Steve