Hi Steve -
I'm no expert in these matters, but I wonder if changing the timeouts is
the right approach. Have you isolated the problem to something wrong
with the timeouts? Could it be something else (some rare race condition
causing a hang at some point, for instance)?
What level of UIMA are you running?
-Marshall
Steve Suppe wrote:
Hi all,
Thanks so much for this list - I'm constantly lurking and learning
things :)
I'm having trouble with our distributed cluster - our setup is as
follows:
We have a 'reader' node reading from the local FS, 15 'worker' nodes
each running identical aggregates of analysis and consumers that
connect to an Oracle DB for final storage of the results. On each
worker I have multiple instances running, typically 32, so I have
15x32 connections to Oracle. I have about 20,000,000 documents to
process.
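
For scale, the arithmetic behind this setup can be sketched as below. This is only a rough illustration: it assumes each instance holds exactly one Oracle connection and that documents are spread evenly across pipelines, neither of which is confirmed above. The class and method names are made up for the example.

```java
final class ClusterLoad {
    /** Total connections, assuming one connection per running instance. */
    static int totalConnections(int workers, int instancesPerWorker) {
        return workers * instancesPerWorker;
    }

    /** Documents per pipeline, rounded up, assuming an even spread. */
    static long docsPerPipeline(long documents, int pipelines) {
        return (documents + pipelines - 1) / pipelines;
    }

    public static void main(String[] args) {
        int connections = totalConnections(15, 32);            // 15 workers x 32 instances
        long perPipeline = docsPerPipeline(20_000_000L, connections);
        System.out.println(connections + " connections, about "
                + perPipeline + " documents per pipeline");
    }
}
```

With the numbers from this message that comes to 480 connections and roughly 41,667 documents per pipeline.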
After a certain amount of time, I start to get Broken Pipe server
socket exceptions such as the following:
4/21/08 5:40:24 PM - 11: org.apache.uima.adapter.vinci.CASTransportable.toStream(288): WARNING: Broken pipe
java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:78)
    at org.apache.vinci.transport.XTalkTransporter.writeInt(XTalkTransporter.java:508)
    at org.apache.vinci.transport.XTalkTransporter.stringToBin(XTalkTransporter.java:446)
    at org.apache.uima.adapter.vinci.CASTransportable$XTalkSerializer.startElement(CASTransportable.java:219)
    at org.apache.uima.cas.impl.XCASSerializer$XCASDocSerializer.startElement(XCASSerializer.java:327)
    at org.apache.uima.cas.impl.XCASSerializer$XCASDocSerializer.encodeFS(XCASSerializer.java:466)
    at org.apache.uima.cas.impl.XCASSerializer$XCASDocSerializer.encodeIndexed(XCASSerializer.java:347)
    at org.apache.uima.cas.impl.XCASSerializer$XCASDocSerializer.serialize(XCASSerializer.java:271)
    at org.apache.uima.cas.impl.XCASSerializer$XCASDocSerializer.access$600(XCASSerializer.java:62)
    at org.apache.uima.cas.impl.XCASSerializer.serialize(XCASSerializer.java:919)
    at org.apache.uima.adapter.vinci.CASTransportable.toStream(CASTransportable.java:279)
    at org.apache.vinci.transport.BaseServerRunnable.run(BaseServerRunnable.java:90)
    at org.apache.vinci.transport.BaseServer$PooledThread.run(BaseServer.java:101)
and
org.apache.uima.collection.impl.base_cpm.container.ServiceConnectionException: The service did not complete a call within the specified time. (Thread Name: [Procesing Pipeline#172 Thread]::) Host: 192.168.3.52 Port: 11000 Exceeded Timeout Value: 600000
    at org.apache.uima.collection.impl.cpm.container.deployer.VinciTAP.sendAndReceive(VinciTAP.java:533)
    at org.apache.uima.collection.impl.cpm.container.deployer.VinciTAP.analyze(VinciTAP.java:927)
    at org.apache.uima.collection.impl.cpm.container.NetworkCasProcessorImpl.process(NetworkCasProcessorImpl.java:198)
    at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.processNext(ProcessingUnit.java:1071)
    at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.run(ProcessingUnit.java:668)
org.apache.uima.resource.ResourceProcessException
    at org.apache.uima.collection.impl.cpm.container.NetworkCasProcessorImpl.process(NetworkCasProcessorImpl.java:200)
    at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.processNext(ProcessingUnit.java:1071)
    at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.run(ProcessingUnit.java:668)
I've found that if I set the <timeout> for each casProcessor too low
(and also my maxConsecutiveRestarts), I only get about 70,000 documents
in before the whole thing goes sour. If I raise everything to an
obscenely high value (say, 1,000,000,000 ms), then I get about 400,000
in. Then the whole system freezes, and nothing gets into Oracle. I keep
the timeout in my Vinci descriptor for the VNS server at unlimited, and
the serverSocketTimeout in the Vinci descriptor for the CPE obscenely
high as well.
I don't know if I'm adequately explaining my problem, but I'm trying
to figure out the best way to set my timeouts on the CPE and the Vinci
descriptors as well.
My next attempt is to keep the timeouts from the CPE side very high,
the Vinci VNS descriptor unlimited, and the serverSocketTimeout at
30000ms.
I guess, overall, I would like to give an AE ample time to work, but
not so long that a call never returns. The timeout also has to be large
enough to cover initialization, since I have 32x15
processingUnitThreadCounts starting up.
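
The "ample time, but not forever" idea can be sketched in plain Java as follows. This is a generic illustration built on java.util.concurrent, not how the CPM or Vinci enforce their timeouts. The class name BoundedCall, the method callWithTimeout, and the fallback value are hypothetical names introduced for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedCall {
    /**
     * Runs the task on the pool and waits at most timeoutMs for its result.
     * Returns the fallback if the task has not finished by then.
     */
    static <T> T callWithTimeout(ExecutorService pool, Callable<T> task,
                                 long timeoutMs, T fallback)
            throws InterruptedException, ExecutionException {
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupt the worker thread so a stuck call does not linger.
            future.cancel(true);
            return fallback;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // A quick call finishes well inside its limit.
            String fast = callWithTimeout(pool, () -> "done", 1000, "timed out");
            // A call that sleeps past its limit is abandoned.
            String slow = callWithTimeout(pool, () -> {
                Thread.sleep(5000);
                return "done";
            }, 100, "timed out");
            System.out.println(fast + " / " + slow);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The point of the sketch is that the limit only bounds how long the caller waits. A task that ignores interruption keeps running after the caller gives up, which is one way a system can appear frozen even with timeouts in place.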
Sorry for the rambling. Does anyone have any general
guidelines/experiences for this kind of setup?
Thanks!
Steve