Marshall,
Thanks for the response. We're running 2.2.1. I had this nagging
suspicion that maybe I was treating the symptoms, not the problem, but at
this point I'm out of ideas :) My next attempt will be to bring the
thread count and pool size down to something more manageable. This is
supposed to be 'rev2' of our design, hardware-wise. I'll go back to my
original CPE settings (lower count, etc.) from rev1, and hope for the best.
Thanks again, I will keep you posted.
Steve
At 05:00 PM 4/22/2008, you wrote:
Hi Steve -
I'm no expert in these matters, but I wonder if changing the timeouts is
the right approach. Have you isolated the problem to something wrong with
the timeouts? Could it be something else (some rare race condition
causing a hang at some point, for instance)?
What level of UIMA are you running?
-Marshall
Steve Suppe wrote:
Hi all,
Thanks so much for this list - I'm constantly lurking and learning things :)
I'm having trouble with our distributed cluster. Our setup is as follows:
We have a 'reader' node reading from the local FS, and 15 'worker' nodes
each running identical aggregates of analysis engines and consumers that
connect to an Oracle DB to store the final results. On each worker I have
multiple instances running, typically 32, so I have 15x32 connections to
Oracle. I have about 20,000,000 documents to process.
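To put numbers on that setup, here is a small arithmetic sketch. It assumes
one Oracle connection per instance (as described above) and an even split
of documents across pipelines, which the CPE does not guarantee. The class
and method names (ClusterLoad, totalConnections, documentsPerPipeline) are
made up for illustration.

```java
final class ClusterLoad {
    /** Total concurrent DB connections, assuming one per pipeline instance. */
    static int totalConnections(int workers, int instancesPerWorker) {
        return workers * instancesPerWorker;
    }

    /** Documents each pipeline handles under an (assumed) even split, rounded up. */
    static long documentsPerPipeline(long documents, int pipelines) {
        return (documents + pipelines - 1) / pipelines;
    }

    public static void main(String[] args) {
        int connections = totalConnections(15, 32);
        long perPipeline = documentsPerPipeline(20_000_000L, connections);
        System.out.println(connections + " connections to Oracle");
        System.out.println(perPipeline + " documents per pipeline (even split)");
    }
}
```

That works out to 480 simultaneous connections and roughly 41,667
documents per pipeline.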
After a certain amount of time, I start to get Broken Pipe server socket
exceptions like the following:
4/21/08 5:40:24 PM - 11: org.apache.uima.adapter.vinci.CASTransportable.toStream(288): WARNING: Broken pipe
java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:78)
    at org.apache.vinci.transport.XTalkTransporter.writeInt(XTalkTransporter.java:508)
    at org.apache.vinci.transport.XTalkTransporter.stringToBin(XTalkTransporter.java:446)
    at org.apache.uima.adapter.vinci.CASTransportable$XTalkSerializer.startElement(CASTransportable.java:219)
    at org.apache.uima.cas.impl.XCASSerializer$XCASDocSerializer.startElement(XCASSerializer.java:327)
    at org.apache.uima.cas.impl.XCASSerializer$XCASDocSerializer.encodeFS(XCASSerializer.java:466)
    at org.apache.uima.cas.impl.XCASSerializer$XCASDocSerializer.encodeIndexed(XCASSerializer.java:347)
    at org.apache.uima.cas.impl.XCASSerializer$XCASDocSerializer.serialize(XCASSerializer.java:271)
    at org.apache.uima.cas.impl.XCASSerializer$XCASDocSerializer.access$600(XCASSerializer.java:62)
    at org.apache.uima.cas.impl.XCASSerializer.serialize(XCASSerializer.java:919)
    at org.apache.uima.adapter.vinci.CASTransportable.toStream(CASTransportable.java:279)
    at org.apache.vinci.transport.BaseServerRunnable.run(BaseServerRunnable.java:90)
    at org.apache.vinci.transport.BaseServer$PooledThread.run(BaseServer.java:101)
and
org.apache.uima.collection.impl.base_cpm.container.ServiceConnectionException: The service did not complete a call within the specified time. (Thread Name: [Procesing Pipeline#172 Thread]::) Host: 192.168.3.52 Port: 11000 Exceeded Timeout Value: 600000
    at org.apache.uima.collection.impl.cpm.container.deployer.VinciTAP.sendAndReceive(VinciTAP.java:533)
    at org.apache.uima.collection.impl.cpm.container.deployer.VinciTAP.analyze(VinciTAP.java:927)
    at org.apache.uima.collection.impl.cpm.container.NetworkCasProcessorImpl.process(NetworkCasProcessorImpl.java:198)
    at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.processNext(ProcessingUnit.java:1071)
    at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.run(ProcessingUnit.java:668)
org.apache.uima.resource.ResourceProcessException
    at org.apache.uima.collection.impl.cpm.container.NetworkCasProcessorImpl.process(NetworkCasProcessorImpl.java:200)
    at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.processNext(ProcessingUnit.java:1071)
    at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.run(ProcessingUnit.java:668)
I've found that if I set the <timeout> for each casProcessor (and also my
maxConsecutiveRestarts) too low, I only get about 70,000 documents in
before the whole thing goes sour. If I raise everything to obscenely
high values (say, 1,000,000,000 ms), then I get about 400,000 in.
Then the whole system freezes, and nothing gets into Oracle. I keep my
Vinci descriptor for the VNS server at unlimited, and the
serverSocketTimeout for the Vinci descriptor for the CPE obscenely high
as well.
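For a sense of scale, the millisecond values in play can be converted to
wall-clock time. This is just a conversion sketch using java.time.Duration;
the class and method names (TimeoutScale, describe) are made up for
illustration.

```java
import java.time.Duration;

final class TimeoutScale {
    /** Formats a millisecond timeout as days, hours, minutes and seconds. */
    static String describe(long millis) {
        Duration d = Duration.ofMillis(millis);
        return d.toDays() + "d "
                + (d.toHours() % 24) + "h "
                + (d.toMinutes() % 60) + "m "
                + (d.getSeconds() % 60) + "s";
    }

    public static void main(String[] args) {
        // Timeout reported in the ServiceConnectionException above.
        System.out.println("600000 ms     = " + describe(600_000L));
        // The "obscenely high" value.
        System.out.println("1000000000 ms = " + describe(1_000_000_000L));
    }
}
```

So the 600000 ms in the exception is 10 minutes, and 1,000,000,000 ms is
about eleven and a half days, which in practice means a hung call is never
timed out.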
I don't know if I'm adequately explaining my problem, but I'm trying to
figure out the best way to set my timeouts on the CPE and the Vinci
descriptors as well.
My next attempt is to keep the timeouts from the CPE side very high, the
Vinci VNS descriptor unlimited, and the serverSocketTimeout at 30000ms.
I guess, overall, I would like to give an AE ample time to do its work, but
not so long that it never returns. Also, since I have 32x15
processingUnitThreadCounts, the timeout needs to be large enough to get
through initialization.
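The general idea of "ample time, but not forever" can be sketched in plain
Java with a Future and a bounded wait. This is not how the CPE or Vinci
enforce their timeouts; it only illustrates the pattern, and the names
BoundedCall and callWithTimeout are made up.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedCall {
    /**
     * Runs the task on a separate thread and waits at most timeoutMs.
     * Returns the fallback value if the task does not finish in time.
     */
    static <T> T callWithTimeout(Callable<T> task, long timeoutMs, T fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(task);
            try {
                return future.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the task that overran
                return fallback;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                throw new IllegalStateException("interrupted while waiting", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("task failed", e.getCause());
            }
        } finally {
            executor.shutdownNow(); // do not leave the worker thread behind
        }
    }

    public static void main(String[] args) {
        // Finishes well inside the limit.
        System.out.println(callWithTimeout(() -> "done", 1000, "timed out"));
        // Simulates a hung call: sleeps far longer than the limit.
        System.out.println(callWithTimeout(() -> {
            Thread.sleep(5000);
            return "done";
        }, 100, "timed out"));
    }
}
```

Note that cancelling only helps if the task responds to interruption. A
call blocked on a socket that nobody is writing to may not, which is one
reason a very large timeout can look like a freeze.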
Sorry for the rambling. Does anyone have any general
guidelines/experiences for this kind of setup?
Thanks!
Steve