Hi all,

Thanks so much for this list - I'm constantly lurking and learning things :)

I'm having trouble with our distributed cluster - our setup is as follows:

We have a 'reader' node reading from the local FS, and 15 'worker' nodes, each running identical aggregates of analysis and consumers that connect to an Oracle DB to store the final results. On each worker I have multiple instances running, typically 32, so I have 15 x 32 = 480 connections to Oracle. I have about 20,000,000 documents to process.

After a certain amount of time, I start to get Broken Pipe server socket exceptions, such as the following:

4/21/08 5:40:24 PM - 11: org.apache.uima.adapter.vinci.CASTransportable.toStream(288): WARNING: Broken pipe
java.net.SocketException: Broken pipe
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:78)
        at org.apache.vinci.transport.XTalkTransporter.writeInt(XTalkTransporter.java:508)
        at org.apache.vinci.transport.XTalkTransporter.stringToBin(XTalkTransporter.java:446)
        at org.apache.uima.adapter.vinci.CASTransportable$XTalkSerializer.startElement(CASTransportable.java:219)
        at org.apache.uima.cas.impl.XCASSerializer$XCASDocSerializer.startElement(XCASSerializer.java:327)
        at org.apache.uima.cas.impl.XCASSerializer$XCASDocSerializer.encodeFS(XCASSerializer.java:466)
        at org.apache.uima.cas.impl.XCASSerializer$XCASDocSerializer.encodeIndexed(XCASSerializer.java:347)
        at org.apache.uima.cas.impl.XCASSerializer$XCASDocSerializer.serialize(XCASSerializer.java:271)
        at org.apache.uima.cas.impl.XCASSerializer$XCASDocSerializer.access$600(XCASSerializer.java:62)
        at org.apache.uima.cas.impl.XCASSerializer.serialize(XCASSerializer.java:919)
        at org.apache.uima.adapter.vinci.CASTransportable.toStream(CASTransportable.java:279)
        at org.apache.vinci.transport.BaseServerRunnable.run(BaseServerRunnable.java:90)
        at org.apache.vinci.transport.BaseServer$PooledThread.run(BaseServer.java:101)

and

org.apache.uima.collection.impl.base_cpm.container.ServiceConnectionException: The service did not complete a call within the specified time. (Thread Name: [Procesing Pipeline#172 Thread]::) Host: 192.168.3.52 Port: 11000 Exceeded Timeout Value: 600000
        at org.apache.uima.collection.impl.cpm.container.deployer.VinciTAP.sendAndReceive(VinciTAP.java:533)
        at org.apache.uima.collection.impl.cpm.container.deployer.VinciTAP.analyze(VinciTAP.java:927)
        at org.apache.uima.collection.impl.cpm.container.NetworkCasProcessorImpl.process(NetworkCasProcessorImpl.java:198)
        at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.processNext(ProcessingUnit.java:1071)
        at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.run(ProcessingUnit.java:668)
org.apache.uima.resource.ResourceProcessException
        at org.apache.uima.collection.impl.cpm.container.NetworkCasProcessorImpl.process(NetworkCasProcessorImpl.java:200)
        at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.processNext(ProcessingUnit.java:1071)
        at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.run(ProcessingUnit.java:668)



I've found that if I set the <timeout> for each casProcessor (and also my maxConsecutiveRestarts) too low, I only get about 70,000 documents in before the whole thing goes sour. If I raise everything to something obscenely high (say, 1,000,000,000 ms), then I get about 400,000 in. Then the whole system freezes, and nothing gets into Oracle. I keep my Vinci descriptor for the VNS server at unlimited, and the serverSocketTimeout for the Vinci descriptor for the CPE obscenely high as well.

I don't know if I'm adequately explaining my problem, but I'm trying to figure out the best way to set my timeouts on both the CPE and the Vinci descriptors.

My next attempt is to keep the timeouts on the CPE side very high, the Vinci VNS descriptor unlimited, and the serverSocketTimeout at 30,000 ms.

I guess, overall, I would like to give ample time to let an AE work, but not so long it never returns. This includes the fact that since I have 32x15 processingUnitThreadCounts, I need the timeout to be large enough at initialization.
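
To make that goal concrete, here is a minimal sketch in plain Java of "wait long enough, but not forever". It is not how the CPM or Vinci implement their timeouts; the class name BoundedCall and the method callWithTimeout are hypothetical names made up for illustration, and only standard java.util.concurrent calls are used.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of UIMA: bounds how long a caller waits for one task.
final class BoundedCall {

    // Runs the task on a worker thread and waits at most timeoutMs for it.
    // Returns the result, or Optional.empty() if the task did not finish in time.
    static <T> Optional<T> callWithTimeout(Callable<T> task, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return Optional.ofNullable(future.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the task instead of waiting forever
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            executor.shutdownNow(); // always release the worker thread
        }
    }

    public static void main(String[] args) {
        // A task that finishes well within its budget.
        Optional<String> fast = callWithTimeout(() -> "processed", 1000);
        // A task that overruns its budget and is abandoned.
        Optional<String> slow = callWithTimeout(() -> {
            Thread.sleep(5000);
            return "too late";
        }, 100);
        System.out.println(fast.isPresent()); // true
        System.out.println(slow.isPresent()); // false
    }
}
```

The point of the sketch is only that a finite timeout turns a hung call into a recoverable event; choosing the actual value for each casProcessor is the part I'm unsure about.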

Sorry for the rambling. Does anyone have any general guidelines or experiences for this kind of setup?

Thanks!

Steve
