Well, this is a bit odd:

> 2013-02-13 20:58:45,740 INFO org.apache.giraph.worker.BspServiceWorker: loadInputSplits: Using 1 thread(s), originally 1 threads(s) for 14 total splits.
> 2013-02-13 20:58:45,742 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
> 2013-02-13 20:58:45,744 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxVerticesPerTransfer = 10000
> 2013-02-13 20:58:45,744 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxEdgesPerTransfer = 80000
> 2013-02-13 20:58:45,745 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
> 2013-02-13 20:58:45,755 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
> 2013-02-13 20:58:45,758 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
> 2013-02-13 20:58:45,814 INFO org.apache.giraph.worker.InputSplitsCallable: call: Loaded 0 input splits in 0.07298644 secs, (v=0, e=0) 0.0 vertices/sec, 0.0 edges/sec
> 2013-02-13 20:58:45,817 INFO org.apache.giraph.comm.netty.NettyClient: waitAllRequests: Finished all requests. MBytes/sec sent = 0, MBytes/sec received = 0, MBytesSent = 0, MBytesReceived = 0, ave sent req MBytes = 0, ave received req MBytes = 0, secs waited = 8.303
> 2013-02-13 20:58:45,817 INFO org.apache.giraph.worker.BspServiceWorker: setup: Finally loaded a total of (v=0, e=0)
>
> What would cause this? I imagine that it's related to my overall problem.
On Wed, Feb 13, 2013 at 3:31 PM, Zachary Hanif <[email protected]> wrote:

> It is my own code. I'm staring at my VertexInputFormat class right now. It
> extends TextVertexInputFormat<Text, DoubleWritable, NullWritable,
> DoubleWritable>. I cannot imagine why a value would not be set for these
> vertexes, but I'll drop in some code to more stringently ensure value
> creation.
>
> Why would this begin to fail on a distributed deployment (multiple
> workers) but not with a single worker? The dataset is identical between
> the two executions.
>
>
> On Wed, Feb 13, 2013 at 2:35 PM, Alessandro Presta <[email protected]> wrote:
>
>> Hi Zachary,
>>
>> Are you running one of the examples or your own code?
>> It seems to me that a call to edge.getValue() is returning null, which
>> should never happen.
>>
>> Alessandro
>>
>> From: Zachary Hanif <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Wednesday, February 13, 2013 11:29 AM
>> To: "[email protected]" <[email protected]>
>> Subject: Giraph/Netty issues on a cluster
>>
>> (How embarrassing! I forgot a subject header in a previous attempt to
>> post this. Please reply to this thread, not the other.)
>>
>> Hi everyone,
>>
>> I am having some odd issues when trying to run a Giraph 0.2 job across my
>> CDH 3u3 cluster. After building the jar, and deploying it across the
>> cluster, I start to notice a handful of my nodes reporting the following
>> error:
>>
>>> 2013-02-13 17:47:43,341 WARN org.apache.giraph.comm.netty.handler.ResponseClientHandler: exceptionCaught: Channel failed with remote address <EDITED_INTERNAL_DNS>/10.2.0.16:30001
>>> java.lang.NullPointerException
>>>     at org.apache.giraph.vertex.EdgeListVertexBase.write(EdgeListVertexBase.java:106)
>>>     at org.apache.giraph.partition.SimplePartition.write(SimplePartition.java:169)
>>>     at org.apache.giraph.comm.requests.SendVertexRequest.writeRequest(SendVertexRequest.java:71)
>>>     at org.apache.giraph.comm.requests.WritableRequest.write(WritableRequest.java:127)
>>>     at org.apache.giraph.comm.netty.handler.RequestEncoder.encode(RequestEncoder.java:96)
>>>     at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:61)
>>>     at org.jboss.netty.handler.execution.ExecutionHandler.handleDownstream(ExecutionHandler.java:185)
>>>     at org.jboss.netty.channel.Channels.write(Channels.java:712)
>>>     at org.jboss.netty.channel.Channels.write(Channels.java:679)
>>>     at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:246)
>>>     at org.apache.giraph.comm.netty.NettyClient.sendWritableRequest(NettyClient.java:655)
>>>     at org.apache.giraph.comm.netty.NettyWorkerClient.sendWritableRequest(NettyWorkerClient.java:144)
>>>     at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:425)
>>>     at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendPartitionRequest(NettyWorkerClientRequestProcessor.java:195)
>>>     at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.flush(NettyWorkerClientRequestProcessor.java:365)
>>>     at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:190)
>>>     at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>     at java.lang.Thread.run(Thread.java:722)
>>
>> What would be causing this? All other Hadoop jobs run well on the
>> cluster, and when the Giraph job is run with only one worker, it completes
>> without any issues. When run with any number of workers >1, the above
>> error occurs. I have referenced this post
>> <http://mail-archives.apache.org/mod_mbox/giraph-user/201209.mbox/%3ccaeq6y7shc4in-l73nr7abizspmrrfw9sfa8tmi3myqml8vk...@mail.gmail.com%3E>
>> where superficially similar issues were discussed, but the root cause
>> appears to be different, and the suggested methods of resolution are not
>> panning out.
>>
>> As extra background, the 'remote address' changes as the error cycles
>> through my available cluster nodes, and the failing workers do not seem to
>> favor one physical machine over another. Not all nodes present this issue,
>> only a handful per job. Is there something simple that I am missing?
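Below is a minimal sketch of the failure mode discussed in the quoted thread, assuming the custom VertexInputFormat can leave a vertex or edge value null for some input lines. This is not Giraph code and the names (ValueHolder, parseValueOrDefault) are invented for illustration; it only shows why a null Writable value might surface solely on the serialization path seen in the stack trace, plus the kind of defensive value creation Zachary mentions.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Writable;

// Illustrative only: a Writable wrapping a value that is dereferenced
// during serialization, the same shape as the write() call in the trace.
public class ValueHolder implements Writable {

  // Stays null if the code building the object never sets it.
  private DoubleWritable value;

  public void setValue(DoubleWritable value) {
    this.value = value;
  }

  public DoubleWritable getValue() {
    return value;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    // Serialization is (roughly) what happens when a partition is shipped
    // to another worker over Netty. With a single worker this path may
    // simply never run, so a null value goes unnoticed; with several
    // workers it throws a NullPointerException like the one in the trace.
    value.write(out);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    value = new DoubleWritable();
    value.readFields(in);
  }

  // Defensive value creation of the kind Zachary mentions: always return a
  // concrete DoubleWritable, falling back to 0.0 for missing or malformed
  // input, so nothing downstream can ever see a null.
  public static DoubleWritable parseValueOrDefault(String token) {
    if (token == null || token.trim().isEmpty()) {
      return new DoubleWritable(0.0);
    }
    try {
      return new DoubleWritable(Double.parseDouble(token.trim()));
    } catch (NumberFormatException e) {
      return new DoubleWritable(0.0);
    }
  }
}

In an actual input format the same idea amounts to constructing a concrete DoubleWritable (or NullWritable.get(), for the NullWritable edge-value type used here) for every vertex and edge during parsing, rather than passing along anything that might still be null.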
