Hi,

I am trying to run multi-threaded Giraph workers. This is the command that I use:

hadoop jar giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.ConnectedComponentsComputation -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip in/road-template -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op out/cc_mt_road4 -w 24 -ca giraph.numComputeThreads=4,giraph.userPartitionCount=4

We have a 12-node cluster with 8 cores per node. I am running 24 workers and want each worker to run multi-threaded, so that multiple vertices are processed in parallel on a single node.

I read in a different thread a suggestion to set userPartitionCount=<threadcount> so that each thread works on a different partition. However, when I do that, I get the following exception:

java.lang.IllegalStateException: run: Caught an unrecoverable exception null
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:101)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.NullPointerException
    at org.apache.giraph.comm.SendCache.<init>(SendCache.java:100)
    at org.apache.giraph.comm.SendEdgeCache.<init>(SendEdgeCache.java:50)
    at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.<init>(NettyWorkerClientRequestProcessor.java:128)
    at org.apache.giraph.worker.InputSplitsCallable.<init>(InputSplitsCallable.java:104)
    at org.apache.giraph.worker.VertexInputSplitsCallable.<init>(VertexInputSplitsCallable.java:98)
    at org.apache.giraph.worker.VertexInputSplitsCallableFactory.newCallable(VertexInputSplitsCallableFactory.java:80)
    at org.apache.giraph.worker.VertexInputSplitsCallableFactory.newCallable(VertexInputSplitsCallableFactory.java:37)
    at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:213)
    at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:283)
    at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:327)
    at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:508)
    at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:246)
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:91)
    ... 7 more

When I run the command without giraph.userPartitionCount=4 and specify just -ca giraph.numComputeThreads=4, I don't see any performance improvement.

Please suggest the correct way to use multi-threading, or point me to a document that describes it.

Thanks,
Alok Kumbhare
