Re: Error partitioning the input path

邓凯 Fri, 06 Sep 2013 00:25:59 -0700

Sorry,
       I just found the successful job logs in the tasklogs directory.
       But when I checked the BSPMaster's log, I found some exceptions in the
log.
      2013-09-06 11:24:49,550 ERROR
org.apache.hama.bsp.sync.ZKSyncBSPMasterClient:
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /bsp
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
at
org.apache.hama.bsp.sync.ZKSyncBSPMasterClient.init(ZKSyncBSPMasterClient.java:62)
at org.apache.hama.bsp.BSPMaster.initZK(BSPMaster.java:509)
at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:492)
at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:475)
at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56)
2013-09-06 11:24:49,796 ERROR
org.apache.hama.bsp.sync.ZKSyncBSPMasterClient:
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /bsp
       And when I attempted to run the failed job, the BSPMaster log said:
       2013-09-05 08:55:28,927 INFO org.apache.hama.bsp.JobInProgress: num
BSPTasks: 119
       2013-09-05 08:55:28,968 INFO org.apache.hama.bsp.JobInProgress: Job
is initialized.
       2013-09-05 08:55:28,969 ERROR
org.apache.hama.bsp.SimpleTaskScheduler: Could not schedule all tasks!
       2013-09-05 08:55:28,970 ERROR
org.apache.hama.bsp.SimpleTaskScheduler: Scheduling of job Weiborank could
not be done successfully. Killing it!
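
One possible reading of the scheduler error above, offered as an assumption and not a confirmed diagnosis: the job asked for 119 BSP tasks, while the original message in this thread describes a task capacity of 12, so the scheduler may not have had enough slots. Below is a minimal sketch of a pre-submission check in plain Java. `TaskCountCheck` and `clampTasks` are hypothetical names and are not part of Hama; the capacity value would have to come from the cluster's own status report.

```java
// Hypothetical helper for illustration only; not part of the Hama API.
final class TaskCountCheck {
    private TaskCountCheck() {}

    /** Returns how many of the requested tasks fit into the given capacity. */
    static int clampTasks(int requested, int capacity) {
        if (requested < 1 || capacity < 1) {
            throw new IllegalArgumentException("requested and capacity must be positive");
        }
        return Math.min(requested, capacity);
    }

    public static void main(String[] args) {
        int requested = 119; // "num BSPTasks: 119" from the BSPMaster log
        int capacity = 12;   // task capacity stated in the original message
        int usable = clampTasks(requested, capacity);
        if (usable < requested) {
            System.out.println("Requested " + requested + " tasks, but only "
                + capacity + " slots are available; using " + usable + ".");
        }
    }
}
```

The clamped value could then be passed to weiboJob.setNumBspTask(...), the call already used in the job setup quoted below. Whether this Hama version accepts fewer tasks than input splits is not confirmed here.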



       Meanwhile, I found some exceptions in the groom's log:
       2013-09-05 15:48:06,326 ERROR
org.apache.zookeeper.ClientCnxnSocketNIO: Unable to open socket to
datanode1/192.168.1.201:21810
2013-09-05 15:48:07,343 ERROR org.apache.zookeeper.ClientCnxnSocketNIO:
Unable to open socket to datanode3/192.168.1.203:21810
2013-09-05 15:48:08,572 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: namenode/192.168.1.200:40000. Already tried 0 time(s); retry
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1
SECONDS)
2013-09-05 15:48:08,663 ERROR org.apache.zookeeper.ClientCnxnSocketNIO:
Unable to open socket to namenode/192.168.1.200:21810
2013-09-05 15:48:08,958 ERROR org.apache.zookeeper.ClientCnxnSocketNIO:
Unable to open socket to datanode2/192.168.1.202:21810
2013-09-05 15:48:09,573 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: namenode/192.168.1.200:40000. Already tried 1 time(s); retry
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1
SECONDS)
2013-09-05 15:48:09,914 ERROR org.apache.zookeeper.ClientCnxnSocketNIO:
Unable to open socket to datanode4/192.168.1.204:21810
2013-09-05 15:48:10,106 ERROR org.apache.zookeeper.ClientCnxnSocketNIO:
Unable to open socket to datanode1/192.168.1.201:21810
2013-09-05 15:48:10,573 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: namenode/192.168.1.200:40000. Already tried 2 time(s); retry
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1
SECONDS)
2013-09-05 15:48:11,086 ERROR org.apache.zookeeper.ClientCnxnSocketNIO:
Unable to open socket to datanode3/192.168.1.203:21810
2013-09-05 15:48:11,574 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: namenode/192.168.1.200:40000. Already tried 3 time(s); retry
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1
SECONDS)
2013-09-05 15:48:12,329 ERROR org.apache.zookeeper.ClientCnxnSocketNIO:
Unable to open socket to namenode/192.168.1.200:21810
2013-09-05 15:48:12,574 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: namenode/192.168.1.200:40000. Already tried 4 time(s); retry
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1
SECONDS)
2013-09-05 15:48:12,869 ERROR org.apache.zookeeper.ClientCnxnSocketNIO:
Unable to open socket to datanode2/192.168.1.202:21810
2013-09-05 15:48:13,575 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: namenode/192.168.1.200:40000. Already tried 5 time(s); retry
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1
SECONDS)
2013-09-05 15:48:13,727 ERROR org.apache.zookeeper.ClientCnxnSocketNIO:
Unable to open socket to datanode4/192.168.1.204:21810
2013-09-05 15:48:13,855 ERROR org.apache.zookeeper.ClientCnxnSocketNIO:
Unable to open socket to datanode1/192.168.1.201:21810
2013-09-05 15:48:14,334 ERROR org.apache.zookeeper.ClientCnxnSocketNIO:
Unable to open socket to datanode3/192.168.1.203:21810
2013-09-05 15:48:14,575 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: namenode/192.168.1.200:40000. Already tried 6 time(s); retry
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1
SECONDS)
2013-09-05 15:48:15,576 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: namenode/192.168.1.200:40000. Already tried 7 time(s); retry
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1
SECONDS)
2013-09-05 15:48:15,830 ERROR org.apache.zookeeper.ClientCnxnSocketNIO:
Unable to open socket to namenode/192.168.1.200:21810
2013-09-05 15:48:16,242 ERROR org.apache.zookeeper.ClientCnxnSocketNIO:
Unable to open socket to datanode2/192.168.1.202:21810
2013-09-05 15:48:16,576 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: namenode/192.168.1.200:40000. Already tried 8 time(s); retry
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1
SECONDS)
2013-09-05 15:48:16,805 ERROR org.apache.zookeeper.ClientCnxnSocketNIO:
Unable to open socket to datanode4/192.168.1.204:21810
2013-09-05 15:48:17,233 ERROR org.apache.zookeeper.ClientCnxnSocketNIO:
Unable to open socket to datanode1/192.168.1.201:21810
2013-09-05 15:48:17,540 ERROR org.apache.zookeeper.ClientCnxnSocketNIO:
Unable to open socket to datanode3/192.168.1.203:21810
2013-09-05 15:48:17,577 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: namenode/192.168.1.200:40000. Already tried 9 time(s); retry
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1
SECONDS)

at org.apache.hadoop.ipc.Client.getConnection(Client.java:1243)
at org.apache.hadoop.ipc.Client.call(Client.java:1087)
... 6 more
2013-09-05 15:48:17,577 ERROR org.apache.hama.bsp.GroomServer:
Fail to communicate with BSPMaster for reporting.
java.io.IOException: Call to namenode/192.168.1.200:40000 failed on local
exception: java.net.SocketException: Network is unreachable
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
at org.apache.hadoop.ipc.Client.call(Client.java:1112)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy5.report(Unknown Source)
at org.apache.hama.bsp.GroomServer.doReport(GroomServer.java:654)
at org.apache.hama.bsp.GroomServer.offerService(GroomServer.java:525)
at org.apache.hama.bsp.GroomServer.run(GroomServer.java:876)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.net.SocketException: Network is unreachable
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:465)
at sun.nio.ch.Net.connect(Net.java:457)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:639)
at
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:453)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:579)
at org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:202)
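
The groom log above shows repeated failures to open sockets to port 21810 on every node, plus "Network is unreachable" for the BSPMaster RPC port 40000. Below is a small sketch, assuming plain TCP reachability is the first thing worth ruling out, that tries each endpoint from the groom host. The class and method names are hypothetical; the host names and ports are copied from the log.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper; not part of Hama or ZooKeeper.
final class PortCheck {
    private PortCheck() {}

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, unreachable networks
            // and host names that cannot be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        // Host names as they appear in the groom log.
        String[] hosts = {"namenode", "datanode1", "datanode2", "datanode3", "datanode4"};
        for (String host : hosts) {
            System.out.println(host + ":21810 reachable = " + isReachable(host, 21810, 2000));
        }
        System.out.println("namenode:40000 reachable = " + isReachable("namenode", 40000, 2000));
    }
}
```

If every endpoint reports false from one groom host but true from another, the problem is more likely that host's network setup than the Hama or ZooKeeper configuration.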


2013/9/5 Edward J. Yoon <[email protected]>

> Hi,
>
> > 13/09/05 08:55:13 INFO bsp.BSPJobClient: Running job: job_201309041029_0025
> > 13/09/05 08:55:13 INFO bsp.BSPJobClient: Job failed.
>
> Partitioning job has failed. You will be able to find why job failed,
> by checking the tasklogs in logs/tasklogs/job_ID directory.
>
> Can you also attach the task logs?
>
> On Thu, Sep 5, 2013 at 10:15 AM, 邓凯 <[email protected]> wrote:
> > Sorry, I sent the previous mail before finishing it.
> > Here is the Partitioner:
> > public static class WeiboRankPartioner extends
> >     HashPartitioner<LongWritable, Text> {
> >
> >   @Override
> >   public int getPartition(LongWritable key, Text value, int numTasks) {
> >     String[] keyvaluePair = value.toString().split("\t");
> >     System.out.println(keyvaluePair[0] + " " + keyvaluePair.length);
> >     return Math.abs(keyvaluePair[0].hashCode() % numTasks);
> >   }
> > }
> > And these are the job properties:
> > GraphJob weiboJob = new GraphJob(conf, WeiboRank.class);
> > weiboJob.setJobName("Weiborank");
> >
> > weiboJob.setVertexClass(WeiboRankVertex.class);
> > weiboJob.setInputPath(new Path(args[0]));
> > weiboJob.setOutputPath(new Path(args[1]));
> >
> > // set the defaults
> > weiboJob.setMaxIteration(30);
> > weiboJob.set("hama.weiborank.alpha", "0.85");
> > // reference vertices to itself, because we don't have a dangling node
> > // contribution here
> > weiboJob.set("hama.graph.self.ref", "true");
> > weiboJob.set("hama.graph.max.convergence.error", "0.001");
> >
> > if (args.length == 3) {
> > weiboJob.setNumBspTask(Integer.parseInt(args[2]));
> > }
> >
> > // error
> > weiboJob.setAggregatorClass(AverageAggregator.class);
> >
> > // Vertex reader
> > weiboJob.setVertexInputReaderClass(WeiboRankReader.class);
> >
> > weiboJob.setVertexIDClass(Text.class);
> > weiboJob.setVertexValueClass(DoubleWritable.class);
> > weiboJob.setEdgeValueClass(NullWritable.class);
> >
> > weiboJob.setInputFormat(TextInputFormat.class);
> >
> > weiboJob.setPartitioner(WeiboRankPartioner.class);
> > weiboJob.setOutputFormat(TextOutputFormat.class);
> > weiboJob.setOutputKeyClass(Text.class);
> > weiboJob.setOutputValueClass(DoubleWritable.class);
> >
> > That's all. Thank you.
> >
> >
> > 2013/9/5 邓凯 <[email protected]>
> >
> >> Hi,
> >>    Here is the console output. I can't find anything more in
> >> HAMA_HOME/logs.
> >>
> >> hadoop@datanode4:/usr/local/hama$ bin/hama jar
> >> /home/datanode4/Desktop/WeiboRank.jar vertexresult weiborankresult
> >> 13/09/05 08:55:08 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> >> 13/09/05 08:55:11 INFO bsp.FileInputFormat: Total input paths to process : 1
> >> 13/09/05 08:55:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >> 13/09/05 08:55:11 WARN snappy.LoadSnappy: Snappy native library not loaded
> >> 13/09/05 08:55:11 INFO bsp.FileInputFormat: Total input paths to process : 1
> >> 13/09/05 08:55:13 INFO bsp.BSPJobClient: Running job: job_201309041029_0025
> >> 13/09/05 08:55:13 INFO bsp.BSPJobClient: Job failed.
> >> 13/09/05 08:55:13 ERROR bsp.BSPJobClient: Error partitioning the input path.
> >> Exception in thread "main" java.io.IOException: Runtime partition failed for the job.
> >> at org.apache.hama.bsp.BSPJobClient.partition(BSPJobClient.java:465)
> >> at org.apache.hama.bsp.BSPJobClient.submitJobInternal(BSPJobClient.java:333)
> >> at org.apache.hama.bsp.BSPJobClient.submitJob(BSPJobClient.java:293)
> >> at org.apache.hama.bsp.BSPJob.submit(BSPJob.java:229)
> >> at org.apache.hama.graph.GraphJob.submit(GraphJob.java:203)
> >> at org.apache.hama.bsp.BSPJob.waitForCompletion(BSPJob.java:236)
> >> at WeiboRank.main(WeiboRank.java:161)
> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >> at java.lang.reflect.Method.invoke(Method.java:606)
> >> at org.apache.hama.util.RunJar.main(RunJar.java:146)
> >>
> >> The vertex class extends Vertex, just like the PageRank example.
> >> Here is the InputReader:
> >> public static class WeiboRankReader extends
> >>     VertexInputReader<LongWritable, Text, Text, NullWritable, DoubleWritable> {
> >>
> >>   @Override
> >>   public boolean parseVertex(LongWritable key, Text value,
> >>       Vertex<Text, NullWritable, DoubleWritable> vertex) throws Exception {
> >>     String[] keyvaluePair = value.toString().split("\t");
> >>     if (keyvaluePair.length > 1) {
> >>       vertex.setVertexID(new Text(keyvaluePair[0]));
> >>       String edgeString = keyvaluePair[1];
> >>       if (!edgeString.equals("")) {
> >>         String[] edges = edgeString.split(",");
> >>         for (String e : edges) {
> >>           vertex.addEdge(new Edge<Text, NullWritable>(new Text(e), null));
> >>         }
> >>       }
> >>     } else {
> >>       vertex.setVertexID(new Text(keyvaluePair[0]));
> >>     }
> >>     return true;
> >>   }
> >> }
> >>
> >>
> >>
> >> 2013/9/5 Edward J. Yoon <[email protected]>
> >>
> >>> Can you provide full client console logs?
> >>>
> >>> On Wed, Sep 4, 2013 at 10:21 PM, 邓凯 <[email protected]> wrote:
> >>> > Hi,
> >>> >       I have a hadoop-1.1.2 cluster with one namenode and four
> >>> > datanodes. I built hama-0.6.2 on it. When I run the benchmarks and
> >>> > examples such as PageRank, everything goes well.
> >>> >       But today, when I ran my own code, it hit an exception.
> >>> >       The log says: ERROR bsp.BSPJobClient: Error partitioning the
> >>> > input path.
> >>> >       The exception is: Exception in thread "main" java.io.IOException:
> >>> > Runtime partition failed for the job.
> >>> >       Given this, I think there is something wrong with my code.
> >>> >       My Hama cluster has 4 GroomServers and a task capacity of 12.
> >>> >       I use the command: bin/hama jar Weiborank.jar vertexresult
> >>> > weiborankresult 12
> >>> >       The directory vertexresult has only one file in it, and I use
> >>> > HashPartitioner.class as the partitioner.
> >>> >       I wonder whether it is caused by having only one file in the
> >>> > input path while there are 12 BSP tasks. If so, can I fix it by
> >>> > increasing the number of files in the input path?
> >>> >       Thanks a lot.
> >>>
> >>>
> >>>
> >>> --
> >>> Best Regards, Edward J. Yoon
> >>> @eddieyoon
> >>>
> >>
> >>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>
