Hi,
I want to run a remote MapReduce job, so I have created a job
programmatically [1], but when I submit it remotely I get the error in [2].
At first I thought it was a security issue, because the client
username is |xeon| while the remote username is |ubuntu|, but I have
noticed that the |temp| dirs with that username were created anyway [3].
Now I really don't know why I get this error. Can anyone help? Is
there a good tutorial that explains how to submit a remote job and how
to configure YARN MapReduce to accept remote jobs?
[1]
|Configuration conf = job.getConfiguration();

// ResourceManager address, as defined in yarn-site.xml
conf.set("yarn.resourcemanager.address", host + ":" +
        Util.yarn_resourcemanager_address_port);

// framework is "yarn", as defined in mapred-site.xml
conf.set("mapreduce.framework.name", "yarn");

conf.set("hadoop.job.ugi", "ubuntu");

// note: the property name must not contain a trailing space,
// and duplicate classpath entries have been removed
conf.set("yarn.application.classpath",
        "/home/ubuntu/Programs/hadoop-2.6.0/etc/hadoop," +
        "/home/ubuntu/Programs/hadoop-2.6.0/share/hadoop/common/lib/*," +
        "/home/ubuntu/Programs/hadoop-2.6.0/share/hadoop/common/*," +
        "/home/ubuntu/Programs/hadoop-2.6.0/share/hadoop/hdfs," +
        "/home/ubuntu/Programs/hadoop-2.6.0/share/hadoop/hdfs/lib/*," +
        "/home/ubuntu/Programs/hadoop-2.6.0/share/hadoop/hdfs/*," +
        "/home/ubuntu/Programs/hadoop-2.6.0/share/hadoop/yarn/lib/*," +
        "/home/ubuntu/Programs/hadoop-2.6.0/share/hadoop/yarn/*," +
        "/home/ubuntu/Programs/hadoop-2.6.0/share/hadoop/mapreduce/lib/*," +
        "/home/ubuntu/Programs/hadoop-2.6.0/share/hadoop/mapreduce/*," +
        "/contrib/capacity-scheduler/*.jar," +
        "/home/ubuntu/Programs/hadoop-2.6.0/*");

// NameNode address, as defined in core-site.xml
conf.set("fs.defaultFS", "hdfs://" + host + ":" + Util.fs_defaultFS);

for (Path inputPath : inputPaths) {
    try {
        FileInputFormat.addInputPath(job, inputPath);
    } catch (IllegalArgumentException | IOException e) {
        e.printStackTrace();
    }
}
FileOutputFormat.setOutputPath(job, outputpath);

try {
    job.waitForCompletion(true);
} catch (ClassNotFoundException | IOException | InterruptedException e) {
    e.printStackTrace();
}
|
[2]
|Configuration: core-default.xml, core-site.xml
-hosts
===> ec2-52-25-10-73
2015-05-14 18:42:36,277 WARN [main] util.NativeCodeLoader
(NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for
your platform... using builtin-java classes where applicable
XXXX: /input1
org.apache.hadoop.mapred.examples.MyHashPartitioner
---> Job 0: /input1, : temp-1431625359972
2015-05-14 18:42:49,391 INFO [pool-1-thread-1] client.RMProxy
(RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at
ec2-52-25-10-73/12.35.40.33:8040
2015-05-14 18:42:52,878 WARN [pool-1-thread-1] mapreduce.JobSubmitter
(JobSubmitter.java:copyAndConfigureFiles(261)) - No job jar file set. User
classes may not be found. See Job or Job#setJar(String).
2015-05-14 18:42:53,680 INFO [pool-1-thread-1] input.FileInputFormat
(FileInputFormat.java:listStatus(281)) - Total input paths to process : 1
2015-05-14 18:43:54,717 INFO [Thread-5] hdfs.DFSClient
(DFSOutputStream.java:createBlockOutputStream(1471)) - Exception in
createBlockOutputStream
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while
waiting for channel to be ready for connect. ch :
java.nio.channels.SocketChannel[connection-pending remote=/172.31.17.45:50010]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
at
org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1610)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
2015-05-14 18:43:54,722 INFO [Thread-5] hdfs.DFSClient
(DFSOutputStream.java:nextBlockOutputStream(1364)) - Abandoning
BP-2006008085-172.31.17.45-1431620173976:blk_1073741829_1005
2015-05-14 18:43:54,934 INFO [Thread-5] hdfs.DFSClient
(DFSOutputStream.java:nextBlockOutputStream(1368)) - Excluding datanode
172.31.17.45:50010
2015-05-14 18:43:55,153 WARN [Thread-5] hdfs.DFSClient
(DFSOutputStream.java:run(691)) - DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
/tmp/hadoop-yarn/staging/xeon/.staging/job_1431623302732_0004/job.split could
only be replicated to 0 nodes instead of minReplication (=1). There are 1
datanode(s) running and 1 node(s) are excluded in this operation.
at
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1549)
2015-05-14 18:43:55,165 INFO [pool-1-thread-1] mapreduce.JobSubmitter
(JobSubmitter.java:submitJobInternal(545)) - Cleaning up the staging area
/tmp/hadoop-yarn/staging/xeon/.staging/job_1431623302732_0004
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
/tmp/hadoop-yarn/staging/xeon/.staging/job_1431623302732_0004/job.split could
only be replicated to 0 nodes instead of minReplication (=1). There are 1
datanode(s) running and 1 node(s) are excluded in this operation.
at
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1549)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
|
[3]
|$ ~/Programs/hadoop-wordcount-coc$ hdfs dfs -ls
/tmp/hadoop-yarn/staging/xeon/.staging
Found 2 items
drwx------ - xeon supergroup 0 2015-05-14 17:09
/tmp/hadoop-yarn/staging/xeon/.staging/job_1431623302732_0001
drwx------ - xeon supergroup 0 2015-05-14 17:11
/tmp/hadoop-yarn/staging/xeon/.staging/job_1431623302732_0002
|
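One more detail I noticed in [2]: the client times out connecting to the
datanode at its private EC2 address (172.31.17.45:50010), which my machine
outside the cluster's network cannot reach, and the namenode then excludes
that datanode and fails with "could only be replicated to 0 nodes". Could
that be the real cause rather than permissions? If so, I guess I would need
the client to address datanodes by hostname instead of by their private IP,
e.g. by adding something like this to the client-side configuration above
(I'm not sure this is the right fix, and the datanode hostnames would also
have to be resolvable and reachable from my machine):

|// client-side guess: resolve datanodes via hostname, not the
// private IP returned by the namenode
conf.setBoolean("dfs.client.use.datanode.hostname", true);
|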
--