Bertrand,

Yes, this is an unfortunate edge case. It is, however, fixed in the trunk/2.x client rewrite, and is now tracked by a test via https://issues.apache.org/jira/browse/MAPREDUCE-2384.
On Fri, Oct 5, 2012 at 10:28 PM, Bertrand Dechoux <[email protected]> wrote:
> Hi,
>
> I am launching my job using the command line, and I observed that when the
> provided input path does not match any files, the jar in the staging
> directory is not removed.
> It is removed on job termination (success or failure), but here the job isn't
> even really started, so it may be an edge case.
> Has anyone seen the same behaviour? (I am using 1.0.3.)
>
> Here is an extract of the stack trace with the Hadoop-related classes.
>
>> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
>> does not exist: [removed]
>>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
>>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
>>     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:902)
>>     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:919)
>>     at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
>>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:838)
>>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:791)
>>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
>>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:494)
>
> My second question is a bit related, because one of its consequences would
> nullify the impact of the above 'bug'.
> Is it possible to set the main job jar directly to a jar already inside
> HDFS?
> From what I know, the configuration points to a local jar archive which is
> uploaded each time to the staging directory.
>
> The same question was asked in the JIRA, but without a clear resolution:
> https://issues.apache.org/jira/browse/MAPREDUCE-236
>
> My question might be related to
> https://issues.apache.org/jira/browse/MAPREDUCE-4408
> which is resolved for the next version. But that one seems to be only about
> the uber jar, and I am using a standard jar.
> If it does work with an HDFS location, what are the details? Won't the jar be
> cleaned up during job termination? Why not? Will it also be set up within the
> distributed cache?
>
> Regards
>
> Bertrand
>
> PS: I know there are other solutions to my problem. I will look at Oozie.
> And in the worst case, I can create a FileSystem instance myself to check
> whether the job should really be launched or not. Both could work, but both
> seem overkill in my context.

-- 
Harsh J
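[Editor's note: the pre-submission check Bertrand mentions in his PS could be sketched roughly as below, against the Hadoop 1.x API. The class name, input-path handling, and job name are hypothetical; the point is only to verify the input path exists before JobClient stages the jar, so that the InvalidInputException above is never triggered.]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class GuardedSubmit {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical convention: first CLI argument is the input path.
        Path input = new Path(args[0]);

        // Check the input before submitting, so the job jar is never
        // uploaded to the staging directory for a job that would fail
        // in writeSplits() anyway.
        FileSystem fs = input.getFileSystem(conf);
        if (!fs.exists(input)) {
            System.err.println("Input path does not exist, not submitting: " + input);
            System.exit(1);
        }

        Job job = new Job(conf, "guarded-job");
        FileInputFormat.addInputPath(job, input);
        // ... set mapper, reducer, output path, etc. ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note this check is racy by nature (the path could disappear between the check and submission), so it only reduces the chance of hitting the leak, it does not remove it.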
