Hello,

To give you a bit of context, I wrote a Java library that aims to provide an 
easy way to coordinate multiple MR jobs and execute them with a single jar 
submission. The final result is a "fat jar" (built using the Maven assembly 
plugin) that contains the different Mapper and Reducer classes and a Main class 
that has the logic to submit the different jobs to the cluster.

To accomplish this, the Main relies on some text files (packaged in the jar) 
being present. Those files are not needed by the MR jobs themselves; they are a 
kind of configuration that tells the Main how it should schedule the different 
MR jobs.
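
A minimal sketch of this kind of lookup, to make the mechanism concrete. It is 
not the actual library code: the class name JobPlanLoader and the resource name 
jobs.txt are hypothetical, and it assumes the files sit at the root of the jar 
and are read through the class loader that loaded the Main.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads a text resource packaged inside the jar.
class JobPlanLoader {

    // Returns the lines of the named classpath resource, or throws if the
    // given class loader cannot see it.
    static List<String> readLines(ClassLoader loader, String resourceName)
            throws IOException {
        InputStream in = loader.getResourceAsStream(resourceName);
        if (in == null) {
            throw new IOException("Resource not found on classpath: " + resourceName);
        }
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        // "jobs.txt" is an assumed name, used only for illustration.
        String resourceName = args.length > 0 ? args[0] : "jobs.txt";
        ClassLoader loader = JobPlanLoader.class.getClassLoader();
        for (String line : readLines(loader, resourceName)) {
            System.out.println(line);
        }
    }
}
```

With a lookup like this, a file that the class loader cannot see shows up as 
an IOException rather than a null stream further down the line.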

The jar is executed like this:
hadoop jar the_jar_file.jar <args>

It has been used in production for a long time now, but recently we decided to 
upgrade to Hadoop 2.6 (we were using 0.20). All our jobs packaged this way are 
failing because the Main cannot locate the text files on the classpath.

I did a bit of debugging by replacing the Main with a piece of code that prints 
the content of the classpath. When running the jar with:
java -jar the_jar_file.jar <args>

I can see the text files in the list. But when I run the same jar with:
hadoop jar the_jar_file.jar <args>

The text files are missing. I assume that something changed in the way the 
hadoop jar command reads the jar and builds the classpath. I found someone 
complaining about the same issue on Stack Overflow 
(http://stackoverflow.com/questions/31670390/accessing-jar-resource-when-run-in-hadoop)
but nobody replied.
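
The debugging Main can be sketched roughly as below. This is an illustrative 
reconstruction, not the exact code that was run: the class name ClasspathProbe 
and the default resource name jobs.txt are hypothetical. Besides printing the 
java.class.path entries, it asks three class loaders for the resource, on the 
assumption that the launcher may load the jar through a loader other than the 
system one.

```java
import java.io.File;
import java.net.URL;

// Hypothetical diagnostic Main: shows what is on the classpath and which
// class loaders can see a given resource.
class ClasspathProbe {

    // Splits the java.class.path system property into its entries.
    static String[] classpathEntries() {
        return System.getProperty("java.class.path", "").split(File.pathSeparator);
    }

    // Returns the URL of the resource as seen by the loader, or null if the
    // loader is null or cannot find the resource.
    static URL locate(ClassLoader loader, String resourceName) {
        return loader == null ? null : loader.getResource(resourceName);
    }

    public static void main(String[] args) {
        // "jobs.txt" is an assumed name, used only for illustration.
        String resourceName = args.length > 0 ? args[0] : "jobs.txt";

        for (String entry : classpathEntries()) {
            System.out.println("classpath entry: " + entry);
        }

        System.out.println("system loader:  "
                + locate(ClassLoader.getSystemClassLoader(), resourceName));
        System.out.println("class loader:   "
                + locate(ClasspathProbe.class.getClassLoader(), resourceName));
        System.out.println("context loader: "
                + locate(Thread.currentThread().getContextClassLoader(), resourceName));
    }
}
```

Running it once with java -jar and once with hadoop jar, and comparing which 
lines print null, would show whether the file is truly absent or only 
invisible to one particular loader.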

I would like to keep the same mechanism (keep those conf files in the jar and 
access them at runtime from the classpath). Maybe there is an option to alter 
the way the jar command behaves? Can someone point me to the source code of 
the jar command?

Thanks!
