If you use 'hadoop jar' to invoke your application, this is the default behaviour. The reason is that the utility supports a jars-within-jar feature, which lets you pack additional dependency jars into the application as a lib/ subdirectory under the root of the main jar.
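
For illustration, a rough sketch of what such a jar could look like and one way to build it with the JDK's jar tool (my-app.jar, classes/, and the dependency-*.jar names here are just placeholders, not anything Hadoop requires):

    my-app.jar
      META-INF/MANIFEST.MF          (Main-Class: your.app.mainClass)
      your/app/mainClass.class      (plus the rest of your own classes)
      lib/dependency-a.jar
      lib/dependency-b.jar

    # with compiled classes under classes/ and dependency jars under lib/:
    jar cfe my-app.jar your.app.mainClass -C classes . lib/

RunJar should then pick up the jars it finds under lib/ when it runs the extracted application, so the dependencies never need to be exploded into individual .class files.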
This behaviour is not currently configurable, so given your inodes issue you have two options: use the jars-within-jar feature, which does not produce massive numbers of .class files because the packed dependency jars stay intact inside the jar's lib/, or avoid 'hadoop jar' (the RunJar utility) entirely and invoke your main class directly with the generated classpath:

java -cp $(hadoop classpath):my-fat-jar-with-all-dependencies.jar your.app.mainClass

On Sat, Oct 25, 2014 at 3:17 PM, Yang <[email protected]> wrote:
> I thought this might be because hadoop wants to pack everything
> (including the -files dfs cache files) into one single jar, so I removed
> the -files commands I have.
>
> but it still extracts the jar. this is rather confusing
>
> On Fri, Oct 24, 2014 at 11:51 AM, Yang <[email protected]> wrote:
>
>> I just noticed that when I run a "hadoop jar
>> my-fat-jar-with-all-dependencies.jar", it unjars the job jar in
>> /tmp/hadoop-username/hadoop-unjar-xxxx/ and extracts all the classes in
>> there.
>>
>> the fat jar is pretty big, so it took up a lot of space (particularly
>> inodes) and ran out of quota.
>>
>> I wonder why we have to unjar these classes on the **client node**?
>> the jar won't even be accessed until on the compute nodes, right?
>>

--
Harsh J
