Hi Max,

I am glad that it worked for you. I see that MAPREDUCE-121 is very old, and if it has worked since 0.20.203, then we should probably not add this to Oozie now.
Thanks,
Virag

On 8/6/13 6:21 PM, "Maxime Petazzoni" <[email protected]> wrote:

>Ok, so I think I got it to work. I did have to make changes to
>JavaActionExecutor though, because of this Hadoop bug. The same bug might
>not be present in the most recent version of Hadoop (even though
>MAPREDUCE-121 isn't marked as resolved) but it is definitely present in
>0.20.2-cdh3u4.
>
>As I said the main problem comes from addToCache(), which adds fully
>qualified URIs to the distributed cache, and this messes up the classpath
>parsing later on when the job starts. So I changed addToCache() so that
>it only adds URIs without scheme and authority/host. Obviously this puts
>the limitation that everything must be on the same HDFS filesystem as the
>application.
>
>    uri = new URI(filePath);
>    URI baseUri = appPath.toUri();
>
>    /**
>     * Don't re-resolve cache URIs with the application base URI,
>     * otherwise the JARs added to the cache end up in the distributed
>     * cache and in the mapred.job.classpath.files property with colons
>     * in them, but colon is the classpath delimiter used by Hadoop so
>     * the job ends up with the wrong classpath and can't run correctly.
>     *
>     * if (uri.getScheme() == null) {
>     *     String resolvedPath = uri.getPath();
>     *     if (!resolvedPath.startsWith("/")) {
>     *         resolvedPath = baseUri.getPath() + "/" + resolvedPath;
>     *     }
>     *     uri = new URI(baseUri.getScheme(), baseUri.getAuthority(),
>     *                   resolvedPath, uri.getQuery(), uri.getFragment());
>     * }
>     *
>     * Instead, simply resolve a potential relative path and create a
>     * new URI without a scheme and host/authority.
>     */
>
>    String resolvedPath = uri.getPath();
>    if (!resolvedPath.startsWith("/")) {
>        resolvedPath = baseUri.getPath() + "/" + resolvedPath;
>    }
>    uri = new URI(null, null, resolvedPath, uri.getQuery(),
>                  uri.getFragment());
>
>    if (archive) { ...
>
>The other thing I had to do was to make sure these JARs were ahead of all
>the Hadoop distribution JARs (in my situation I need to override Jackson
>to version 1.9.9 and Hadoop comes with 1.5.2). By default JARs from the
>user's classpath will come after the Hadoop distribution's JARs, so that
>doesn't work. Thankfully since 0.20.203 (and 0.20.2-cdh3u4 contains that
>backport) one can ask for the user's JARs to take precedence, so I added
>to the end of createLauncherConf() the following:
>
>    // have user and Oozie sharelibs placed in the distributed cache
>    // take precedence over Hadoop libs
>    launcherJobConf.setUserClassesTakesPrecedence(true);
>
>This allowed me to get the application's lib JARs and Oozie sharelibs to
>be correctly inserted into the job's classpath and take precedence over
>all the Hadoop JARs.
>
>I'm not sure which parts of this you guys are interested in integrating
>into Oozie, especially since this is all needed because of a bug in
>Hadoop. Let me know, and we can work on putting an actual patch together
>(or you can just use the snippets above, I don't care).
>
>Thanks for the help,
>/Max
>--
>Maxime Petazzoni
>Sr. Platform Engineer
>m 408.310.0595
>www.turn.com
>
>________________________________________
>From: Maxime Petazzoni [[email protected]]
>Sent: Tuesday, August 06, 2013 3:24 PM
>To: [email protected]
>Subject: RE: Classpath (and extra JARs) for Java actions
>
>Could this be related to
>https://issues.apache.org/jira/browse/MAPREDUCE-121 ?
>
>I do see that mapred.job.classpath.files is colon-delimited (with ':')
>but Oozie puts fully qualified HDFS paths in there (which contain colons,
>as in hdfs://localhost:9000/path). I can confirm that only the paths that
>don't have hdfs://localhost:9000/... at the beginning are correctly seen
>in the launcher's classpath.
>
>I'll try to change JavaActionExecutor to make sure things added to the
>distributed cache are not added through fully qualified URLs and see if
>that works.
>
>/Max
>--
>Maxime Petazzoni
>Sr. Platform Engineer
>m 408.310.0595
>www.turn.com
>
>________________________________________
>From: Maxime Petazzoni [[email protected]]
>Sent: Tuesday, August 06, 2013 2:31 PM
>To: [email protected]
>Subject: RE: Classpath (and extra JARs) for Java actions
>
>The Hadoop job config correctly shows all my JARs both in
>mapred.job.classpath.files and mapred.cache.files, which indicates that
>Oozie knows they are there and did something with it. Yet the classpath
>listed by the launcher doesn't show these JARs and my job fails.
>
>It seems cleaner to me to use the shared libs, and I have them deployed
>on HDFS and I see them in the listed classpath (apparently through
>distcache, if I read the path correctly). But I don't see any of the
>application's lib JAR files on the classpath.
>
>Any idea what's up?
>/Max
>--
>Maxime Petazzoni
>Sr. Platform Engineer
>m 408.310.0595
>www.turn.com
>
>________________________________________
>From: Virag Kothari [[email protected]]
>Sent: Tuesday, August 06, 2013 2:20 PM
>To: [email protected]
>Subject: Re: Classpath (and extra JARs) for Java actions
>
>Hi Max,
>
>Jars in application lib should be available for all actions. The
>documentation might be incorrect.
>Did you check your hadoop job config to see which jars are added to
>classpath?
>
>Following is the precedence order: 1) Application lib 2) oozie.libpath 3)
>oozie.use.system.libpath.
>Even though the priority is defined, it is recommended to use only one of
>the ways at a time.
>Also the switch to no launcher jar is not mandatory and governed by
>'oozie.action.ship.launcher.jar' which is
>set to true for 4.x. So you are not forced (although recommended) to use
>shared lib.
>
>Thanks,
>Virag
>
>
>
>On 8/6/13 2:01 PM, "Maxime Petazzoni" <[email protected]> wrote:
>
>>Hi all,
>>
>>My Java action needs some extra JAR files.
>>If I understand the
>>documentation (and my testing) correctly, the JARs I placed in the lib/
>>folder in my application directory on HDFS are only added to the
>>classpath of MapReduce and Pig actions (why not all??).
>>
>>In the past I used oozie.libpath and that worked pretty well, but now in
>>Oozie 4.x with the switch to no launcher jar and the need for the
>>sharelibs to be on HDFS, I set oozie.use.system.libpath, which
>>apparently doesn't play well when oozie.libpath is also set
>>(JARs from oozie.libpath seem to be ignored, but interestingly not
>>other file types like text/config files?).
>>
>>What's the recommended way of having extra JARs for Java actions with
>>Oozie? What combination of oozie.libpath, oozie.use.system.libpath
>>should I use?
>>
>>Any help greatly appreciated!
>>
>>Thanks in advance,
>>/Max
>>--
>>Maxime Petazzoni
>>Sr. Platform Engineer
>>m 408.310.0595
>>www.turn.com
>
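
For anyone reading this thread later, the colon problem described above can be reproduced without Hadoop. The following is only a minimal sketch, not Hadoop or Oozie code: the class name ClasspathColonSketch, the helper splitClasspath() and the /user/max/app paths are hypothetical, and the split on ':' merely stands in for whatever parsing the affected Hadoop version performs on mapred.job.classpath.files. The stripSchemeAndAuthority() helper mirrors the addToCache() workaround quoted in the thread.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.List;

public class ClasspathColonSketch {

    // Hypothetical stand-in for a colon-delimited classpath parser.
    // This is an assumption for illustration, not Hadoop's actual code.
    static List<String> splitClasspath(String value) {
        return Arrays.asList(value.split(":"));
    }

    // Mirrors the workaround from the thread: resolve a relative path
    // against the application path, then drop scheme and authority.
    static URI stripSchemeAndAuthority(URI uri, URI baseUri)
            throws URISyntaxException {
        String resolvedPath = uri.getPath();
        if (!resolvedPath.startsWith("/")) {
            resolvedPath = baseUri.getPath() + "/" + resolvedPath;
        }
        return new URI(null, null, resolvedPath,
                       uri.getQuery(), uri.getFragment());
    }

    public static void main(String[] args) throws URISyntaxException {
        // Hypothetical application path on the HDFS host from the thread.
        URI baseUri = new URI("hdfs://localhost:9000/user/max/app");
        URI qualified = new URI("hdfs://localhost:9000/user/max/app/lib/a.jar");
        URI relative = new URI("lib/b.jar");

        // Fully qualified entries: the colons in the URIs collide with
        // the ':' delimiter, so the two entries split into fragments.
        String broken = qualified + ":" + baseUri + "/" + relative;
        System.out.println(splitClasspath(broken));

        // Scheme-less entries: each JAR survives as one classpath element.
        String fixed = stripSchemeAndAuthority(qualified, baseUri)
                + ":" + stripSchemeAndAuthority(relative, baseUri);
        System.out.println(splitClasspath(fixed));
    }
}
```

As noted in the thread, dropping the scheme and authority means every cached file has to live on the same HDFS filesystem as the application.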
