Hi Max,

I am glad that it worked for you. I see that MAPREDUCE-121 is very old, and
if it has worked since 0.20.203, then we should probably not add this to Oozie now.

Thanks,
Virag

On 8/6/13 6:21 PM, "Maxime Petazzoni" <[email protected]> wrote:

>Ok, so I think I got it to work. I did have to make changes to
>JavaActionExecutor though, because of this Hadoop bug. The same bug might
>not be present in the most recent version of Hadoop (even though
>MAPREDUCE-121 isn't marked as resolved) but it is definitely present in
>0.20.2-cdh3u4.
>
>As I said, the main problem comes from addToCache(), which adds fully
>qualified URIs to the distributed cache, and this breaks the classpath
>parsing later on when the job starts. So I changed addToCache() so that
>it only adds URIs without a scheme and authority/host. Obviously, this
>imposes the limitation that everything must be on the same HDFS
>filesystem as the application.
>
>            uri = new URI(filePath);
>            URI baseUri = appPath.toUri();
>
>            /**
>             * Don't re-resolve cache URIs with the application base URI,
>             * otherwise the JARs added to the cache end up in the distributed
>             * cache and in the mapred.job.classpath.files property with colons
>             * in them, but colon is the classpath delimiter used by Hadoop, so
>             * the job ends up with the wrong classpath and can't run correctly.
>             *
>             * if (uri.getScheme() == null) {
>             *     String resolvedPath = uri.getPath();
>             *     if (!resolvedPath.startsWith("/")) {
>             *         resolvedPath = baseUri.getPath() + "/" + resolvedPath;
>             *     }
>             *     uri = new URI(baseUri.getScheme(), baseUri.getAuthority(),
>             *             resolvedPath, uri.getQuery(), uri.getFragment());
>             * }
>             *
>             * Instead, simply resolve a potential relative path and create a
>             * new URI without a scheme and host/authority.
>             */
>
>            String resolvedPath = uri.getPath();
>            if (!resolvedPath.startsWith("/")) {
>                resolvedPath = baseUri.getPath() + "/" + resolvedPath;
>            }
>            uri = new URI(null, null, resolvedPath, uri.getQuery(),
>                    uri.getFragment());
>
>            if (archive) { ...
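To try the rewriting logic outside Oozie, here is a minimal standalone sketch of the same steps using only java.net.URI. The class and method names (CachePathSketch, stripSchemeAndAuthority) and the example paths are hypothetical; this is not the actual JavaActionExecutor code, and it assumes every cache path is either absolute or relative to the application directory.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical standalone version of the cache-path rewriting above. */
public class CachePathSketch {

    // Resolve a possibly relative cache path against the application
    // directory, then drop scheme and authority so the result contains
    // no ':' (the delimiter used in mapred.job.classpath.files).
    static URI stripSchemeAndAuthority(String filePath, URI baseUri) {
        URI uri = URI.create(filePath);
        String resolvedPath = uri.getPath();
        if (!resolvedPath.startsWith("/")) {
            resolvedPath = baseUri.getPath() + "/" + resolvedPath;
        }
        try {
            // Five-arg constructor: scheme, authority, path, query, fragment.
            return new URI(null, null, resolvedPath, uri.getQuery(), uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot rebuild URI for " + filePath, e);
        }
    }

    public static void main(String[] args) {
        URI appPath = URI.create("hdfs://localhost:9000/user/max/app");
        // Relative path: resolved against the application directory.
        System.out.println(stripSchemeAndAuthority("lib/foo.jar", appPath));
        // Fully qualified path: scheme and host:port are dropped.
        System.out.println(stripSchemeAndAuthority(
                "hdfs://localhost:9000/share/lib/bar.jar", appPath));
    }
}
```

Both results are plain absolute paths (/user/max/app/lib/foo.jar and /share/lib/bar.jar), which is exactly why everything must then live on the default filesystem.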
>
>The other thing I had to do was make sure these JARs came before all
>the Hadoop distribution JARs (in my case I need to override Jackson
>with version 1.9.9, and Hadoop ships 1.5.2). By default, JARs from the
>user's classpath come after the Hadoop distribution's JARs, so that
>doesn't work. Thankfully, since 0.20.203 (and 0.20.2-cdh3u4 contains that
>backport) one can ask for the user's JARs to take precedence, so I added
>the following to the end of createLauncherConf():
>
>            // have user and Oozie sharelibs placed in the distributed cache
>            // take precedence over Hadoop libs
>            launcherJobConf.setUserClassesTakesPrecedence(true);
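Why the ordering matters can be illustrated with a simplified model: a class loader scanning the classpath uses the first JAR that provides a class, so whichever Jackson JAR comes first wins. This is a hypothetical sketch, not Hadoop's task classpath code; the boolean flag only stands in for what setUserClassesTakesPrecedence(true) toggles, and the JAR paths are made up.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of first-match-wins classpath ordering. */
public class ClasspathOrderSketch {

    // Build a classpath with user JARs either before or after the
    // Hadoop distribution JARs.
    static List<String> buildClasspath(List<String> hadoopJars, List<String> userJars,
                                       boolean userClassesFirst) {
        List<String> classpath = new ArrayList<>();
        if (userClassesFirst) {
            classpath.addAll(userJars);
            classpath.addAll(hadoopJars);
        } else {
            classpath.addAll(hadoopJars);
            classpath.addAll(userJars);
        }
        return classpath;
    }

    // Return the first entry whose file name starts with the prefix, i.e.
    // the JAR whose classes would be loaded; null if none matches.
    static String firstMatch(List<String> classpath, String prefix) {
        for (String entry : classpath) {
            String name = entry.substring(entry.lastIndexOf('/') + 1);
            if (name.startsWith(prefix)) {
                return entry;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> hadoop = List.of("/usr/lib/hadoop/lib/jackson-core-asl-1.5.2.jar");
        List<String> user = List.of("/app/lib/jackson-core-asl-1.9.9.jar");
        // Default ordering: the older Hadoop copy wins.
        System.out.println(firstMatch(buildClasspath(hadoop, user, false), "jackson-core-asl"));
        // User classes first: the 1.9.9 copy wins.
        System.out.println(firstMatch(buildClasspath(hadoop, user, true), "jackson-core-asl"));
    }
}
```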
>
>This allowed me to get the application's lib JARs and Oozie sharelibs to
>be correctly inserted into the job's classpath and take precedence over
>all the Hadoop JARs.
>
>I'm not sure which parts of this you guys are interested in integrating
>into Oozie, especially since this is all needed because of a bug in
>Hadoop. Let me know, and we can work on putting an actual patch together
>(or you can just use the snippets above, I don't care).
>
>Thanks for the help,
>/Max
>--
>Maxime Petazzoni
>Sr. Platform Engineer
>m 408.310.0595
>www.turn.com
>
>________________________________________
>From: Maxime Petazzoni [[email protected]]
>Sent: Tuesday, August 06, 2013 3:24 PM
>To: [email protected]
>Subject: RE: Classpath (and extra JARs) for Java actions
>
>Could this be related to
>https://issues.apache.org/jira/browse/MAPREDUCE-121 ?
>
>I do see that mapred.job.classpath.files is colon-delimited (with ':'),
>but Oozie puts fully qualified HDFS paths in there (which contain colons,
>as in hdfs://localhost:9000/path). I can confirm that only the paths that
>don't start with hdfs://localhost:9000/... are correctly seen in the
>launcher's classpath.
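A minimal illustration of the failure mode, assuming a consumer that simply splits the property value on ':' (this is not Hadoop's actual parsing code, and the paths are made up): each fully qualified URI is cut into several bogus entries, while scheme-less paths survive intact.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of joining and splitting a ':'-delimited list. */
public class ColonSplitSketch {

    // Join entries with ':' and split them back, as a naive consumer would.
    static List<String> roundTrip(List<String> entries) {
        String joined = String.join(":", entries);
        return Arrays.asList(joined.split(":"));
    }

    public static void main(String[] args) {
        List<String> qualified = List.of(
                "hdfs://localhost:9000/user/max/app/lib/a.jar",
                "hdfs://localhost:9000/user/max/app/lib/b.jar");
        List<String> pathsOnly = List.of(
                "/user/max/app/lib/a.jar",
                "/user/max/app/lib/b.jar");
        // Six fragments such as "hdfs" and "//localhost": unusable.
        System.out.println(roundTrip(qualified));
        // The two original paths come back unchanged.
        System.out.println(roundTrip(pathsOnly));
    }
}
```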
>
>I'll try to change JavaActionExecutor to make sure things added to the
>distributed cache are not added through fully qualified URLs and see if
>that works.
>
>/Max
>
>________________________________________
>From: Maxime Petazzoni [[email protected]]
>Sent: Tuesday, August 06, 2013 2:31 PM
>To: [email protected]
>Subject: RE: Classpath (and extra JARs) for Java actions
>
>The Hadoop job config correctly shows all my JARs in both
>mapred.job.classpath.files and mapred.cache.files, which indicates that
>Oozie knows they are there and did something with them. Yet the classpath
>listed by the launcher doesn't show these JARs, and my job fails.
>
>Using the sharelibs seems cleaner to me. I have them deployed on HDFS,
>and I see them in the listed classpath (apparently through the
>distcache, if I read the path correctly). But I don't see any of the
>application's lib JAR files on the classpath.
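One way to pin down which JARs go missing is to compare the configured list against the classpath the launcher actually sees. The helper below is hypothetical; it matches entries by file name only, on the assumption that localized distributed-cache paths differ from the HDFS paths in the job config.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical debugging helper: configured JARs absent from the runtime classpath. */
public class MissingJarsSketch {

    // Last path component, e.g. "a.jar" for "/user/max/app/lib/a.jar".
    static String fileName(String entry) {
        return entry.substring(entry.lastIndexOf('/') + 1);
    }

    // Return configured JARs whose file name does not appear on the runtime classpath.
    static List<String> missingJars(List<String> configuredJars, String runtimeClasspath,
                                    String separator) {
        Set<String> present = new HashSet<>();
        for (String entry : runtimeClasspath.split(Pattern.quote(separator))) {
            present.add(fileName(entry));
        }
        List<String> missing = new ArrayList<>();
        for (String jar : configuredJars) {
            if (!present.contains(fileName(jar))) {
                missing.add(jar);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> configured = List.of("/user/max/app/lib/a.jar", "/user/max/app/lib/b.jar");
        // In a real launcher one might pass System.getProperty("java.class.path")
        // and File.pathSeparator instead of this made-up sample.
        String runtime = "/tmp/distcache/a.jar:/usr/lib/hadoop/hadoop-core.jar";
        System.out.println("Missing: " + missingJars(configured, runtime, ":"));
    }
}
```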
>
>Any idea what's up?
>/Max
>
>________________________________________
>From: Virag Kothari [[email protected]]
>Sent: Tuesday, August 06, 2013 2:20 PM
>To: [email protected]
>Subject: Re: Classpath (and extra JARs) for Java actions
>
>Hi Max,
>
>JARs in the application lib/ directory should be available for all
>actions. The documentation might be incorrect.
>Did you check your Hadoop job config to see which JARs are added to the
>classpath?
>
>The precedence order is: 1) application lib, 2) oozie.libpath, 3)
>oozie.use.system.libpath.
>Even though the priority is defined, it is recommended to use only one of
>these at a time.
>Also, the switch to no launcher JAR is not mandatory; it is governed by
>'oozie.action.ship.launcher.jar', which is
>set to true for 4.x. So you are not forced (although it is recommended) to
>use the sharelib.
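To follow the one-at-a-time advice, a client could check its job properties before submitting. This is a hypothetical check, not something Oozie performs; the class name is made up, and it only looks at the two property names mentioned here, ignoring the application lib/ directory, which is not set through a property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical pre-submission check: how many library sources are configured? */
public class LibpathCheckSketch {

    // Collect the names of the library-path mechanisms enabled in the job properties.
    static List<String> libSources(Properties props) {
        List<String> sources = new ArrayList<>();
        String libpath = props.getProperty("oozie.libpath");
        if (libpath != null && !libpath.trim().isEmpty()) {
            sources.add("oozie.libpath");
        }
        if (Boolean.parseBoolean(props.getProperty("oozie.use.system.libpath", "false"))) {
            sources.add("oozie.use.system.libpath");
        }
        return sources;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("oozie.libpath", "/user/max/share/lib"); // made-up path
        props.setProperty("oozie.use.system.libpath", "true");
        List<String> sources = libSources(props);
        if (sources.size() > 1) {
            System.out.println("Warning: more than one library source configured: " + sources);
        }
    }
}
```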
>
>Thanks,
>Virag
>
>
>
>On 8/6/13 2:01 PM, "Maxime Petazzoni" <[email protected]> wrote:
>
>>Hi all,
>>
>>My Java action needs some extra JAR files. If I understand the
>>documentation (and my testing) correctly, the JARs I placed in the lib/
>>folder in my application directory on HDFS are only added to the
>>classpath of MapReduce and Pig actions (why not all??).
>>
>>In the past I used oozie.libpath and that worked pretty well, but now, in
>>Oozie 4.x, with the switch to no launcher JAR and the need for the
>>sharelibs to be on HDFS, I set oozie.use.system.libpath, which
>>apparently doesn't play well when oozie.libpath is also set
>>(JARs from oozie.libpath seem to be ignored, but interestingly, other
>>file types like text/config files are not?).
>>
>>What's the recommended way of having extra JARs for Java actions with
>>Oozie? What combination of oozie.libpath and oozie.use.system.libpath
>>should I use?
>>
>>Any help greatly appreciated!
>>
>>Thanks in advance,
>>/Max
>
