FWIW, I solved this by manually adding all necessary jars into the DistributedCache...ugly, but effective!
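In case it helps the next person, here is roughly what that looks like: a minimal sketch that assumes the extra jars have already been staged in an HDFS directory (the addLibJars name and the hdfsLibDir path are just illustrative, not my exact code).

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LibJarHelper {

  /**
   * Adds every jar found under an HDFS lib directory to the job classpath
   * via the DistributedCache, so the jobs launched for the DoFns can load
   * the classes they need.
   */
  public static void addLibJars(Configuration conf, String hdfsLibDir) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    for (FileStatus status : fs.listStatus(new Path(hdfsLibDir))) {
      if (status.getPath().getName().endsWith(".jar")) {
        // Deprecated in Hadoop 2 but still available on CDH 5.1; the newer
        // equivalent is Job#addFileToClassPath on the mapreduce Job API.
        DistributedCache.addFileToClassPath(status.getPath(), conf);
      }
    }
  }
}

The one thing to watch is that the same Configuration gets handed to the pipeline, e.g. new MRPipeline(MyTool.class, conf), so the classpath entries are picked up by the child jobs (MyTool here standing in for whatever your Tool implementation is called).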
On Wed, Nov 26, 2014 at 12:29 PM, Mike Barretta <[email protected]> wrote:

> Thank you for the quick reply.
>
> I am indeed using the Oozie workflow lib directory as described here:
> http://oozie.apache.org/docs/3.3.2/WorkflowFunctionalSpec.html#a7_Workflow_Application_Deployment
>
> The primary job, which implements Tool, is able to run; it's just the jobs
> launched by the doFn() that fail. Is there a step where I might need to
> tell the Crunch pipeline about the jars loaded by Oozie?
>
> On Fri, Nov 21, 2014 at 5:27 PM, Micah Whitacre <[email protected]> wrote:
>
>> Support for a lib folder inside a jar is not guaranteed to be present on
>> all versions of Hadoop. [1]
>>
>> We typically go with an "uber" jar, using the maven-shade-plugin to
>> explode the Crunch dependencies (and others) into the assembly jar.
>> Another approach, since you are using Oozie, is to include the jar in the
>> workflow lib directory; that should put it on the classpath. The last
>> approach is to use the DistributedCache yourself, which will distribute
>> the jar out to the cluster.
>>
>> [1] - http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
>>
>> On Fri, Nov 21, 2014 at 4:15 PM, Mike Barretta <[email protected]> wrote:
>>
>>> All,
>>>
>>> I'm running an MRPipeline from crunch-core 0.11.0-hadoop2 on a CDH 5.1
>>> cluster via Oozie. While the main job runs fine, the doFn() it calls fails
>>> with a ClassNotFoundException (CNFE). The jar containing my classes does
>>> indeed contain lib/crunch-core-0.11.0-hadoop2.jar.
>>>
>>> Does the Crunch jar need to be added to the Hadoop lib directory on every
>>> node? That seems like it should be unnecessary.
>>>
>>> Thanks,
>>> Mike
