Take a look at the pig-withouthadoop target in the build.xml from your Pig release. Usage of the target is documented here (albeit for a different goal):
http://thedatachef.blogspot.com/2011/01/apache-pig-08-with-cloudera-cdh3.html

Essentially, the target lets you build Pig without Hadoop's JARs, but you then become responsible for supplying those JARs back at runtime (via bin/pig modifications or some other method).

Norbert

On Wed, Oct 12, 2011 at 1:46 AM, Babak Farhang <[email protected]> wrote:
> Thanks for chiming in, Dmitriy.
>
> Yes, I think I have verified that the version that ends up getting
> called in my udf is in fact the one bundled with Hadoop, not the one I
> bundled with my udf jar. As I understand it, this is a common problem
> that any "container" app such as Hadoop (or, say, Tomcat) that loads
> "external" user-defined classes must deal with. Usually, achieving the
> desired behavior involves setting up rules for which classes are looked
> up first in the parent loader (and, if not found there, then in the
> child loader) and which classes are looked up in the reverse order
> (child loader first, then parent loader). The rules that govern this
> behavior are called the "delegation model" for the [child] classloader.
> I'm new to this Hadoop/Pig environment and would like to learn what
> these rules are.
>
> Hope I've made my question clearer :-)
>
> Regards
> -Babak
>
> On Tue, Oct 11, 2011 at 8:04 PM, Dmitriy Ryaboy <[email protected]> wrote:
> > I think the problem isn't the classloader -- it's the fact that both
> > jackson and joda are already bundled into the pig jar (presumably,
> > different versions of both libraries than the ones you are using).
> > You need to either repackage pig to not bundle those libraries, or
> > make your code work with Pig's versions of joda and jackson.
> >
> > D
> >
> > On Tue, Oct 11, 2011 at 7:39 PM, Babak Farhang <[email protected]> wrote:
> >
> >> Reading my original post over again, I see that I should have been
> >> clearer. I *am* including a copy of the versions of the jackson and
> >> joda libs that I need in my udf jar file.
> >> These libs are included in "exploded" form (i.e. not as embedded
> >> jars within the udf jar file, but in unzipped form alongside my own
> >> class files). However, they don't seem to get picked up by the
> >> Hadoop/Pig classloader. Am I doing this all wrong?
> >>
> >> On Tue, Oct 11, 2011 at 4:39 PM, Babak Farhang <[email protected]> wrote:
> >> > Greetings everyone,
> >> >
> >> > My pig script contains a call to my custom udf and I seem to be
> >> > running into a couple of classloader issues when running it. Below
> >> > are the specifics (the call stack), but I have some beginner
> >> > general questions regarding classloaders in pig:
> >> >
> >> > 1. Is there a way to configure the classloader used to load the
> >> > udf class and its deps? (I see, for example, a setClassloader
> >> > method in the impl PigContext class which is not directly exposed
> >> > to the user.)
> >> >
> >> > 2. What delegation model does pig's udf classloader use when
> >> > resolving classes (e.g. parent classloader first, then child --
> >> > or, more likely, something a bit more complicated)?
> >> >
> >> > Any info/ideas you can share would be much appreciated.
> >> > Thx!
> >> >
> >> > -Babak
> >> >
> >> > ===============================
> >> >
> >> > Now about the context/specifics of my error:
> >> >
> >> > My UDF uses the joda time and the jackson json libs (versions 2.0
> >> > and 1.8.6, respectively), which I package along with my UDF in the
> >> > jar that ends up being registered in my pig script.
> >> > Here are the call stacks:
> >> >
> >> > Joda-related:
> >> >
> >> > Exception in thread "main" java.io.IOException: Resource not found:
> >> > "org/joda/time/tz/data/ZoneInfoMap" ClassLoader:
> >> > sun.misc.Launcher$AppClassLoader@4aad3ba4
> >> >     at org.joda.time.tz.ZoneInfoProvider.openResource(ZoneInfoProvider.java:211)
> >> >     at org.joda.time.tz.ZoneInfoProvider.<init>(ZoneInfoProvider.java:123)
> >> >     at org.joda.time.tz.ZoneInfoProvider.<init>(ZoneInfoProvider.java:82)
> >> >     at org.joda.time.DateTimeZone.getDefaultProvider(DateTimeZone.java:462)
> >> >     at org.joda.time.DateTimeZone.setProvider0(DateTimeZone.java:416)
> >> >     at org.joda.time.DateTimeZone.<clinit>(DateTimeZone.java:115)
> >> >     at org.joda.time.chrono.GregorianChronology.<clinit>(GregorianChronology.java:71)
> >> >     at org.joda.time.chrono.ISOChronology.<clinit>(ISOChronology.java:66)
> >> >     at org.joda.time.base.BaseDateTime.<init>(BaseDateTime.java:97)
> >> >     at org.joda.time.DateTime.<init>(DateTime.java:193)
> >> >     at com.qf.util.time.GregorianDate.<init>(GregorianDate.java:46)
> >> >     at com.qf.timeseries.TimeSeriesBinner.initBinDate(TimeSeriesBinner.java:482)
> >> >     at com.qf.timeseries.TimeSeriesBinner.toBinned(TimeSeriesBinner.java:368)
> >> >     at com.qf.timeseries.TimeSeriesBinner.toBinned(TimeSeriesBinner.java:267)
> >> >     at com.qf.timeseries.TimeSeriesBinner.toBinned(TimeSeriesBinner.java:186)
> >> >     at com.qf.timeseries.TimeSeriesBinner.bin(TimeSeriesBinner.java:148)
> >> >     at com.qf.timeseries.BinnedTimeSeries.newInstance(BinnedTimeSeries.java:69)
> >> >     at com.qf.timeseries.BinnedTimeSeries.newInstance(BinnedTimeSeries.java:46)
> >> >     at com.qf.pig.udf.BinnedTargetEntityDelayCorrelationMatrices.exec(BinnedTargetEntityDelayCorrelationMatrices.java:171)
> >> >     at com.qf.pig.udf.BinnedTargetEntityDelayCorrelationMatrices.exec(BinnedTargetEntityDelayCorrelationMatrices.java:29)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:273)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:343)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
> >> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:433)
> >> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:401)
> >> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:381)
> >> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:251)
> >> >     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
> >> >     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:571)
> >> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
> >> >     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> >> >     at java.security.AccessController.doPrivileged(Native Method)
> >> >     at javax.security.auth.Subject.doAs(Subject.java:396)
> >> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >> >     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> >> >
> >> > Jackson-related:
> >> >
> >> > 2011-10-11 15:37:30,435 FATAL org.apache.hadoop.mapred.Child: Error
> >> > running child : java.lang.NoSuchFieldError: WRITE_NULL_MAP_VALUES <--
> >> > (Note: there's a way in Jackson to config this to be more lenient,
> >> > but I don't think I should be mucking w/ pig/hadoop's jackson lib)
> >> >     at com.pico.result.JSONFactory.getObjectMapper(JSONFactory.java:20)
> >> >     at com.qf.timeseries.TargetTimeSeriesInput.newInstance(TargetTimeSeriesInput.java:48)
> >> >     at com.qf.pig.udf.BtedcmClasspathImpl.getTargetSeries(BtedcmClasspathImpl.java:163)
> >> >     at com.qf.pig.udf.BinnedTargetEntityDelayCorrelationMatrices.exec(BinnedTargetEntityDelayCorrelationMatrices.java:210)
> >> >     at com.qf.pig.udf.BinnedTargetEntityDelayCorrelationMatrices.exec(BinnedTargetEntityDelayCorrelationMatrices.java:29)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:273)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:343)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
> >> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:433)
> >> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:401)
> >> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:381)
> >> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:251)
> >> >     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
> >> >     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:571)
> >> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
> >> >     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> >> >     at java.security.AccessController.doPrivileged(Native Method)
> >> >     at javax.security.auth.Subject.doAs(Subject.java:396)
> >> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >> >     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> >> >
> >>
>
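[Editor's note] To illustrate the "delegation model" Babak asks about: a child-first (parent-last) loader can be sketched as below. This is NOT Pig's actual classloader, and every name in it is illustrative; it just demonstrates the lookup order under discussion, i.e. trying the child's own copy of a class before falling back to the parent.

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

// Minimal child-first ("parent-last") classloader sketch. JDK core
// classes are still delegated to the parent chain, as the JVM requires.
class ChildFirstClassLoader extends ClassLoader {
    ChildFirstClassLoader(ClassLoader parent) { super(parent); }

    @Override
    protected synchronized Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        if (name.startsWith("java.") || name.startsWith("javax.")) {
            return super.loadClass(name, resolve);  // core classes: parent only
        }
        Class<?> c = findLoadedClass(name);
        if (c == null) {
            try {
                c = defineFromClasspath(name);       // child first...
            } catch (ClassNotFoundException e) {
                c = super.loadClass(name, resolve);  // ...then parent
            }
        }
        if (resolve) resolveClass(c);
        return c;
    }

    // Re-read the class bytes from the classpath and define our OWN copy,
    // instead of delegating up. This is what makes the loader child-first.
    private Class<?> defineFromClasspath(String name) throws ClassNotFoundException {
        String path = name.replace('.', '/') + ".class";
        try (InputStream in = getParent().getResourceAsStream(path)) {
            if (in == null) throw new ClassNotFoundException(name);
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) buf.write(chunk, 0, n);
            byte[] bytes = buf.toByteArray();
            return defineClass(name, bytes, 0, bytes.length);
        } catch (java.io.IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}

public class Demo {
    // A stand-in for a user-supplied class that also exists on the
    // parent's classpath (like the joda/jackson classes in the thread).
    public static class Payload {}

    public static void main(String[] args) throws Exception {
        ClassLoader parent = Demo.class.getClassLoader();
        ChildFirstClassLoader child = new ChildFirstClassLoader(parent);

        Class<?> viaParent = parent.loadClass("Demo$Payload");
        Class<?> viaChild  = child.loadClass("Demo$Payload");

        // Same class name, but the child defined its own copy rather than
        // delegating up -- so the two Class objects are distinct.
        System.out.println(viaParent == viaChild);              // false
        System.out.println(viaChild.getClassLoader() == child); // true
    }
}
```

In a parent-first loader (the default `ClassLoader` behavior), both lookups would return the same `Class` object; the divergence above is exactly the symptom that decides which copy of joda/jackson a UDF sees.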
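[Editor's note] Norbert's suggestion at the top of the thread might play out roughly as follows. The jar names and paths are illustrative only (not from the thread), and how bin/pig consumes PIG_CLASSPATH can vary across releases, so treat this as a sketch:

```shell
# 1. Build a Pig jar that does not bundle Hadoop's JARs, using the
#    target named in the thread:
#
#        ant pig-withouthadoop
#
# 2. At runtime, supply the Hadoop jars back yourself, placing your
#    UDF's versions of joda/jackson ahead of them on the classpath
#    (hypothetical locations):
UDF_DEPS="$HOME/lib/joda-time-2.0.jar:$HOME/lib/jackson-mapper-asl-1.8.6.jar"
HADOOP_JARS="$HADOOP_HOME/hadoop-core.jar"
export PIG_CLASSPATH="$UDF_DEPS:$HADOOP_JARS"
echo "$PIG_CLASSPATH"
```

The point of the ordering is that with Hadoop's bundled copies no longer baked into the Pig jar, the first joda/jackson classes found on the classpath are the UDF's own.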
