Ok, I guess I'm just not used to these sorts of situations where the
dependencies get so hairy.

1) For a simple UDF, which dependencies need to be included?
2) Is there a semi-easy way to clean this up?

Thanks for your patience. I really am new to the whole dependencies game.

2010/12/29 Dmitriy Ryaboy <[email protected]>

> All the dependencies have to be on the classpath, including the
> dependencies' dependencies...
>
> D
>
>
> On Wed, Dec 29, 2010 at 3:12 PM, Jonathan Coveney <[email protected]> wrote:
>
> > Also, just in general, does EVERY UDF we want to load have to be added to
> > the classpath when you call pig? And just the .jar/.py file, or more than
> > that?
> >
> > 2010/12/29 Jonathan Coveney <[email protected]>
> >
> > > Haha gotcha, I am not the greatest at all this package management. I think
> > > we are getting close though... I added jython.jar, as well as my test.py
> > > file, and here is what I got when I ran it
> > >
> > > *sys-package-mgr*: processing new jar, '/home/jcoveney/pig-0.8.0/pig.jar'
> > > *sys-package-mgr*: processing new jar, '/home/jcoveney/udfs/test.jar'
> > > 2010-12-29 17:56:47,118 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > > ERROR 1070: Could not resolve compress.compressuid using imports: [,
> > > org.apache.pig.builtin., org.apache.pig.impl.builtin.]
> > >
> > > (I got the java thing because I ported my UDF to java to see if it would be
> > > any easier...)
> > >
> > > Here is the command I used to run it
> > > java -cp $PIGPATH/pig.jar:$PIG_CLASSPATH org.apache.pig.Main -x local udftest.pig
> > >
> > > $PIG_CLASSPATH =
> > >
> > > /home/jcoveney/usefulpig/conf:/home/jcoveney/jython/jython.jar:/home/jcoveney/udfs/test.jar:/home/jcoveney/udfs/test.py
> > >
> > > Now, whether I use the python version or the java version, I get an error
> > > (well, the first one only applies to the python)
> > >
> > > init: Bootstrapping class not in Py.BOOTSTRAP_TYPES[class=class
> > > org.python.core.PyStringMap]
> > > 2010-12-29 18:03:02,967 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > > ERROR 1070: Could not resolve test.test using imports: [,
> > > org.apache.pig.builtin., org.apache.pig.impl.builtin.]
> > >
> > > Any ideas? I followed the UDF manual, but perhaps my naming or something is
> > > off? I have no idea. Would love any help you can throw at me...
> > >
> > > 2010/12/29 [email protected] <[email protected]>
> > >
> > >> I think you took Dmitriy a bit too literally ;)
> > >>
> > >> You need to put the actual filenames of the jars into PIG_CLASSPATH.
> > >> If /home/jcoveney/usefulpig/conf:/home/jcoveney/jython
> > >> is the directory that contains jython.jar (used purely as an example, I'm
> > >> not certain what the actual jar name is) then your PIG_CLASSPATH should
> > >> echo to:
> > >>
> > >> /home/jcoveney/jython/jython.jar
> > >>
> > >> plus whatever other jars you want to include.
> > >>
> > >> 2010/12/29 Jonathan Coveney <[email protected]>
> > >>
> > >> > Wait, ignore that error, that was the wrong one.
> > >> >
> > >> > This is it:
> > >> >
> > >> > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error.
> > >> > org/python/util/PythonInterpreter
> > >> >
> > >> > (I had set the classpath incorrectly, to *.* not ***)
> > >> >
> > >> > 2010/12/29 Jonathan Coveney <[email protected]>
> > >> >
> > >> > > echo $PIG_CLASSPATH
> > >> > > /home/jcoveney/usefulpig/conf:/home/jcoveney/jython/***
> > >> > >
> > >> > > same error
> > >> > >
> > >> > > 2010-12-29 16:59:29,862 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > >> > > ERROR 2998: Unhandled internal error. Could not initialize class
> > >> > > org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter
> > >> > >
> > >> > > :S
> > >> > >
> > >> > > I really love that UDFs can be written in python...thanks for helping me
> > >> > > try to get there.
> > >> > >
> > >> > > 2010/12/29 Dmitriy Ryaboy <[email protected]>
> > >> > >
> > >> > >> You need to set the classpath to include the literal jar strings, not just
> > >> > >> the directory that contains them.
> > >> > >> Try, /home/jcoveney/usefulpig/conf:/home/jcoveney/jython/***
> > >> > >>
> > >> > >> D
> > >> > >>
> > >> > >> On Wed, Dec 29, 2010 at 11:32 AM, Jonathan Coveney <[email protected]> wrote:
> > >> > >>
> > >> > >> > Ok, strangely enough, it won't run locally either... it sees the file, but
> > >> > >> > it's giving me an interpreter not found error, so it must be something
> > >> > >> > else.
> > >> > >> >
> > >> > >> > PIG_CLASSPATH is equal
> > >> > >> > to /home/jcoveney/usefulpig/conf:/home/jcoveney/jython
> > >> > >> > and here is my test script
> > >> > >> >
> > >> > >> > register '/home/jcoveney/udfs/pytest.py' using jython as comp;
> > >> > >> >
> > >> > >> > the_in = LOAD 'input.txt' AS (thing:chararray);
> > >> > >> > the_out = FOREACH the_in GENERATE comp.computation(thing);
> > >> > >> > DUMP the_out;
> > >> > >> >
> > >> > >> > but I don't think it's getting that far... it's still giving me the same error.
> > >> > >> > I'm just running it "pig -x local script.pig"
> > >> > >> >
> > >> > >> >
> > >> > >> > 2010/12/29 Jonathan Coveney <[email protected]>
> > >> > >> >
> > >> > >> > > Ah, that might be it... my computer has it and I have it on my path,
> > >> > >> > > however, I do not know if the cluster has it... definitely something to
> > >> > >> > > look into. Thanks.
> > >> > >> > >
> > >> > >> > >
> > >> > >> > > 2010/12/29 [email protected] <[email protected]>
> > >> > >> > >
> > >> > >> > >> Try adding the full path to the jar via PIG_CLASSPATH like so:
> > >> > >> > >>
> > >> > >> > >> export PIG_CLASSPATH=/path/to/jython.jar
> > >> > >> > >>
> > >> > >> > >> then run pig. Also, I assume you're doing your testing on a local machine?
> > >> > >> > >> If it's on a cluster, you need to make sure jython is on all the worker
> > >> > >> > >> nodes and the classpath is set up properly on all of them as well.
> > >> > >> > >>
> > >> > >> > >> On Wed, Dec 29, 2010 at 9:57 AM, Jonathan Coveney <[email protected]> wrote:
> > >> > >> > >>
> > >> > >> > >> > I do have Jython installed and on PATH, but maybe I didn't include it in
> > >> > >> > >> > the right way? Where does it need to be?
> > >> > >> > >> >
> > >> > >> > >> > 2010/12/29 [email protected] <[email protected]>
> > >> > >> > >> >
> > >> > >> > >> > > Do you have Jython on your classpath? Currently Jython isn't distributed
> > >> > >> > >> > > in the 0.8.0 release tarball.
> > >> > >> > >> > >
> > >> > >> > >> > > On Mon, Dec 27, 2010 at 7:18 PM, Jonathan Coveney <[email protected]> wrote:
> > >> > >> > >> > >
> > >> > >> > >> > > > Oh and just to be sure, I have tried
> > >> > >> > >> > > > @outputSchema("word:chararray")
> > >> > >> > >> > > > @outputSchema("x:{t:(word:chararray)}")
> > >> > >> > >> > > > as well (the former of which seems to be the "right" one, whenever I can
> > >> > >> > >> > > > figure out what is wrong)
> > >> > >> > >> > > >
> > >> > >> > >> > > > I've tested my code separately in python and it is fine...
> > >> > >> > >> > > >
> > >> > >> > >> > > > 2010/12/28 Jonathan Coveney <[email protected]>
> > >> > >> > >> > > >
> > >> > >> > >> > > > > Aniket, I appreciate you taking a look at this. In general, I found the
> > >> > >> > >> > > > > documentation around outputSchema pretty confusing... for example, in
> > >> > >> > >> > > > > this example
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > @outputSchema("x:{t:(word:chararray)}")
> > >> > >> > >> > > > > def helloworld():
> > >> > >> > >> > > > >     return ('Hello, World')
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > Then, in the sample script below that, you have
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > @outputSchema("t:(numformat:chararray)")
> > >> > >> > >> > > > > def commaFormat(num):
> > >> > >> > >> > > > >     return '{:,}'.format(num)
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > In this case, you have lost the x:{} (which makes more sense to me).
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > Perhaps this is because the latter function is meant to operate on an
> > >> > >> > >> > > > > input and return a type (t), whereas the hello world function should be
> > >> > >> > >> > > > > able to stand alone, and thus, has to return a bag? Not sure...
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > Besides that, though, I changed my code per your suggestion and tried
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > @outputSchema("t:(word:chararray)")
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > and still got the error.
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > As a note, do I need to import anything in the python script for
> > >> > >> > >> > > > > outputSchema to work, or should it be fine since pig is grabbing it?
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > Once again, I really appreciate your help in the matter. I feel having
> > >> > >> > >> > > > > people who weren't intimately related to the project have a go at it is
> > >> > >> > >> > > > > how you make it ultimately more usable and useful...but you have to
> > >> > >> > >> > > > > answer some annoying questions on the way :P
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > Thanks again.
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > 2010/12/28 Aniket Mokashi <[email protected]>
> > >> > >> > >> > > > >
> > >> > >> > >> > > > >> I think the decorator used here is incorrect.
> > >> > >> > >> > > > >> In general, "output:chararray" needs to be schema-string-compatible.
> > >> > >> > >> > > > >> Also, you are using "outputSchemaFunction", which is used in case you
> > >> > >> > >> > > > >> want to write a udf that has output schema dependent on input schema
> > >> > >> > >> > > > >> (e.g., square) and this should have a function with decorator
> > >> > >> > >> > > > >> "schemaFunction" (named "output" in your case). I think using the
> > >> > >> > >> > > > >> "outputSchema" decorator would fix the problem here.
> > >> > >> > >> > > > >>
> > >> > >> > >> > > > >> More details can be found at
> > >> > >> > >> > > > >> http://wiki.apache.org/pig/UDFsUsingScriptingLanguages
> > >> > >> > >> > > > >>
> > >> > >> > >> > > > >> Thanks,
> > >> > >> > >> > > > >> Aniket
> > >> > >> > >> > > > >>
> > >> > >> > >> > > > >> On Mon, December 27, 2010 4:30 pm, Jonathan Coveney wrote:
> > >> > >> > >> > > > >> > so I have module.py, and I want to be able to use it in a pig script.
> > >> > >> > >> > > > >> > It has no special imports or anything. I do have
> > >> > >> > >> > > > >> > @outputSchemaFunction("output:chararray")
> > >> > >> > >> > > > >> >
> > >> > >> > >> > > > >> > In my pig script, I have this
> > >> > >> > >> > > > >> >
> > >> > >> > >> > > > >> > register '/my/udf/location/udf.py' using jython as myfunc;
> > >> > >> > >> > > > >> >
> > >> > >> > >> > > > >> > is there any reason why this wouldn't work? here is the error I get:
> > >> > >> > >> > > > >> >
> > >> > >> > >> > > > >> > 2010-12-27 16:29:41,288 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > >> > >> > >> > > > >> > ERROR 2998: Unhandled internal error. org/python/util/PythonInterpreter
> > >> > >> > >> > > > >> >
> > >> > >> > >> > > > >> > Not the most instructive error, but is there anything more I need to be
> > >> > >> > >> > > > >> > doing to be able to use a python UDF?
> > >> > >> > >> > > > >> >
> > >> > >> > >> > > > >> > As an aside, are simple python UDFs as efficient as Java ones? I like
> > >> > >> > >> > > > >> > Python a lot and love the idea of being able to UDF in it, but can use
> > >> > >> > >> > > > >> > java if necessary.
> > >> > >> > >> > > > >> >
> > >> > >> > >> > >
> > >> > >> > >> > > --
> > >> > >> > >> > > http://about.me/soren/bio
> > >> > >> > >> >
> > >> > >> > >>
> > >> > >> > >> --
> > >> > >> > >> http://about.me/soren/bio
> > >> > >> > >
> > >>
> > >> --
> > >> http://about.me/soren/bio
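The classpath advice in this thread (list the jar files themselves, not just the directory that holds them) could be automated with a small helper. This is only a sketch under stated assumptions: build_pig_classpath is a hypothetical name, not part of Pig, and it assumes that a directory containing no jars (such as a conf directory) should stay on the path unchanged. The example paths are the ones quoted in the thread.

```python
import glob
import os


def build_pig_classpath(entries):
    """Expand each directory into the jar files it contains.

    Hypothetical helper, not part of Pig. Plain files, and directories
    that hold no jars (for example a conf directory), are kept as given.
    """
    parts = []
    for entry in entries:
        # Only directories are expanded; anything else is passed through.
        jars = sorted(glob.glob(os.path.join(entry, "*.jar"))) if os.path.isdir(entry) else []
        parts.extend(jars or [entry])
    return os.pathsep.join(parts)


if __name__ == "__main__":
    # Paths taken from the thread; adjust them to the local layout.
    print(build_pig_classpath([
        "/home/jcoveney/usefulpig/conf",
        "/home/jcoveney/jython",
        "/home/jcoveney/udfs/test.jar",
    ]))
```

The printed string could then be exported as PIG_CLASSPATH before running pig. Note that this only collects the jars it finds; it does not discover the "dependencies' dependencies" Dmitriy mentions, which still have to be placed in one of the listed directories.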

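On the question raised in the thread about whether the Python script needs an import for outputSchema: as far as I understand, Pig's Jython engine makes the decorator available when the script is registered, so no import is needed there, but treat that as an assumption rather than a confirmed detail. The sketch below adds a stand-in decorator (my own fallback, not Pig's code) so that the same file can also be imported and tested with a plain Python interpreter, which is how the UDF logic was checked separately in the thread.

```python
# Assumption: under Pig's Jython engine the name outputSchema already exists
# when this script is registered. The fallback below is a stand-in that is
# only used when the file is run outside Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        """Stand-in decorator: record the schema string on the function."""
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("word:chararray")
def helloworld():
    # Returns a single chararray, matching the schema string above.
    return 'Hello, World'
```

With this in place, the function can be called directly in a Python shell to check its result before registering the script with "register ... using jython as ...".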