I think you took Dmitriy a bit too literally ;) You need to put the actual filenames of the jars into PIG_CLASSPATH. If /home/jcoveney/jython is the directory that contains jython.jar (used purely as an example, I'm not certain what the actual jar name is), then your PIG_CLASSPATH should echo to:
/home/jcoveney/jython/jython.jar

plus whatever other jars you want to include.

2010/12/29 Jonathan Coveney <[email protected]>

> Wait, ignore that error, that was the wrong one.
>
> This is it:
>
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal
> error. org/python/util/PythonInterpreter
>
> (I had set the classpath incorrectly, to *.* not ***)
>
> 2010/12/29 Jonathan Coveney <[email protected]>
>
>> echo $PIG_CLASSPATH
>> /home/jcoveney/usefulpig/conf:/home/jcoveney/jython/***
>>
>> same error
>>
>> 2010-12-29 16:59:29,862 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> ERROR 2998: Unhandled internal error. Could not initialize class
>> org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter
>>
>> :S
>>
>> I really love that UDF's can be written in python...thanks for helping me
>> try to get there.
>>
>> 2010/12/29 Dmitriy Ryaboy <[email protected]>
>>
>>> You need to set the classpath to include the literal jar strings, not
>>> just the directory that contains them.
>>> Try, /home/jcoveney/usefulpig/conf:/home/jcoveney/jython/***
>>>
>>> D
>>>
>>> On Wed, Dec 29, 2010 at 11:32 AM, Jonathan Coveney <[email protected]> wrote:
>>>
>>>> Ok, strangely enough, it won't run locally either... it sees the file,
>>>> but it's giving me an interpreter not found error, so it must be
>>>> something else.
>>>>
>>>> PIG_CLASSPATH is equal
>>>> to /home/jcoveney/usefulpig/conf:/home/jcoveney/jython
>>>> and here is my test script
>>>>
>>>> register '/home/jcoveney/udfs/pytest.py' using jython as comp;
>>>>
>>>> the_in = LOAD 'input.txt' AS (thing:chararray);
>>>> the_out = FOREACH the_out GENERATE comp.computation(thing)
>>>> DUMP theout;
>>>>
>>>> but I don't think it's getting that far... it's still giving me the
>>>> same error. I'm just running it "pig -x local script.pig"
>>>>
>>>> 2010/12/29 Jonathan Coveney <[email protected]>
>>>>
>>>>> Ah, that might be it... my computer has it and I have it on my path,
>>>>> however, I do not know if the cluster has it... definitely something
>>>>> to look into. thanks.
>>>>>
>>>>> 2010/12/29 [email protected] <[email protected]>
>>>>>
>>>>>> try adding the full path to the jar via PIG_CLASSPATH like so:
>>>>>>
>>>>>> export PIG_CLASSPATH=/path/to/jython.jar
>>>>>>
>>>>>> then run pig. Also, I assume your doing your testing on a local
>>>>>> machine? if it's on a cluster, you need to make sure jython is on
>>>>>> all the worker nodes and classpath is setup properly on all of them
>>>>>> as well.
>>>>>>
>>>>>> On Wed, Dec 29, 2010 at 9:57 AM, Jonathan Coveney <[email protected]> wrote:
>>>>>>
>>>>>>> I do have Jython installed and on PATH, but maybe I didn't include
>>>>>>> it in the right way? Where does it need to be?
>>>>>>>
>>>>>>> 2010/12/29 [email protected] <[email protected]>
>>>>>>>
>>>>>>>> Do you have Jython on your classpath? Currently Jython isn't
>>>>>>>> distributed in the 0.8.0 release tarball.
>>>>>>>>
>>>>>>>> On Mon, Dec 27, 2010 at 7:18 PM, Jonathan Coveney <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Oh and just to be sure, I have tried
>>>>>>>>> @outputSchema("word:chararray")
>>>>>>>>> @outputSchema("x:{t:(word:chararray)}")
>>>>>>>>> as well (the former of which seems to be the "right" one,
>>>>>>>>> whenever I can figure out what is wrong)
>>>>>>>>>
>>>>>>>>> I've tested my code separately in python and it is fine...
>>>>>>>>>
>>>>>>>>> 2010/12/28 Jonathan Coveney <[email protected]>
>>>>>>>>>
>>>>>>>>>> Aniket, I appreciate you taking a look at this. In general, I
>>>>>>>>>> found the documentation around outputSchema pretty confusing...
>>>>>>>>>> for example, in this example
>>>>>>>>>>
>>>>>>>>>> @outputSchema("x:{t:(word:chararray)}")
>>>>>>>>>> def helloworld():
>>>>>>>>>>     return ('Hello, World')
>>>>>>>>>>
>>>>>>>>>> Then, in the sample script below that, you have
>>>>>>>>>>
>>>>>>>>>> @outputSchema("t:(numformat:chararray)")
>>>>>>>>>> def commaFormat(num):
>>>>>>>>>>     return '{:,}'.format(num)
>>>>>>>>>>
>>>>>>>>>> In this case, you have lost the x:{} (which makes more sense to
>>>>>>>>>> me.
>>>>>>>>>>
>>>>>>>>>> Perhaps this is because the latter function is meant to operate
>>>>>>>>>> on an input and return a type (t), whereas the hello world
>>>>>>>>>> function should be able to stand alone, and thus, has to return
>>>>>>>>>> a bag? Not sure...
>>>>>>>>>>
>>>>>>>>>> Besides that, though, I changed my code per your suggestion and
>>>>>>>>>> tried
>>>>>>>>>>
>>>>>>>>>> @outputSchema("t:(word:chararray)")
>>>>>>>>>>
>>>>>>>>>> and still got the error.
>>>>>>>>>>
>>>>>>>>>> As a note, do I need to import anything in the python script for
>>>>>>>>>> outputSchema to work, or should it be fine since pig is grabbing
>>>>>>>>>> it?
>>>>>>>>>>
>>>>>>>>>> Once again, I really appreciate your help in the matter. I feel
>>>>>>>>>> having people who weren't intimately related to the project have
>>>>>>>>>> a go at it is how you make it ultimately more usable and
>>>>>>>>>> useful...but you have to answer some annoying questions on the
>>>>>>>>>> way :P
>>>>>>>>>>
>>>>>>>>>> Thanks again.
>>>>>>>>>>
>>>>>>>>>> 2010/12/28 Aniket Mokashi <[email protected]>
>>>>>>>>>>
>>>>>>>>>>> I think decorator used here is incorrect.
>>>>>>>>>>> In general, "output:chararray" needs to be
>>>>>>>>>>> schema-string-compatible. Also, you are using
>>>>>>>>>>> "outputSchemaFunction", which is used in case you want to write
>>>>>>>>>>> a udf that has output schema dependent on input schema (eg
>>>>>>>>>>> -square) and this should have a function with decorator
>>>>>>>>>>> "schemaFunction" (named "output" in your case). I think using
>>>>>>>>>>> "outputSchema" decorator would fix the problem here.
>>>>>>>>>>>
>>>>>>>>>>> More details can be found at-
>>>>>>>>>>> http://wiki.apache.org/pig/UDFsUsingScriptingLanguages
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Aniket
>>>>>>>>>>>
>>>>>>>>>>> On Mon, December 27, 2010 4:30 pm, Jonathan Coveney wrote:
>>>>>>>>>>>> so I have module.py, and I want to be able to use it in a pig
>>>>>>>>>>>> script. It has no special imports or anything. I do have
>>>>>>>>>>>> @outputSchemaFunction("output:chararray)
>>>>>>>>>>>>
>>>>>>>>>>>> In my pig script, I have this
>>>>>>>>>>>>
>>>>>>>>>>>> register '/my/udf/location/udf.py' using jython as myfunc;
>>>>>>>>>>>>
>>>>>>>>>>>> is there any reason why this wouldn't work? here is the error
>>>>>>>>>>>> I get:
>>>>>>>>>>>>
>>>>>>>>>>>> 2010-12-27 16:29:41,288 [main] ERROR
>>>>>>>>>>>> org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled
>>>>>>>>>>>> internal error. org/python/util/PythonInterpreter
>>>>>>>>>>>>
>>>>>>>>>>>> Not the most instructive error, but is there anything more I
>>>>>>>>>>>> need to be doing to be able to use a python UDF?
>>>>>>>>>>>>
>>>>>>>>>>>> As an aside, are simply python UDF's as efficient as Java
>>>>>>>>>>>> ones? I like Python a lot and love the idea of being able to
>>>>>>>>>>>> UDF in it, but can use java if necessary.
>>>>>>>>
>>>>>>>> --
>>>>>>>> http://about.me/soren/bio
>>>>>>
>>>>>> --
>>>>>> http://about.me/soren/bio

--
http://about.me/soren/bio
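
For reference, the classpath advice at the top of this message (list the jar files themselves, not just the directory that holds them) can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper name build_pig_classpath is made up here, the directories are the example paths from this thread, and the actual Jython jar name may differ on your machine.

```python
import glob
import os


def build_pig_classpath(conf_dir, jar_dir):
    """Return a classpath string: conf_dir followed by every .jar file in jar_dir.

    Hypothetical helper for illustration; it is not part of Pig or Jython.
    """
    # List each jar explicitly, since a bare directory entry does not
    # pull in the jars that live inside it.
    jars = sorted(glob.glob(os.path.join(jar_dir, "*.jar")))
    # os.pathsep is ':' on Linux/macOS, matching the paths shown in this thread.
    return os.pathsep.join([conf_dir] + jars)


if __name__ == "__main__":
    # Example directories taken from this thread; adjust to your own layout.
    print(build_pig_classpath("/home/jcoveney/usefulpig/conf",
                              "/home/jcoveney/jython"))
```

If this were saved under some name of your choosing, its printed output could be assigned to PIG_CLASSPATH with export before running pig; the result should then contain the full path to the Jython jar rather than only its directory.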
