Ok, strangely enough, it won't run locally either... it sees the file, but it's giving me an interpreter not found error, so it must be something else.
PIG_CLASSPATH is equal to /home/jcoveney/usefulpig/conf:/home/jcoveney/jython
and here is my test script:

register '/home/jcoveney/udfs/pytest.py' using jython as comp;
the_in = LOAD 'input.txt' AS (thing:chararray);
the_out = FOREACH the_in GENERATE comp.computation(thing);
DUMP the_out;

but I don't think it's getting that far... it's still giving me the same error.

I'm just running it with "pig -x local script.pig"

2010/12/29 Jonathan Coveney <[email protected]>

> Ah, that might be it... my computer has it and I have it on my path,
> however, I do not know if the cluster has it... definitely something to
> look into. Thanks.
>
> 2010/12/29 [email protected] <[email protected]>
>
>> Try adding the full path to the jar via PIG_CLASSPATH like so:
>>
>> export PIG_CLASSPATH=/path/to/jython.jar
>>
>> then run pig. Also, I assume you're doing your testing on a local
>> machine? If it's on a cluster, you need to make sure jython is on all the
>> worker nodes and the classpath is set up properly on all of them as well.
>>
>> On Wed, Dec 29, 2010 at 9:57 AM, Jonathan Coveney <[email protected]> wrote:
>>
>> > I do have Jython installed and on PATH, but maybe I didn't include it
>> > in the right way? Where does it need to be?
>> >
>> > 2010/12/29 [email protected] <[email protected]>
>> >
>> > > Do you have Jython on your classpath? Currently Jython isn't
>> > > distributed in the 0.8.0 release tarball.
>> > >
>> > > On Mon, Dec 27, 2010 at 7:18 PM, Jonathan Coveney <[email protected]> wrote:
>> > >
>> > > > Oh and just to be sure, I have tried
>> > > > @outputSchema("word:chararray")
>> > > > @outputSchema("x:{t:(word:chararray)}")
>> > > > as well (the former of which seems to be the "right" one, whenever I
>> > > > can figure out what is wrong)
>> > > >
>> > > > I've tested my code separately in python and it is fine...
>> > > >
>> > > > 2010/12/28 Jonathan Coveney <[email protected]>
>> > > >
>> > > > > Aniket, I appreciate you taking a look at this. In general, I found
>> > > > > the documentation around outputSchema pretty confusing... for
>> > > > > example, in this example
>> > > > >
>> > > > > @outputSchema("x:{t:(word:chararray)}")
>> > > > > def helloworld():
>> > > > >     return ('Hello, World')
>> > > > >
>> > > > > Then, in the sample script below that, you have
>> > > > >
>> > > > > @outputSchema("t:(numformat:chararray)")
>> > > > > def commaFormat(num):
>> > > > >     return '{:,}'.format(num)
>> > > > >
>> > > > > In this case, you have lost the x:{} (which makes more sense to me).
>> > > > >
>> > > > > Perhaps this is because the latter function is meant to operate on an
>> > > > > input and return a type (t), whereas the hello world function should
>> > > > > be able to stand alone, and thus has to return a bag? Not sure...
>> > > > >
>> > > > > Besides that, though, I changed my code per your suggestion and tried
>> > > > >
>> > > > > @outputSchema("t:(word:chararray)")
>> > > > >
>> > > > > and still got the error.
>> > > > >
>> > > > > As a note, do I need to import anything in the python script for
>> > > > > outputSchema to work, or should it be fine since pig is grabbing it?
>> > > > >
>> > > > > Once again, I really appreciate your help in the matter. I feel having
>> > > > > people who weren't intimately related to the project have a go at it
>> > > > > is how you make it ultimately more usable and useful... but you have
>> > > > > to answer some annoying questions on the way :P
>> > > > >
>> > > > > Thanks again.
>> > > > >
>> > > > > 2010/12/28 Aniket Mokashi <[email protected]>
>> > > > >
>> > > > >> I think decorator used here is incorrect.
>> > > > >> In general, "output:chararray" needs to be schema-string-compatible.
>> > > > >> Also, you are using "outputSchemaFunction", which is used in case you
>> > > > >> want to write a udf that has output schema dependent on input schema
>> > > > >> (e.g. square) and this should have a function with decorator
>> > > > >> "schemaFunction" (named "output" in your case). I think using
>> > > > >> "outputSchema" decorator would fix the problem here.
>> > > > >>
>> > > > >> More details can be found at
>> > > > >> http://wiki.apache.org/pig/UDFsUsingScriptingLanguages
>> > > > >>
>> > > > >> Thanks,
>> > > > >> Aniket
>> > > > >>
>> > > > >> On Mon, December 27, 2010 4:30 pm, Jonathan Coveney wrote:
>> > > > >> > so I have module.py, and I want to be able to use it in a pig
>> > > > >> > script. It has no special imports or anything. I do have
>> > > > >> > @outputSchemaFunction("output:chararray)
>> > > > >> >
>> > > > >> > In my pig script, I have this
>> > > > >> >
>> > > > >> > register '/my/udf/location/udf.py' using jython as myfunc;
>> > > > >> >
>> > > > >> > is there any reason why this wouldn't work? here is the error I get:
>> > > > >> >
>> > > > >> > 2010-12-27 16:29:41,288 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> > > > >> > ERROR 2998: Unhandled internal error. org/python/util/PythonInterpreter
>> > > > >> >
>> > > > >> > Not the most instructive error, but is there anything more I need to
>> > > > >> > be doing to be able to use a python UDF?
>> > > > >> >
>> > > > >> > As an aside, are simply python UDF's as efficient as Java ones? I
>> > > > >> > like Python a lot and love the idea of being able to UDF in it, but
>> > > > >> > can use java if necessary.
>> > >
>> > > --
>> > > http://about.me/soren/bio
>>
>> --
>> http://about.me/soren/bio
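
On the question above about whether the Python file needs to import anything for outputSchema: a minimal sketch of a UDF file that can be exercised both ways. It assumes Pig supplies the outputSchema decorator when the file is registered "using jython", and only defines a stand-in when the name is missing, so the same file can also be run under plain Python for testing. The function name computation matches the test script; its body and the attribute the stand-in sets are placeholders for illustration, not Pig's actual implementation. This does not address the org/python/util/PythonInterpreter error, which looks like a classpath problem rather than a script problem.

```python
# Hypothetical UDF file, e.g. pytest.py. Names and body are illustrative only.

# Assumption: Pig defines outputSchema itself when the file is registered
# "using jython". The fallback below is used only when the name is undefined,
# for example when running this file directly with plain Python.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        """Stand-in decorator: just records the schema string on the function."""
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("word:chararray")
def computation(thing):
    # Placeholder logic: upper-case the input, passing nulls through.
    if thing is None:
        return None
    return thing.upper()


if __name__ == "__main__":
    # Quick local check outside of Pig.
    print(computation("hello"))
```

Running the file directly with python is then a quick way to confirm the function itself behaves before involving Pig at all.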
