Re: Using a UDF written in Python

Jonathan Coveney Wed, 29 Dec 2010 11:32:46 -0800

Ok, strangely enough, it won't run locally either... it sees the file, but
it's giving me an interpreter not found error, so it must be something else.


PIG_CLASSPATH is set
to /home/jcoveney/usefulpig/conf:/home/jcoveney/jython
and here is my test script:

register '/home/jcoveney/udfs/pytest.py' using jython as comp;

the_in = LOAD 'input.txt' AS (thing:chararray);
the_out = FOREACH the_in GENERATE comp.computation(thing);
DUMP the_out;
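
For context, here is a minimal sketch of what a file like pytest.py could contain. The actual file was never posted, so the body of computation() is a hypothetical placeholder; only the function name and the "word:chararray" schema come from this thread. As far as I understand, Pig defines the outputSchema decorator itself when a file is registered "using jython", so no import is needed; the fallback stub is an assumption added only so the same file can also be exercised under a plain Python interpreter.

```python
# pytest.py: hypothetical contents, the original file was not posted.

# Pig is expected to provide outputSchema when it registers this file
# "using jython". The stub below is only a stand-in for plain Python runs.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema  # keep the schema string for inspection
            return func
        return wrap


@outputSchema("word:chararray")
def computation(thing):
    # Placeholder logic: upper-case the input, passing None (null) through.
    if thing is None:
        return None
    return thing.upper()
```

Running the file outside Pig this way only checks the Python logic; it says nothing about whether Pig can load the Jython interpreter.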

but I don't think it's getting that far... it's still giving me the same
error. I'm just running it with "pig -x local script.pig".
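
One thing that can be checked independently of Pig: the name in the error, org/python/util/PythonInterpreter, is a Java class, and a classpath entry that is a plain directory only exposes loose .class files, not any jars sitting inside it. The PIG_CLASSPATH above ends in /home/jcoveney/jython, which looks like a directory rather than the jar itself. The sketch below walks a classpath string and reports which entries can actually supply that class. The helper names are hypothetical, and wildcard entries such as dir/* are not handled.

```python
import os
import zipfile

# Class named in the ERROR 2998 message, as a path inside a jar or class directory.
CLASS_FILE = "org/python/util/PythonInterpreter.class"


def entry_provides_class(entry, class_file=CLASS_FILE):
    """Return True if a single classpath entry can supply class_file."""
    if os.path.isdir(entry):
        # A directory entry only exposes loose .class files, not jars inside it.
        return os.path.isfile(os.path.join(entry, *class_file.split("/")))
    if os.path.isfile(entry) and zipfile.is_zipfile(entry):
        with zipfile.ZipFile(entry) as jar:
            return class_file in jar.namelist()
    return False


def check_classpath(classpath, class_file=CLASS_FILE):
    """Map each non-empty classpath entry to whether it provides class_file."""
    return {e: entry_provides_class(e, class_file)
            for e in classpath.split(os.pathsep) if e}


if __name__ == "__main__":
    # Report on whatever PIG_CLASSPATH is set to in the current environment.
    for entry, ok in check_classpath(os.environ.get("PIG_CLASSPATH", "")).items():
        print("%-5s %s" % ("ok" if ok else "MISS", entry))
```

If every entry comes back as MISS, pointing PIG_CLASSPATH at the jython jar file itself, as suggested in the reply quoted below, is the first thing to try.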


2010/12/29 Jonathan Coveney <[email protected]>

> Ah, that might be it... my computer has it and I have it on my path,
> however, I do not know if the cluster has it... definitely something to look
> into. thanks.
>
>
> 2010/12/29 [email protected] <[email protected]>
>
>> try adding the full path to the jar via PIG_CLASSPATH like so:
>>
>> export PIG_CLASSPATH=/path/to/jython.jar
>>
>> then run pig. Also, I assume you're doing your testing on a local machine?
>> if
>> it's on a cluster, you need to make sure jython is on all the worker nodes
>> and classpath is setup properly on all of them as well.
>>
>> On Wed, Dec 29, 2010 at 9:57 AM, Jonathan Coveney <[email protected]
>> >wrote:
>>
>> > I do have Jython installed and on PATH, but maybe I didn't include it in
>> > the
>> > right way? Where does it need to be?
>> >
>> > 2010/12/29 [email protected] <[email protected]>
>> >
>> > > Do you have Jython on your classpath? Currently Jython isn't
>> distributed
>> > in
>> > > the 0.8.0 release tarball.
>> > >
>> > > On Mon, Dec 27, 2010 at 7:18 PM, Jonathan Coveney <[email protected]
>> > > >wrote:
>> > >
>> > > > Oh and just to be sure, I have tried
>> > > > @outputSchema("word:chararray")
>> > > > @outputSchema("x:{t:(word:chararray)}")
>> > > > as well (the former of which seems to be the "right" one, whenever I
>> > can
>> > > > figure out what is wrong)
>> > > >
>> > > > I've tested my code separately in python and it is fine...
>> > > >
>> > > > 2010/12/28 Jonathan Coveney <[email protected]>
>> > > >
>> > > > > Aniket, I appreciate you taking a look at this. In general, I
>> found
>> > the
>> > > > > documentation around outputSchema pretty confusing... for example,
>> in
>> > > > this
>> > > > > example
>> > > > >
>> > > > > @outputSchema("x:{t:(word:chararray)}")
>> > > > > def helloworld():
>> > > > >   return ('Hello, World')
>> > > > >
>> > > > >
>> > > > > Then, in the sample script below that, you have
>> > > > >
>> > > > > @outputSchema("t:(numformat:chararray)")
>> > > > > def commaFormat(num):
>> > > > >   return '{:,}'.format(num)
>> > > > >
>> > > > > In this case, you have lost the x:{} (which makes more sense to
>> me).
>> > > > >
>> > > > > Perhaps this is because the latter function is meant to operate on
>> an
>> > > > input
>> > > > > and return a type (t), whereas the hello world function should be
>> > able
>> > > to
>> > > > > stand alone, and thus, has to return a bag? Not sure...
>> > > > >
>> > > > > Besides that, though, I changed my code per your suggestion and
>> tried
>> > > > >
>> > > > > @outputSchema("t:(word:chararray)")
>> > > > >
>> > > > > and still got the error.
>> > > > >
>> > > > > As a note, do I need to import anything in the python script for
>> > > > > outputSchema to work, or should it be fine since pig is grabbing
>> it?
>> > > > >
>> > > > > Once again, I really appreciate your help in the matter. I feel
>> > having
>> > > > > people who weren't intimately related to the project have a go at
>> it
>> > is
>> > > > how
>> > > > > you make it ultimately more usable and useful...but you have to
>> > answer
>> > > > some
>> > > > > annoying questions on the way :P
>> > > > >
>> > > > > Thanks again.
>> > > > >
>> > > > > 2010/12/28 Aniket Mokashi <[email protected]>
>> > > > >
>> > > > > I think decorator used here is incorrect.
>> > > > >> In general, "output:chararray" needs to be
>> schema-string-compatible.
>> > > > Also,
>> > > > >> you are using "outputSchemaFunction", which is used in case you
>> want
>> > > to
>> > > > >> write a udf that has output schema dependent on input schema (e.g.
>> > > > -square)
>> > > > >> and this should have a function with decorator "schemaFunction"
>> > (named
>> > > > >> "output" in your case). I think using "outputSchema" decorator
>> would
>> > > fix
>> > > > >> the problem here.
>> > > > >>
>> > > > >> More details can be found at-
>> > > > >> http://wiki.apache.org/pig/UDFsUsingScriptingLanguages
>> > > > >>
>> > > > >> Thanks,
>> > > > >> Aniket
>> > > > >>
>> > > > >> On Mon, December 27, 2010 4:30 pm, Jonathan Coveney wrote:
>> > > > >> > so I have module.py, and I want to be able to use it in a pig
>> > > script.
>> > > > It
>> > > > >> > has no special imports or anything. I do have
>> > > > >> > @outputSchemaFunction("output:chararray")
>> > > > >> >
>> > > > >> >
>> > > > >> > In my pig script, I have this
>> > > > >> >
>> > > > >> >
>> > > > >> > register '/my/udf/location/udf.py' using jython as myfunc;
>> > > > >> >
>> > > > >> > is there any reason why this wouldn't work? here is the error I
>> > get:
>> > > > >> >
>> > > > >> > 2010-12-27 16:29:41,288 [main] ERROR
>> > > org.apache.pig.tools.grunt.Grunt
>> > > > -
>> > > > >> > ERROR 2998: Unhandled internal error.
>> > > > org/python/util/PythonInterpreter
>> > > > >> >
>> > > > >> >
>> > > > >> > Not the most instructive error, but is there anything more I
>> need
>> > to
>> > > > be
>> > > > >> > doing to be able to use a python UDF?
>> > > > >> >
>> > > > >> > As an aside, are simple Python UDFs as efficient as Java ones?
>> I
>> > > like
>> > > > >> > Python a lot and love the idea of being able to UDF in it, but
>> can
>> > > use
>> > > > >> > java if necessary.
>> > > > >> >
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > http://about.me/soren/bio
>> > >
>> >
>>
>>
>>
>> --
>> http://about.me/soren/bio
>>
>
>
