I think you took Dmitriy a bit too literally ;)

You need to put the actual filenames of the jars into PIG_CLASSPATH, not
just the directory that holds them. If /home/jcoveney/jython is the
directory that contains jython.jar (used purely as an example; I'm not
certain what the actual jar name is), then your PIG_CLASSPATH should echo
as:

/home/jcoveney/jython/jython.jar

plus whatever other jars you want to include.
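A minimal sketch of the point above, using only the Python standard library: build the PIG_CLASSPATH value from the literal paths of the .jar files in a directory instead of the directory itself. The helper names `jar_entries` and `pig_classpath` are hypothetical, and the directory layout is assumed to be the one discussed in this thread.

```python
import os


def jar_entries(jar_dir, file_names):
    """Return full paths for the .jar files in a directory listing, sorted."""
    return sorted(
        os.path.join(jar_dir, name)
        for name in file_names
        if name.lower().endswith(".jar")
    )


def pig_classpath(extra_entries, jar_dir):
    """Build a PIG_CLASSPATH value: the extra entries (e.g. a conf directory)
    followed by the literal path of every jar found directly in jar_dir."""
    jars = jar_entries(jar_dir, os.listdir(jar_dir))
    # os.pathsep is ':' on Linux/macOS, matching the values used in this thread.
    return os.pathsep.join(list(extra_entries) + jars)
```

For example, `pig_classpath(["/home/jcoveney/usefulpig/conf"], "/home/jcoveney/jython")` would produce something like `/home/jcoveney/usefulpig/conf:/home/jcoveney/jython/jython.jar`, which you could then paste into `export PIG_CLASSPATH=...`.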

2010/12/29 Jonathan Coveney <[email protected]>

> Wait, ignore that error, that was the wrong one.
>
> This is it:
>
>  ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal
> error. org/python/util/PythonInterpreter
>
> (I had set the classpath incorrectly, to *.* not ***)
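The message `org/python/util/PythonInterpreter` names a class the JVM could not load, so one jar on PIG_CLASSPATH has to contain it. Since a jar is a zip archive, a small sketch with Python's `zipfile` module can check a candidate jar. It assumes the missing class is exactly the one named in the error; the function name `jar_provides` is hypothetical.

```python
import zipfile

# The class named in the Grunt error, written as its entry path inside a jar.
MISSING_CLASS = "org/python/util/PythonInterpreter.class"


def jar_provides(jar_path, class_entry=MISSING_CLASS):
    """Return True if the jar (a zip archive) contains the given class file.

    jar_path may be a filesystem path or an open binary file object.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return class_entry in jar.namelist()
```

With the hypothetical path from this thread, `jar_provides("/home/jcoveney/jython/jython.jar")` should return True if that is the right jar to put on the classpath.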
>
> 2010/12/29 Jonathan Coveney <[email protected]>
>
> > echo $PIG_CLASSPATH
> > /home/jcoveney/usefulpig/conf:/home/jcoveney/jython/***
> >
> > same error
> >
> > 2010-12-29 16:59:29,862 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > ERROR 2998: Unhandled internal error. Could not initialize class
> > org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter
> >
> > :S
> >
> > I really love that UDFs can be written in Python... thanks for helping me
> > try to get there.
> >
> > 2010/12/29 Dmitriy Ryaboy <[email protected]>
> >
> >> You need to set the classpath to include the literal jar strings, not just
> >> the directory that contains them.
> >> Try /home/jcoveney/usefulpig/conf:/home/jcoveney/jython/***
> >>
> >> D
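To see why a bare directory in PIG_CLASSPATH is not enough while a `jython/*` style entry can be, here is a rough Python model of the Java (6 and later) classpath wildcard rule. It only models the java launcher; whether Pig's startup script passes such an entry through untouched is an assumption, and the function names are hypothetical.

```python
import os


def expand_classpath_entry(entry, listdir=os.listdir):
    """Mimic how the java launcher (Java 6+) treats one classpath entry.

    'dir/*' becomes every file in dir ending in .jar or .JAR (no recursion);
    any other entry, including a bare directory, is passed through unchanged.
    A bare directory only exposes .class files (in package folders) below it,
    never the jars sitting inside it.
    """
    directory, base = os.path.split(entry)
    if base != "*":
        return [entry]
    directory = directory or "."  # a lone '*' means the current directory
    # Java does not promise any order here; sorting just keeps the output stable.
    return [
        os.path.join(directory, name)
        for name in sorted(listdir(directory))
        if name.endswith(".jar") or name.endswith(".JAR")
    ]


def expand_classpath(classpath, listdir=os.listdir):
    """Expand every entry of a classpath string separated by os.pathsep."""
    expanded = []
    for entry in classpath.split(os.pathsep):
        expanded.extend(expand_classpath_entry(entry, listdir))
    return expanded
```

The `listdir` parameter exists only so the expansion can be tried against a fake directory listing without touching the filesystem.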
> >>
> >> On Wed, Dec 29, 2010 at 11:32 AM, Jonathan Coveney <[email protected]> wrote:
> >>
> >> > Ok, strangely enough, it won't run locally either... it sees the file,
> >> > but it's giving me an interpreter-not-found error, so it must be
> >> > something else.
> >> >
> >> > PIG_CLASSPATH is set to /home/jcoveney/usefulpig/conf:/home/jcoveney/jython
> >> > and here is my test script:
> >> >
> >> > register '/home/jcoveney/udfs/pytest.py' using jython as comp;
> >> >
> >> > the_in = LOAD 'input.txt' AS (thing:chararray);
> >> > the_out = FOREACH the_in GENERATE comp.computation(thing);
> >> > DUMP the_out;
> >> >
> >> > but I don't think it's getting that far... it's still giving me the same
> >> > error. I'm just running it with "pig -x local script.pig".
> >> >
> >> >
> >> > 2010/12/29 Jonathan Coveney <[email protected]>
> >> >
> >> > > Ah, that might be it... my computer has it and I have it on my path;
> >> > > however, I don't know if the cluster has it... definitely something to
> >> > > look into. Thanks.
> >> > >
> >> > >
> >> > > 2010/12/29 [email protected] <[email protected]>
> >> > >
> >> > >> try adding the full path to the jar via PIG_CLASSPATH like so:
> >> > >>
> >> > >> export PIG_CLASSPATH=/path/to/jython.jar
> >> > >>
> >> > >> then run pig. Also, I assume you're doing your testing on a local
> >> > >> machine? If it's on a cluster, you need to make sure Jython is on all
> >> > >> the worker nodes and that the classpath is set up properly on all of
> >> > >> them as well.
> >> > >>
> >> > >> On Wed, Dec 29, 2010 at 9:57 AM, Jonathan Coveney <[email protected]> wrote:
> >> > >>
> >> > >> > I do have Jython installed and on PATH, but maybe I didn't include it
> >> > >> > in the right way? Where does it need to be?
> >> > >> >
> >> > >> > 2010/12/29 [email protected] <[email protected]>
> >> > >> >
> >> > >> > > Do you have Jython on your classpath? Currently Jython isn't
> >> > >> > > distributed in the 0.8.0 release tarball.
> >> > >> > >
> >> > >> > > On Mon, Dec 27, 2010 at 7:18 PM, Jonathan Coveney <[email protected]> wrote:
> >> > >> > >
> >> > >> > > > Oh, and just to be sure, I have tried
> >> > >> > > > @outputSchema("word:chararray")
> >> > >> > > > @outputSchema("x:{t:(word:chararray)}")
> >> > >> > > > as well (the former of which seems to be the "right" one, whenever I
> >> > >> > > > can figure out what is wrong).
> >> > >> > > >
> >> > >> > > > I've tested my code separately in python and it is fine...
> >> > >> > > >
> >> > >> > > > 2010/12/28 Jonathan Coveney <[email protected]>
> >> > >> > > >
> >> > >> > > > > Aniket, I appreciate you taking a look at this. In general, I
> >> > >> > > > > found the documentation around outputSchema pretty confusing...
> >> > >> > > > > for example, in this snippet
> >> > >> > > > >
> >> > >> > > > > @outputSchema("x:{t:(word:chararray)}")
> >> > >> > > > > def helloworld():
> >> > >> > > > >   return ('Hello, World')
> >> > >> > > > >
> >> > >> > > > >
> >> > >> > > > > Then, in the sample script below that, you have
> >> > >> > > > >
> >> > >> > > > > @outputSchema("t:(numformat:chararray)")
> >> > >> > > > > def commaFormat(num):
> >> > >> > > > >   return '{:,}'.format(num)
> >> > >> > > > >
> >> > >> > > > > In this case, you have lost the x:{} (which makes more sense to
> >> > >> > > > > me).
> >> > >> > > > >
> >> > >> > > > > Perhaps this is because the latter function is meant to operate on
> >> > >> > > > > an input and return a type (t), whereas the hello world function
> >> > >> > > > > should be able to stand alone and thus has to return a bag? Not
> >> > >> > > > > sure...
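The three schema strings in this thread describe three different shapes of return value. A small sketch, assuming Pig's Python type mapping of chararray to str, tuple to tuple, and bag to a list of tuples; `EXAMPLES` and `pig_shape` are hypothetical names used only for illustration.

```python
# Assumed Pig <-> Python mapping for Jython UDFs:
#   chararray <-> str, tuple <-> tuple, bag <-> list of tuples.
EXAMPLES = {
    "word:chararray": "Hello, World",               # a single value
    "t:(word:chararray)": ("Hello, World",),        # a 1-tuple: note the comma
    "x:{t:(word:chararray)}": [("Hello, World",)],  # a bag holding one tuple
}


def pig_shape(value):
    """Classify a UDF return value under the mapping assumed above."""
    if isinstance(value, str):
        return "chararray"
    if isinstance(value, tuple):
        return "tuple"
    if isinstance(value, list) and all(isinstance(v, tuple) for v in value):
        return "bag"
    raise TypeError("no simple Pig equivalent for %r" % (value,))
```

One detail worth noticing: in Python, `('Hello, World')` is just a string, because parentheses without a trailing comma do not make a tuple. So the helloworld example, declared with a bag schema, actually returns a chararray-shaped value, which may be part of why the two documentation examples look inconsistent.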
> >> > >> > > > >
> >> > >> > > > > Besides that, though, I changed my code per your suggestion and
> >> > >> > > > > tried
> >> > >> > > > >
> >> > >> > > > > @outputSchema("t:(word:chararray)")
> >> > >> > > > >
> >> > >> > > > > and still got the error.
> >> > >> > > > >
> >> > >> > > > > As a note, do I need to import anything in the Python script for
> >> > >> > > > > outputSchema to work, or should it be fine since Pig is grabbing
> >> > >> > > > > it?
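My understanding is that when a file is registered `using jython`, Pig itself defines `outputSchema` (along with `outputSchemaFunction` and `schemaFunction`) before running the module, so no import should be needed. Plain Python does not define it, which matters when testing the module separately. A sketch of a guarded stand-in, assuming that all the decorator needs to do locally is record the schema string; the body of `computation` is hypothetical, only its name comes from the test script above.

```python
# Only define a stand-in when not running under Pig. Recording the schema
# string on the function is an assumption about Pig's own decorator; it is
# just enough to import and test this module with plain Python.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate


@outputSchema("word:chararray")
def computation(thing):
    """Hypothetical UDF body: strip and lower-case a chararray."""
    if thing is None:  # assumed: Pig nulls arrive as None
        return None
    return thing.strip().lower()
```

Under Pig the `try` finds the engine's decorator and the stand-in is never defined; under plain Python the same file imports cleanly, so the function can be unit-tested on its own.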
> >> > >> > > > >
> >> > >> > > > > Once again, I really appreciate your help in the matter. I feel
> >> > >> > > > > having people who weren't intimately involved with the project have
> >> > >> > > > > a go at it is how you ultimately make it more usable and useful...
> >> > >> > > > > but you have to answer some annoying questions on the way :P
> >> > >> > > > >
> >> > >> > > > > Thanks again.
> >> > >> > > > >
> >> > >> > > > > 2010/12/28 Aniket Mokashi <[email protected]>
> >> > >> > > > >
> >> > >> > > > >> I think the decorator used here is incorrect.
> >> > >> > > > >> In general, "output:chararray" needs to be a valid schema string.
> >> > >> > > > >> Also, you are using "outputSchemaFunction", which is used when you
> >> > >> > > > >> want to write a UDF whose output schema depends on its input schema
> >> > >> > > > >> (e.g. square), and that requires a function with the decorator
> >> > >> > > > >> "schemaFunction" (named "output" in your case). I think using the
> >> > >> > > > >> "outputSchema" decorator would fix the problem here.
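The pattern Aniket describes, an output schema that depends on the input schema, pairs two decorators: `outputSchemaFunction` on the UDF names a schema function, and `schemaFunction` marks that function. A sketch modelled on the square example he mentions; the stand-in decorators (used only outside Pig) and the exact form in which Pig hands the input schema to `squareSchema` are assumptions for illustration.

```python
# Stand-ins for running outside Pig; they are assumed to mirror Pig's Jython
# decorators only in that they tag each function with the given name.
try:
    outputSchemaFunction
except NameError:
    def outputSchemaFunction(schema_func_name):
        def decorate(func):
            func.outputSchemaFunction = schema_func_name
            return func
        return decorate

    def schemaFunction(schema_func_name):
        def decorate(func):
            func.schemaFunction = schema_func_name
            return func
        return decorate


@outputSchemaFunction("squareSchema")  # refers to the schema function below
def square(num):
    if num is None:
        return None
    return num * num


@schemaFunction("squareSchema")
def squareSchema(input_schema):
    # Output schema equals input schema: int in gives int out, double gives double.
    return input_schema
```

In the original module, `@outputSchemaFunction("output:chararray")` names a schema function that does not exist, which is why a plain `@outputSchema("output:chararray")` is the simpler fix for a fixed output type.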
> >> > >> > > > >>
> >> > >> > > > >> More details can be found at:
> >> > >> > > > >> http://wiki.apache.org/pig/UDFsUsingScriptingLanguages
> >> > >> > > > >>
> >> > >> > > > >> Thanks,
> >> > >> > > > >> Aniket
> >> > >> > > > >>
> >> > >> > > > >> On Mon, December 27, 2010 4:30 pm, Jonathan Coveney wrote:
> >> > >> > > > >> > So I have module.py, and I want to be able to use it in a Pig
> >> > >> > > > >> > script. It has no special imports or anything. I do have
> >> > >> > > > >> > @outputSchemaFunction("output:chararray")
> >> > >> > > > >> >
> >> > >> > > > >> >
> >> > >> > > > >> > In my pig script, I have this
> >> > >> > > > >> >
> >> > >> > > > >> >
> >> > >> > > > >> > register '/my/udf/location/udf.py' using jython as myfunc;
> >> > >> > > > >> >
> >> > >> > > > >> > Is there any reason why this wouldn't work? Here is the error I
> >> > >> > > > >> > get:
> >> > >> > > > >> >
> >> > >> > > > >> > 2010-12-27 16:29:41,288 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> >> > >> > > > >> > ERROR 2998: Unhandled internal error. org/python/util/PythonInterpreter
> >> > >> > > > >> >
> >> > >> > > > >> >
> >> > >> > > > >> > Not the most instructive error, but is there anything more I need
> >> > >> > > > >> > to be doing to be able to use a Python UDF?
> >> > >> > > > >> >
> >> > >> > > > >> > As an aside, are simple Python UDFs as efficient as Java ones? I
> >> > >> > > > >> > like Python a lot and love the idea of being able to write UDFs in
> >> > >> > > > >> > it, but I can use Java if necessary.
> >> > >> > > > >> >
> >> > >> > > > >>
> >> > >> > > > >>
> >> > >> > > > >>
> >> > >> > > > >
> >> > >> > > >
> >> > >> > >
> >> > >> > >
> >> > >> > >
> >> > >> > > --
> >> > >> > > http://about.me/soren/bio
> >> > >> > >
> >> > >> >
> >> > >>
> >> > >>
> >> > >>
> >> > >> --
> >> > >> http://about.me/soren/bio
> >> > >>
> >> > >
> >> > >
> >> >
> >>
> >
> >
>



-- 
http://about.me/soren/bio
