Ok, I guess I'm just not used to these sorts of situations where the
dependencies get so hairy.

1) For a simple UDF, which dependencies need to be included?
2) Is there a semi-easy way to clean this up?
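
For 2), one possible way to avoid hand-typing every jar is sketched below. It assumes that every .jar inside a listed directory should go on the classpath and that a directory with no jars (such as a conf directory) should stay as-is. The function name build_classpath is made up for illustration and is not part of Pig.

```python
import glob
import os


def build_classpath(*entries):
    """Join entries into a classpath string, expanding directories to their jars."""
    parts = []
    for entry in entries:
        if os.path.isdir(entry):
            # List each jar by its literal filename, in a stable order.
            jars = sorted(glob.glob(os.path.join(entry, "*.jar")))
            # Assumption: a directory without jars (e.g. conf) is kept unchanged.
            parts.extend(jars if jars else [entry])
        else:
            # Plain files (jars, .py scripts) are passed through untouched.
            parts.append(entry)
    return os.pathsep.join(parts)
```

Printing build_classpath('/home/jcoveney/usefulpig/conf', '/home/jcoveney/jython') and exporting the result as PIG_CLASSPATH would then give literal jar filenames instead of bare directories.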

Thanks for your patience. I really am new to the whole dependencies game.
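
To check whether a given class is actually reachable, something like the sketch below might help. It assumes the "org/python/util/PythonInterpreter" message means Java could not find that class on the classpath, and it only looks inside jar files (directories and .py entries are skipped). The helper name jars_containing is hypothetical.

```python
import os
import zipfile


def jars_containing(class_name, classpath):
    """Return the classpath entries (jar files) that contain the given class."""
    # org.python.util.PythonInterpreter -> org/python/util/PythonInterpreter.class
    wanted = class_name.replace(".", "/") + ".class"
    hits = []
    for entry in classpath.split(os.pathsep):
        # Directories, .py files, and missing paths are not zip files; skip them.
        if not zipfile.is_zipfile(entry):
            continue
        with zipfile.ZipFile(entry) as jar:
            if wanted in jar.namelist():
                hits.append(entry)
    return hits
```

Calling jars_containing('org.python.util.PythonInterpreter', os.environ.get('PIG_CLASSPATH', '')) and getting an empty list back would suggest that no jar on PIG_CLASSPATH provides the Jython interpreter.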

2010/12/29 Dmitriy Ryaboy <[email protected]>

> All the dependencies have to be on the classpath, including the
> dependencies' dependencies...
>
> D
>
>
> On Wed, Dec 29, 2010 at 3:12 PM, Jonathan Coveney <[email protected]
> >wrote:
>
> > Also, just in general, does EVERY UDF we want to load have to be added to
> > the classpath when you call pig? And just the .jar/.py file, or more than
> > that?
> >
> > 2010/12/29 Jonathan Coveney <[email protected]>
> >
> > > Haha gotcha, I am not the greatest at all this package management. I think
> > > we are getting close though... I added jython.jar, as well as my test.py
> > > file, and here is what I got when I ran it
> > >
> > > *sys-package-mgr*: processing new jar, '/home/jcoveney/pig-0.8.0/pig.jar'
> > > *sys-package-mgr*: processing new jar, '/home/jcoveney/udfs/test.jar'
> > > 2010-12-29 17:56:47,118 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > > ERROR 1070: Could not resolve compress.compressuid using imports: [,
> > > org.apache.pig.builtin., org.apache.pig.impl.builtin.]
> > >
> > > (I got the java thing because I ported my UDF to java to see if it would be
> > > any easier...)
> > >
> > > Here is the command I used to run it
> > > java -cp $PIGPATH/pig.jar:$PIG_CLASSPATH org.apache.pig.Main -x local
> > > udftest.pig
> > >
> > > $PIG_CLASSPATH =
> > > /home/jcoveney/usefulpig/conf:/home/jcoveney/jython/jython.jar:/home/jcoveney/udfs/test.jar:/home/jcoveney/udfs/test.py
> > >
> > > Now, whether I use the python version or the java version, I get an error
> > > (well, the first one only applies to the python)
> > >
> > > init: Bootstrapping class not in Py.BOOTSTRAP_TYPES[class=class
> > > org.python.core.PyStringMap]
> > > 2010-12-29 18:03:02,967 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > > ERROR 1070: Could not resolve test.test using imports: [,
> > > org.apache.pig.builtin., org.apache.pig.impl.builtin.]
> > >
> > > Any ideas? I followed the UDF manual, but perhaps my naming or something is
> > > off? I have no idea. Would love any help you can throw at me...
> > >
> > > 2010/12/29 [email protected] <[email protected]>
> > >
> > >> I think you took Dmitriy a bit too literally ;)
> > >>
> > >> you need to put the actual filenames of the jars into PIG_CLASSPATH.
> > >> If /home/jcoveney/usefulpig/conf:/home/jcoveney/jython
> > >> is the directory that contains jython.jar (used purely as an example, I'm
> > >> not certain what the actual jar name is), then your PIG_CLASSPATH should
> > >> echo to:
> > >>
> > >> /home/jcoveney/jython/jython.jar
> > >>
> > >> plus whatever other jars you want to include.
> > >>
> > >> 2010/12/29 Jonathan Coveney <[email protected]>
> > >>
> > >> > Wait, ignore that error, that was the wrong one.
> > >> >
> > >> > This is it:
> > >> >
> > >> >  ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal
> > >> > error. org/python/util/PythonInterpreter
> > >> >
> > >> > (I had set the classpath incorrectly, to *.* not ***)
> > >> >
> > >> > 2010/12/29 Jonathan Coveney <[email protected]>
> > >> >
> > >> > > echo $PIG_CLASSPATH
> > >> > > /home/jcoveney/usefulpig/conf:/home/jcoveney/jython/***
> > >> > >
> > >> > > same error
> > >> > >
> > >> > > 2010-12-29 16:59:29,862 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > >> > > ERROR 2998: Unhandled internal error. Could not initialize class
> > >> > > org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter
> > >> > >
> > >> > > :S
> > >> > >
> > >> > > I really love that UDF's can be written in python...thanks for
> > helping
> > >> me
> > >> > > try to get there.
> > >> > >
> > >> > > 2010/12/29 Dmitriy Ryaboy <[email protected]>
> > >> > >
> > >> > >> You need to set the classpath to include the literal jar strings, not just
> > >> > >> the directory that contains them.
> > >> > >> Try, /home/jcoveney/usefulpig/conf:/home/jcoveney/jython/***
> > >> > >>
> > >> > >> D
> > >> > >>
> > >> > >> On Wed, Dec 29, 2010 at 11:32 AM, Jonathan Coveney <
> > >> [email protected]
> > >> > >> >wrote:
> > >> > >>
> > >> > >> > Ok, strangely enough, it won't run locally either... it sees the file,
> > >> > >> > but it's giving me an interpreter not found error, so it must be something
> > >> > >> > else.
> > >> > >> >
> > >> > >> > PIG_CLASSPATH is equal
> > >> > >> > to /home/jcoveney/usefulpig/conf:/home/jcoveney/jython
> > >> > >> > and here is my test script
> > >> > >> >
> > >> > >> > register '/home/jcoveney/udfs/pytest.py' using jython as comp;
> > >> > >> >
> > >> > >> > the_in = LOAD 'input.txt' AS (thing:chararray);
> > >> > >> > the_out = FOREACH the_in GENERATE comp.computation(thing);
> > >> > >> > DUMP the_out;
> > >> > >> >
> > >> > >> > but I don't think it's getting that far... it's still giving me
> > the
> > >> > same
> > >> > >> > error. I'm just running it "pig -x local script.pig"
> > >> > >> >
> > >> > >> >
> > >> > >> > 2010/12/29 Jonathan Coveney <[email protected]>
> > >> > >> >
> > >> > >> > > Ah, that might be it... my computer has it and I have it on
> my
> > >> path,
> > >> > >> > > however, I do not know if the cluster has it... definitely
> > >> something
> > >> > >> to
> > >> > >> > look
> > >> > >> > > into. thanks.
> > >> > >> > >
> > >> > >> > >
> > >> > >> > > 2010/12/29 [email protected] <[email protected]>
> > >> > >> > >
> > >> > >> > >> try adding the full path to the jar via PIG_CLASSPATH like
> so:
> > >> > >> > >>
> > >> > >> > >> export PIG_CLASSPATH=/path/to/jython.jar
> > >> > >> > >>
> > >> > >> > >> then run pig. Also, I assume you're doing your testing on a local machine?
> > >> > >> > >> If it's on a cluster, you need to make sure jython is on all the worker nodes
> > >> > >> > >> and the classpath is set up properly on all of them as well.
> > >> > >> > >>
> > >> > >> > >> On Wed, Dec 29, 2010 at 9:57 AM, Jonathan Coveney <
> > >> > >> [email protected]
> > >> > >> > >> >wrote:
> > >> > >> > >>
> > >> > >> > >> > I do have Jython installed and on PATH, but maybe I didn't
> > >> > include
> > >> > >> it
> > >> > >> > in
> > >> > >> > >> > the
> > >> > >> > >> > right way? Where does it need to be?
> > >> > >> > >> >
> > >> > >> > >> > 2010/12/29 [email protected] <[email protected]>
> > >> > >> > >> >
> > >> > >> > >> > > Do you have Jython on your classpath? Currently Jython
> > isn't
> > >> > >> > >> distributed
> > >> > >> > >> > in
> > >> > >> > >> > > the 0.8.0 release tarball.
> > >> > >> > >> > >
> > >> > >> > >> > > On Mon, Dec 27, 2010 at 7:18 PM, Jonathan Coveney <
> > >> > >> > [email protected]
> > >> > >> > >> > > >wrote:
> > >> > >> > >> > >
> > >> > >> > >> > > > Oh and just to be sure, I have tried
> > >> > >> > >> > > > @outputSchema("word:chararray")
> > >> > >> > >> > > > @outputSchema("x:{t:(word:chararray)}")
> > >> > >> > >> > > > as well (the former of which seems to be the "right"
> > one,
> > >> > >> whenever
> > >> > >> > I
> > >> > >> > >> > can
> > >> > >> > >> > > > figure out what is wrong)
> > >> > >> > >> > > >
> > >> > >> > >> > > > I've tested my code separately in python and it is
> > fine...
> > >> > >> > >> > > >
> > >> > >> > >> > > > 2010/12/28 Jonathan Coveney <[email protected]>
> > >> > >> > >> > > >
> > >> > >> > >> > > > > Aniket, I appreciate you taking a look at this. In
> > >> general,
> > >> > I
> > >> > >> > >> found
> > >> > >> > >> > the
> > >> > >> > >> > > > > documentation around outputSchema pretty
> confusing...
> > >> for
> > >> > >> > example,
> > >> > >> > >> in
> > >> > >> > >> > > > this
> > >> > >> > >> > > > > example
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > @outputSchema("x:{t:(word:chararray)}")
> > >> > >> > >> > > > > def helloworld():
> > >> > >> > >> > > > >   return ('Hello, World')
> > >> > >> > >> > > > >
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > Then, in the sample script below that, you have
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > @outputSchema("t:(numformat:chararray)")
> > >> > >> > >> > > > > def commaFormat(num):
> > >> > >> > >> > > > >   return '{:,}'.format(num)
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > In this case, you have lost the x:{} (which makes more sense to me).
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > Perhaps this is because the latter function is meant
> > to
> > >> > >> operate
> > >> > >> > on
> > >> > >> > >> an
> > >> > >> > >> > > > input
> > >> > >> > >> > > > > and return a type (t), whereas the hello world
> > function
> > >> > >> should
> > >> > >> > be
> > >> > >> > >> > able
> > >> > >> > >> > > to
> > >> > >> > >> > > > > stand alone, and thus, has to return a bag? Not
> > sure...
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > Besides that, though, I changed my code per your
> > >> suggestion
> > >> > >> and
> > >> > >> > >> tried
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > @outputSchema("t:(word:chararray)")
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > and still got the error.
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > As a note, do I need to import anything in the
> python
> > >> > script
> > >> > >> for
> > >> > >> > >> > > > > outputSchema to work, or should it be fine since pig
> > is
> > >> > >> grabbing
> > >> > >> > >> it?
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > Once again, I really appreciate your help in the
> > matter.
> > >> I
> > >> > >> feel
> > >> > >> > >> > having
> > >> > >> > >> > > > > people who weren't intimately related to the project
> > >> have a
> > >> > >> go
> > >> > >> > at
> > >> > >> > >> it
> > >> > >> > >> > is
> > >> > >> > >> > > > how
> > >> > >> > >> > > > > you make it ultimately more usable and useful...but
> > you
> > >> > have
> > >> > >> to
> > >> > >> > >> > answer
> > >> > >> > >> > > > some
> > >> > >> > >> > > > > annoying questions on the way :P
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > Thanks again.
> > >> > >> > >> > > > >
> > >> > >> > >> > > > > 2010/12/28 Aniket Mokashi <[email protected]>
> > >> > >> > >> > > > >
> > >> > >> > >> > > > >> I think the decorator used here is incorrect.
> > >> > >> > >> > > > >> In general, "output:chararray" needs to be schema-string-compatible. Also,
> > >> > >> > >> > > > >> you are using "outputSchemaFunction", which is used in case you want to
> > >> > >> > >> > > > >> write a udf that has output schema dependent on input schema (e.g. square)
> > >> > >> > >> > > > >> and this should have a function with decorator "schemaFunction" (named
> > >> > >> > >> > > > >> "output" in your case). I think using the "outputSchema" decorator would fix
> > >> > >> > >> > > > >> the problem here.
> > >> > >> > >> > > > >>
> > >> > >> > >> > > > >> More details can be found at-
> > >> > >> > >> > > > >>
> > http://wiki.apache.org/pig/UDFsUsingScriptingLanguages
> > >> > >> > >> > > > >>
> > >> > >> > >> > > > >> Thanks,
> > >> > >> > >> > > > >> Aniket
> > >> > >> > >> > > > >>
> > >> > >> > >> > > > >> On Mon, December 27, 2010 4:30 pm, Jonathan Coveney
> > >> wrote:
> > >> > >> > >> > > > >> > so I have module.py, and I want to be able to use
> > it
> > >> in
> > >> > a
> > >> > >> pig
> > >> > >> > >> > > script.
> > >> > >> > >> > > > It
> > >> > >> > >> > > > >> > has no special imports or anything. I do have
> > >> > >> > >> > > > >> > @outputSchemaFunction("output:chararray")
> > >> > >> > >> > > > >> >
> > >> > >> > >> > > > >> >
> > >> > >> > >> > > > >> > In my pig script, I have this
> > >> > >> > >> > > > >> >
> > >> > >> > >> > > > >> >
> > >> > >> > >> > > > >> > register '/my/udf/location/udf.py' using jython
> as
> > >> > myfunc;
> > >> > >> > >> > > > >> >
> > >> > >> > >> > > > >> > is there any reason why this wouldn't work? here
> is
> > >> the
> > >> > >> error
> > >> > >> > I
> > >> > >> > >> > get:
> > >> > >> > >> > > > >> >
> > >> > >> > >> > > > >> > 2010-12-27 16:29:41,288 [main] ERROR
> > >> > >> > >> > > org.apache.pig.tools.grunt.Grunt
> > >> > >> > >> > > > -
> > >> > >> > >> > > > >> > ERROR 2998: Unhandled internal error.
> > >> > >> > >> > > > org/python/util/PythonInterpreter
> > >> > >> > >> > > > >> >
> > >> > >> > >> > > > >> >
> > >> > >> > >> > > > >> > Not the most instructive error, but is there
> > anything
> > >> > more
> > >> > >> I
> > >> > >> > >> need
> > >> > >> > >> > to
> > >> > >> > >> > > > be
> > >> > >> > >> > > > >> > doing to be able to use a python UDF?
> > >> > >> > >> > > > >> >
> > >> > >> > >> > > > >> > As an aside, are simple Python UDFs as efficient
> > as
> > >> > Java
> > >> > >> > ones?
> > >> > >> > >> I
> > >> > >> > >> > > like
> > >> > >> > >> > > > >> > Python a lot and love the idea of being able to
> UDF
> > >> in
> > >> > it,
> > >> > >> > but
> > >> > >> > >> can
> > >> > >> > >> > > use
> > >> > >> > >> > > > >> > java if necessary.
> > >> > >> > >> > > > >> >
> > >> > >> > >> > > > >>
> > >> > >> > >> > > > >>
> > >> > >> > >> > > > >>
> > >> > >> > >> > > > >
> > >> > >> > >> > > >
> > >> > >> > >> > >
> > >> > >> > >> > >
> > >> > >> > >> > >
> > >> > >> > >> > > --
> > >> > >> > >> > > http://about.me/soren/bio
> > >> > >> > >> > >
> > >> > >> > >> >
> > >> > >> > >>
> > >> > >> > >>
> > >> > >> > >>
> > >> > >> > >> --
> > >> > >> > >> http://about.me/soren/bio
> > >> > >> > >>
> > >> > >> > >
> > >> > >> > >
> > >> > >> >
> > >> > >>
> > >> > >
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> http://about.me/soren/bio
> > >>
> > >
> > >
> >
>
