Aniket, I appreciate you taking a look at this. In general, I found the
documentation around outputSchema pretty confusing. For example, the docs
give this example:
@outputSchema("x:{t:(word:chararray)}")
def helloworld():
    return ('Hello, World')
Then, in the sample script below that, you have
@outputSchema("t:(numformat:chararray)")
def commaFormat(num):
    return '{:,}'.format(num)
In this case, you have lost the x:{} (which makes more sense to me).
Perhaps this is because the latter function is meant to operate on an input
and return a type (t), whereas the hello world function should be able to
stand alone, and thus, has to return a bag? Not sure...
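To make the bag-versus-tuple question concrete, here is a minimal sketch of how
the schema string and the Python return value would line up. It assumes that
Pig's Jython bridge maps a Python string to a chararray, a Python tuple to a
Pig tuple, and a Python list of tuples to a bag, and that the declared schema
is meant to mirror the return value. The outputSchema stand-in and the three
hello_* names are hypothetical, added only so the file also runs outside Pig.

```python
# Hypothetical stand-in for the decorator Pig supplies; it only records the
# schema string on the function so the example can run outside Pig.
def outputSchema(schema):
    def decorate(func):
        func.outputSchema = schema
        return func
    return decorate


@outputSchema("word:chararray")
def hello_scalar():
    # Parentheses without a trailing comma do not build a tuple:
    # this returns a plain string.
    return ('Hello, World')


@outputSchema("t:(word:chararray)")
def hello_tuple():
    # The trailing comma makes this a one-field tuple.
    return ('Hello, World',)


@outputSchema("x:{t:(word:chararray)}")
def hello_bag():
    # A list of tuples, assumed here to correspond to a bag of tuples.
    return [('Hello, World',)]
```

Read this way, the helloworld example from the docs declares a bag but returns
a plain string, which may be part of why the two examples look inconsistent.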
Besides that, though, I changed my code per your suggestion and tried
@outputSchema("t:(word:chararray)")
and still got the error.
As a note, do I need to import anything in the Python script for
outputSchema to work, or should it be fine since Pig is grabbing it?
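For what it's worth, my understanding is that Pig defines outputSchema itself
when it loads a file registered with "using jython", so no import should be
needed there. The sketch below rests on that assumption: it installs a
fallback decorator only when the name is missing, so the same file can be
tried in a plain Python interpreter. The fallback and the scalar schema string
are my own choices for illustration, not something taken from the Pig docs.

```python
# Assumption: under Pig, outputSchema already exists and this fallback is
# skipped. Outside Pig the bare name raises NameError, so a stand-in that
# just records the schema string is defined instead.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate


# Scalar schema chosen here because the function returns a plain string.
@outputSchema("numformat:chararray")
def commaFormat(num):
    # The ',' format specifier requires Python 2.7 or later, so it is worth
    # checking against the Jython version actually in use.
    return '{:,}'.format(num)
```

Outside Pig, commaFormat(1234567) gives '1,234,567'.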
Once again, I really appreciate your help in the matter. I feel having
people who weren't intimately related to the project have a go at it is how
you make it ultimately more usable and useful...but you have to answer some
annoying questions on the way :P
Thanks again.
2010/12/28 Aniket Mokashi wrote:
> I think the decorator used here is incorrect.
> In general, "output:chararray" needs to be schema-string-compatible. Also,
> you are using "outputSchemaFunction", which is for the case where you want
> to write a UDF whose output schema depends on the input schema (e.g.
> square); it needs a matching function with the decorator "schemaFunction"
> (named "output" in your case). I think using the "outputSchema" decorator
> would fix the problem here.
>
> More details can be found at:
> http://wiki.apache.org/pig/UDFsUsingScriptingLanguages
>
> Thanks,
> Aniket
>
> On Mon, December 27, 2010 4:30 pm, Jonathan Coveney wrote:
> > so I have module.py, and I want to be able to use it in a pig script. It
> > has no special imports or anything. I do have
> > @outputSchemaFunction("output:chararray")
> >
> >
> > In my pig script, I have this
> >
> >
> > register '/my/udf/location/udf.py' using jython as myfunc;
> >
> > is there any reason why this wouldn't work? here is the error I get:
> >
> > 2010-12-27 16:29:41,288 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > ERROR 2998: Unhandled internal error. org/python/util/PythonInterpreter
> >
> >
> > Not the most instructive error, but is there anything more I need to be
> > doing to be able to use a python UDF?
> >
> > As an aside, are simple Python UDFs as efficient as Java ones? I like
> > Python a lot and love the idea of being able to write UDFs in it, but can
> > use Java if necessary.
> >
>
>
>