It works! I only needed to explicitly convert the result of split() into a bag of tuples (a list of one-element tuples).

from java.util.regex import Pattern

@outputSchema("y:bag{t:tuple(word:chararray)}")
def strsplittobag(content,regex):
    # Pattern.split returns an array of strings, not a bag of tuples
    toks = Pattern.compile(regex).split(content)
    outBag = []
    for tok in toks:
        # wrap each token in a one-element tuple to match the schema
        tup = (tok,)
        outBag.append(tup)
    return outBag
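
For reference, the wrapping step on its own can be sketched as below. This is a minimal illustration that assumes a list of Python tuples is what gets mapped to a bag of tuples, as the schema above implies; `to_bag` is a hypothetical helper name, not part of Pig.

```python
def to_bag(tokens):
    """Wrap each token in a one-element tuple.

    The result is a list of tuples, which is the shape that the
    schema bag{t:tuple(word:chararray)} describes.
    """
    return [(tok,) for tok in tokens]


# Example with a list of tokens that has already been split.
print(to_bag(["a", "b", "c"]))
```

With this helper, the loop in the UDF would reduce to returning to_bag(toks).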

Thank you all!

Shawn

On Wed, Jan 26, 2011 at 11:01 AM, Julien Le Dem <[email protected]> wrote:
> As a workaround, in Jython you can also use the java classes.
> Something like: (not tested)
>
> from java.util.regex import *
> from java.lang import *
>
> @outputSchema("y:bag{t:tuple(word:chararray)}")
> def strsplittobag(content,regex):
>         return Pattern.compile(regex).split(content)
>
> Julien
>
> On 1/25/11 5:46 PM, "Richard Ding" <[email protected]> wrote:
>
> You're right. There are two issues here. First, the Jython script needs to
> locate the modules in its search path (e.g. python.path). If you have the
> right env variable set, Jython script should be able to find and import the
> module. Second, Pig currently doesn't automatically ship the module file to
> the backend, so even if you set the search path in the frontend, the backend
> still cannot locate the module.
>
> Finally, there is incompatibility between Python modules and Jython modules.
> You need to use Jython modules that come with Jython installation (in the
> Lib directory).
>
> We're looking into these issues and hoping to provide a solution in the next
> release.
>
> Thanks,
> -Richard
>
>
> On 1/25/11 12:50 PM, "Xiaomeng Wan" <[email protected]> wrote:
>
> Hi Daniel,
>
> I did put jython.jar in classpath. By comparing other python udfs with
> this one, I find those udfs which work do not import anything. Could
> that be the cause? Do I need to do anything extra to import a module in my
> udf?
>
> Thanks!
>
> Shawn
>
> On Mon, Jan 24, 2011 at 5:28 PM, Daniel Dai <[email protected]> wrote:
>> Put build/ivy/lib/Pig/jython-2.5.0.jar in your classpath (if not there, do
>> ant first). This is a bug we need to fix.
>>
>> Daniel
>>
>> Xiaomeng Wan wrote:
>>>
>>> Hi,
>>> I want to write a Python UDF to split a string into a bag.
>>>
>>> ------------------------------------------------------------
>>> #!/usr/bin/python
>>>
>>> import re
>>> @outputSchema("y:bag{t:tuple(word:chararray)}")
>>> def strsplittobag(content,regex):
>>>        return re.compile(regex).split(content)
>>> ------------------------------------------------------------
>>>
>>> it gave an error saying "could not instantiate
>>> 'org.apache.pig.scripting.jython.JythonFunction' with arguments
>>> '[/home/.../mypyudfs.py, strsplittobag]'". I had some other python
>>> udfs working, so it shouldn't be a configuration problem. I am new to
>>> Python. Did I miss anything?
>>>
>>> Thanks!
>>>
>>> Shawn
>>>
>>
>>
>
>
>
