[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871025#action_12871025
 ] 

Arnab Nandi commented on PIG-928:
---------------------------------

Thanks Dmitriy! Lazy objects are a great idea. Note that I'm not saying that 
pythontoPig is slow per se; it's just the biggest part of the profiler trace, 
so it would be a good place to optimize. I ran some numbers on the patch, and 
it looks like, outside of the runtime instantiation, the current code carries 
a fairly small performance penalty (about 1.2x slower on the full data set).
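
For illustration only, here is a minimal sketch of what "lazy" conversion 
could look like, assuming it means deferring the per-element conversion until 
an element is actually read. The LazyConverted class and the convert callback 
are hypothetical names, not part of the patch or of Pig:

```python
class LazyConverted(object):
    """Hypothetical wrapper: converts each element only when it is first read."""

    def __init__(self, items, convert):
        self._items = items        # raw values returned by the UDF
        self._convert = convert    # per-element conversion function (assumed)
        self._cache = {}           # integer index -> converted value
        self.conversions = 0       # how many conversions actually ran

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        # Integer indexes only; convert on first access, then reuse the result.
        if index not in self._cache:
            self._cache[index] = self._convert(self._items[index])
            self.conversions += 1
        return self._cache[index]


# Wrap each word in a 1-tuple, standing in for a real type conversion.
words = LazyConverted("to be or not to be".split(' '), lambda w: (w,))
first = words[0]
```

Reading one element triggers one conversion, so a consumer that only touches 
part of the result would skip the rest of the conversion cost.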

WordCount example from Alan's package.zip: 
||Data size (lines)||Native||Jython||Factor (Jython/Native)||
|10K|9s|18s|2|
|50K|14s|19s|1.35|
|500K|54s|64s|1.19|
(Full data: 8 copies of "War & Peace" from Project Gutenberg, 500K lines, 24MB.)
(TOKENIZE was modified to split on spaces only, so both implementations 
produce identical output.)
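
As a quick check on the table, the factor column can be recomputed from the 
rounded timings. This is only a sketch; the variable and function names are 
made up for the example:

```python
# Wall-clock timings in seconds, copied from the table: (size, native, jython).
timings = [
    ("10K", 9, 18),
    ("50K", 14, 19),
    ("500K", 54, 64),
]


def slowdown(native_s, jython_s):
    """Return the Jython time divided by the native time."""
    return float(jython_s) / native_s


factors = dict((size, slowdown(n, j)) for size, n, j in timings)
gaps = dict((size, j - n) for size, n, j in timings)

for size, native_s, jython_s in timings:
    print("%s: factor %.2f, gap %ds" % (size, factors[size], gaps[size]))
```

From the rounded timings the 50K row comes out at about 1.36 rather than 1.35. 
The absolute gap stays between 5s and 10s at every size, which fits a mostly 
fixed startup cost rather than a per-record one.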

Python code:
{noformat}
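@outputSchema("s:{d:(word:chararray)}")
def tokenize(word):
  # Split on single spaces only, to match the modified TOKENIZE.
  if word is not None:
    return word.split(' ')
  # A null input yields a null result.
  return None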
{noformat}
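
To try the function outside Pig, a stand-in for the decorator is needed. The 
outputSchema defined below is an assumption made for this sketch (it only 
records the schema string on the function), not the one the patch provides:

```python
def outputSchema(schema):
    """Stand-in decorator (assumption): attaches the schema string to the function."""
    def wrap(func):
        func.outputSchema = schema
        return func
    return wrap


@outputSchema("s:{d:(word:chararray)}")
def tokenize(word):
    if word is not None:
        return word.split(' ')


sample = tokenize("a b  c")   # note the two consecutive spaces
missing = tokenize(None)
```

Splitting on a literal ' ' keeps an empty string between consecutive spaces, 
unlike a bare split(), so the native side has to split the same way for the 
outputs to match.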

> UDFs in scripting languages
> ---------------------------
>
>                 Key: PIG-928
>                 URL: https://issues.apache.org/jira/browse/PIG-928
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>             Fix For: 0.8.0
>
>         Attachments: calltrace.png, package.zip, pig-greek.tgz, 
> pig.scripting.patch.arnab, pyg.tgz, scripting.tgz, scripting.tgz, test.zip
>
>
> It should be possible to write UDFs in scripting languages such as python, 
> ruby, etc.  This frees users from needing to compile Java, generate a jar, 
> etc.  It also opens Pig to programmers who prefer scripting languages over 
> Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.