Ashutosh Chauhan commented on PIG-928:

I did some quick benchmarking of the BSF approach for UDFs written in Ruby, 
Python, and Groovy, alongside a native Pig builtin. It's a standard wordcount 
example where the UDF tokenizes an input string into words. I used the Pig 
sources (src/org/apache/pig) as input, which has more than 210K lines. Since I 
haven't yet figured out type translation, to keep the experiment consistent I 
passed the data as a String argument and used Object[] as the return type in 
all languages. Following are the numbers I got, averaged over 3 runs:


This shows the Groovy-BSF combo is super slow, while Ruby and Python are much 
better. These numbers should be seen as an absolute worst case. I believe type 
translation, compiling the script in the constructor, and using the compiled 
version instead of evaluating the script in every exec() call will give much 
better performance. Other optimizations may exist as well.
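
To illustrate the "compile once in the constructor" idea from the paragraph 
above, here is a minimal sketch in plain Python. It does not use BSF or any 
Pig API; the class names, the exec_udf() method, and the SCRIPT source are all 
hypothetical stand-ins. It only contrasts re-evaluating script source on every 
call with compiling it once and reusing the result.

```python
import timeit

# Hypothetical script source standing in for a user-supplied tokenizer UDF.
SCRIPT = """
def tokenize(line):
    return line.split()
"""


class EvalEveryCall:
    """Worst case: parse and evaluate the script source on every call."""

    def exec_udf(self, line):
        namespace = {}
        exec(SCRIPT, namespace)  # re-parses the source text each time
        return namespace["tokenize"](line)


class CompileOnce:
    """Compile the script once in the constructor and reuse the function."""

    def __init__(self):
        code = compile(SCRIPT, "<udf>", "exec")  # parse and compile once
        namespace = {}
        exec(code, namespace)  # run the compiled code to define tokenize()
        self._tokenize = namespace["tokenize"]

    def exec_udf(self, line):
        return self._tokenize(line)


if __name__ == "__main__":
    line = "load 'input' using PigStorage() as (line:chararray);"
    for udf in (EvalEveryCall(), CompileOnce()):
        seconds = timeit.timeit(lambda: udf.exec_udf(line), number=10000)
        print(type(udf).__name__, seconds)
```

Both classes return the same tokens for a given line; only the per-call 
overhead differs. How large the gap is under BSF would have to be measured.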

Sometime next week, I will try to repeat the same experiment with javax.script.

> UDFs in scripting languages
> ---------------------------
>                 Key: PIG-928
>                 URL: https://issues.apache.org/jira/browse/PIG-928
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>         Attachments: package.zip
> It should be possible to write UDFs in scripting languages such as python, 
> ruby, etc.  This frees users from needing to compile Java, generate a jar, 
> etc.  It also opens Pig to programmers who prefer scripting languages over 
> Java.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
