Ashutosh Chauhan commented on PIG-928:
I did some quick benchmarking of the BSF approach for UDFs written in Ruby,
Python and Groovy, compared against a native builtin in Pig. It's a standard
wordcount example where the UDF tokenizes an input string into words. I used
the Pig sources (src/org/apache/pig) as input, which have more than 210K lines.
Since I haven't yet figured out type translation, to keep the experiment
consistent I passed the data as a String argument and used Object as the
return type in all languages.
Following are the numbers I got, averaged over 3 runs:
This shows the Groovy-BSF combo is super-slow, while Ruby and Python are much
better. These numbers should be seen as an absolute worst case. I believe type
translation, compiling the script in the constructor, and using the compiled
version instead of evaluating the script in every exec() call will give much
better performance. There may also be other optimizations.
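To illustrate the compile-once idea, here is a rough sketch in plain Python. It is only an analogy: it does not use BSF, javax.script or any Pig API, and the names UDF_SOURCE, EvalEveryCall, CompileOnce and exec_udf are made up for this example. The first class re-evaluates the script text on every call, which is roughly what the benchmark above does; the second compiles it once in the constructor and reuses the result.

```python
# Hypothetical script source standing in for a user-supplied tokenizer UDF.
UDF_SOURCE = """
def tokenize(line):
    return line.split()
"""


class EvalEveryCall:
    """Re-evaluates the script source on every call (the worst case)."""

    def __init__(self, source):
        self.source = source

    def exec_udf(self, line):
        namespace = {}
        # The source text is parsed, compiled and run again for each input.
        exec(self.source, namespace)
        return namespace["tokenize"](line)


class CompileOnce:
    """Compiles the script in the constructor and reuses the function."""

    def __init__(self, source):
        code = compile(source, "<udf>", "exec")
        namespace = {}
        exec(code, namespace)
        # Keep a reference to the already-defined function.
        self.func = namespace["tokenize"]

    def exec_udf(self, line):
        # No parsing or compilation here, just a function call.
        return self.func(line)


if __name__ == "__main__":
    slow = EvalEveryCall(UDF_SOURCE)
    fast = CompileOnce(UDF_SOURCE)
    line = "It should be possible to write UDFs in scripting languages"
    # Both variants tokenize the line identically; only the per-call cost differs.
    print(slow.exec_udf(line) == fast.exec_udf(line))
```

Both classes return the same tokens for a given line, so the difference is purely in per-call overhead. How much that matters for a real script engine would have to be measured.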
Sometime next week, I will try to repeat the same experiment with javax.script.
> UDFs in scripting languages
> Key: PIG-928
> URL: https://issues.apache.org/jira/browse/PIG-928
> Project: Pig
> Issue Type: New Feature
> Reporter: Alan Gates
> Attachments: package.zip
> It should be possible to write UDFs in scripting languages such as Python,
> Ruby, etc. This frees users from needing to compile Java, generate a jar,
> etc. It also opens Pig to programmers who prefer scripting languages over