[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871025#action_12871025 ]
Arnab Nandi commented on PIG-928:
---------------------------------

Thanks Dmitriy! Lazy objects are a great idea. Note that I'm not saying that pythontoPig is slow per se -- it's just the biggest part of the profiler trace, and would be a great place for optimization.

I ran some numbers on the patch, and it looks like outside of the runtime instantiation, there is a fairly small performance penalty with the current code (1.2x slower).

WordCount example from Alan's package.zip:

||Data size||Native||Jython||Factor||
|10K|9s|18s|2|
|50K|14s|19s|1.35|
|500K|54s|64s|1.19|

(Full Data: 8x "War & Peace" from Proj. Gutenberg, 500K lines, 24MB)
(TOKENIZE was modified to spaces-only, both implementations have identical output)

Python code:
{noformat}
@outputSchema("s:{d:(word:chararray)}")
def tokenize(word):
    if word is not None:
        return word.split(' ')
{noformat}

> UDFs in scripting languages
> ---------------------------
>
>                 Key: PIG-928
>                 URL: https://issues.apache.org/jira/browse/PIG-928
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>             Fix For: 0.8.0
>
>         Attachments: calltrace.png, package.zip, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, scripting.tgz, scripting.tgz, test.zip
>
>
> It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.