[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872007#action_12872007 ]
Arnab Nandi commented on PIG-928:
---------------------------------

Thanks for looking into the patch Ashutosh!

Very good question. Short answer: I couldn't come up with an elegant solution using {{define}} :)

I spent a bunch of time thinking about the "right thing to do" before going this way. As Woody mentioned, my initial instinct was to do this in {{define}}, but I kept hitting roadblocks when working with {{define}}:

# I came up with the analogy that "register" is like "import" in Java, and "define" is like "alias" in bash. In this interpretation, whenever you want to introduce new code, you {{register}} it with Pig. Whenever you want to alias anything for convenience or to add meta-information, you {{define}} it.
# Define is not amenable to multiple functions in the same script.
#* For example, to follow the {{stream}} convention,
{quote}
\{define X 'x.py' [inputoutputspec][schemaspec];\}
{quote}
Which function is the input/output spec for? A solution like
{quote}
\{[func1():schemaspec1,func2:schemaspec2]}
{quote}
is... ugly.
#* Further, how do we access these functions? One solution is to treat the code block as a namespace, e.g. X.func1(), which is doable by registering functions as "X.func1", but we're (mis)leading the user to believe there is some sort of real namespacing going on. I foresee multi-function files as a very common use case; people could have a "util.py" with their commonly used suite of functions instead of being forced into 1 file per 2-3 line function.
#* Note that Julien's @decorator idea cleanly solves this problem and I think it'll work for all languages.
# Most languages already have a convention of writing a function definition together with its name, input references and return schema spec. With inline {{define}}, it seems redundant to force the user to break this convention and write something like
{quote}
\{define x as script('def X(a,b): return a + b;');}
{quote}
and then call x.X(). Lambdas can solve this problem halfway, but you'll then need to worry about the schema spec and we're back at a kludgy solution!
# My plan for inline functions is to write them all to a temp file (1 per script engine) and then deal with them the same way as registering a file.
# Jython code runs in its own interpreter because I couldn't figure out how to load Jython bytecode into Java; this has something to do with the lack of a jythonc, afaik (I may be wrong). There will be one interpreter per non-compilable script engine; for others (Janino, Groovy), we load the class directly into the runtime.
# From a code-writing perspective, overloading {{define}} to tack on a third use case would involve an overhaul of the POStream physical operator and felt very inelegant; {{register}}, on the other hand, is well contained to a single purpose: including files for UDFs.
# Consider the use of Janino as a ScriptEngine. Unlike the Jython script engine, this loads Java UDFs into the native runtime and doesn't translate objects, so we're looking at potentially _zero_ loss of performance for inline UDFs (or register 'UDF.java';). The difference between native and script code gets blurry here...

[tl;dr] ...and then I thought fair enough, let's just go with {{register}}! :D

> UDFs in scripting languages
> ---------------------------
>
> Key: PIG-928
> URL: https://issues.apache.org/jira/browse/PIG-928
> Project: Pig
> Issue Type: New Feature
> Reporter: Alan Gates
> Fix For: 0.8.0
>
> Attachments: calltrace.png, package.zip, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, scripting.tgz, scripting.tgz, test.zip
>
>
> It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
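
For readers following the multi-function argument in the comment above, here is a rough Python sketch of how a decorator could carry the schema spec for each function in a shared "util.py", with the functions registered under "X.func1"-style names. Everything in it is an assumption made for illustration: {{output_schema}}, {{collect_udfs}}, the sample functions and the schema strings are hypothetical and are not taken from the PIG-928 patch or from Pig itself.

```python
# Hypothetical sketch: none of these names come from the PIG-928 patch.

def output_schema(spec):
    """Attach a schema string to a function (the @decorator idea)."""
    def wrap(func):
        func.output_schema = spec
        return func
    return wrap


# Stand-ins for the contents of a multi-function "util.py".
@output_schema("total:long")
def add(a, b):
    return a + b


@output_schema("word:chararray")
def upper(s):
    return s.upper()


def _helper(x):
    # Not decorated, so it is not exposed as a UDF.
    return x


def collect_udfs(namespace, prefix):
    """Map 'prefix.name' -> (function, schema) for every decorated callable."""
    udfs = {}
    for name, obj in namespace.items():
        if callable(obj) and hasattr(obj, "output_schema"):
            udfs[prefix + "." + name] = (obj, obj.output_schema)
    return udfs


# Explicit dict standing in for the namespace of the loaded script file.
util_namespace = {"add": add, "upper": upper, "_helper": _helper}
registry = collect_udfs(util_namespace, "util")
```

Because each function carries its own schema, a single file can hold many functions without the question of which function an input/output spec belongs to. The "util." prefix here is only a naming convention on the registered keys, which is the same caveat about "real namespacing" raised in the comment.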