[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872007#action_12872007
 ] 

Arnab Nandi commented on PIG-928:
---------------------------------

Thanks for looking into the patch, Ashutosh! Very good question. Short answer: I 
couldn't come up with an elegant solution using {{define}} :)
 
I spent a bunch of time thinking about the "right thing to do" before going 
this way. As Woody mentioned, my initial instinct was to do this in 
{{define}}, but I kept hitting roadblocks when working with {{define}}:

# I came up with the analogy that "register" is like "import" in Java, and 
"define" is like "alias" in bash. In this interpretation, whenever you want to 
introduce new code, you {{register}} it with Pig. Whenever you want to alias 
anything for convenience or to add meta-information, you {{define}} it. 
# Define is not amenable to multiple functions in the same script. 
#* For example, following the {{stream}} convention we would write {quote} 
\{define X 'x.py' [inputoutputspec][schemaspec];\} {quote} but which function 
is the input/output spec for? A solution like {quote} 
\{[func1():schemaspec1,func2:schemaspec2]} {quote} is... ugly.
#* Further, how do we access these functions? One solution is to have the 
namespace as a codeblock, e.g. X.func1(), which is doable by registering 
functions as "X.func1", but we're (mis)leading the user to believe there is 
some sort of real namespacing going on. I foresee multi-function files as a 
very common use case; people could have a "util.py" with their commonly used 
suite of functions instead of forcing 1 file per 2-3 line function. 
#* Note that Julien's @decorator idea cleanly solves this problem and I think 
it'll work for all languages.
# With inline {{define}}: most languages have the convention of writing a 
function definition with the function name, input references and return schema 
spec together. It seems redundant to force the user to break this convention 
with something like {quote} \{define x as script('def X(a,b): return a + b;');} 
{quote} and then call x.X(). Lambdas solve this problem only halfway: you still 
need to worry about the schema spec, and we're back at a kludgy solution!
# My plan for inline functions is to write all to a temp file (1 per script 
engine) and then deal with them as registering a file.
# Jython code runs in its own interpreter because I couldn't figure out how to 
load Jython bytecode into Java. This has something to do with the lack of a 
jythonc, afaik (I may be wrong). There will be one interpreter per 
non-compilable scriptengine; for the others (Janino, Groovy), we load the class 
directly into the runtime.
# From a code-writing perspective, overloading {{define}} to tack on a third 
use-case would involve an overhaul of the POStream physical operator and felt 
very inelegant; {{register}}, on the other hand, is well contained to a single 
purpose -- including files for UDFs.
# Consider the use of Janino as a ScriptEngine. Unlike the Jython scriptengine, 
this loads java UDFs into the native runtime and doesn't translate objects; so 
we're looking at potentially _zero_ loss of performance for inline UDFs (or 
register 'UDF.java'; ). The difference between native and script code gets 
blurry here...
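
To make the multi-function point concrete, here is a rough sketch of how a 
decorator could tie each schema to its own function in a shared "util.py". 
Everything in it is an assumption for illustration: the {{output_schema}} 
decorator, the {{collect_udfs}} helper and the schema strings are hypothetical 
names, not what the patch implements, and the {{exec}} call only stands in for 
loading a registered file.

```python
def output_schema(spec):
    """Hypothetical decorator: attach a schema string to the function it wraps."""
    def wrap(func):
        func.output_schema = spec
        return func
    return wrap


# Stand-in for the contents of a user's util.py with several small functions.
UTIL_PY = '''
@output_schema("total:long")
def add(a, b):
    return a + b

@output_schema("word:chararray")
def shout(s):
    return s.upper() + "!"

def _helper(x):
    # Not decorated, so it is not exposed as a UDF.
    return x
'''


def collect_udfs(namespace):
    """Return {function name: schema} for every decorated callable."""
    return {
        name: obj.output_schema
        for name, obj in namespace.items()
        if callable(obj) and hasattr(obj, "output_schema")
    }


# "Registering" the file: run it in a fresh namespace, then pick up
# every function that carries a schema.
namespace = {"output_schema": output_schema}
exec(UTIL_PY, namespace)
udfs = collect_udfs(namespace)
```

Each function carries its own schema, so there is no question of which spec 
belongs to which function, and undecorated helpers stay private to the file.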

[tl;dr] ...and then I thought fair enough, let's just go with {{register}}! :D
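
PS: the temp-file plan for inline functions (one file per script engine) could 
be sketched roughly as below. This is only an illustration under assumptions: 
the engine names, the suffix table and the {{write_inline_udfs}} helper are 
made up for the example, and the resulting paths would then be handled the 
same way as any registered file.

```python
import os
import tempfile
from collections import defaultdict

# Hypothetical mapping from script engine name to source file suffix.
SUFFIXES = {"jython": ".py", "janino": ".java", "groovy": ".groovy"}


def write_inline_udfs(snippets):
    """Write inline snippets to one temp file per engine.

    snippets: iterable of (engine, source) pairs.
    Returns {engine: path of the temp file holding all its snippets}.
    """
    by_engine = defaultdict(list)
    for engine, source in snippets:
        by_engine[engine].append(source)

    paths = {}
    for engine, sources in by_engine.items():
        fd, path = tempfile.mkstemp(
            suffix=SUFFIXES.get(engine, ".txt"), prefix="pig_inline_"
        )
        with os.fdopen(fd, "w") as handle:
            handle.write("\n\n".join(sources) + "\n")
        paths[engine] = path
    return paths


# Two Jython snippets end up in the same file; the Java one gets its own.
paths = write_inline_udfs([
    ("jython", "def add(a, b):\n    return a + b"),
    ("janino", "public class Add { }"),
    ("jython", "def shout(s):\n    return s.upper()"),
])
```

The caller is responsible for deleting the temp files once the script has run.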

> UDFs in scripting languages
> ---------------------------
>
>                 Key: PIG-928
>                 URL: https://issues.apache.org/jira/browse/PIG-928
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>             Fix For: 0.8.0
>
>         Attachments: calltrace.png, package.zip, pig-greek.tgz, 
> pig.scripting.patch.arnab, pyg.tgz, scripting.tgz, scripting.tgz, test.zip
>
>
> It should be possible to write UDFs in scripting languages such as python, 
> ruby, etc.  This frees users from needing to compile Java, generate a jar, 
> etc.  It also opens Pig to programmers who prefer scripting languages over 
> Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
