[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888100#action_12888100
 ] 

Ashutosh Chauhan commented on PIG-928:
--------------------------------------

* Do you want to allow: {{register myJavaUDFs.jar using 'java' as 
'javaNameSpace'}} ? Use-case could be that if we are allowing namespaces for 
non-java, why not allow for Java udfs as well. But then {{define}} is exactly 
for this purpose. So, it may make sense to throw exception for such a case.
* In ScriptEngine.getJarPath() shouldn't you throw a FileNotFoundException 
instead of returning null.
* Don't gobble up Checked Exceptions and then rethrow RuntimeExceptions. Throw 
checked exceptions, if you need to.
* ScriptEngine.getInstance() should be a singleton, no?
* In JythonScriptEngine.getFunction() I think you should check if 
interpreter.get(functionName) != null and then return it and call 
Interpreter.init(path) only if its null.
* In JythonUtils, for doing type conversion you should make use of both input 
and output schemas (whenever they are available) and avoid doing reflection for 
every element. You can get hold of input schema through outputSchema() of 
EvalFunc and then do UDFCOntext magic to use it. If schema == null || schema == 
bytearray, you need to resort to reflections. Similarily if outputSchema is 
available via decorators, use it to do type conversions.  
* In jythonUtils.pythonToPig() in case of Tuple, you first create Object[] then 
do Arrays.asList(), you can directly create List<Object> and avoid unnecessary 
casting. In the same method, you are only checking for long, dont you need to 
check for int, String  etc. and then do casting appropriately. Also, in default 
case I think we cant let object pass as it is using Object.class, it could be 
object of any type and may cause cryptic errors in Pipeline, if let through. We 
should throw an exception if we dont know what type of object it is. Similar 
argument for default case of pigToPython() 
* I didn't get why the changes are required in POUserFunc. Can you explain and 
also add it as comments in the code.

Testing:

* This is a big enough feature to warrant its own test file. So, consider 
adding a new test file (may be TestNonJavaUDF). Additionally, we see frequent 
timeouts on TestEvalPipeline, we dont want it to run any longer.
* Instead of adding query through pigServer.registerCode() api, add it through 
pigServer.registerQuery(register myscript.py using "jython"). This will make 
sure we are testing changes in QueryParser.jjt as well.
* Add more tests. Specifically, for complex types passed to the udfs (like bag) 
and returning a bag. You can get bags after doing a group-by. You can also take 
a look at original Julien's patch which contained a python script. Those I 
guess were at right level of complexity to be added as test-cases in our junit 
tests.

Nit-picks:

* Unnecessary import in JythonFunction.java
* In PigContext.java, you are using Vector and LinkedList, instead of usual 
ArrayList. Any particular reason for it, just curious?
* More documentation (in QuerParser.jjt, ScriptEngine, JythonScriptEngine 
(specifically for outputSchema, outputSchemaFunction, schemafunction))
* Also keep an eye of recent "mavenization" efforts of Pig, depending on when 
it gets checked-in you may (or may not) need to make changes to ivy

> UDFs in scripting languages
> ---------------------------
>
>                 Key: PIG-928
>                 URL: https://issues.apache.org/jira/browse/PIG-928
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>            Assignee: Aniket Mokashi
>             Fix For: 0.8.0
>
>         Attachments: calltrace.png, package.zip, PIG-928.patch, 
> pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, 
> RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, 
> RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
> RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, 
> RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip
>
>
> It should be possible to write UDFs in scripting languages such as python, 
> ruby, etc.  This frees users from needing to compile Java, generate a jar, 
> etc.  It also opens Pig to programmers who prefer scripting languages over 
> Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to