[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888100#action_12888100 ]

Ashutosh Chauhan commented on PIG-928:
--------------------------------------

* Do you want to allow {{register myJavaUDFs.jar using 'java' as 'javaNameSpace'}}? The use case would be that if we allow namespaces for non-Java UDFs, why not allow them for Java UDFs as well. But {{define}} exists for exactly this purpose, so it may make sense to throw an exception in such a case.
* In ScriptEngine.getJarPath(), shouldn't you throw a FileNotFoundException instead of returning null?
* Don't swallow checked exceptions and then rethrow them as RuntimeExceptions. Throw checked exceptions if you need to.
* ScriptEngine.getInstance() should be a singleton, no?
* In JythonScriptEngine.getFunction(), I think you should check whether interpreter.get(functionName) != null and return it if so, calling Interpreter.init(path) only if it is null.
* In JythonUtils, for type conversion you should make use of both the input and output schemas (whenever they are available) and avoid doing reflection for every element. You can get hold of the input schema through outputSchema() of EvalFunc and then do UDFContext magic to use it. If schema == null || schema == bytearray, you need to resort to reflection. Similarly, if outputSchema is available via decorators, use it to do type conversions.
* In JythonUtils.pythonToPig(), in the Tuple case, you first create an Object[] and then call Arrays.asList(); you can create a List<Object> directly and avoid unnecessary casting. In the same method, you are only checking for long; don't you need to check for int, String, etc. and then cast appropriately? Also, in the default case I don't think we can let the object pass through as-is using Object.class: it could be an object of any type and may cause cryptic errors in the pipeline if let through. We should throw an exception if we don't know what type of object it is. The same argument applies to the default case of pigToPython().
* I didn't get why the changes are required in POUserFunc. Can you explain, and also add the explanation as comments in the code?
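
To illustrate the pythonToPig() point above, here is a rough sketch of the shape I have in mind. It is plain Java with no Pig or Jython types, and every name in it (ConversionSketch, toPigValue, UnsupportedTypeException) is made up for illustration; the real method would work on Jython objects and throw whichever checked exception the patch settles on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical checked exception, used here only so the sketch is self-contained.
class UnsupportedTypeException extends Exception {
    UnsupportedTypeException(String message) {
        super(message);
    }
}

// Illustrative stand-in for the conversion helper; this is not the patch's JythonUtils.
final class ConversionSketch {
    private ConversionSketch() {}

    static Object toPigValue(Object value) throws UnsupportedTypeException {
        if (value == null) {
            return null;
        }
        // Scalars that map directly are checked explicitly and passed through.
        if (value instanceof Integer || value instanceof Long
                || value instanceof Float || value instanceof Double
                || value instanceof String) {
            return value;
        }
        // Tuple-like input: build the List<Object> directly instead of
        // creating an Object[] and wrapping it with Arrays.asList().
        if (value instanceof Object[]) {
            Object[] elements = (Object[]) value;
            List<Object> fields = new ArrayList<>(elements.length);
            for (Object element : elements) {
                fields.add(toPigValue(element));
            }
            return fields;
        }
        // Default case: fail loudly rather than letting an unknown object through.
        throw new UnsupportedTypeException(
                "Cannot convert value of type " + value.getClass().getName());
    }
}
```

With something like this, an unexpected object type fails at the conversion boundary with a message naming the type, instead of surfacing later in the pipeline.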

Testing:
* This is a big enough feature to warrant its own test file, so consider adding a new one (maybe TestNonJavaUDF). Additionally, we see frequent timeouts on TestEvalPipeline, and we don't want to make it run even longer.
* Instead of adding the query through the pigServer.registerCode() API, add it through pigServer.registerQuery(register myscript.py using "jython"). This will make sure we are testing the changes in QueryParser.jjt as well.
* Add more tests, specifically for complex types passed to the UDFs (like a bag) and for returning a bag. You can get bags after doing a group-by. You can also take a look at Julien's original patch, which contained a python script. Those, I guess, were at the right level of complexity to be added as test cases in our JUnit tests.

Nit-picks:
* Unnecessary import in JythonFunction.java
* In PigContext.java, you are using Vector and LinkedList instead of the usual ArrayList. Any particular reason for it? Just curious.
* More documentation (in QueryParser.jjt, ScriptEngine, JythonScriptEngine (specifically for outputSchema, outputSchemaFunction, schemafunction))
* Also keep an eye on the recent "mavenization" efforts of Pig; depending on when that gets checked in, you may (or may not) need to make changes to ivy

> UDFs in scripting languages
> ---------------------------
>
>                 Key: PIG-928
>                 URL: https://issues.apache.org/jira/browse/PIG-928
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>            Assignee: Aniket Mokashi
>             Fix For: 0.8.0
>
>         Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip
>
>
> It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.