[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847986#action_12847986 ]
Julien Le Dem commented on PIG-928:
-----------------------------------

@Woody

The main advantage of embedding Pig calls in the scripting language is that it enables iterative algorithms, which Pig is not very good at currently. Why would we limit users to UDFs when they can have their whole program in their scripting language of choice?

4. Python is a very interesting language to integrate with Pig because it has all the same native data structures (tuple:tuple, list:bag, dictionary:map), which makes the UDFs compact and easy to code. That said, in scripting languages that don't match the Pig types as well as Python does, using the schema to disambiguate will be a must-have. When do we need to convert sequences and iterators? Pig has only tuple, bag and map as complex types AFAIK.

5. Agreed, it should be cached or initialized at the beginning.

3. and 6. I'll investigate passing the main script through the classpath when I have time. One interpreter would be nice to save memory and initialization time. I'm not sure the shared state is such an advantage, as UDFs should not rely on being run in the same process. Maybe I'm just missing something.

About multi-language support: I'm not against it, but there's not that much code to share. The scripting<->Pig type conversion is specific to each language, as you mentioned. Calling functions, getting a list of functions, and defining output schemas will also be specific.

How I see the multi-language design:

pig local|mapred -script {language} {scriptfile}

main program:
- generic: loads the script file
- generic: makes the script available in the classpath of the tasks (through a jar generated on the fly?)
- specific: initializes the interpreter for the scripting language
- specific: adds the global variables defined by pig for the main (in my case: decorators, pig server instance)
- generic: loads the script in the interpreter
- specific: figures out the list of functions and registers them automatically as UDFs in PIG using a dedicated UDF wrapper class
- specific: runs the main

Pig execute call from the script:
- generic: parse the Pig string to replace ${expression} by the value of the expression as evaluated by the interpreter in the local scope.

UDF init:
- generic: loads the script from the classpath
- specific: initializes the interpreter for the scripting language
- specific: adds the global variables defined by pig for the UDFs (in my case: decorators)
- generic: loads the script in the interpreter
- specific: figures out the runtime for the outputSchema: function call or static schema (parsing of schema generic)

UDF call:
- specific: convert a pig tuple to a parameter list in the scripting language types
- specific: call the function with the parameters
- specific: convert the result to Pig types
- generic: return the result

> UDFs in scripting languages
> ---------------------------
>
>                 Key: PIG-928
>                 URL: https://issues.apache.org/jira/browse/PIG-928
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>         Attachments: package.zip, pyg.tgz, scripting.tgz, scripting.tgz
>
>
> It should be possible to write UDFs in scripting languages such as python,
> ruby, etc. This frees users from needing to compile Java, generate a jar,
> etc. It also opens Pig to programmers who prefer scripting languages over
> Java.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
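
As an illustration of the "Pig execute call" step in the comment above, here is a minimal Python sketch of replacing ${expression} in a Pig string with the value of the expression evaluated in the caller's scope. The names `interpolate` and `example` are hypothetical and do not come from Pig or from the attached patches, and the sketch assumes that expressions never contain a closing brace. It relies on `eval`, which is only reasonable here because the script author writes both the Pig text and the expressions.

```python
import re

# Matches ${...} placeholders. Assumption: the expression itself
# contains no closing brace.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def interpolate(pig_text, scope):
    """Replace each ${expression} in pig_text with str() of its value.

    `scope` is a dict of the caller's variables, for example locals().
    Hypothetical helper, not part of Pig.
    """
    def evaluate(match):
        # Evaluate the expression text against the supplied scope only.
        return str(eval(match.group(1), {}, scope))

    return _PLACEHOLDER.sub(evaluate, pig_text)


def example():
    threshold = 10
    path = "input/day1"
    # Pig positional references such as $0 have no braces, so the
    # pattern leaves them untouched.
    return interpolate(
        "A = LOAD '${path}'; B = FILTER A BY $0 > ${threshold * 2};",
        locals(),
    )
```

Passing the scope explicitly keeps the generic part (parsing the string) separate from the language-specific part (evaluating an expression in the interpreter), which matches the generic/specific split proposed in the comment.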