Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The "TuringCompletePig" page has been changed by AlanGates.
http://wiki.apache.org/pig/TuringCompletePig?action=diff&rev1=2&rev2=3
--------------------------------------------------
Thoughts? Preferences for one of the options I did not like? Comments welcome.
+ == Approach 2 ==
+ And now for something completely different.
+
+ After thinking about the above for a week or so, it occurred to me that in dismissing the idea of making Pig Latin itself Turing complete I was
+ conflating two tasks that can be decoupled. The first is defining a grammar for the language and extending the parser. The second is building an
+ execution engine to run Pig Latin scripts. It is the second that I worry would be too much work. Defining the grammar and building the parser is
+ relatively easy (as we say in the Pig team at Yahoo, "parsers are easy").
+
+ So what if we did extend Pig Latin itself to be Turing complete, but the first implementation compiled it down to Java code that uses the existing
+ !PigServer class to execute it? This meets all ten requirements given above (some extra work would be needed to meet requirement 8, up-front
+ semantic checking, but it is possible). It addresses my initial concern that supporting Turing completeness in Pig Latin is too much work. It also
+ has the exceedingly nice feature that we do not have to pick any one scripting language. The more people I talked to, the more I found that some
+ wanted Python, some Ruby, some Perl, some Groovy, and so on; this avoids that problem. The extensions to Pig Latin itself would be simple enough
+ that learning them should not be onerous. It also means that if we later decide we want more control over how the language is executed, we can
+ make changes without forcing people to migrate away from whatever scripting language we would otherwise have embedded Pig Latin in.
+
+ A significant downside of this proposal is that users now need a Java compiler available in order to run their Pig Latin scripts.
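The Java compiler dependency can be illustrated with the standard `javax.tools` API. The following is a minimal sketch, not part of Pig. It assumes a hypothetical driver class (`GeneratedScriptCompiler` and its methods are illustrative names) that writes the generated source to disk and compiles it at run time. `ToolProvider.getSystemJavaCompiler()` returns null when only a JRE is installed, which is the failure mode this downside implies.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

/** Hypothetical driver step: compile the Java source produced from a Pig Latin script. */
public class GeneratedScriptCompiler {

    /** True when a system Java compiler is available, i.e. we are running on a JDK. */
    public static boolean compilerAvailable() {
        return ToolProvider.getSystemJavaCompiler() != null;
    }

    /**
     * Writes className.java into dir, compiles it there, and returns javac's
     * exit code (0 on success). Real generated scripts would also need the Pig
     * jar passed to javac with -cp; that is omitted in this sketch.
     */
    public static int compile(Path dir, String className, String source) throws IOException {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("no Java compiler found: a JDK, not just a JRE, is required");
        }
        Path file = dir.resolve(className + ".java");
        Files.write(file, source.getBytes(StandardCharsets.UTF_8));
        // Null streams mean javac uses System.in, System.out and System.err.
        return javac.run(null, null, null, "-d", dir.toString(), file.toString());
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("piglatin");
        String source = "public class PigLatinScript { public static void main(String[] args) { } }";
        System.out.println("compiler available: " + compilerAvailable());
        System.out.println("javac exit code: " + compile(dir, "PigLatinScript", source));
    }
}
```

A real driver would go on to load and run the compiled class. This sketch stops at compilation, which is the step that needs the extra tooling.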
+
+ The other concerns I raised above about making Pig Latin Turing complete are partly, but not fully, addressed. It would be possible, though painful,
+ to use a Java debugger on the generated Java code. Syntax highlighting and completion files could be created for Vim, Emacs, Eclipse, and whatever
+ other editors people favor.
+
+ === Specifics ===
+ The grammar of the language should be kept as simple as possible. The goal is not to create a general-purpose programming language; tasks that
+ need general-purpose language features should still be written as UDFs in Java or a scripting language.
+
+ Each Pig Latin file would be treated as a module. All functions would have global scope within their module and would become visible to other
+ scripts once the module is imported.
+
+ The type system would use the existing Pig Latin types (we may need to add a list type). Types would be bound at run time; this is necessary to
+ support the existing Pig Latin grammar, where A = load ... is also the declaration of A.
+
+ The grammar would look something like:
+
+ {{{
+ program:
+     import
+     | register
+     | define
+     | func_definition
+     | block
+
+ import:
+     IMPORT _modulename_ namespace_clause
+
+ namespace_clause:
+     (empty)
+     | AS _namespacename_
+
+ register:
+     ... // as now
+
+ define:
+     ... // as now
+
+ func_definition:
+     DEF _functionname_ ( arg_list ) { block }
+     // Not sure about having DEF and DEFINE as different keywords.
+     // We may want to reuse DEFINE here, or use DEFINE FUNCTION.
+
+ arg_list:
+     expr
+     | arg_list , expr
+
+ block:
+     statement
+     | block statement
+
+ statement:
+     ;
+     | assignment
+     | if
+     | while
+     | for
+     | return         // only valid inside functions
+     | CONTINUE ;     // only valid inside loops
+     | BREAK ;        // only valid inside loops
+     | split
+     | store
+     | dump
+     | fs
+
+ assignment:
+     _var_ = expr ;
+     | _var_ = LOAD _inputsrc_ ;
+     ... // GROUP, FILTER, etc. as now
+
+ statement_or_block:
+     statement
+     | { block }
+
+ if:
+     IF ( expr ) statement_or_block else
+
+ else:
+     (empty)
+     | ELSE statement_or_block
+
+ while:
+     WHILE ( expr ) statement_or_block
+
+ for:
+     FOR ( assignment ; expr ; expr ) statement_or_block
+
+ return:
+     RETURN ;
+     | RETURN expr ;
+
+ // split, dump, store, fs as now
+ }}}
+
+ So the example given initially would look like:
+ {{{
+ error = 100.0;
+ infile = 'original.data';
+ outfile = 'result.data';
+ while (error > 1.0) {
+     A = load infile;
+     B = group A all;
+     C = foreach B generate flatten(doSomeCalculation(A)) as (result, error);
+     error = foreach C generate error;
+     store C into outfile;
+     if (error > 1.0) fs mv outfile infile;
+ }
+ }}}
+
+ and would compile down to Java roughly as:
+
+ {{{
+ import java.io.IOException;
+ import java.util.Iterator;
+
+ import org.apache.pig.ExecType;
+ import org.apache.pig.PigServer;
+ import org.apache.pig.data.Tuple;
+
+ public class PigLatinScript {
+
+     public static void main(String[] args) throws IOException {
+         // Script variables are held as Object because types are bound at run time.
+         Object error = Double.valueOf(100.0);
+         Object infile = "original.data";
+         Object outfile = "result.data";
+         while (error != null && (Double) error > 1.0) {
+             PigServer ps = new PigServer(ExecType.MAPREDUCE);
+             // Script variables must be spliced into the query text at run time.
+             ps.registerQuery("A = load '" + infile + "';");
+             ps.registerQuery("B = group A all;");
+             ps.registerQuery("C = foreach B generate flatten(doSomeCalculation(A)) as (result, error);");
+             ps.registerQuery("error = foreach C generate error;");
+             ps.store("C", (String) outfile);
+             // Pull the single value of relation 'error' back into the script variable.
+             Iterator<Tuple> i = ps.openIterator("error");
+             if (i.hasNext()) error = i.next().get(0);
+             else error = null;
+             if (error != null && (Double) error > 1.0) {
+                 // fs mv outfile infile
+                 ps.renameFile((String) outfile, (String) infile);
+             }
+         }
+     }
+ }
+ }}}
+
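Most of the work in compiling Pig Latin this way is emitting strings. The following is a minimal sketch that assumes a hypothetical emitter class. `PigServerEmitter` and its methods are illustrative names, not Pig classes; only `registerQuery` and `store`, which appear in the emitted text, are real `PigServer` methods, and nothing here calls Pig.

The sketch covers three statement forms:

* a relational statement passed through verbatim;
* a LOAD whose source is a script variable;
* a STORE into a script variable.

The variable cases matter because a script variable's value is only known at run time. The generated Java must therefore concatenate that value into the query text rather than write the variable's name inside the string literal.

```java
/**
 * Hypothetical code-generation helpers for the proposed Pig Latin compiler.
 * Each method returns one line of Java source that drives PigServer.
 */
public class PigServerEmitter {

    /** Escapes backslashes and double quotes so text can sit inside a Java string literal. */
    static String escape(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** A relational statement that mentions no script variables is passed through verbatim. */
    static String emitQuery(String pigStatement) {
        return "ps.registerQuery(\"" + escape(pigStatement) + "\");";
    }

    /**
     * "alias = load var;" where var is a script variable: its value is only known
     * at run time, so the generated code concatenates it into the query string.
     */
    static String emitLoad(String alias, String scriptVar) {
        return "ps.registerQuery(\"" + alias + " = load '\" + " + scriptVar + " + \"';\");";
    }

    /** "store alias into var;" maps onto PigServer.store(alias, filename). */
    static String emitStore(String alias, String scriptVar) {
        return "ps.store(\"" + alias + "\", (String) " + scriptVar + ");";
    }

    public static void main(String[] args) {
        System.out.println(emitLoad("A", "infile"));
        System.out.println(emitQuery("B = group A all;"));
        System.out.println(emitStore("C", "outfile"));
    }
}
```

Running `main` prints `ps.registerQuery("A = load '" + infile + "';");`, then `ps.registerQuery("B = group A all;");`, then `ps.store("C", (String) outfile);`. The spliced run-time value is not escaped, so a path containing a single quote would break the generated query; a fuller implementation would need to quote it.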