The "TuringCompletePig" page has been changed by AlanGates.
http://wiki.apache.org/pig/TuringCompletePig?action=diff&rev1=2&rev2=3

--------------------------------------------------

  
  Thoughts?  Preferences for one of the options I did not like?  Comments welcome.
  
+ == Approach 2 ==
+ And now for something completely different.
+ 
+ After thinking about the above for a week or so, it occurred to me that in dismissing the idea of making Pig Latin
+ itself Turing complete, I was conflating two tasks that could be decoupled.  The first is defining a grammar for the
+ language and extending the parser.  The second is building an execution engine to execute Pig Latin scripts.  It is
+ the second that I am concerned would be too much work.  Defining the grammar and building the parser is relatively
+ easy (as we say in the Pig team at Yahoo, "parsers are easy").
+ 
+ So what if we did extend Pig Latin itself to be Turing complete, but the first pass over the language compiled it
+ down to Java code that used the existing !PigServer class to execute it?  This meets all ten requirements given above
+ (some extra work would be needed to meet requirement 8, up-front semantic checking, but it is possible).  It
+ addresses my initial concern that supporting Turing completeness in Pig Latin is too much work.  It also has the
+ exceedingly nice feature that we do not have to pick any one scripting language.  The more I talked to people, the
+ more I discovered that some wanted Python, some Ruby, some Perl, some Groovy, etc.  This avoids that problem.  The
+ extensions to Pig Latin themselves would be simple enough that learning them should not be onerous.  It also means
+ that if at some future time we decide we want more control over how the language is executed, we can make changes
+ without people having to migrate away from whatever scripting language we would otherwise have embedded Pig Latin in.
+ 
+ A significant downside to this proposal is that users now need a Java compiler available to run their Pig Latin
+ scripts.
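+ 
+ To make that requirement concrete: a launcher for compiled Pig Latin would have to find javac at run time.  The
+ sketch below assumes the launcher uses the standard javax.tools API; the class and method names
+ (GeneratedCodeCompiler, compileToTempDir) are hypothetical.  ToolProvider.getSystemJavaCompiler() returns null when
+ only a bare JRE is present, which is exactly the situation this downside describes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

/** Hypothetical launcher step that compiles the Java generated from a Pig Latin script. */
final class GeneratedCodeCompiler {
    private GeneratedCodeCompiler() {}

    /** True when this JVM can reach a system Java compiler, i.e. it is a JDK and not a bare JRE. */
    static boolean compilerAvailable() {
        return ToolProvider.getSystemJavaCompiler() != null;
    }

    /**
     * Writes one generated class into a fresh temporary directory and compiles it there.
     * classpath may be null; for real generated code it would have to include the Pig jar.
     * Returns the directory holding the compiled .class file.
     */
    static Path compileToTempDir(String className, String source, String classpath) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("no Java compiler found: compiled Pig Latin needs a JDK");
        }
        try {
            Path dir = Files.createTempDirectory("piglatin-");
            Path file = dir.resolve(className + ".java");
            Files.writeString(file, source);
            List<String> args = new ArrayList<>(List.of("-d", dir.toString()));
            if (classpath != null) {
                args.add("-cp");
                args.add(classpath);
            }
            args.add(file.toString());
            // Tool.run(in, out, err, args...): null streams mean System.in/out/err;
            // the result is javac's exit status, 0 on success.
            int status = javac.run(null, null, null, args.toArray(new String[0]));
            if (status != 0) {
                throw new IllegalStateException("javac exited with status " + status);
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

+ For a real script the launcher would pass the Pig jar as the classpath argument and then run the resulting
+ PigLatinScript class with the same classpath.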
+ 
+ The other concerns I raised above about making Pig Latin Turing complete are partially addressed, but not fully.  It
+ would be possible, though painful, to use a Java debugger on the generated Java code.  Syntax highlighting and
+ completion files could be created for Vim, Emacs, Eclipse, and whatever other editors people favor.
+ 
+ === Specifics ===
+ The grammar of the language should be kept as simple as possible.  The goal is not to create a general purpose
+ programming language.  Tasks that need general purpose programming features should still be written as UDFs in Java
+ or a scripting language.
+ 
+ Each Pig Latin file would be treated as a module.  All functions would have global scope within their module and
+ would become visible once the module is imported.
+ 
+ The type system would be the existing Pig Latin types (we may need to add a list type).  Types would be bound at run
+ time; this is necessary to support the existing Pig Latin grammar, where A = load ... is also the declaration of A.
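+ 
+ One consequence of run-time binding is that the generated Java has to declare every Pig Latin variable as Object and
+ check its type wherever it is used.  A minimal sketch of that run-time support follows; the helper names
+ (RuntimeValues, toDouble, toStr) are hypothetical and not part of Pig.

```java
/**
 * Hypothetical run-time support for generated code in which every Pig Latin
 * variable is declared as Object and its type is only known at run time.
 */
final class RuntimeValues {
    private RuntimeValues() {}

    /** Value of a variable used in a numeric context, such as (error > 1.0). */
    static double toDouble(Object value, String varName) {
        if (value instanceof Number n) {
            // Pig's int, long, float and double arrive as Integer, Long, Float and Double
            return n.doubleValue();
        }
        throw typeError(value, varName, "a number");
    }

    /** Value of a variable used where a chararray is expected, such as a load path. */
    static String toStr(Object value, String varName) {
        if (value instanceof String s) {
            return s;
        }
        throw typeError(value, varName, "a chararray");
    }

    /** Builds an error that names the Pig Latin variable rather than a bare cast failure. */
    private static RuntimeException typeError(Object value, String varName, String expected) {
        if (value == null) {
            return new IllegalStateException("variable '" + varName + "' has no value");
        }
        return new ClassCastException("variable '" + varName + "' holds a "
                + value.getClass().getSimpleName() + ", expected " + expected);
    }
}
```

+ With helpers like these, a type mismatch such as using infile in a numeric comparison would fail with a message
+ naming the Pig Latin variable instead of an anonymous ClassCastException from a raw cast.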
+ 
+ The grammar would look something like:
+ 
+ {{{
+ program:
+       program_item
+     | program program_item
+ 
+ program_item:
+       import
+     | register
+     | define
+     | func_definition
+     | block
+ 
+ import:
+       IMPORT _modulename_ namespace_clause
+ 
+ namespace_clause:
+       (empty)
+     | AS _namespacename_
+ 
+ register:
+       ... // as now
+ 
+ define:
+       ... // as now
+ 
+ func_definition:
+       DEF _functionname_ ( arg_list ) { block }
+       // Not sure about having DEF and DEFINE as separate keywords.
+       // May want to reuse DEFINE here, or use DEFINE FUNCTION.
+ 
+ arg_list:
+       (empty)
+     | param_list
+ 
+ param_list:
+       _argname_
+     | param_list , _argname_
+ 
+ block:
+       statement
+     | block statement
+ 
+ statement:
+       ;
+     | assignment
+     | if
+     | while
+     | for
+     | return // only valid inside functions
+     | CONTINUE ; // only valid inside loops
+     | BREAK ; // only valid inside loops
+     | split
+     | store
+     | dump
+     | fs
+ 
+ assignment:
+       _var_ = expr ;
+     | _var_ = LOAD _inputsrc_ ;
+     ... // GROUP, FILTER, etc. as now
+ 
+ statement_or_block:
+       statement
+     | { block }
+ 
+ if:
+       IF ( expr ) statement_or_block else
+ 
+ else:
+       (empty)
+     | ELSE statement_or_block
+ 
+ while:
+       WHILE ( expr ) statement_or_block
+ 
+ for:
+       FOR ( assignment expr ; expr ) statement_or_block
+       // assignment already ends with its own ;
+ 
+ return:
+       RETURN ;
+     | RETURN expr ;
+ 
+ // split, dump, store, fs as now
+ }}}
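+ 
+ To back up the claim that the parser is the easy part, here is a minimal recursive descent sketch covering only the
+ control flow subset of the grammar above (statement_or_block, IF/ELSE, WHILE, FOR).  It is an illustration under
+ simplifying assumptions, not a proposed implementation: expressions and ordinary Pig Latin statements are kept as
+ opaque text, keywords are matched case-insensitively, and quoted strings are assumed to contain no escaped quotes.
+ The class and record names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Recursive descent sketch for the control flow subset of the proposed grammar. */
final class ControlFlowParser {
    interface Node {}
    /** Any other statement (assignment, store, fs, ...), kept as raw text without its ';'. */
    record Simple(String text) implements Node {}
    record Block(List<Node> body) implements Node {}
    record If(String cond, Node then, Node orElse) implements Node {}   // orElse is null without ELSE
    record While(String cond, Node body) implements Node {}
    record For(String header, Node body) implements Node {}             // header = text inside ( )

    private final String src;
    private int pos = 0;

    private ControlFlowParser(String src) { this.src = src; }

    /** program: a sequence of statement_or_block. */
    static List<Node> parseProgram(String src) {
        ControlFlowParser p = new ControlFlowParser(src);
        List<Node> program = new ArrayList<>();
        p.skipWs();
        while (p.pos < p.src.length()) {
            program.add(p.statementOrBlock());
            p.skipWs();
        }
        return program;
    }

    private Node statementOrBlock() {
        skipWs();
        if (peek('{')) {                                  // { block }
            pos++;
            List<Node> body = new ArrayList<>();
            skipWs();
            while (!peek('}')) {
                if (pos >= src.length()) throw error("unterminated block");
                body.add(statementOrBlock());
                skipWs();
            }
            pos++;                                        // consume '}'
            return new Block(body);
        }
        if (keyword("if")) {                              // IF ( expr ) statement_or_block else
            String cond = parenthesized();
            Node then = statementOrBlock();
            skipWs();
            Node orElse = keyword("else") ? statementOrBlock() : null;
            return new If(cond, then, orElse);
        }
        if (keyword("while")) return new While(parenthesized(), statementOrBlock());
        if (keyword("for")) return new For(parenthesized(), statementOrBlock());
        return simple();
    }

    /** Consumes kw if it appears at pos as a whole word (case-insensitive, like Pig keywords). */
    private boolean keyword(String kw) {
        int end = pos + kw.length();
        if (!src.regionMatches(true, pos, kw, 0, kw.length())) return false;
        if (end < src.length() && (Character.isLetterOrDigit(src.charAt(end)) || src.charAt(end) == '_')) return false;
        pos = end;
        return true;
    }

    /** Reads "( ... )" with nested parentheses and returns the trimmed inner text. */
    private String parenthesized() {
        skipWs();
        if (!peek('(')) throw error("expected '('");
        int start = ++pos, depth = 1;
        while (depth > 0) {
            if (pos >= src.length()) throw error("unbalanced parentheses");
            char c = src.charAt(pos++);
            if (c == '(') depth++;
            else if (c == ')') depth--;
        }
        return src.substring(start, pos - 1).trim();
    }

    /** Reads up to the first ';' outside quotes and parentheses. */
    private Node simple() {
        int start = pos, depth = 0;
        boolean inQuote = false;
        while (true) {
            if (pos >= src.length()) throw error("missing ';'");
            char c = src.charAt(pos++);
            if (c == '\'') inQuote = !inQuote;
            else if (!inQuote && c == '(') depth++;
            else if (!inQuote && c == ')') depth--;
            else if (!inQuote && depth == 0 && c == ';') return new Simple(src.substring(start, pos - 1).trim());
        }
    }

    private boolean peek(char c) { return pos < src.length() && src.charAt(pos) == c; }
    private void skipWs() { while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++; }
    private IllegalArgumentException error(String msg) { return new IllegalArgumentException(msg + " at offset " + pos); }
}
```

+ In this sketch's design each Simple node would be handed on to the existing Pig Latin parser unchanged, so only the
+ new control flow constructs need new parsing code.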
+ 
+ So the example given initially would look like:
+ {{{
+     error = 100.0;
+     infile = 'original.data';
+     outfile = 'result.data';
+     while (error > 1.0) {
+         A = load infile;
+         B = group A all;
+         C = foreach B generate flatten(doSomeCalculation(A)) as (result, error);
+         error = foreach C generate error;
+         store C into outfile;
+         if (error > 1.0) fs mv outfile infile;
+     }
+ }}}
+ 
+ and would compile down to Java as:
+ 
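+ {{{
+     import java.util.Iterator;
+ 
+     import org.apache.pig.ExecType;
+     import org.apache.pig.PigServer;
+     import org.apache.pig.data.Tuple;
+ 
+     public class PigLatinScript {
+ 
+         public static void main(String[] args) throws Exception {
+             // Every Pig Latin variable is an Object; its type is bound at run time.
+             Object error = 100.0;
+             Object infile = "original.data";
+             Object outfile = "result.data";
+             while (error != null && (Double) error > 1.0) {
+                 PigServer ps = new PigServer(ExecType.MAPREDUCE);
+                 // Scalar variables are spliced into the query text by value.
+                 ps.registerQuery("A = load '" + infile + "';");
+                 ps.registerQuery("B = group A all;");
+                 ps.registerQuery("C = foreach B generate flatten(doSomeCalculation(A)) as (result, error);");
+                 ps.registerQuery("error = foreach C generate error;");
+                 ps.store("C", (String) outfile);
+                 // Assigning a relation to a scalar takes the first field of its first tuple.
+                 Iterator<Tuple> i = ps.openIterator("error");
+                 if (i.hasNext()) error = i.next().get(0);
+                 else error = null;
+                 if (error != null && (Double) error > 1.0) {
+                     // fs mv outfile infile: replace infile with the new result
+                     ps.deleteFile((String) infile);
+                     ps.renameFile((String) outfile, (String) infile);
+                 }
+             }
+         }
+     }
+ }}}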
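+ 
+ One detail the generated code has to get right is that scalar variables are spliced into the query text by value;
+ writing registerQuery("A = load infile;") would put the variable's name, not its value, into the query.  A minimal
+ sketch of a hypothetical code generator helper (PigLiterals) that renders a value as a Pig Latin literal, assuming
+ Pig's single-quoted string constants with backslash escapes; only chararray (String), int and double values are
+ handled.

```java
/** Hypothetical code generator helper: renders a run-time value as Pig Latin literal text. */
final class PigLiterals {
    private PigLiterals() {}

    static String literal(Object value) {
        if (value instanceof Integer || value instanceof Double) {
            // plain int and double values are written as-is (special values such as NaN are not handled)
            return value.toString();
        }
        if (value instanceof String s) {
            // chararray constants are single-quoted; escape backslashes first, then quotes
            return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
        }
        throw new IllegalArgumentException("cannot inline value of type "
                + (value == null ? "null" : value.getClass().getSimpleName()));
    }

    /** Text for "alias = load path;" with the path variable's current value spliced in. */
    static String load(String alias, Object path) {
        return alias + " = load " + literal(path) + ";";
    }
}
```

+ With this helper the generated load statement would read ps.registerQuery(PigLiterals.load("A", infile)), and a
+ file name containing a quote would still produce a well-formed query.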
+ 
