Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The "TuringCompletePig" page has been changed by AlanGates.
http://wiki.apache.org/pig/TuringCompletePig?action=diff&rev1=1&rev2=2

--------------------------------------------------

  = Making Pig Latin Turing Complete =
  == Introduction ==
  As more users adopt Pig and begin writing their data processing in Pig Latin 
and as they use Pig to process more and more complex
- tasks, a consistent request from these users is to add branches, loops, and 
functions to Pig Latin.  This will enable Pig Latin to
+ tasks, a consistent request from these users has been to add branches, loops, 
and functions to Pig Latin.  This will enable Pig Latin to
  process a whole new class of problems.  Consider, for example, an algorithm 
that needs to iterate until an error estimate is less
  than a given threshold.  This might look like (this just suggests logic, not 
syntax):
  
@@ -22, +22 @@

  
  == Requirements ==
  The following should be provided by this Turing complete Pig Latin:
-  1. Branching.  This will be satisfied by a standard `if` `else if` `else` 
functionality
+  1. Branching.  This will be satisfied by a standard `if / else if / else` 
functionality
   1. Looping.  This should include standard `while` and some form of `for`.  
`for` could be C style or Python style (foreach).  Care needs to be taken to 
select syntax that does not cause confusion with the existing `foreach` 
operator in Pig Latin.
   1. Functions.  
   1. Modules.
@@ -49, +49 @@

   * Which scripting language to choose?  Perl, Python, and Ruby all have 
significant adoption and could make a claim to be the best choice.
   * Syntactic and semantic checking is usually delayed until an embedded bit 
of code is reached in the outer control flow.  Given that Pig jobs can run for 
hours this can mean spending hours to discover a simple typo.
  
- Consider for example if built a python class that wrapped !PigServer and then 
translated the above code snippet.
+ Consider, for example, what happens if Pig provided a Jython class that wrapped !PigServer 
and we then translated the above code snippet.
  
  {{{
      error = 100.0
@@ -68, +68 @@

              grunt.exec("fs mv 'outfile' 'infile'")
  }}}
  
- All of these references to `pig` and `grunt` as objects with command strings 
is undesirable.
+ All of these references to `pig` and `grunt` as objects with command strings 
are undesirable.
  So while I believe that embedding is a much better approach due to the lower 
work load and the plethora of tools available for other
  languages, I do not believe the above is an acceptable way to do it.  Thus I 
would like to place three additional requirements on
  embedded Pig Latin beyond those given above for Turing complete Pig Latin:
@@ -79, +79 @@

  This overcomes two of the three drawbacks noted above.  It does not provide 
for a way to do certain optimizations such as loop
  unrolling, but I think that is acceptable.
  
+ Having rejected the quote style of programming we could choose the Domain 
Specific Language (DSL) option, where we define Pig operators in the
+ target language.  Again using Python as an example:
+ 
+ {{{
+    error = 100.0
+    infile = 'original.data'
+    pig = PigServer()
+    grunt = Grunt()
+    while error > 1.0:
+        A = pig.load(infile, {'loader': 'piggybank.MyLoader'});
+        B = A.group(pig.ALL);
+        C = B.foreach { 
+               innerBag = doSomeCalculation(:A);
+               generate innerBag.flatten().as(:result,  :error)
+        }
+        
+        pi = pig.openIterator(C, 'outfile');
+        output = grunt.fs.cat('outfile');
+        bla = output.partition("\t");
+        error = float(bla[2])
+        if error >= 1.0:
+            grunt.fs.mv('outfile', 'infile');
+ }}}
+ 
+ This meets requirements 7 and 9 above.  It can partially but not fully meet 
8.  It can check that we use the right operators and pass
+ them the right types.  It cannot check the semantics of the operators, for 
example that `infile` exists and is readable.  This might be ok,
+ because it might turn out that things that cannot be checked at script 
compile time should not be checked up front anyway.  As an example, it should 
not 
+ check for `infile` up front because the script may not have created it yet.
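
As a rough illustration of the split described above, the following is a minimal sketch in plain Python of how a DSL-style wrapper could reject wrong operators and wrong argument types while the script is being built, yet leave the existence of `infile` to run time.  Every name here (`Relation`, `load`, `group`, `plan`) is hypothetical and chosen for illustration; none of it is Pig's API or the PIG-928 implementation.

```python
# Hypothetical sketch: these names are assumptions, not part of Pig or PIG-928.
class Relation:
    """A node in a lazily built plan; creating one runs nothing."""

    def __init__(self, op, args, parent=None):
        self.op = op
        self.args = args
        self.parent = parent

    def group(self, key):
        # Argument types are checked when the plan is built, not when it runs.
        if not isinstance(key, str):
            raise TypeError("group key must be a string, got %s"
                            % type(key).__name__)
        return Relation("group", (key,), parent=self)

    def plan(self):
        """Return the operator names from the load down to this relation."""
        ops = []
        node = self
        while node is not None:
            ops.append(node.op)
            node = node.parent
        return list(reversed(ops))


def load(path):
    # Only the argument type is checked here.  Whether `path` exists is left
    # to run time, because an earlier step may not have created it yet.
    if not isinstance(path, str):
        raise TypeError("load path must be a string, got %s"
                        % type(path).__name__)
    return Relation("load", (path,))
```

With this sketch, `load('not-created-yet.data').group('all')` builds a plan even though the file does not exist, `load(42)` fails immediately with a `TypeError`, and a misspelled operator such as `.grup('all')` fails with Python's own `AttributeError`.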
+ 
+ This approach has the advantage that it will integrate very nicely with tools 
from the target language.  Debuggers, IDEs, etc. will all now
+ view some form of Pig Latin as native to their language.
+ 
+ It does, however, have a drawback: we would be creating a new 
dialect of Pig Latin.  There would be a Pig Latin dialect used when writing it
+ directly, and a different dialect for embedding.  This leads to confusion and 
duplication of effort.  So I would like to suggest another
+ requirement:
+ 
+   1.#10 Pig Latin should appear the same in the embedded form as in the 
non-embedded form.
+     
- What might this look like?  Again using the script snippet at the top and 
embedding it in Jython, this might look like:
+ Given all these requirements, what might this look like?  Again using the 
script snippet at the top and embedding it in Jython:
  {{{
      error = 100.0
      infile = 'original.data'
@@ -120, +158 @@

  we have to pick a language that compiles to Java byte code.  That leaves us 
with Jython, JRuby, Groovy, or !JavaScript.  Out of that
  field we already have half of the implementation we need in Jython with 
[[https://issues.apache.org/jira/browse/PIG-928|PIG-928]]. 
  
+ Thoughts?  Preferences for one of the options I did not like?  Comments 
welcome.
+ 
