     * Stream Programming Model / MIT Stream-It
        * official page for stream-it: 
(Articles on the compiler might be useful)
- == Physical Plan Structure ==
+ == Logical and Physical Plans ==
   1. A criterion we are adopting in the redesign of the logical and physical 
layer in Pig is to promote what used to be EvalSpec’s and Cond’s to 
   1. Such approach provides: 1.) a clearer definition of the language; 2.) 
better identification of possibility for optimizations of various form
@@ -154, +154 @@

   1. These are the following exceptions:
      1. Logical Cogroup to be translated into a Physical LocalRearrange and 
      2. Chris mentioned that even Algebraic Functions are exceptions to this.
+ === Logical Plan ===
+ The logical plan will consist of a directed acyclic graph (DAG) with
+ operators as the nodes, and data flow between the operators as the edges.
+ The focus of the logical operators is post-parse checking (such as type
+ checking), optimization, and translation to a physical plan.  The logical
+ operators will contain the information needed to support these objectives.
+ The classes used to model the logical plan are listed below, each with a
+ description.  Interfaces are given for the major classes; in some cases a
+ single interface is shown as an example for a whole set of related classes.
+ ==== LogicalPlan ====
+ The class !LogicalPlan will contain the collection of logical operators.  It
+ will not contain the edges between the operators.  To see why, consider the
+ following simple Pig script:
+ {{{
+ a = load 'myfile';
+ b = filter a by $0 > 5;
+ store b into 'myfilteredfile';
+ }}}
+ This will generate a logical plan that looks something like this:
+ attachment:SimpleLP.jpg
+ Notice that the graph edges represent data flow between relational operators
+ (LOAD, FILTER, STORE, PROJECT) and between expression operators (greater than,
+ CONSTANT), and between the two types.  This means that edges in the graph have
+ meaning beyond simply input and output for a node.  For example, the filter
+ node has two inputs, the load node and its condition (in this case, the
+ greater than node).  These inputs, however, have different semantics: tuples
+ coming from the load input are evaluated against the boolean result coming
+ from the conditional input, and only those for which it is true are passed
+ on to the store node.  Given the differing semantics of different inputs, it
+ seems better to encode the edges of the graph in the logical operators
+ themselves, rather than in a generic graph container object.
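+ To make the point concrete, here is a minimal, self-contained Java sketch of
+ the example script in which each operator stores its own edges in named
+ fields.  The classes (SimpleScript, Op, Load, Project, Const, !GreaterThan,
+ Filter, Store) are hypothetical stand-ins invented for illustration, not the
+ Pig classes described below; the only point is that a filter can tell its
+ data-flow input from its condition by field, rather than by position in a
+ generic edge list.

```java
// Hypothetical, stripped-down operators for illustration only; these are not
// the real Pig classes. Each operator keeps its own edges in named fields.
class SimpleScript {
    // a = load 'myfile'; b = filter a by $0 > 5; store b into 'myfilteredfile';
    static Store build() {
        Load a = new Load("myfile");
        Filter b = new Filter(a, new GreaterThan(new Project(0), new Const(5)));
        return new Store(b, "myfilteredfile");
    }

    public static void main(String[] args) {
        Filter f = (Filter) build().input;
        // The filter tells its two inputs apart by field, not by position.
        System.out.println("data input: " + f.input.name());
        System.out.println("condition:  " + f.condition.name());
    }
}

abstract class Op {
    abstract String name();
}

class Load extends Op {
    final String file;
    Load(String file) { this.file = file; }
    String name() { return "LOAD"; }
}

class Project extends Op {
    final int column;  // $0 -> column 0
    Project(int column) { this.column = column; }
    String name() { return "PROJECT"; }
}

class Const extends Op {
    final int value;
    Const(int value) { this.value = value; }
    String name() { return "CONSTANT"; }
}

class GreaterThan extends Op {
    final Op lhs;
    final Op rhs;
    GreaterThan(Op lhs, Op rhs) { this.lhs = lhs; this.rhs = rhs; }
    String name() { return "GREATER_THAN"; }
}

class Filter extends Op {
    final Op input;      // data-flow input: tuples arrive here
    final Op condition;  // contextual input: decides which tuples pass
    Filter(Op input, Op condition) { this.input = input; this.condition = condition; }
    String name() { return "FILTER"; }
}

class Store extends Op {
    final Op input;
    final String file;
    Store(Op input, String file) { this.input = input; this.file = file; }
    String name() { return "STORE"; }
}
```

+ Running SimpleScript prints the filter's data input (LOAD) and its condition
+ (GREATER_THAN); a container that only recorded "FILTER has two inputs" could
+ not make that distinction without extra bookkeeping.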
+ In addition to containing the collection of logical operators, !LogicalPlan
+ will provide methods for callers to insert logical operators into the graph,
+ and connect the inputs and outputs of operators.  These connections will only
+ be for data flow inputs and outputs, not contextual inputs such as the
+ condition on a filter node.  This strains the model described above somewhat,
+ but it allows for generic manipulation of inputs and outputs of the operators
+ without every visitor to the tree needing to understand all the different
+ operator types.
+ The interface for !LogicalPlan is:
+ {{{
+ import java.io.IOException;
+ import java.io.Serializable;
+ import java.util.List;
+ import java.util.Map;
+ public class LogicalPlan implements Serializable {
+     private static final long serialVersionUID = 2L;
+     protected PigContext mContext = null;
+     protected Map<LogicalOperator, OperatorKey> mOps;
+     protected Map<OperatorKey, LogicalOperator> mKeys;
+     private List<LogicalOperator> mRoots;
+     public LogicalPlan(PigContext pigContext) {
+         ...
+     }
+     public List<LogicalOperator> getRoots() {
+         ...
+     }
+     public PigContext getPigContext() {
+         return mContext;
+     }
+     public byte getOutputType() {
+         // Assumes a single root whose type is the output type of the plan.
+         return mRoots.get(0).getType();
+     }
+     /**
+      * Given an operator, find its OperatorKey.
+      * @param op Logical operator.
+      * @return associated OperatorKey
+      */
+     public OperatorKey getOperatorKey(LogicalOperator op) {
+         return mOps.get(op);
+     }
+     /**
+      * Given an OperatorKey, find the associated operator.
+      * @param opKey OperatorKey
+      * @return associated operator.
+      */
+     public LogicalOperator getOperator(OperatorKey opKey) {
+         return mKeys.get(opKey);
+     }
+     /**
+      * Insert an operator into the plan.  This only inserts it as a node in
+      * the graph; it does not connect it to any other operators.  That should
+      * be done as a separate step using makeInput or addInput.
+      * @param op Logical Operator to add to the plan.
+      */
+     public void add(LogicalOperator op) {
+         ...
+     }
+     /**
+      * Make one operator the <b>sole</b> input of another.  If inputOf
+      * already has an input, that existing input becomes op's input.  For
+      * example, if the plan currently contains three nodes a, b, and c, with
+      * a as c's input, then calling makeInput(b, c) makes a the input of b
+      * and b the input of c.
+      * @param op Operator to make input of another operator.
+      * @param inputOf Operator to make op an input of.
+      * @throws IOException if op or inputOf are not in the plan.
+      */
+     public void makeInput(LogicalOperator op,
+                           LogicalOperator inputOf) throws IOException {
+         ...
+     }
+     /**
+      * Make one operator an <b>additional</b> input of another.  This can only
+      * legally be called on operators that can have multiple inputs, such as
+      * Cogroup, Generate, or BinaryExpression.
+      * @param op Operator to make input of another operator.
+      * @param inputOf Operator to make op an input of.
+      * @throws IOException if op or inputOf are not in the plan.
+      */
+     public void addInput(LogicalOperator op,
+                          LogicalOperator inputOf) throws IOException {
+         ...
+     }
+     /**
+      * Remove an operator from the plan.  Connections in the graph will be
+      * reconnected after the operator is removed.  So if a is b's input and b
+      * is c's input, and b is removed, then a will become c's input.
+      * @param op Operator to remove.
+      * @throws IOException if op is not in the plan.
+      */
+     public void remove(LogicalOperator op) throws IOException {
+         ...
+     }
+ }
+ }}}
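+ The rewiring rules in the makeInput, addInput, and remove comments can be
+ checked with a small stand-alone model.  The Java sketch below is hypothetical:
+ !MiniPlan and its nested Node class stand in for !LogicalPlan and
+ !LogicalOperator, only data-flow edges are modeled, and an unchecked
+ !IllegalArgumentException replaces the IOException of the interface to keep
+ the example short.  It also assumes that when the target of makeInput has
+ several inputs, all of them are moved onto the new operator.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the LogicalPlan rewiring rules; MiniPlan and Node are
// stand-ins for LogicalPlan and LogicalOperator. Only data-flow edges exist.
class MiniPlan {
    static class Node {
        final String name;
        final List<Node> inputs = new ArrayList<>();
        final List<Node> outputs = new ArrayList<>(); // back pointers
        Node(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    private final Set<Node> nodes = new HashSet<>();

    /** Insert a node without connecting it to anything. */
    void add(Node op) { nodes.add(op); }

    /** Make op the sole input of inputOf; inputOf's old inputs now feed op. */
    void makeInput(Node op, Node inputOf) {
        check(op, inputOf);
        for (Node old : inputOf.inputs) {
            old.outputs.remove(inputOf);
            connect(old, op);
        }
        inputOf.inputs.clear();
        connect(op, inputOf);
    }

    /** Make op an additional input of inputOf. */
    void addInput(Node op, Node inputOf) {
        check(op, inputOf);
        connect(op, inputOf);
    }

    /** Remove op, splicing its inputs onto its outputs at op's old position. */
    void remove(Node op) {
        check(op);
        for (Node in : op.inputs) {
            in.outputs.remove(op);
        }
        for (Node out : op.outputs) {
            int pos = out.inputs.indexOf(op);
            out.inputs.remove(pos);             // remove(int): by index
            out.inputs.addAll(pos, op.inputs);
            for (Node in : op.inputs) {
                in.outputs.add(out);
            }
        }
        op.inputs.clear();
        op.outputs.clear();
        nodes.remove(op);
    }

    private void check(Node... ops) {
        for (Node op : ops) {
            if (!nodes.contains(op)) {
                throw new IllegalArgumentException(op + " is not in the plan");
            }
        }
    }

    private static void connect(Node from, Node to) {
        to.inputs.add(from);
        from.outputs.add(to);
    }

    public static void main(String[] args) {
        MiniPlan plan = new MiniPlan();
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        plan.add(a);
        plan.add(b);
        plan.add(c);
        plan.addInput(a, c);   // a -> c
        plan.makeInput(b, c);  // a -> b -> c
        System.out.println(b.inputs + " " + c.inputs);   // [a] [b]
        plan.remove(b);        // a -> c
        System.out.println(c.inputs + " " + a.outputs);  // [a] [c]
    }
}
```

+ Keeping both inputs and outputs lists (the back pointers that
+ !LogicalOperator also carries) is what lets remove splice a node out in one
+ pass without searching the whole plan for the node's consumers.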
+ ==== LogicalOperator ====
+ All logical operators will be subclasses of !LogicalOperator.
+ !LogicalOperator itself will contain lists of the inputs and outputs of the
+ operator, the schema for the operator, and the data type of the operator.
+ {{{
+ import java.io.Serializable;
+ import java.util.List;
+ public abstract class LogicalOperator implements Serializable {
+     private static final long serialVersionUID = 2L;
+     /**
+      * Schema associated with this logical operator.
+      */
+     protected Schema mSchema;
+     /**
+      * OperatorKey associated with this operator.  This key is used to find
+      * this operator in the LogicalPlan.
+      */
+     protected OperatorKey mKey;
+     /**
+      * Data type of the output of this operator.  Operators start out with
+      * their data type set to UNKNOWN, and have it set for them by the type
+      * checker.
+      */
+     protected byte mType = DataType.UNKNOWN;
+     /**
+      * Requested level of parallelism for this operation.
+      */
+     protected int mRequestedParallelism;
+     /**
+      * References to this operator's inputs.
+      */
+     protected List<LogicalOperator> mInputs;
+     /**
+      * Back pointers so that the logical plan can be navigated in either
+      * direction.
+      */
+     protected List<LogicalOperator> mOutputs;
+     /**
+      * Equivalent to LogicalOperator(k, 0).
+      * @param k Operator key to assign to this node.
+      */
+     public LogicalOperator(OperatorKey k) {
+         this(k, 0);
+     }
+     /**
+      * @param k Operator key to assign to this node.
+      * @param rp Degree of requested parallelism with which to execute this
+      * operator.
+      */
+     public LogicalOperator(OperatorKey k, int rp) {
+         ...
+     }
+     /**
+      * Get the operator key for this operator.
+      */
+     public OperatorKey getOperatorKey() {
+         return mKey;
+     }
+     /**
+      * Set the schema for this operator.
+      * @param schema Schema to set.
+      */
+     public void setSchema(Schema schema) {
+         mSchema = schema;
+     }
+     /**
+      * Get the schema for the output of this operator.
+      */
+     public Schema getSchema() {
+         return mSchema;
+     }
+     /**
+      * Set the type of this operator.  This should only be called by the type
+      * checking routines.
+      * @param t Type to set this operator to.
+      */
+     public final void setType(byte t) {
+         mType = t;
+     }
+     /**
+      * Get the type of this operator.
+      */
+     public byte getType() {
+         return mType;
+     }
+     /**
+      * Get a list of all inputs to the operator.
+      */
+     public List<LogicalOperator> getInputs() {
+         return mInputs;
+     }
+     /**
+      * Get a list of all outputs of the operator.
+      */
+     public List<LogicalOperator> getOutputs() {
+         return mOutputs;
+     }
+     public abstract void visit(LOVisitor v) throws ParseException;
+     public abstract String name();
+     @Override
+     public String toString() {
+               ...
+     }
+ }
+ }}}
+ Each of the relational operators will be modeled as a logical operator.
+ There will be a class !ExpressionOperator that extends !LogicalOperator and
+ represents all types of expression operators.  The class hierarchy will look
+ like:
+ Extenders of !LogicalOperator:
+   * LOLoad
+   * LOStore
+   * LO!ForEach
+   * LOGenerate
+   * LOFilter
+   * LO!CoGroup
+   * LOSort
+   * LODistinct
+   * LOProject
+   * LO!MapLookup
+   * LOStream
+   * LOSplit
+   * LOUnion
+   * !ExpressionOperator (abstract, represents all expression types)
+ Extenders of !ExpressionOperator:
+   * !BinaryExpressionOperator (abstract, represents all binary expressions)
+   * !UnaryExpressionOperator (abstract, represents all unary expressions)
+   * LO!BinCond
+   * LOConst (constant values)
+   * LO!UserFunc (invocation of user defined function)
+   * LOParend
+   * LOCast
+ Extenders of !BinaryExpressionOperator:
+   * LOAnd
+   * LOOr
+   * LO!GreaterThan
+   * LO!GreaterThanEqual
+   * LOEqual
+   * LO!LesserThan
+   * LO!LesserThanEqual
+   * LO!NotEqual
+   * LOAdd
+   * LOSubtract
+   * LOMultiply
+   * LODivide
+   * LOMod
+ Extenders of !UnaryExpressionOperator:
+   * LONot
+   * LONegative
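+ The following Java sketch shows how the expression part of this hierarchy and
+ the UNKNOWN-until-checked type field might fit together.  Everything in it is
+ hypothetical: !ExprOp, !BinaryExprOp, Const, !GreaterThan, Add, and
+ !TypeChecker are simplified stand-ins, the byte type codes are not Pig's
+ !DataType constants, and the rule that arithmetic takes the wider operand type
+ is an assumed promotion rule.  The checker types the operands before the
+ operator that uses them, so a comparison always sees typed inputs.

```java
// Hypothetical sketch: simplified expression operators whose type starts out
// UNKNOWN and is filled in by a bottom-up type checker. The type codes and the
// promotion rule are assumptions, not Pig's DataType constants or type rules.
class TypeCheckDemo {
    public static void main(String[] args) {
        // (1 + 2.5) > 3
        Add sum = new Add(new Const(1), new Const(2.5));
        GreaterThan gt = new GreaterThan(sum, new Const(3));
        System.out.println(gt.getType());                        // 0 (UNKNOWN)
        TypeChecker.check(gt);
        System.out.println(sum.getType() + " " + gt.getType());  // 3 1
    }
}

abstract class ExprOp {
    // INTEGER < DOUBLE so that Math.max picks the wider numeric type.
    static final byte UNKNOWN = 0, BOOLEAN = 1, INTEGER = 2, DOUBLE = 3;
    private byte type = UNKNOWN;  // every operator starts out UNKNOWN
    final void setType(byte t) { type = t; }
    byte getType() { return type; }
}

abstract class BinaryExprOp extends ExprOp {
    final ExprOp lhs;
    final ExprOp rhs;
    BinaryExprOp(ExprOp lhs, ExprOp rhs) { this.lhs = lhs; this.rhs = rhs; }
}

class Const extends ExprOp {
    final Object value;
    Const(Object value) { this.value = value; }
}

class GreaterThan extends BinaryExprOp {
    GreaterThan(ExprOp lhs, ExprOp rhs) { super(lhs, rhs); }
}

class Add extends BinaryExprOp {
    Add(ExprOp lhs, ExprOp rhs) { super(lhs, rhs); }
}

class TypeChecker {
    /** Post-order walk: operands are typed before the operator using them. */
    static void check(ExprOp op) {
        if (op instanceof Const) {
            Object v = ((Const) op).value;
            op.setType(v instanceof Integer ? ExprOp.INTEGER
                     : v instanceof Double ? ExprOp.DOUBLE
                     : ExprOp.UNKNOWN);
        } else if (op instanceof BinaryExprOp) {
            BinaryExprOp b = (BinaryExprOp) op;
            check(b.lhs);
            check(b.rhs);
            if (op instanceof GreaterThan) {
                op.setType(ExprOp.BOOLEAN);  // comparisons yield booleans
            } else {
                // Assumed rule: arithmetic takes the wider operand type.
                op.setType((byte) Math.max(b.lhs.getType(), b.rhs.getType()));
            }
        }
    }
}
```

+ Making setType final mirrors the design above, where only the type checking
+ routines are expected to assign types.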
+ ==== Logical Plan Visitors ====
+ The method for accessing the logical plan will be a visitor class, !LOVisitor.
+ This class will contain the logic for traversing logical plans.  Any class
+ that needs to operate on the plan should extend this class.  The extending
+ class need not provide logic to navigate the logical plan (unless it needs to
+ navigate it in some non-standard way); it only needs to provide logic for the
+ specific operations it wants to perform on the plan.
+ LOVisitor:
+ {{{
+ public abstract class LOVisitor {
+     /**
+      * Extenders of this class should implement this to call either
+      * depthFirst() or dependencyOrder().  If order is not important,
+      * depthFirst() should be called, as it's faster.  If it is important
+      * that your nodes only be visited after all the nodes they depend on
+      * have been visited, then you should call dependencyOrder() instead.
+      */
+     public abstract void visit();
+     /**
+      * Only LOFilter.visit() and subclass implementations of this function
+      * should ever call this method.
+      */
+     protected void visitFilter(LOFilter f) throws ParseException {
+         f.getCondition().visit(this);
+         f.getInput().visit(this);
+     }
+     // And so on with the other LO operators.
+     /**
+      * Visit the graph in a depth first traversal.
+      */
+     protected void depthFirst() {
+         // TODO
+     }
+     /**
+      * Visit the graph in a way that guarantees that no node is visited before
+      * all the nodes it depends on (that is, all those giving it input) have
+      * already been visited.
+      */
+     protected void dependencyOrder() {
+         // TODO
+     }
+ }
+ }}}
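+ Both traversals are left as TODO in the visitor.  One plausible way to fill
+ them in is sketched below in stand-alone Java; !TraversalSketch and its Op
+ class are hypothetical, and traversal starts from the plan's output operators
+ (such as stores) and follows data-flow inputs, as visitFilter does.
+ dependencyOrder is implemented as a post-order depth first search, which on a
+ DAG yields a topological order; depthFirst is a pre-order search that visits
+ each node once but may reach a node before its inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the two LOVisitor traversals. Op stands in for
// LogicalOperator; traversal starts at the plan's outputs and follows inputs.
class TraversalSketch {
    static class Op {
        final String name;
        final List<Op> inputs;
        Op(String name, Op... inputs) {
            this.name = name;
            this.inputs = Arrays.asList(inputs);
        }
    }

    /** Pre-order DFS: each node once, but possibly before its inputs. */
    static List<String> depthFirst(List<Op> outputs) {
        List<String> order = new ArrayList<>();
        Set<Op> seen = new HashSet<>();
        for (Op op : outputs) {
            pre(op, seen, order);
        }
        return order;
    }

    private static void pre(Op op, Set<Op> seen, List<String> order) {
        if (!seen.add(op)) return;  // already visited via another path
        order.add(op.name);
        for (Op in : op.inputs) pre(in, seen, order);
    }

    /** Post-order DFS: on a DAG, every node follows all of its inputs. */
    static List<String> dependencyOrder(List<Op> outputs) {
        List<String> order = new ArrayList<>();
        Set<Op> seen = new HashSet<>();
        for (Op op : outputs) {
            post(op, seen, order);
        }
        return order;
    }

    private static void post(Op op, Set<Op> seen, List<String> order) {
        if (!seen.add(op)) return;
        for (Op in : op.inputs) post(in, seen, order);
        order.add(op.name);  // emitted only after all inputs are emitted
    }

    public static void main(String[] args) {
        // Diamond: load feeds two filters whose results are unioned and stored.
        Op load = new Op("load");
        Op f1 = new Op("filter1", load);
        Op f2 = new Op("filter2", load);
        Op union = new Op("union", f1, f2);
        Op store = new Op("store", union);
        List<Op> outs = Arrays.asList(store);
        System.out.println(depthFirst(outs));       // [store, union, filter1, load, filter2]
        System.out.println(dependencyOrder(outs));  // [load, filter1, filter2, union, store]
    }
}
```

+ On the diamond-shaped plan in main, depthFirst reaches filter1 before load,
+ while dependencyOrder always emits load before either filter that reads it.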
+ === Physical Plan Structure ===
  === Logical to Physical Translation Scheme ===

