[Pig Wiki] Update of PlanTestingHelper by PiSong

2008-05-26 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Pig Wiki for change 
notification.

The following page has been changed by PiSong:
http://wiki.apache.org/pig/PlanTestingHelper

New page:
= Plan Testing Helper =

This is a small utility that I developed for testing my type checking logic. I 
think it might be useful for other people as well so I have refactored a bit to 
make it more generic.

== Use cases ==
Here are steps that I do for type checking:-
 * Construct a plan
 * Run type-checking logic against the plan
 * Construct the expected plan
 * Compare structures of the actual plan and the expected plan.

Here are steps that one might do for query parser:-
 * Given a query string, construct the plan.
 * Construct the expected plan
 * Compare two plans

Here for testing plan optimizer:-
 * Construct a plan
 * Run optimizer
 * Construct the expected plan
 * Compare structures of the actual plan and the expected plan.

== What can be facilitated? ==
So there are two common bits from above use cases:-
 1. Construct the expected plan
 1. Compare two plans

== Construct a plan ==

 What is Dot Language? 
Dot language is a text graph description language. There are three main object 
types: node, edge, and graph. All of them can have custom attributes.

 Sample Dot graph 
{{{
digraph plan1 {

load [color=black]

load - distinct - split - splitOut1 [style=dotted] ;
split - splitOut2 ;
splitOut1 - cross ;
splitOut2 - cross ;
}
}}}

'''Note''': digraph dictates that this is a description of directed graph 
which is the domain we're interested in.

'''Note''': load [color=black] is attaching an attribute to the node. This 
is optional.

By extending Dot a bit, we can encode our logical plan in the following format:-

{{{
digraph graph1 {

load[key=114, type=LOLoad, schema=field1: int, field2: float] 
;
distinct[key=115, type=LODistinct, schema=field1: int, field2: 
float] ;
split   [key=116, type=LOSplit, schema=field1: int, field2: 
float] ;
splitout1   [key=117, type=LOForEach, schema=field1: int, field2: 
float] ;
splitout2   [key=117, type=LOForEach, schema=field1: int, field2: 
float] 
cross   [key=119, type=LOCross, schema=field1: int, field2: float, 
field3: chararray] ;

load - distinct - split - splitOut1 ;
split - splitOut2 ;
splitOut1 - cross ;
splitOut2 - cross ;
}
}}}
And this can be translated to a plan using a loader class (API will be provided)

== Compare two plans ==

I will provide API like this:-

{{{
/***
 * This abstract class is a base for plan comparer
 */

public abstract class PlanStructuralComparerE extends Operator,
 P extends OperatorPlanE {

/***
 * This method does structural comparison of two plans based on:-
 *  - Graph connectivity
 *
 * The current implementation is based on simple key-based
 * vertex matching.
 *
 * @param plan1 the first plan
 * @param plan2 the second plan
 * @param messages where the error messages go
 * @return
 */
public boolean structurallyEquals(P plan1, P plan2, StringBuilder messages) 
 ;


/***
 * Same as above in case just want to compare but
 * don't want to know the error messages
 * @param plan1
 * @param plan2
 * @return
 */
public boolean structurallyEquals(P plan1, P plan2) ;
}
}}}

A subtype which is interested in type information would look like this:-

{{{
/***
 * This class is used for LogicalPlan comparison
 */
public class LogicalPlanComparer
extends PlanStructuralComparerLogicalOperator, LogicalPlan {

/***
 * This method does naive structural comparison of two plans.
 *
 * Things we compare :-
 *  - Things compared in the super class
 *  - Types of matching nodes
 *  - Schema associated with each operator
 *
 * @param plan1
 * @param plan2
 * @param messages
 * @return
 */
@Override
public boolean structurallyEquals(LogicalPlan plan1,
  LogicalPlan plan2,
  StringBuilder messages) {
// Stage 1: Compare connectivity
if (!super.structurallyEquals(plan1, plan2, messages)) return false ;

// Stage 2: Compare node types
if (isMismatchNodeType(plan1, plan2, messages)) return false ;

// Stage 3: Compare schemas
if (isMismatchSchemas(plan1, plan2, messages)) return false ;
   
// else
return true ;
}
}}}

== Dot Trick ==
One can plot a graph written in Dot language by just doing like:-
{{{
dot -Tpng dot1.dot  dot1.png
}}}
Or alternatively,
{{{
dotty dot1.dot
}}}
NOTE: You need graphviz installed on your machine to do these things.

Here is a sample graph generated from the given sample.

http://people.apache.org/~pisong/dot1.png

= Current Status  Issues =
 * Working code  will be 

[Pig Wiki] Trivial Update of PlanTestingHelper by PiSong

2008-05-26 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Pig Wiki for change 
notification.

The following page has been changed by PiSong:
http://wiki.apache.org/pig/PlanTestingHelper

--
   What is Dot Language? 
  Dot language is a text graph description language. There are three main 
object types: node, edge, and graph. All of them can have custom attributes.
  
+ ( See http://en.wikipedia.org/wiki/DOT_language )
+ 
   Sample Dot graph 
  {{{
  digraph plan1 {
@@ -53, +55 @@

  
  {{{
  digraph graph1 {
+ 
+ node [schema=field1: int, field2: float]   /* node means the 
following nodes will inherit this attribute */
  
- load  [key=114, type=LOLoad, schema=field1: int, field2: float] 
;
- distinct  [key=115, type=LODistinct, schema=field1: int, field2: 
float] ;
- split [key=116, type=LOSplit, schema=field1: int, field2: 
float] ;
- splitout1 [key=117, type=LOForEach, schema=field1: int, 
field2: float] ;
- splitout2 [key=117, type=LOForEach, schema=field1: int, 
field2: float] 
+ load  [key=1, type=LOLoad] ;
+ distinct  [key=2, type=LODistinct] ;
+ split [key=3, type=LOSplit] ;
+ splitout1 [key=4, type=LOForEach] ;
+ splitout2 [key=5, type=LOForEach] 
- cross [key=119, type=LOCross, schema=field1: int, field2: float, 
field3: chararray] ;
+ cross [key=6, type=LOCross, schema=field1: int, field2: float, 
field3: int, field4: float] ;
  
  load - distinct - split - splitOut1 ;
  split - splitOut2 ;
@@ -67, +71 @@

  splitOut2 - cross ;
  }
  }}}
+ 
  And this can be translated to a plan using a loader class (API will be 
provided)
  
  == Compare two plans ==
@@ -164, +169 @@

  
  = Current Status  Issues =
   * Working code  will be available in 1-2 days (Today = 26th May)
-  * Doesn't work with inner plans yet. Inner plans may have to be constructed 
and compare separately.
+  * Doesn't work with inner plans yet. Inner plans may have to be constructed 
separately.
  
  == Appendix ==