The following page has been changed by PiSong:
http://wiki.apache.org/pig/PlanTestingHelper
New page:
= Plan Testing Helper =
This is a small utility that I developed for testing my type-checking logic. I
think it might be useful to other people as well, so I have refactored it a bit
to make it more generic.
== Use cases ==
Here are the steps I follow when testing type checking:
* Construct a plan
* Run the type-checking logic against the plan
* Construct the expected plan
* Compare the structures of the actual plan and the expected plan
Here are the steps one might follow when testing the query parser:
* Given a query string, construct the plan
* Construct the expected plan
* Compare the two plans
Here are the steps for testing a plan optimizer:
* Construct a plan
* Run the optimizer
* Construct the expected plan
* Compare the structures of the actual plan and the expected plan
== What can be facilitated? ==
Two steps are common to all of the use cases above:
1. Construct the expected plan
1. Compare two plans
== Construct a plan ==
What is Dot Language?
Dot is a plain-text graph description language. It has three main object
types: node, edge, and graph. Each of them can carry custom attributes.
Sample Dot graph:
{{{
digraph plan1 {
    load [color=black] ;
    load -> distinct -> split -> splitOut1 [style=dotted] ;
    split -> splitOut2 ;
    splitOut1 -> cross ;
    splitOut2 -> cross ;
}
}}}
'''Note''': digraph declares that this is a description of a directed graph,
which is the kind of graph we are interested in.
'''Note''': load [color=black] attaches an attribute to the node. This
is optional.
By extending Dot a bit, we can encode our logical plan in the following format:
{{{
digraph graph1 {
    load [key=114, type=LOLoad, schema=field1: int, field2: float] ;
    distinct [key=115, type=LODistinct, schema=field1: int, field2: float] ;
    split [key=116, type=LOSplit, schema=field1: int, field2: float] ;
    splitOut1 [key=117, type=LOForEach, schema=field1: int, field2: float] ;
    splitOut2 [key=118, type=LOForEach, schema=field1: int, field2: float] ;
    cross [key=119, type=LOCross, schema=field1: int, field2: float, field3: chararray] ;
    load -> distinct -> split -> splitOut1 ;
    split -> splitOut2 ;
    splitOut1 -> cross ;
    splitOut2 -> cross ;
}
}}}
This text can then be translated into a plan using a loader class (the API will be provided).
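The loader API itself is not shown on this page. As a rough illustration of what such a loader has to do first, here is a minimal sketch that reads the extended Dot text into node attributes and directed edges. Everything in it is hypothetical and is not the Pig loader: the class name `DotPlanSketch`, its fields, and the parsing rules are assumptions. It assumes that every statement ends with `;`, that edges use Dot's `->` operator, and that a comma-separated piece without `=` (such as the second field of an unquoted schema) continues the previous attribute value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT the Pig loader API: reads the extended Dot text
// into node attributes and directed edges.
final class DotPlanSketch {
    // node name -> attributes (key, type, schema, ...), in declaration order
    final Map<String, Map<String, String>> nodes = new LinkedHashMap<>();
    // each edge is a {from, to} pair of node names
    final List<String[]> edges = new ArrayList<>();

    static DotPlanSketch parse(String dot) {
        DotPlanSketch plan = new DotPlanSketch();
        int open = dot.indexOf('{');
        int close = dot.lastIndexOf('}');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("expected 'digraph name { ... }'");
        }
        // Assumption: statements are separated by ';' and may span lines.
        for (String raw : dot.substring(open + 1, close).split(";")) {
            String stmt = raw.trim().replaceAll("\\s+", " ");
            if (stmt.isEmpty()) continue;
            if (stmt.contains("->")) {
                plan.addEdgeChain(stmt);
            } else {
                plan.addNode(stmt);
            }
        }
        return plan;
    }

    private void addNode(String stmt) {
        int lb = stmt.indexOf('[');
        int rb = stmt.lastIndexOf(']');
        if (lb < 0 || rb < lb) {            // bare node name, no attributes
            nodes.putIfAbsent(stmt, new LinkedHashMap<>());
            return;
        }
        String name = stmt.substring(0, lb).trim();
        Map<String, String> attrs =
                nodes.computeIfAbsent(name, k -> new LinkedHashMap<>());
        String lastKey = null;
        for (String piece : stmt.substring(lb + 1, rb).split(",")) {
            int eq = piece.indexOf('=');
            if (eq >= 0) {
                lastKey = piece.substring(0, eq).trim();
                attrs.put(lastKey, piece.substring(eq + 1).trim());
            } else if (lastKey != null) {
                // No '=': treat the piece as a continuation of the previous
                // value, e.g. "field2: float" inside an unquoted schema.
                attrs.put(lastKey, attrs.get(lastKey) + ", " + piece.trim());
            }
        }
    }

    private void addEdgeChain(String stmt) {
        int lb = stmt.indexOf('[');
        if (lb >= 0) stmt = stmt.substring(0, lb);   // ignore edge attributes
        String[] names = stmt.split("->");
        for (int i = 0; i + 1 < names.length; i++) {
            String from = names[i].trim();
            String to = names[i + 1].trim();
            nodes.putIfAbsent(from, new LinkedHashMap<>());
            nodes.putIfAbsent(to, new LinkedHashMap<>());
            edges.add(new String[] {from, to});
        }
    }
}
```

A real loader would go one step further and instantiate an operator for each node from its `type` attribute, then connect the operators according to the edge list.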
== Compare two plans ==
I will provide an API like this:
{{{
/***
 * This abstract class is the base for plan comparers.
 * Method bodies are omitted from this API outline.
 */
public abstract class PlanStructuralComparer<E extends Operator,
                                             P extends OperatorPlan<E>> {
    /***
     * This method does a structural comparison of two plans based on:
     *  - Graph connectivity
     *
     * The current implementation is based on simple key-based
     * vertex matching.
     *
     * @param plan1 the first plan
     * @param plan2 the second plan
     * @param messages where the error messages go
     * @return true if the two plans are structurally equal
     */
    public boolean structurallyEquals(P plan1, P plan2, StringBuilder messages) ;

    /***
     * Same as above, for callers that only want the result of the
     * comparison and not the error messages.
     * @param plan1 the first plan
     * @param plan2 the second plan
     * @return true if the two plans are structurally equal
     */
    public boolean structurallyEquals(P plan1, P plan2) ;
}
}}}
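To make "simple key-based vertex matching" concrete, here is a minimal sketch of the idea on a reduced representation. It is an illustration under stated assumptions, not the implementation behind the API above: the class name `KeyBasedComparerSketch` is hypothetical, and each plan is assumed to be flattened into a map from an operator's key to the set of keys of its successors (with no null values).

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the comparer outlined above. A plan is reduced
// to a map: operator key -> keys of the operators it feeds into.
final class KeyBasedComparerSketch {

    static boolean structurallyEquals(Map<String, Set<String>> plan1,
                                      Map<String, Set<String>> plan2,
                                      StringBuilder messages) {
        boolean same = true;

        // Step 1: vertices are matched by key, so every key must be
        // present in both plans.
        for (String key : new TreeSet<>(plan1.keySet())) {
            if (!plan2.containsKey(key)) {
                messages.append("Operator ").append(key)
                        .append(" is missing from plan2\n");
                same = false;
            }
        }
        for (String key : new TreeSet<>(plan2.keySet())) {
            if (!plan1.containsKey(key)) {
                messages.append("Operator ").append(key)
                        .append(" is missing from plan1\n");
                same = false;
            }
        }

        // Step 2: matched vertices must have the same outgoing edges.
        for (String key : new TreeSet<>(plan1.keySet())) {
            if (!plan2.containsKey(key)) continue;   // already reported
            Set<String> out1 = plan1.get(key);
            Set<String> out2 = plan2.get(key);
            if (!out1.equals(out2)) {
                messages.append("Operator ").append(key)
                        .append(" has successors ").append(new TreeSet<>(out1))
                        .append(" in plan1 but ").append(new TreeSet<>(out2))
                        .append(" in plan2\n");
                same = false;
            }
        }
        return same;
    }

    // Convenience overload for callers that do not need the messages.
    static boolean structurallyEquals(Map<String, Set<String>> plan1,
                                      Map<String, Set<String>> plan2) {
        return structurallyEquals(plan1, plan2, new StringBuilder());
    }
}
```

Matching by key keeps the comparison linear in the size of the plans, at the cost of requiring that the expected plan and the actual plan assign the same keys to corresponding operators.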
A subtype that is also interested in type information would look like this:
{{{
/***
 * This class is used for LogicalPlan comparison.
 */
public class LogicalPlanComparer
        extends PlanStructuralComparer<LogicalOperator, LogicalPlan> {
    /***
     * This method does a naive structural comparison of two plans.
     *
     * Things we compare:
     *  - Things compared in the super class
     *  - Types of matching nodes
     *  - Schema associated with each operator
     *
     * @param plan1 the first plan
     * @param plan2 the second plan
     * @param messages where the error messages go
     * @return true if the two plans are structurally equal
     */
    @Override
    public boolean structurallyEquals(LogicalPlan plan1,
                                      LogicalPlan plan2,
                                      StringBuilder messages) {
        // Stage 1: Compare connectivity
        if (!super.structurallyEquals(plan1, plan2, messages)) return false ;
        // Stage 2: Compare node types
        if (isMismatchNodeType(plan1, plan2, messages)) return false ;
        // Stage 3: Compare schemas
        if (isMismatchSchemas(plan1, plan2, messages)) return false ;
        // All stages passed
        return true ;
    }
}
}}}
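The bodies of `isMismatchNodeType` and `isMismatchSchemas` are not shown on this page. As a hedged sketch of what stages 2 and 3 might boil down to, the helper below compares one named attribute of every operator that exists in both plans. The class `AttributeMismatchSketch`, the method `isMismatch`, and the map-based plan representation (operator key to attribute map) are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical helper, not part of the API above. A plan is reduced to a
// map: operator key -> attributes such as "type" and "schema".
final class AttributeMismatchSketch {

    // Returns true if any operator present in both plans has a different
    // value for the given attribute. Differences are appended to messages.
    static boolean isMismatch(String attribute,
                              Map<String, Map<String, String>> plan1,
                              Map<String, Map<String, String>> plan2,
                              StringBuilder messages) {
        boolean mismatch = false;
        for (Map.Entry<String, Map<String, String>> entry : plan1.entrySet()) {
            Map<String, String> other = plan2.get(entry.getKey());
            if (other == null) {
                // Missing operators are the connectivity stage's concern.
                continue;
            }
            String value1 = entry.getValue().get(attribute);
            String value2 = other.get(attribute);
            if (!Objects.equals(value1, value2)) {
                messages.append("Operator ").append(entry.getKey())
                        .append(": ").append(attribute)
                        .append(" is '").append(value1)
                        .append("' in plan1 but '").append(value2)
                        .append("' in plan2\n");
                mismatch = true;
            }
        }
        return mismatch;
    }
}
```

With this helper, stage 2 would correspond to `isMismatch("type", ...)` and stage 3 to `isMismatch("schema", ...)`. Comparing schemas as strings is a simplification; a real check would compare field names and types structurally.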
== Dot Trick ==
A graph written in the Dot language can be plotted like this:
{{{
dot -Tpng dot1.dot > dot1.png
}}}
Or alternatively,
{{{
dotty dot1.dot
}}}
'''Note''': You need Graphviz installed on your machine to run these commands.
Here is a sample graph generated from the sample above:
http://people.apache.org/~pisong/dot1.png
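When a comparison fails inside a test, it can be handy to render the plan from Java instead of from the shell. The sketch below is one way to do that, assuming the Graphviz `dot` executable is on the PATH. The class `DotRenderSketch` and its methods are hypothetical names; the `-Tpng` and `-o` options are standard Graphviz options.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that shells out to Graphviz. Assumes "dot" is on PATH.
final class DotRenderSketch {

    // Builds the argument list for: dot -Tpng <dotFile> -o <pngFile>
    static List<String> dotCommand(String dotFile, String pngFile) {
        return Arrays.asList("dot", "-Tpng", dotFile, "-o", pngFile);
    }

    // Runs Graphviz and returns its exit code (0 means success).
    static int render(String dotFile, String pngFile)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(dotCommand(dotFile, pngFile))
                .inheritIO()   // show Graphviz errors on the test console
                .start();
        return process.waitFor();
    }
}
```

For example, `DotRenderSketch.render("dot1.dot", "dot1.png")` writes the image next to the Dot file.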
= Current Status and Issues =
* Working code will be