The following page has been changed by PiSong:
http://wiki.apache.org/pig/PlanTestingHelper
= Plan Testing Helper =
This is a small utility that I developed for testing my type checking logic. I
think it might be useful for other people as well, so I have refactored it a bit
to make it more generic.
== Use cases ==
Here are the steps I follow for type checking:
* Construct a plan
* Run type-checking logic against the plan
* Construct the expected plan
* Compare structures of the actual plan and the expected plan.
Here are the steps one might follow for testing the query parser:
* Given a query string, construct the plan.
* Construct the expected plan
* Compare the two plans
And here are the steps for testing the plan optimizer:
* Construct a plan
* Run optimizer
* Construct the expected plan
* Compare structures of the actual plan and the expected plan.
== What can be facilitated? ==
So there are two steps common to all of the above use cases:
1. Construct the expected plan
1. Compare two plans
== Construct a plan ==
=== What is the Dot language? ===
Dot is a plain-text graph description language. It has three main object
types: node, edge, and graph. All of them can carry custom attributes.
=== Sample Dot graph ===
{{{
digraph plan1 {
load [color="black"]
load -> distinct -> split -> splitOut1 [style=dotted] ;
split -> splitOut2 ;
splitOut1 -> cross ;
splitOut2 -> cross ;
}
}}}
'''Note''': "digraph" declares a directed graph, which is the domain we're
interested in.
'''Note''': "load [color="black"]" attaches an attribute to the node. This is
optional.
By extending Dot a bit with custom attributes, we can encode our logical plan in the following format:
{{{
digraph graph1 {
load [key="114", type="LOLoad", schema="field1: int, field2: float"] ;
distinct [key="115", type="LODistinct", schema="field1: int, field2: float"] ;
split [key="116", type="LOSplit", schema="field1: int, field2: float"] ;
splitOut1 [key="117", type="LOForEach", schema="field1: int, field2: float"] ;
splitOut2 [key="118", type="LOForEach", schema="field1: int, field2: float"] ;
cross [key="119", type="LOCross", schema="field1: int, field2: float, field3: chararray"] ;
load -> distinct -> split -> splitOut1 ;
split -> splitOut2 ;
splitOut1 -> cross ;
splitOut2 -> cross ;
}
}}}
This can then be translated into a plan by a loader class (see OperatorPlanLoader in the Appendix).
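The loader first has to read the Dot text. Below is a minimal sketch of that step, assuming only the Dot subset used on this page: a single "digraph name { ... }" block, node statements with an optional [key="value", ...] list, and "a -> b -> c" edge chains, one statement per line or per ";". The names SimpleDotParser, ParsedGraph and parse() are hypothetical and are not the loader API from the Appendix. Comments, subgraphs, quoted node IDs and graph-level attributes are not handled.

{{{
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical minimal parser for the small Dot subset used on this page.
 * A sketch only: not the OperatorPlanLoader API.
 */
class SimpleDotParser {

    /** Parsed result: node name -> attributes, plus edges as {from, to} pairs. */
    static class ParsedGraph {
        String name;
        final Map<String, Map<String, String>> nodes = new LinkedHashMap<>();
        final List<String[]> edges = new ArrayList<>();
    }

    private static final Pattern GRAPH =
        Pattern.compile("^\\s*digraph\\s+(\\w+)\\s*\\{(.*)\\}\\s*$", Pattern.DOTALL);
    // Matches key=value or key="quoted value"
    private static final Pattern ATTR =
        Pattern.compile("(\\w+)\\s*=\\s*(?:\"([^\"]*)\"|([^,\\s\\]]+))");

    static ParsedGraph parse(String dot) {
        Matcher g = GRAPH.matcher(dot);
        if (!g.matches()) {
            throw new IllegalArgumentException("expected: digraph <name> { ... }");
        }
        ParsedGraph graph = new ParsedGraph();
        graph.name = g.group(1);
        // Statements end with ';' or a newline; attribute values here contain neither.
        for (String raw : g.group(2).split("[;\\n]")) {
            String stmt = raw.trim();
            if (stmt.isEmpty()) continue;
            String head = stmt;
            Map<String, String> attrs = new LinkedHashMap<>();
            int open = stmt.indexOf('[');
            if (open >= 0) {
                head = stmt.substring(0, open).trim();
                Matcher a = ATTR.matcher(stmt.substring(open));
                while (a.find()) {
                    attrs.put(a.group(1), a.group(2) != null ? a.group(2) : a.group(3));
                }
            }
            if (head.contains("->")) {
                // Edge chain: a -> b -> c yields a->b and b->c; edge attributes are ignored.
                String[] names = head.split("->");
                for (int i = 0; i < names.length; i++) {
                    names[i] = names[i].trim();
                    graph.nodes.putIfAbsent(names[i], new LinkedHashMap<>());
                }
                for (int i = 0; i + 1 < names.length; i++) {
                    graph.edges.add(new String[] {names[i], names[i + 1]});
                }
            } else {
                // Node statement: merge attributes into the (possibly existing) node.
                graph.nodes.computeIfAbsent(head, k -> new LinkedHashMap<>()).putAll(attrs);
            }
        }
        return graph;
    }
}
}}}

As in Dot itself, a node that only appears in an edge statement is still created, just with no attributes.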
== Compare two plans ==
I will provide an API like this:
{{{
/**
 * This abstract class is a base for plan comparers.
 */
public abstract class PlanStructuralComparer<E extends Operator,
                                             P extends OperatorPlan<E>> {
    /**
     * This method does structural comparison of two plans based on:
     * - Graph connectivity
     *
     * The current implementation is based on simple key-based
     * vertex matching.
     *
     * @param plan1 the first plan
     * @param plan2 the second plan
     * @param messages where the error messages go
     * @return true if the two plans are structurally equal
     */
    public boolean structurallyEquals(P plan1, P plan2, StringBuilder messages) ;  // body not shown

    /**
     * Same as above, for when you just want to compare and
     * don't need the error messages.
     * @param plan1 the first plan
     * @param plan2 the second plan
     * @return true if the two plans are structurally equal
     */
    public boolean structurallyEquals(P plan1, P plan2) {
        return structurallyEquals(plan1, plan2, new StringBuilder()) ;
    }
}
}}}
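To make "key-based vertex matching" more concrete, here is a minimal sketch under simplified assumptions: each plan is reduced to a map from an operator's key (the key attribute in the Dot encoding) to the set of its successors' keys. Two plans match if they contain the same keys and every key has the same successors. KeyBasedComparer and this map representation are hypothetical stand-ins, not Pig's Operator/OperatorPlan types.

{{{
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of key-based structural comparison. A plan is reduced
 * to a map from each operator's key to the keys of its successors.
 */
class KeyBasedComparer {

    static boolean structurallyEquals(Map<String, Set<String>> plan1,
                                      Map<String, Set<String>> plan2,
                                      StringBuilder messages) {
        boolean equal = true;
        // Vertex matching: every key must exist in both plans.
        Set<String> allKeys = new TreeSet<>(plan1.keySet());
        allKeys.addAll(plan2.keySet());
        for (String key : allKeys) {
            if (!plan1.containsKey(key) || !plan2.containsKey(key)) {
                messages.append("Key ").append(key).append(" exists in only one plan\n");
                equal = false;
                continue;
            }
            // Connectivity: matched vertices must have the same successor keys.
            Set<String> succ1 = new TreeSet<>(plan1.get(key));
            Set<String> succ2 = new TreeSet<>(plan2.get(key));
            if (!succ1.equals(succ2)) {
                messages.append("Successors of ").append(key).append(" differ: ")
                        .append(succ1).append(" vs ").append(succ2).append('\n');
                equal = false;
            }
        }
        return equal;
    }

    // Convenience overload that discards the messages.
    static boolean structurallyEquals(Map<String, Set<String>> plan1,
                                      Map<String, Set<String>> plan2) {
        return structurallyEquals(plan1, plan2, new StringBuilder());
    }
}
}}}

Because vertices are matched purely by key, keys must be unique within a plan; two operators sharing a key would collapse into a single entry and hide a difference.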
A subclass that also checks type information would look like this:
{{{
/**
 * This class is used for LogicalPlan comparison.
 */
public class LogicalPlanComparer
        extends PlanStructuralComparer<LogicalOperator, LogicalPlan> {
    /**
     * This method does naive structural comparison of two plans.
     *
     * Things we compare:
     * - Things compared in the super class
     * - Types of matching nodes
     * - Schema associated with each operator
     *
     * @param plan1 the first plan
     * @param plan2 the second plan
     * @param messages where the error messages go
     * @return true if the plans match in connectivity, node types and schemas
     */
    @Override
    public boolean structurallyEquals(LogicalPlan plan1,
                                      LogicalPlan plan2,
                                      StringBuilder messages) {
        // Stage 1: Compare connectivity
        if (!super.structurallyEquals(plan1, plan2, messages)) return false ;
        // Stage 2: Compare node types
        if (isMismatchNodeType(plan1, plan2, messages)) return false ;
        // Stage 3: Compare schemas
        if (isMismatchSchemas(plan1, plan2, messages)) return false ;
        // All stages passed
        return true ;
    }

    // isMismatchNodeType() and isMismatchSchemas() are helpers not shown here
}
}}}
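The helpers isMismatchNodeType() and isMismatchSchemas() are not shown in the listing. Below is one possible shape for them, assuming stage 1 has already matched operators by key and each operator is reduced to the type and schema strings from the Dot encoding. The OpInfo class, the map-based signatures and the whitespace-insensitive schema comparison are all assumptions made for illustration.

{{{
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the stage-2 and stage-3 checks. Both maps are keyed
 * by operator key, so operators are assumed to be already matched (stage 1).
 */
class TypeAndSchemaChecks {

    /** Hypothetical stand-in for an operator: its type name and schema text. */
    static final class OpInfo {
        final String type;
        final String schema;
        OpInfo(String type, String schema) {
            this.type = type;
            this.schema = schema;
        }
    }

    /** Returns true if any key maps to operators of different types. */
    static boolean isMismatchNodeType(Map<String, OpInfo> plan1,
                                      Map<String, OpInfo> plan2,
                                      StringBuilder messages) {
        boolean mismatch = false;
        for (Map.Entry<String, OpInfo> e : plan1.entrySet()) {
            OpInfo other = plan2.get(e.getKey());
            if (other != null && !e.getValue().type.equals(other.type)) {
                messages.append("Type mismatch at key ").append(e.getKey()).append(": ")
                        .append(e.getValue().type).append(" vs ").append(other.type).append('\n');
                mismatch = true;
            }
        }
        return mismatch;
    }

    /** Returns true if any key maps to operators whose schemas differ. */
    static boolean isMismatchSchemas(Map<String, OpInfo> plan1,
                                     Map<String, OpInfo> plan2,
                                     StringBuilder messages) {
        boolean mismatch = false;
        for (Map.Entry<String, OpInfo> e : plan1.entrySet()) {
            OpInfo other = plan2.get(e.getKey());
            if (other != null
                    && !parseSchema(e.getValue().schema).equals(parseSchema(other.schema))) {
                messages.append("Schema mismatch at key ").append(e.getKey()).append('\n');
                mismatch = true;
            }
        }
        return mismatch;
    }

    /** Turns "field1: int, field2: float" into ["field1:int", "field2:float"]. */
    static List<String> parseSchema(String schema) {
        List<String> fields = new ArrayList<>();
        for (String field : schema.split(",")) {
            String[] parts = field.split(":", 2);
            String name = parts[0].trim();
            String type = parts.length > 1 ? parts[1].trim() : "";
            fields.add(name + ":" + type);
        }
        return fields;
    }
}
}}}

Parsing the schema text before comparing means that "field1: int" and "field1:int" are treated as the same field, while field order still matters.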
== Dot Trick ==
You can render a graph written in Dot by running:
{{{
dot -Tpng dot1.dot > dot1.png
}}}
Alternatively, to view it interactively:
{{{
dotty dot1.dot
}}}
'''Note''': You need Graphviz installed on your machine to run these commands.
Here is the image generated from the sample graph above:
http://people.apache.org/~pisong/dot1.png
== Current Status & Issues ==
* Working code will be available in 1-2 days (as of 26 May).
* Doesn't work with inner plans yet. Inner plans may have to be constructed
and compared separately.
== Appendix ==
The API:
OperatorPlanLoader - an abstract base class for loading a plan from Dot:
{{{
public abstract class OperatorPlanLoader<E extends Operator,
                                         P extends OperatorPlan<E>> {
    /**
     * This method is used for loading an operator plan encoded in Dot format.
     * @param dotContent the Dot content
     * @return the plan built from the Dot content
     */
    public P load(String dotContent) ;  // body not shown

    /**
     * This method has to be overridden to instantiate the correct operator
     * type for each Dot node.
     *
     * @param node the Dot node to convert
     * @param plan the plan the new operator belongs to
     * @return the new operator
     */
    protected abstract E createOperator(Node node, P plan) ;
}
}}}
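Judging from the listing above, load() presumably parses the Dot content, calls createOperator() once per node, and then connects the operators along the edges. The sketch below shows that template-method shape with simple stand-in types instead of Pig's Operator and OperatorPlan, and it takes already-parsed node and edge data rather than raw Dot text. SketchPlanLoader, SimplePlan and the string-based createOperator(name, attributes) signature are hypothetical.

{{{
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a template-method plan loader, using stand-in types
 * instead of Pig's Operator/OperatorPlan.
 */
abstract class SketchPlanLoader<E> {

    /** Stand-in for an operator plan: operators plus directed edges between them. */
    static final class SimplePlan<T> {
        final List<T> operators = new ArrayList<>();
        final List<List<T>> edges = new ArrayList<>(); // each entry is [from, to]
    }

    /** Builds a plan: one operator per node, then one connection per edge. */
    SimplePlan<E> load(Map<String, Map<String, String>> nodes, List<String[]> edges) {
        SimplePlan<E> plan = new SimplePlan<>();
        Map<String, E> byName = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> node : nodes.entrySet()) {
            // Subclasses decide which concrete operator a node becomes.
            E op = createOperator(node.getKey(), node.getValue());
            byName.put(node.getKey(), op);
            plan.operators.add(op);
        }
        for (String[] edge : edges) {
            E from = byName.get(edge[0]);
            E to = byName.get(edge[1]);
            if (from == null || to == null) {
                throw new IllegalArgumentException(
                        "edge refers to unknown node: " + edge[0] + " -> " + edge[1]);
            }
            plan.edges.add(List.of(from, to));
        }
        return plan;
    }

    /** Counterpart of createOperator(Node, P): map one Dot node to an operator. */
    protected abstract E createOperator(String name, Map<String, String> attributes);
}
}}}

Keeping createOperator() abstract means the plan-building logic is written once, and each plan type only supplies its own node-to-operator mapping.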
Structures captured from Dot (before being converted into a plan):
{{{
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * This represents a graph structure in Dot format.
 */
public class DotGraph {
    public String name ;
    public List<Edge> edges = new ArrayList<Edge>() ;
    public List<Node> nodes = new ArrayList<Node>() ;
    public Map<String, String> attributes = new HashMap<String, String>() ;

    public DotGraph(String name) {
        this.name = name ;
    }
}
}}}
{{{
import java.util.HashMap;
import java.util.Map;

/**
 * This represents a node in Dot format.
 */
public class Node {
    public String name ;
    public Map<String, String> attributes = new HashMap<String, String>() ;
}
}}}
{{{
/**
 * This represents an edge in Dot format.
 * An edge in Dot can have attributes, but we are not interested in them here.
 */
public class Edge {
    public String fromNode ;
    public String toNode ;
}
}}}