The following page has been changed by PiSong: http://wiki.apache.org/pig/PlanTestingHelper

New page:

= Plan Testing Helper =

This is a small utility that I developed for testing my type-checking logic. I think it might be useful for other people as well, so I have refactored it a bit to make it more generic.

== Use cases ==

Here are the steps that I follow for type checking:
 * Construct a plan
 * Run the type-checking logic against the plan
 * Construct the expected plan
 * Compare the structures of the actual plan and the expected plan

Here are the steps that one might follow for the query parser:
 * Given a query string, construct the plan
 * Construct the expected plan
 * Compare the two plans

And here are the steps for testing the plan optimizer:
 * Construct a plan
 * Run the optimizer
 * Construct the expected plan
 * Compare the structures of the actual plan and the expected plan

== What can be facilitated? ==

There are two steps common to all of the use cases above:
 1. Construct the expected plan
 1. Compare the two plans

== Construct a plan ==

==== What is Dot Language? ====

Dot is a plain-text graph description language. It has three main object types: node, edge, and graph. All of them can carry custom attributes.

==== Sample Dot graph ====

{{{
digraph plan1 {
    load [color="black"]
    load -> distinct -> split -> splitOut1 [style=dotted] ;
    split -> splitOut2 ;
    splitOut1 -> cross ;
    splitOut2 -> cross ;
}
}}}

'''Note''': "digraph" declares that this is a description of a directed graph, which is the domain we're interested in.

'''Note''': "load [color="black"]" attaches an attribute to the node. This is optional.
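The actual loader (`OperatorPlanLoader`, see the Appendix) is only outlined on this page. As a rough, hypothetical sketch of what reading this Dot subset could involve, the class below tokenizes a `digraph`, records node statements with their attributes, and expands edge chains such as `a -> b -> c` into individual edges. `MiniDotReader` is an invented name, not Pig code. It handles only the subset used on this page (no `node`/`edge` default statements, subgraphs, or escaped quotes), and, like the page's `Edge` class, it drops edge attributes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not Pig's OperatorPlanLoader): reads the small Dot subset used on this page.
class MiniDotReader {
    static class Node {
        final String name;
        final Map<String, String> attributes = new HashMap<>();
        Node(String name) { this.name = name; }
    }
    static class Edge {
        final String fromNode, toNode;
        Edge(String from, String to) { fromNode = from; toNode = to; }
    }
    static class DotGraph {
        String name;
        final Map<String, Node> nodes = new LinkedHashMap<>(); // by name, in first-mention order
        final List<Edge> edges = new ArrayList<>();
    }

    // Tokens: punctuation ("{", "->", ...), identifiers, or quoted strings kept with their opening '"'.
    private final List<String> toks = new ArrayList<>();
    private int pos = 0;

    private MiniDotReader(String s) {
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '"') { // escaped quotes inside strings are not handled
                int end = s.indexOf('"', i + 1);
                if (end < 0) throw new IllegalArgumentException("unterminated string");
                toks.add(s.substring(i, end)); // opening quote marks this as a string value
                i = end + 1;
            } else if (s.startsWith("->", i)) {
                toks.add("->"); i += 2;
            } else if ("{}[]=,;".indexOf(c) >= 0) {
                toks.add(String.valueOf(c)); i++;
            } else {
                int start = i;
                while (i < s.length() && (Character.isLetterOrDigit(s.charAt(i)) || s.charAt(i) == '_')) i++;
                if (i == start) throw new IllegalArgumentException("unexpected character: " + c);
                toks.add(s.substring(start, i));
            }
        }
    }

    // Consumes the next token only if it is exactly the given punctuation symbol.
    private boolean accept(String p) {
        if (pos < toks.size() && toks.get(pos).equals(p)) { pos++; return true; }
        return false;
    }
    private void expect(String p) {
        if (!accept(p)) throw new IllegalArgumentException("expected '" + p + "' at token " + pos);
    }
    // Consumes and returns the next identifier or quoted-string value.
    private String word() {
        if (pos >= toks.size()) throw new IllegalArgumentException("unexpected end of input");
        String t = toks.get(pos++);
        if (t.startsWith("\"")) return t.substring(1);
        char c = t.charAt(0);
        if (!Character.isLetterOrDigit(c) && c != '_') throw new IllegalArgumentException("expected a name, got " + t);
        return t;
    }

    static DotGraph parse(String dot) {
        MiniDotReader r = new MiniDotReader(dot);
        if (!r.word().equals("digraph")) throw new IllegalArgumentException("only digraph is supported");
        DotGraph g = new DotGraph();
        g.name = r.word();
        r.expect("{");
        while (!r.accept("}")) {
            // Statement: name ( -> name )* [ key = value, ... ]? ;?
            List<String> chain = new ArrayList<>();
            chain.add(r.word());
            while (r.accept("->")) chain.add(r.word());
            Map<String, String> attrs = new HashMap<>();
            if (r.accept("[")) {
                while (!r.accept("]")) {
                    String key = r.word();
                    r.expect("=");
                    attrs.put(key, r.word());
                    r.accept(",");
                }
            }
            for (String n : chain) g.nodes.computeIfAbsent(n, Node::new);
            if (chain.size() == 1) g.nodes.get(chain.get(0)).attributes.putAll(attrs);
            // Consecutive names form edges; edge attributes (e.g. style=dotted) are dropped.
            for (int k = 0; k + 1 < chain.size(); k++) g.edges.add(new Edge(chain.get(k), chain.get(k + 1)));
            r.accept(";");
        }
        return g;
    }
}
```

Applied to the sample graph above, `parse` should return six nodes and six edges, with `color` attached only to `load`. Quoted values that contain commas and colons, such as Pig schema strings, stay intact because they are read as single tokens.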
By extending Dot a bit, we can encode our logical plan in the following format:
{{{
digraph graph1 {
    load [key="114", type="LOLoad", schema="field1: int, field2: float"] ;
    distinct [key="115", type="LODistinct", schema="field1: int, field2: float"] ;
    split [key="116", type="LOSplit", schema="field1: int, field2: float"] ;
    splitOut1 [key="117", type="LOForEach", schema="field1: int, field2: float"] ;
    splitOut2 [key="118", type="LOForEach", schema="field1: int, field2: float"] ;
    cross [key="119", type="LOCross", schema="field1: int, field2: float, field3: chararray"] ;
    load -> distinct -> split -> splitOut1 ;
    split -> splitOut2 ;
    splitOut1 -> cross ;
    splitOut2 -> cross ;
}
}}}

This description can then be translated into a plan using a loader class (its API is listed in the Appendix).

== Compare two plans ==

I will provide an API like this (the body of the main comparison method is omitted):
{{{
/***
 * This abstract class is a base for plan comparers
 */
public abstract class PlanStructuralComparer<E extends Operator, P extends OperatorPlan<E>> {

    /***
     * This method does structural comparison of two plans based on:
     *  - Graph connectivity
     *
     * The current implementation is based on simple key-based
     * vertex matching.
     *
     * @param plan1 the first plan
     * @param plan2 the second plan
     * @param messages where the error messages go
     * @return true if the two plans are structurally equal
     */
    public boolean structurallyEquals(P plan1, P plan2, StringBuilder messages) ; // body omitted

    /***
     * Same as above, for when you just want to compare
     * and don't need the error messages
     * @param plan1 the first plan
     * @param plan2 the second plan
     * @return true if the two plans are structurally equal
     */
    public boolean structurallyEquals(P plan1, P plan2) {
        return structurallyEquals(plan1, plan2, new StringBuilder()) ;
    }
}
}}}

A subtype that also compares type information would look like this:
{{{
/***
 * This class is used for LogicalPlan comparison
 */
public class LogicalPlanComparer extends PlanStructuralComparer<LogicalOperator, LogicalPlan> {

    /***
     * This method does naive structural comparison of two plans.
     *
     * Things we compare:
     *  - Things compared in the super class
     *  - Types of matching nodes
     *  - Schema associated with each operator
     *
     * @param plan1 the first plan
     * @param plan2 the second plan
     * @param messages where the error messages go
     * @return true if the two plans match on all of the above
     */
    @Override
    public boolean structurallyEquals(LogicalPlan plan1, LogicalPlan plan2, StringBuilder messages) {
        // Stage 1: Compare connectivity
        if (!super.structurallyEquals(plan1, plan2, messages)) return false ;

        // Stage 2: Compare node types
        if (isMismatchNodeType(plan1, plan2, messages)) return false ;

        // Stage 3: Compare schemas
        if (isMismatchSchemas(plan1, plan2, messages)) return false ;

        // All stages passed
        return true ;
    }

    // isMismatchNodeType() and isMismatchSchemas() are helpers not shown here
}
}}}

== Dot Trick ==

You can render a graph written in Dot by running:
{{{
dot -Tpng dot1.dot > dot1.png
}}}

Or, alternatively, view it interactively with:
{{{
dotty dot1.dot
}}}

NOTE: You need Graphviz installed on your machine to do this.

Here is the graph generated from the sample above: http://people.apache.org/~pisong/dot1.png

= Current Status & Issues =

 * Working code will be available in 1-2 days (today is 26th May).
 * Doesn't work with inner plans yet. Inner plans may have to be constructed and compared separately.
== Appendix ==

The API:

OperatorPlanLoader - an abstract base class for loading a plan from Dot (the body of load() is omitted)
{{{
public abstract class OperatorPlanLoader<E extends Operator, P extends OperatorPlan<E>> {

    /***
     * This method is used for loading an operator plan encoded in Dot format
     * @param dotContent the Dot content
     * @return the loaded plan
     */
    public P load(String dotContent) ; // body omitted

    /***
     * This method has to be overridden to instantiate the correct node type
     *
     * @param node
     * @param plan
     * @return
     */
    protected abstract E createOperator(Node node, P plan) ;
}
}}}

Structures captured from Dot (before being converted to a plan):
{{{
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/***
 * This represents a graph structure in Dot format
 */
public class DotGraph {
    public String name ;
    public List<Edge> edges = new ArrayList<Edge>() ;
    public List<Node> nodes = new ArrayList<Node>() ;
    public Map<String, String> attributes = new HashMap<String, String>() ;

    public DotGraph(String name) {
        this.name = name ;
    }
}
}}}
{{{
import java.util.HashMap;
import java.util.Map;

/***
 * This represents a node in Dot format
 */
public class Node {
    public String name ;
    public Map<String, String> attributes = new HashMap<String, String>() ;
}
}}}
{{{
/**
 * This represents an edge in Dot format.
 * An edge in Dot can have attributes, but we ignore them here.
 */
public class Edge {
    public String fromNode ;
    public String toNode ;
}
}}}
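To illustrate the "simple key-based vertex matching" mentioned for `PlanStructuralComparer`, here is a hypothetical sketch that works on simplified stand-ins for the `DotGraph`/`Node`/`Edge` structures above instead of real Pig plans. It assumes every node carries a unique `key` attribute. Vertices are matched by key, so node names may differ between the two graphs, and each edge is rewritten as a `fromKey->toKey` pair so that connectivity can be compared as a set. `KeyBasedComparer`, its helpers, and the two-argument `Node` constructor are invented for this example; type and schema checks like those in `LogicalPlanComparer` would be layered on top.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of key-based structural comparison (not Pig's PlanStructuralComparer).
class KeyBasedComparer {
    // Simplified stand-ins for the Node / Edge / DotGraph structures in the appendix.
    static class Node {
        final String name;
        final Map<String, String> attributes = new HashMap<>();
        Node(String name, String key) { this.name = name; attributes.put("key", key); }
    }
    static class Edge {
        final String fromNode, toNode;
        Edge(String fromNode, String toNode) { this.fromNode = fromNode; this.toNode = toNode; }
    }
    static class DotGraph {
        final List<Node> nodes = new ArrayList<>();
        final List<Edge> edges = new ArrayList<>();
    }

    // Equal if both graphs have the same set of keys and the same key-to-key edges.
    // Node names are ignored: vertices are matched purely by their "key" attribute.
    static boolean structurallyEquals(DotGraph g1, DotGraph g2, StringBuilder messages) {
        Map<String, String> keys1 = keysByName(g1, "plan1", messages);
        Map<String, String> keys2 = keysByName(g2, "plan2", messages);
        if (keys1 == null || keys2 == null) return false;

        Set<String> vertices1 = new HashSet<>(keys1.values());
        Set<String> vertices2 = new HashSet<>(keys2.values());
        if (!vertices1.equals(vertices2)) {
            messages.append("vertex keys differ: ").append(vertices1).append(" vs ").append(vertices2).append('\n');
            return false;
        }

        Set<String> edges1 = edgeKeys(g1, keys1, "plan1", messages);
        Set<String> edges2 = edgeKeys(g2, keys2, "plan2", messages);
        if (edges1 == null || edges2 == null) return false;
        if (!edges1.equals(edges2)) {
            messages.append("edges differ: ").append(edges1).append(" vs ").append(edges2).append('\n');
            return false;
        }
        return true;
    }

    // Maps node name -> key; returns null if any key is missing or used twice.
    private static Map<String, String> keysByName(DotGraph g, String label, StringBuilder messages) {
        Map<String, String> keys = new HashMap<>();
        Set<String> seen = new HashSet<>();
        for (Node n : g.nodes) {
            String key = n.attributes.get("key");
            if (key == null || !seen.add(key)) {
                messages.append(label).append(": missing or duplicate key on node ").append(n.name).append('\n');
                return null;
            }
            keys.put(n.name, key);
        }
        return keys;
    }

    // Rewrites each edge as "fromKey->toKey" (parallel edges collapse into one entry);
    // returns null if an edge refers to an undeclared node.
    private static Set<String> edgeKeys(DotGraph g, Map<String, String> keys, String label, StringBuilder messages) {
        Set<String> result = new HashSet<>();
        for (Edge e : g.edges) {
            String from = keys.get(e.fromNode);
            String to = keys.get(e.toNode);
            if (from == null || to == null) {
                messages.append(label).append(": edge ").append(e.fromNode).append(" -> ")
                        .append(e.toNode).append(" refers to an undeclared node\n");
                return null;
            }
            result.add(from + "->" + to);
        }
        return result;
    }
}
```

Rejecting duplicate keys up front is a deliberate choice in this sketch: if two nodes shared a key, matching by key would be ambiguous and could pair the wrong vertices, so each node in an encoded plan needs its own key.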