[Pig Wiki] Update of "PlanTestingHelper" by PiSong

Apache Wiki Mon, 26 May 2008 18:52:34 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.


The following page has been changed by PiSong:
http://wiki.apache.org/pig/PlanTestingHelper

New page:
= Plan Testing Helper =

This is a small utility that I developed for testing my type checking logic. I 
think it might be useful for other people as well so I have refactored a bit to 
make it more generic.

== Use cases ==
Here are steps that I do for type checking:-
 * Construct a plan
 * Run type-checking logic against the plan
 * Construct the expected plan
 * Compare structures of the actual plan and the expected plan.

Here are steps that one might do for query parser:-
 * Given a query string, construct the plan.
 * Construct the expected plan
 * Compare two plans

Here for testing plan optimizer:-
 * Construct a plan
 * Run optimizer
 * Construct the expected plan
 * Compare structures of the actual plan and the expected plan.

== What can be facilitated? ==
So there are two common bits from above use cases:-
 1. Construct the expected plan
 1. Compare two plans

== Construct a plan ==

==== What is Dot Language? ====
Dot language is a text graph description language. There are three main object 
types: node, edge, and graph. All of them can have custom attributes.

==== Sample Dot graph ====
{{{
digraph plan1 {
    
    load [color="black"]

    load -> distinct -> split -> splitOut1 [style=dotted] ;
    split -> splitOut2 ;
    splitOut1 -> cross ;
    splitOut2 -> cross ;
}
}}}

'''Note''': "digraph" dictates that this is a description of directed graph 
which is the domain we're interested in.

'''Note''': "load [color="black"]" is attaching an attribute to the node. This 
is optional.

By extending Dot a bit, we can encode our logical plan in the following format:-

{{{
digraph graph1 {

    load        [key="114", type="LOLoad", schema="field1: int, field2: float"] 
;
    distinct    [key="115", type="LODistinct", schema="field1: int, field2: 
float"] ;
    split       [key="116", type="LOSplit", schema="field1: int, field2: 
float"] ;
    splitout1   [key="117", type="LOForEach", schema="field1: int, field2: 
float"] ;
    splitout2   [key="117", type="LOForEach", schema="field1: int, field2: 
float"] 
    cross       [key="119", type="LOCross", schema="field1: int, field2: float, 
field3: chararray"] ;

    load -> distinct -> split -> splitOut1 ;
    split -> splitOut2 ;
    splitOut1 -> cross ;
    splitOut2 -> cross ;
}
}}}
And this can be translated to a plan using a loader class (API will be provided)

== Compare two plans ==

I will provide API like this:-

{{{
/***
 * This abstract class is a base for plan comparer
 */

public abstract class PlanStructuralComparer<E extends Operator,
                                             P extends OperatorPlan<E>> {

    /***
     * This method does structural comparison of two plans based on:-
     *  - Graph connectivity
     *
     * The current implementation is based on simple key-based
     * vertex matching.
     *
     * @param plan1 the first plan
     * @param plan2 the second plan
     * @param messages where the error messages go
     * @return
     */
    public boolean structurallyEquals(P plan1, P plan2, StringBuilder messages) 
 ;


    /***
     * Same as above in case just want to compare but
     * don't want to know the error messages
     * @param plan1
     * @param plan2
     * @return
     */
    public boolean structurallyEquals(P plan1, P plan2) ;
}
}}}

A subtype which is interested in type information would look like this:-

{{{
/***
 * This class is used for LogicalPlan comparison
 */
public class LogicalPlanComparer
                extends PlanStructuralComparer<LogicalOperator, LogicalPlan> {

    /***
     * This method does naive structural comparison of two plans.
     *
     * Things we compare :-
     *  - Things compared in the super class
     *  - Types of matching nodes
     *  - Schema associated with each operator
     *
     * @param plan1
     * @param plan2
     * @param messages
     * @return
     */
    @Override
    public boolean structurallyEquals(LogicalPlan plan1,
                                      LogicalPlan plan2,
                                      StringBuilder messages) {
        // Stage 1: Compare connectivity
        if (!super.structurallyEquals(plan1, plan2, messages)) return false ;

        // Stage 2: Compare node types
        if (isMismatchNodeType(plan1, plan2, messages)) return false ;

        // Stage 3: Compare schemas
        if (isMismatchSchemas(plan1, plan2, messages)) return false ;
   
        // else
        return true ;
    }
}}}

== Dot Trick ==
One can plot a graph written in Dot language by just doing like:-
{{{
dot -Tpng dot1.dot > dot1.png
}}}
Or alternatively,
{{{
dotty dot1.dot
}}}
NOTE: You need graphviz installed on your machine to do these things.

Here is a sample graph generated from the given sample.

http://people.apache.org/~pisong/dot1.png

= Current Status & Issues =
 * Working code  will be available in 1-2 days (Today = 26th May)
 * Doesn't work with inner plans yet. Inner plans may have to be constructed 
and compare separately.

== Appendix ==

The API:-

OperatorPlanLoader - This class is an abstract base class for loading a plan 
from Dot  

{{{
public abstract class OperatorPlanLoader<E extends Operator,
                                         P extends OperatorPlan<E>> {

    /***
     * This method is used for loading an operator plan encoded in Dot format
     * @param dotContent the dot content
     * @param clazz the plan type to be created
     * @return
     */
    public P load(String dotContent) {

    /***
     * This method has be overridden to instantiate the correct node type
     *
     * @param node
     * @param plan
     * @return
     */
    protected abstract E createOperator(Node node, P plan) ;
}
}}}

Structures captured from Dot (Before being converted to plan):-

{{{
/***
 * This represents graph structure in DOT format
 */
public class DotGraph {

    public String name;
    public List<Edge> edges = new ArrayList<Edge>() ;
    public List<Node> nodes = new ArrayList<Node>() ;
    public Map<String, String> attributes = new HashMap<String,String>() ;


    public DotGraph(String name) {
        this.name = name ;
    }

}
}}}
{{{
/***
 * This represents a node in DOT format
 */
public class Node {

    public String name ;
    public Map<String, String> attributes = new HashMap<String,String>() ;

}
}}}
{{{
/**
 * This represents an edge in DOT format.
 * An edge in DOT can have attributes but we're not interested
 */
public class Edge {
    public String fromNode ;
    public String toNode ;
}
}}}

[Pig Wiki] Update of "PlanTestingHelper" by PiSong

Reply via email to