Hey Todd, we run tests against entire Pig scripts using some helper classes we 
built. Basically, they preprocess the variables and then call register script. 
The test looks like this:

    @Before
    public void setUp() throws Exception {
        Helper.delete(OUT_FILE);
        runner = new PigRunner();
    }


    @Test
    public void testRecordCount() throws Exception {
        runner.execute("myscript.pig", "param1=foo","param2=bar");

        Iterator<Tuple> tuples = runner.getPigServer().openIterator("foo");
        assertEquals(41L, Helper.countTuples(tuples));
    }

It's been very useful for us to test this way.  Would love to see more chatter 
about other techniques.
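
For anyone curious what the "preprocess the variables" step could look like, 
here is a rough sketch. It is not the actual PigRunner or Helper code: the 
class name ScriptParams and the methods parse, substitute and countTuples are 
hypothetical, and it assumes parameters arrive as "name=value" strings and 
appear in the script as $name placeholders. It has no Pig dependency; the 
substituted text would still need to be handed to Pig separately.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not the PigRunner/Helper classes above).
final class ScriptParams {

    // Matches $name placeholders such as $param1; positional refs like $0 are left alone.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    // Turns "name=value" strings into a map, splitting on the first '='.
    static Map<String, String> parse(String... params) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String p : params) {
            int eq = p.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected name=value, got: " + p);
            }
            map.put(p.substring(0, eq), p.substring(eq + 1));
        }
        return map;
    }

    // Replaces each $name in the script text; fails fast on an undefined name.
    static String substitute(String script, Map<String, String> params) {
        Matcher m = PLACEHOLDER.matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Undefined parameter: " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Counts whatever the iterator yields; generic, so it also fits Iterator<Tuple>.
    static <T> long countTuples(Iterator<T> it) {
        long n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }
}
```

Failing fast on an undefined parameter is a deliberate choice in this sketch: 
a typo in a test's parameter list then shows up as a clear exception rather 
than as a confusing parse error from the script.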

On Jul 20, 2010, at 3:26 PM, ToddG wrote:


> I'd like to include running various PIG scripts in my continuous build 
> system. Of course, I'll only use small datasets for this, and in the 
> beginning, I'll only target a local machine instance. However, this brings up 
> several questions:
> 
> 
> Q: What's the best way to run PIG from Java? Here's what I'm doing, following 
> a pattern I found in some of the Pig tests:
> 
> 1. Create Pig resources in a base class (shamelessly copied from 
> PigExecTestCase):
> 
>    protected ExecType execType = LOCAL;
>    protected MiniCluster cluster;
>    protected PigServer pigServer;
> 
>    @Before
>    public void setUp() throws Exception {
> 
>        String execTypeString = System.getProperty("test.exectype");
>        if(execTypeString!=null && execTypeString.length()>0){
>            execType = PigServer.parseExecType(execTypeString);
>        }
>        if(execType == MAPREDUCE) {
>            cluster = MiniCluster.buildCluster();
>            pigServer = new PigServer(MAPREDUCE, cluster.getProperties());
>        } else {
>            pigServer = new PigServer(LOCAL);
>        }
>    }
> 
> 2. Test classes subclass this to get access to the MiniCluster and PigServer 
> (copied from TestPigSplit):
> 
>    @Test
>    public void notestLongEvalSpec() throws Exception{
>        inputFileName = "notestLongEvalSpec-input.txt";
>        createInput(new String[] {"0\ta"});
> 
>        pigServer.registerQuery("a = load '" + inputFileName + "';");
>        for (int i=0; i< 500; i++){
>            pigServer.registerQuery("a = filter a by $0 == '1';");
>        }
>        Iterator<Tuple> iter = pigServer.openIterator("a");
>        while (iter.hasNext()){
>            throw new Exception();
>        }
>    }
> 
> 3. ERROR
> 
> This pattern works for simple PIG directives, but I want to load entire 
> Pig scripts that contain REGISTER and DEFINE directives, and for those 
> pigServer.registerQuery() fails with:
> 
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during 
> parsing. Unrecognized alias REGISTER
>    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1170)
>    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
>    at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
>    at org.apache.pig.PigServer.registerQuery(PigServer.java:441)
>    at com.audiencescience.apollo.reporting.NetworkRevenueReportTest.shouldParseNetworkRevenueReportScript(NetworkRevenueReportTest.java:74)
>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 
> Any suggestions?
> 
> -Todd
