I'd like to run various Pig scripts as part of my continuous build system. I'll only use small datasets for this, and at first I'll only target a local machine instance. Even so, this raises several questions:

Q: What's the best way to run Pig from Java? Here's what I'm doing, following a pattern I found in some of the Pig tests:

1. Create Pig resources in a base class (shamelessly copied from PigExecTestCase):

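    // LOCAL and MAPREDUCE are static imports from org.apache.pig.ExecType
    protected ExecType execType = LOCAL;
    protected MiniCluster cluster;
    protected PigServer pigServer;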

    @Before
    public void setUp() throws Exception {

        String execTypeString = System.getProperty("test.exectype");
        if(execTypeString!=null && execTypeString.length()>0){
            execType = PigServer.parseExecType(execTypeString);
        }
        if(execType == MAPREDUCE) {
            cluster = MiniCluster.buildCluster();
            pigServer = new PigServer(MAPREDUCE, cluster.getProperties());
        } else {
            pigServer = new PigServer(LOCAL);
        }
    }

2. Test classes subclass this base class to get access to the MiniCluster and PigServer (copied from TestPigSplit):

    @Test
    public void notestLongEvalSpec() throws Exception{
        inputFileName = "notestLongEvalSpec-input.txt";
        createInput(new String[] {"0\ta"});

        pigServer.registerQuery("a = load '" + inputFileName + "';");
        for (int i=0; i< 500; i++){
            pigServer.registerQuery("a = filter a by $0 == '1';");
        }
        Iterator<Tuple> iter = pigServer.openIterator("a");
        while (iter.hasNext()){
            throw new Exception();
        }
    }

3. ERROR

This pattern works for simple Pig statements, but I want to load entire Pig scripts that contain REGISTER and DEFINE directives. When a script contains them, pigServer.registerQuery() fails with:

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Unrecognized alias REGISTER
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1170)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:441)
    at com.audiencescience.apollo.reporting.NetworkRevenueReportTest.shouldParseNetworkRevenueReportScript(NetworkRevenueReportTest.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
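
One possible workaround, sketched here under stated assumptions rather than as a confirmed fix: since the trace shows that registerQuery() does not recognize REGISTER, the script could be pre-processed so that REGISTER directives are pulled out and handled separately (for example through PigServer.registerJar(String)), while every other statement is still passed to registerQuery(). The PigScriptSplitter class below is hypothetical and not part of Pig. It splits naively on ';', so it assumes no semicolons inside quoted strings or comments. It also leaves DEFINE statements among the ordinary queries; whether registerQuery() accepts DEFINE in a given Pig version would need to be checked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Pig): separates REGISTER directives from other statements. */
class PigScriptSplitter {

    // Matches "REGISTER path", case-insensitively, with optional single quotes around the path.
    private static final Pattern REGISTER =
            Pattern.compile("register\\s+'?([^';\\s]+)'?", Pattern.CASE_INSENSITIVE);

    /** Splits a script into trimmed, non-empty statements. Naive: splits on every ';'. */
    static List<String> statements(String script) {
        List<String> result = new ArrayList<>();
        for (String part : script.split(";")) {
            String stmt = part.trim();
            if (!stmt.isEmpty()) {
                result.add(stmt);
            }
        }
        return result;
    }

    /** Returns the jar paths named by REGISTER statements, in script order. */
    static List<String> jarPaths(String script) {
        List<String> jars = new ArrayList<>();
        for (String stmt : statements(script)) {
            Matcher m = REGISTER.matcher(stmt);
            if (m.matches()) {
                jars.add(m.group(1));
            }
        }
        return jars;
    }

    /** Returns every non-REGISTER statement with its terminating ';' restored. */
    static List<String> queries(String script) {
        List<String> queries = new ArrayList<>();
        for (String stmt : statements(script)) {
            if (!REGISTER.matcher(stmt).matches()) {
                queries.add(stmt + ";");
            }
        }
        return queries;
    }
}
```

In a test, each entry of jarPaths(script) would then go to pigServer.registerJar(...) and each entry of queries(script) to pigServer.registerQuery(...), in that order.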

Any suggestions?

-Todd
