I'd like to run various Pig scripts as part of my continuous build
system. Of course, I'll only use small datasets for this, and to start
with I'll only target a local machine instance. However, this
brings up several questions:
Q: What's the best way to run Pig from Java? Here's what I'm doing,
following a pattern I found in some of the Pig tests:
1. Create Pig resources in a base class (shamelessly copied from
PigExecTestCase):
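    // Assumes: import static org.apache.pig.ExecType.LOCAL;
    //          import static org.apache.pig.ExecType.MAPREDUCE;
    protected ExecType execType = LOCAL;  // default when test.exectype is unset
    protected MiniCluster cluster;
    protected PigServer pigServer;

    @Before
    public void setUp() throws Exception {
        // Allow the build to override the execution mode via a system property.
        String execTypeString = System.getProperty("test.exectype");
        if (execTypeString != null && execTypeString.length() > 0) {
            execType = PigServer.parseExecType(execTypeString);
        }
        if (execType == MAPREDUCE) {
            cluster = MiniCluster.buildCluster();
            pigServer = new PigServer(MAPREDUCE, cluster.getProperties());
        } else {
            pigServer = new PigServer(LOCAL);
        }
    }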
2. Test classes subclass this to get access to the MiniCluster and
PigServer (copied from TestPigSplit):
    @Test
    public void notestLongEvalSpec() throws Exception {
        inputFileName = "notestLongEvalSpec-input.txt";
        createInput(new String[] {"0\ta"});
        pigServer.registerQuery("a = load '" + inputFileName + "';");
        for (int i = 0; i < 500; i++) {
            pigServer.registerQuery("a = filter a by $0 == '1';");
        }
        Iterator<Tuple> iter = pigServer.openIterator("a");
        while (iter.hasNext()) {
            throw new Exception();
        }
    }
3. ERROR
This pattern works for simple Pig statements, but I want to load
entire Pig scripts, which contain REGISTER and DEFINE directives. When
I pass those to pigServer.registerQuery(), it fails with:
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Unrecognized alias REGISTER
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1170)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:441)
    at com.audiencescience.apollo.reporting.NetworkRevenueReportTest.shouldParseNetworkRevenueReportScript(NetworkRevenueReportTest.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
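One workaround I'm considering, as a rough sketch only: split the script myself, pull out the REGISTER statements, and hand the jar paths to pigServer.registerJar() while everything else still goes through registerQuery(). This assumes REGISTER is handled by the Grunt shell rather than by the query parser, which is my guess from the error above and not something I've confirmed. The class PigScriptSplitter and its fields are hypothetical names of my own, not part of Pig, and the split on ';' is deliberately naive.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Pig): separates REGISTER statements
 * from the remaining statements of a Pig script.
 */
public class PigScriptSplitter {
    /** Jar paths taken from REGISTER statements, with surrounding quotes removed. */
    public final List<String> jars = new ArrayList<String>();
    /** All other statements, each terminated with ';'. */
    public final List<String> queries = new ArrayList<String>();

    public PigScriptSplitter(String script) {
        // Naive split: breaks if a quoted string or a comment contains ';'.
        for (String raw : script.split(";")) {
            String stmt = raw.trim();
            if (stmt.isEmpty()) {
                continue;
            }
            // First token decides whether this is a REGISTER statement.
            String[] parts = stmt.split("\\s+", 2);
            if (parts.length == 2 && parts[0].equalsIgnoreCase("REGISTER")) {
                jars.add(parts[1].replaceAll("^'|'$", ""));
            } else {
                queries.add(stmt + ";");
            }
        }
    }

    // Intended use in a test (assumes pigServer from the base class above):
    //   PigScriptSplitter s = new PigScriptSplitter(scriptText);
    //   for (String jar : s.jars)  pigServer.registerJar(jar);
    //   for (String q : s.queries) pigServer.registerQuery(q);
}
```

DEFINE statements are left in the query list here, on the assumption that registerQuery() accepts them; if it doesn't, they would need the same treatment.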
Any suggestions?
-Todd