Hey Todd,

We run against entire Pig scripts using some helper classes we built. Basically, they preprocess the variables and then call registerScript. The test looks like this:

    @Before
    public void setUp() throws Exception {
        Helper.delete(OUT_FILE);
        runner = new PigRunner();
    }

    @Test
    public void testRecordCount() throws Exception {
        runner.execute("myscript.pig", "param1=foo", "param2=bar");
        Iterator<Tuple> tuples = runner.getPigServer().openIterator("foo");
        assertEquals(41L, Helper.countTuples(tuples));
    }

It's been very useful for us to test this way. Would love to see more chatter about other techniques.

On Jul 20, 2010, at 3:26 PM, ToddG wrote:

> I'd like to include running various PIG scripts in my continuous build
> system. Of course, I'll only use small datasets for this, and in the
> beginning, I'll only target a local machine instance. However, this brings up
> several questions:
>
> Q: Whats the best way to run PIG from java? Here's what I'm doing, following
> a pattern I found in some of the pig tests:
>
> 1. Create Pig resources in a base class (shamelessly copied from
> PigExecTestCase):
>
>     protected MiniCluster cluster;
>     protected PigServer pigServer;
>
>     @Before
>     public void setUp() throws Exception {
>
>         String execTypeString = System.getProperty("test.exectype");
>         if(execTypeString!=null && execTypeString.length()>0){
>             execType = PigServer.parseExecType(execTypeString);
>         }
>         if(execType == MAPREDUCE) {
>             cluster = MiniCluster.buildCluster();
>             pigServer = new PigServer(MAPREDUCE, cluster.getProperties());
>         } else {
>             pigServer = new PigServer(LOCAL);
>         }
>     }
>
> 2. Test classes sub class this to get access to the MiniCluster and PigServer
> (copied from TestPigSplit):
>
>     @Test
>     public void notestLongEvalSpec() throws Exception{
>         inputFileName = "notestLongEvalSpec-input.txt";
>         createInput(new String[] {"0\ta"});
>
>         pigServer.registerQuery("a = load '" + inputFileName + "';");
>         for (int i=0; i< 500; i++){
>             pigServer.registerQuery("a = filter a by $0 == '1';");
>         }
>         Iterator<Tuple> iter = pigServer.openIterator("a");
>         while (iter.hasNext()){
>             throw new Exception();
>         }
>     }
>
> 3. ERROR
>
> This pattern works for simple PIG directives, but I want to load up entire
> pig scripts, which have REGISTER and DEFINE directives, then the
> pigServer.registerQuery() fails with:
>
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during
> parsing. Unrecognized alias REGISTER
>     at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1170)
>     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
>     at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
>     at org.apache.pig.PigServer.registerQuery(PigServer.java:441)
>     at com.audiencescience.apollo.reporting.NetworkRevenueReportTest.shouldParseNetworkRevenueReportScript(NetworkRevenueReportTest.java:74)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
> Any suggestions?
>
> -Todd
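
A rough sketch of the variable-preprocessing step mentioned at the top of this reply, for anyone wanting a starting point. It is not the actual helper: the class ScriptParams and its methods are hypothetical names, it uses only java.util and has no Pig dependency, and it assumes parameters appear in the script as $name and are supplied as "key=value" strings, as in the runner.execute(...) call above. Escaped dollar signs are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Pig. */
final class ScriptParams {

    // Matches $name where name starts with a letter or underscore,
    // so positional references such as $0 are left untouched.
    private static final Pattern PARAM =
            Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    private ScriptParams() {}

    /** Turns "key=value" strings into a map, splitting on the first '='. */
    static Map<String, String> parse(String... pairs) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        for (String pair : pairs) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value, got: " + pair);
            }
            params.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return params;
    }

    /** Replaces every $name in the script text; fails on an undefined name. */
    static String substitute(String script, Map<String, String> params) {
        Matcher m = PARAM.matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Undefined parameter: " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The substituted text could then be written to a temporary file and handed to PigServer's registerScript, assuming the Pig version in use provides it. Failing fast on an undefined parameter is a design choice here, so that a typo in a test's parameter list shows up as a clear error instead of a confusing parse failure later.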