I'm attaching our PigRunner class in case it gives you a starting point for using register script.
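For anyone reading without the attachment, here is a minimal sketch of the "preprocess the variables" step mentioned in the quoted message below. It is not the actual PigRunner code. The class name PigParams and both method names are hypothetical, and the assumption that parameters appear in the script as $name (an identifier starting with a letter or underscore, so positional references like $0 are left alone) is mine. The sketch only prepares the parameter map and the substituted script text; it does not call Pig at all.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the PigRunner class from this thread.
final class PigParams {
    // Assumed parameter syntax: $name, where name starts with a letter or underscore.
    // Positional references such as $0 deliberately do not match.
    private static final Pattern PARAM =
            Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    private PigParams() {}

    /** Turns "name=value" strings into an ordered name -> value map. */
    static Map<String, String> parse(String... args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException(
                        "Expected name=value but got: " + arg);
            }
            // Split on the first '=' only, so values may themselves contain '='.
            params.put(arg.substring(0, eq), arg.substring(eq + 1));
        }
        return params;
    }

    /** Replaces each known $name in the script text; unknown names are left as-is. */
    static String substitute(String script, Map<String, String> params) {
        Matcher m = PARAM.matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the value from being
            // interpreted as regex group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

A runner along the lines of runner.execute("myscript.pig", "param1=foo", "param2=bar") could call PigParams.parse on the trailing arguments, run substitute over the script text, and then hand the result to whatever script-registration call your Pig version provides.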
On Jul 20, 2010, at 11:56 PM, Corbin Hoenes wrote:

> Hey Todd we run against entire pig scripts with some helper classes we built
> basically they preprocess the variables then call register script but the
> test looks like this:
>
> @Before
> public void setUp() throws Exception {
>     Helper.delete(OUT_FILE);
>     runner = new PigRunner();
> }
>
>
> @Test
> public void testRecordCount() throws Exception {
>     runner.execute("myscript.pig", "param1=foo","param2=bar");
>
>     Iterator<Tuple> tuples = runner.getPigServer().openIterator("foo");
>     assertEquals(41L, Helper.countTuples(tuples));
> }
>
> It's been very useful for us to test this way. Would love to see more
> chatter about other techniques.
>
> On Jul 20, 2010, at 3:26 PM, ToddG wrote:
>
>
>> I'd like to include running various PIG scripts in my continuous build
>> system. Of course, I'll only use small datasets for this, and in the
>> beginning, I'll only target a local machine instance. However, this brings
>> up several questions:
>>
>>
>> Q: Whats the best way to run PIG from java? Here's what I'm doing, following
>> a pattern I found in some of the pig tests:
>>
>> 1. Create Pig resources in a base class (shamelessly copied from
>> PigExecTestCase):
>>
>> protected MiniCluster cluster;
>> protected PigServer pigServer;
>>
>> @Before
>> public void setUp() throws Exception {
>>
>>     String execTypeString = System.getProperty("test.exectype");
>>     if(execTypeString!=null && execTypeString.length()>0){
>>         execType = PigServer.parseExecType(execTypeString);
>>     }
>>     if(execType == MAPREDUCE) {
>>         cluster = MiniCluster.buildCluster();
>>         pigServer = new PigServer(MAPREDUCE, cluster.getProperties());
>>     } else {
>>         pigServer = new PigServer(LOCAL);
>>     }
>> }
>>
>> 2. Test classes sub class this to get access to the MiniCluster and
>> PigServer (copied from TestPigSplit):
>>
>> @Test
>> public void notestLongEvalSpec() throws Exception{
>>     inputFileName = "notestLongEvalSpec-input.txt";
>>     createInput(new String[] {"0\ta"});
>>
>>     pigServer.registerQuery("a = load '" + inputFileName + "';");
>>     for (int i=0; i< 500; i++){
>>         pigServer.registerQuery("a = filter a by $0 == '1';");
>>     }
>>     Iterator<Tuple> iter = pigServer.openIterator("a");
>>     while (iter.hasNext()){
>>         throw new Exception();
>>     }
>> }
>>
>> 3. ERROR
>>
>> This pattern works for simple PIG directives, but I want to load up entire
>> pig scripts, which have REGISTER and DEFINE directives, then the
>> pigServer.registerQuery() fails with:
>>
>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Unrecognized alias REGISTER
>>     at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1170)
>>     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
>>     at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
>>     at org.apache.pig.PigServer.registerQuery(PigServer.java:441)
>>     at com.audiencescience.apollo.reporting.NetworkRevenueReportTest.shouldParseNetworkRevenueReportScript(NetworkRevenueReportTest.java:74)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>
>> Any suggestions?
>>
>> -Todd
>