Nice, thanks. Macros and mock.Storage() are both new to me; I believe they will help a lot.

On Mon, Oct 22, 2012 at 5:32 PM, Dmitriy Ryaboy wrote:
> Some testing tips:
>
> 1) parametrize your load/store statements so that if you have to run
> in hadoop mode, it's easy to switch to debug inputs / outputs (and
> debug input/output loaders and storers). It's vastly preferable to
> test in local mode when possible, since the iterations are so much
> faster.
>
> 2) it's a good thing that PigUnit makes you test small pieces of code!
> Factor out macros so that you can create unit tests; don't copy and
> paste code, use macros and the import statement.
>
> 3) Try using mock.Storage (see
> https://issues.apache.org/jira/browse/PIG-2650) to automatically
> create inputs and examine outputs in your unit tests, if you are on
> pig 11.
>
> D
>
> On Fri, Oct 19, 2012 at 12:01 PM, Yang wrote:
> > I am using PigUnit, but it's somewhat limited: it can run only localmode,
> > so I can't find issues that come with fairly large test data; you have to
> > create small snippets of code that you cut out manually from your original
> > code, so after you tested a snippet to be fine, you have to copy-paste that
> > back into the production code, which introduces possible copy-paste errors.
> > if you compare this to java junit, this is really very crude: in java, you
> > have a class, and you can do junit testing on individual methods of the
> > class, instead of having to copy paste and create a special "test version"
> > of that class.
> >
> > overall, I feel that testability is an area where PIG could spend a lot
> > more efforts and it will greatly benefit its wider adoption. ----- some
> > other tools (Cascading, Cascalog etc) advertise testability as one of their
> > important features.
> >
> > let me check out penny... thanks
> >
> > On Fri, Oct 19, 2012 at 2:18 AM, Jagat Singh wrote:
> >
> >> Hello ,
> >>
> >> I understand the pain :)
> >>
> >> Have you seen PigUnit and Penny
> >>
> >> http://pig.apache.org/docs/r0.10.0/test.html
> >>
> >> On Fri, Oct 19, 2012 at 8:09 PM, Yang wrote:
> >>
> >> > one of the greatest pains I face with debugging a pig code is that the
> >> > iteration cycles are really long:
> >> > the applications for which we use pig typically deal with large dataset,
> >> > and if a pig script involves many
> >> > JOIN/generate/filter steps, every step takes a lot of time, but every time
> >> > I fix one step, I have to run from the start,
> >> > which is meaningless.
> >> >
> >> > what I am doing so far to reduce the meaningless wasted time to re-run
> >> > already-debugged steps, is to
> >> > manually divide my script into many small scripts, and save the last
> >> > variable out into hdfs, and once the
> >> > small script is debugged fine, I load the previous variable in the next
> >> > small script
> >> >
> >> > after all small scripts are done, I connect them back manually to the
> >> > original big script.
> >> >
> >> > is there a way to automate this? for example add a mark around a particular
> >> > step, and tells pig
> >> > that the result is to be saved up, and all following steps are not to be
> >> > executed. and when we move
> >> > onto the next step, it knows where to pick up the last-saved data.
> >> >
> >> > writing a preprocessor to do the above is not trivial so that I can't whip
> >> > up something immediately , cuz it needs to figure out the
> >> > schemas of variables that propagate through the steps.
> >> >
> >> > Thanks
> >> > Yang
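
For reference, tips 1 and 2 quoted above might look roughly like the sketch below. It is only an illustration under stated assumptions: the file names (macros.pig, main.pig), the macro name count_by_key, the parameter names (INPUT, OUTPUT, LOADER, STORER), the paths, and the user/url fields are all made up here and do not come from the thread. Passing the loader and storer as parameters relies on parameter substitution being plain text replacement; check it against your Pig version.

```pig
-- macros.pig (hypothetical file name)
-- One reusable step, factored out so a unit test can call it on a tiny input
-- instead of on a copy-pasted snippet of the production script.
DEFINE count_by_key(rel, group_key) RETURNS counts {
    grouped = GROUP $rel BY $group_key;
    $counts = FOREACH grouped GENERATE group, COUNT($rel) AS n;
};
```

```pig
-- main.pig (hypothetical file name)
-- Defaults point at small local debug data; override them with -param
-- when the script has to run in hadoop mode.
%default INPUT 'debug/visits.tsv';
%default OUTPUT 'debug/visit_counts';
%default LOADER 'PigStorage()';
%default STORER 'PigStorage()';

IMPORT 'macros.pig';

-- Parameter substitution stays at the top level; the macro only sees the
-- relation and field that are passed to it explicitly.
visits = LOAD '$INPUT' USING $LOADER AS (user:chararray, url:chararray);
counts = count_by_key(visits, user);
STORE counts INTO '$OUTPUT' USING $STORER;
```

With the defaults, `pig -x local main.pig` would run against the debug data; a cluster run would override the paths, for example `pig -param INPUT=/data/visits -param OUTPUT=/out/visit_counts main.pig` (paths again hypothetical). On a Pig version that ships mock.Storage, the LOADER and STORER parameters could presumably be pointed at mock.Storage() from a unit test, but that combination is untested here.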
