Hello, I understand the pain :)
Have you seen PigUnit and Penny?
http://pig.apache.org/docs/r0.10.0/test.html

On Fri, Oct 19, 2012 at 8:09 PM, Yang <[email protected]> wrote:
> One of the greatest pains I face when debugging Pig code is that the
> iteration cycles are really long:
> the applications for which we use Pig typically deal with large datasets,
> and if a Pig script involves many
> JOIN/generate/filter steps, every step takes a lot of time, but every time
> I fix one step, I have to run from the start,
> which is meaningless.
>
> What I am doing so far to reduce the time wasted re-running
> already-debugged steps is to
> manually divide my script into many small scripts and save the last
> variable out into HDFS, and once the
> small script is debugged, I load the previous variable in the next
> small script.
>
> After all the small scripts are done, I connect them back manually into the
> original big script.
>
>
> Is there a way to automate this? For example, add a mark around a particular
> step that tells Pig
> that the result is to be saved and that all following steps are not to be
> executed, and when we move
> on to the next step, it knows where to pick up the last-saved data.
>
> Writing a preprocessor to do the above is not trivial, so I can't whip
> up something immediately, because it needs to figure out the
> schemas of the variables that propagate through the steps.
>
>
> Thanks
> Yang
>
