Please take a look at http://pig.apache.org/docs/r0.9.1/perf.html#filter
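For what it's worth, Pig Latin has no combined JOIN ... AND FILTER syntax. The filter has to be its own statement, placed either before or after the join. Below is a rough sketch of both forms. It assumes tab-separated inputs at hypothetical paths 'input_a' and 'input_b', and the field types are guesses:

```pig
-- Hypothetical inputs; paths and field types are assumptions.
A = LOAD 'input_a' USING PigStorage('\t')
    AS (FILE_NAME:chararray, CREATED_DATE:chararray, FORM_ID:int, FORM_ID_ROOT:int);
B = LOAD 'input_b' USING PigStorage('\t')
    AS (FILE_NAME:chararray, CREATED_DATE:chararray, FORM_ID:int, FORM_ID_ROOT:int);

-- Form 1: filter A first, then join the reduced relation.
A0 = FILTER A BY FORM_ID == 0;
F  = JOIN A0 BY (FILE_NAME, CREATED_DATE, FORM_ID, FORM_ID_ROOT),
          B  BY (FILE_NAME, CREATED_DATE, FORM_ID, FORM_ID_ROOT);

-- Form 2: filter after the join. FORM_ID comes from both inputs,
-- so it must be qualified with the relation alias (A::FORM_ID).
J = JOIN A BY (FILE_NAME, CREATED_DATE, FORM_ID, FORM_ID_ROOT),
         B BY (FILE_NAME, CREATED_DATE, FORM_ID, FORM_ID_ROOT);
G = FILTER J BY A::FORM_ID == 0;

-- Print the logical, physical and MapReduce plans for each alias.
EXPLAIN F;
EXPLAIN G;
```

In both forms the join itself should compile to a single MapReduce job. In the second form, the filter only references A, so the optimizer's PushUpFilter rule should normally move it above the join, and the two plans should end up much alike. The EXPLAIN output is the place to confirm this on your version.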
On Wed, Apr 11, 2012 at 3:39 PM, Mohit Anchlia <[email protected]> wrote:
> Is it possible to say something like
>
> F = JOIN A BY (FILE_NAME,CREATED_DATE,FORM_ID,FORM_ID_ROOT), B BY
> (FILE_NAME,CREATED_DATE,FORM_ID,FORM_ID_ROOT) AND FILTER A BY FORM_ID == 0;
>
> Also, how far does pig go in optimizing the job if I do specify the line
> above for instance as:
>
> F = JOIN A BY (FILE_NAME,CREATED_DATE,FORM_ID,FORM_ID_ROOT), B BY
> (FILE_NAME,CREATED_DATE,FORM_ID,FORM_ID_ROOT)
>
> G = FILTER F BY FORM_ID == 0;
>
> Would pig run only one reduce job or multiple in the case above?
