Seems like something to put on the TODO if it isn't already there. I might look at the bug list to see what is in store for the future :)
The good news is that I just made the changes to my script to use merge, so I'll benchmark again and see how much faster it is... probably significantly faster :)

On Fri, Aug 19, 2011 at 11:12 PM, Ashutosh Chauhan <[email protected]> wrote:

> Hey Kevin,
>
> No, Pig currently doesn't auto-detect that data is getting sorted in
> previous steps of script. So, you need to tell it by 'using merge'.
>
> Hope it helps,
> Ashutosh
>
> On Fri, Aug 19, 2011 at 22:51, Kevin Burton <[email protected]> wrote:
>
> > I was reading about USING 'merge' with JOIN when relations are already
> > sorted.
> >
> > I actually was just looking through some code and realized that one of my
> > JOINs was on two relations that were *already* sorted due to a DISTINCT and
> > GROUP operation.
> >
> > I just added USING 'merge' and the initial results look the same.
> >
> > I haven't benchmarked it though.
> >
> > Does/would the existing optimizer be able to detect this and just use merge
> > without manual intervention?
> >
> > --
> >
> > Founder/CEO Spinn3r.com
> > Location: *San Francisco, CA*
> > Skype: *burtonator*
> > Skype-in: *(415) 871-0687*

--
Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Skype: *burtonator*
Skype-in: *(415) 871-0687*
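
For anyone reading the archive later: here is a rough sketch of why a merge join can be cheaper when both inputs are already sorted on the join key. This is plain Python, not Pig's actual implementation (which I have not looked at), and the function name merge_join and the sample data are made up for illustration. The idea is that each input is walked once in key order, so no extra sort or shuffle is needed.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, value) pairs that are ALREADY sorted by key.

    Hypothetical helper for illustration only; not Pig's implementation.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1          # left key is behind, advance left
        elif lk > rk:
            j += 1          # right key is behind, advance right
        else:
            # Find the run of rows on the right that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for k in range(j, j_end):
                    out.append((lk, left[i][1], right[k][1]))
                i += 1
            j = j_end
    return out


# Made-up sample data, sorted by key on both sides.
left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(left, right)
```

If either input is not actually sorted on the join key, a sketch like this silently drops matches, which is presumably part of why you have to ask for the merge strategy explicitly.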
