I'd never thought about this before, but some of my scripts could probably be made much quicker by taking advantage of it. After which operations are relations guaranteed to be sorted? DISTINCT, GROUP BY, ORDER BY, and the output of a previous merge join, I guess? Any others?
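For anyone following along, here is my rough understanding of why sortedness matters for a merge join. This is a toy Python sketch of the general sort-merge idea, not Pig's actual implementation; the function name `merge_join` and the sample data are made up for illustration.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, value) pairs.

    Assumption: both inputs are already sorted by key. Nothing here checks that.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1          # left key is behind, advance left
        elif lk > rk:
            j += 1          # right key is behind, advance right
        else:
            # Find the run of rows on the right that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for _, rv in right[j:j_end]:
                    out.append((lk, left[i][1], rv))
                i += 1
            j = j_end
    return out


# Hypothetical sample data, sorted by key on both sides.
users = [(1, "ann"), (2, "bob"), (4, "dee")]
clicks = [(1, "home"), (1, "cart"), (3, "faq"), (4, "home")]
joined = merge_join(users, clicks)

# Same function on a left input that is NOT sorted.
unsorted_joined = merge_join([(2, "b"), (1, "a")], [(1, "x"), (2, "y")])
```

Each input is walked once, with no hashing or re-sorting, which is where the speedup would come from. The catch is the second call: with an unsorted input the sketch doesn't fail, it just quietly misses the match for key 1. That is why I'd like to know which operations actually guarantee the ordering.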
On 20 August 2011 07:12, Ashutosh Chauhan <[email protected]> wrote:
> Hey Kevin,
>
> No, Pig currently doesn't auto-detect that data is getting sorted in
> previous steps of script. So, you need to tell it by 'using merge'.
>
> Hope it helps,
> Ashutosh
>
> On Fri, Aug 19, 2011 at 22:51, Kevin Burton <[email protected]> wrote:
>
>> I was reading about USING 'merge' with JOIN when relations are already
>> sorted.
>>
>> I actually was just looking through some code and realized that one of my
>> JOINs was on two relations that were *already* sorted due to a DISTINCT and
>> GROUP operation.
>>
>> I just added USING 'merge' and the initial results look the same.
>>
>> I haven't benchmarked it though.
>>
>> Does/would the existing optimizer be able to detect this and just use merge
>> without manual intervention?
>>
>> --
>>
>> Founder/CEO Spinn3r.com
>>
>> Location: *San Francisco, CA*
>> Skype: *burtonator*
>>
>> Skype-in: *(415) 871-0687*
>>
>

--
http://tinyurl.com/andrew-clegg-linkedin | http://twitter.com/andrew_clegg
