This is a great question! I could be wrong, but I don't believe there is a way to indicate this for a group-by. It definitely does matter for performance whether your input is globally sorted, though. Currently a group-by happens on the reduce side, which requires a shuffle and sort of the map output. If the input is already globally sorted on the group key, the grouping could instead happen on the map side, for a significant performance boost.
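To make the difference concrete, here is a rough sketch in plain Python (not Pig, and not how Pig implements it; the function names and sample records are made up for illustration). When the input is already sorted by key, groups can be emitted in a single streaming pass, which is roughly what a map-side group-by would exploit. Without that guarantee, records for a key can appear anywhere, so they have to be collected first, which is the job the shuffle does on the reduce side.

```python
from itertools import groupby
from operator import itemgetter


def group_sorted(records, key=itemgetter(0)):
    """Stream (key, rows) pairs from input that is ALREADY sorted by key.

    Only one group is held in memory at a time. This is only correct if
    all records for a key are adjacent in the input.
    """
    for k, rows in groupby(records, key=key):
        yield k, list(rows)


def group_unsorted(records, key=itemgetter(0)):
    """Group input in arbitrary order by buffering every record.

    A stand-in for what the shuffle and reduce phase has to do when no
    ordering is known: bring all records for a key together first.
    """
    buckets = {}
    for rec in records:
        buckets.setdefault(key(rec), []).append(rec)
    return buckets


# Hypothetical sample data, sorted by the first field.
records = [("a", 1), ("a", 2), ("b", 3), ("c", 4), ("c", 5)]

streamed = list(group_sorted(records))
buffered = group_unsorted(records)
```

Both functions produce the same groups on sorted input, but only the first avoids buffering everything. In a distributed job there is one more requirement: all records for a given key have to land in the same input split, otherwise a single mapper cannot see the whole group.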
I did see a CollectableLoadFunc <http://pig.apache.org/docs/r0.13.0/api/org/apache/pig/CollectableLoadFunc.html> interface that's used in the MergeJoin algorithm... I don't see why this couldn't be used for a map side group by also.

On Sun, Oct 12, 2014 at 11:48 PM, Sunil S Nandihalli <sunil.nandiha...@gmail.com> wrote:

> Hi Everybody,
> Is there a way to indicate that the data is sorted by the key using which
> the relations are being grouped? Or does it even matter for performance
> whether we indicate it or not?
> Thanks,
> Sunil.