Hi Alan,
Your language proposal sounds good.
For the implementation proposal, it depends on which sorting we are
talking about: the sorting of a whole table, or the sorting of the
bags nested within foreach. They are implemented differently. The
former uses Hadoop, while the latter is done entirely in our code.
Your implementation proposal looks good for the latter. One question,
though: why would we create a new type of eval spec? We could change
the sortdistinct spec to take an optional comparator argument.
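To illustrate the optional-comparator idea, here is a minimal sketch in
plain Java. It is not Pig's actual eval spec code: the class name
SortDistinctSketch and the method sortDistinct are hypothetical, and a
bag is modelled as a simple List. A null comparator falls back to the
natural ordering, so existing callers would not need to change.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a sort/distinct step; not Pig's real class.
final class SortDistinctSketch {

    // Sorts the bag and drops duplicates. If userComparator is null,
    // the natural ordering of the elements is used instead.
    static <T extends Comparable<? super T>> List<T> sortDistinct(
            List<T> bag, Comparator<? super T> userComparator) {
        Comparator<? super T> cmp = userComparator;
        if (cmp == null) {
            cmp = Comparator.<T>naturalOrder();
        }
        // TreeSet keeps elements ordered by cmp and treats elements that
        // compare as equal under cmp as duplicates.
        TreeSet<T> sorted = new TreeSet<>(cmp);
        sorted.addAll(bag);
        return new ArrayList<>(sorted);
    }
}
```

One thing to watch with this approach: "distinct" is decided by the
comparator, so two tuples that the user-defined comparator considers
equal would collapse into one.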
For the sorting of the outer bag, we need to find a way to pass
the user-defined comparator to Hadoop. Can someone more familiar with
Hadoop internals shed some light on this? Right now, it seems to me the
only way would be to generate a class that has the user-defined
comparator (because Hadoop uses the compareTo method of the keyClass).
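As a rough sketch of what such a generated class might look like, here
is a plain-Java key whose compareTo defers to a user-defined comparator.
Everything in it is an assumption for illustration: the name
UserOrderedKey, the String payload, and the length-then-alphabetical
comparator are all hypothetical. It also leaves out whatever
serialization interface a real Hadoop key class has to implement.

```java
import java.util.Comparator;

// Hypothetical key wrapper; a real one would be generated per query.
final class UserOrderedKey implements Comparable<UserOrderedKey> {

    // Stand-in for the user-defined comparator: shorter strings first,
    // ties broken alphabetically.
    static final Comparator<String> USER_COMPARATOR = new Comparator<String>() {
        @Override
        public int compare(String a, String b) {
            int byLength = Integer.compare(a.length(), b.length());
            return byLength != 0 ? byLength : a.compareTo(b);
        }
    };

    private final String value;

    UserOrderedKey(String value) {
        this.value = value;
    }

    String value() {
        return value;
    }

    // Anything that sorts keys through compareTo ends up applying the
    // user-defined ordering.
    @Override
    public int compareTo(UserOrderedKey other) {
        return USER_COMPARATOR.compare(value, other.value);
    }
}
```

Sorting a list of these keys with Collections.sort would order them by
the user's comparator without the sorting code knowing about it.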
Utkarsh
On Nov 2, 2007, at 4:05 PM, Alan Gates wrote:
All,
I've posted a proposal at http://wiki.apache.org/pig/UserDefinedOrdering
for how to add user-defined ordering to Pig.
This is being urgently requested by some of our users.
Utkarsh, please review this and make sure I properly understood how
to hook things together in the logical and physical plans. I'm not
100% confident what I proposed will work in the current framework.
Alan.