Comments inlined.
Utkarsh Srivastava wrote:
Hi Alan,
Your language proposal sounds good.
For the implementation proposal, it depends on which sorting we are
talking about: the sorting of a whole table, or the sorting of the
bags nested within foreach. They are implemented differently. The
former uses Hadoop, while the latter is done entirely in our code.
We need to be able to do both.
Your implementation proposal looks good for the latter (except, why
would we create a new type of eval spec? We could change the
sortdistinct spec to take an optional comparator argument.)
As I understand the current code, the comparator for the sort is
obtained by calling getComparator on the ProjectSpec. So I was
proposing subclassing ProjectSpec to override the getComparator
function. Did I misunderstand how it currently works? Or are you saying
we should change how it works and have sort get the comparator from
SortDistinctSpec instead?
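
To make the subclassing idea concrete, here is a minimal sketch. It does not use the real Pig classes: SketchProjectSpec, SketchUserOrderedProjectSpec and SketchBagSorter are hypothetical stand-ins, and the only thing modeled is the assumption that the sort obtains its comparator by calling getComparator on the spec.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for ProjectSpec; only the comparator hook is modeled.
class SketchProjectSpec {
    // Default ordering: natural order of the projected values.
    Comparator<String> getComparator() {
        return Comparator.naturalOrder();
    }
}

// Hypothetical subclass that swaps in a user-defined ordering.
class SketchUserOrderedProjectSpec extends SketchProjectSpec {
    private final Comparator<String> userComparator;

    SketchUserOrderedProjectSpec(Comparator<String> userComparator) {
        this.userComparator = userComparator;
    }

    @Override
    Comparator<String> getComparator() {
        return userComparator;
    }
}

// Hypothetical stand-in for the sort step: it asks the spec for its comparator
// and never needs to know whether the ordering is user-defined.
class SketchBagSorter {
    static List<String> sort(List<String> bag, SketchProjectSpec spec) {
        List<String> sorted = new ArrayList<>(bag);
        sorted.sort(spec.getComparator());
        return sorted;
    }
}
```

With this shape the sort code stays unchanged; only the spec handed to it differs. Passing the comparator as an optional argument to the sortdistinct spec would move the same hook to a different class.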
For the sorting of the outer bag, we need to look for a way to pass
the user-defined comparator to Hadoop. Can someone more familiar with
Hadoop internals shed some light on this? Right now, it seems to me the
only way would be to generate a class that wraps the user-defined
comparator (because Hadoop uses the compareTo method of the keyClass).
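
A rough sketch of what such a generated key class could look like follows. It is plain Java with no Hadoop dependency: SketchUserComparator and SketchGeneratedKey are hypothetical names, and a real key class would also need whatever serialization Hadoop requires of keys. The comparator is held in a static field on the assumption that the framework creates key instances itself and only ever calls compareTo.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical user-defined ordering: longest string first.
class SketchUserComparator implements Comparator<String> {
    @Override
    public int compare(String a, String b) {
        return Integer.compare(b.length(), a.length());
    }
}

// Hypothetical generated key class. Serialization is omitted; only the
// ordering hook is shown.
class SketchGeneratedKey implements Comparable<SketchGeneratedKey> {
    // The user comparator is baked into the class itself, so any code that
    // sorts keys through compareTo picks up the user-defined ordering.
    private static final Comparator<String> USER_ORDER = new SketchUserComparator();

    final String value;

    SketchGeneratedKey(String value) {
        this.value = value;
    }

    @Override
    public int compareTo(SketchGeneratedKey other) {
        return USER_ORDER.compare(this.value, other.value);
    }
}
```

Anything that sorts through compareTo, for example Arrays.sort on an array of SketchGeneratedKey, then orders the keys by the user comparator without knowing it exists.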
Utkarsh
On Nov 2, 2007, at 4:05 PM, Alan Gates wrote:
All,
I've posted a proposal at
http://wiki.apache.org/pig/UserDefinedOrdering for how to add
user-defined ordering to Pig. This is being urgently requested by some of
our users.
Utkarsh, please review this and make sure I properly understood how
to hook things together in the logical and physical plans. I'm not
100% confident that what I proposed will work in the current framework.
Alan.