Neither group nor distinct produces a total sort order in the output, so Zebra 
is correct not to record the results as sorted.  With our current 
implementation of group and distinct, results are sorted within each part file, 
but not across part files.
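
To make the distinction concrete, here is a rough sketch in Python. It is a toy model and not Pig's or Zebra's actual code: the function names are made up for illustration, and the modulo on integer keys only stands in for whatever partitioner the job really uses. Each "reducer" sorts the keys it receives, so every part file is sorted on its own, but reading the part files one after another does not give a sorted sequence.

```python
# Toy model (an assumption for illustration, not Pig's implementation):
# keys are partitioned across reducers, and each reducer writes one
# part file containing its keys in sorted order.

def partition_and_sort(keys, num_reducers):
    """Return one sorted list of distinct keys per simulated reducer."""
    parts = [[] for _ in range(num_reducers)]
    for key in set(keys):  # distinct: duplicate keys collapse
        # Modulo stands in for the real hash partitioner.
        parts[key % num_reducers].append(key)
    return [sorted(part) for part in parts]


def is_sorted(seq):
    """True if seq is in non-decreasing order."""
    return all(a <= b for a, b in zip(seq, seq[1:]))


keys = [7, 3, 8, 1, 3, 6, 2, 7, 5, 4]
part_files = partition_and_sort(keys, 3)

# Reading part-00000, part-00001, part-00002 back to back.
concatenated = [k for part in part_files for k in part]

each_part_sorted = all(is_sorted(part) for part in part_files)
globally_sorted = is_sorted(concatenated)

print(part_files)        # one sorted list per part file
print(each_part_sorted)  # True
print(globally_sorted)   # False
```

A merge join needs the second property (order across the whole table), which is why sort info recorded per part file would not be enough.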

Alan.

On Aug 21, 2011, at 10:29 PM, Kevin Burton wrote:

> Both DISTINCT and GROUP cause the result to be ordered.
> 
> Why does using a merge cause this to fail?
> 
> Specifically, Zebra then thinks the results aren't sorted, when they are.
> 
> I think the problem is that Zebra actually writes the sort info into the
> table schema on disk but that with DISTINCT and GROUP it isn't written.
> 
> I'll have to see what other operations will result in a sorted table and
> then implement support for them as well.
> 
> I have a DISTINCT operation whose output could then be merge joined to
> another table which is already sorted.
> 
> Both are rather large files, like 500GB, so avoiding a resort would be a
> good thing :)
> 
> Kevin
> 
> -- 
> 
> Founder/CEO Spinn3r.com
> 
> Location: *San Francisco, CA*
> Skype: *burtonator*
> 
> Skype-in: *(415) 871-0687*
