[ 
https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592555#action_12592555
 ] 

Mathieu Poumeyrol commented on PIG-202:
---------------------------------------

Thanks for confirming that I was not wasting my time.

1. This is not what I'm trying to do. The current implementation, when asked 
for 9 quantiles among 100 elements (0..99) returns this:
(all, {(0), (12), (24), (36), (48), (60), (72), (84), (96)})
This does not lead to a good partition: the first part is empty or nearly so, 
and the last part is smaller than the "central" parts.

Sort.patch and Sort.v2.patch change FindQuantiles to make it return:
(all, {(11), (21), (31), (41), (51), (61), (71), (81), (91)})

That looks better. Actually, it looks even nicer with a <= instead of a < in 
the big "if" in the loop:
(all, {(10), (20), (30), (40), (50), (60), (70), (80), (90)})
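
To make the difference concrete, here is a minimal Java sketch. It is NOT the
FindQuantiles code: startAtZero and evenSplit are hypothetical selection rules
chosen only because they reproduce the first and last outputs quoted above,
and partitionSizes assumes each boundary opens a new partition.

```java
import java.util.ArrayList;
import java.util.List;

class QuantileSketch {
    // Hypothetical rule (assumption): take every (n / (q - 1))-th element,
    // starting at index 0. Reproduces 0, 12, 24, ... for n = 100, q = 9.
    static List<Integer> startAtZero(int n, int q) {
        int step = n / (q - 1);
        List<Integer> picks = new ArrayList<>();
        for (int k = 0; k < q; k++) {
            picks.add(k * step);
        }
        return picks;
    }

    // Hypothetical rule (assumption): q boundaries cutting n elements into
    // q + 1 equal parts. Reproduces 10, 20, 30, ... for n = 100, q = 9.
    static List<Integer> evenSplit(int n, int q) {
        List<Integer> picks = new ArrayList<>();
        for (int k = 1; k <= q; k++) {
            picks.add(k * n / (q + 1));
        }
        return picks;
    }

    // Sizes of the ranges [0, b1), [b1, b2), ..., [bq, n) over elements 0..n-1.
    static List<Integer> partitionSizes(int n, List<Integer> boundaries) {
        List<Integer> sizes = new ArrayList<>();
        int prev = 0;
        for (int b : boundaries) {
            sizes.add(b - prev);
            prev = b;
        }
        sizes.add(n - prev);
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(partitionSizes(100, startAtZero(100, 9)));
        System.out.println(partitionSizes(100, evenSplit(100, 9)));
    }
}
```

Under these assumptions the first rule yields one empty part, eight parts of
12 elements and a last part of 4, while the second yields ten parts of 10.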

The impact of this FindQuantiles fix on sort performance is probably 
marginal: it just avoids some empty or undersized reduce jobs. But it gives 
better quantiles to an end user trying to use the function.

2. I will try to run and time my test job over the weekend. The performance 
killer was not the small glitch in FindQuantiles, but the fact that the 
SortPartitioner's and the quantiles' comparator were not consistent. I'll try 
to give you some figures.
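
As an illustration of why inconsistent comparators can hurt so much, here is
a small Java sketch. It is not the SortPartitioner code: partitionOf is a
hypothetical linear-scan partitioner, and the example assumes the boundaries
were computed with a descending comparator while the partitioner is then run
once with that same comparator and once with natural ordering.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class PartitionSkewSketch {
    // Hypothetical partitioner (assumption): index of the first boundary that
    // the key sorts strictly before under cmp; keys sorting after every
    // boundary go to the last partition.
    static int partitionOf(int key, List<Integer> boundaries, Comparator<Integer> cmp) {
        for (int i = 0; i < boundaries.size(); i++) {
            if (cmp.compare(key, boundaries.get(i)) < 0) {
                return i;
            }
        }
        return boundaries.size();
    }

    // Number of keys from 0..n-1 that land in each partition.
    static int[] histogram(int n, List<Integer> boundaries, Comparator<Integer> cmp) {
        int[] counts = new int[boundaries.size() + 1];
        for (int key = 0; key < n; key++) {
            counts[partitionOf(key, boundaries, cmp)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Boundaries as a descending comparator would order them.
        List<Integer> boundaries = Arrays.asList(90, 80, 70, 60, 50, 40, 30, 20, 10);
        Comparator<Integer> descending = Comparator.reverseOrder();
        Comparator<Integer> natural = Comparator.naturalOrder();

        // Same comparator on both sides: roughly even load.
        System.out.println(Arrays.toString(histogram(100, boundaries, descending)));
        // Mismatched comparator: almost every key lands in one partition.
        System.out.println(Arrays.toString(histogram(100, boundaries, natural)));
    }
}
```

In this toy setup the consistent case spreads 100 keys as 9, 10, ..., 10, 11
across ten partitions, whereas the mismatched case sends 90 keys to the first
partition and 10 to the last, leaving the other eight empty.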

3. I will also generate a Sort.v3.patch (with the <= in FindQuantiles) using 
svn diff, as Eclipse tends to generate ugly patches with absolute paths.

> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: EvalSpec.patch, InstantiateFunc.patch, 
> MapreducePlanCompiler.patch, Sort.patch, Sort.v2.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledged neither by the local 
> implementation nor by the quartile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
