[
https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592555#action_12592555
]
Mathieu Poumeyrol commented on PIG-202:
---------------------------------------
Thanks for confirming that I was not wasting my time.
1. This is not what I'm trying to do. The current implementation, when asked
for 9 quantiles among 100 elements (0..99), returns this:
(all, {(0), (12), (24), (36), (48), (60), (72), (84), (96)})
This does not lead to a good partition: the first part is empty or nearly so,
and the last part is smaller than the "central" parts.
Sort.patch and Sort.v2.patch change FindQuantiles to make it return:
(all, {(11), (21), (31), (41), (51), (61), (71), (81), (91)})
It looks better. Actually, it looks even nicer with a <= instead of a < in the
big if in the loop:
(all, {(10), (20), (30), (40), (50), (60), (70), (80), (90)})
The impact of this FindQuantiles fix on sort performance is probably
marginal: it just avoids some empty or undersized reduce jobs. But it gives
better quantiles to an end user trying to use the function.
2. I will try to run and time my test job over the weekend. The performance
killer was not the small glitch in FindQuantiles, but the fact that the
comparators used by SortPartitioner and by the quantiles job were not
consistent. I'll try to give you some figures.
3. I will also generate a Sort.v3.patch (with the <= in FindQuantiles) using
svn diff, as Eclipse tends to generate ugly patches with absolute paths.
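For illustration, the effects described in points 1 and 2 can be reproduced
with a short standalone Python sketch. It is not the Pig code: the function
names, the skip rule in skip_based_quantiles, and the "first boundary the key
sorts before" partitioning rule are all assumptions, chosen only because they
reproduce the quantiles quoted above for 100 elements.

```python
# Illustrative sketch only; this is not FindQuantiles or SortPartitioner.
# All names and selection rules below are hypothetical.

def skip_based_quantiles(sorted_keys, k):
    """Take every (n // k + 1)-th key, starting at the first (assumed rule)."""
    step = len(sorted_keys) // k + 1
    return sorted_keys[::step][:k]

def evenly_spaced_quantiles(sorted_keys, k):
    """Pick k cut points that split the keys into k + 1 equal parts."""
    n = len(sorted_keys)
    return [sorted_keys[j * n // (k + 1)] for j in range(1, k + 1)]

def partition_sizes(keys, boundaries, compare):
    """Count keys per partition; a key goes to the first boundary it sorts before."""
    sizes = [0] * (len(boundaries) + 1)
    for key in keys:
        index = len(boundaries)  # default: last partition
        for i, boundary in enumerate(boundaries):
            if compare(key, boundary) < 0:
                index = i
                break
        sizes[index] += 1
    return sizes

def ascending(a, b):
    return a - b

def descending(a, b):
    return b - a

keys = list(range(100))

# Point 1: where the cut points fall decides how balanced the partitions are.
old = skip_based_quantiles(keys, 9)
new = evenly_spaced_quantiles(keys, 9)
old_sizes = partition_sizes(keys, old, ascending)
new_sizes = partition_sizes(keys, new, ascending)

# Point 2: boundaries computed under a descending comparator, then used
# once with the same comparator and once with the natural ordering.
desc_bounds = evenly_spaced_quantiles(sorted(keys, reverse=True), 9)
consistent = partition_sizes(keys, desc_bounds, descending)
inconsistent = partition_sizes(keys, desc_bounds, ascending)

print(old, old_sizes)
print(new, new_sizes)
print(consistent, inconsistent)
```

Under these assumptions the skip-based boundaries give one empty partition and
a last partition of 4 keys, while the evenly spaced boundaries give ten
partitions of 10 keys each. With mismatched comparators, 89 of the 100 keys
land in a single partition, which is the kind of skew that would dominate
sort time far more than the quantile glitch.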
> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
> Key: PIG-202
> URL: https://issues.apache.org/jira/browse/PIG-202
> Project: Pig
> Issue Type: Bug
> Reporter: Mathieu Poumeyrol
> Attachments: EvalSpec.patch, InstantiateFunc.patch,
> MapreducePlanCompiler.patch, Sort.patch, Sort.v2.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledged neither by the local
> implementation nor by the quantile lookup job.
> Patch coming soon.