[ 
https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mathieu Poumeyrol updated PIG-202:
----------------------------------

    Attachment: Sort.patch

As requested, an all-in-one patch (Sort.patch) that:
 - calls instantiateFunc on the PO before the actual execution (fixes the 
using clause in local context)
 - discards the only "late" comparator instantiation I could find (made 
redundant, dead code)
 - corrects a marginal bias in the findQuantile builtin function (one of the 
two extreme quantiles was bigger or smaller depending on truncation)
 - fixes the quantile job.

The quantile job issue is tricky. It is not easy to show how it misbehaves with 
a Pig unit test, as the result is correct... FindQuantiles is responsible for 
defining a partition of the intermediate keyspace. Hadoop uses this partition 
through a SortPartitioner instance to split the reduce half of the Sort job 
among several reduce tasks. Now FindQuantiles was using a StarSpec as a 
comparator, whereas SortPartitioner was using the UDF comparator to perform an 
Arrays.binarySearch. The binary search cannot work correctly when the 
boundaries are sorted in one order and searched in another, and this leads to 
badly unbalanced reduce tasks as most of the keys fall in the same partition.
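
A minimal sketch of the mismatch described above, not the actual Pig code. 
The class PartitionSkewDemo and its methods are hypothetical names. Natural 
ordering stands in for the StarSpec comparator and a reversed ordering stands 
in for a user-supplied ComparatorFunc; both are assumptions made for 
illustration. Boundaries are picked from a sample sorted with one comparator, 
then looked up with Arrays.binarySearch using another.

```java
import java.util.Arrays;
import java.util.Comparator;

public class PartitionSkewDemo {

    // Pick numPartitions - 1 boundary keys from the keys sorted with sampleOrder
    // (stand-in for what the quantile job does).
    static Integer[] findBoundaries(Integer[] keys, int numPartitions,
                                    Comparator<Integer> sampleOrder) {
        Integer[] sorted = keys.clone();
        Arrays.sort(sorted, sampleOrder);
        Integer[] boundaries = new Integer[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            boundaries[i - 1] = sorted[i * sorted.length / numPartitions];
        }
        return boundaries;
    }

    // Map a key to a partition by binary search over the boundaries
    // (stand-in for what the partitioner does).
    static int partitionOf(Integer key, Integer[] boundaries,
                           Comparator<Integer> lookupOrder) {
        int idx = Arrays.binarySearch(boundaries, key, lookupOrder);
        // A negative result encodes the insertion point as -(point) - 1.
        return idx >= 0 ? idx : -idx - 1;
    }

    // Count how many keys land in each partition.
    static int[] countPerPartition(Integer[] keys, int numPartitions,
                                   Comparator<Integer> sampleOrder,
                                   Comparator<Integer> lookupOrder) {
        Integer[] boundaries = findBoundaries(keys, numPartitions, sampleOrder);
        int[] counts = new int[numPartitions];
        for (Integer key : keys) {
            counts[partitionOf(key, boundaries, lookupOrder)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        Integer[] keys = new Integer[100];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = i;
        }
        Comparator<Integer> natural = Comparator.naturalOrder();
        Comparator<Integer> udf = Comparator.reverseOrder(); // hypothetical user comparator

        // Same comparator on both sides: roughly even partitions.
        System.out.println("consistent: "
                + Arrays.toString(countPerPartition(keys, 4, udf, udf)));
        // Different comparators: the binary search runs over boundaries that
        // are not sorted in its own order, so keys pile up in a few partitions.
        System.out.println("mismatched: "
                + Arrays.toString(countPerPartition(keys, 4, natural, udf)));
    }
}
```

The output of each reducer is still sorted in the mismatched case, which is 
consistent with the final result looking correct; only the per-partition 
counts reveal the skew.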

"Proving" this point actually required counting how many items go to which 
partition in SortPartitioner (some printf-like debugging). But honestly, I 
think the patch just makes a lot of sense.

The fix just provides the UDF comparator to the sort used internally by the 
FindQuantiles job.

> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: EvalSpec.patch, InstantiateFunc.patch, 
> MapreducePlanCompiler.patch, Sort.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledged neither by the local 
> implementation nor by the quantile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
