[ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592518#action_12592518 ]

Alan Gates commented on PIG-202:
--------------------------------

A couple of comments/questions:

# I don't understand your changes to FindQuantiles. If I read the code
correctly, it's now taking the first x values out of the bag instead of
sampling at regular intervals from the whole bag. This has the advantage of
not needing to read the whole bag (though the code isn't taking advantage of
that), but it will give a much worse sample, at least if the bag is ordered.
Am I missing something? Should we take the first n values and then break if
the bag is unordered, and sample at regular intervals (every 1/n of the bag)
if the bag is ordered?
# Have you done any performance testing to get an idea of the speedup this
gives? Obviously that will depend on the data set, but it would be interesting
to see.
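
To make the sampling concern concrete, here is a minimal sketch that contrasts the two strategies. It is not Pig's actual FindQuantiles code. It assumes the sample arrives as an in-memory java.util.List and that the comparator given to ORDER is available as a java.util.Comparator. The class and method names (QuantileSampling, regularIntervalBoundaries, firstN) are hypothetical. Note that it sorts with the user-supplied comparator before picking boundaries, which is what this issue is about.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not Pig's FindQuantiles implementation.
public class QuantileSampling {

    // Sort the samples with the user-supplied comparator, then pick
    // numQuantiles - 1 boundary values at regular intervals across the
    // whole sorted sample.
    static <T> List<T> regularIntervalBoundaries(List<T> samples, int numQuantiles,
                                                 Comparator<? super T> cmp) {
        List<T> sorted = new ArrayList<>(samples);
        sorted.sort(cmp);
        List<T> boundaries = new ArrayList<>();
        if (sorted.isEmpty()) {
            return boundaries; // nothing to sample from
        }
        for (int i = 1; i < numQuantiles; i++) {
            // Start of the i-th of numQuantiles equal-sized slices; always < size.
            int idx = (int) ((long) i * sorted.size() / numQuantiles);
            boundaries.add(sorted.get(idx));
        }
        return boundaries;
    }

    // The alternative under discussion: keep only the first n values in
    // iteration order and ignore the rest of the bag.
    static <T> List<T> firstN(List<T> samples, int n) {
        return new ArrayList<>(samples.subList(0, Math.min(n, samples.size())));
    }

    public static void main(String[] args) {
        List<Integer> bag = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            bag.add(i); // an already-ordered bag: 0, 1, ..., 99
        }
        // Stand-in for a user-supplied ORDER comparator (descending).
        Comparator<Integer> desc = Comparator.reverseOrder();
        System.out.println(regularIntervalBoundaries(bag, 4, desc));
        System.out.println(firstN(bag, 4));
    }
}
```

With the ordered bag 0..99 and a descending comparator, main prints [74, 49, 24] for the regular-interval boundaries but [0, 1, 2, 3] for the first four values, which are all bunched at one end of the key range and would send nearly everything to a single reducer.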

FWIW, I haven't been ignoring your work on this.  It seemed you were making 
good progress and getting feedback from Pi, so I hadn't jumped in yet.

> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: EvalSpec.patch, InstantiateFunc.patch, 
> MapreducePlanCompiler.patch, Sort.patch, Sort.v2.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledged by neither the local
> implementation nor the quantile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.