RE: [jira] Commented: (PIG-171) Top K

Haijun Cao Thu, 19 Jun 2008 14:09:18 -0700

Yes, I agree. TOP and SAMPLE are different operators. 

Haijun


-----Original Message-----
From: Daniel Dai (JIRA) [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 19, 2008 12:20 PM
To: [email protected]
Subject: [jira] Commented: (PIG-171) Top K


    [ 
https://issues.apache.org/jira/browse/PIG-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606518#action_12606518
 ] 

Daniel Dai commented on PIG-171:
--------------------------------

If we use "SAMPLE" instead of "LIMIT" for the first k outputs, people will 
expect a fairly random sample. They may not notice that the sample they got 
is just the "first k". To me, that seems more confusing. What Pi suggested is 
a dedicated "SAMPLE" operator. It should return a random sample and should 
have a different implementation. What do you think?
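
To make the distinction concrete, here is a minimal Python sketch (not Pig 
code, and not a proposal for how Pig should implement either operator). The 
names first_k and random_sample are made up for illustration. first_k mimics 
"LIMIT" by taking rows in arrival order; random_sample uses single-pass 
reservoir sampling, which is one common way to draw a uniform sample of k rows 
and is assumed here only as an example of a "different implementation".

```python
import random
from itertools import islice


def first_k(rows, k):
    """LIMIT-style: return the first k rows in arrival order."""
    return list(islice(rows, k))


def random_sample(rows, k, rng=None):
    """Reservoir sampling: uniform random sample of up to k rows in one pass."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir
```

first_k(range(100), 3) always yields the same three leading rows, while 
random_sample(range(100), 3) can yield any three rows of the input.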

> Top K
> -----
>
>                 Key: PIG-171
>                 URL: https://issues.apache.org/jira/browse/PIG-171
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Amir Youssefi
>
> Frequently, users are interested in top results (especially the top K rows).
> This can be implemented efficiently in Pig/Map-Reduce settings to deliver
> rapid results with low network bandwidth and memory usage.
>
> The key point is to prune the data on the map side and keep only a small set
> of rows matching the top criteria. We can do this in an Algebraic function
> (combiner) with multiple-value output, so only a small data set leaves the
> mapper node.
> The same idea is applicable to solve variants of this problem:
>   - An Algebraic Function for 'Top K Rows'
>   - An Algebraic Function for 'Top K' values ('Top Rank K' and 'Top Dense 
> Rank K')
>   - TOP K ORDER BY.
> In other words, the implementation is similar to combiners for aggregate
> functions, but instead of one value we get multiple ones.
> I will add a sample implementation for Top K Rows and possibly TOP K ORDER BY 
> to clarify details.
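
The pruning idea in the description can be sketched roughly as follows. This 
is plain Python rather than a Pig Algebraic function, and the names 
top_k_partial, top_k_final, partitions and score are hypothetical, chosen only 
for this illustration. Each partition stands in for one mapper's input: the 
partial step keeps at most k rows per partition (the combiner's job), and the 
final step merges those small candidate lists (the reducer's job).

```python
import heapq


def top_k_partial(rows, k, key):
    """Map/combine step: keep only the k largest rows of one partition."""
    return heapq.nlargest(k, rows, key=key)


def top_k_final(partials, k, key):
    """Reduce step: merge per-partition candidates and keep the k largest."""
    merged = (row for partial in partials for row in partial)
    return heapq.nlargest(k, merged, key=key)


def score(row):
    """Ranking criterion for this example: the second field of the row."""
    return row[1]


# Three hypothetical mapper inputs of (name, score) rows.
partitions = [
    [("a", 5), ("b", 9), ("c", 1)],
    [("d", 7), ("e", 3)],
    [("f", 8), ("g", 2), ("h", 6)],
]

# Each mapper emits at most k rows instead of its full input.
partials = [top_k_partial(p, 2, score) for p in partitions]
result = top_k_final(partials, 2, score)
```

The result is exact because any row in the global top k must also be in the 
top k of its own partition, so dropping the rest on the map side loses nothing.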

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
