[ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689104#action_12689104 ]
Ankur commented on PIG-732:
---------------------------

Olga,

Thanks for the quick review.

> (1) Pig already support limit operator ....

I have a relation where I need to group by field-1 and retain the top-N occurrences of field-2. So I group by (field-1, field-2), generate counts, and flatten the result into tuples of the form (field-1, field-2, <count>). I then group again on field-1 and retain only the top-N tuples per group, so I actually need to project bags of limited size. I don't think this can be done using LIMIT, as it is not allowed inside FOREACH.

> (2) Filtering UDFs are meant to be used as ....

Moved the TopN and SearchQuery UDFs to piggyBank/evaluation/util. Also moved the test cases to the appropriate location.

> (3) Each file included needs to have Apache license header ....

Done.

> Utility UDFs
> -------------
>
>                 Key: PIG-732
>                 URL: https://issues.apache.org/jira/browse/PIG-732
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Ankur
>            Priority: Minor
>         Attachments: udf.v1.patch, udf.v2.patch
>
>
> Two utility UDFs and their respective test cases.
> 1. TopN - Accepts number of tuples (N) to retain in output, field number
> (type long) to use for comparison, and a sorted/unsorted bag of tuples. It
> outputs a bag containing top N tuples.
> 2. SearchQuery - Accepts an encoded URL from any of the 4 search engines
> (Yahoo, Google, AOL, Live) and extracts and normalizes the search query
> present in it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
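
For readers following the thread, the top-N step described for the TopN UDF (keep the N tuples with the largest value in a given long field, from a sorted or unsorted bag) might look roughly like the sketch below. This is not the code from udf.v1.patch or udf.v2.patch. It uses plain Java collections instead of Pig's bag and tuple types, and the class name `TopNSketch`, the method `topN`, the parameter order, and the sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for the top-N idea; not the UDF attached to PIG-732. */
final class TopNSketch {

    /**
     * Returns up to n tuples with the largest long value at index "field",
     * ordered from largest to smallest. The input bag may be unsorted.
     */
    static List<List<Object>> topN(int n, int field, List<List<Object>> bag) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        Comparator<List<Object>> byField =
                Comparator.comparingLong(t -> ((Number) t.get(field)).longValue());
        // Min-heap of the best tuples seen so far; its head is the weakest of them.
        PriorityQueue<List<Object>> heap = new PriorityQueue<>(byField);
        for (List<Object> tuple : bag) {
            heap.offer(tuple);
            if (heap.size() > n) {
                heap.poll(); // evict the smallest so at most n tuples remain
            }
        }
        List<List<Object>> result = new ArrayList<>(heap);
        result.sort(byField.reversed());
        return result;
    }

    public static void main(String[] args) {
        // Made-up tuples of the form (field-1, field-2, count) for one field-1 group.
        List<List<Object>> bag = Arrays.asList(
                Arrays.<Object>asList("us", "pig", 3L),
                Arrays.<Object>asList("us", "hadoop", 7L),
                Arrays.<Object>asList("us", "hive", 5L));
        System.out.println(topN(2, 2, bag)); // prints [[us, hadoop, 7], [us, hive, 5]]
    }
}
```

The heap never holds more than N tuples, so memory stays proportional to N rather than to the size of the bag. The order of tuples with equal counts is unspecified in this sketch.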