Ankur commented on PIG-732:
Thanks for a quick review.
> (1) Pig already support limit operator ....
I have a relation where I need to group by field-1 and retain the top-N
occurrences of field-2. So I group by (field-1, field-2), generate counts, and
flatten the result into tuples of the form (field-1, field-2, <count>). I then
group again on field-1 and retain only the top-N tuples in each group. So I
actually need to project bags of limited size. I don't think this can be done
using LIMIT, as it is not allowed inside FOREACH.
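
To make the use case concrete, here is a rough sketch of the same steps in
plain Python rather than Pig Latin. It is not the code from the patch: the
names top_n and top_n_per_group are made up for illustration, and the argument
order of top_n (N, field index, bag) only follows the TopN description in the
issue.

```python
import heapq
from collections import Counter, defaultdict


def top_n(n, field_index, bag):
    """Return the n tuples of `bag` with the largest value at `field_index`.

    Hypothetical stand-in for the TopN UDF; the bag may be sorted or unsorted.
    """
    return heapq.nlargest(n, bag, key=lambda t: t[field_index])


def top_n_per_group(records, n):
    """records: iterable of (field1, field2) pairs (an assumed input shape)."""
    # Step 1: group by (field-1, field-2) and count occurrences.
    counts = Counter(records)

    # Step 2: flatten to (field-1, field-2, count) and group again on field-1.
    groups = defaultdict(list)
    for (f1, f2), count in counts.items():
        groups[f1].append((f1, f2, count))

    # Step 3: project a bag of limited size for every field-1 value.
    return {f1: top_n(n, 2, bag) for f1, bag in groups.items()}


records = [("a", "x"), ("a", "x"), ("a", "y"),
           ("a", "z"), ("a", "z"), ("a", "z"), ("b", "x")]
result = top_n_per_group(records, 2)
```

Step 3 is the part that a top-level LIMIT cannot express, since the limit has
to apply to each group's bag separately.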
> (2) Filtering UDFs are meant to be used as ....
Moved TopN and SearchQuery UDFs to piggyBank/evaluation/util. Also moved the
test cases to the appropriate location.
> (3) Each file included needs to have Apache license header ....
> Utility UDFs
> Key: PIG-732
> URL: https://issues.apache.org/jira/browse/PIG-732
> Project: Pig
> Issue Type: New Feature
> Reporter: Ankur
> Priority: Minor
> Attachments: udf.v1.patch, udf.v2.patch
> Two utility UDFs and their respective test cases.
> 1. TopN - Accepts the number of tuples (N) to retain in the output, the field
> number (type long) to use for comparison, and a sorted or unsorted bag of
> tuples. It outputs a bag containing the top N tuples.
> 2. SearchQuery - Accepts an encoded URL from any of the 4 search engines
> (Yahoo, Google, AOL, Live) and extracts and normalizes the search query
> present in it.
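
For illustration only, the kind of extraction SearchQuery performs might look
roughly like the Python sketch below. This is not the patch's implementation.
The host-to-parameter mapping (for example "p" for Yahoo and "q" for Google)
and the normalization rule (lowercase, collapse whitespace) are assumptions,
and extract_search_query is a hypothetical name.

```python
from urllib.parse import urlparse, parse_qs

# Assumed query-parameter names per engine; illustrative, not from the patch.
QUERY_PARAMS = {
    "yahoo": ("p",),
    "google": ("q",),
    "aol": ("q", "query"),
    "live": ("q",),
}


def extract_search_query(url):
    """Return the normalized search query in `url`, or None if none is found."""
    parsed = urlparse(url)
    host_labels = (parsed.hostname or "").split(".")
    # parse_qs decodes percent-escapes and '+' in the query string.
    params = parse_qs(parsed.query)
    for engine, names in QUERY_PARAMS.items():
        if engine not in host_labels:
            continue
        for name in names:
            if name in params:
                raw = params[name][0]
                # Assumed normalization: lowercase and collapse whitespace.
                return " ".join(raw.lower().split())
    return None


query = extract_search_query(
    "http://www.google.com/search?q=Apache+Pig%20%20UDF&hl=en")
```

URLs from hosts outside the four engines simply yield None in this sketch.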