Olga Natkovich commented on PIG-732:


Thanks for contributing UDFs to PiggyBank!

A couple of questions/comments on your patch:

(1) Pig already supports the limit operator. Would that serve your needs for 
TopN, or do you actually need to project bags of limited size in foreach?
(2) Filtering UDFs are meant to be used as predicates in filter operators and, 
as such, should return Boolean values. I think your TopN should be in the 
evaluation/util group.
(3) Each included file needs to have the Apache license header. You can just 
copy it from one of the other files.

> Utility UDFs 
> -------------
>                 Key: PIG-732
>                 URL: https://issues.apache.org/jira/browse/PIG-732
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Ankur
>            Priority: Minor
>         Attachments: udf.v1.patch
> Two utility UDFs and their respective test cases.
> 1. TopN - Accepts the number of tuples (N) to retain in the output, the field 
> number (type long) to use for comparison, and a sorted or unsorted bag of 
> tuples. It outputs a bag containing the top N tuples.
> 2. SearchQuery - Accepts an encoded URL from any of the 4 search engines 
> (Yahoo, Google, AOL, Live) and extracts and normalizes the search query 
> present in it.
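
As a rough illustration of the TopN behaviour described above, here is a minimal stand-alone Java sketch. It is not the code from udf.v1.patch and does not use Pig's UDF, Tuple, or DataBag classes. The class name TopNSketch, the use of plain lists to stand in for tuples and bags, and the choice of a bounded min-heap are all assumptions made for illustration. It also assumes the comparison field always holds a non-null Long.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: a "tuple" is modelled as List<Object>, a "bag" as a List of tuples.
final class TopNSketch {
    private TopNSketch() {}

    /**
     * Returns at most n tuples from the bag, chosen by the largest long value
     * stored at position fieldIndex, ordered from largest to smallest.
     * The input bag may be in any order and is not modified.
     */
    static List<List<Object>> topN(int n, int fieldIndex, List<List<Object>> bag) {
        List<List<Object>> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }

        // Assumption: the comparison field is a non-null Long in every tuple.
        Comparator<List<Object>> byField =
                Comparator.comparingLong(row -> ((Long) row.get(fieldIndex)).longValue());

        // Min-heap holding the best n tuples seen so far; its head is the smallest of them.
        PriorityQueue<List<Object>> heap = new PriorityQueue<>(byField);
        for (List<Object> row : bag) {
            heap.offer(row);
            if (heap.size() > n) {
                heap.poll(); // evict the current smallest, keeping only n tuples
            }
        }

        result.addAll(heap);
        result.sort(byField.reversed()); // largest value first
        return result;
    }
}
```

With this approach the heap never holds more than N + 1 tuples, so the input bag does not need to be sorted first. A real UDF would additionally have to deal with null fields and with Pig's own data types.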

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
