Olga Natkovich commented on PIG-732:
A couple of additional comments:
(1) Top N
- You assume that the data arrives as bytearrays (for n and fieldNum). It
would be better to assume the actual type (int) and let Pig do the
conversion for you, because then your function will be able to work with data of
different types. You do that by adding a getArgToFuncMapping function. You can
see examples in other functions in the repository, and an explanation of its
usage in the UDF manual. This also applies to your second UDF.
- In the exec function, you check for 2 elements in the tuple but you are
- It looks like if you insert too many elements, you throw away the
head of the queue. Is that what you want?
- You are not specifying the tuple structure in your schema definition. This
could be an issue for some of your queries.
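
To make the head-of-queue question concrete, here is a minimal standalone sketch. It is not the code from the patch: it assumes the queue is a java.util.PriorityQueue ordered ascending on the comparison field, and it uses plain long[] rows in place of Pig tuples. The class name TopNSketch and the method topN are hypothetical. Under that ordering the head is the smallest retained row, so dropping it on overflow is what a top-N wants; with the opposite ordering the same call would discard the largest row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNSketch {
    /**
     * Returns the n rows with the largest value in column fieldNum, largest first.
     * Rows are plain long[] arrays here (an assumption for illustration);
     * a real UDF would receive Pig tuples instead.
     */
    public static List<long[]> topN(List<long[]> rows, int n, int fieldNum) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        Comparator<long[]> byField = Comparator.comparingLong(r -> r[fieldNum]);
        // Min-heap: the head is always the smallest row currently retained.
        PriorityQueue<long[]> heap = new PriorityQueue<>(byField);
        for (long[] row : rows) {
            heap.offer(row);
            if (heap.size() > n) {
                heap.poll(); // drop the head, i.e. the smallest retained row
            }
        }
        List<long[]> result = new ArrayList<>(heap);
        result.sort(byField.reversed()); // largest first
        return result;
    }

    public static void main(String[] args) {
        List<long[]> rows = Arrays.asList(
            new long[] {1, 10}, new long[] {2, 30},
            new long[] {3, 20}, new long[] {4, 5});
        for (long[] row : topN(rows, 2, 1)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

With the four sample rows in main, n = 2 and fieldNum = 1, the rows kept are [2, 30] and [3, 20].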
> Utility UDFs
> Key: PIG-732
> URL: https://issues.apache.org/jira/browse/PIG-732
> Project: Pig
> Issue Type: New Feature
> Reporter: Ankur
> Priority: Minor
> Attachments: udf.v1.patch, udf.v2.patch
> Two utility UDFs and their respective test cases.
> 1. TopN - Accepts number of tuples (N) to retain in output, field number
> (type long) to use for comparison, and a sorted/unsorted bag of tuples. It
> outputs a bag containing top N tuples.
> 2. SearchQuery - Accepts an encoded URL from any of the 4 search engines
> (Yahoo, Google, AOL, Live) and extracts and normalizes the search query
> present in it.
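
As a rough illustration of what SearchQuery describes, the sketch below extracts and normalizes the query from a URL. It is not the code from the patch. The class name SearchQuerySketch, the host-to-parameter mapping (p for Yahoo, q for the others) and the normalization steps (trim, lowercase, collapse whitespace) are all assumptions made for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.Locale;

public class SearchQuerySketch {
    /** Assumed mapping from host to query parameter name; illustrative only. */
    static String paramFor(String host) {
        if (host.contains("yahoo.")) return "p";
        if (host.contains("google.")) return "q";
        if (host.contains("aol.")) return "q";
        if (host.contains("live.")) return "q";
        return null; // not one of the four engines
    }

    /** Returns the normalized search query, or null if none can be extracted. */
    public static String extractQuery(String url) {
        try {
            URI uri = URI.create(url);
            String host = uri.getHost();
            String rawQuery = uri.getRawQuery();
            if (host == null || rawQuery == null) {
                return null;
            }
            String param = paramFor(host.toLowerCase(Locale.ROOT));
            if (param == null) {
                return null;
            }
            for (String pair : rawQuery.split("&")) {
                int eq = pair.indexOf('=');
                if (eq > 0 && pair.substring(0, eq).equals(param)) {
                    // URLDecoder also turns '+' into a space.
                    String decoded = URLDecoder.decode(pair.substring(eq + 1), "UTF-8");
                    // Assumed normalization: trim, lowercase, collapse whitespace.
                    return decoded.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
                }
            }
            return null;
        } catch (IllegalArgumentException | UnsupportedEncodingException e) {
            return null; // malformed URL or escape sequence
        }
    }

    public static void main(String[] args) {
        System.out.println(extractQuery("http://search.yahoo.com/search?p=Apache+Pig%20%20UDF&fr=x"));
    }
}
```

Returning null for unrecognized hosts and malformed input is a design choice of this sketch; a UDF could equally return an empty string.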