Add LIMIT as a statement that works in nested FOREACH
-----------------------------------------------------
Key: PIG-741
URL: https://issues.apache.org/jira/browse/PIG-741
Project: Pig
Issue Type: New Feature
Reporter: David Ciemiewicz
I'd like to compute the top 10 results in each group.
The natural way to express this in Pig would be:
{code}
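-- Load (date, count, url) records.
A = load '...' using PigStorage() as (
    date: int,
    count: int,
    url: chararray
);
B = group A by ( date );
C = foreach B {
    -- Within each date group: sort by count, then keep the first 10 rows.
    D = order A by count desc;
    E = limit D 10;
    generate
        FLATTEN(E);
};
dump C;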
{code}
Yeah, I could write a UDF / PiggyBank function to take the top n results. But
since LIMIT already exists as a statement, it seems like it should also work in
the nested foreach context.
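To pin down the result being asked for, here is a rough sketch of the per-group semantics in plain Python rather than Pig. The function name `top_n_per_group` and the sample rows are made up for illustration, and groups are sorted by date only to make the output deterministic; Pig itself makes no such ordering promise.

```python
from collections import defaultdict
from operator import itemgetter


def top_n_per_group(rows, n):
    """Return, for each date, the n rows with the highest count.

    rows are (date, count, url) tuples; this mirrors
    group -> order by count desc -> limit n -> flatten.
    """
    groups = defaultdict(list)
    for row in rows:                       # B = group A by (date)
        groups[row[0]].append(row)

    result = []
    for date in sorted(groups):            # deterministic order (an assumption, not Pig behavior)
        ordered = sorted(groups[date], key=itemgetter(1), reverse=True)  # D = order A by count desc
        result.extend(ordered[:n])         # E = limit D n; FLATTEN(E)
    return result


# Hypothetical sample rows: (date, count, url)
rows = [
    (20090401, 5, "a.example"),
    (20090401, 9, "b.example"),
    (20090401, 7, "c.example"),
    (20090402, 3, "d.example"),
]
top2 = top_n_per_group(rows, 2)
```

With `n = 2`, the first date keeps only its two highest-count rows, while the second date, which has a single row, keeps it unchanged.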
Example workaround code, with util.TOP standing in for such a UDF:
{code}
C = foreach B {
    D = order A by count desc;
    E = util.TOP(D, 10);
    generate
        FLATTEN(E);
};
dump C;
{code}