The example in the body of the ticket
https://issues.apache.org/jira/browse/PIG-1926 is exactly the script you
want.

Note that this is a new feature: you need Pig 0.10 (not released yet -- in
trunk) to get this to work.
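
For reference, a minimal sketch of that approach, adapted to the top-5% case.
It assumes a Pig build that accepts a scalar expression in LIMIT (0.10 or
trunk, as noted above). The file name, field names and aliases are
hypothetical placeholders, not taken from the ticket.

```pig
-- Hypothetical input: 'data.txt' with a numeric column named score.
data    = LOAD 'data.txt' AS (id:chararray, score:double);

-- Count all tuples; this yields a single-row relation.
grouped = GROUP data ALL;
counted = FOREACH grouped GENERATE COUNT(data) AS total;

-- Sort descending so the largest values come first.
sorted  = ORDER data BY score DESC;

-- counted.total is used as a scalar; dividing by 20 gives 5% of the rows.
-- This is integer division, so the limit is rounded down.
top5pct = LIMIT sorted counted.total / 20;

DUMP top5pct;
```

Because the division rounds down, an input with fewer than 20 rows would
produce a limit of 0, so very small inputs may need special handling.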

You could also do it with TOP as Norbert suggests, but that has a bit of
extra cost due to the sort TOP does.

D


On Thu, Sep 8, 2011 at 6:42 AM, Norbert Burger <[email protected]> wrote:

> Hi Ruslan -- no need to write your own UDF.  There is a built-in
> function TOP() which will extract for you the top N tuples of a
> relation, where N is a configurable parameter.  Take a look at:
>
> http://pig.apache.org/docs/r0.9.0/func.html#topx
>
> Norbert
>
> On Thu, Sep 8, 2011 at 9:13 AM, Ruslan Al-Fakikh
> <[email protected]> wrote:
> > Hey guys,
> >
> > How can I LIMIT a relation by percentage?
> > What I need is to sort a relation by a numeric column and then take
> > top 5% of tuples.
> > As far as I understand I cannot use an expression in the LIMIT
> > operator. Do I have to write my own UDF? What type of UDF should I use
> > then?
> >
> > --
> > Best Regards,
> > Ruslan Al-Fakikh
> >
>
