The example in the body of the ticket https://issues.apache.org/jira/browse/PIG-1926 is exactly the script you want.
Note that this is a new feature: you need 0.10 (not released yet; it's in trunk) to get this to work. You could also do it with TOP as Norbert suggests, but that has a bit of extra cost due to the sort TOP does.

D

On Thu, Sep 8, 2011 at 6:42 AM, Norbert Burger <[email protected]> wrote:

> Hi Ruslan -- no need to write your own UDF. There is a built-in
> function TOP() which will extract for you the top N tuples of a
> relation, where N is a configurable parameter. Take a look at:
>
> http://pig.apache.org/docs/r0.9.0/func.html#topx
>
> Norbert
>
> On Thu, Sep 8, 2011 at 9:13 AM, Ruslan Al-Fakikh
> <[email protected]> wrote:
> > Hey guys,
> >
> > How can I LIMIT a relation by percentage?
> > What I need is to sort a relation by a numeric column and then take
> > top 5% of tuples.
> > As far as I understand I cannot use an expression in the LIMIT
> > operator. Do I have to write my own UDF? What type of UDF should I use
> > then?
> >
> > --
> > Best Regards,
> > Ruslan Al-Fakikh
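To make the "top 5%" semantics concrete outside of Pig, here is a minimal Python sketch of the same recipe. It counts the rows, derives the cutoff from that count, sorts by the numeric column in descending order, and keeps the first N rows. That mirrors the 0.10 approach of feeding a scalar computed from COUNT into LIMIT. `top_percent` and its parameters are hypothetical names chosen for illustration, not part of Pig. Rounding the cutoff down is also an assumption.

```python
from typing import Sequence


def top_percent(rows: Sequence[tuple], key_index: int, percent: float) -> list:
    """Return the top `percent`% of rows, ranked by the numeric column at key_index.

    Hypothetical helper: it mimics "count, order desc, then limit", not any Pig API.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Cutoff derived from the row count, truncated toward zero (assumed rounding rule).
    n = int(len(rows) * percent / 100)
    # Sort by the numeric column, largest first, like ORDER ... BY col DESC.
    ordered = sorted(rows, key=lambda r: r[key_index], reverse=True)
    # Keep only the first n rows, like LIMIT.
    return ordered[:n]
```

With 100 input rows and `percent=5`, this keeps the 5 rows with the largest values. With only 10 rows, the cutoff truncates to 0, so a real script may want to round up instead.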
