Hi Dmitriy -- great info, thanks.

On Thu, Sep 8, 2011 at 12:19 PM, Dmitriy Ryaboy <[email protected]> wrote:
> You could also do it with TOP as Norbert suggests, but that has a bit of
> extra cost due to the sort TOP does.
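To make the sort-cost question concrete, here is a rough Python sketch (not Pig code) of two ways to get the top N rows by a numeric column: a full sort followed by a limit, and a bounded min-heap that only ever keeps N rows. The function names and the `(key, score)` row shape are hypothetical, and neither function is meant to describe how Pig's ORDER BY or TOP are actually implemented.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical row shape: (key, numeric score).
Row = Tuple[str, float]


def top_n_by_sort(rows: Iterable[Row], n: int) -> List[Row]:
    """Like ORDER BY score DESC followed by LIMIT n: sorts every row, O(m log m)."""
    if n <= 0:
        return []
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]


def top_n_by_heap(rows: Iterable[Row], n: int) -> List[Row]:
    """Keeps a min-heap of at most n rows: O(m log n) time, O(n) memory."""
    if n <= 0:
        return []
    heap: List[Tuple[float, int, Row]] = []
    for i, row in enumerate(rows):
        # -i breaks ties in favour of earlier rows (matching the stable sort above)
        # and, being unique, guarantees the Row objects themselves are never compared.
        entry = (row[1], -i, row)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif row[1] > heap[0][0]:
            # Evict the smallest row currently kept.
            heapq.heapreplace(heap, entry)
    # Only the n survivors are sorted at the end: highest score first, then input order.
    return [row for _, _, row in sorted(heap, key=lambda e: (-e[0], -e[1]))]
```

Both versions still have to look at every row; the heap only avoids sorting all of them, so some ordering work is unavoidable whenever the requirement is top N.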
Just for my understanding, doesn't the ORDER BY in the PIG-1926 example impose the same sort cost? It seems you'd have to pay for a sort as long as the requirement is top N.

Norbert

> On Thu, Sep 8, 2011 at 6:42 AM, Norbert Burger
> <[email protected]> wrote:
>
>> Hi Ruslan -- no need to write your own UDF. There is a built-in
>> function TOP() which will extract for you the top N tuples of a
>> relation, where N is a configurable parameter. Take a look at:
>>
>> http://pig.apache.org/docs/r0.9.0/func.html#topx
>>
>> Norbert
>>
>> On Thu, Sep 8, 2011 at 9:13 AM, Ruslan Al-Fakikh
>> <[email protected]> wrote:
>> > Hey guys,
>> >
>> > How can I LIMIT a relation by percentage?
>> > What I need is to sort a relation by a numeric column and then take
>> > top 5% of tuples.
>> > As far as I understand I cannot use an expression in the LIMIT
>> > operator. Do I have to write my own UDF? What type of UDF should I use
>> > then?
>> >
>> > --
>> > Best Regards,
>> > Ruslan Al-Fakikh
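For the original question (top 5% by a numeric column), the logic takes two passes: count the rows, turn the percentage into a row count N, then order by the column and keep N rows. The Python sketch below is a hypothetical illustration only (`top_percent` and the `(key, score)` row shape are made up), and rounding N up with `math.ceil` is an assumption, since the thread does not say how fractional counts should be handled.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical row shape: (key, numeric score).
Row = Tuple[str, float]


def top_percent(rows: Sequence[Row], percent: float) -> List[Row]:
    """Return the top `percent` % of rows by score, rounding the row count up (assumption)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Pass 1: total row count, converted into N (multiply first to limit float error).
    n = math.ceil(len(rows) * percent / 100)
    # Pass 2: order by the numeric column, highest first, and keep the first N rows.
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]
```

In Pig terms, the first pass roughly corresponds to a GROUP ... ALL plus COUNT and the second to an ORDER BY followed by a limit of N; the sticking point in the thread is that, per Ruslan's understanding, LIMIT won't take that computed N as an expression.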
