On Thu, Sep 8, 2011 at 10:03 AM, Norbert Burger <[email protected]> wrote:

> Hi Dmitriy -- great info, thanks.
>
> Just for my understanding, doesn't the ORDER BY in the PIG-1926
> example impose the same sort cost? Seems that you'd have to pay for a
> sort as long as the requirement is top N.
>

TOP is actually more efficient than ORDER. In Ruslan's case, he doesn't need
the order (or top) at all -- he just wants LIMIT, so that clause can be
skipped.

D

> Norbert
>
> On Thu, Sep 8, 2011 at 6:42 AM, Norbert Burger <[email protected]> wrote:
>
> >> Hi Ruslan -- no need to write your own UDF. There is a built-in
> >> function TOP() which will extract for you the top N tuples of a
> >> relation, where N is a configurable parameter. Take a look at:
> >>
> >> http://pig.apache.org/docs/r0.9.0/func.html#topx
> >>
> >> Norbert
> >>
> >> On Thu, Sep 8, 2011 at 9:13 AM, Ruslan Al-Fakikh
> >> <[email protected]> wrote:
> >> > Hey guys,
> >> >
> >> > How can I LIMIT a relation by percentage?
> >> > What I need is to sort a relation by a numeric column and then take
> >> > top 5% of tuples.
> >> > As far as I understand I cannot use an expression in the LIMIT
> >> > operator. Do I have to write my own UDF? What type of UDF should I use
> >> > then?
> >> >
> >> > --
> >> > Best Regards,
> >> > Ruslan Al-Fakikh
