On Thu, Sep 8, 2011 at 10:03 AM, Norbert Burger <[email protected]> wrote:

> Hi Dmitriy -- great info, thanks.
>
> Just for my understanding, doesn't the ORDER BY in the PIG-1926
> example impose the same sort cost?  Seems that you'd have to pay for a
> sort as long as the requirement is top N.
>
>

TOP is actually more efficient than ORDER, since it only has to keep track
of the N best tuples instead of fully sorting the relation. In Ruslan's case,
he doesn't need the order (or top) at all -- he just wants LIMIT, so that
clause can be skipped.
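
To make the cost difference concrete, here is a minimal sketch in plain
Python (not Pig Latin, and not a description of how Pig implements TOP
internally). The function name top_percent and the sample rows are
hypothetical, chosen only for illustration. It counts the rows first to
turn the percentage into N, then uses a bounded heap selection rather than
a full sort.

```python
import heapq


def top_percent(rows, key, percent):
    """Return the top `percent` percent of rows by `key`, best first.

    Hypothetical helper for illustration only.
    """
    rows = list(rows)
    # N is only known after counting the rows; integer math rounds down.
    n = len(rows) * percent // 100
    # nlargest keeps at most n candidates at a time, so it avoids
    # sorting the whole input when n is much smaller than len(rows).
    return heapq.nlargest(n, rows, key=key)


if __name__ == "__main__":
    # Made-up sample data: 40 rows with scores 0..39.
    sample = [("user%d" % i, i) for i in range(40)]
    for row in top_percent(sample, key=lambda r: r[1], percent=5):
        print(row)
```

With distinct keys, the result matches what a full descending sort followed
by a cut at N would give; only the amount of work differs.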

D



> Norbert
>
> > On Thu, Sep 8, 2011 at 6:42 AM, Norbert Burger <[email protected]> wrote:
> >
> >> Hi Ruslan -- no need to write your own UDF.  There is a built-in
> >> function TOP() which will extract for you the top N tuples of a
> >> relation, where N is a configurable parameter.  Take a look at:
> >>
> >> http://pig.apache.org/docs/r0.9.0/func.html#topx
> >>
> >> Norbert
> >>
> >> On Thu, Sep 8, 2011 at 9:13 AM, Ruslan Al-Fakikh
> >> <[email protected]> wrote:
> >> > Hey guys,
> >> >
> >> > How can I LIMIT a relation by percentage?
> >> > What I need is to sort a relation by a numeric column and then take
> >> > top 5% of tuples.
> >> > As far as I understand I cannot use an expression in the LIMIT
> >> > operator. Do I have to write my own UDF? What type of UDF should I use
> >> > then?
> >> >
> >> > --
> >> > Best Regards,
> >> > Ruslan Al-Fakikh
> >> >
> >>
> >
>
