Thanks a lot Mike. This seems to be what I'm looking for ;)

I'm a bit disappointed that what I wanted to achieve isn't possible without
a UDF.
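
For the archive, here is a rough, untested sketch of how I understand the
suggestion. It assumes the DataFu jar is available at the path shown (the
path is a placeholder), and the aliases Quantile90, cutoffs, joined and
result are names I made up for illustration. As far as I can tell from the
DataFu docs, Quantile expects a sorted bag of numbers, so the sketch projects
the money column and sorts it before calling the UDF. The 0.9 quantile is
used as the cutoff for the top 10%.

```pig
-- Assumption: adjust the jar path to wherever DataFu lives on your cluster.
REGISTER datafu.jar;

-- Hypothetical alias: a Quantile instance that returns only the 0.9 quantile.
DEFINE Quantile90 datafu.pig.stats.Quantile('0.9');

users = LOAD 'users' AS (userid:chararray, money:long, region:chararray);
grouped_region = GROUP users BY region;

-- One cutoff value per region: the 90th percentile of money.
cutoffs = FOREACH grouped_region {
    amounts = users.money;             -- keep only the numeric column
    sorted  = ORDER amounts BY money;  -- Quantile needs sorted input
    GENERATE group AS region, FLATTEN(Quantile90(sorted)) AS cutoff;
};

-- Attach each region's cutoff to its users and keep those at or above it.
joined = JOIN users BY region, cutoffs BY region;
top_10_percent = FILTER joined BY users::money >= cutoffs::cutoff;

result = FOREACH top_10_percent GENERATE
    users::region AS region,
    users::userid AS userid,
    users::money  AS money;
```

Note that this filters on a value rather than on a record count, so ties at
the cutoff can return slightly more than 10% of a region's users. For larger
inputs, StreamingQuantile would presumably be swapped in for Quantile.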

Cheers,
-Marco


On Mon, Mar 18, 2013 at 9:40 PM, Mike Sukmanowsky <[email protected]> wrote:

> You should check out the quantile libraries in LinkedIn's DataFu UDFs:
> https://github.com/linkedin/datafu, specifically
>
> https://github.com/linkedin/datafu/blob/master/src/java/datafu/pig/stats/Quantile.java
> for relatively small inputs, and
>
> https://github.com/linkedin/datafu/blob/master/src/java/datafu/pig/stats/StreamingQuantile.java
> for larger inputs.
>
> You can use these to compute the cutoff value for the top x% of any given
> field, and then use that value within a filter.
>
>
> On Mon, Mar 18, 2013 at 6:23 AM, Marco Cadetg <[email protected]> wrote:
>
> > Hi there,
> >
> > I would like to do something very similar to a nested foreach that uses
> > order by and then limit, but I would like the limit to be relative to the
> > total number of records.
> >
> > users = load 'users' as (userid:chararray, money:long, region:chararray);
> > grouped_region = group users by region;
> > top_10_percent = foreach grouped_region {
> >             sorted = order users by money desc;
> >             -- e.g. for the top 10% it would be total users/10 in that region.
> >             top    = limit sorted $UKNOWN_HOWTO_SET;
> >             generate group, flatten(top);
> > };
> >
> > Thanks a lot for any help on this.
> >
> > Cheers,
> > -Marco
> >
>
>
>
> --
> Mike Sukmanowsky
>
> Product Lead, http://parse.ly
> 989 Avenue of the Americas, 3rd Floor
> New York, NY  10018
> p: +1 (416) 953-4248
> e: [email protected]
>
