Thanks a lot, Mike. This seems to be what I'm looking for ;) I'm a bit disappointed that what I wanted to achieve isn't possible without a UDF.
Cheers,
-Marco

On Mon, Mar 18, 2013 at 9:40 PM, Mike Sukmanowsky <[email protected]> wrote:

> You should check out the quantile libraries in LinkedIn's DataFu UDFs:
> https://github.com/linkedin/datafu specifically
>
> https://github.com/linkedin/datafu/blob/master/src/java/datafu/pig/stats/Quantile.java
> for relatively small inputs, and
>
> https://github.com/linkedin/datafu/blob/master/src/java/datafu/pig/stats/StreamingQuantile.java
> for larger inputs.
>
> You can use this to receive the top x% for any given field and then you can
> use that within a filter
>
>
> On Mon, Mar 18, 2013 at 6:23 AM, Marco Cadetg <[email protected]> wrote:
>
> > Hi there,
> >
> > I would like to do something very similar to a nested foreach with using
> > order by and then limit. But I would like to limit on a relation to the
> > total number of records.
> >
> > users = load 'users' as (userid:chararray, money:long, region:chararray);
> > grouped_region = group users by region;
> > top_10_percent = foreach grouped_region {
> >   sorted = order users by money desc;
> >   top = limit sorted $UKNOWN_HOWTO_SET; -- e.g. for the top 10% it would be total users/10 in that region.
> >   generate group, flatten(top);
> > };
> >
> > Thanks a lot for any help on this.
> >
> > Cheers,
> > -Marco
> >
>
>
>
> --
> Mike Sukmanowsky
>
> Product Lead, http://parse.ly
> 989 Avenue of the Americas, 3rd Floor
> New York, NY 10018
> p: +1 (416) 953-4248
> e: [email protected]
>
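
For reference, the quantile-then-filter approach suggested above might look roughly like the following. This is an untested sketch, not a confirmed solution from the thread. Assumptions: the jar path is hypothetical, the relation names thresholds, amounts, joined and the field name p90 are made up for illustration, and Quantile is assumed to take a bag sorted in ascending order and return a tuple with one double per requested quantile.

    -- hypothetical jar location; adjust to wherever DataFu is installed
    register /path/to/datafu.jar;

    -- 0.9 quantile = cut-off value for the top 10%
    define Quantile datafu.pig.stats.Quantile('0.9');

    users = load 'users' as (userid:chararray, money:long, region:chararray);
    grouped_region = group users by region;

    -- one cut-off value per region
    thresholds = foreach grouped_region {
      amounts = users.money;
      sorted = order amounts by money;  -- assumed: Quantile needs ascending input
      generate group as region, flatten(Quantile(sorted)) as p90;
    };

    -- attach each region's cut-off to its users, then keep those at or above it
    joined = join users by region, thresholds by region;
    top_10_percent = filter joined by (double)users::money >= thresholds::p90;

Because the filter compares against a value rather than a row count, ties at the cut-off can return slightly more than 10% of a region's users. For larger inputs, StreamingQuantile could presumably be swapped in the same way, at the cost of an approximate cut-off.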
