How would I calculate the percentile in one pass? In order to calculate the percentile for each item, I need to know the total count. How do I get the total count, and then calculate each item's percentile in one pass?
I don't mind doing multiple passes - I am just not sure how to make the calculation.

Thanks
Dave Viner

On Tue, Jun 29, 2010 at 9:59 AM, hc busy <[email protected]> wrote:

> I think it's impossible to do this within one M/R. You will want to
> implement it in two M/R in Pig, because you have to calculate the
> percentile in pass 1, and then perform the filter in pass 2.
>
> On Tue, Jun 29, 2010 at 8:14 AM, Dave Viner <[email protected]> wrote:
>
> > Is there a UDF for generating the top X % of results? For example, in a
> > log parsing context, it might be the set of search queries that
> > represent the top 80% of all queries.
> >
> > I see in the piggybank that there is a TOP function, but that only
> > takes the top *number* of results, rather than a percentile.
> >
> > Thanks
> > Dave Viner
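
For illustration only, the two-pass idea discussed in this thread can be sketched in plain Python rather than Pig. Pass 1 counts each query and sums the counts to get the total. Pass 2 walks the queries from most to least frequent and keeps them until their cumulative share of the total reaches the threshold. The function name top_share and its share parameter are hypothetical, not part of Pig or the piggybank, and the sketch assumes the whole log fits in memory.

```python
from collections import Counter


def top_share(queries, share=0.8):
    """Return the most frequent queries that together cover `share` of all queries.

    Hypothetical helper for illustration; assumes the input fits in memory.
    Each result row is (query, count, cumulative fraction of the total).
    """
    # Pass 1: per-query counts and the grand total.
    counts = Counter(queries)
    total = sum(counts.values())

    # Pass 2: walk queries by descending count (ties broken by name)
    # and stop once the running total covers the requested share.
    kept = []
    running = 0
    for query, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if running >= share * total:
            break
        running += n
        kept.append((query, n, running / total))
    return kept


if __name__ == "__main__":
    log = ["a"] * 5 + ["b"] * 3 + ["c"] + ["d"]
    for row in top_share(log, share=0.8):
        print(row)
```

In Pig the same shape would presumably be two jobs: one that groups everything to get the total count, and a second that orders by count and filters on the cumulative share.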
