How would I calculate the percentile in one pass?  In order to calculate the
percentile for each item, I need to know the total count.  How do I get the
total count, and then calculate each item's percentile in one pass?

I don't mind doing multiple passes - I am just not sure how to make the
calculation.
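
To make the question concrete, here is a rough sketch of the calculation I
have in mind, in plain Python rather than Pig. The function name `top_share`
and the sample data are made up for illustration, and it assumes that "top
80%" means the most frequent queries whose counts add up to 80% of all
queries. Pass 1 gets the per-query counts and the total; pass 2 walks the
queries from most to least frequent and keeps them until the running share
reaches the cutoff.

```python
from collections import Counter


def top_share(queries, share=0.8):
    """Return (query, count, cumulative_share) for the most frequent queries
    that together cover `share` of all queries. Hypothetical helper."""
    # Pass 1: count each query and get the overall total.
    counts = Counter(queries)
    total = sum(counts.values())

    # Pass 2: walk queries in descending count order, tracking a running sum,
    # and stop once the kept queries already cover the requested share.
    kept = []
    running = 0
    for query, n in counts.most_common():
        if running >= share * total:
            break
        running += n
        kept.append((query, n, running / total))
    return kept


# Made-up sample log: 10 queries in total.
sample = ["a"] * 5 + ["b"] * 3 + ["c"] + ["d"]
result = top_share(sample, share=0.8)
```

With the sample data this keeps `a` and `b`, which together account for 8 of
the 10 queries.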

Thanks
Dave Viner


On Tue, Jun 29, 2010 at 9:59 AM, hc busy <[email protected]> wrote:

> I think it's impossible to do this within one M/R. You will want to
> implement it in two M/R in Pig, because you have to calculate the
> percentile in pass 1, and then perform the filter in pass 2.
>
>
> On Tue, Jun 29, 2010 at 8:14 AM, Dave Viner <[email protected]> wrote:
>
> > Is there a UDF for generating the top X % of results?  For example, in a
> > log parsing context, it might be the set of search queries that represent
> > the top 80% of all queries.
> >
> > I see in the piggybank that there is a TOP function, but that only takes
> > the top *number* of results, rather than a percentile.
> >
> > Thanks
> > Dave Viner
> >
>
