I know of this type of a call would give you a subset of the table ... also I think you can use a group by clause to get it for groups of data.
> SELECT PERCENTILE(val, 0.5) FROM pct_test WHERE val > 100; Couldn't you use this call a few times to get the value for each percentile value? I think this is also a valid syntax though I've not used it. > SELECT PERCENTILE(val, array(0.25, 0.5, 0.75)) FROM pct_test WHERE val > 100; Note that neither of these are available with older versions of hive. Cheers, -Ajo. On Tue, Mar 1, 2011 at 12:51 PM, Sameer Kalburgi <sameerkalbu...@gmail.com>wrote: > Hello, > > Does anyone have any experience calculating the percentile / percentrank > for each row in a table? > > I see that there are built in UDAFs to calculate the percentile, but that > would only return a single value for the entire table. > > Essentially, I'm trying to recreate the Excel PercentRank function > described here: > http://msdn.microsoft.com/en-us/library/bb239448(v=office.12).aspx > > For example, the input: > > Score > -------- > 10 > 20 > 30 > 40 > 50 > > would yield > > PercentRank > ------------------ > 0.0 > 0.25 > 0.5 > 0.75 > 1.0 > > Any ideas? > > Thanks, > Sameer >