Re: Percentile window function in apache drill
Drill currently supports the ranking window functions [1] but not specifically the percentile function. [1] https://drill.apache.org/docs/ranking-window-functions/ On Mon, Aug 12, 2019 at 7:34 AM Ted Dunning wrote: > Currently, the way to do this is with window queries where you sort each > sub-group and grab the pertinent rows as an approximation of the quantiles > you want. > > Another way would be to use an approximate data structure like a t-digest > via an aggregating user-defined function (UDF). > > Last I checked, there were some limitations in the ability to do this, but > things may have changed. > > > On Sun, Aug 11, 2019 at 9:15 PM Manoj srivatsav > > wrote: > > > Hi, > > > > I have a time series data store in MapR DB JSON tables. Using drill to > > query data out of it. > > I need find 90th and 95 percentile values on the data that is there. > > Time series data is that of multiple sensors data per time. > > > > I was able to get min, max and avg values by doing > > select `time`, min(`sensor1`), max(`sensor`), avg(`sensor1`) from > > dfs.`maprdb` group by `timestamp` order by `timestamp`. > > > > I need to if there is a way to specify a percentile value to calculate in > > query. > > > > -- > > Thanks and Regards, > > Manoj Srivatsav > > >
Re: Percentile window function in apache drill
Currently, the way to do this is with window queries where you sort each sub-group and grab the pertinent rows as an approximation of the quantiles you want. Another way would be to use an approximate data structure like a t-digest via an aggregating user-defined function (UDF). Last I checked, there were some limitations in the ability to do this, but things may have changed. On Sun, Aug 11, 2019 at 9:15 PM Manoj srivatsav wrote: > Hi, > > I have a time series data store in MapR DB JSON tables. Using drill to > query data out of it. > I need find 90th and 95 percentile values on the data that is there. > Time series data is that of multiple sensors data per time. > > I was able to get min, max and avg values by doing > select `time`, min(`sensor1`), max(`sensor`), avg(`sensor1`) from > dfs.`maprdb` group by `timestamp` order by `timestamp`. > > I need to if there is a way to specify a percentile value to calculate in > query. > > -- > Thanks and Regards, > Manoj Srivatsav >
Percentile window function in apache drill
Hi, I have a time series data store in MapR DB JSON tables. Using drill to query data out of it. I need find 90th and 95 percentile values on the data that is there. Time series data is that of multiple sensors data per time. I was able to get min, max and avg values by doing select `time`, min(`sensor1`), max(`sensor`), avg(`sensor1`) from dfs.`maprdb` group by `timestamp` order by `timestamp`. I need to if there is a way to specify a percentile value to calculate in query. -- Thanks and Regards, Manoj Srivatsav