Re: Percentile window function in apache drill

2019-08-12 Thread Aman Sinha
Drill  currently supports the ranking window functions [1] but not
specifically the percentile function.

[1] https://drill.apache.org/docs/ranking-window-functions/

On Mon, Aug 12, 2019 at 7:34 AM Ted Dunning  wrote:

> Currently, the way to do this is with window queries where you sort each
> sub-group and grab the pertinent rows as an approximation of the quantiles
> you want.
>
> Another way would be to use an approximate data structure like a t-digest
> via an aggregating user-defined function (UDF).
>
> Last I checked, there were some limitations in the ability to do this, but
> things may have changed.
>
>
> On Sun, Aug 11, 2019 at 9:15 PM Manoj srivatsav  >
> wrote:
>
> > Hi,
> >
> > I have a time series data store in MapR DB JSON tables. Using drill to
> > query data out of it.
> > I need find 90th and 95 percentile values on the data that is there.
> > Time series data is that of multiple sensors data per time.
> >
> > I was able to get min, max and avg values by doing
> > select `time`, min(`sensor1`), max(`sensor`), avg(`sensor1`) from
> > dfs.`maprdb` group by `timestamp` order by `timestamp`.
> >
> > I need to if there is a way to specify a percentile value to calculate in
> > query.
> >
> > --
> > Thanks and Regards,
> > Manoj Srivatsav
> >
>


Re: Percentile window function in apache drill

2019-08-12 Thread Ted Dunning
Currently, the way to do this is with window queries where you sort each
sub-group and grab the pertinent rows as an approximation of the quantiles
you want.

Another way would be to use an approximate data structure like a t-digest
via an aggregating user-defined function (UDF).

Last I checked, there were some limitations in the ability to do this, but
things may have changed.


On Sun, Aug 11, 2019 at 9:15 PM Manoj srivatsav 
wrote:

> Hi,
>
> I have a time series data store in MapR DB JSON tables. Using drill to
> query data out of it.
> I need find 90th and 95 percentile values on the data that is there.
> Time series data is that of multiple sensors data per time.
>
> I was able to get min, max and avg values by doing
> select `time`, min(`sensor1`), max(`sensor`), avg(`sensor1`) from
> dfs.`maprdb` group by `timestamp` order by `timestamp`.
>
> I need to if there is a way to specify a percentile value to calculate in
> query.
>
> --
> Thanks and Regards,
> Manoj Srivatsav
>


Percentile window function in apache drill

2019-08-11 Thread Manoj srivatsav
Hi,

I have a time series data store in MapR DB JSON tables. Using drill to
query data out of it.
I need find 90th and 95 percentile values on the data that is there.
Time series data is that of multiple sensors data per time.

I was able to get min, max and avg values by doing
select `time`, min(`sensor1`), max(`sensor`), avg(`sensor1`) from
dfs.`maprdb` group by `timestamp` order by `timestamp`.

I need to if there is a way to specify a percentile value to calculate in
query.

-- 
Thanks and Regards,
Manoj Srivatsav