Re: Hive HLL for appx count distinct

2015-12-30 Thread Gopal Vijayaraghavan
> In the hive-hll-udf, you seem to mention about RRD. Is that something supported by Hive? No. RRDTool is what most people are replacing with Hive to store time series data in. Raw RRDTool files on a local disk have no availability model (i.e., lose a disk, you lose data). The rollup concept howe

Re: Hive HLL for appx count distinct

2015-12-30 Thread Buntu Dev
Thanks Gopal! In the hive-hll-udf, you seem to mention about RRD. Is that something supported by Hive? Will go over the Data Sketches as well, thanks for the pointer :) On Wed, Dec 30, 2015 at 4:29 PM, Gopal Vijayaraghavan wrote: > > > I'm trying to explore the HLL UDF option to compute # of u

Re: Hive HLL for appx count distinct

2015-12-30 Thread Gopal Vijayaraghavan
> I'm trying to explore the HLL UDF option to compute # of uniq users for each time range (week, month, yr, etc.) and wanted to know if it's possible to just maintain HLL struct for each day and then use those to compute the uniqs for various time ranges using these per day structs instead of

Hive HLL for appx count distinct

2015-12-30 Thread Buntu Dev
I'm trying to explore the HLL UDF option to compute # of uniq users for each time range (week, month, yr, etc.) and wanted to know if it's possible to just maintain HLL struct for each day and then use those to compute the uniqs for various time ranges using these per day structs instead of running
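
For readers following the thread: the per-day idea works in principle because HyperLogLog sketches built with the same parameters can be merged by taking the register-wise maximum, and the merged sketch is identical to one built over the combined input. Below is a minimal Python sketch of that property. It is not the hive-hll-udf code or any Hive API; the `HLL` class, the precision `p=12`, the SHA-1 hash, and the synthetic user ids are all assumptions chosen for illustration.

```python
import hashlib
import math


class HLL:
    """Toy HyperLogLog (hypothetical helper, not the Hive UDF)."""

    def __init__(self, p=12):
        self.p = p                      # precision: 2**p registers
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash taken from the first 8 bytes of SHA-1 (an arbitrary choice).
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                   # top p bits pick the register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Union of two sketches: register-wise maximum.
        if self.p != other.p:
            raise ValueError("sketches must share the same precision")
        out = HLL(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)           # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)         # small-range (linear counting) correction
        return raw


# Synthetic data: 7 days, each with 2000 users, overlapping heavily across days.
# The true number of distinct users over the week is 5000.
daily = []
for d in range(7):
    sketch = HLL()
    for user in range(d * 500, d * 500 + 2000):
        sketch.add(user)
    daily.append(sketch)

# Weekly uniques from the per-day sketches, without rescanning raw events.
week = daily[0]
for sketch in daily[1:]:
    week = week.merge(sketch)

# Reference sketch built directly over the whole week's raw ids.
direct = HLL()
for user in range(5000):
    direct.add(user)

print(round(week.count()), round(direct.count()))
```

The merged weekly sketch has exactly the same registers as the one built from the raw ids, so rolling days up into weeks, months, or years adds no error beyond the sketch's own estimation error. Summing the per-day counts instead would overcount any user seen on more than one day.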