> In the hive-hll-udf, you seem to mention about RRD. Is that something supported by Hive?
No. RRDTool is what most people are replacing with Hive for storing time-series data. Raw RRDTool files on a local disk have no availability model (i.e. lose a disk and you lose the data).

The rollup concept, however, is very powerful: you maintain distinct aggregates over a time series and throw out the expired ones. That is what my example did: last 30 days HLL + last 23 hours HLL + an HLL generated over current_hour, to count billions of distincts across them with a few megabytes of storage.

This can then be extended further to build hundreds of bitsets per hour, one for each tracked A/B experiment you want to collect stats on.

Cheers,
Gopal
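To make the rollup idea concrete, here is a minimal sketch of it in plain Python. It is not the hive-hll-udf code: the `HLL` class, the precision `p=12`, the SHA-1 hashing and the synthetic user IDs are all assumptions made up for illustration. It only shows why the rollup works: two HLL sketches merge by taking the register-wise maximum, so the 30-day, 23-hour and current-hour sketches can be combined without going back to the raw rows.

```python
import hashlib
import math


class HLL:
    """Toy HyperLogLog (hypothetical, for illustration only)."""

    def __init__(self, p=12):
        self.p = p                      # index bits; 2**p registers
        self.m = 1 << p
        self.registers = bytearray(self.m)

    def add(self, item):
        # 64-bit hash taken from the first 8 bytes of SHA-1 (an assumption).
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                    # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Union of two sketches: register-wise maximum.
        out = HLL(self.p)
        out.registers = bytearray(
            max(a, b) for a, b in zip(self.registers, other.registers)
        )
        return out

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        est = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if est <= 2.5 * m and zeros:
            # Small-range correction (linear counting).
            est = m * math.log(m / zeros)
        return int(round(est))


def sketch_of(items):
    s = HLL()
    for item in items:
        s.add(item)
    return s


# Synthetic, overlapping user IDs standing in for the three rollup levels.
last_30_days = sketch_of(range(0, 50_000))
last_23_hours = sketch_of(range(40_000, 60_000))
current_hour = sketch_of(range(55_000, 65_000))

# The rollup: merge the stored sketches, then estimate the distinct count.
rollup = last_30_days.merge(last_23_hours).merge(current_hour)
estimate = rollup.count()
print("estimated distincts:", estimate, "(true value: 65000)")
```

Each sketch in this toy version is 4096 one-byte registers, whatever the number of items added to it. Expiring old data is just a matter of dropping the oldest sketch and merging the ones that remain.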