Hi,
Just like theta sketches are used for distinct count metrics like
impressions and clicks, is there a sketch (perhaps quantile?) that can be
used for metrics like ad_spend? If so, what are the error bounds?

There is a big opportunity that I see in storing very little data in
sketches (which I use as a set) which makes retrieval of aggregate/analytic
data very fast (although it is approximate).

This question is best explained with an example. Say I have a fact table
schema as follows


   1. *Age_Groups* is a dimension with 10 bucketed distinct values {less
   than 18, 19-25, 26-35....}
   2. *Gender* is a dimension with 3 distinct values F, M & Unknown
   3. *Country* is a  dimension with 200 possible values
   4. *Impressions* is a metric which is a long count (perhaps with Theta
   sketch or HLL)
   5. *Clicks* is a metric which is a long count (perhaps with Theta sketch
   or HLL)
   6. *Ad-Spend* is a metric which is a double which is a sum (*perhaps
   with quantile sketch??*)


The maximum possible number of entries in this table would be the cross
product of the cardinalities of the dimensions which is 10(Age_Group) x
3(Gender) x 200(Country) = 6000. Now instead of storing 6000 records, I
could store only (10 + 3 + 200) * *3 sketches(**one each for impression,
clicks and Ad-Spend)* = 639 sketches and accomplish the group by queries
using set operations that sketches provides.

For impression metric, one sketch for Gender=F, another sketch for
Gender=M, yet another for Gender=Unknown and so on for other dimensions as
well.
For click metric, one sketch for Gender=F, another sketch for Gender=M, yet
another for Gender=Unknown and so on for other dimensions as well.
Question is what sketch would I need for ad-spend?

On a side note, I have used Theta sketches in this manner and even
implemented ECLAT for frequent itemset computations and association rule
mining ... example from another project below with theta sketches used for
count, I do not know how to do the same for a metric like Ad-Spend.

Level Conjunction Count
2 H10 & MemberGender=F 74
2 M15 & MemberGender=F 83
2 G31 & MemberGender=F 59
2 MemberGender=F & R13 66
*In the example above, H10, M15 etc are International Disease codes for
specific diseases. *https://www.aapc.com/codes/code-search/

Hope this is a clear representation of the problem.

Regards
Vijay

Reply via email to