Hi, Just like theta sketches are used for distinct count metrics like impressions and clicks, is there a sketch (perhaps quantile?) that can be used for metrics like ad_spend? If so, what are the error bounds?
There is a big opportunity that I see in storing very little data in sketches (which I use as a set) which makes retrieval of aggregate/analytic data very fast (although it is approximate). This question is best explained with an example. Say I have a fact table schema as follows 1. *Age_Groups* is a dimension with 10 bucketed distinct values {less than 18, 19-25, 26-35....} 2. *Gender* is a dimension with 3 distinct values F, M & Unknown 3. *Country* is a dimension with 200 possible values 4. *Impressions* is a metric which is a long count (perhaps with Theta sketch or HLL) 5. *Clicks* is a metric which is a long count (perhaps with Theta sketch or HLL) 6. *Ad-Spend* is a metric which is a double which is a sum (*perhaps with quantile sketch??*) The maximum possible number of entries in this table would be the cross product of the cardinalities of the dimensions which is 10(Age_Group) x 3(Gender) x 200(Country) = 6000. Now instead of storing 6000 records, I could store only (10 + 3 + 200) * *3 sketches(**one each for impression, clicks and Ad-Spend)* = 639 sketches and accomplish the group by queries using set operations that sketches provides. For impression metric, one sketch for Gender=F, another sketch for Gender=M, yet another for Gender=Unknown and so on for other dimensions as well. For click metric, one sketch for Gender=F, another sketch for Gender=M, yet another for Gender=Unknown and so on for other dimensions as well. Question is what sketch would I need for ad-spend? On a side note, I have used Theta sketches in this manner and even implemented ECLAT for frequent itemset computations and association rule mining ... example from another project below with theta sketches used for count, I do not know how to do the same for a metric like Ad-Spend. Level Conjunction Count 2 H10 & MemberGender=F 74 2 M15 & MemberGender=F 83 2 G31 & MemberGender=F 59 2 MemberGender=F & R13 66 *In the example above, H10, M15 etc are International Disease codes for specific diseases. *https://www.aapc.com/codes/code-search/ Hope this is a clear representation of the problem. Regards Vijay