Re: Build Measures with count distinct high cardinality column

?????????? Thu, 10 Mar 2016 19:12:07 -0800

Hive table records: 1000000
Hive table size: 70MB


Cube info:
normal dimensions: 8 (each with cardinality less than 6)
measures: count(distinct uid), where the cardinality of "uid" is about 600000
          count(distinct keyword), where the cardinality of "keyword" is about 100000



Build time: 12 min
cube size: 765MB
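
For reference, the cube expansion rate implied by these numbers can be worked out directly. This is only a small arithmetic sketch; the function name `expansion_rate` is made up for illustration and is not part of any Kylin tool.

```python
# Figures quoted in this message.
source_size_mb = 70    # Hive table size
cube_size_mb = 765     # size of the built cube


def expansion_rate(cube_mb, source_mb):
    """Ratio of the built cube size to the source table size."""
    return cube_mb / source_mb


rate = expansion_rate(cube_size_mb, source_size_mb)
print(f"expansion rate: {rate:.1f}x")
```

So the cube is roughly eleven times the size of the source table.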






------------------ Original Message ------------------
From: "ShaoFeng Shi"<[email protected]>;
Sent: Friday, March 11, 2016, 10:48 AM
To: "user"<[email protected]>;
Subject: Re: Build Measures with count distinct high cardinality column



Which precision (error rate) did you select for this measure? "error rate < 
1.22%" takes much more storage than "error rate < 9.75%"; users need to select 
a proper precision depending on their needs. 
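
A rough sketch of why the tighter bound costs more storage, assuming the measure is backed by a HyperLogLog counter with 2^p registers and the textbook standard error of 1.04/sqrt(2^p). Mapping the two quoted error rates to p=10 and p=16, and using a factor of 3 standard errors, are assumptions chosen because they reproduce the quoted figures; the function names are illustrative only.

```python
import math


def hll_registers(p):
    """Number of registers in a HyperLogLog counter of precision p."""
    return 1 << p


def hll_error_bound(p, sigmas=3):
    """Error bound as `sigmas` times the standard error 1.04/sqrt(m).

    sigmas=3 is an assumption that matches the percentages quoted above.
    """
    return sigmas * 1.04 / math.sqrt(hll_registers(p))


for p in (10, 16):
    m = hll_registers(p)
    print(f"p={p}: {m} registers, error bound < {hll_error_bound(p):.2%}")

# Register count, and so storage per counter, grows 64x from p=10 to p=16.
print(hll_registers(16) // hll_registers(10))
```

Since one such counter is stored per cuboid row for each count distinct measure, a 64x larger counter has a large effect on the cube size.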

Also, when you state "cuboid size was very large and cast much time", please 
provide detailed information such as source data size, number of dimensions, 
dimension cardinality, measure definitions, your Hadoop cluster capacity, cube 
expansion rate, build time, etc. Otherwise we cannot make a judgement or give comments.


2016-03-10 23:20 GMT+08:00 ?????????? <[email protected]>:
In the measures step, I tried count distinct on a high-cardinality column (like user_id), 
and found that the cuboid size was very large and the build took a long time. Is 
count distinct on a high-cardinality column not recommended?





-- 
Best regards,

Shaofeng Shi
