hive table records: 1,000,000
hive table size: 70MB
cube info:
normal dimensions: 8 (each with cardinality less than 6)
measures: count(distinct uid), where the cardinality of "uid" is about 600,000
count(distinct keyword), where the cardinality of "keyword" is about 100,000
build time: 12 min
cube size: 765MB
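
For reference, the expansion rate implied by these numbers can be worked out with a few lines of Python. This is only a sketch: it assumes "expansion rate" means cube size divided by source table size, and the function name expansion_rate is made up for illustration.

```python
def expansion_rate(cube_size_mb: float, source_size_mb: float) -> float:
    """Ratio of built cube size to source table size (assumed definition)."""
    return cube_size_mb / source_size_mb

# Figures reported above: 765MB cube built from a 70MB hive table.
rate = expansion_rate(765, 70)
print(f"expansion rate ~ {rate:.1f}x")  # prints "expansion rate ~ 10.9x"
```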
------------------ Original Message ------------------
From: "ShaoFeng Shi"<[email protected]>;
Sent: March 11, 2016 (Friday) 10:48
To: "user"<[email protected]>;
Subject: Re: Build Measures with count distinct high cardinality column
Which precision (error rate) did you select for this measure? "error rate <
1.22%" takes much more storage than "error rate < 9.75%"; users need to select
the proper precision depending on their needs.
Also, when you say that the cuboid size was very large and the build took a
long time, please provide detailed information such as source data size, number
of dimensions, dimension cardinality, measure definitions, your hadoop cluster
capacity, cube expansion rate, build time, etc. Otherwise we cannot make a
judgement or give comments.
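
To illustrate why the tighter error rate costs more storage, here is a rough Python sketch based on the standard HyperLogLog estimate, where a counter with 2^p registers has a standard error of about 1.04 / sqrt(2^p). Several things here are assumptions rather than statements about Kylin's implementation: one byte per register, a dense (uncompressed) layout, and quoting the bound at three standard errors. With those assumptions, p = 10 and p = 16 happen to reproduce the 9.75% and 1.22% figures above. The function name hll_profile is hypothetical.

```python
import math

def hll_profile(precision_bits: int, sigmas: float = 3.0) -> dict:
    """Register count, dense size, and error bound for a HyperLogLog counter.

    Assumptions (not taken from the thread): one byte per register, and the
    error bound is quoted at `sigmas` standard errors.
    """
    registers = 2 ** precision_bits
    std_error = 1.04 / math.sqrt(registers)  # standard HyperLogLog estimate
    return {
        "registers": registers,
        "dense_bytes": registers,            # assumed: 1 byte per register
        "error_bound": sigmas * std_error,
    }

for p in (10, 16):
    prof = hll_profile(p)
    print(f"p={p}: {prof['dense_bytes'] // 1024} KB per counter, "
          f"error < {prof['error_bound']:.2%}")
```

Under these assumptions the higher-precision counter is 64 times larger, and one counter is stored per count distinct measure in every cuboid row, so the choice of precision is multiplied across the whole cube.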
2016-03-10 23:20 GMT+08:00 ?????????? <[email protected]>:
In the measures step, I tried a count distinct on a high-cardinality column
(like user_id), and found that the cuboid size was very large and the build took
a long time. Is count distinct on a high-cardinality column discouraged?
--
Best regards,
Shaofeng Shi