What's the precision of the two distinct counters? Is the 12 minutes the total time for building this cube?
2016-03-11 11:11 GMT+08:00 热爱大发挥 <[email protected]>:

> hive table records: 1000000
> hive table size: 70MB
>
> cube info:
> normal dimensions: 8 (cardinality less than 6)
> measures: count(distinct uid), the cardinality of "uid" is about 600000
>           count(distinct keyword), the cardinality of "keyword" is about 100000
>
> time cost: 12 min
> cube size: 765MB
>
>
> ------------------ Original Message ------------------
> *From:* "ShaoFeng Shi"<[email protected]>;
> *Sent:* Friday, March 11, 2016, 10:48 AM
> *To:* "user"<[email protected]>;
> *Subject:* Re: Build Measures with count distinct high cardinality column
>
> Which precision (error rate) did you select for this measure? "error rate <
> 1.22%" will take much more storage than "error rate < 9.75%"; users need to
> select a proper precision depending on their needs.
>
> Also, when you state "cuboid size was very large and cost much time",
> please provide detailed information such as source data size, number of
> dimensions, dimension cardinality, measure definition, your Hadoop cluster
> capacity, cube expansion rate, build time, etc. Otherwise we can't make a
> judgement and comment.
>
> 2016-03-10 23:20 GMT+08:00 热爱大发挥 <[email protected]>:
>
>> In the measures step, I tried count distinct on a high-cardinality column
>> (like user_id), and I found the cuboid size was very large and it cost
>> much time. Is count distinct on a high-cardinality column deprecated?
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi
>

--
Best regards,

Shaofeng Shi
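To illustrate why the precision choice matters for storage, here is a rough sketch in Python. It assumes the count distinct measure is backed by a HyperLogLog counter with 2^p registers, one byte per register in its dense form, and an error bound quoted at three standard errors (3 × 1.04 / sqrt(2^p)). None of this is stated in the thread; the function name `hll_estimate` and the parameter `precision_bits` are hypothetical, and the formula is only a guess that happens to be consistent with the two error rates quoted above.

```python
import math


def hll_estimate(precision_bits: int, sigmas: float = 3.0):
    """Return (registers, bytes_per_counter, error_bound) for one counter.

    Assumptions (not confirmed by the thread):
      - the counter is a HyperLogLog sketch with 2**precision_bits registers
      - the dense representation uses one byte per register
      - the advertised error rate is `sigmas` standard errors
    """
    registers = 2 ** precision_bits
    bytes_per_counter = registers  # one byte per register (assumed)
    std_error = 1.04 / math.sqrt(registers)  # standard HyperLogLog error
    return registers, bytes_per_counter, sigmas * std_error


if __name__ == "__main__":
    for p in (10, 16):
        m, size, err = hll_estimate(p)
        print(f"p={p}: {m} registers, ~{size // 1024} KB per counter, "
              f"error bound ~{err:.2%}")
```

Under these assumptions, p=10 gives about 1 KB per counter with a 9.75% bound, and p=16 gives about 64 KB per counter with a 1.22% bound. Each pre-aggregated row stores one such counter per count distinct measure, so the higher precision can inflate the cube considerably even when the source table is small.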
