1. We have four data nodes: US, Shenzhen (China), Hong Kong (China), and EU. Each 
data node has its own MySQL database. We want to deploy four Kylin clusters (one 
per node) to analyse the data and then merge their results into a final result, 
so we need the distinct user set from every data node, merged to eliminate 
duplicate users. This does not seem to be a good scenario for Kylin.
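Why the per-node results have to be sets rather than counts can be shown with plain Python sets. The node names come from the question above, but the user IDs are made-up illustrative data. This is not a Kylin API. Summing four per-cluster distinct counts double-counts any user seen in more than one region. Only a union of the underlying sets (or of a mergeable structure such as a bitmap) gives the true global distinct count.

```python
# Illustrative only: hypothetical distinct-user sets for each data node.
node_users = {
    "us": {"u1", "u2", "u3"},
    "shenzhen-china": {"u2", "u4"},
    "hongkong-china": {"u4", "u5"},
    "eu": {"u1", "u6"},
}


def merged_distinct(user_sets):
    """Union all per-node sets so a user seen in several regions counts once."""
    merged = set()
    for s in user_sets:
        merged |= s
    return merged


per_node = {node: len(users) for node, users in node_users.items()}
naive_total = sum(per_node.values())          # adds up per-node counts (overcounts)
global_distinct = len(merged_distinct(node_users.values()))

print(per_node)
print("sum of per-node counts:", naive_total)
print("true distinct users:", global_distinct)
```

Users "u1", "u2" and "u4" each appear in two regions, which is exactly the gap between the two numbers.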
2. If we want count distinct on a string column such as user ID, which is a 
high-cardinality column, how can we estimate the memory the global dictionary 
needs? Will Kylin expand the global dictionary and the user bitmaps as new users 
arrive every day?
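The idea behind a global dictionary can be sketched as an append-only mapping from string IDs to integers. New users are appended with the next free integer, so IDs already encoded in stored bitmaps stay valid when the dictionary grows. The reverse lookup also shows how a bitmap could, in principle, be turned back into user IDs. The class and function names are hypothetical. Kylin's real global dictionary and bitmaps use more compact, compressed encodings, so `plain_bitmap_bytes` is only a rough uncompressed upper bound for one bitmap, not a model of Kylin's actual memory use.

```python
class AppendOnlyDict:
    """Hypothetical global dictionary: maps each string ID to a stable integer.

    Values are only ever appended, so IDs already used in stored bitmaps
    never change when the dictionary grows.
    """

    def __init__(self):
        self._ids = {}     # value -> int id
        self._values = []  # int id -> value (reverse lookup)

    def id_of(self, value):
        if value not in self._ids:
            self._ids[value] = len(self._values)
            self._values.append(value)
        return self._ids[value]

    def value_of(self, i):
        return self._values[i]

    def __len__(self):
        return len(self._values)


def to_bitmap(d, users):
    """Encode users as a bitmap (a Python int): bit i is set if id i is present."""
    bits = 0
    for u in users:
        bits |= 1 << d.id_of(u)
    return bits


def decode(d, bits):
    """Turn a bitmap back into user IDs via the dictionary's reverse lookup."""
    out, i = [], 0
    while bits:
        if bits & 1:
            out.append(d.value_of(i))
        bits >>= 1
        i += 1
    return out


def plain_bitmap_bytes(n_ids):
    """Uncompressed bitmap size for ids 0..n_ids-1: one bit per id."""
    return (n_ids + 7) // 8


d = AppendOnlyDict()
day1 = to_bitmap(d, ["u1", "u2", "u3"])
day2 = to_bitmap(d, ["u2", "u4"])  # "u4" is appended as id 3; older ids unchanged
both = day1 | day2                  # union of the two days
print(bin(both).count("1"))         # distinct users across both days
print(decode(d, both))
print(plain_bitmap_bytes(100_000_000))  # 12500000 bytes, about 12.5 MB
```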
3. If we deploy Kylin with a standalone HBase cluster, will all the result data, 
such as dictionaries and bitmaps, be stored in that HBase cluster? If so, we 
would not need HA on the other Hadoop cluster (Hive + Spark), because data loss 
there would not damage the results, and we would only need to ensure high 
availability of the HBase cluster. Is that right?

On 2017-12-06 08:41:13, ShaoFeng Shi wrote:
Hi Miao,

1. Currently, Kylin returns only the count stored in the bitmap, not the IDs in 
it; it should be possible to extend this. Could you please describe your scenario?
2. Yes, the Cube API returns each segment of the cube, and each segment has a 
start date and an end date. Please check Kylin's REST API documentation.
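A minimal sketch of reading segment ranges from the Cube API in Python, under several assumptions. It assumes the endpoint `GET /kylin/api/cubes/{cubeName}` returns a JSON object with a `segments` list whose `date_range_start` and `date_range_end` are epoch milliseconds. It also assumes HTTP Basic auth with Kylin's default ADMIN/KYLIN account. Verify the path, field names and response shape against the REST API documentation for your Kylin version. The host, cube name and sample segment below are hypothetical.

```python
import base64
import json
import urllib.request
from datetime import datetime, timezone


def fetch_cube(host, cube_name, user="ADMIN", password="KYLIN"):
    """Fetch one cube's JSON (assumed endpoint and default credentials)."""
    url = f"http://{host}/kylin/api/cubes/{cube_name}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def segment_ranges(cube_json):
    """Return (name, start, end) per segment, converting epoch ms to UTC datetimes."""
    ranges = []
    for seg in cube_json.get("segments", []):
        start = datetime.fromtimestamp(seg["date_range_start"] / 1000, tz=timezone.utc)
        end = datetime.fromtimestamp(seg["date_range_end"] / 1000, tz=timezone.utc)
        ranges.append((seg.get("name"), start, end))
    return ranges


# Hypothetical response fragment; against a real server you might call
# cube = fetch_cube("localhost:7070", "my_cube") instead.
sample = {
    "segments": [
        {
            "name": "20170101000000_20170201000000",
            "date_range_start": 1483228800000,
            "date_range_end": 1485907200000,
        }
    ]
}
for name, start, end in segment_ranges(sample):
    print(name, start.isoformat(), end.isoformat())
```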


On 2017-12-05 18:31 GMT+08:00, 崔苗 wrote:
1. If bitmaps are stored in HBase, can we get the distinct user set when we 
need to know all the distinct users?
2. Is there a RESTful API to get a cube's date_time, date_range_start, and 
date_range_end?

On 2017-11-30 18:30:27, ShaoFeng Shi wrote:
Hi Miao,

Kylin uses HyperLogLog or Bitmap to persist the distinct values; you can find 
more information in this blog post: 
https://kylin.apache.org/blog/2016/08/01/count-distinct-in-kylin/
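Bitmap gives an exact distinct count, while HyperLogLog gives an approximate one in a small, fixed-size structure. Both can be merged, which is what lets per-hour results roll up to days or years. The sketch below is the textbook HyperLogLog algorithm, not Kylin's own HLL counter. The precision `p=14` and the SHA-1 based hash are arbitrary choices for illustration. Merging two sketches takes the register-wise maximum, so overlapping users are not counted twice.

```python
import hashlib
import math


class HyperLogLog:
    """Textbook HyperLogLog with 2**p registers (not Kylin's implementation)."""

    def __init__(self, p=14):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # 64-bit hash from the first 8 bytes of SHA-1.
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)               # top p bits choose the register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Union of two sketches: register-wise max (both must use the same p).
        assert self.p == other.p
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self):
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)


hour_a, hour_b = HyperLogLog(), HyperLogLog()
for i in range(6000):
    hour_a.add(f"user{i}")
for i in range(4000, 10000):  # overlaps hour_a on user4000..user5999
    hour_b.add(f"user{i}")
b_alone = hour_b.count()
hour_a.merge(hour_b)          # true union is 10000 users
print(b_alone, hour_a.count())
```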


On 2017-11-30 9:25 GMT+08:00, 崔苗 wrote:
Hi,
We want to get count(distinct user) grouped by hour/day/week/month/year, and we 
have a question:
What does Kylin store for count(distinct user): the set of distinct users, or 
just a count? If we want count(distinct user) by year, do we need to keep a 
year of data in Hive?

--
Best regards,

Shaofeng Shi 史少锋
