Hi Miao,

For 1, Kylin focuses on OLAP scenarios, so most queries are aggregate
queries rather than detail queries. But your scenario is one that the bitmap
can fit: if the result set isn't big, it is doable. Kylin would only need to
decode the bitmap values into user IDs (if the user ID is in the integer
family, there is no need to decode with the dictionary). This is similar to
the TopN measure.
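
To illustrate why you need the ID sets rather than the counts, here is a
minimal Python sketch of the merge step on your side. It assumes each
cluster could hand back its distinct user IDs as integers, which Kylin does
not do today; the region names come from your mail, while the ID values and
the merge_distinct_users helper are made up for illustration.

```python
def merge_distinct_users(per_region_ids):
    """Union the per-region user-ID sets so each user is counted once."""
    return set().union(*per_region_ids.values())


# Hypothetical decoded bitmap contents, one set of integer IDs per data node.
per_region_ids = {
    "us": {1, 2, 3},
    "shenzhen-china": {3, 4},
    "hongkong-china": {4, 5},
    "eu": {1, 6},
}

merged = merge_distinct_users(per_region_ids)

# Global distinct count after removing users seen in several regions.
total_distinct = len(merged)

# Adding up the per-region counts instead would count shared users twice.
sum_of_counts = sum(len(ids) for ids in per_region_ids.values())
```

Here total_distinct is smaller than sum_of_counts because users 1, 3 and 4
each appear in two regions, so the per-cluster counts alone cannot be merged.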

For 2, yes, the global dictionary will grow as the number of users grows.

For 3, if you use Kylin 2.1, the cube data, as well as the metadata, will all
be on the HBase cluster. Before Kylin 2.1, there was an issue that caused
some metadata files to be left on the Hive cluster. Whatever the deployment
topology, we suggest you back up the metadata periodically to minimize the
possibility of data loss.

2017-12-06 9:45 GMT+08:00 崔苗 <[email protected]>:

> 1. We have four data nodes: us, shenzhen-china, hongkong-china and eu. Every
> data node has one MySQL database. We want to deploy four Kylin clusters to
> analyse the data and merge the results to get the final result, so we need
> the distinct user set in every data node and must merge them to get rid of
> duplicated users. It seems this is not a good scenario for Kylin.
> 2. If we want to get the count distinct on a string column such as user ID,
> which is a high-cardinality column, how do we estimate the memory that the
> global dict needs? Will Kylin expand the global dict and the user bitmap if
> users increase every day?
> 3. If we deploy Kylin with a standalone HBase cluster, will all the result
> data, such as the dict and bitmap, be stored in the HBase cluster? Then we
> don't need to set up HA mode on the other Hadoop cluster (Hive + Spark),
> because data loss in that cluster will not damage the result; we just
> need to ensure high availability on the HBase cluster?
>
>
> On 2017-12-06 08:41:13, ShaoFeng Shi <[email protected]> wrote:
>
> Hi Miao,
>
> 1. Currently, Kylin only returns the count from the bitmap, not the IDs in
> it; it should be possible to extend this. Could you please describe your
> scenario?
> 2. Yes, the Cube API will return each segment of the cube, and each
> segment has a start date and end date. Please check Kylin's REST API
> document.
>
> 2017-12-05 18:31 GMT+08:00 崔苗 <[email protected]>:
>
>> 1. If there is a bitmap stored in HBase, can we get the distinct user set
>> if we need to know all the distinct users?
>> 2. Is there any RESTful API that could get the cube's
>> date_time, date_range_start and date_range_end?
>>
>>
>> On 2017-11-30 18:30:27, ShaoFeng Shi <[email protected]> wrote:
>>
>> Hi Miao,
>>
>> Kylin uses HyperLogLog or Bitmap to persist the distinct values. You
>> can get some info from this blog:
>> https://kylin.apache.org/blog/2016/08/01/count-distinct-in-kylin/
>>
>> 2017-11-30 9:25 GMT+08:00 崔苗 <[email protected]>:
>>
>>> Hi,
>>> we want to get count(distinct user) grouped by
>>> hour/day/week/month/year. Now we have a problem:
>>> what does Kylin keep for count(distinct user): the distinct
>>> user set or just a count? If we want to count(distinct user) by
>>> year, do we need to keep a year of data in Hive?
>>>
>>>
>>>
>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>
>


-- 
Best regards,

Shaofeng Shi 史少锋
