Re: Re: Re: Re: count distinct

ShaoFeng Shi Fri, 08 Dec 2017 00:56:02 -0800

Correct; the GlobalDictionary can only encode a non-integer value to an
integer; it cannot decode an integer back to the original value.
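
To illustrate the point, here is a toy sketch in Python. It is not Kylin's
actual GlobalDictionary or bitmap code; the class and function names
(OneWayDictionary, to_bitmap, ids_in) and the sample user IDs are made up for
illustration. It shows that a bitmap built from dictionary-encoded IDs can
give the distinct count and the encoded integers, but without a reverse
mapping the original strings cannot be recovered.

```python
class OneWayDictionary:
    """Hypothetical encode-only dictionary: string -> int, no reverse lookup."""

    def __init__(self):
        self._ids = {}

    def encode(self, value: str) -> int:
        # Assign the next integer to an unseen value; reuse the id otherwise.
        if value not in self._ids:
            self._ids[value] = len(self._ids)
        return self._ids[value]


def to_bitmap(encoded_ids) -> int:
    # Use a Python int as a simple bitmap: bit i is set when id i is present.
    bitmap = 0
    for i in encoded_ids:
        bitmap |= 1 << i
    return bitmap


def count_distinct(bitmap: int) -> int:
    # The distinct count is the number of set bits.
    return bin(bitmap).count("1")


def ids_in(bitmap: int) -> list:
    # The bitmap can list the encoded integers, but not the original strings.
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


dictionary = OneWayDictionary()
events = ["u-alice", "u-bob", "u-alice", "u-carol"]  # sample data, assumed
bitmap = to_bitmap(dictionary.encode(u) for u in events)

distinct = count_distinct(bitmap)  # count of distinct users
encoded = ids_in(bitmap)           # encoded integers only, not user_id strings
```

If the user ID is already an integer, the bitmap positions are the IDs
themselves, so no dictionary decode step would be needed.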


2017-12-08 16:16 GMT+08:00 崔苗 <[email protected]>:

> 1. The user_id is a unique string ID, but right now we can't get the
> user_id set from Kylin, right?
>
>
> At 2017-12-07 09:57:31, ShaoFeng Shi <[email protected]> wrote:
>
> Hi Miao,
>
> For 1, Kylin focuses on OLAP scenarios, so most queries are aggregate
> queries rather than detail queries. But your scenario is one that a bitmap
> can fit; if the result set isn't big, it is doable. It only needs to extract
> the values from the bitmap (if the user ID is in the integer family, there
> is no need to decode with a dictionary). This is something like the TopN
> measure.
>
> For 2, yes, the global dictionary will grow as the number of users grows.
>
> For 3, if you use Kylin 2.1, the cube data, as well as the metadata, will
> all be on the HBase cluster. Before Kylin 2.1, there was an issue that
> caused some metadata files to be left on the Hive cluster. Whatever the
> deployment topology, we suggest you back up the metadata periodically to
> minimize the possibility of data loss.
>
> 2017-12-06 9:45 GMT+08:00 崔苗 <[email protected]>:
>
>> 1. We have four data nodes: us, shenzhen-china, hongkong-china and eu.
>> Every data node has one MySQL database. We want to deploy four Kylin
>> clusters to analyse the data and merge the results to get the final
>> result, so we need the distinct user set in every data node and to merge
>> them to get rid of duplicated users. It seems this is not a good scenario
>> for Kylin.
>> 2. If we want to get the count distinct on a string column, such as user
>> ID, which is a high-cardinality column, how do we estimate the memory that
>> the global dict needs? Will Kylin expand the global dict and the bitmap of
>> users if users increase every day?
>> 3. If we deploy Kylin with a standalone HBase cluster, will all the result
>> data, such as the dict and bitmap, be stored in the HBase cluster? Then we
>> wouldn't need to set up HA mode on the other Hadoop cluster (Hive + Spark),
>> because data loss in that cluster would not damage the result; we would
>> just need to ensure high availability on the HBase cluster?
>>
>>
>> At 2017-12-06 08:41:13, ShaoFeng Shi <[email protected]> wrote:
>>
>> Hi Miao,
>>
>> 1. Currently, Kylin only returns the count from the bitmap, not the IDs in
>> it; it should be possible to extend this. Could you please describe your
>> scenario?
>> 2. Yes, the Cube API will return each segment of the cube, and each
>> segment has a start date and an end date. Please check Kylin's REST API
>> document.
>>
>> 2017-12-05 18:31 GMT+08:00 崔苗 <[email protected]>:
>>
>>> 1. If there is a bitmap stored in HBase, can we get the distinct user set
>>> if we need to know all the distinct users?
>>> 2. Is there any RESTful API that can get the cube's
>>> date_time, date_range_start and date_range_end?
>>>
>>>
>>> At 2017-11-30 18:30:27, ShaoFeng Shi <[email protected]> wrote:
>>>
>>> Hi Miao,
>>>
>>> Kylin uses HyperLogLog or Bitmap to persist the distinct values; you
>>> can get some info from this blog:
>>> https://kylin.apache.org/blog/2016/08/01/count-distinct-in-kylin/
>>>
>>> 2017-11-30 9:25 GMT+08:00 崔苗 <[email protected]>:
>>>
>>>> Hi,
>>>> we want to get count(distinct user) grouped by
>>>> hour/day/week/month/year. Now we have a question:
>>>> what does Kylin keep for count(distinct user), the set of distinct
>>>> users or just a count? If we want to count(distinct user) by year,
>>>> do we need to keep a year of data in Hive?
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi 史少锋
>>>
>>>
>>>
>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>
>


-- 
Best regards,

Shaofeng Shi 史少锋
