Correct; GlobalDictionary can only encode a non-integer value to an integer; it cannot decode an integer back to the original value.
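
To illustrate the point, here is a minimal Python sketch of a one-way dictionary feeding a bitmap. It is a toy model, not Kylin's GlobalDictionary or its bitmap implementation: the names AppendOnlyDictionary, to_bitmap and members are hypothetical, and a plain Python int stands in for the bitmap. It shows that the merged bitmap gives the distinct count and the integer ids, but the original strings are not recoverable because no reverse mapping is kept.

```python
class AppendOnlyDictionary:
    """Toy one-way encoder (string -> int). Hypothetical; not Kylin code."""

    def __init__(self):
        self._ids = {}

    def encode(self, value):
        # Assign the next free integer the first time a value is seen.
        if value not in self._ids:
            self._ids[value] = len(self._ids)
        return self._ids[value]
    # Deliberately no decode(): the reverse mapping is not stored.


def to_bitmap(ids):
    """Set one bit per integer id; a Python int stands in for a bitmap."""
    bitmap = 0
    for i in ids:
        bitmap |= 1 << i
    return bitmap


def members(bitmap):
    """List the integer ids whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


d = AppendOnlyDictionary()
day1 = to_bitmap(d.encode(u) for u in ["alice", "bob"])
day2 = to_bitmap(d.encode(u) for u in ["bob", "carol"])

merged = day1 | day2                      # union removes duplicates
distinct_count = bin(merged).count("1")   # count of distinct users
member_ids = members(merged)              # integer ids only, not user_id strings
```

If user_id were already an integer, the ids in the bitmap would be the user ids themselves and no dictionary would be needed.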

2017-12-08 16:16 GMT+08:00 崔苗 <[email protected]>:

> 1. The user_id is a unique string id, but for now we can't get the user_id
> set from Kylin, right?
>
>
> At 2017-12-07 09:57:31, ShaoFeng Shi <[email protected]> wrote:
>
> Hi Miao,
>
> For 1, Kylin is focusing on OLAP scenarios, so most queries are aggregated
> queries instead of detail queries. But your scenario is a case that bitmap
> can fit; if the result set isn't big, it is doable. You only need to
> decouple the bitmap values (if the user id is of the integer family, there
> is no need to decode with a dictionary). This is something like the TopN
> measure.
>
> For 2, yes, the global dictionary will grow as the number of users grows.
>
> For 3, if you use Kylin 2.1, the cube data, as well as the metadata, will
> all be on the HBase cluster. Before Kylin 2.1, there is an issue that
> causes some metadata files to be left on the Hive cluster. With whatever
> deployment topology, we suggest you back up the metadata periodically to
> minimize the possibility of data loss.
>
> 2017-12-06 9:45 GMT+08:00 崔苗 <[email protected]>:
>
>> 1. We have four data nodes: us, shenzhen-china, hongkong-china and eu.
>> Every data node has one MySQL database. We want to deploy four Kylin
>> clusters to analyze the data and merge the results to get the final
>> result, so we need the distinct user set in every data node and merge
>> them to get rid of duplicated users. It seems it's not a good scenario
>> for Kylin.
>> 2. If we want to get the count distinct on a string column, such as user
>> ID, it's a high cardinality column. How do we estimate the memory that
>> the global dict needs? Will Kylin expand the global dict and the bitmap
>> of users if users increase every day?
>> 3. If we deploy Kylin with a standalone HBase cluster, will all the data
>> about the result, such as dict and bitmap, be stored in the HBase
>> cluster? So we don't need to set HA mode on the other Hadoop cluster
>> (Hive + Spark), because data loss in this cluster will not damage the
>> result, and we just need to ensure high availability on the HBase
>> cluster?
>>
>>
>> At 2017-12-06 08:41:13, ShaoFeng Shi <[email protected]> wrote:
>>
>> Hi Miao,
>>
>> 1. Currently, Kylin only returns the count in the bitmap, not the IDs in
>> it; it should be possible to extend. Could you please describe your
>> scenarios?
>> 2. Yes, the Cube API will return each segment of the cube, and each
>> segment has a start date and end date. Please check Kylin's REST API
>> document.
>>
>> 2017-12-05 18:31 GMT+08:00 崔苗 <[email protected]>:
>>
>>> 1. If there is a Bitmap stored in HBase, can we get the distinct user
>>> set if we need to know all the distinct users?
>>> 2. Is there any RESTful API that could get the cube's date_time,
>>> date_range_start and date_range_end?
>>>
>>>
>>> At 2017-11-30 18:30:27, ShaoFeng Shi <[email protected]> wrote:
>>>
>>> Hi Miao,
>>>
>>> Kylin uses HyperLogLog or Bitmap to persist the distinct values. You
>>> can get some info from this blog:
>>> https://kylin.apache.org/blog/2016/08/01/count-distinct-in-kylin/
>>>
>>> 2017-11-30 9:25 GMT+08:00 崔苗 <[email protected]>:
>>>
>>>> Hi,
>>>> we want to get count(distinct user) group by
>>>> hour/day/week/month/year. Now we have a problem:
>>>> what is the content of count(distinct user) that Kylin keeps, the
>>>> distinct user set or just a count number? If we want to count
>>>> (distinct user) by year, do we need to keep data for a year in Hive?
>>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi 史少锋
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋

--
Best regards,

Shaofeng Shi 史少锋
