Generally speaking, we use a dictionary to encode non-integer values as 
integer dict ids, and then map each dict id to a bit in a bitmap for counting.
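
As a rough illustration of the idea (a minimal sketch, not Kylin's actual 
code: the function names are hypothetical, and a plain Python int stands in 
for a real compressed bitmap):

```python
# Illustrative sketch only; none of these names come from Kylin.

def encode(values, dictionary):
    """Give each distinct value the next free integer id; return the ids."""
    ids = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        ids.append(dictionary[value])
    return ids


def to_bitmap(ids):
    """Set bit i for every dict id i (a Python int plays the bitmap)."""
    bitmap = 0
    for dict_id in ids:
        bitmap |= 1 << dict_id
    return bitmap


def cardinality(bitmap):
    """Count the set bits, i.e. the number of distinct dict ids."""
    return bin(bitmap).count("1")


dictionary = {}
users = ["alice", "bob", "alice", "carol"]
bitmap = to_bitmap(encode(users, dictionary))
distinct_users = cardinality(bitmap)
```

The repeated "alice" maps to the same dict id, so it sets the same bit twice 
and is counted only once.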

In more detail, the original dictionary in Kylin is built at the segment 
level, which means the same value may get different dict ids in different 
segments, making the result wrong when counting values across segments. 
We’ve introduced GlobalDictionary to solve this problem. The Global Dict is 
at the cube level, so each value has one stable dict id no matter which 
segments, or how many, it appears in. The Global Dict is appendable, to 
support incremental cube building, and it is also splittable, with the parts 
held in an LRU cache to reduce memory cost, so it can support huge datasets, 
e.g. 500M. 
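
To make the cross-segment problem concrete, here is a small sketch under 
stated assumptions: the helper names and sample data are made up, a Python 
int stands in for the bitmap, and the splitting and LRU cache are left out. 
It only contrasts per-segment dictionaries with one shared, append-only 
dictionary.

```python
# Illustrative sketch only; not Kylin's GlobalDictionary implementation.

def build_bitmap(values, dictionary):
    """Encode values with the given dictionary (appending new ids) and
    return a bitmap with one bit set per dict id."""
    bitmap = 0
    for value in values:
        dict_id = dictionary.setdefault(value, len(dictionary))
        bitmap |= 1 << dict_id
    return bitmap


def cardinality(bitmap):
    return bin(bitmap).count("1")


segment_1 = ["alice", "bob"]
segment_2 = ["carol", "alice"]

# Segment-level dictionaries: each segment assigns ids independently,
# so "carol" in segment 2 gets the id that "alice" has in segment 1.
per_segment = build_bitmap(segment_1, {}) | build_bitmap(segment_2, {})
wrong_count = cardinality(per_segment)

# One shared dictionary: ids are stable across segments, and the second
# segment only appends ids for values not seen before.
global_dict = {}
merged = build_bitmap(segment_1, global_dict) | build_bitmap(segment_2, global_dict)
right_count = cardinality(merged)
```

With separate dictionaries the merged bitmap reports 2 distinct values; with 
the shared dictionary it reports the correct 3.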

The code has been merged into the master branch and will be released in 
v1.5.3; you can check it out. 

Any comments or discussion are welcome. 

Thanks.

> On July 18, 2016, at 15:41, big data <[email protected]> wrote:
> 
> I heard that Kylin supports non-integer fields by using a bitmap index.
> 
> I just want to know how Kylin indexes a string field and maps each 
> item to the bitmap.
> 
> Thanks.
