The original JIRA for the global dictionary is https://issues.apache.org/jira/browse/KYLIN-1705 and it is now pending on the GUI part: https://issues.apache.org/jira/browse/KYLIN-1904
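
To make the cross-segment problem from Yerui's quoted message below concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not Kylin code: `build_segment_dict`, `AppendOnlyGlobalDict`, `to_bitmap` and `count_distinct` are hypothetical names, a plain integer stands in for the bitmap, and the sample values are made up.

```python
# Hypothetical illustration only; none of these names are Kylin APIs.

def build_segment_dict(values):
    """Assign ids 0..n-1 to the distinct values of ONE segment (sorted order)."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


class AppendOnlyGlobalDict:
    """Toy cube-level dictionary: once a value gets an id, the id never changes."""

    def __init__(self):
        self._ids = {}

    def encode(self, value):
        # Append a new id only for values not seen in any earlier segment.
        if value not in self._ids:
            self._ids[value] = len(self._ids)
        return self._ids[value]


def to_bitmap(ids):
    """Use a Python int as a bitmap: bit i is set when id i is present."""
    bitmap = 0
    for i in ids:
        bitmap |= 1 << i
    return bitmap


def count_distinct(bitmaps):
    """OR the per-segment bitmaps together and count the set bits."""
    merged = 0
    for b in bitmaps:
        merged |= b
    return bin(merged).count("1")


segment_1 = ["alice", "bob"]
segment_2 = ["bob", "carol"]  # 3 distinct values across both segments

# Segment-level dictionaries: "bob" is id 1 in segment 1 but id 0 in segment 2,
# so the merged bitmap confuses different values.
seg_bitmaps = []
for seg in (segment_1, segment_2):
    d = build_segment_dict(seg)
    seg_bitmaps.append(to_bitmap(d[v] for v in seg))
segment_level_count = count_distinct(seg_bitmaps)

# Global dictionary: every value keeps one stable id across segments.
gd = AppendOnlyGlobalDict()
global_bitmaps = [to_bitmap(gd.encode(v) for v in seg)
                  for seg in (segment_1, segment_2)]
global_count = count_distinct(global_bitmaps)

print(segment_level_count, global_count)  # prints "2 3"
```

With per-segment ids the merged count comes out as 2 instead of 3, which is the wrong result described in the quoted message; with stable ids the OR of the bitmaps gives the right answer.
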

On Tue, Jul 19, 2016 at 2:01 PM, big data <[email protected]> wrote:

> Thank you, Sun. I'm still downloading the code, so I first browsed the
> articles about the Kylin dictionary. I still have some open questions:
>
> 1. This article
> (http://kylin.apache.org/blog/2015/08/13/kylin-dictionary/)
> describes the Trie structure for the dictionary, but I didn't catch how
> the Seq No. is generated in the Trie example. How does the dictionary
> generate the seq no. for each incoming string?
>
> 2. If the string field is a user id or device id with millions (or even
> billions) of UUIDs, the Trie will have a fixed height (the length of a
> UUID, such as 32 bytes), so the dictionary will be huge. Does Kylin
> still calculate the exact cardinality, or an approximate value? And
> how can Kylin keep query performance with such a huge dictionary?
>
> Thanks.
>
> On 16/7/19 at 11:01 AM, Yerui Sun wrote:
> > Generally speaking, we use a dictionary to encode non-integer values,
> > and map the dict id into a bitmap to count.
> >
> > In more detail, the original dictionary in Kylin is at segment level,
> > which means that the same value may have different dict ids in
> > different segments, making the result wrong when counting values
> > across segments.
> >
> > We've introduced GlobalDictionary to solve this problem. The Global
> > Dict is at cube level, making sure each value has one stable dict id,
> > no matter which or how many segments the value shows up in. The Global
> > Dict is appendable, to support incremental cube building, and it's
> > also splittable with an LRU cache, to reduce the memory cost when
> > supporting huge datasets, such as 500M etc.
> >
> > The code has been merged into the master branch and will be released
> > in v1.5.3; you can check it out.
> >
> > Any comment or discussion is welcome.
> >
> > Thanks.
> >
> >> On July 18, 2016, at 15:41, big data <[email protected]> wrote:
> >>
> >> I heard that Kylin supports non-integer fields by using a bitmap index.
> >>
> >> I just want to know how Kylin indexes a string field and maps each
> >> item to a bitmap.
> >>
> >> Thanks.

--
Regards,
*Bin Mahone | 马洪宾*
