We have 30+ million event log rows per day, and the combined cardinality of the 6 dimensions is almost 150,000+. We used the 4.86% precision HLL measure, and queried page by page.
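To see why a high-precision HLL measure is memory-hungry, here is a minimal sketch. It assumes the standard HyperLogLog relative-error formula (error ≈ 1.04/√m for m registers, about one byte each) and that the ~4.88% precision quoted in this thread is a 3-sigma bound; the 150k-cell estimate is illustrative only, taken from the cardinality figure above.

```python
import math

def hll_registers_for_error(target_error, sigma=3):
    # Standard HLL relative error ~= 1.04 / sqrt(m); round the precision
    # p up so m = 2**p registers meet target_error at `sigma` std devs.
    # (Assumption: the ~4.88% figure in this thread is such a 3-sigma bound.)
    p = math.ceil(math.log2((sigma * 1.04 / target_error) ** 2))
    return p, 2 ** p

p, m = hll_registers_for_error(0.0488)
print(p, m)  # 12 4096 -> roughly 4 KiB of registers per distinct-count cell
# Illustrative only: ~150k result cells x 2 UV measures x ~4 KiB each is
# already on the order of a gigabyte of raw registers, before any Java
# object overhead or intermediate aggregation buffers.
print(150_000 * 2 * m / 2**30, "GiB")
```

This is why a SUM or COUNT over the same cube stays cheap (8 bytes per cell) while a paged COUNT DISTINCT over many cells balloons in memory.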
hongbin ma <[email protected]> wrote on Thu, Aug 4, 2016 at 11:39 PM:

> After you run such a query, check KYLIN_HOME/logs/kylin.log; there should
> be a snippet like:
>
> 2016-08-04 00:48:31,990 INFO [http-bio-7070-exec-7]
> service.QueryService:399 : Scan count for each storageContext: 12306477,
> 2016-08-04 00:48:31,991 INFO [http-bio-7070-exec-7]
> controller.QueryController:197 : Stats of SQL response: isException: false,
> duration: 56152, total scan count 12306477
> 2016-08-04 00:48:32,000 WARN [http-bio-7070-exec-7]
>
> Can you let us know the "Scan count for each storageContext" and the size
> of your query result?
>
> On Thu, Aug 4, 2016 at 2:21 PM, Li Yang <[email protected]> wrote:
>
>> Depending on how many rows and how many count distinct values are
>> returned, the query may take a lot of memory and become slow.
>>
>> By saying you are querying the UV of a month of data, how many rows do
>> you expect? Also, what's the precision of the HLL measure? Lowering the
>> precision can ease the problem too.
>>
>> On Fri, Jul 29, 2016 at 4:54 PM, 张天生 <[email protected]> wrote:
>>
>>> I'm using Kylin 1.5.2.1. I built a cube for a month's event data of
>>> advertisement impressions/clicks/conversions. It consists of 6 dimensions
>>> and 8 measures, including 2 UV measures computed by COUNT DISTINCT. The
>>> cube size is 2 GB. When I queried the UV measures over a month of data,
>>> memory quickly grew to 30 GB+, and the query was also slow. I don't know
>>> why it occupied so much memory when the cube size is only 2 GB; the
>>> in-memory data expanded that much. However, when I executed a similar
>>> simple SUM or COUNT query, it was fast and did not use much memory.
>>
>>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
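The diagnosis step suggested above (pulling "Scan count" and the response stats out of KYLIN_HOME/logs/kylin.log) can be sketched as a small parser. The log format is copied from the snippet quoted in this thread; adjust the patterns if your Kylin version logs differently.

```python
import re

# Patterns taken from the kylin.log lines quoted in this thread.
SCAN_RE = re.compile(r"Scan count for each storageContext: (\d+)")
STATS_RE = re.compile(r"duration: (\d+), total scan count (\d+)")

def summarize(log_lines):
    """Return (per-context scan counts, (duration_ms, total_scan) pairs)."""
    scans, stats = [], []
    for line in log_lines:
        if (m := SCAN_RE.search(line)):
            scans.append(int(m.group(1)))
        if (m := STATS_RE.search(line)):
            stats.append((int(m.group(1)), int(m.group(2))))
    return scans, stats

sample = [
    "2016-08-04 00:48:31,990 INFO [http-bio-7070-exec-7] "
    "service.QueryService:399 : Scan count for each storageContext: 12306477,",
    "2016-08-04 00:48:31,991 INFO [http-bio-7070-exec-7] "
    "controller.QueryController:197 : Stats of SQL response: isException: "
    "false, duration: 56152, total scan count 12306477",
]
print(summarize(sample))  # ([12306477], [(56152, 12306477)])
```

A scan count in the tens of millions against a small result set, as here, points at the aggregation/deserialization work (not the cube size on disk) as the memory consumer.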
