Hi Prashant,

The query is valid, and there is no limit on the order in which elements are merged. Could you please open a JIRA <https://issues.apache.org/jira/secure/Dashboard.jspa> for Kylin that includes the Kylin version you are using? Thanks for reporting this!
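Why merge order should not matter: in a HyperLogLog sketch, merging two counters takes the register-wise maximum, and max is commutative and associative. The minimal Python sketch below illustrates this. It is not Kylin's HyperLogLogPlusCounter. It assumes a SHA-1-based 64-bit hash and uses only the raw HyperLogLog estimator with no bias correction. The precision p = 12 is assumed to correspond to hllc12, and the `HLL` and `union` names are hypothetical.

```python
import hashlib


class HLL:
    """Minimal HyperLogLog sketch for illustration; NOT Kylin's HyperLogLogPlusCounter."""

    def __init__(self, p: int = 12) -> None:
        self.p = p                    # precision; p = 12 is assumed to match "hllc12"
        self.m = 1 << p               # number of registers (4096 for p = 12)
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # 64-bit hash taken from SHA-1 (an assumption; Kylin uses its own hash)
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)                       # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # 1-based position of leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HLL") -> None:
        # Union = register-wise max. Max is commutative and associative,
        # so the order in which sketches are merged cannot change the result.
        if other.p != self.p:
            raise ValueError("cannot merge sketches with different precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        # Raw HyperLogLog estimator only (no small-range or bias correction);
        # reasonable for cardinalities well above 2.5 * m.
        alpha = 0.7213 / (1 + 1.079 / self.m)
        return alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)


def union(sketches: list) -> HLL:
    """Merge the sketches, in the given order, into a fresh counter."""
    out = HLL(sketches[0].p)
    for s in sketches:
        out.merge(s)
    return out


if __name__ == "__main__":
    # Three overlapping "days" of hypothetical user ids (35,000 distinct in total)
    days = [HLL(), HLL(), HLL()]
    for i in range(0, 30_000):
        days[0].add(f"user{i}")
    for i in range(10_000, 35_000):
        days[1].add(f"user{i}")
    for i in range(20_000, 32_000):
        days[2].add(f"user{i}")

    forward = union(days)
    backward = union(days[::-1])
    print("per day:", [round(d.estimate()) for d in days])
    print("union (forward): ", round(forward.estimate()))
    print("union (backward):", round(backward.estimate()))
```

Every register of the merged sketch is at least as large as the matching register of each input. Under the raw estimator, the union estimate therefore can never fall below any single input's estimate. Under that model, a three-day result of 67,980,576 that is smaller than the 93,728,324 reported for 2016-01-07 cannot be explained by merge order. That is consistent with treating it as a bug worth a JIRA.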
2016-01-12 3:49 GMT+08:00 Prashant Prakash <[email protected]>:
> Hi,
>
> I am experiencing a strange issue with a count(distinct) query in Kylin. We
> are using hllc12 to estimate uniques for a measure in a table
> partitioned by date.
> The unique estimates for the individual dates 2016-01-07, 2016-01-08 and
> 2016-01-09 are 93,728,324, 90,982,364 and 45,485,278 respectively.
> But the unique count across days, which is calculated through the
> HyperLogLogPlusCounter.merge operation, gives 67,980,576.
>
> 1. Is a distinct count across days a valid usage of Kylin?
>
> Sample query:
> SELECT COUNT(DISTINCT f.userid) AS m1 FROM kylin.fact_publishers_uniques f
> WHERE
> dt in ('2016-01-09', '2016-01-08', '2016-01-07')
>
> Theoretically, the number of uniques across days should be at least the
> maximum of the per-day uniques, so the final number does not seem
> correct.
> To debug the issue we also calculated uniques across 2016-01-07 and
> 2016-01-08, which come to 164,637,916. It is only when we merge in the data
> for 2016-01-09 that we get a spurious value.
>
> 2. Is there any limit on the relative order of the elements being merged?
>
> Regards,
> Prashant

--
Best regards,

Shaofeng Shi
