Hi Prashant, the query is valid, and there is no limit on the order of the
elements being merged. Could you please open a JIRA
<https://issues.apache.org/jira/secure/Dashboard.jspa> for Kylin, including
the version number? Thanks for reporting this!

2016-01-12 3:49 GMT+08:00 Prashant Prakash <[email protected]>:

> Hi,
>
> I am experiencing a strange issue with a count(distinct) query in Kylin. We
> are using hllc12 to estimate uniques for a measure in a table
> partitioned by date.
> The uniques estimate for individual dates 2016-01-07, 2016-01-08,
> 2016-01-09 are 93,728,324, 90,982,364, 45,485,278 respectively.
> But the uniques across days, which are calculated through the
> HyperLogLogPlusCounter.merge operation, give a value of 67,980,576.
>
> 1. Is a distinct count across days a valid use of Kylin?
>
> Sample query:
> SELECT COUNT(DISTINCT f.userid) AS m1
> FROM kylin.fact_publishers_uniques f
> WHERE dt IN ('2016-01-09', '2016-01-08', '2016-01-07')
>
> Theoretically, the number of uniques across days should be at least the
> maximum of the per-day uniques, so the final number does not seem
> correct.
> To debug the issue, we also calculated uniques across 2016-01-07 and
> 2016-01-08. That comes to 164,637,916. It's only when we merge in the data
> for 2016-01-09 that we get a spurious value.
>
> 2. Is there any limit on the relative order of the elements being merged?
>
> Regards,
> Prashant
>
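As a side note on your lower-bound argument: a minimal sketch, assuming a
textbook HyperLogLog with 2^12 registers (my assumption for what "hllc12"
means) and NOT Kylin's actual HyperLogLogPlusCounter code. Merging two
sketches takes the register-wise maximum, so every register of the union is
at least as large as in either input, and the raw estimate can only go up.
A merged result below a single day's estimate therefore points to a bug
rather than to merge order. The hash, helper names, and test data below are
hypothetical, and small/large-range corrections are omitted.

```python
import hashlib

P = 12          # precision (assumed): 2**12 = 4096 registers
M = 1 << P


def _hash64(value: str) -> int:
    # Illustrative 64-bit hash: first 8 bytes of SHA-1
    return int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")


def new_counter() -> list[int]:
    return [0] * M


def add(registers: list[int], value: str) -> None:
    h = _hash64(value)
    idx = h >> (64 - P)                  # top P bits select the register
    rest = h & ((1 << (64 - P)) - 1)     # remaining 64 - P bits
    # rank = 1-based position of the leftmost 1-bit in the remaining bits
    rank = (64 - P) - rest.bit_length() + 1
    registers[idx] = max(registers[idx], rank)


def merge(a: list[int], b: list[int]) -> list[int]:
    # Union of two sketches: register-wise maximum (order does not matter)
    return [max(x, y) for x, y in zip(a, b)]


def estimate(registers: list[int]) -> float:
    # Raw HLL estimator only; small/large-range corrections omitted
    alpha = 0.7213 / (1 + 1.079 / M)
    return alpha * M * M / sum(2.0 ** -r for r in registers)


# Hypothetical data: two "days" with overlapping user ids
day1, day2 = new_counter(), new_counter()
for i in range(50_000):
    add(day1, f"user{i}")
for i in range(30_000, 80_000):
    add(day2, f"user{i}")

union = merge(day1, day2)
# The union estimate is never below either day's estimate
print(round(estimate(day1)), round(estimate(day2)), round(estimate(union)))
```

Because max is commutative and associative, merging the days in any order
yields the same registers, which is consistent with there being no limit on
merge order.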



-- 
Best regards,

Shaofeng Shi
