Hello, I am trying to use Kylin to perform distinct count over a running window. I tried two different methods, but Kylin failed to execute both. I can reproduce the problem using following queries.
======== Query 1 ========
SELECT
t.part_dt,
( SELECT
count( distinct buyer_id )
FROM
KYLIN_SALES
WHERE part_dt BETWEEN TIMESTAMPADD ( day, -3, t.part_dt ) AND t.part_dt
) as total
FROM ( values
( date '2012-01-01' ),
( date '2012-01-02' ),
( date '2012-01-03' ),
( date '2012-01-04' ),
( date '2012-01-05' ),
( date '2012-01-06' )
) as t( part_dt )
======== Query 2 ========
SELECT
t.part_dt ,
count(distinct kylin_sales.buyer_id) over (
partition by t.part_dt
order by t.part_dt
range interval '3' day preceding )
as buyers
FROM
( values
( date '2012-02-01' ),
( date '2012-02-02' ),
( date '2012-02-03' ),
( date '2012-02-04' ),
( date '2012-02-05' ),
( date '2012-02-06' )
) as t( part_dt )
LEFT OUTER JOIN KYLIN_SALES
ON t.part_dt = KYLIN_SALES.PART_DT
I have added following model and cubes as follows:
Model:
Dimensions: KYLIN_SALES: {"PART_DT", "SELLER_ID"}
Measures: KYLIN_SALES.BUYER_ID
Cube:
Dimensions: PART_DT, SELLER_ID
Measures: COUNT_DISTINCT(exact): Value:KYLIN_SALES.BUYER_ID
Rowkeys: SELLER_ID (dict), PART_DT (date)
This behavior occurred on both CDH-5.8.2 (using package for CDH 5.7)
and FusionInsight V100R002C80SPC200 (using binary package for HBase 1.x).
Currently I don't have a sandbox do reproduce this behavior.
The equivalent queries of the first query on Postgres and MySQL can be found
at following links. Because of the different dialects supported by the RDBMS
engines, the queries are modified correspondingly from the Kylin dialect.
http://sqlfiddle.com/#!17/1ded8/2 and http://sqlfiddle.com/#!9/4f4afe/8
Second query does not have equivalent ones in Postgres or MySQL since both
do not support distinct in window function.
Corresponding logs during execution of the queries are attected.
Is this behavior expected? If so, is there a workaround for this case?
Best regards.
Lifan Su
kylin.errlog
Description: Binary data
kylin.errlog2
Description: Binary data
