Re: count distinct

Li Yang Thu, 04 Aug 2016 00:54:46 -0700

1) Kylin supports precise count distinct, though it has some limitations
compared to the approximate count distinct. See:
https://issues.apache.org/jira/browse/KYLIN-1186

2) Wide data sets can be supported, but they have to be handled carefully.
Because pre-calculating every possible combination of all 700 dimensions is
not feasible, in-depth cube tuning is mandatory. That requires a very good
understanding of your query patterns.
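
To put rough numbers on this: a full cube over N dimensions has 2^N
dimension combinations (cuboids). The Python sketch below compares that with
a hypothetical layout in which the dimensions are split into independent
groups and only combinations inside a group are pre-calculated. The group
size of 10 and the function names are assumptions for illustration; they are
not Kylin settings or Kylin's actual planner.

```python
def full_cube_cuboids(num_dimensions: int) -> int:
    """Cuboids in a full cube: every subset of the dimensions."""
    return 2 ** num_dimensions


def grouped_cuboids(group_sizes) -> int:
    """Cuboids when each (hypothetical) group gets its own full cube."""
    return sum(2 ** size for size in group_sizes)


full = full_cube_cuboids(700)          # all 700 dimensions combined freely
grouped = grouped_cuboids([10] * 70)   # assumed: 70 groups of 10 dimensions

print(len(str(full)))  # number of decimal digits in 2^700
print(grouped)         # 71680
```

The grouped layout can only serve queries whose dimensions all fall inside
one group, which is why the grouping has to follow the real query patterns.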

3) Kylin pre-calculates its results, so it is NOT possible to switch between
precise and approximate count distinct at query time.
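
One way to picture this: a query only merges whatever state each
pre-calculated cell stored at build time. In the sketch below a Python set
stands in for that state (an assumption for illustration; it is not Kylin's
storage format, and the function names are hypothetical). An approximate
measure would store a lossy sketch in the same place, so the kind of answer a
query can return is fixed when the cube is built.

```python
from collections import defaultdict


def build_cells(rows):
    """Build step: store the set of user ids seen for each dimension value."""
    cells = defaultdict(set)
    for dim_value, user_id in rows:
        cells[dim_value].add(user_id)
    return dict(cells)


def query_distinct(cells, dim_values):
    """Query step: merge the stored per-cell state; raw rows are not read."""
    merged = set()
    for value in dim_values:
        merged |= cells.get(value, set())
    return len(merged)


rows = [("US", "a"), ("US", "b"), ("DE", "b"), ("DE", "c")]
cells = build_cells(rows)

exact = query_distinct(cells, ["US", "DE"])      # 3: a, b, c
naive = sum(len(ids) for ids in cells.values())  # 4: "b" is counted twice
print(exact, naive)
```

Summing per-cell counts over-counts users that appear in several cells, so a
distinct count has to be answered from the merged stored state.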

On Tue, Aug 2, 2016 at 12:41 PM, Ruslan Dautkhanov <[email protected]>
wrote:

> Any information on this topic will be highly appreciated.
>
> Thanks!
>
>
>
>
> --
> Ruslan Dautkhanov
>
> On Wed, Jul 27, 2016 at 4:04 PM, Ruslan Dautkhanov <[email protected]>
> wrote:
>
>> Hello,
>>
>> 1)
>> How efficient is Kylin in materializing count distinct in its cubes?
>> We're more interested in exact count distinct.
>>
>> 2) How efficient is Kylin for wide datasets? We have around 700
>> dimensions.
>> Size of dataset: tens of billions of records.
>> Is it feasible to run such a workload on, for example, a 10-node Hadoop
>> cluster?
>>
>> 3) (This is a less critical question than the first two.)
>> Does Kylin have a session-level setting to switch between approx and
>> exact count distinct?
>> For example, Impala has a session-level setting, APPX_COUNT_DISTINCT,
>> so without changing application queries, users can switch depending on
>> whether they're interested in approx or exact counts.
>>
>>
>> Thank you,
>> Ruslan Dautkhanov
>>
>
>
