1) Kylin supports precise count distinct, with slight limitations compared to the approximate count distinct. https://issues.apache.org/jira/browse/KYLIN-1186
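To see why a pre-computed exact count distinct needs a special measure, consider that distinct counts of two cube cells cannot simply be added; the cells have to be merged. The sketch below is only an illustration of the general bitmap idea and is not Kylin's implementation. It assumes values are first mapped to dense integer ids (the hypothetical `encode` helper plays the role of a dictionary) and uses a Python int as the bitmap.

```python
def encode(values, dictionary):
    """Map each raw value to a dense integer id (hypothetical stand-in for a dictionary)."""
    ids = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
        ids.append(dictionary[v])
    return ids


def to_bitmap(ids):
    """Build a bitmap as a Python int: bit i is set when id i occurs."""
    bitmap = 0
    for i in ids:
        bitmap |= 1 << i
    return bitmap


def exact_distinct(bitmap):
    """Exact distinct count = number of set bits."""
    return bin(bitmap).count("1")


dictionary = {}
# Two pre-aggregated cells, e.g. distinct users per day (made-up data).
day1 = to_bitmap(encode(["alice", "bob", "carol"], dictionary))
day2 = to_bitmap(encode(["bob", "carol", "dave"], dictionary))

naive_sum = exact_distinct(day1) + exact_distinct(day2)  # 6: bob and carol counted twice
merged = exact_distinct(day1 | day2)                      # 4: bitwise OR merges the cells exactly
```

An approximate count distinct can keep a small fixed-size sketch such as HyperLogLog instead of the bitmap, trading exactness for storage; either way, the choice is baked into what gets stored at build time.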
2) Wide data set support is possible, but it has to be handled carefully. Pre-calculating all 700 dimensions in every possible combination is not feasible (a full cube over N dimensions has 2^N cuboids), so in-depth cube tuning is mandatory. That requires a very good understanding of your query patterns.

3) Kylin does pre-calculation, so it is NOT possible to switch between precise and approximate count distinct at query time.

On Tue, Aug 2, 2016 at 12:41 PM, Ruslan Dautkhanov <[email protected]> wrote:
> Any information on this topic will be highly appreciated.
>
> Thanks!
>
> --
> Ruslan Dautkhanov
>
> On Wed, Jul 27, 2016 at 4:04 PM, Ruslan Dautkhanov <[email protected]>
> wrote:
>
>> Hello,
>>
>> 1) How efficient is Kylin in materializing count distinct in its cubes?
>> We're more interested in exact count distinct.
>>
>> 2) How efficient is Kylin for wide datasets? We have around 700
>> dimensions.
>> Size of dataset: tens of billions of records.
>> Is it feasible to run such a workload on, for example, a 10-node Hadoop
>> cluster?
>>
>> 3) (This is a less critical question than the first two.)
>> Does Kylin have a session-level setting to switch between approximate and exact
>> count distinct?
>> Like Impala's session-level setting APPX_COUNT_DISTINCT,
>> so that without changing application queries, users can switch depending on whether they're
>> interested in approximate or exact counts?
>>
>> Thank you,
>> Ruslan Dautkhanov
