A less resource intensive approach would be to use approx count distinct - https://phoenix.apache.org/language/functions.html#approx_count_distinct
You would still need the secondary index though, as James suggested, if you want it to run fast. On Fri, Mar 16, 2018 at 10:26 AM Flavio Pompermaier <pomperma...@okkam.it> wrote: > Thanks for the tip James. I didn't know that syntax for doing the count on > a distinct value! > This version is able to end, the first one wasn't able to finish even > giving a huge amount of memory to HBase (the cardinality of SOMEFIELD is > very big indeed). > > Thanks a lot, > Flavio > > On Fri, Mar 16, 2018 at 5:45 PM, James Taylor <jamestay...@apache.org> > wrote: > >> Hi Flavio, >> You'll need to add a secondary index to SOMEFIELD (or SOMEFIELD + VALID) >> to speed that up. You can write it more simply as SELECT COUNT(DISTINCT >> SOMEFIELD) FROM TEST.MYTABLE WHERE VALID AND SOMEFIELD IS NOT NULL. >> >> Otherwise, you'll end up doing a full table scan (and use a fair amount >> of memory depending on the cardinality of SOMEFIELD). The above with a >> secondary index would skip to the distinct values instead. >> >> Thanks, >> James >> >> On Fri, Mar 16, 2018 at 8:30 AM, Flavio Pompermaier <pomperma...@okkam.it >> > wrote: >> >>> Hi to all, >>> I'm running a query like this one on my Phoenix 4.13 (on CDH 5.11.2): >>> >>> SELECT COUNT(*) FROM ( >>> SELECT DISTINCT(SOMEFIELD) >>> FROM TEST.MYTABLE >>> WHERE VALID = TRUE AND SOMEFIELD IS NOT NULL >>> ) >>> >>> Unfortunately the query timeouts (timeout is 10 min).... Any suggestion >>> about how to tune my installation? >>> >>> Best, >>> Flavi >>> >> >