Thanks for the tip James. I didn't know that syntax for doing the count on a distinct value! This version is able to end, the first one wasn't able to finish even giving a huge amount of memory to HBase (the cardinality of SOMEFIELD is very big indeed).
Thanks a lot, Flavio On Fri, Mar 16, 2018 at 5:45 PM, James Taylor <jamestay...@apache.org> wrote: > Hi Flavio, > You'll need to add a secondary index to SOMEFIELD (or SOMEFIELD + VALID) > to speed that up. You can write it more simply as SELECT COUNT(DISTINCT > SOMEFIELD) FROM TEST.MYTABLE WHERE VALID AND SOMEFIELD IS NOT NULL. > > Otherwise, you'll end up doing a full table scan (and use a fair amount of > memory depending on the cardinality of SOMEFIELD). The above with a > secondary index would skip to the distinct values instead. > > Thanks, > James > > On Fri, Mar 16, 2018 at 8:30 AM, Flavio Pompermaier <pomperma...@okkam.it> > wrote: > >> Hi to all, >> I'm running a query like this one on my Phoenix 4.13 (on CDH 5.11.2): >> >> SELECT COUNT(*) FROM ( >> SELECT DISTINCT(SOMEFIELD) >> FROM TEST.MYTABLE >> WHERE VALID = TRUE AND SOMEFIELD IS NOT NULL >> ) >> >> Unfortunately the query timeouts (timeout is 10 min).... Any suggestion >> about how to tune my installation? >> >> Best, >> Flavi >> >