Thanks for the tip James. I didn't know that syntax for doing the count on
a distinct value!
This version is able to end, the first one wasn't able to finish even
giving a huge amount of memory to HBase (the cardinality of SOMEFIELD is
very big indeed).

Thanks a lot,
Flavio

On Fri, Mar 16, 2018 at 5:45 PM, James Taylor <jamestay...@apache.org>
wrote:

> Hi Flavio,
> You'll need to add a secondary index to SOMEFIELD (or SOMEFIELD + VALID)
> to speed that up. You can write it more simply as SELECT COUNT(DISTINCT
> SOMEFIELD) FROM TEST.MYTABLE WHERE VALID AND SOMEFIELD IS NOT NULL.
>
> Otherwise, you'll end up doing a full table scan (and use a fair amount of
> memory depending on the cardinality of SOMEFIELD). The above with a
> secondary index would skip to the distinct values instead.
>
> Thanks,
> James
>
> On Fri, Mar 16, 2018 at 8:30 AM, Flavio Pompermaier <pomperma...@okkam.it>
> wrote:
>
>> Hi to all,
>> I'm running a query like this one on my Phoenix 4.13 (on CDH 5.11.2):
>>
>> SELECT COUNT(*) FROM (
>>   SELECT DISTINCT(SOMEFIELD)
>>   FROM TEST.MYTABLE
>>   WHERE VALID = TRUE AND SOMEFIELD IS NOT NULL
>> )
>>
>> Unfortunately the query timeouts (timeout is 10 min).... Any suggestion
>> about how to tune my installation?
>>
>> Best,
>> Flavi
>>
>

Reply via email to