Any insight here?
On Fri, Mar 16, 2018 at 7:23 PM, Flavio Pompermaier
wrote:
Thanks everybody for the help.
I'm just curious to understand why the first query didn't complete. The
query is quite complex but the available memory should be more than enough.
MYTABLE has 222,547,674 rows. On Parquet it takes 15 GB, while uncompressed
(in memory during the download of the data)
A less resource-intensive approach would be to use APPROX_COUNT_DISTINCT:
https://phoenix.apache.org/language/functions.html#approx_count_distinct
You would still need the secondary index, though, as James suggested, if you
want it to run fast.
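Why an approximate count can be so much cheaper on a high-cardinality column like SOMEFIELD: an exact COUNT(DISTINCT ...) has to keep track of every distinct value it sees, while an approximate estimator can work in fixed memory. Below is a minimal textbook HyperLogLog sketch in Python. It assumes APPROX_COUNT_DISTINCT uses a HyperLogLog-style estimator; this is an illustration, not Phoenix's code. The precision `p=14` and the SHA-1 hashing are arbitrary choices for the demo.

```python
import hashlib
import math


class HyperLogLog:
    """Fixed-memory distinct-count estimator (textbook HyperLogLog).

    Illustrative sketch only, not Phoenix's implementation.
    """

    def __init__(self, p: int = 14) -> None:
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers (16384 for p=14)
        self.registers = [0] * self.m   # memory is fixed, whatever the cardinality
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128

    def add(self, value) -> None:
        # 64-bit hash from the first 8 bytes of SHA-1 (values hashed via str())
        h = int.from_bytes(hashlib.sha1(str(value).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # rank = 1-based position of the leftmost 1-bit in `rest`
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> int:
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)


if __name__ == "__main__":
    hll = HyperLogLog()
    for i in range(200_000):
        hll.add(i % 100_000)  # 100,000 distinct values, each seen twice
    print(hll.count())        # an estimate close to 100000, not an exact count
```

The design trade-off is the one the reply hints at: the estimator keeps only 16,384 small counters no matter how many distinct values pass through, at the cost of a small relative error (around 1% for this precision).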
On Fri, Mar 16, 2018 at 10:26 AM, Flavio Pompermaier
wrote:
Thanks for the tip, James. I didn't know that syntax for counting distinct
values!
This version completes, while the first one couldn't finish even after
giving a huge amount of memory to HBase (the cardinality of SOMEFIELD is
very high indeed).
Thanks a lot,
Flavio
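The version that completes is the COUNT(DISTINCT ...) form from James's reply quoted below. A minimal sketch of what that query counts, using SQLite from Python's standard library as a stand-in for Phoenix. This is an assumption made only to have something runnable: SQLite's planner and index handling differ from Phoenix on HBase, so it says nothing about performance. The table layout, the index name `idx_somefield_valid` and the sample rows are hypothetical.

```python
import sqlite3


def count_distinct_valid(rows):
    """Count distinct non-NULL somefield values among rows where valid is true.

    `rows` is a list of (somefield, valid) tuples. Table, column and index
    names are hypothetical stand-ins for TEST.MYTABLE.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mytable ("
            "id INTEGER PRIMARY KEY, somefield TEXT, valid BOOLEAN)"
        )
        conn.executemany(
            "INSERT INTO mytable (somefield, valid) VALUES (?, ?)", rows
        )
        # Composite index on (somefield, valid), mirroring the
        # SOMEFIELD + VALID index suggested in the thread
        conn.execute(
            "CREATE INDEX idx_somefield_valid ON mytable (somefield, valid)"
        )
        (n,) = conn.execute(
            "SELECT COUNT(DISTINCT somefield) FROM mytable "
            "WHERE valid AND somefield IS NOT NULL"
        ).fetchone()
        return n
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("a", True), ("a", True), ("b", True), ("c", False), (None, True)]
    print(count_distinct_valid(sample))  # prints 2 ("a" and "b")
```

COUNT(DISTINCT ...) skips NULLs by SQL semantics, so the IS NOT NULL filter does not change the result here.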
On Fri, Mar 16, 2018
Hi Flavio,
You'll need to add a secondary index to SOMEFIELD (or SOMEFIELD + VALID) to
speed that up. You can write it more simply as SELECT COUNT(DISTINCT
SOMEFIELD) FROM TEST.MYTABLE WHERE VALID AND SOMEFIELD IS NOT NULL.
Otherwise, you'll end up doing a full table scan (and use a fair amount