Re: Slow query help

2018-03-19 Thread Flavio Pompermaier
Any insight here?

On Fri, Mar 16, 2018 at 7:23 PM, Flavio Pompermaier wrote:
> Thanks everybody for the help.
> I'm just curious to understand why the first query didn't complete. The
> query is quite complex but the available memory should be more than enough.

Re: Slow query help

2018-03-16 Thread Flavio Pompermaier
Thanks everybody for the help. I'm just curious to understand why the first query didn't complete. The query is quite complex but the available memory should be more than enough. MYTABLE has 222,547,674 rows. In Parquet it takes 15 GB, while uncompressed (in memory during the download of the data)

Re: Slow query help

2018-03-16 Thread Samarth Jain
A less resource-intensive approach would be to use APPROX_COUNT_DISTINCT: https://phoenix.apache.org/language/functions.html#approx_count_distinct You would still need the secondary index though, as James suggested, if you want it to run fast. On Fri, Mar 16, 2018 at 10:26 AM Flavio Pompermaier
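
For illustration, the approximate variant of the query James posted might look like the sketch below. It assumes the same table and columns (TEST.MYTABLE, SOMEFIELD, VALID) and a Phoenix version that ships APPROX_COUNT_DISTINCT; the result is an estimate, not an exact count.

```sql
-- Sketch only: approximate distinct count with the same filter as the exact query.
-- Assumes VALID is a BOOLEAN column, as the WHERE clause in the thread implies.
SELECT APPROX_COUNT_DISTINCT(SOMEFIELD)
FROM TEST.MYTABLE
WHERE VALID AND SOMEFIELD IS NOT NULL;
```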

Re: Slow query help

2018-03-16 Thread Flavio Pompermaier
Thanks for the tip, James. I didn't know that syntax for counting distinct values! This version completes; the first one could not finish even when I gave HBase a huge amount of memory (the cardinality of SOMEFIELD is indeed very high). Thanks a lot, Flavio On Fri, Mar 16, 2018

Re: Slow query help

2018-03-16 Thread James Taylor
Hi Flavio, You'll need to add a secondary index on SOMEFIELD (or SOMEFIELD + VALID) to speed that up. You can write it more simply as SELECT COUNT(DISTINCT SOMEFIELD) FROM TEST.MYTABLE WHERE VALID AND SOMEFIELD IS NOT NULL. Otherwise, you'll end up doing a full table scan (and use a fair amount
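
As a rough sketch of the suggestion above, the index and the simplified query could be written as follows. The index name MYTABLE_SOMEFIELD_IDX is hypothetical, and the choice of the two-column variant (SOMEFIELD + VALID) is an assumption; an index on SOMEFIELD alone is the other option mentioned.

```sql
-- Hypothetical index name; covers both columns referenced by the query
-- so that Phoenix can answer it from the index instead of the data table.
CREATE INDEX MYTABLE_SOMEFIELD_IDX ON TEST.MYTABLE (SOMEFIELD, VALID);

-- The simplified exact count from the message above.
SELECT COUNT(DISTINCT SOMEFIELD)
FROM TEST.MYTABLE
WHERE VALID AND SOMEFIELD IS NOT NULL;
```

On a table of this size, building the index synchronously may take a long time; Phoenix also supports populating an index asynchronously (the ASYNC keyword plus the IndexTool job), which may be worth considering.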