Hi Constantin

Before I explain the slowness, let me answer your second question.

Phoenix runs on top of HBase, which is a distributed NoSQL database. The data 
lives in logical entities called regions, and those regions are spread across 
different nodes (region servers). A table does not sit in one location where a 
running count of inserted rows could be kept up to date.

This means that count(*) has to aggregate the counts from every region, 
distributed across the region servers. In other words, a table is not a single 
entity; it is a collection of regions.
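
To make the idea concrete, here is a rough sketch in plain Python. It is not 
Phoenix or HBase code: the region names, the per-region row counts and the 
function names are all made up for illustration (the three counts are simply 
chosen so that they add up to the total from your query).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: region name -> number of rows held by that region.
# These per-region numbers are invented for the example.
regions = {
    "region-1": 2000000,
    "region-2": 1900000,
    "region-3": 1886227,
}

def count_region(name):
    """Stand-in for scanning one region on its region server."""
    return regions[name]

def count_table():
    # Nothing stores the table-wide total, so every region is asked
    # for its own count and the partial results are added up.
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(count_region, regions))

print(count_table())
```

The point is only that the total comes from a sum over all regions; there is no 
stored counter that could be read directly.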

Coming to the slowness of your query: the statistics collected by update 
statistics allow a query to be split into logical chunks within a single 
region and run in parallel. Suppose a region holds 100K rows; the statistics 
would let a query run in parallel on, say, 10 equal chunks of 10,000 rows 
within that region.
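
The chunking in that example can be sketched as follows. Again this is plain 
Python rather than anything from Phoenix, and the function name is 
hypothetical. The chunk size is expressed in rows only to match the example 
above; that is an assumption of the sketch, not a statement about how Phoenix 
stores its statistics.

```python
def split_into_chunks(row_count, chunk_rows):
    """Return (start, end) row ranges, end exclusive, each covering
    at most chunk_rows rows of a region holding row_count rows."""
    return [
        (start, min(start + chunk_rows, row_count))
        for start in range(0, row_count, chunk_rows)
    ]

# The example from this mail: one region with 100K rows, chunks of 10,000.
chunks = split_into_chunks(100_000, 10_000)
for start, end in chunks:
    print(start, end)
```

Each range could then be scanned by a separate thread, so a smaller chunk size 
means more, smaller scans for the same region.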

Have you modified any of the statistics-related parameters, such as 
‘phoenix.stats.guidepost.width’?


Regards
Ram
From: Ciureanu, Constantin (GfK) [mailto:constantin.ciure...@gfk.com]
Sent: Wednesday, February 11, 2015 2:51 PM
To: user@phoenix.apache.org
Subject: Update statistics made query 2-3x slower

Hello all,


1. Is there a good explanation why updating the statistics:
update statistics tableX;

made this query 2x slower? (It was 27 seconds before; now it’s somewhere 
between 60 and 90 seconds.)
select count(*) from tableX;
+------------------------------------------+
|                 COUNT(1)                 |
+------------------------------------------+
| 5786227                                  |
+------------------------------------------+
1 row selected (62.718 seconds)

(If possible ☺ ) how can I “drop” those statistics?

2. Why is there nothing (like a counter / attribute for the table) to obtain 
the number of rows in one table fast?

Thank you,
   Constantin
