Yes, I agree with Mujtaba here. We can dig in further to find out why your queries are getting slower. My guess is that the query is being executed as a larger number of smaller chunks than is optimal, possibly because of the data size/row width.
Regards
Ram

From: Mujtaba Chohan [mailto:mujt...@apache.org]
Sent: Friday, February 13, 2015 1:50 AM
To: user@phoenix.apache.org
Subject: Re: Update statistics made query 2-3x slower

Constantin, if possible can you please share your schema, the approximate row/column width, the number of region servers in your cluster plus their heap size, your HBase/Phoenix versions and any default property overrides, so we can identify why stats are slowing things down in your case.

Thanks,
Mujtaba

On Thu, Feb 12, 2015 at 12:56 AM, Ciureanu, Constantin (GfK) <constantin.ciure...@gfk.com> wrote:

It worked! Without stats it's again faster (2-3x), but I do understand that all other normal queries might benefit from the stats.

Thank you Mujtaba for the info, and thank you Vasudevan for the explanations. I have already used HBase, and I agree it's hard to keep a counter of the table rows (especially if the tombstones for deleted rows are still there, i.e. not compacted yet).

Constantin

From: Mujtaba Chohan [mailto:mujt...@apache.org]
Sent: Wednesday, February 11, 2015 8:54 PM
To: user@phoenix.apache.org
Subject: Re: Update statistics made query 2-3x slower

To compare performance without stats, try deleting the related rows from SYSTEM.STATS, or, more easily, just truncate the SYSTEM.STATS table from the HBase shell and restart your region servers.

//mujtaba

On Wed, Feb 11, 2015 at 10:29 AM, Vasudevan, Ramkrishna S <ramkrishna.s.vasude...@intel.com> wrote:

Hi Constantin

Before I explain the slowness, let me answer your 2nd question. Phoenix sits on top of HBase, and HBase is a distributed NoSQL DB. The data resides inside logical entities called regions, which are spread across different nodes (region servers). There is no single table in one location where you could keep updating a count of the rows being inserted.
This means that when you need count(*), you have to aggregate the count from every region distributed across the region servers. In other words, a table is not a single entity; it is a collection of regions.

Coming to the slowness of your query: the statistics collected by UPDATE STATISTICS allow a query to be parallelized into logical chunks within a single region. Suppose there are 100K rows in a region; the collected statistics would let the query run in parallel on, say, 10 equal chunks of 10,000 rows within that region.

Have you modified any of the parameters related to statistics, such as 'phoenix.stats.guidepost.width'?

Regards
Ram

From: Ciureanu, Constantin (GfK) [mailto:constantin.ciure...@gfk.com]
Sent: Wednesday, February 11, 2015 2:51 PM
To: user@phoenix.apache.org
Subject: Update statistics made query 2-3x slower

Hello all,

1. Is there a good explanation why updating the statistics:

update statistics tableX;

made this query 2x slower? (It took 27 seconds before; now it's somewhere between 60 and 90 seconds.)

select count(*) from tableX;
+------------------------------------------+
|                 COUNT(1)                 |
+------------------------------------------+
| 5786227                                  |
+------------------------------------------+
1 row selected (62.718 seconds)

(If possible ☺) how can I "drop" those statistics?

2. Why is there nothing (like a counter/attribute on the table) to obtain the number of rows in a table quickly?

Thank you,
Constantin
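To illustrate the two explanations in this thread (there is no stored row counter, so count(*) has to sum per-region counts; and statistics let each region be split into smaller chunks that are scanned in parallel), here is a minimal Python sketch. It is a toy model under stated assumptions, not Phoenix code: the `Region` type, `split_by_guideposts`, `count_star` and the `rows_per_chunk` parameter are all hypothetical, and the real phoenix.stats.guidepost.width setting is a byte width rather than a row count.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Toy model (hypothetical, not Phoenix internals): a table is a list of
# regions, and each region is a list of row keys held by some region server.
Region = List[str]


def split_by_guideposts(region: Region, rows_per_chunk: int) -> List[Region]:
    """Split one region into contiguous chunks of at most rows_per_chunk rows.

    rows_per_chunk is a hypothetical stand-in for guidepost spacing; the real
    phoenix.stats.guidepost.width setting is a byte width, not a row count.
    A value <= 0 models "no statistics": one scan for the whole region.
    """
    if rows_per_chunk <= 0:
        return [region]
    return [region[i:i + rows_per_chunk] for i in range(0, len(region), rows_per_chunk)]


def count_chunk(chunk: Region) -> int:
    # Each parallel scan sees only its own key range and returns a partial count.
    return len(chunk)


def count_star(regions: List[Region], rows_per_chunk: int = 0,
               workers: int = 8) -> Tuple[int, int]:
    """Return (total row count, number of parallel scans used)."""
    chunks = [c for r in regions for c in split_by_guideposts(r, rows_per_chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_chunk, chunks))
    # No per-table counter is stored, so the client sums every partial count.
    return sum(partials), len(chunks)


if __name__ == "__main__":
    # Ram's example: a single region holding 100K rows.
    region = [f"row{i:06d}" for i in range(100_000)]
    print(count_star([region]))          # no stats: (100000, 1)
    print(count_star([region], 10_000))  # with stats: (100000, 10)
```

Both runs return the same total; only the number of scans changes. In a real cluster each extra chunk is another scan with its own setup cost, so if the chunks end up much smaller than optimal, that overhead can plausibly outweigh the gain from parallelism, which is the hypothesis raised at the top of the thread.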