Re: Speeding Up Group By Queries

2017-05-25 Thread cmbendre
of machines the only choice we have if we want to support sub-second group by queries? Thanks, Chaitanya

Re: Speeding Up Group By Queries

2016-04-12 Thread James Taylor
Will the 10-100 million records get larger? Or is that just for a single user and you plan to have many users? If not, have you considered using a standard RDBMS like MySQL, Postgres, or MariaDB? Thanks, James On Tuesday, April 12, 2016, Amit Shah wrote: > Thanks James for

Re: Speeding Up Group By Queries

2016-04-12 Thread Amit Shah
Thanks James for the reply. Please see my comments below.
> Secondary indexes[1] on the non-primary key columns are the way to improve performance for these cases. Take a look at this[2] presentation for more detail.
I have done a brief reading on secondary indexes and I will go through the

Re: Speeding Up Group By Queries

2016-04-11 Thread James Taylor
Hi Amit, If a query doesn't filter on the primary key columns, the entire table must be scanned (hence it'll be slower). Secondary indexes[1] on the non-primary key columns are the way to improve performance for these cases. Take a look at this[2] presentation for more detail. Also, a 3 node
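As a rough illustration of the secondary-index suggestion above, a covered index in Phoenix might look like the sketch below. TRANSACTIONS and UNIT_CNT_SOLD come from the thread; STORE_ID and the index name IDX_TXN_STORE are hypothetical stand-ins, since the actual grouping columns are not shown here.

```sql
-- Hypothetical: STORE_ID stands in for whichever non-primary-key column
-- the query filters or groups on; it is not taken from the thread.
-- INCLUDE copies UNIT_CNT_SOLD into the index so the aggregate can be
-- answered from the index without reading the data table.
CREATE INDEX IF NOT EXISTS IDX_TXN_STORE
    ON TRANSACTIONS (STORE_ID)
    INCLUDE (UNIT_CNT_SOLD);

-- Check whether the plan scans the index rather than the full data table.
EXPLAIN
SELECT STORE_ID, SUM(UNIT_CNT_SOLD)
FROM TRANSACTIONS
GROUP BY STORE_ID;
```

Whether the optimizer actually picks the index depends on the query shape and the Phoenix version, so the EXPLAIN output is worth checking before and after creating it.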

Re: Speeding Up Group By Queries

2016-04-11 Thread Amit Shah
Hi Mujtaba, I observed that if the where-clause and group-by queries are applied on the primary key columns, then they are superfast (~ 200 ms). This is not the case with queries that have non-primary key columns in the where clause and group by queries. I tried configuring the bucket cache but

Re: Speeding Up Group By Queries

2016-03-29 Thread Amit Shah
Hi Mujtaba, Could these improvements be because of region distribution across region servers? Along with the optimizations you had suggested, I had also used hbase-region-inspector to move regions evenly across the region servers. Below is the table schema for the TRANSACTIONS table: CREATE TABLE

Re: Speeding Up Group By Queries

2016-03-29 Thread Mujtaba Chohan
Optimization did help somewhat but not to the extent I was expecting. See chart below. [image: Inline image 1] Can you share your table schema so I can experiment with it? Another thing you can try is reducing guidepost width for this table by executing

Re: Speeding Up Group By Queries

2016-03-29 Thread Amit Shah
Hi Mujtaba, I did try the two optimization techniques by recreating the table and then loading it again with 10 mil records. They do not seem to help out much in terms of the timings. Kindly find the phoenix log file attached. Let me know if I am missing anything. Thanks, Amit. On Mon, Mar 28,

Re: Speeding Up Group By Queries

2016-03-28 Thread Mujtaba Chohan
Here's the chart of the time each of the parallel scans takes after the split. On the RS where data is not read from disk, the scan returns in ~20 secs, but for the RS which has 6 it's ~45 secs. [image: Inline image 2] Yes I see disk reads with 607 ios/second on the hosts that stores 6 regions > Two

Re: Speeding Up Group By Queries

2016-03-25 Thread James Taylor
Hi Amit, Using 4.7.0-HBase-1.1 release, I see the index being used for that query (see below). An index will help some, as the aggregation can be done in place as the scan over the index is occurring (as opposed to having to hold the distinct values found during grouping in memory per chunk of

Re: Speeding Up Group By Queries

2016-03-25 Thread Mujtaba Chohan
That seems excessively slow for 10M rows, which should take on the order of a few seconds at most without an index.
1. How wide is your table?
2. How many region servers is your data distributed on, and what's the heap size?
3. Do you see lots of disk I/O on region servers during aggregation?
4. Can you try

Speeding Up Group By Queries

2016-03-25 Thread Amit Shah
Hi, I am trying to evaluate Apache HBase (version 1.0.0) and Phoenix (version 4.6) deployed through Cloudera for our OLAP workload. I have a table that has 10 mil rows. I try to execute the below roll up query and it takes around 2 mins to return 1,850 rows. SELECT SUM(UNIT_CNT_SOLD),