Thank you all. 1. The other query is:
select keyword, sum(cnt) as count from my_table where regymdt >= to_timestamp('2017-04-03 00:00') and regymdt <= to_timestamp('2017-04-03 23:00') and *obj_id in ( obj_ids comes here... )* and obj_grp = '100' group by keyword order by count desc; This query is pretty fast because of the first two PKs. The reason obj_id comes at second PK is, we first assumed 'select with obj_id' will be called much more often, and then 'select with obj_grp' is also required. 2. 4.8.0-HBase-1.1 version is used. I'm testing on HBase server for our product, so that upgrading looks not easy. : ( If there definitely exist index bug fixes, I could try though. 3. With Jonathan's reply and optional PK, this schema looks worthy to try: create table my_table ( obj_grp varchar(50), obj_id varchar(50) not null, regymdt timestamp not null, keyword varchar not null, cnt integer, post_tp varchar(20) constraint pk primary key (obj_grp, obj_id, regymdt, keyword) ) Thanks, NaHeon 2017-04-05 15:17 GMT+09:00 Jonathan Leech <jonat...@gmail.com>: > Also, your schema as defined seems to have a pretty common hbase > anti-pattern, using a steadily increasing value as the first part of the > rowkey (assuming your data is also loaded in order of time). This will lead > to region hotspotting on the load, region splitting, etc. Additionally, > your queries may not take advantage of the parallelism of the cluster if > they end up on a small number of regions. If the queries are more likely to > be on the most recent data, as is often the case, you'll end up with a > single server doing the bulk the work in the cluster. > > - Jonathan > > On Apr 4, 2017, at 10:10 PM, James Taylor <jamestay...@apache.org> wrote: > > What other queries do you want to be fast? What version of Phoenix are you > using? Have you seen our new Tuning Guide [1]? > > You could try moving OBJ_ID to end of PK and adding OBJ_GRP as the > leading PK column (it's fine to include it in PK even if it's optional). > > Your index should be used, though. Can you try doing an explain on the > query with 4.10 and file a JIRA if the index isn't being used? > > Thanks, > James > > [1] http://phoenix.apache.org/tuning_guide.html > > On Tue, Apr 4, 2017 at 8:01 PM NaHeon Kim <honey.and...@gmail.com> wrote: > >> Hi all, >> >> My team has a Phoenix table containing over 30,000,000 rows and try to >> speed up its query performance. >> It'll be perfect if all queries could be done within 1~2 seconds. >> >> ----- table schema ----- >> >> CREATE TABLE MY_TABLE ( >> REGYMDT timestamp not null, >> OBJ_ID varchar(50) not null, >> KEYWORD varchar not null, >> OBJ_GRP varchar(50), >> CNT integer, >> POST_TP varchar(20), >> CONSTRAINT PK PRIMARY KEY (REGYMDT, OBJ_ID, KEYWORD) >> ) IMMUTABLE_ROWS = true; >> >> create index objgrp_reg_kwd on my_table(obj_grp, regymdt, keyword) >> include (cnt); >> >> ----- tuning-needed query ----- >> >> select keyword, sum(cnt) as count >> from my_table >> where regymdt >= to_timestamp('2017-04-03 00:00') and regymdt <= >> to_timestamp('2017-04-03 23:00') >> and obj_grp = '100' >> group by keyword >> order by count desc; >> >> >> The reason why obj_grp isn't PK is that it's optional value. >> I could use obj_grp's default value and propagate it to PK, but still not >> think it's desired pattern. >> >> The query above takes more than 20 seconds! >> It does *not use* objgrp_reg_kwd index, doing just regymdt range scan. >> >> Thanks in advance! >> NaHeon >> >>