Re: Ask performance advice

Jonathan Leech Tue, 04 Apr 2017 23:18:08 -0700

Also, your schema as defined seems to have a pretty common hbase anti-pattern, 
using a steadily increasing value as the first part of the rowkey (assuming 
your data is also loaded in order of time). This will lead to region 
hotspotting on the load, region splitting, etc. Additionally, your queries may 
not take advantage of the parallelism of the cluster if they end up on a small 
number of regions. If the queries are more likely to be on the most recent 
data, as is often the case, you'll end up with a single server doing the bulk 
the work in the cluster.


- Jonathan

> On Apr 4, 2017, at 10:10 PM, James Taylor <jamestay...@apache.org> wrote:
> 
> What other queries do you want to be fast? What version of Phoenix are you 
> using? Have you seen our new Tuning Guide [1]? 
> 
> You could try moving OBJ_ID to end of PK and adding OBJ_GRP as the leading PK 
> column (it's fine to include it in PK even if it's optional).
> 
> Your index should be used, though. Can you try doing an explain on the query 
> with 4.10 and file a  JIRA if the index isn't being used?
> 
> Thanks,
> James
> 
> [1] http://phoenix.apache.org/tuning_guide.html
> 
>> On Tue, Apr 4, 2017 at 8:01 PM NaHeon Kim <honey.and...@gmail.com> wrote:
>> Hi all,
>> 
>> My team has a Phoenix table containing over 30,000,000 rows and try to speed 
>> up its query performance.
>> It'll be perfect if all queries could be done within 1~2 seconds.
>> 
>> ----- table schema -----
>> 
>> CREATE TABLE MY_TABLE (
>>    REGYMDT timestamp not null,
>>    OBJ_ID varchar(50) not null,
>>    KEYWORD varchar not null,
>>    OBJ_GRP  varchar(50),
>>    CNT integer,
>>    POST_TP varchar(20),
>>    CONSTRAINT PK PRIMARY KEY (REGYMDT, OBJ_ID, KEYWORD)
>> ) IMMUTABLE_ROWS = true;
>> 
>> create index objgrp_reg_kwd on my_table(obj_grp, regymdt, keyword) include 
>> (cnt);
>> 
>> ----- tuning-needed query -----
>> 
>> select keyword, sum(cnt) as count
>> from my_table
>> where regymdt >= to_timestamp('2017-04-03 00:00') and regymdt <= 
>> to_timestamp('2017-04-03 23:00')
>> and obj_grp = '100'
>> group by keyword
>> order by count desc;
>> 
>> 
>> The reason why obj_grp isn't PK is that it's optional value.
>> I could use obj_grp's default value and propagate it to PK, but still not 
>> think it's desired pattern.
>> 
>> The query above takes more than 20 seconds!
>> It does not use objgrp_reg_kwd index, doing just regymdt range scan.
>> 
>> Thanks in advance!
>> NaHeon
>>

Re: Ask performance advice

Reply via email to