If you are filtering on leading part of row key which is highly selective then you would be better off not using salt buckets all together rather than having 100 parallel scan and block reads in your case. In our test with billion+ row table, non-salted table offer much better performance since it ends up reading fewer blocks from a single region.
//mujtaba On Mon, Feb 1, 2016 at 1:16 PM, Serega Sheypak <serega.shey...@gmail.com> wrote: > Hi, here is my table DDL: > CREATE TABLE IF NOT EXISTS id_ref > ( > id1 VARCHAR NOT NULL, > value1 VARCHAR, > > id2 VARCHAR NOT NULL, > value2 VARCHAR > CONSTRAINT id_ref_pk PRIMARY KEY (id1, id2) > )IMMUTABLE_ROWS=true,SALT_BUCKETS=100, VERSIONS=1, TTL=691200 > > I'm trying to analyze result of explain: > > explain select id1, value1, id2, value2 from id_ref where id1 = 'xxx' > > . . . . . . . . . . . . . . . . . . . . . . .> ; > > *+------------------------------------------+* > > *| ** PLAN ** |* > > *+------------------------------------------+* > > *| *CLIENT 100-CHUNK PARALLEL 100-WAY RANGE SCAN OVER ID_REF > [0,'1fd5c44a75549162ca1602dda55f6d129cab61a6']* |* > > *| *CLIENT MERGE SORT * |* > > *+------------------------------------------+* > > > What happens? Client spawns 100 parallel scans (because of bucketing) and > waits for 100 responses? > > Is it effective? What is the right way to optimize such query pattern: > "select by first part of primary key"? Reduce the amount of buckets? I get > exeption a while after restarting app: > > > *Task org.apache.phoenix.job.JobManager$JobFutureTask@60a40644 rejected > from org.apache.phoenix.job.JobManager$1@58e3fe9aRunning, pool size = 128, > active threads = 121, queued tasks = 5000, completed tasks = 2629565* > > >