If you are filtering on leading part of row key which is highly selective
then you would be better off not using salt buckets all together rather
than having 100 parallel scan and block reads in your case. In our test
with billion+ row table, non-salted table offer much better performance
since it ends up reading fewer blocks from a single region.

//mujtaba

On Mon, Feb 1, 2016 at 1:16 PM, Serega Sheypak <serega.shey...@gmail.com>
wrote:

> Hi, here is my table DDL:
> CREATE TABLE IF NOT EXISTS id_ref
> (
>    id1     VARCHAR   NOT NULL,
>    value1  VARCHAR,
>
>    id2     VARCHAR NOT NULL,
>    value2  VARCHAR
>    CONSTRAINT id_ref_pk  PRIMARY KEY (id1, id2)
> )IMMUTABLE_ROWS=true,SALT_BUCKETS=100, VERSIONS=1, TTL=691200
>
> I'm trying to analyze result of explain:
>
> explain select id1, value1, id2, value2 from id_ref where id1 = 'xxx'
>
> . . . . . . . . . . . . . . . . . . . . . . .> ;
>
> *+------------------------------------------+*
>
> *| **                  PLAN                  ** |*
>
> *+------------------------------------------+*
>
> *| *CLIENT 100-CHUNK PARALLEL 100-WAY RANGE SCAN OVER ID_REF
> [0,'1fd5c44a75549162ca1602dda55f6d129cab61a6']* |*
>
> *| *CLIENT MERGE SORT                       * |*
>
> *+------------------------------------------+*
>
>
> What happens? Client spawns 100 parallel scans (because of bucketing) and
> waits for 100 responses?
>
> Is it effective? What is the right way to optimize such query pattern:
> "select by first part of primary key"? Reduce the amount of buckets? I get
> exeption a while after restarting app:
>
>
> *Task org.apache.phoenix.job.JobManager$JobFutureTask@60a40644 rejected
> from org.apache.phoenix.job.JobManager$1@58e3fe9aRunning, pool size = 128,
> active threads = 121, queued tasks = 5000, completed tasks = 2629565*
>
>
>

Reply via email to