Hi Hieu,
You could try adding the /*+ SERIAL */ hint to see if that has any impact.
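Something like this, as a rough sketch against the "extended" view from your
schema below (adjust the view name to whatever you actually query):

  SELECT /*+ SERIAL */
  "pk","doubleCol","intCol","intCol2","stringCol","stringCol2","intCol3"
  FROM "extended"
  WHERE "doubleCol" > 100
  LIMIT 10000

That forces the scan to run serially instead of in parallel across all the
salt-bucket chunks, which can sometimes help for limited queries.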
Also, have you tried not salting the table? The SALT_BUCKETS value of 128
is pretty high.
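Salting can only be set at table creation, so testing that means recreating
and re-loading the table. A sketch of the DDL with salting dropped (or you
could just try a much smaller SALT_BUCKETS value):

  CREATE TABLE "metadata" (
      "pk" VARCHAR PRIMARY KEY
  )
  COLUMN_ENCODED_BYTES = 0,
  BLOOMFILTER = 'ROWCOL',
  COMPRESSION = 'GZ';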
For the other issue, do you have a lot of deleted cells? If so, you might try
running a major compaction. You could also add a secondary index
on "doubleCol" if that's a common query.
Thanks,
James
On Thu, Apr 11, 2019 at 5:44 PM Hieu Nguyen wrote:
> Hi,
>
> I am using Phoenix 4.14-cdh5.11, with sqlline-thin as the client. I am
> seeing strange patterns around SELECT query execution time:
> 1. Increasing the LIMIT past a certain "threshold" results in
> significantly slower execution time.
> 2. Adding just one column (BIGINT) to the SELECT results in significantly
> slower execution time.
>
> This is our schema (names are changed for readability):
> CREATE TABLE "metadata" (
> "pk" VARCHAR PRIMARY KEY
> )
> SALT_BUCKETS = 128,
> COLUMN_ENCODED_BYTES = 0,
> BLOOMFILTER = 'ROWCOL',
> COMPRESSION = 'GZ';
>
> CREATE VIEW "extended" (
> "doubleCol" DOUBLE,
> "intCol" BIGINT,
> "intCol2" BIGINT,
> "intCol3" BIGINT,
> "stringCol" VARCHAR,
> "stringCol2" VARCHAR,
> "stringCol3" VARCHAR,
> "stringCol4" VARCHAR,
> "stringCol5" VARCHAR,
> "stringCol6" VARCHAR,
> "stringCol7" VARCHAR,
> ) AS SELECT * FROM "metadata"
>
> We have other views that also select from "metadata" and define their own
> columns. Overall, there are 1 million rows in this table, and
> 20k rows match the condition "doubleCol" > 100.
>
> Base query:
> SELECT
> "pk","doubleCol","intCol","intCol2","stringCol","stringCol2","intCol3"
> FROM "templatealldatattype-7d55c5a6-efe3-419d-9bce-9fea7c14f8bc"
> WHERE "doubleCol" > 100
> LIMIT 10000
> -> 1.976 seconds
>
> Decreasing LIMIT to 9500 (only 5% decrease in number of rows):
> SELECT
> "pk","doubleCol","intCol","intCol2","stringCol","stringCol2","intCol3"
> FROM "templatealldatattype-7d55c5a6-efe3-419d-9bce-9fea7c14f8bc"
> WHERE "doubleCol" > 100
> LIMIT 9500
> -> 0.409 seconds
>
> Removing "intCol3" from SELECT, keeping LIMIT at 1:
> SELECT "pk","doubleCol","intCol","intCol2","stringCol","stringCol2"
> FROM "templatealldatattype-7d55c5a6-efe3-419d-9bce-9fea7c14f8bc"
> WHERE "doubleCol" > 100
> LIMIT 10000
> -> 0.339 seconds
>
> I ran each of these queries a few times in a row. There was some small
> variation in execution time, but the 2nd and 3rd queries were never slower
> than the 1st query.
>
> The EXPLAIN plan did not change, except for the ROW LIMIT value when
> explaining the 2nd query (9500 instead of 10000).
>
> | PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
> | CLIENT 128-CHUNK 382226 ROWS 314572800 BYTES PARALLEL 128-WAY ROUND ROBIN FULL SCAN OVER metadata | 314572800 | 382226 | 1554973434637 |
> | SERVER FILTER BY "doubleCol" > 100.0 | 314572800 | 382226 | 1554973434637 |
> | SERVER 10000 ROW LIMIT | 314572800 | 382226 | 1554973434637 |
> | CLIENT 10000 ROW LIMIT | 314572800 | 382226 | 1554973434637 |
>
> I tried adding the SEEK_TO_COLUMN and NO_SEEK_TO_COLUMN hints as suggested
> in a similar thread (
> https://lists.apache.org/thread.html/4ef8384ecd31f30fdaf5837e3abc613142426d899e916c7aae4a46d4@%3Cuser.phoenix.apache.org%3E),
> but they had no effect.
>
> Any pointers on how we can investigate the 4-5x slowdown from increasing
> LIMIT by only ~5%, or from selecting just one more BIGINT column? Could we
> have exceeded some threshold in result size that causes the query to
> perform much more slowly for such small changes to the query?
>
> Thanks,
> -Hieu
>