Hi Hieu,

You could try adding the /*+ SERIAL */ hint to see if that has any impact. Also, have you tried not salting the table? The SALT_BUCKETS value of 128 is pretty high.
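For example, something like this against your first query (just a diagnostic sketch; the SERIAL hint runs the scan serially instead of as the parallel 128-way round-robin scan, so it mainly tells you whether the parallel scan setup is part of the cost):

SELECT /*+ SERIAL */ "pk","doubleCol","intCol","intCol2","stringCol","stringCol2","intCol3"
FROM "templatealldatattype-7d55c5a6-efe3-419d-9bce-9fea7c14f8bc"
WHERE "doubleCol" > 100
LIMIT 10000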
For the other issue, do you have a lot of deleted cells? You might try running a major compaction. You might also try adding a secondary index on "doubleCol" if that's a common query.
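Something along these lines, assuming the index would go on the "extended" view (the index name and the covered-column list are just placeholders; include whichever columns you routinely select so the query stays covered):

CREATE INDEX "idx_doubleCol" ON "extended" ("doubleCol")
    INCLUDE ("intCol", "intCol2", "intCol3", "stringCol", "stringCol2");

With "doubleCol" leading the index, the filter "doubleCol" > 100 could then be served by a range scan over the index rather than the full scan shown in your EXPLAIN plan.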
Thanks,
James

On Thu, Apr 11, 2019 at 5:44 PM Hieu Nguyen <h...@box.com> wrote:
> Hi,
>
> I am using Phoenix 4.14-cdh5.11, with sqlline-thin as the client. I am
> seeing strange patterns around SELECT query execution time:
> 1. Increasing the LIMIT past a certain "threshold" results in
> significantly slower execution time.
> 2. Adding just one column (BIGINT) to the SELECT results in significantly
> slower execution time.
>
> This is our schema (names are changed for readability):
>
> CREATE TABLE "metadata" (
>     "pk" VARCHAR PRIMARY KEY
> )
> SALT_BUCKETS = 128,
> COLUMN_ENCODED_BYTES = 0,
> BLOOMFILTER = 'ROWCOL',
> COMPRESSION = 'GZ';
>
> CREATE VIEW "extended" (
>     "doubleCol" DOUBLE,
>     "intCol" BIGINT,
>     "intCol2" BIGINT,
>     "intCol3" BIGINT,
>     "stringCol" VARCHAR,
>     "stringCol2" VARCHAR,
>     "stringCol3" VARCHAR,
>     "stringCol4" VARCHAR,
>     "stringCol5" VARCHAR,
>     "stringCol6" VARCHAR,
>     "stringCol7" VARCHAR
> ) AS SELECT * FROM "metadata";
>
> We have other views that also select from "metadata" and define their own
> columns. Overall, there are 1 million rows in this table, and 20k rows
> match the condition "doubleCol" > 100.
>
> Base query:
> SELECT "pk","doubleCol","intCol","intCol2","stringCol","stringCol2","intCol3"
> FROM "templatealldatattype-7d55c5a6-efe3-419d-9bce-9fea7c14f8bc"
> WHERE "doubleCol" > 100
> LIMIT 10000
> -> 1.976 seconds
>
> Decreasing LIMIT to 9500 (only a 5% decrease in the number of rows):
> SELECT "pk","doubleCol","intCol","intCol2","stringCol","stringCol2","intCol3"
> FROM "templatealldatattype-7d55c5a6-efe3-419d-9bce-9fea7c14f8bc"
> WHERE "doubleCol" > 100
> LIMIT 9500
> -> 0.409 seconds
>
> Removing "intCol3" from the SELECT, keeping LIMIT at 10000:
> SELECT "pk","doubleCol","intCol","intCol2","stringCol","stringCol2"
> FROM "templatealldatattype-7d55c5a6-efe3-419d-9bce-9fea7c14f8bc"
> WHERE "doubleCol" > 100
> LIMIT 10000
> -> 0.339 seconds
>
> I ran each of these queries a few times in a row. There was small variation
> in execution time, but the 2nd and 3rd queries were never slower than the
> 1st query.
>
> The EXPLAIN plan did not change, except for the ROW LIMIT value when
> explaining the 2nd query (9500 instead of 10000):
>
> +----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | PLAN                                                                                                 | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS    |
> +----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | CLIENT 128-CHUNK 382226 ROWS 314572800 BYTES PARALLEL 128-WAY ROUND ROBIN FULL SCAN OVER metadata    | 314572800       | 382226         | 1554973434637  |
> | SERVER FILTER BY "doubleCol" > 100.0                                                                 | 314572800       | 382226         | 1554973434637  |
> | SERVER 10000 ROW LIMIT                                                                               | 314572800       | 382226         | 1554973434637  |
> | CLIENT 10000 ROW LIMIT                                                                               | 314572800       | 382226         | 1554973434637  |
> +----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>
> I tried adding the SEEK_TO_COLUMN and NO_SEEK_TO_COLUMN hints as suggested
> in a similar thread (
> https://lists.apache.org/thread.html/4ef8384ecd31f30fdaf5837e3abc613142426d899e916c7aae4a46d4@%3Cuser.phoenix.apache.org%3E),
> but they had no effect.
>
> Any pointers to how we can investigate the 4-5x slowdown when increasing
> LIMIT by only ~5% or when selecting just one more BIGINT column? Could we
> have exceeded some threshold in the result size that caused the query to
> perform a lot slower for seemingly small changes in the query?
>
> Thanks,
> -Hieu