Hi Hieu,

You could try adding the /*+ SERIAL */ hint to see if that has any impact. Also, have you tried not salting the table? The SALT_BUCKETS value of 128 is pretty high.
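For example, something like this against your first query (just a diagnostic sketch; the SERIAL hint runs the scan serially instead of as the parallel 128-way round-robin scan, so it mainly tells you whether the parallel scan setup is part of the cost):

SELECT /*+ SERIAL */ "pk","doubleCol","intCol","intCol2","stringCol","stringCol2","intCol3"
FROM "templatealldatattype-7d55c5a6-efe3-419d-9bce-9fea7c14f8bc"
WHERE "doubleCol" > 100
LIMIT 10000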
For the other issue, do you have a lot of deleted cells? You might try running a major compaction. You might also try adding a secondary index on "doubleCol" if that's a common query.
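Something along these lines, assuming the index would go on the "extended" view (the index name and the covered-column list are just placeholders; include whichever columns you routinely select so the query stays covered):

CREATE INDEX "idx_doubleCol" ON "extended" ("doubleCol")
    INCLUDE ("intCol", "intCol2", "intCol3", "stringCol", "stringCol2");

With "doubleCol" leading the index, the filter "doubleCol" > 100 could then be served by a range scan over the index rather than the full scan shown in your EXPLAIN plan.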
Thanks,
James

On Thu, Apr 11, 2019 at 5:44 PM Hieu Nguyen <h...@box.com> wrote:
> Hi,
>
> I am using Phoenix 4.14-cdh5.11, with sqlline-thin as the client. I am
> seeing strange patterns around SELECT query execution time:
> 1. Increasing the LIMIT past a certain "threshold" results in
> significantly slower execution time.
> 2. Adding just one column (BIGINT) to the SELECT results in significantly
> slower execution time.
>
> This is our schema (names are changed for readability):
>
> CREATE TABLE "metadata" (
>     "pk" VARCHAR PRIMARY KEY
> )
> SALT_BUCKETS = 128,
> COLUMN_ENCODED_BYTES = 0,
> BLOOMFILTER = 'ROWCOL',
> COMPRESSION = 'GZ';
>
> CREATE VIEW "extended" (
>     "doubleCol" DOUBLE,
>     "intCol" BIGINT,
>     "intCol2" BIGINT,
>     "intCol3" BIGINT,
>     "stringCol" VARCHAR,
>     "stringCol2" VARCHAR,
>     "stringCol3" VARCHAR,
>     "stringCol4" VARCHAR,
>     "stringCol5" VARCHAR,
>     "stringCol6" VARCHAR,
>     "stringCol7" VARCHAR
> ) AS SELECT * FROM "metadata";
>
> We have other views that also select from "metadata" and define their own
> columns. Overall, there are 1 million rows in this table, and 20k rows
> match the condition "doubleCol" > 100.
>
> Base query:
> SELECT "pk","doubleCol","intCol","intCol2","stringCol","stringCol2","intCol3"
> FROM "templatealldatattype-7d55c5a6-efe3-419d-9bce-9fea7c14f8bc"
> WHERE "doubleCol" > 100
> LIMIT 10000
> -> 1.976 seconds
>
> Decreasing LIMIT to 9500 (only a 5% decrease in the number of rows):
> SELECT "pk","doubleCol","intCol","intCol2","stringCol","stringCol2","intCol3"
> FROM "templatealldatattype-7d55c5a6-efe3-419d-9bce-9fea7c14f8bc"
> WHERE "doubleCol" > 100
> LIMIT 9500
> -> 0.409 seconds
>
> Removing "intCol3" from the SELECT, keeping LIMIT at 10000:
> SELECT "pk","doubleCol","intCol","intCol2","stringCol","stringCol2"
> FROM "templatealldatattype-7d55c5a6-efe3-419d-9bce-9fea7c14f8bc"
> WHERE "doubleCol" > 100
> LIMIT 10000
> -> 0.339 seconds
>
> I ran each of these queries a few times in a row. There was small variation
> in execution time, but the 2nd and 3rd queries were never slower than the
> 1st query.
>
> The EXPLAIN plan did not change, except for the ROW LIMIT value when
> explaining the 2nd query (9500 instead of 10000):
>
> +----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | PLAN                                                                                                 | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS    |
> +----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | CLIENT 128-CHUNK 382226 ROWS 314572800 BYTES PARALLEL 128-WAY ROUND ROBIN FULL SCAN OVER metadata    | 314572800       | 382226         | 1554973434637  |
> | SERVER FILTER BY "doubleCol" > 100.0                                                                 | 314572800       | 382226         | 1554973434637  |
> | SERVER 10000 ROW LIMIT                                                                               | 314572800       | 382226         | 1554973434637  |
> | CLIENT 10000 ROW LIMIT                                                                               | 314572800       | 382226         | 1554973434637  |
> +----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>
> I tried adding the SEEK_TO_COLUMN and NO_SEEK_TO_COLUMN hints as suggested
> in a similar thread (
> https://lists.apache.org/thread.html/4ef8384ecd31f30fdaf5837e3abc613142426d899e916c7aae4a46d4@%3Cuser.phoenix.apache.org%3E),
> but they had no effect.
>
> Any pointers to how we can investigate the 4-5x slowdown when increasing
> LIMIT by only ~5% or when selecting just one more BIGINT column? Could we
> have exceeded some threshold in the result size that caused the query to
> perform a lot slower for seemingly small changes in the query?
>
> Thanks,
> -Hieu