On Mar 26, 2018, at 10:59 PM, Justin Pryzby wrote:
> On Mon, Mar 26, 2018 at 12:43:02PM +0300, Konstantin Knizhnik wrote:
>> Hi, hackers.
>>
>> I was faced with the following bad performance use case with Postgres: there
>> is a huge append-only table with serial key (ID)
>> which is permanently appended using multithreaded pgloader.
>
> I think this could be similar to the problem I reported here:
> https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com#20160524173914.ga11...@telsasoft.com
>
> The analysis at the time was that, due to "repeated keys", a btree index on
> the
> timestamp column had non-consecutive heap TIDs (btree insertion uses random
> early insertion point to avoid superlinear lookup cost during such
> insertions).
>
> But, our case also involved multiple child processes simultaneously inserting
> into same table, and I wonder if "repeated keys" were more or less unrelated
> to
> the problem. The behavior is maybe caused easily by simultaneous insertions
> "clustered" around the same target: for us: now(), for you: nextval().
>
>> But now effective_io_concurrency parameter is applicable only for bitmap
> ...
>> Will it be useful to support it also for index scan?
>> Or there are some other ways to address this problem?
>
> Does your case perform well with bitmap heap scan (I mean bitmap scan of the
> single index)? It seems to me that prefetch wouldn't help, as it would just
> incur the same random cost you're already seeing; the solution may be to
> choose
> another plan(bitmap) with sequential access to enable read-ahead,
>
> Also: Claudio mentioned here that bitmap prefetch can cause the kernel to
> avoid
> its own readahead, negatively affecting some queries:
> https://www.postgresql.org/message-id/flat/8fb758a1-d7fa-4dcc-fb5b-07a992ae6a32%40gmail.com#20180207054227.ge17...@telsasoft.com
>
> What's the pg_stats "correlation" for the table column with index being
> scanned? How many tuples? Would you send explain(analyze,buffers) for the
> problem query, and with SET enable_bitmapscan=off; ?
Thank you very much for your feedback.
Setting random_page_cost to 10 forces Postgres to use bitmap scan:
paradoxdb=# explain select * from public.people_raw where ("ID" < 2068113880)
and not (("ID" < 1669762074)) order by 2,1,3;
QUERY PLAN
—---—
Sort (cost=269989441.71..270965997.99 rows=390622510 width=351)
Sort Key: "LastName", "FirstName", "MiddleName"
-> Index Scan using "People_pkey" on people_raw (cost=0.58..88742525.10
rows=390622510 width=351)
Index Cond: (("ID" < 2068113880) AND ("ID" >= 1669762074))
(4 rows)
paradoxdb=# set random_page_cost to 10;
SET
paradoxdb=# explain select * from public.people_raw where ("ID" < 2068113880)
and not (("ID" < 1669762074)) order by 2,1,3;
QUERY PLAN
—-—
Sort (cost=397271527.57..398248083.84 rows=390622510 width=351)
Sort Key: "LastName", "FirstName", "MiddleName"
-> Bitmap Heap Scan on people_raw (cost=14905651.30..108450832.96
rows=390622510 width=351)
Recheck Cond: (("ID" < 2068113880) AND ("ID" >= 1669762074))
-> Bitmap Index Scan on "People_pkey" (cost=0.00..14807995.68
rows=390622510 width=0)
Index Cond: (("ID" < 2068113880) AND ("ID" >= 1669762074))
(6 rows)
Correlation of "ID" column is 0.978884.
Sorry, it is customer's system and I have limited access to it.
Next load of data is expected during this week, so we will check how bitmap
scan can improve performance.
According to IO traffic, I expect up to 100 times improvement.