On Mar 26, 2018, at 10:59 PM, Justin Pryzby wrote: > On Mon, Mar 26, 2018 at 12:43:02PM +0300, Konstantin Knizhnik wrote: >> Hi, hackers. >> >> I was faced with the following bad performance use case with Postgres: there >> is a huge append-only table with serial key (ID) >> which is permanently appended using multithreaded pgloader. > > I think this could be similar to the problem I reported here: > https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com#20160524173914.ga11...@telsasoft.com > > The analysis at the time was that, due to "repeated keys", a btree index on > the > timestamp column had non-consecutive heap TIDs (btree insertion uses random > early insertion point to avoid superlinear lookup cost during such > insertions). > > But, our case also involved multiple child processes simultaneously inserting > into same table, and I wonder if "repeated keys" were more or less unrelated > to > the problem. The behavior is maybe caused easily by simultaneous insertions > "clustered" around the same target: for us: now(), for you: nextval(). > >> But now effective_io_concurrency parameter is applicable only for bitmap > ... >> Will it be useful to support it also for index scan? >> Or there are some other ways to address this problem? > > Does your case perform well with bitmap heap scan (I mean bitmap scan of the > single index)? It seems to me that prefetch wouldn't help, as it would just > incur the same random cost you're already seeing; the solution may be to > choose > another plan(bitmap) with sequential access to enable read-ahead, > > Also: Claudio mentioned here that bitmap prefetch can cause the kernel to > avoid > its own readahead, negatively affecting some queries: > https://www.postgresql.org/message-id/flat/8fb758a1-d7fa-4dcc-fb5b-07a992ae6a32%40gmail.com#20180207054227.ge17...@telsasoft.com > > What's the pg_stats "correlation" for the table column with index being > scanned? How many tuples? Would you send explain(analyze,buffers) for the > problem query, and with SET enable_bitmapscan=off; ?
Thank you very much for your feedback. Setting random_page_cost to 10 forces Postgres to use bitmap scan: paradoxdb=# explain select * from public.people_raw where ("ID" < 2068113880) and not (("ID" < 1669762074)) order by 2,1,3; QUERY PLAN —---------------------------------------------------------------------------------------------------— Sort (cost=269989441.71..270965997.99 rows=390622510 width=351) Sort Key: "LastName", "FirstName", "MiddleName" -> Index Scan using "People_pkey" on people_raw (cost=0.58..88742525.10 rows=390622510 width=351) Index Cond: (("ID" < 2068113880) AND ("ID" >= 1669762074)) (4 rows) paradoxdb=# set random_page_cost to 10; SET paradoxdb=# explain select * from public.people_raw where ("ID" < 2068113880) and not (("ID" < 1669762074)) order by 2,1,3; QUERY PLAN —---------------------------------------------------------------------------------------------— Sort (cost=397271527.57..398248083.84 rows=390622510 width=351) Sort Key: "LastName", "FirstName", "MiddleName" -> Bitmap Heap Scan on people_raw (cost=14905651.30..108450832.96 rows=390622510 width=351) Recheck Cond: (("ID" < 2068113880) AND ("ID" >= 1669762074)) -> Bitmap Index Scan on "People_pkey" (cost=0.00..14807995.68 rows=390622510 width=0) Index Cond: (("ID" < 2068113880) AND ("ID" >= 1669762074)) (6 rows) Correlation of "ID" column is 0.978884. Sorry, it is customer's system and I have limited access to it. Next load of data is expected during this week, so we will check how bitmap scan can improve performance. According to IO traffic, I expect up to 100 times improvement.