Re: Index scan prefetch?

konstantin knizhnik Tue, 27 Mar 2018 01:32:06 -0700

On Mar 26, 2018, at 10:59 PM, Justin Pryzby wrote:

> On Mon, Mar 26, 2018 at 12:43:02PM +0300, Konstantin Knizhnik wrote:
>> Hi, hackers.
>> 
>> I was faced with the following bad performance use case with Postgres: there
>> is a huge append-only table with serial key (ID)
>> which is permanently appended using multithreaded pgloader.
> 
> I think this could be similar to the problem I reported here:
> https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com#[email protected]
> 
> The analysis at the time was that, due to "repeated keys", a btree index on 
> the
> timestamp column had non-consecutive heap TIDs (btree insertion uses random
> early insertion point to avoid superlinear lookup cost during such 
> insertions).
> 
> But, our case also involved multiple child processes simultaneously inserting
> into same table, and I wonder if "repeated keys" were more or less unrelated 
> to
> the problem.  The behavior is maybe caused easily by simultaneous insertions
> "clustered" around the same target: for us: now(), for you: nextval().
> 
>> But now effective_io_concurrency parameter is applicable only for bitmap
> ...
>> Will it be useful to support it also for index scan?
>> Or there are some other ways to address this problem?
> 
> Does your case perform well with bitmap heap scan (I mean bitmap scan of the
> single index)?  It seems to me that prefetch wouldn't help, as it would just
> incur the same random cost you're already seeing; the solution may be to 
> choose
> another plan(bitmap) with sequential access to enable read-ahead,
> 
> Also: Claudio mentioned here that bitmap prefetch can cause the kernel to 
> avoid
> its own readahead, negatively affecting some queries:
> https://www.postgresql.org/message-id/flat/8fb758a1-d7fa-4dcc-fb5b-07a992ae6a32%40gmail.com#[email protected]
> 
> What's the pg_stats "correlation" for the table column with index being
> scanned?  How many tuples?  Would you send explain(analyze,buffers) for the
> problem query, and with SET enable_bitmapscan=off; ?


Thank you very much for your feedback.

Setting random_page_cost to 10 forces Postgres to use bitmap scan:

paradoxdb=# explain select * from public.people_raw where ("ID" < 2068113880) 
and not (("ID" < 1669762074)) order by 2,1,3;
                                              QUERY PLAN                        
                       
—---------------------------------------------------------------------------------------------------—
 Sort  (cost=269989441.71..270965997.99 rows=390622510 width=351)
   Sort Key: "LastName", "FirstName", "MiddleName"
   ->  Index Scan using "People_pkey" on people_raw  (cost=0.58..88742525.10 
rows=390622510 width=351)
         Index Cond: (("ID" < 2068113880) AND ("ID" >= 1669762074))
(4 rows)

paradoxdb=# set random_page_cost to 10;
SET
paradoxdb=# explain select * from public.people_raw where ("ID" < 2068113880) 
and not (("ID" < 1669762074)) order by 2,1,3;
                                           QUERY PLAN                           
                 
—---------------------------------------------------------------------------------------------—
 Sort  (cost=397271527.57..398248083.84 rows=390622510 width=351)
   Sort Key: "LastName", "FirstName", "MiddleName"
   ->  Bitmap Heap Scan on people_raw  (cost=14905651.30..108450832.96 
rows=390622510 width=351)
         Recheck Cond: (("ID" < 2068113880) AND ("ID" >= 1669762074))
         ->  Bitmap Index Scan on "People_pkey"  (cost=0.00..14807995.68 
rows=390622510 width=0)
               Index Cond: (("ID" < 2068113880) AND ("ID" >= 1669762074))
(6 rows)

Correlation of "ID" column is 0.978884.

Sorry, it is customer's system and I have limited access to it. 
Next load of data is expected during this week, so we will check how bitmap 
scan can improve performance.
According to IO traffic, I expect up to 100 times improvement.

Re: Index scan prefetch?

Reply via email to