Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On Mon, Jul 24, 2017 at 10:50 AM, Joshua D. Drake wrote:
> Does this suggest that we don't have a cleanup problem but a
> fragmentation problem (or both, at least for the index)? Having an
> index that is almost twice its cleaned-up size isn't that uncommon.

As Tom pointed out up-thread, it's important to distinguish between
inherent overhead and overhead due to garbage that needs to be cleaned
up by vacuum. It's really hard to delineate which is which here, and
I'm not going to try to put a number on it.

What I will point out is that you can see quite a significant
difference in the space utilization of a B-Tree without any dead
tuples, just from the order in which tuples are initially inserted.
You can lose about 1/3 of the space by inserting randomly rather than
in sorted order, which is more or less what REINDEX does for you.
That's because random workloads get almost entirely 50:50 page splits,
whereas sorted input always splits the rightmost page, and so always
gets 90:10 splits.

The space in the random case isn't exactly wasted; it's there for the
taking, for key values that happen to fit on the page. You effectively
require a larger average reserve of free space on pages with the
random workload, because the implementation does not and cannot reason
that it would be best to concentrate free space in the parts of the
keyspace where it is most needed.

That having been said, I do think that this workload suffers from
index bloat in a way that isn't so easily explained. It does seem to
be an issue with VACUUM controlling bloat in the index in particular.
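To see the insertion-order effect in isolation, a sketch along these
lines should reproduce it (table names here are illustrative, and the
exact sizes will vary):

  create table split_random (id int4 not null);
  create table split_sorted (id int4 not null);
  create index on split_random (id);
  create index on split_sorted (id);

  -- Random order: almost entirely 50:50 page splits.
  insert into split_random
    select (random() * 100000000)::int4 from generate_series(1, 1000000);

  -- Sorted order: always splits the rightmost page, so 90:10 splits.
  insert into split_sorted
    select g from generate_series(1, 1000000) g;

  select pg_size_pretty(pg_relation_size('split_random_id_idx')) as random_idx,
         pg_size_pretty(pg_relation_size('split_sorted_id_idx')) as sorted_idx;

-- 
Peter Geoghegan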
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On 07/23/2017 12:03 PM, Joshua D. Drake wrote:
> As you can see, even with aggressive vacuuming, over a period of 36
> hours life gets increasingly miserable. The largest table is:
>
> postgres=# select pg_size_pretty(pg_total_relation_size('bmsql_order_line'));
>  pg_size_pretty
> ----------------
>  148 GB
> (1 row)
>
> [snip]
>
> With the PK being:
>
> postgres=# select pg_size_pretty(pg_relation_size('bmsql_order_line_pkey'));
>  pg_size_pretty
> ----------------
>  48 GB
> (1 row)
>
> I tried to see how much data we are dealing with here:

-hackers,

I cleaned up the table with VACUUM FULL and ended up with the
following:

postgres=# select pg_size_pretty(pg_total_relation_size('bmsql_order_line'));
 pg_size_pretty
----------------
 118 GB
(1 row)

postgres=# select pg_size_pretty(pg_relation_size('bmsql_order_line_pkey'));
 pg_size_pretty
----------------
 27 GB
(1 row)

Does this suggest that we don't have a cleanup problem but a
fragmentation problem (or both, at least for the index)? Having an
index that is almost twice its cleaned-up size isn't that uncommon.
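One way to separate the two effects might be to look at dead-tuple
space and free space directly (a sketch, assuming the pgstattuple
extension is available; note that these functions scan the whole
relation):

  create extension if not exists pgstattuple;

  -- Heap: dead_tuple_percent is cleanup work still owed to vacuum;
  -- free_percent is space already reclaimed and awaiting reuse.
  select dead_tuple_percent, free_percent
  from pgstattuple('bmsql_order_line');

  -- Index: a freshly built btree has avg_leaf_density around 90 (the
  -- default fillfactor); much lower suggests split-driven
  -- fragmentation rather than dead tuples.
  select avg_leaf_density, leaf_fragmentation
  from pgstatindex('bmsql_order_line_pkey');

Thanks in advance,

JD

-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own. *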
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
Hello,

I changed the test to run for 6 hours at a time, regardless of the
number of transactions. I also changed the du command to only look at
the database (previously WAL logs were included). This is the clearest
indication of the problem I have been able to produce. Again, this is
with 128 clients and 500 warehouses.

The first test is a clean test: everything dropped, vacuumed, etc.
Each subsequent test just starts the test again, to have breakpoints.

                name                 | setting
-------------------------------------+---------
 autovacuum                          | on
 autovacuum_analyze_scale_factor     | 0.1
 autovacuum_analyze_threshold        | 50
 autovacuum_freeze_max_age           | 2
 autovacuum_max_workers              | 12
 autovacuum_multixact_freeze_max_age | 4
 autovacuum_naptime                  | 10
 autovacuum_vacuum_cost_delay        | 0
 autovacuum_vacuum_cost_limit        | 5000
 autovacuum_vacuum_scale_factor      | 0.1
 autovacuum_vacuum_threshold         | 50
 autovacuum_work_mem                 | -1
 log_autovacuum_min_duration         | -1
 max_wal_size                        | 640
 checkpoint_timeout                  | 86400
 checkpoint_completion_target        | 0.5

Starting base metric:

 50G  /srv/main/base

Test 1:  90G /srv/main/base  TPS: 838
Test 2: 121G /srv/main/base  TPS: 725
Test 3: 146G /srv/main/base  TPS: 642
Test 4: 171G /srv/main/base  TPS: 549
Test 5: 189G /srv/main/base  TPS: 489
Test 6: 208G /srv/main/base  TPS: 454

As you can see, even with aggressive vacuuming, over a period of 36
hours life gets increasingly miserable. The largest table is:

postgres=# select pg_size_pretty(pg_total_relation_size('bmsql_order_line'));
 pg_size_pretty
----------------
 148 GB
(1 row)

postgres=# \d bmsql_order_line
            Table "public.bmsql_order_line"
     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 ol_w_id        | integer                     | not null
 ol_d_id        | integer                     | not null
 ol_o_id        | integer                     | not null
 ol_number      | integer                     | not null
 ol_i_id        | integer                     | not null
 ol_delivery_d  | timestamp without time zone |
 ol_amount      | numeric(6,2)                |
 ol_supply_w_id | integer                     |
 ol_quantity    | integer                     |
 ol_dist_info   | character(24)               |
Indexes:
    "bmsql_order_line_pkey" PRIMARY KEY, btree (ol_w_id, ol_d_id, ol_o_id, ol_number)
Foreign-key constraints:
    "ol_order_fkey" FOREIGN KEY (ol_w_id, ol_d_id, ol_o_id) REFERENCES bmsql_oorder(o_w_id, o_d_id, o_id)
    "ol_stock_fkey" FOREIGN KEY (ol_supply_w_id, ol_i_id) REFERENCES bmsql_stock(s_w_id, s_i_id)

With the PK being:

postgres=# select pg_size_pretty(pg_relation_size('bmsql_order_line_pkey'));
 pg_size_pretty
----------------
 48 GB
(1 row)

I tried to see how much data we are dealing with here:

postgres=# select count(*) from bmsql_order_line;
   count
-----------
 910324839
(1 row)

Time: 503965.767 ms

And just to show that we were pushing to get these numbers:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.38    0.00    2.20    1.98    0.00   93.44

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb            2027.40       239.99         0.05       1199          0
sda               0.80         0.00         0.01          0          0

So we have 910M rows, and it took about 8.4 minutes to count them at
240MB/s.

I know this is a lot of data and, as I said previously, I am happy to
let anyone look at it. However, we clearly have something deeper to
look into.
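For anyone reproducing this: the same growth numbers can be pulled
from inside the database instead of with du. A sketch:

  -- Size of the current database, roughly what du reports for base/<oid>:
  select pg_size_pretty(pg_database_size(current_database()));

  -- The relations the growth is landing in:
  select relname, pg_size_pretty(pg_total_relation_size(oid)) as total
  from pg_class
  where relkind in ('r', 'i')
  order by pg_total_relation_size(oid) desc
  limit 10;

Thanks in advance,

JD

-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own. *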
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On 07/20/2017 08:58 PM, Joshua D. Drake wrote:
> On 07/19/2017 07:57 PM, Tom Lane wrote:
>> Peter Geoghegan writes:
>
> Test 1: 55G  /srv/main  TPS: 955
> Test 2: 112G /srv/main  TPS: 531 (Not sure what happened here, long
> checkpoint?)
> Test 3: 109G /srv/main  TPS: 868
> Test 4: 143G            TPS: 840
> Test 5: 154G            TPS: 722
>
> I am running the query here:
> https://wiki.postgresql.org/wiki/Index_Maintenance#Summarize_keyspace_of_a_B-Tree_index

This query didn't finish after 12 hours. Here is the new set:

                name                 | setting
-------------------------------------+---------
 autovacuum                          | on
 autovacuum_analyze_scale_factor     | 0.1
 autovacuum_analyze_threshold        | 50
 autovacuum_freeze_max_age           | 2
 autovacuum_max_workers              | 12
 autovacuum_multixact_freeze_max_age | 4
 autovacuum_naptime                  | 10
 autovacuum_vacuum_cost_delay        | 0
 autovacuum_vacuum_cost_limit        | 5000
 autovacuum_vacuum_scale_factor      | 0.1
 autovacuum_vacuum_threshold         | 50
 autovacuum_work_mem                 | -1
 log_autovacuum_min_duration         | -1

I have only run one test so far, but it is pretty telling:

Test 1: 60G /srv/main  TPS: 914
Test 2: 92G /srv/main  TPS: still running

I will post an update after the third or fourth test, depending on the
numbers. I created this instance exactly for these tests, so if
someone wants to poke around I can give access.

Thanks,

JD

-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own. *
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On 07/20/2017 11:54 PM, Sokolov Yura wrote:
> On 2017-07-21 06:58, Joshua D. Drake wrote:
>> On 07/19/2017 07:57 PM, Tom Lane wrote:
>>> Peter Geoghegan writes:
>
> Have you measured with an increased vacuum ring buffer?

No, not yet. I think we are still in the proving-the-problem stage.

> This will require recompilation, though.

JD

-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own. *
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On 2017-07-21 06:58, Joshua D. Drake wrote:
> On 07/19/2017 07:57 PM, Tom Lane wrote:
>> Peter Geoghegan writes:
>>> My argument for the importance of index bloat to the more general
>>> bloat problem is simple: any bloat that accumulates, that cannot be
>>> cleaned up, will probably accumulate until it impacts performance
>>> quite noticeably.
>>
>> But that just begs the question: *does* it accumulate indefinitely,
>> or does it eventually reach a more-or-less steady state? The
>> traditional wisdom about btrees, for instance, is that no matter how
>> full you pack them to start with, the steady state is going to
>> involve something like 1/3rd free space. You can call that bloat if
>> you want, but it's not likely that you'll be able to reduce the
>> number significantly without paying exorbitant costs. I'm not
>> claiming that we don't have any problems, but I do think it's
>> important to draw a distinction between bloat and normal operating
>> overhead.
>
> Agreed, but we aren't talking about 30% I don't think. Here is where
> I am at. It took until 30 minutes ago for the tests to finish:
>
>                 name                 | setting
> -------------------------------------+---------
>  autovacuum                          | on
>  autovacuum_analyze_scale_factor     | 0.1
>  autovacuum_analyze_threshold        | 50
>  autovacuum_freeze_max_age           | 2
>  autovacuum_max_workers              | 3
>  autovacuum_multixact_freeze_max_age | 4
>  autovacuum_naptime                  | 60
>  autovacuum_vacuum_cost_delay        | 20
>  autovacuum_vacuum_cost_limit        | -1
>  autovacuum_vacuum_scale_factor      | 0.2
>  autovacuum_vacuum_threshold         | 50
>  autovacuum_work_mem                 | -1
>  log_autovacuum_min_duration         | -1
>
> Test 1: 55G  /srv/main  TPS: 955
> Test 2: 112G /srv/main  TPS: 531 (Not sure what happened here, long
> checkpoint?)
> Test 3: 109G /srv/main  TPS: 868
> Test 4: 143G            TPS: 840
> Test 5: 154G            TPS: 722
>
> I am running the query here:
> https://wiki.postgresql.org/wiki/Index_Maintenance#Summarize_keyspace_of_a_B-Tree_index
>
> and will post a followup. Once the query finishes I am going to
> launch the tests with autovacuum_vacuum_cost_limit of 5000. Is there
> anything else you folks would like me to change?

Have you measured with an increased vacuum ring buffer? This will
require recompilation, though.

With regards,
-- 
Sokolov Yura
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On 21/07/17 15:58, Joshua D. Drake wrote:
> On 07/19/2017 07:57 PM, Tom Lane wrote:
>> Peter Geoghegan writes:
>>> My argument for the importance of index bloat to the more general
>>> bloat problem is simple: any bloat that accumulates, that cannot be
>>> cleaned up, will probably accumulate until it impacts performance
>>> quite noticeably.
>>
>> But that just begs the question: *does* it accumulate indefinitely,
>> or does it eventually reach a more-or-less steady state? The
>> traditional wisdom about btrees, for instance, is that no matter how
>> full you pack them to start with, the steady state is going to
>> involve something like 1/3rd free space. You can call that bloat if
>> you want, but it's not likely that you'll be able to reduce the
>> number significantly without paying exorbitant costs. I'm not
>> claiming that we don't have any problems, but I do think it's
>> important to draw a distinction between bloat and normal operating
>> overhead.
>
> Agreed, but we aren't talking about 30% I don't think. Here is where
> I am at. It took until 30 minutes ago for the tests to finish:
>
>                 name                 | setting
> -------------------------------------+---------
>  autovacuum                          | on
>  autovacuum_analyze_scale_factor     | 0.1
>  autovacuum_analyze_threshold        | 50
>  autovacuum_freeze_max_age           | 2
>  autovacuum_max_workers              | 3
>  autovacuum_multixact_freeze_max_age | 4
>  autovacuum_naptime                  | 60
>  autovacuum_vacuum_cost_delay        | 20
>  autovacuum_vacuum_cost_limit        | -1
>  autovacuum_vacuum_scale_factor      | 0.2
>  autovacuum_vacuum_threshold         | 50
>  autovacuum_work_mem                 | -1
>  log_autovacuum_min_duration         | -1
>
> Test 1: 55G  /srv/main  TPS: 955
> Test 2: 112G /srv/main  TPS: 531 (Not sure what happened here, long
> checkpoint?)
> Test 3: 109G /srv/main  TPS: 868
> Test 4: 143G            TPS: 840
> Test 5: 154G            TPS: 722
>
> I am running the query here:
> https://wiki.postgresql.org/wiki/Index_Maintenance#Summarize_keyspace_of_a_B-Tree_index
>
> and will post a followup. Once the query finishes I am going to
> launch the tests with autovacuum_vacuum_cost_limit of 5000. Is there
> anything else you folks would like me to change?

I usually advise setting autovacuum_naptime = 10s (or even 5s) for
workloads that do a lot of updates (or inserts + deletes) - as on
modern HW a lot of churn can happen in 1 minute, and that just makes
vacuum's job harder.
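One way to apply that (a sketch; autovacuum_naptime only needs a
reload, not a restart):

  alter system set autovacuum_naptime = '10s';
  select pg_reload_conf();

regards

Mark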
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On 07/19/2017 07:57 PM, Tom Lane wrote:
> Peter Geoghegan writes:
>> My argument for the importance of index bloat to the more general
>> bloat problem is simple: any bloat that accumulates, that cannot be
>> cleaned up, will probably accumulate until it impacts performance
>> quite noticeably.
>
> But that just begs the question: *does* it accumulate indefinitely,
> or does it eventually reach a more-or-less steady state? The
> traditional wisdom about btrees, for instance, is that no matter how
> full you pack them to start with, the steady state is going to
> involve something like 1/3rd free space. You can call that bloat if
> you want, but it's not likely that you'll be able to reduce the
> number significantly without paying exorbitant costs. I'm not
> claiming that we don't have any problems, but I do think it's
> important to draw a distinction between bloat and normal operating
> overhead.

Agreed, but we aren't talking about 30% I don't think.

Here is where I am at. It took until 30 minutes ago for the tests to
finish:

                name                 | setting
-------------------------------------+---------
 autovacuum                          | on
 autovacuum_analyze_scale_factor     | 0.1
 autovacuum_analyze_threshold        | 50
 autovacuum_freeze_max_age           | 2
 autovacuum_max_workers              | 3
 autovacuum_multixact_freeze_max_age | 4
 autovacuum_naptime                  | 60
 autovacuum_vacuum_cost_delay        | 20
 autovacuum_vacuum_cost_limit        | -1
 autovacuum_vacuum_scale_factor      | 0.2
 autovacuum_vacuum_threshold         | 50
 autovacuum_work_mem                 | -1
 log_autovacuum_min_duration         | -1

Test 1: 55G  /srv/main  TPS: 955
Test 2: 112G /srv/main  TPS: 531 (Not sure what happened here, long
checkpoint?)
Test 3: 109G /srv/main  TPS: 868
Test 4: 143G            TPS: 840
Test 5: 154G            TPS: 722

I am running the query here:
https://wiki.postgresql.org/wiki/Index_Maintenance#Summarize_keyspace_of_a_B-Tree_index

and will post a followup. Once the query finishes I am going to launch
the tests with autovacuum_vacuum_cost_limit of 5000. Is there anything
else you folks would like me to change?
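(For reference, the settings table above can be reproduced with a
query along these lines:

  select name, setting
  from pg_settings
  where name like '%autovacuum%'
  order by name;
)

JD

-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own. *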
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On Thu, Jul 20, 2017 at 7:45 AM, Claudio Freire wrote:
>> For the purposes of this discussion, I'm mostly talking about
>> duplicates within a page on a unique index. If the keyspace owned by
>> an int4 unique index page only covers 20 distinct values, it will
>> only ever cover 20 distinct values, now and forever, despite the
>> fact that there is room for about 400 (a 90/10 split leaves you with
>> 366 items + 1 high key).
>
> Microvacuum could also help.
>
> If during a scan you find pointers that point to dead (in vacuum
> terms) tuples, the pointers in the index could be deleted. That could
> be done during insert into unique indexes before a split, to avoid
> the split.
>
> Chances are, if there are duplicates, at least a few of them will be
> dead.

My whole point is that that could easily fail to happen early enough
to prevent a page split that is only needed because of a short-term
surge in the number of duplicate versions that need to remain
available to one old snapshot. A page split can be a permanent
solution to a temporary problem. Page deletion can only occur under
tight conditions that are unlikely to *ever* be met in many cases.

Imagine if it were impossible to insert physical duplicates into
unique indexes. In that world, you'd end up bloating some overflow
data structure in UPDATE-heavy cases (where HOT doesn't work out). The
bloat wouldn't go on leaf pages, so you wouldn't get page splits, and
so you wouldn't end up with leaf pages that can only store 20 distinct
values now and forever, because that's the range of values represented
by the downlinks and the leaf's high key. That's a situation we
actually saw for the leftmost leaf page in Alik's Zipfian distribution
test.

The way that the keyspace is broken up is supposed to be balanced, and
to have long-term utility. Working against that to absorb a short-term
bloat problem is penny wise, pound foolish.

-- 
Peter Geoghegan
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On Thu, Jul 20, 2017 at 6:28 AM, Stephen Frost wrote:
> Greetings,
>
> * Sokolov Yura (y.soko...@postgrespro.ru) wrote:
>> I wrote two days ago about the vacuum ring buffer:
>> https://www.postgresql.org/message-id/8737e9bddb82501da1134f021bf4929a%40postgrespro.ru
>>
>> Increasing vacuum's ring buffer to the size of the bulk writer's one
>> reduces autovacuum time by 3-10 times. (For both the patched and
>> unpatched versions I used a single non-default setting,
>> 'autovacuum_cost_delay=2ms'.)
>>
>> This is a single-line change, and it improves things a lot.
>
> Right- when the database fits in the OS cache but not in
> shared_buffers.

On a system with a slow fsync, increasing the ring buffer helps a lot
even if the database doesn't fit in the OS cache. When the next buffer
allocation runs into a dirtied buffer in the ring, it needs to sync
the WAL up through that buffer's LSN before it can write it out and
reuse it. With a small ring, this means a lot of WAL flushing needs to
be done.

> I do agree that's a useful improvement to make based on your testing.
>
> It's not clear off-hand how much that would improve this case, as the
> database size appears to pretty quickly get beyond the OS memory size
> (and only in the first test is the DB starting size less than system
> memory to begin with).

Also, this system probably has a pretty fast fdatasync, considering it
is SSD.

Cheers,

Jeff
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On Thu, Jul 20, 2017 at 12:08 AM, Peter Geoghegan wrote:
>> The traditional wisdom about btrees, for instance, is that no matter
>> how full you pack them to start with, the steady state is going to
>> involve something like 1/3rd free space. You can call that bloat if
>> you want, but it's not likely that you'll be able to reduce the
>> number significantly without paying exorbitant costs.
>
> For the purposes of this discussion, I'm mostly talking about
> duplicates within a page on a unique index. If the keyspace owned by
> an int4 unique index page only covers 20 distinct values, it will
> only ever cover 20 distinct values, now and forever, despite the fact
> that there is room for about 400 (a 90/10 split leaves you with 366
> items + 1 high key).

Microvacuum could also help.

If during a scan you find pointers that point to dead (in vacuum
terms) tuples, the pointers in the index could be deleted. That could
be done during insert into unique indexes before a split, to avoid the
split.

Chances are, if there are duplicates, at least a few of them will be
dead.
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On 07/20/2017 06:28 AM, Stephen Frost wrote:
> It's not clear off-hand how much that would improve this case, as the
> database size appears to pretty quickly get beyond the OS memory size
> (and only in the first test is the DB starting size less than system
> memory to begin with).

FYI, I will be posting new numbers in a few hours. I had been planning
on posting this last night but... KDE.

JD

-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own. *
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
Greetings,

* Sokolov Yura (y.soko...@postgrespro.ru) wrote:
> I wrote two days ago about the vacuum ring buffer:
> https://www.postgresql.org/message-id/8737e9bddb82501da1134f021bf4929a%40postgrespro.ru
>
> Increasing vacuum's ring buffer to the size of the bulk writer's one
> reduces autovacuum time by 3-10 times. (For both the patched and
> unpatched versions I used a single non-default setting,
> 'autovacuum_cost_delay=2ms'.)
>
> This is a single-line change, and it improves things a lot.

Right- when the database fits in the OS cache but not in
shared_buffers. I do agree that's a useful improvement to make based
on your testing.

It's not clear off-hand how much that would improve this case, as the
database size appears to pretty quickly get beyond the OS memory size
(and only in the first test is the DB starting size less than system
memory to begin with).

Thanks!

Stephen
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On 2017-07-20 05:52, Masahiko Sawada wrote:
> On Thu, Jul 20, 2017 at 8:24 AM, Stephen Frost wrote:
>> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>>> "Joshua D. Drake" writes:
>>>> At PGConf US Philly last week I was talking with Jim and Jan about
>>>> performance. One of the items that came up is that PostgreSQL
>>>> can't run full throttle for long periods of time. The long and
>>>> short is that no matter what, autovacuum can't keep up. This is
>>>> what I have done:
>>>
>>> Try reducing autovacuum_vacuum_cost_delay more, and/or increasing
>>> autovacuum_vacuum_cost_limit.
>>
>> Or get rid of the cost delay entirely and let autovacuum actually go
>> as fast as it can when it's run. The assertion that it can't keep up
>> is still plausible, but configuring autovacuum to sleep regularly
>> and then complaining that it's not able to keep up doesn't make
>> sense.
>>
>> Reducing the nap time might also be helpful if autovacuum is going
>> as fast as it can and it's able to clear a table in less than a
>> minute.
>>
>> There have been discussions on this list about parallel vacuum of a
>> particular table as well; to address this issue I'd encourage
>> reviewing those discussions and looking at writing a patch to
>> implement that feature, as that would address the case where the
>> table is large enough that autovacuum simply can't get through all
>> of it before the other backends have used all space available and
>> then substantially increased the size of the relation (leading to
>> vacuum on the table running for longer).
>
> Yeah, the parallel vacuum of a particular table might help this issue
> unless disk I/O is the bottleneck. I'm planning to work on that.

I wrote two days ago about the vacuum ring buffer:
https://www.postgresql.org/message-id/8737e9bddb82501da1134f021bf4929a%40postgrespro.ru

Increasing vacuum's ring buffer to the size of the bulk writer's one
reduces autovacuum time by 3-10 times. (For both the patched and
unpatched versions I used a single non-default setting,
'autovacuum_cost_delay=2ms'.)

This is a single-line change, and it improves things a lot.
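To compare autovacuum timings before and after a change like this, one
option (a sketch) is to log every autovacuum run:

  alter system set log_autovacuum_min_duration = 0;
  select pg_reload_conf();
  -- Each autovacuum run is then logged with pages and tuples removed
  -- plus buffer usage, which makes before/after comparisons easy.

(The test configs shown in this thread have it disabled, at -1.)

With regards,
-- 
Sokolov Yura
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company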
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On Wed, Jul 19, 2017 at 7:57 PM, Tom Lane wrote:
> Peter Geoghegan writes:
>> My argument for the importance of index bloat to the more general
>> bloat problem is simple: any bloat that accumulates, that cannot be
>> cleaned up, will probably accumulate until it impacts performance
>> quite noticeably.
>
> But that just begs the question: *does* it accumulate indefinitely,
> or does it eventually reach a more-or-less steady state?

Yes, I believe it does reach a more-or-less steady state. It saturates
when there is a lot of contention, because then you actually can reuse
the bloat. If it didn't saturate, and instead became arbitrarily bad,
then we'd surely have heard about that before now. The bloat is not
entirely wasted, because it actually prevents you from getting even
more bloat in that part of the keyspace.

> The traditional wisdom about btrees, for instance, is that no matter
> how full you pack them to start with, the steady state is going to
> involve something like 1/3rd free space. You can call that bloat if
> you want, but it's not likely that you'll be able to reduce the
> number significantly without paying exorbitant costs.

For the purposes of this discussion, I'm mostly talking about
duplicates within a page on a unique index. If the keyspace owned by
an int4 unique index page only covers 20 distinct values, it will only
ever cover 20 distinct values, now and forever, despite the fact that
there is room for about 400 (a 90/10 split leaves you with 366 items +
1 high key).

I don't know if I should really even call this bloat, since the term
is so overloaded, although this is what other database systems call
index bloat. I like to think of it as "damage to the keyspace",
although that terminology seems unlikely to catch on.
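The per-page picture is visible with the pageinspect extension; a
sketch (this assumes block 1 happens to be a leaf page, which is
typical but not guaranteed):

  create extension if not exists pageinspect;

  select live_items, dead_items, avg_item_size, free_size
  from bt_page_stats('bmsql_order_line_pkey', 1);

-- 
Peter Geoghegan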
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
Peter Geoghegan writes:
> My argument for the importance of index bloat to the more general
> bloat problem is simple: any bloat that accumulates, that cannot be
> cleaned up, will probably accumulate until it impacts performance
> quite noticeably.

But that just begs the question: *does* it accumulate indefinitely, or
does it eventually reach a more-or-less steady state? The traditional
wisdom about btrees, for instance, is that no matter how full you pack
them to start with, the steady state is going to involve something
like 1/3rd free space. You can call that bloat if you want, but it's
not likely that you'll be able to reduce the number significantly
without paying exorbitant costs.

I'm not claiming that we don't have any problems, but I do think it's
important to draw a distinction between bloat and normal operating
overhead.

			regards, tom lane
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On Thu, Jul 20, 2017 at 8:24 AM, Stephen Frost wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> "Joshua D. Drake" writes:
>>> At PGConf US Philly last week I was talking with Jim and Jan about
>>> performance. One of the items that came up is that PostgreSQL can't
>>> run full throttle for long periods of time. The long and short is
>>> that no matter what, autovacuum can't keep up. This is what I have
>>> done:
>>
>> Try reducing autovacuum_vacuum_cost_delay more, and/or increasing
>> autovacuum_vacuum_cost_limit.
>
> Or get rid of the cost delay entirely and let autovacuum actually go
> as fast as it can when it's run. The assertion that it can't keep up
> is still plausible, but configuring autovacuum to sleep regularly and
> then complaining that it's not able to keep up doesn't make sense.
>
> Reducing the nap time might also be helpful if autovacuum is going as
> fast as it can and it's able to clear a table in less than a minute.
>
> There have been discussions on this list about parallel vacuum of a
> particular table as well; to address this issue I'd encourage
> reviewing those discussions and looking at writing a patch to
> implement that feature, as that would address the case where the
> table is large enough that autovacuum simply can't get through all of
> it before the other backends have used all space available and then
> substantially increased the size of the relation (leading to vacuum
> on the table running for longer).

Yeah, the parallel vacuum of a particular table might help this issue
unless disk I/O is the bottleneck. I'm planning to work on that.

Regards,

-- 
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On Wed, Jul 19, 2017 at 3:54 PM, Alvaro Herrera wrote:
>> Index bloat is a general problem that B-Trees have in all other
>> major systems, but I think that PostgreSQL has a tendency to allow
>> indexes to become progressively more bloated over time, in a way
>> that it often can never recover from [1].
>
> Interesting assertion.

I don't pretend to understand the complicated feedback loops that may
exist for workloads that are very reliant on pruning,
kill_prior_tuples cleanup, very aggressive vacuuming, etc. I'll just
say that it seems very reasonable to suppose that they exist.

My argument for the importance of index bloat to the more general
bloat problem is simple: any bloat that accumulates, that cannot be
cleaned up, will probably accumulate until it impacts performance
quite noticeably.

Index bloat may not seem as important as I suggest to some. The ideas
I'm talking about were received somewhat skeptically at pgCon, when I
brought them up informally. Maybe this is a more cogent explanation:
if your indexes were somehow magically never bloated, but the heap
could become just as bloated, then the heap bloat might not matter
that much, simply because the bloated heap pages wouldn't be accessed
by index scans. Heap page accesses may demonstrably be the bottleneck
today, without that magic in place, but perhaps only because indexes
point to the bloat in the first place. It could be a bit like the
situation with bloating UNDO segments in Oracle; those are not the
worst thing to have to bloat. And, the kill_prior_tuples stuff is only
as strong as the weakest link (oldest snapshot).

> Many years ago I set out to implement btree page merging[1] from
> ideas in a 1996 paper[2], though that work never saw the light of
> day. Maybe it can be valuable now.

Lots of other database systems don't implement B-Tree page merging,
because it's hard to make work with techniques like Lehman & Yao
B-Link trees, and the average-case space utilization still ends up
being good enough. It may be more worthwhile for us, though.
Apparently Kevin has some ideas here.

If I'm right about this index bloat issue, then I'd sooner tackle the
problem by preventing bloat in unique indexes in the first place, by
using some additional indirection, a topic that I've said plenty about
recently. I think that you can sometimes, though fairly rarely, see
cases that get *really* out of hand, with things like in-database
queues that have unusually aggressive update patterns where a great
many duplicates are generated in indexes [1]. Our handling of
duplicates in unique indexes [2] is surely a problem there.

> Another thing to consider is indirect indexes, wherein you get less
> heap bloat because more updates can become HOT.

I think that the stuff I'm talking about, having indirection for the
primary key and making sure unique indexes actually disallow even
physical duplicates, actually enables indirect indexes. Remember how I
pointed out issues with unique indexes and VACUUM when you presented
on it at the pgCon unconference? I think that those problems may be
solvable by centralizing everything: making duplicates within leaf
pages physically impossible for unique indexes (not just logically
impossible). Having one chokepoint for dealing with bloat cleanup for
every index is the entire point of your indirect index design, but
that requires that you actually have no ambiguity about what every
logical pointer (in every secondary index) points to.

> If we also allow heap to be pruned from line pointers by walking
> indexes to remove specific pointers, instead of requiring indexes to
> be scanned whole for the removal, as proposed by Andres, we could get
> further along.

Yeah. I talked to Robert about doing that just today. That seems very
worthwhile, no matter how it ends up being used (whether it's by
VACUUM, something closer to synchronous deletion, or whatever).

[1] https://brandur.org/postgres-queues
[2] https://wiki.postgresql.org/wiki/Key_normalization#Avoiding_unnecessary_unique_index_enforcement

-- 
Peter Geoghegan
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> "Joshua D. Drake" writes:
>> At PGConf US Philly last week I was talking with Jim and Jan about
>> performance. One of the items that came up is that PostgreSQL can't
>> run full throttle for long periods of time. The long and short is
>> that no matter what, autovacuum can't keep up. This is what I have
>> done:
>
> Try reducing autovacuum_vacuum_cost_delay more, and/or increasing
> autovacuum_vacuum_cost_limit.

Or get rid of the cost delay entirely and let autovacuum actually go
as fast as it can when it's run. The assertion that it can't keep up
is still plausible, but configuring autovacuum to sleep regularly and
then complaining that it's not able to keep up doesn't make sense.

Reducing the nap time might also be helpful if autovacuum is going as
fast as it can and it's able to clear a table in less than a minute.

There have been discussions on this list about parallel vacuum of a
particular table as well; to address this issue I'd encourage
reviewing those discussions and looking at writing a patch to
implement that feature, as that would address the case where the table
is large enough that autovacuum simply can't get through all of it
before the other backends have used all space available and then
substantially increased the size of the relation (leading to vacuum on
the table running for longer).

Thanks!

Stephen
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On 07/19/2017 03:29 PM, Tom Lane wrote:
> "Joshua D. Drake" writes:
>> At PGConf US Philly last week I was talking with Jim and Jan about
>> performance. One of the items that came up is that PostgreSQL can't
>> run full throttle for long periods of time. The long and short is
>> that no matter what, autovacuum can't keep up. This is what I have
>> done:
>
> Try reducing autovacuum_vacuum_cost_delay more, and/or increasing
> autovacuum_vacuum_cost_limit.

I would try autovacuum_vacuum_cost_delay = 0, and for any tables with
more than 1 million rows:

  autovacuum_vacuum_scale_factor: 0
  autovacuum_vacuum_threshold: 10 (perhaps even smaller)
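Applied to the test's big table, that might look like this (a sketch;
the values mirror the suggestion above):

  alter system set autovacuum_vacuum_cost_delay = 0;
  select pg_reload_conf();

  -- Per-table override for large, heavily updated tables:
  alter table bmsql_order_line set (
      autovacuum_vacuum_scale_factor = 0,
      autovacuum_vacuum_threshold = 10
  );

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development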
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
Peter Geoghegan wrote:
> Index bloat is a general problem that B-Trees have in all other major
> systems, but I think that PostgreSQL has a tendency to allow indexes
> to become progressively more bloated over time, in a way that it
> often can never recover from [1].

Interesting assertion. Many years ago I set out to implement btree
page merging[1] from ideas in a 1996 paper[2], though that work never
saw the light of day. Maybe it can be valuable now.

Another thing to consider is indirect indexes, wherein you get less
heap bloat because more updates can become HOT. If we also allow the
heap to be pruned from line pointers by walking indexes to remove
specific pointers, instead of requiring indexes to be scanned whole
for the removal, as proposed by Andres, we could get further along.

[1] https://www.postgresql.org/message-id/20020912235429.4714071a.alvhe...@atentus.com
[2] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.9961

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
"Joshua D. Drake" writes: > At PGConf US Philly last week I was talking with Jim and Jan about > performance. One of the items that came up is that PostgreSQL can't run > full throttle for long periods of time. The long and short is that no > matter what, autovacuum can't keep up. This is what I have done: Try reducing autovacuum_vacuum_cost_delay more, and/or increasing autovacuum_vacuum_cost_limit. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On Wed, Jul 19, 2017 at 3:11 PM, Joshua D. Drake wrote:
> The good news is, PostgreSQL is not doing half bad against 128
> connections with only 16 vCPU. The bad news is we more than doubled
> our disk size without getting reuse or bloat under control. The
> concern here is that under heavy write loads that are persistent, we
> will eventually bloat out and have to VACUUM FULL, no matter what. I
> know that Jan has done some testing, and the best he could get is
> something like 8 days before PostgreSQL became unusable (but don't
> quote me on that).
>
> I am open to comments, suggestions, running multiple tests with
> different parameters, or just leaving this in the archive for people
> to reference.

Did you see my blog post on Planet PostgreSQL from last night?

https://pgeoghegan.blogspot.com/2017/07/postgresql-index-bloat-microscope.html

Perhaps you could use my query to instrument an interesting index, to
see what that turns up. I would really like to get a better sense of
how often, and to what extent, index bloat is a problem that VACUUM is
just inherently incapable of keeping under control over time. The
timeline for performance to degrade with real production workloads is
very interesting to me. It's really hard to simulate certain types of
problems that you will see frequently in production.

Index bloat is a general problem that B-Trees have in all other major
systems, but I think that PostgreSQL has a tendency to allow indexes
to become progressively more bloated over time, in a way that it often
can never recover from [1]. This may be a particular problem with
unique indexes, where many physical duplicates accumulate in pages.
These duplicates are theoretically reclaimable, but due to how the
keyspace is split up, they will never actually be reclaimed [2].

[1] postgr.es/m/CAH2-Wzmf6intNY1ggiNzOziiO5Eq=DsXfeptODGxO=2j-i1...@mail.gmail.com
[2] https://wiki.postgresql.org/wiki/Key_normalization#VACUUM_and_nbtree_page_deletion

-- 
Peter Geoghegan