Re: [HACKERS] Vacuuming big btree indexes without pages with deleted items
31 марта 2015 г., в 23:33, Kevin Grittner kgri...@ymail.com написал(а): Jim Nasby jim.na...@bluetreble.com wrote: On 3/27/15 5:15 AM, Vladimir Borodin wrote: Master writes this record to xlog in btvacuumscan function after vacuuming of all index pages. And in case of no pages with deleted items xlog record would contain lastBlockVacuumed 0. In btree_xlog_vacuum replica reads all blocks from lastBlockVacuumed to last block of the index while applying this record because there is no api in the buffer manager to understand if the page is unpinned. So if the index is quite big (200+ GB in described case) it takes much time to do it. 2. Is it possible not to write to xlog record with lastBlockVacuumed 0 in some cases? For example, in case of not deleting any pages. Possibly, but that's much higher risk. Without studying it, if we wanted to mess around with that it might actually make more sense to XLOG a set of blkno's that got vacuumed, but I suspect that wouldn't be a win. I feel pretty confident that it would be a win in some significant cases, but it could be worse in some cases by changing sequential access to random, unless we use heuristics to protect against that. But... Or maybe there are some better ways of improving this situation? This is a start of a better way: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ed5b87f96d473962ec5230fd820abfeaccb2069 If we expand on that commit to cover non-MVCC index scans, index-only scans, and index scans of non-WAL-logged indexes, then this whole aspect of btree vacuum can be eliminated. It seems extremely dubious that all of that could be done for 9.5, and it's certainly not material for back-patching to any stable branches, but it would be a more complete and better-performing fix than the alternatives being discussed here. Kevin, thanks for your work in this direction. This way seems to be definitely better. It doesn’t matter that it would not be included in 9.5 and back-patched to stable versions. This thread is mostly about what could be done in the future. If other cases (including index-only scans) would be addressed in 9.6, for example, that would be really cool. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- May the force be with you… https://simply.name
Re: [HACKERS] Vacuuming big btree indexes without pages with deleted items
Jim Nasby jim.na...@bluetreble.com wrote: On 3/27/15 5:15 AM, Vladimir Borodin wrote: Master writes this record to xlog in btvacuumscan function after vacuuming of all index pages. And in case of no pages with deleted items xlog record would contain lastBlockVacuumed 0. In btree_xlog_vacuum replica reads all blocks from lastBlockVacuumed to last block of the index while applying this record because there is no api in the buffer manager to understand if the page is unpinned. So if the index is quite big (200+ GB in described case) it takes much time to do it. 2. Is it possible not to write to xlog record with lastBlockVacuumed 0 in some cases? For example, in case of not deleting any pages. Possibly, but that's much higher risk. Without studying it, if we wanted to mess around with that it might actually make more sense to XLOG a set of blkno's that got vacuumed, but I suspect that wouldn't be a win. I feel pretty confident that it would be a win in some significant cases, but it could be worse in some cases by changing sequential access to random, unless we use heuristics to protect against that. But... Or maybe there are some better ways of improving this situation? This is a start of a better way: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ed5b87f96d473962ec5230fd820abfeaccb2069 If we expand on that commit to cover non-MVCC index scans, index-only scans, and index scans of non-WAL-logged indexes, then this whole aspect of btree vacuum can be eliminated. It seems extremely dubious that all of that could be done for 9.5, and it's certainly not material for back-patching to any stable branches, but it would be a more complete and better-performing fix than the alternatives being discussed here. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuuming big btree indexes without pages with deleted items
On 3/27/15 5:15 AM, Vladimir Borodin wrote: Hi all. I have described [0] a problem with delaying replicas after vacuuming a relation with big btree index. It stucks in replaying WAL record of type XLOG_BTREE_VACUUM like that (with lastBlockVacuumed 0): rmgr: Btree len (rec/tot): 20/52, tx: 0, lsn: 4115/56126DC0, prev 4115/56126D90, bkp: , desc: vacuum: rel 1663/16420/16796; blk 31222118, lastBlockVacuumed 0 Master writes this record to xlog in btvacuumscan [1] function after vacuuming of all index pages. And in case of no pages with deleted items xlog record would contain lastBlockVacuumed 0. In btree_xlog_vacuum [2] replica reads all blocks from lastBlockVacuumed to last block of the index while applying this record because there is no api in the buffer manager to understand if the page is unpinned. So if the index is quite big (200+ GB in described case) it takes much time to do it. So the questions are: 1. Aren’t there still any api in buffer manager to understand that the page is not in shared_buffers without reading it? I don't know offhand, but since XLogReadBufferExtended already has a mode argument it wouldn't be too hard to add it there. 2. Is it possible not to write to xlog record with lastBlockVacuumed 0 in some cases? For example, in case of not deleting any pages. Possibly, but that's much higher risk. Without studying it, if we wanted to mess around with that it might actually make more sense to XLOG a set of blkno's that got vacuumed, but I suspect that wouldn't be a win. Or maybe there are some better ways of improving this situation? [0] http://www.postgresql.org/message-id/fe82a9a7-0d52-41b5-a9ed-967f6927c...@simply.name [1] http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/nbtree.c;hb=refs/heads/REL9_4_STABLE#l813 [2] http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/nbtxlog.c;hb=refs/heads/REL9_4_STABLE#l482 -- May the force be with you… https://simply.name -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Vacuuming big btree indexes without pages with deleted items
Hi all. I have described [0] a problem with delaying replicas after vacuuming a relation with big btree index. It stucks in replaying WAL record of type XLOG_BTREE_VACUUM like that (with lastBlockVacuumed 0): rmgr: Btree len (rec/tot): 20/52, tx: 0, lsn: 4115/56126DC0, prev 4115/56126D90, bkp: , desc: vacuum: rel 1663/16420/16796; blk 31222118, lastBlockVacuumed 0 Master writes this record to xlog in btvacuumscan [1] function after vacuuming of all index pages. And in case of no pages with deleted items xlog record would contain lastBlockVacuumed 0. In btree_xlog_vacuum [2] replica reads all blocks from lastBlockVacuumed to last block of the index while applying this record because there is no api in the buffer manager to understand if the page is unpinned. So if the index is quite big (200+ GB in described case) it takes much time to do it. So the questions are: 1. Aren’t there still any api in buffer manager to understand that the page is not in shared_buffers without reading it? 2. Is it possible not to write to xlog record with lastBlockVacuumed 0 in some cases? For example, in case of not deleting any pages. Or maybe there are some better ways of improving this situation? [0] http://www.postgresql.org/message-id/fe82a9a7-0d52-41b5-a9ed-967f6927c...@simply.name [1] http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/nbtree.c;hb=refs/heads/REL9_4_STABLE#l813 [2] http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/nbtxlog.c;hb=refs/heads/REL9_4_STABLE#l482 -- May the force be with you… https://simply.name