Re: [HACKERS] Vacuuming big btree indexes without pages with deleted items

2015-04-01 Thread Vladimir Borodin

 On 31 March 2015, at 23:33, Kevin Grittner kgri...@ymail.com wrote:
 
 Jim Nasby jim.na...@bluetreble.com wrote:
 On 3/27/15 5:15 AM, Vladimir Borodin wrote:
 
 The master writes this record to the xlog in the btvacuumscan function
 after vacuuming all index pages. If no pages had deleted items, the
 xlog record contains lastBlockVacuumed 0.
 
 In btree_xlog_vacuum the replica reads all blocks from
 lastBlockVacuumed to the last block of the index while applying this
 record, because there is no API in the buffer manager to check
 whether a page is unpinned.
 
 So if the index is quite big (200+ GB in the described case), this
 takes a long time.
 
 2. Is it possible not to write the xlog record with
 lastBlockVacuumed 0 in some cases? For example, when no pages were
 deleted.
 
 Possibly, but that's much higher risk. Without studying it, if we
 wanted to mess around with that it might actually make more sense
 to XLOG a set of blkno's that got vacuumed, but I suspect that
 wouldn't be a win.
 
 I feel pretty confident that it would be a win in some significant
 cases, but it could be worse in some cases by changing sequential
 access to random, unless we use heuristics to protect against
 that.  But...
 
 Or maybe there are some better ways of improving this situation?
 
 This is a start of a better way:
 
 http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ed5b87f96d473962ec5230fd820abfeaccb2069
 
 If we expand on that commit to cover non-MVCC index scans,
 index-only scans, and index scans of non-WAL-logged indexes, then
 this whole aspect of btree vacuum can be eliminated.  It seems
 extremely dubious that all of that could be done for 9.5, and it's
 certainly not material for back-patching to any stable branches,
 but it would be a more complete and better-performing fix than the
 alternatives being discussed here.

Kevin, thanks for your work in this direction.

This approach definitely seems better. It doesn't matter that it would not be
included in 9.5 or back-patched to stable versions. This thread is mostly
about what could be done in the future. If the other cases (including index-only
scans) were addressed in 9.6, for example, that would be really cool.

 
 --
 Kevin Grittner
 EDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company


--
May the force be with you…
https://simply.name



Re: [HACKERS] Vacuuming big btree indexes without pages with deleted items

2015-03-31 Thread Kevin Grittner
Jim Nasby jim.na...@bluetreble.com wrote:
 On 3/27/15 5:15 AM, Vladimir Borodin wrote:

 The master writes this record to the xlog in the btvacuumscan function
 after vacuuming all index pages. If no pages had deleted items, the
 xlog record contains lastBlockVacuumed 0.

 In btree_xlog_vacuum the replica reads all blocks from
 lastBlockVacuumed to the last block of the index while applying this
 record, because there is no API in the buffer manager to check
 whether a page is unpinned.

 So if the index is quite big (200+ GB in the described case), this
 takes a long time.

 2. Is it possible not to write the xlog record with
 lastBlockVacuumed 0 in some cases? For example, when no pages were
 deleted.

 Possibly, but that's much higher risk. Without studying it, if we
 wanted to mess around with that it might actually make more sense
 to XLOG a set of blkno's that got vacuumed, but I suspect that
 wouldn't be a win.

I feel pretty confident that it would be a win in some significant
cases, but it could be worse in some cases by changing sequential
access to random, unless we use heuristics to protect against
that.  But...

 Or maybe there are some better ways of improving this situation?

This is a start of a better way:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ed5b87f96d473962ec5230fd820abfeaccb2069

If we expand on that commit to cover non-MVCC index scans,
index-only scans, and index scans of non-WAL-logged indexes, then
this whole aspect of btree vacuum can be eliminated.  It seems
extremely dubious that all of that could be done for 9.5, and it's
certainly not material for back-patching to any stable branches,
but it would be a more complete and better-performing fix than the
alternatives being discussed here.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Vacuuming big btree indexes without pages with deleted items

2015-03-30 Thread Jim Nasby

On 3/27/15 5:15 AM, Vladimir Borodin wrote:

Hi all.

I have described [0] a problem with replicas falling behind after vacuuming a
relation with a big btree index. Replay gets stuck on a WAL record of
type XLOG_BTREE_VACUUM like the following (with lastBlockVacuumed 0):

rmgr: Btree   len (rec/tot): 20/52, tx:  0, lsn:
4115/56126DC0, prev 4115/56126D90, bkp: , desc: vacuum: rel
1663/16420/16796; blk 31222118, lastBlockVacuumed 0

The master writes this record to the xlog in the btvacuumscan [1] function after
vacuuming all index pages. If no pages had deleted items, the xlog record
contains lastBlockVacuumed 0.

In btree_xlog_vacuum [2] the replica reads all blocks from lastBlockVacuumed
to the last block of the index while applying this record, because there is
no API in the buffer manager to check whether a page is unpinned.

So if the index is quite big (200+ GB in the described case), this takes a
long time. So the questions are:

1. Is there any API in the buffer manager to determine that a page is
not in shared_buffers without reading it?

I don't know offhand, but since XLogReadBufferExtended already has a
mode argument, it wouldn't be too hard to add it there.
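
Just to illustrate the idea (not a patch): RBM_PROBE_ONLY below is a made-up
mode that would return a buffer only if the page is already in shared_buffers;
nothing like it exists today. With such a mode the replay loop could lock only
the pages that are actually cached, since a page that is not in shared_buffers
cannot be pinned by anyone:

/*
 * Illustration only.  RBM_PROBE_ONLY is a hypothetical ReadBufferMode that
 * would return InvalidBuffer unless the page is already resident in
 * shared_buffers; no such mode exists in the buffer manager today.
 */
BlockNumber blkno;

for (blkno = xlrec->lastBlockVacuumed + 1; blkno < xlrec->block; blkno++)
{
    Buffer  buf = XLogReadBufferExtended(xlrec->node, MAIN_FORKNUM,
                                         blkno, RBM_PROBE_ONLY);

    if (BufferIsValid(buf))         /* page was cached: lock and release */
    {
        LockBufferForCleanup(buf);
        UnlockReleaseBuffer(buf);
    }
    /* page not cached: no backend can hold a pin on it, nothing to do */
}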



2. Is it possible not to write the xlog record with lastBlockVacuumed 0
in some cases? For example, when no pages were deleted.


Possibly, but that's much higher risk. Without studying it, if we wanted 
to mess around with that it might actually make more sense to XLOG a set 
of blkno's that got vacuumed, but I suspect that wouldn't be a win.
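
Purely to sketch that alternative (the struct below is hypothetical, nothing
like it exists in the tree): the record would carry an explicit list of the
vacuumed block numbers instead of a single lastBlockVacuumed watermark:

/*
 * Hypothetical WAL record for the "log the vacuumed blknos" alternative;
 * it only shows the shape of the idea being discussed.
 */
typedef struct xl_btree_vacuum_blocks
{
    RelFileNode node;       /* index being vacuumed */
    uint32      nblocks;    /* number of entries in blknos[] */
    BlockNumber blknos[FLEXIBLE_ARRAY_MEMBER];  /* pages that were cleaned */
} xl_btree_vacuum_blocks;

Replay would then only need cleanup locks on the listed blocks, at the cost
of possibly turning one mostly-sequential pass into scattered lookups.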



Or maybe there are some better ways of improving this situation?

[0]
http://www.postgresql.org/message-id/fe82a9a7-0d52-41b5-a9ed-967f6927c...@simply.name
[1]
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/nbtree.c;hb=refs/heads/REL9_4_STABLE#l813
[2]
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/nbtxlog.c;hb=refs/heads/REL9_4_STABLE#l482

--
May the force be with you…
https://simply.name




--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




[HACKERS] Vacuuming big btree indexes without pages with deleted items

2015-03-27 Thread Vladimir Borodin
Hi all.

I have described [0] a problem with replicas falling behind after vacuuming a
relation with a big btree index. Replay gets stuck on a WAL record of type
XLOG_BTREE_VACUUM like the following (with lastBlockVacuumed 0):

rmgr: Btree   len (rec/tot): 20/52, tx:  0, lsn: 
4115/56126DC0, prev 4115/56126D90, bkp: , desc: vacuum: rel 
1663/16420/16796; blk 31222118, lastBlockVacuumed 0

The master writes this record to the xlog in the btvacuumscan [1] function after
vacuuming all index pages. If no pages had deleted items, the xlog record
contains lastBlockVacuumed 0.
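
(For context, the tail of btvacuumscan does roughly the following; this is a
paraphrased and simplified sketch of the 9.4 code at [1], not the verbatim
source.)

/*
 * Sketch, paraphrased from the end of btvacuumscan() in 9.4 [1].  When
 * standby info is WAL-logged, a final XLOG_BTREE_VACUUM record is written
 * for the last block of the index even if nothing was deleted; if no page
 * contained deleted items, vstate.lastBlockVacuumed is still 0 here.
 */
if (XLogStandbyInfoActive())        /* some further checks elided */
{
    Buffer  buf = ReadBufferExtended(rel, MAIN_FORKNUM, num_pages - 1,
                                     RBM_NORMAL, info->strategy);

    LockBufferForCleanup(buf);
    /* zero items to delete; this call emits the WAL record */
    _bt_delitems_vacuum(rel, buf, NULL, 0, vstate.lastBlockVacuumed);
    _bt_relbuf(rel, buf);
}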

In btree_xlog_vacuum [2] the replica reads all blocks from lastBlockVacuumed to
the last block of the index while applying this record, because there is no API
in the buffer manager to check whether a page is unpinned.
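
(The replay side looks roughly like this; again a paraphrased and simplified
sketch of the 9.4 code at [2], not the verbatim source. Since a record with
lastBlockVacuumed 0 covers the whole range, the standby ends up touching
essentially every block of the index.)

/*
 * Sketch, paraphrased from btree_xlog_vacuum() in 9.4 [2].  Because the
 * buffer manager has no call to ask "is this page pinned (or even in
 * cache)?" without reading it, replay reads every block in the range and
 * takes a cleanup lock on each buffer it gets back.
 */
if (HotStandbyActiveInReplay())
{
    BlockNumber blkno;

    for (blkno = xlrec->lastBlockVacuumed + 1; blkno < xlrec->block; blkno++)
    {
        /* like RBM_NORMAL, but a missing page is not logged as invalid */
        Buffer  buf = XLogReadBufferExtended(xlrec->node, MAIN_FORKNUM,
                                             blkno, RBM_NORMAL_NO_LOG);

        if (BufferIsValid(buf))
        {
            LockBufferForCleanup(buf);
            UnlockReleaseBuffer(buf);
        }
    }
}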

So if the index is quite big (200+ GB in the described case), this takes a
long time. So the questions are:

1. Is there any API in the buffer manager to determine that a page is not in
shared_buffers without reading it?
2. Is it possible not to write the xlog record with lastBlockVacuumed 0 in some
cases? For example, when no pages were deleted.

Or maybe there are some better ways of improving this situation?

[0] 
http://www.postgresql.org/message-id/fe82a9a7-0d52-41b5-a9ed-967f6927c...@simply.name
[1] 
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/nbtree.c;hb=refs/heads/REL9_4_STABLE#l813
[2] 
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/nbtxlog.c;hb=refs/heads/REL9_4_STABLE#l482

--
May the force be with you…
https://simply.name