Hi, On Fri, 10 May 2024 at 16:55, Nazir Bilal Yavuz <byavu...@gmail.com> wrote: > > Hi, > > On Fri, 10 May 2024 at 16:21, Nazir Bilal Yavuz <byavu...@gmail.com> wrote: > > > > Hi, > > > > On Fri, 10 May 2024 at 14:49, Alena Rybakina <lena.riback...@yandex.ru> > > wrote: > > > > > > Hi! I could try to check it with the test, but I want to ask you about > > > details, because I'm not sure that I completely understand the test case. > > > > > > You mean that we need to have two backends and on one of them we deleted > > > the tuples before vacuum called the other, do you? > > >
There should be some other backend(s) which will try to read the same buffer with the ongoing VACUUM operation. I think it works now but the reproduction steps are a bit racy. See: 1- Build Postgres with attached diff, it is the same see_previous_output.diff that I shared two mails ago. 2- Run Postgres, all settings are default. 3- Use two client backends, let's name them as A and B client backends. 4- On A client backend, run: CREATE TABLE vacuum_fix (aid int, bid int) with (autovacuum_enabled=false); INSERT INTO vacuum_fix SELECT *, * FROM generate_series(1, 20000000); VACUUM vacuum_fix; UPDATE vacuum_fix SET aid = aid + 1, bid = bid + 1; 5- Now it will be a bit racy, SQL commands below need to be run at the same time. The aim is for VACUUM on A client backend and SELECT on B client backend to read the same buffers at the same time. So, some of the buffers will be double counted. Firstly, run VACUUM on A client backend; immediately after running VACUUM, run SELECT on B backend. A client backend: VACUUM VERBOSE vacuum_fix; B client backend: SELECT * from vacuum_fix WHERE aid = -1; This is the output of the VACUUM VERBOSE on my end: INFO: vacuuming "test.public.vacuum_fix" INFO: finished vacuuming "test.public.vacuum_fix": index scans: 0 pages: 0 removed, 176992 remain, 176992 scanned (100.00% of total) ... ... buffer usage: 254181 hits, 99030 misses in the previous version, 99865 misses in the patched version, 106830 dirtied ... VACUUM Time: 2578.217 ms (00:02.578) VACUUM does not run parallel, so this test case does not trigger what is fixed in this thread. As it can be seen, there is ~1000 buffers difference. I am not sure if there is an easier way to reproduce this but I hope this helps. -- Regards, Nazir Bilal Yavuz Microsoft
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 84cc983b6e6..582973d575b 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -309,6 +309,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, PgStat_Counter startreadtime = 0, startwritetime = 0; WalUsage startwalusage = pgWalUsage; + int64 StartPageMiss = VacuumPageMiss; BufferUsage startbufferusage = pgBufferUsage; ErrorContextCallback errcallback; char **indnames = NULL; @@ -606,6 +607,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, StringInfoData buf; char *msgfmt; int32 diff; + int64 PageMissOp = VacuumPageMiss - StartPageMiss; double read_rate = 0, write_rate = 0; @@ -748,8 +750,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"), read_rate, write_rate); appendStringInfo(&buf, - _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"), + _("buffer usage: %lld hits, %lld misses in the previous version, %lld misses in the patched version, %lld dirtied\n"), (long long) (bufferusage.shared_blks_hit + bufferusage.local_blks_hit), + (long long) (PageMissOp), (long long) (bufferusage.shared_blks_read + bufferusage.local_blks_read), (long long) (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied)); appendStringInfo(&buf,