Re: gist microvacuum doesn't appear to care about hot standby?

2019-04-05 Thread Andres Freund
Hi,

On 2018-12-21 02:40:18 +0300, Alexander Korotkov wrote:
> On Thu, Dec 20, 2018 at 1:41 AM Alexander Korotkov
>  wrote:
> > Please, find attached two patches I'm going to commit: for master and
> > for backpatching.
> 
> So, pushed.

I noticed that I didn't adapt this in

commit 558a9165e081d1936573e5a7d576f5febd7fb55a
Author: Andres Freund 
Date:   2019-03-26 14:41:46 -0700

Compute XID horizon for page level index vacuum on primary.


Attached you thus can find the conversion of gist to the new logic for
computing the horizon. Any comments?

Greetings,

Andres Freund
>From f12b1105e4a0c356083a8dbae81e6a2c11da7bbe Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 5 Apr 2019 22:01:51 -0700
Subject: [PATCH] Convert gist to compute page level xid horizon on primary.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/access/gist/gist.c |   8 +-
 src/backend/access/gist/gistxlog.c | 153 +
 src/backend/access/rmgrdesc/gistdesc.c |   7 +-
 src/include/access/gist_private.h  |   2 +-
 src/include/access/gistxlog.h  |   3 +-
 src/include/access/xlog_internal.h |   2 +-
 6 files changed, 18 insertions(+), 157 deletions(-)

diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 2db790c840c..769aca42e3b 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -1616,6 +1616,7 @@ gistprunepage(Relation rel, Page page, Buffer buffer, Relation heapRel)
 	int			ndeletable = 0;
 	OffsetNumber offnum,
 maxoff;
+	TransactionId latestRemovedXid = InvalidTransactionId;
 
 	Assert(GistPageIsLeaf(page));
 
@@ -1634,6 +1635,11 @@ gistprunepage(Relation rel, Page page, Buffer buffer, Relation heapRel)
 			deletable[ndeletable++] = offnum;
 	}
 
+	if (XLogStandbyInfoActive() && RelationNeedsWAL(rel))
+		latestRemovedXid =
+			index_compute_xid_horizon_for_tuples(rel, heapRel, buffer,
+ deletable, ndeletable);
+
 	if (ndeletable > 0)
 	{
 		START_CRIT_SECTION();
@@ -1658,7 +1664,7 @@ gistprunepage(Relation rel, Page page, Buffer buffer, Relation heapRel)
 
 			recptr = gistXLogDelete(buffer,
 	deletable, ndeletable,
-	heapRel->rd_node);
+	latestRemovedXid);
 
 			PageSetLSN(page, recptr);
 		}
diff --git a/src/backend/access/gist/gistxlog.c b/src/backend/access/gist/gistxlog.c
index 4fb1855e890..503db34d863 100644
--- a/src/backend/access/gist/gistxlog.c
+++ b/src/backend/access/gist/gistxlog.c
@@ -165,152 +165,6 @@ gistRedoPageUpdateRecord(XLogReaderState *record)
 		UnlockReleaseBuffer(buffer);
 }
 
-/*
- * Get the latestRemovedXid from the heap pages pointed at by the index
- * tuples being deleted. See also btree_xlog_delete_get_latestRemovedXid,
- * on which this function is based.
- */
-static TransactionId
-gistRedoDeleteRecordGetLatestRemovedXid(XLogReaderState *record)
-{
-	gistxlogDelete *xlrec = (gistxlogDelete *) XLogRecGetData(record);
-	OffsetNumber *todelete;
-	Buffer		ibuffer,
-hbuffer;
-	Page		ipage,
-hpage;
-	RelFileNode rnode;
-	BlockNumber blkno;
-	ItemId		iitemid,
-hitemid;
-	IndexTuple	itup;
-	HeapTupleHeader htuphdr;
-	BlockNumber hblkno;
-	OffsetNumber hoffnum;
-	TransactionId latestRemovedXid = InvalidTransactionId;
-	int			i;
-
-	/*
-	 * If there's nothing running on the standby we don't need to derive a
-	 * full latestRemovedXid value, so use a fast path out of here.  This
-	 * returns InvalidTransactionId, and so will conflict with all HS
-	 * transactions; but since we just worked out that that's zero people,
-	 * it's OK.
-	 *
-	 * XXX There is a race condition here, which is that a new backend might
-	 * start just after we look.  If so, it cannot need to conflict, but this
-	 * coding will result in throwing a conflict anyway.
-	 */
-	if (CountDBBackends(InvalidOid) == 0)
-		return latestRemovedXid;
-
-	/*
-	 * In what follows, we have to examine the previous state of the index
-	 * page, as well as the heap page(s) it points to.  This is only valid if
-	 * WAL replay has reached a consistent database state; which means that
-	 * the preceding check is not just an optimization, but is *necessary*. We
-	 * won't have let in any user sessions before we reach consistency.
-	 */
-	if (!reachedConsistency)
-		elog(PANIC, "gistRedoDeleteRecordGetLatestRemovedXid: cannot operate with inconsistent data");
-
-	/*
-	 * Get index page.  If the DB is consistent, this should not fail, nor
-	 * should any of the heap page fetches below.  If one does, we return
-	 * InvalidTransactionId to cancel all HS transactions.  That's probably
-	 * overkill, but it's safe, and certainly better than panicking here.
-	 */
-	XLogRecGetBlockTag(record, 0, , NULL, );
-	ibuffer = XLogReadBufferExtended(rnode, MAIN_FORKNUM, blkno, RBM_NORMAL);
-	if (!BufferIsValid(ibuffer))
-		return InvalidTransactionId;
-	LockBuffer(ibuffer, BUFFER_LOCK_EXCLUSIVE);
-	ipage = (Page) BufferGetPage(ibuffer);
-
-	/*
-	 * Loop through the deleted 

Re: gist microvacuum doesn't appear to care about hot standby?

2018-12-20 Thread Alexander Korotkov
On Thu, Dec 20, 2018 at 1:41 AM Alexander Korotkov
 wrote:
> Please, find attached two patches I'm going to commit: for master and
> for backpatching.

So, pushed.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: gist microvacuum doesn't appear to care about hot standby?

2018-12-19 Thread Alexander Korotkov
On Tue, Dec 18, 2018 at 2:04 AM Alexander Korotkov
 wrote:
> On Mon, Dec 17, 2018 at 3:35 PM Alexander Korotkov
>  wrote:
> > On Mon, Dec 17, 2018 at 3:40 AM Alexander Korotkov
> >  wrote:
> > > On Mon, Dec 17, 2018 at 1:25 AM Andres Freund  wrote:
> > > > On 2018-12-17 01:03:52 +0300, Alexander Korotkov wrote:
> > > > > Sorry for delay.  Attached patch implements conflict handling for gist
> > > > > microvacuum like btree and hash.  I'm going to push it if no
> > > > > objections.
> > > > >
> > > > > Note, that it implements new WAL record type.  So, new WAL can\t be
> > > > > replayed on old minor release.  I'm note sure if we claim that it's
> > > > > usually possible.  Should we state something explicitly for this case?
> > > >
> > > > Please hold off committing for a bit. Adding new WAL records in a minor
> > > > release ought to be very well considered and a measure of last resort.
> > > >
> > > > Couldn't we determine the xid horizon on the primary, and reuse an
> > > > existing WAL record to trigger the conflict?  Or something along those
> > > > lines?
> > >
> > > I thought about that, but decided it's better to mimic B-tree and hash
> > > behavior rather than invent new logic in a minor release.  But given
> > > that new WAL record in minor release in substantial problem, that
> > > argument doesn't matter.
> > >
> > > Yes, it seems to be possible.  We can determine xid horizon on primary
> > > in the same way you proposed for B-tree and hash [1] and use
> > > XLOG_HEAP2_CLEANUP_INFO record to trigger the conflict.  Do you like
> > > me to make such patch for GiST based on your patch?
> >
> > Got another tricky idea.  Now, deleted offset numbers are written to
> > buffer data.  We can also append them to record data.  So, basing on
> > record length we can resolve conflicts when offsets are provided in
> > record data.  Unpatched version will just ignore extra record data
> > tail.  That would cost us some redundant bigger wal records, but solve
> > other problems.  Any thoughts?
>
> Please, find backpatch version of patch implementing this approach
> attached.  I found it more attractive than placing xid horizon
> calculation to primary.  Because xid horizon calculation on primary is
> substantially new behavior, which is unwanted for backpatching.  I've
> not yet tested this patch.
>
> I'm going to test this patch including WAL compatibility.  If
> everything will be OK, then commit.

I've managed to reproduce the problem and test my backpatch solution.

primary (patched)
standby 1 (patched)
standby 2 (unpatched)

drop table if exists test;
create table test (p point) with (fillfactor = 50, autovacuum_enabled = false);
insert into test (select point(i % 100, i / 100) from
generate_series(0,) i);
vacuum test;
create index test_gist_idx on test using gist (p);
alter table test set (fillfactor = 100);

begin isolation level repeatable read;
select count(*) from test where p <@ box(point(0,0),point(99,99));
 count
---
 1
(1 row)

begin isolation level repeatable read;
select count(*) from test where p <@ box(point(0,0),point(99,99));
 count
---
 1
(1 row)

delete from test where p[0]::int % 10 = 0 and p[1]::int % 10 = 0;
set enable_seqscan = off;
set enable_bitmapscan = off;
set enable_indexonlyscan = off;
select count(*) from test where p <@ box(point(0,0),point(99,99));
insert into test (select point(i % 100, i / 100) from
generate_series(0,) i);

select count(*) from test where p <@ box(point(0,0),point(99,99));
 count
---
 1
(1 row)

select count(*) from test where p <@ box(point(0,0),point(99,99));
 count
---
  9961
(1 row)

select count(*) from test where p <@ box(point(0,0),point(99,99));
FATAL:  terminating connection due to conflict with recovery
DETAIL:  User query might have needed to see row versions that
must be removed.
HINT:  In a moment you should be able to reconnect to the database
and repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

So, two standbys were reading the same WAL generated by patched
primary.  Patched standby got conflict: it gives correct query answer
then drops transaction.  Unpatched replicate WAL stream without
conflict.  So, it gives wrong query answer as if it was reading WAL
from unpatched master.

If experimenting with unpatched primary, both standbys gives wrong
query answer without conflict.

Please, find attached two patches I'm going to commit: for master and
for backpatching.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


gist-microvacuum-conflict-handling-2.patch
Description: Binary data



Re: gist microvacuum doesn't appear to care about hot standby?

2018-12-17 Thread Alexander Korotkov
On Mon, Dec 17, 2018 at 3:35 PM Alexander Korotkov
 wrote:
> On Mon, Dec 17, 2018 at 3:40 AM Alexander Korotkov
>  wrote:
> > On Mon, Dec 17, 2018 at 1:25 AM Andres Freund  wrote:
> > > On 2018-12-17 01:03:52 +0300, Alexander Korotkov wrote:
> > > > Sorry for delay.  Attached patch implements conflict handling for gist
> > > > microvacuum like btree and hash.  I'm going to push it if no
> > > > objections.
> > > >
> > > > Note, that it implements new WAL record type.  So, new WAL can\t be
> > > > replayed on old minor release.  I'm note sure if we claim that it's
> > > > usually possible.  Should we state something explicitly for this case?
> > >
> > > Please hold off committing for a bit. Adding new WAL records in a minor
> > > release ought to be very well considered and a measure of last resort.
> > >
> > > Couldn't we determine the xid horizon on the primary, and reuse an
> > > existing WAL record to trigger the conflict?  Or something along those
> > > lines?
> >
> > I thought about that, but decided it's better to mimic B-tree and hash
> > behavior rather than invent new logic in a minor release.  But given
> > that new WAL record in minor release in substantial problem, that
> > argument doesn't matter.
> >
> > Yes, it seems to be possible.  We can determine xid horizon on primary
> > in the same way you proposed for B-tree and hash [1] and use
> > XLOG_HEAP2_CLEANUP_INFO record to trigger the conflict.  Do you like
> > me to make such patch for GiST based on your patch?
>
> Got another tricky idea.  Now, deleted offset numbers are written to
> buffer data.  We can also append them to record data.  So, basing on
> record length we can resolve conflicts when offsets are provided in
> record data.  Unpatched version will just ignore extra record data
> tail.  That would cost us some redundant bigger wal records, but solve
> other problems.  Any thoughts?

Please, find backpatch version of patch implementing this approach
attached.  I found it more attractive than placing xid horizon
calculation to primary.  Because xid horizon calculation on primary is
substantially new behavior, which is unwanted for backpatching.  I've
not yet tested this patch.

I'm going to test this patch including WAL compatibility.  If
everything will be OK, then commit.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


gist-microvacuum-conflict-handling-backpatch.patch
Description: Binary data


Re: gist microvacuum doesn't appear to care about hot standby?

2018-12-17 Thread Peter Geoghegan
On Sun, Dec 16, 2018 at 4:41 PM Alexander Korotkov
 wrote:
> I thought about that, but decided it's better to mimic B-tree and hash
> behavior rather than invent new logic in a minor release.  But given
> that new WAL record in minor release in substantial problem, that
> argument doesn't matter.

It might be better to mimic B-Tree and hash on the master branch.

-- 
Peter Geoghegan



Re: gist microvacuum doesn't appear to care about hot standby?

2018-12-17 Thread Alexander Korotkov
On Mon, Dec 17, 2018 at 3:40 AM Alexander Korotkov
 wrote:
> On Mon, Dec 17, 2018 at 1:25 AM Andres Freund  wrote:
> > On 2018-12-17 01:03:52 +0300, Alexander Korotkov wrote:
> > > Sorry for delay.  Attached patch implements conflict handling for gist
> > > microvacuum like btree and hash.  I'm going to push it if no
> > > objections.
> > >
> > > Note, that it implements new WAL record type.  So, new WAL can\t be
> > > replayed on old minor release.  I'm note sure if we claim that it's
> > > usually possible.  Should we state something explicitly for this case?
> >
> > Please hold off committing for a bit. Adding new WAL records in a minor
> > release ought to be very well considered and a measure of last resort.
> >
> > Couldn't we determine the xid horizon on the primary, and reuse an
> > existing WAL record to trigger the conflict?  Or something along those
> > lines?
>
> I thought about that, but decided it's better to mimic B-tree and hash
> behavior rather than invent new logic in a minor release.  But given
> that new WAL record in minor release in substantial problem, that
> argument doesn't matter.
>
> Yes, it seems to be possible.  We can determine xid horizon on primary
> in the same way you proposed for B-tree and hash [1] and use
> XLOG_HEAP2_CLEANUP_INFO record to trigger the conflict.  Do you like
> me to make such patch for GiST based on your patch?

Got another tricky idea.  Now, deleted offset numbers are written to
buffer data.  We can also append them to record data.  So, basing on
record length we can resolve conflicts when offsets are provided in
record data.  Unpatched version will just ignore extra record data
tail.  That would cost us some redundant bigger wal records, but solve
other problems.  Any thoughts?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: gist microvacuum doesn't appear to care about hot standby?

2018-12-16 Thread Alexander Korotkov
On Mon, Dec 17, 2018 at 1:25 AM Andres Freund  wrote:
> On 2018-12-17 01:03:52 +0300, Alexander Korotkov wrote:
> > On Thu, Dec 13, 2018 at 7:28 AM Alexander Korotkov
> >  wrote:
> > > On Thu, Dec 13, 2018 at 1:45 AM Andres Freund  wrote:
> > > > Is there any reason something like that isn't necessary for gist? If so,
> > > > where's that documented? If not, uh, isn't that a somewhat serious bug
> > > > in gist?
> > >
> > > Thank you for pointing!  This looks like a bug for me too.  I'm going
> > > to investigate more on this and provide a fix in next couple of days.
> >
> > Sorry for delay.  Attached patch implements conflict handling for gist
> > microvacuum like btree and hash.  I'm going to push it if no
> > objections.
> >
> > Note, that it implements new WAL record type.  So, new WAL can\t be
> > replayed on old minor release.  I'm note sure if we claim that it's
> > usually possible.  Should we state something explicitly for this case?
>
> Please hold off committing for a bit. Adding new WAL records in a minor
> release ought to be very well considered and a measure of last resort.
>
> Couldn't we determine the xid horizon on the primary, and reuse an
> existing WAL record to trigger the conflict?  Or something along those
> lines?

I thought about that, but decided it's better to mimic B-tree and hash
behavior rather than invent new logic in a minor release.  But given
that new WAL record in minor release in substantial problem, that
argument doesn't matter.

Yes, it seems to be possible.  We can determine xid horizon on primary
in the same way you proposed for B-tree and hash [1] and use
XLOG_HEAP2_CLEANUP_INFO record to trigger the conflict.  Do you like
me to make such patch for GiST based on your patch?

1. 
https://www.postgresql.org/message-id/20181214014235.dal5ogljs3bmlq44%40alap3.anarazel.de

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: gist microvacuum doesn't appear to care about hot standby?

2018-12-16 Thread Andres Freund
Hi,

On 2018-12-17 01:03:52 +0300, Alexander Korotkov wrote:
> On Thu, Dec 13, 2018 at 7:28 AM Alexander Korotkov
>  wrote:
> > On Thu, Dec 13, 2018 at 1:45 AM Andres Freund  wrote:
> > > Is there any reason something like that isn't necessary for gist? If so,
> > > where's that documented? If not, uh, isn't that a somewhat serious bug
> > > in gist?
> >
> > Thank you for pointing!  This looks like a bug for me too.  I'm going
> > to investigate more on this and provide a fix in next couple of days.
> 
> Sorry for delay.  Attached patch implements conflict handling for gist
> microvacuum like btree and hash.  I'm going to push it if no
> objections.
> 
> Note, that it implements new WAL record type.  So, new WAL can\t be
> replayed on old minor release.  I'm note sure if we claim that it's
> usually possible.  Should we state something explicitly for this case?

Please hold off committing for a bit. Adding new WAL records in a minor
release ought to be very well considered and a measure of last resort.

Couldn't we determine the xid horizon on the primary, and reuse an
existing WAL record to trigger the conflict?  Or something along those
lines?

Greetings,

Andres Freund



Re: gist microvacuum doesn't appear to care about hot standby?

2018-12-16 Thread Alexander Korotkov
On Thu, Dec 13, 2018 at 7:28 AM Alexander Korotkov
 wrote:
> On Thu, Dec 13, 2018 at 1:45 AM Andres Freund  wrote:
> > Is there any reason something like that isn't necessary for gist? If so,
> > where's that documented? If not, uh, isn't that a somewhat serious bug
> > in gist?
>
> Thank you for pointing!  This looks like a bug for me too.  I'm going
> to investigate more on this and provide a fix in next couple of days.

Sorry for delay.  Attached patch implements conflict handling for gist
microvacuum like btree and hash.  I'm going to push it if no
objections.

Note, that it implements new WAL record type.  So, new WAL can\t be
replayed on old minor release.  I'm note sure if we claim that it's
usually possible.  Should we state something explicitly for this case?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


gist-microvacuum-conflict-handling.patch
Description: Binary data


Re: gist microvacuum doesn't appear to care about hot standby?

2018-12-12 Thread Alexander Korotkov
On Thu, Dec 13, 2018 at 1:45 AM Andres Freund  wrote:
> Is there any reason something like that isn't necessary for gist? If so,
> where's that documented? If not, uh, isn't that a somewhat serious bug
> in gist?

Thank you for pointing!  This looks like a bug for me too.  I'm going
to investigate more on this and provide a fix in next couple of days.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company