Re: Cleaning up nbtree after logical decoding on standby work

2023-06-10 Thread Peter Geoghegan
On Fri, Jun 9, 2023 at 12:23 PM Peter Geoghegan  wrote:
> > I'm not sure there is that concensus (for me half the changes shouldn't be
> > done, the rest should be in 17), but in the end it doesn't matter that much.

I pushed this just now. I have also closed out the open item.

> > > --- a/src/include/utils/tuplesort.h
> > > +++ b/src/include/utils/tuplesort.h
> > > @@ -399,9 +399,7 @@ extern Tuplesortstate *tuplesort_begin_heap(TupleDesc 
> > > tupDesc,
> > >   
> > > int workMem, SortCoordinate coordinate,
> > >   
> > > int sortopt);
> > >  extern Tuplesortstate *tuplesort_begin_cluster(TupleDesc tupDesc,
> > > - 
> > >Relation indexRel,
> > > - 
> > >Relation heaprel,
> > > - 
> > >int workMem,
> > > + 
> > >Relation indexRel, int workMem,
> > >   
> > >SortCoordinate coordinate,
> > >   
> > >int sortopt);
> >
> > I think we should continue to provide the table here, even if we don't need 
> > it
> > today.
>
> I don't see why, but okay. I'll do it that way.

I didn't end up doing that in the version that I pushed (heaprel was
removed from tuplesort_begin_cluster in the final version after all),
since I couldn't justify the use of NewHeap over OldHeap at the call
site in heapam_handler.c. If you're interested in following up with
this yourself, I have no objections.

--
Peter Geoghegan




Re: Cleaning up nbtree after logical decoding on standby work

2023-06-09 Thread Peter Geoghegan
On Fri, Jun 9, 2023 at 9:40 PM Andres Freund  wrote:
> I don't think minimizing heaprel being passed around is a worthwhile goal, the
> contrary actually: It just makes it painful to use heaprel anywhere, because
> it causes precisely these cascading changes of adding/removing the parameter
> to a bunch of functions. If anything we should do the opposite.
>
>
> > It's not like nbtree ever really "used P_NEW". It doesn't actually
> > depend on any of the P_NEW handling inside bufmgr.c. It looks a little
> > like it might, but that's just an accident.
>
> That part I am entirely on board with, as mentioned earlier. It doesn't seem
> like 16 material though.

Obviously you shouldn't need a heaprel to lock a page. As it happened
GiST already worked without this sort of P_NEW idiom, which is why
commit 61b313e4 hardly made any changes at all to GiST, even though
the relevant parts of GiST are heavily based on nbtree. Did you just
forget to plaster similar heaprel arguments all over GiST and SP-GiST?

I'm really disappointed that you're still pushing back here, even
after I got a +1 on backpatching from Heikki. This should have been
straightforward.

-- 
Peter Geoghegan




Re: Cleaning up nbtree after logical decoding on standby work

2023-06-09 Thread Andres Freund
Hi,

On 2023-06-09 12:23:36 -0700, Peter Geoghegan wrote:
> On Fri, Jun 9, 2023 at 12:03 PM Andres Freund  wrote:
> > > My new plan is to commit this tomorrow, since the clear consensus is
> > > that we should go ahead with this for 16.
> >
> > I'm not sure there is that concensus (for me half the changes shouldn't be
> > done, the rest should be in 17), but in the end it doesn't matter that much.
> 
> Really? What parts are you opposed to in principle? I don't see why
> you wouldn't support everything or nothing for 17 (questions of style
> aside). I don't see what's ambiguous about what we should do here,
> barring the 16-or-17 question.

I don't think minimizing heaprel being passed around is a worthwhile goal, the
contrary actually: It just makes it painful to use heaprel anywhere, because
it causes precisely these cascading changes of adding/removing the parameter
to a bunch of functions. If anything we should do the opposite.


> It's not like nbtree ever really "used P_NEW". It doesn't actually
> depend on any of the P_NEW handling inside bufmgr.c. It looks a little
> like it might, but that's just an accident.

That part I am entirely on board with, as mentioned earlier. It doesn't seem
like 16 material though.

Greetings,

Andres Freund




Re: Cleaning up nbtree after logical decoding on standby work

2023-06-09 Thread Peter Geoghegan
On Fri, Jun 9, 2023 at 12:03 PM Andres Freund  wrote:
> From what I can tell it's largely consistent with other parameters of the
> respective function. E.g. btinsert(), _bt_doinsert() use camelCase for most
> parameters, so heapRel fits in.  There are a few cases where it's not obvious
> what the pattern is intended to be :/.

It's not 100% clear what the underlying principle is, but we mix
camelCase and underscore styles all the time, so that's always kinda
true.

> > My new plan is to commit this tomorrow, since the clear consensus is
> > that we should go ahead with this for 16.
>
> I'm not sure there is that concensus (for me half the changes shouldn't be
> done, the rest should be in 17), but in the end it doesn't matter that much.

Really? What parts are you opposed to in principle? I don't see why
you wouldn't support everything or nothing for 17 (questions of style
aside). I don't see what's ambiguous about what we should do here,
barring the 16-or-17 question.

It's not like nbtree ever really "used P_NEW". It doesn't actually
depend on any of the P_NEW handling inside bufmgr.c. It looks a little
like it might, but that's just an accident.

> > --- a/src/include/utils/tuplesort.h
> > +++ b/src/include/utils/tuplesort.h
> > @@ -399,9 +399,7 @@ extern Tuplesortstate *tuplesort_begin_heap(TupleDesc 
> > tupDesc,
> > 
> >   int workMem, SortCoordinate coordinate,
> > 
> >   int sortopt);
> >  extern Tuplesortstate *tuplesort_begin_cluster(TupleDesc tupDesc,
> > -   
> >  Relation indexRel,
> > -   
> >  Relation heaprel,
> > -   
> >  int workMem,
> > +   
> >  Relation indexRel, int workMem,
> > 
> >  SortCoordinate coordinate,
> > 
> >  int sortopt);
>
> I think we should continue to provide the table here, even if we don't need it
> today.

I don't see why, but okay. I'll do it that way.

-- 
Peter Geoghegan




Re: Cleaning up nbtree after logical decoding on standby work

2023-06-09 Thread Andres Freund
Hi,

On 2023-06-08 08:50:31 -0700, Peter Geoghegan wrote:
> On Thu, Jun 8, 2023 at 7:22 AM Alvaro Herrera  wrote:
> > IMO this kind of change definitely does not have place in a post-beta1
> > restructuring patch.  We rarely indulge in case-fixing exercises at any
> > other time, and I don't see any good argument why post-beta1 is a better
> > time for it.
>
> There is a glaring inconsistency. Now about half of the relevant
> functions in nbtree.h use "heaprel", while the other half use
> "heapRel". Obviously that's not the end of the world, but it's
> annoying. It's exactly the kind of case-fixing exercise that does tend
> to happen.

>From what I can tell it's largely consistent with other parameters of the
respective function. E.g. btinsert(), _bt_doinsert() use camelCase for most
parameters, so heapRel fits in.  There are a few cases where it's not obvious
what the pattern is intended to be :/.



> My new plan is to commit this tomorrow, since the clear consensus is
> that we should go ahead with this for 16.

I'm not sure there is that concensus (for me half the changes shouldn't be
done, the rest should be in 17), but in the end it doesn't matter that much.



> --- a/src/include/utils/tuplesort.h
> +++ b/src/include/utils/tuplesort.h
> @@ -399,9 +399,7 @@ extern Tuplesortstate *tuplesort_begin_heap(TupleDesc 
> tupDesc,
>   
> int workMem, SortCoordinate coordinate,
>   
> int sortopt);
>  extern Tuplesortstate *tuplesort_begin_cluster(TupleDesc tupDesc,
> - 
>Relation indexRel,
> - 
>Relation heaprel,
> - 
>int workMem,
> + 
>Relation indexRel, int workMem,
>   
>SortCoordinate coordinate,
>   
>int sortopt);

I think we should continue to provide the table here, even if we don't need it
today.

Greetings,

Andres Freund




Re: Cleaning up nbtree after logical decoding on standby work

2023-06-08 Thread Peter Geoghegan
On Thu, Jun 8, 2023 at 7:22 AM Alvaro Herrera  wrote:
> IMO this kind of change definitely does not have place in a post-beta1
> restructuring patch.  We rarely indulge in case-fixing exercises at any
> other time, and I don't see any good argument why post-beta1 is a better
> time for it.

There is a glaring inconsistency. Now about half of the relevant
functions in nbtree.h use "heaprel", while the other half use
"heapRel". Obviously that's not the end of the world, but it's
annoying. It's exactly the kind of case-fixing exercise that does tend
to happen.

I'm not going to argue this point any further, though. I will make
this change at a later date. That will introduce an inconsistency
between branches, of course, but apparently there isn't any
alternative.

> I suggest that you should strive to keep the patch as
> small as possible.

Attached is v4, which goes back to using "heaprel" in new-to-16 code.
As a result, it is slightly smaller than v3.

My new plan is to commit this tomorrow, since the clear consensus is
that we should go ahead with this for 16.

-- 
Peter Geoghegan


v4-0001-nbtree-Allocate-new-pages-in-separate-function.patch
Description: Binary data


Re: Cleaning up nbtree after logical decoding on standby work

2023-06-08 Thread Alvaro Herrera
On 2023-Jun-07, Peter Geoghegan wrote:

> On Wed, Jun 7, 2023 at 5:12 PM Andres Freund  wrote:

> > I don't really understand why the patch does s/heaprel/heapRel/.
> 
> That has been the style used within nbtree for many years now.

IMO this kind of change definitely does not have place in a post-beta1
restructuring patch.  We rarely indulge in case-fixing exercises at any
other time, and I don't see any good argument why post-beta1 is a better
time for it.  I suggest that you should strive to keep the patch as
small as possible.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Cómo ponemos nuestros dedos en la arcilla del otro. Eso es la amistad; jugar
al alfarero y ver qué formas se pueden sacar del otro" (C. Halloway en
La Feria de las Tinieblas, R. Bradbury)




Re: Cleaning up nbtree after logical decoding on standby work

2023-06-07 Thread Peter Geoghegan
On Wed, Jun 7, 2023 at 5:12 PM Andres Freund  wrote:
> -1. For me separating the P_NEW path makes a lot of sense, but isn't 16
> material.  I don't agree that it's a problem to have heaprel as a parameter in
> a bunch of places that don't strictly need it today.

As I've said, this is primarily about keeping all of the branches
consistent. I agree that there is no particular known consequence to
not doing this, and have said as much several times.

> I don't really understand why the patch does s/heaprel/heapRel/.

That has been the style used within nbtree for many years now.

> Most of these
> functions aren't actually using camelcase parameters? This part of the change
> just blows up the size, making it harder to review.

I wonder what made it impossibly hard to review the first time around.
The nbtree aspects of this work were pretty much written on
auto-pilot. I had no intention of making a fuss about it, but then I
never expected this push back.

-- 
Peter Geoghegan




Re: Cleaning up nbtree after logical decoding on standby work

2023-06-07 Thread Andres Freund
Hi,

On 2023-06-05 12:04:29 -0700, Peter Geoghegan wrote:
> On Sun, May 28, 2023 at 7:49 PM Heikki Linnakangas  wrote:
> > IMO this makes sense for v16. These new arguments were introduced in
> > v16, so if we have second thoughts, now is the right time to change
> > them, before v16 is released. It will reduce the backpatching effort in
> > the future; if we apply this in v17, then v16 will be the odd one out.
>
> My current plan is to commit this later in the week, unless there are
> further objections. Wednesday or possibly Thursday.

-1. For me separating the P_NEW path makes a lot of sense, but isn't 16
material.  I don't agree that it's a problem to have heaprel as a parameter in
a bunch of places that don't strictly need it today.

I don't really understand why the patch does s/heaprel/heapRel/. Most of these
functions aren't actually using camelcase parameters? This part of the change
just blows up the size, making it harder to review.


 12 files changed, 317 insertions(+), 297 deletions(-)
...

Greetings,

Andres Freund




Re: Cleaning up nbtree after logical decoding on standby work

2023-06-06 Thread Alvaro Herrera
On 2023-Jun-05, Peter Geoghegan wrote:

> My current plan is to commit this later in the week, unless there are
> further objections. Wednesday or possibly Thursday.

I've added this as an open item for 16, with Peter and Andres as owners.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Las cosas son buenas o malas segun las hace nuestra opinión" (Lisias)




Re: Cleaning up nbtree after logical decoding on standby work

2023-06-05 Thread Peter Geoghegan
On Sun, May 28, 2023 at 7:49 PM Heikki Linnakangas  wrote:
> IMO this makes sense for v16. These new arguments were introduced in
> v16, so if we have second thoughts, now is the right time to change
> them, before v16 is released. It will reduce the backpatching effort in
> the future; if we apply this in v17, then v16 will be the odd one out.

My current plan is to commit this later in the week, unless there are
further objections. Wednesday or possibly Thursday.

> Maybe add an assertion for that in _bt_search(), too. I know you added
> one in _bt_getroot(), and _bt_search() calls that as the very first
> thing. But I think it would be useful as documentation in _bt_search(), too.

Attached revision v3 does it that way.

> Maybe it would be more straightforward to always require heapRel in
> _bt_search() and _bt_getroot(), regardless of whether it's BT_READ or
> BT_WRITE, even if the functions don't use it with BT_READ. It would be
> less mental effort in the callers to just always pass in 'heapRel'.

Perhaps, but it would also necessitate keeping heapRel in
_bt_get_endpoint()'s signature. That would mean that
_bt_get_endpoint() would needlessly pass its own superfluous heapRel
arg to _bt_search(), while presumably never passing the same heapRel
to _bt_gettrueroot() (not to be confused with _bt_getroot) in the
"level == 0" case. These inconsistencies seem kind of jarring.

Besides, every _bt_search() caller must already understand that
_bt_search does non-obvious extra work for BT_WRITE callers -- that's
nothing new. The requirement that BT_WRITE callers pass a heapRel
exists precisely because code that is used only during BT_WRITE calls
might ultimately reach _bt_allocbuf() indirectly. The "no heapRel
required in BT_READ case" seems directly relevant to callers --
avoiding _bt_allocbuf() during _bt_search() calls during Hot Standby
(or during VACUUM) is a basic requirement that callers more or less
ask for and expect already. (Bear in mind that the new rule going
forward is that _bt_allocbuf() is the one and only choke point where
new pages/buffers can be allocated by nbtree, and the only possible
source of recovery conflicts during REDO besides opportunistic
deletion record conflicts -- so it really isn't strange for _bt_search
callers to be thinking about whether _bt_allocbuf is safe to call.)

-- 
Peter Geoghegan


v3-0001-nbtree-Allocate-new-pages-in-separate-function.patch
Description: Binary data


Re: Cleaning up nbtree after logical decoding on standby work

2023-05-29 Thread Drouvot, Bertrand

Hi,

On 5/26/23 7:28 PM, Peter Geoghegan wrote:

On Fri, May 26, 2023 at 1:56 AM Alvaro Herrera  wrote:

I suppose you're not thinking of applying this to current master, but
instead just leave it for when pg17 opens, right?  I mean, clearly it
seems far too invasive to put it in after beta1.


I was planning on targeting 16 with this. Although I only posted a
patch recently, I complained about the problems in this area shortly
after the code first went in. It's fairly obvious to me that the
changes made to nbtree went quite a bit further than they needed to.


Thanks Peter for the call out and the follow up on this!

As you already mentioned in this thread, all the changes I've done in
61b313e47e were purely "mechanical" as the main goal was to move forward the
logical decoding on standby thread and..


Admittedly that's partly because I'm an expert on this particular
code.



it was not obvious to me (as I'm not an expert as you are in this area) that
many of those changes were "excessive".


Now, to be fair to Bertrand, it *looks* more complicated than it
really is, in large part due to the obscure case where VACUUM finishes
an interrupted page split (during page deletion), which itself goes on
to cause a page split one level up.


Thanks ;-)

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Cleaning up nbtree after logical decoding on standby work

2023-05-28 Thread Heikki Linnakangas

On 29/05/2023 03:31, Michael Paquier wrote:

On Fri, May 26, 2023 at 04:48:37PM -0700, Peter Geoghegan wrote:

I'd have thought the subject line "Cleaning up nbtree after logical
decoding on standby work" made it quite clear that this patch was
targeting 16.


Hmm, okay.  I was understanding that as something for v17, honestly.


IMO this makes sense for v16. These new arguments were introduced in 
v16, so if we have second thoughts, now is the right time to change 
them, before v16 is released. It will reduce the backpatching effort in 
the future; if we apply this in v17, then v16 will be the odd one out.


For the patch itself:


@@ -75,6 +74,10
  * _bt_search() -- Search the tree for a particular scankey,
  * or more precisely for the first leaf page it could be on.
  *
+ * rel must always be provided.  heaprel must be provided by callers that pass
+ * access = BT_WRITE, since we may need to allocate a new root page for caller
+ * in passing (or when finishing a page split results in a parent page split).
+ *
  * The passed scankey is an insertion-type scankey (see nbtree/README),
  * but it can omit the rightmost column(s) of the index.
  *


Maybe add an assertion for that in _bt_search(), too. I know you added 
one in _bt_getroot(), and _bt_search() calls that as the very first 
thing. But I think it would be useful as documentation in _bt_search(), too.


Maybe it would be more straightforward to always require heapRel in 
_bt_search() and _bt_getroot(), regardless of whether it's BT_READ or 
BT_WRITE, even if the functions don't use it with BT_READ. It would be 
less mental effort in the callers to just always pass in 'heapRel'.


Overall, +1 on this patch, and +1 for committing it to v16.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Cleaning up nbtree after logical decoding on standby work

2023-05-28 Thread Michael Paquier
On Fri, May 26, 2023 at 04:48:37PM -0700, Peter Geoghegan wrote:
> I'd have thought the subject line "Cleaning up nbtree after logical
> decoding on standby work" made it quite clear that this patch was
> targeting 16.

Hmm, okay.  I was understanding that as something for v17, honestly.

> It's not refactoring work -- not really. The whole idea of outright
> removing the use of P_NEW in nbtree was where I landed with this after
> a couple of hours of work. In fact I almost posted a version without
> that, though that was worse in every way to my final approach.
> 
> I first voiced concerns about this whole area way back on April 4,
> which is only 3 days after commit 61b313e4 went in:
> 
> https://postgr.es/m/CAH2-Wz=jgryxwm74g1khst0znpunhezyjnvsjno2t3jswtb...@mail.gmail.com

Sure.  My take is that if this patch were to be sent at the beginning
of April, it could have been considered in v16.  However, deciding
such a matter at the end of May after beta1 has been released is a
different call.  You may want to make sure that the RMT is OK with
that, at the end.
--
Michael


signature.asc
Description: PGP signature


Re: Cleaning up nbtree after logical decoding on standby work

2023-05-26 Thread Peter Geoghegan
On Fri, May 26, 2023 at 4:05 PM Michael Paquier  wrote:
> On Fri, May 26, 2023 at 10:56:53AM +0200, Alvaro Herrera wrote:
> > I can't make up my mind about this.  What do others think?
>
> When I looked at the patch yesterday, my impression was that this
> would be material for v17 as it is refactoring work, not v16.

I'd have thought the subject line "Cleaning up nbtree after logical
decoding on standby work" made it quite clear that this patch was
targeting 16.

It's not refactoring work -- not really. The whole idea of outright
removing the use of P_NEW in nbtree was where I landed with this after
a couple of hours of work. In fact I almost posted a version without
that, though that was worse in every way to my final approach.

I first voiced concerns about this whole area way back on April 4,
which is only 3 days after commit 61b313e4 went in:

https://postgr.es/m/CAH2-Wz=jgryxwm74g1khst0znpunhezyjnvsjno2t3jswtb...@mail.gmail.com

--
Peter Geoghegan




Re: Cleaning up nbtree after logical decoding on standby work

2023-05-26 Thread Michael Paquier
On Fri, May 26, 2023 at 10:56:53AM +0200, Alvaro Herrera wrote:
> I suppose you're not thinking of applying this to current master, but
> instead just leave it for when pg17 opens, right?  I mean, clearly it
> seems far too invasive to put it in after beta1.  On the other hand,
> it's painful to know that we're going to have code that exists only on
> 16 and not any other release, in an area that's likely to have bugs here
> and there, so we're going to need to heavily adjust backpatches for 16
> especially.
> 
> I can't make up my mind about this.  What do others think?

When I looked at the patch yesterday, my impression was that this
would be material for v17 as it is refactoring work, not v16.
--
Michael


signature.asc
Description: PGP signature


Re: Cleaning up nbtree after logical decoding on standby work

2023-05-26 Thread Peter Geoghegan
On Fri, May 26, 2023 at 2:49 PM Andres Freund  wrote:
> What do we gain by not passing down the heap relation to those places?

Just clearer code, free from noisey changes. Easier when backpatching, too.

> If you're concerned about the efficiency of passing down the parameters,
> I doubt it will make a meaningful difference, because the parameter can
> just stay in the register to be passed down further.

I'm not concerned about the efficiency of passing down heapRel in so
many places.

As things stand, there is no suggestion that any _bt_getbuf() call is
exempt from the requirement to pass down a heaprel -- every caller
(internal and external) goes to the trouble of making sure that they
comply with the apparent requirement to supply a heapRel. In some
cases callers do so just to be able to read the metapage. Even code as
far removed from nbtree as heapam_relation_copy_for_cluster() will now
go to the trouble of passing its own heap rel, just to perform a
CLUSTER-based tuplesort. The relevant tuplesort call site even has
comments that try to justify this approach, with a reference to
_bt_log_reuse_page(). So heapam_handler.c now references a static
helper function private to nbtpage.c -- an obvious modularity
violation.

It's not even the modularity violation itself that bothers me. It's
just 100% unnecessary for heapam_relation_copy_for_cluster() to do any
of this, because there simply isn't going to be a call to
_bt_log_reuse_page() during its cluster tuplesort, no matter what.
This has nothing to do with any underlying implementation detail from
nbtree, or from any other index AM.

-- 
Peter Geoghegan




Re: Cleaning up nbtree after logical decoding on standby work

2023-05-26 Thread Andres Freund
Hi,

On 2023-05-25 18:50:31 -0700, Peter Geoghegan wrote:
> Commit 61b313e4, which prepared index access method code for the
> logical decoding on standbys work, made quite a few mechanical
> changes. Many routines were revised in order to make sure that heaprel
> was available in _bt_getbuf()'s P_NEW (new page allocation) path. But
> this went further than it really had to. Many of the changes to nbtree
> seem excessive.
> 
> We only need a heaprel in those code paths that might end up calling
> _bt_getbuf() with blkno = P_NEW. This includes most callers that pass
> access = BT_WRITE, and all callers that pass access = BT_READ. This
> doesn't have to be haphazard -- there just aren't that many places
> that can allocate new nbtree pages.

What do we gain by not passing down the heap relation to those places?
If you're concerned about the efficiency of passing down the parameters,
I doubt it will make a meaningful difference, because the parameter can
just stay in the register to be passed down further.

Note that I do agree with some aspects of the change for other reasons,
see below...

> It's just page splits, and new
> root page allocations (which are actually a slightly different kind of
> page split). The rule doesn't need to be complicated (to be fair it
> looks more complicated than it really is).
> 
> Attached patch completely removes the changes to _bt_getbuf()'s
> signature from 61b313e4. This is possible without any loss of
> functionality. The patch splits _bt_getbuf () in two: the code that
> handles BT_READ/BT_WRITE stays in _bt_getbuf(), which is now far
> shorter. Handling of new page allocation is moved to a new routine
> I've called _bt_alloc(). This is clearer in general, and makes it
> clear what the rules really are. Any code that might need to call
> _bt_alloc() must be prepared for that, by having a heaprel to pass to
> it (the slightly complicated case is interrupted page splits).

I think it's a very good idea to split the "new page" case off
_bt_getbuf().  We probably should evolve the design in the area, and
that will be easier with such a change.


Greetings,

Andres Freund




Re: Cleaning up nbtree after logical decoding on standby work

2023-05-26 Thread Peter Geoghegan
On Fri, May 26, 2023 at 10:28 AM Peter Geoghegan  wrote:
> I've added several defensive assertions that make it hard to get the
> details wrong. These will catch the issue much earlier than the main
> "heapRel != NULL" assertion in _bt_allocbuf(). So, the rules are
> reasonably straightforward and enforceable.

Though it's not an issue new to 16, or a problem that anybody is
obligated to deal with on this thread, I wonder: Why is it okay that
we're using "rel" (the index rel) for our TestForOldSnapshot() calls
in nbtree, rather than using heapRel? Why wouldn't the rules be the
same as they are for the new code paths needed for logical decoding on
standbys?  (Namely, the "use heapRel, not rel" rule.)

More concretely, I'm pretty sure that RelationIsUsedAsCatalogTable()
(which TestForOldSnapshot() relies on) gives an answer that would
change if we decided to pass heapRel to the main TestForOldSnapshot()
call within _bt_moveright(), instead of doing what we actually do,
which is to just pass it the index rel. I suppose that that
interaction might have been overlooked when bugfix commit bf9a60ee33
first added RelationIsUsedAsCatalogTable() -- since that happened a
couple of months after the initial "snapshot too old" commit went in,
a fix that happened under time pressure.

More generally, the high level rules/invariants that govern when
TestForOldSnapshot() should be called (and with what rel/snapshot)
feel less than worked out. I find it suspicious that there isn't any
attempt to relate TestForOldSnapshot() behaviors to the conceptually
similar PredicateLockPage() behavior. We don't need predicate locks on
internal pages, but TestForOldSnapshot() *does* get called for
internal pages. Many PredicateLockPage() calls happen very close to
TestForOldSnapshot() calls, each of which use the same snapshot -- not
addressing that seems like a glaring omission to me.

Basically it seems like there should be one standard set of rules for
all this stuff. Though it's not the fault of Bertrand or Andres, all
that we have now is two poorly documented sets of rules that partially
overlap. This has long bothered me.

-- 
Peter Geoghegan




Re: Cleaning up nbtree after logical decoding on standby work

2023-05-26 Thread Peter Geoghegan
On Fri, May 26, 2023 at 1:56 AM Alvaro Herrera  wrote:
> I suppose you're not thinking of applying this to current master, but
> instead just leave it for when pg17 opens, right?  I mean, clearly it
> seems far too invasive to put it in after beta1.

I was planning on targeting 16 with this. Although I only posted a
patch recently, I complained about the problems in this area shortly
after the code first went in. It's fairly obvious to me that the
changes made to nbtree went quite a bit further than they needed to.
Admittedly that's partly because I'm an expert on this particular
code.

> On the other hand,
> it's painful to know that we're going to have code that exists only on
> 16 and not any other release, in an area that's likely to have bugs here
> and there, so we're going to need to heavily adjust backpatches for 16
> especially.

Right -- it's important to keep things reasonably consistent to ease
backpatching. Though I doubt that changes to nbtree itself will turn
out to be buggy -- with or without my patch. The changes to nbtree
were all pretty mechanical. A little too mechanical, in fact.

As I said already, there just aren't that many ways that new nbtree
pages can come into existence -- it's naturally limited to page splits
(including root page splits), and the case where we need to add a new
root page that's also a leaf page at the point that the first ever
tuple is inserted into the index (before that we just have a metapage)
-- so I only have three _bt_allocbuf() callers to worry about. It's
completely self-evident (even to people that know little about nbtree)
that the only type of page access that could possibly need a heapRel
in the first place is P_NEW (i.e., a new page allocation). Once you
know all that, this situation begins to look much more
straightforward.

Now, to be fair to Bertrand, it *looks* more complicated than it
really is, in large part due to the obscure case where VACUUM finishes
an interrupted page split (during page deletion), which itself goes on
to cause a page split one level up. So it's possible (barely) that
VACUUM will enlarge an index by one page, which requires a heapRel,
just like any other place where an index is enlarged by a page split
(I documented all this in commit 35bc0ec7).

I've added several defensive assertions that make it hard to get the
details wrong. These will catch the issue much earlier than the main
"heapRel != NULL" assertion in _bt_allocbuf(). So, the rules are
reasonably straightforward and enforceable.

-- 
Peter Geoghegan




Re: Cleaning up nbtree after logical decoding on standby work

2023-05-26 Thread Peter Geoghegan
On Fri, May 26, 2023 at 12:46 AM Michael Paquier  wrote:
> Nice cleanup overall.

Thanks.

To be clear, when I said "it would be nice to get rid of P_NEW", what
I meant was that it would be nice to go much further than what I've
done in the patch by getting rid of the general idea of P_NEW. So the
handling of P_NEW at the top of ReadBuffer_common() would be removed,
for example. (Note that nbtree doesn't actually rely on that code,
even now; while its use of the P_NEW constant creates the impression
that it needs that bufmgr.c code, it actually doesn't, even now.)

> +if (XLogStandbyInfoActive() && RelationNeedsWAL(rel))
>  {
> -/* Okay to use page.  Initialize and return it. */
> -_bt_pageinit(page, BufferGetPageSize(buf));
> -return buf;
> +safexid = BTPageGetDeleteXid(page);
> +isCatalogRel = RelationIsAccessibleInLogicalDecoding(heaprel);
> +_bt_log_reuse_page(rel, blkno, safexid, isCatalogRel);
>
> There is only one caller of _bt_log_reuse_page(), so assigning a
> boolean rather than the heap relation is a bit strange to me.  I think
> that calling RelationIsAccessibleInLogicalDecoding() within
> _bt_log_reuse_page() where xlrec_reuse is filled with its data is much
> more natural, like HEAD.

Attached is v2, which deals with this by moving the code from
_bt_log_reuse_page() into _bt_allocbuf() itself -- there is no need
for a separate logging function. This structure seems like a clear
improvement, since such logging is largely the point of having a
separate _bt_allocbuf() function that deals with new page allocations
and requires a valid heapRel in all cases.

v2 also renames "heaprel" to "heapRel" in function signatures, for
consistency with older code that always used that convention.

-- 
Peter Geoghegan


v2-0001-nbtree-Remove-use-of-P_NEW.patch
Description: Binary data


Re: Cleaning up nbtree after logical decoding on standby work

2023-05-26 Thread Alvaro Herrera
On 2023-May-25, Peter Geoghegan wrote:

> Attached patch completely removes the changes to _bt_getbuf()'s
> signature from 61b313e4.

I suppose you're not thinking of applying this to current master, but
instead just leave it for when pg17 opens, right?  I mean, clearly it
seems far too invasive to put it in after beta1.  On the other hand,
it's painful to know that we're going to have code that exists only on
16 and not any other release, in an area that's likely to have bugs here
and there, so we're going to need to heavily adjust backpatches for 16
especially.

I can't make up my mind about this.  What do others think?

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La primera ley de las demostraciones en vivo es: no trate de usar el sistema.
Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)




Re: Cleaning up nbtree after logical decoding on standby work

2023-05-26 Thread Michael Paquier
On Thu, May 25, 2023 at 06:50:31PM -0700, Peter Geoghegan wrote:
> It's possible that Bertand would have done it this way to begin with
> were it not for the admittedly pretty bad nbtree convention around
> P_NEW. It would be nice to get rid of P_NEW in the near future, too --
> I gather that there was discussion of that in the context of recent
> work in this area.

Nice cleanup overall.

+if (XLogStandbyInfoActive() && RelationNeedsWAL(rel))
 {
-/* Okay to use page.  Initialize and return it. */
-_bt_pageinit(page, BufferGetPageSize(buf));
-return buf;
+safexid = BTPageGetDeleteXid(page);
+isCatalogRel = RelationIsAccessibleInLogicalDecoding(heaprel);
+_bt_log_reuse_page(rel, blkno, safexid, isCatalogRel);

There is only one caller of _bt_log_reuse_page(), so assigning a
boolean rather than the heap relation is a bit strange to me.  I think
that calling RelationIsAccessibleInLogicalDecoding() within
_bt_log_reuse_page() where xlrec_reuse is filled with its data is much
more natural, like HEAD.  One argument in favor of HEAD is that it is
not possible to pass down a wrong value for isCatalogRel, but your
patch would make that possible.
--
Michael


signature.asc
Description: PGP signature


Cleaning up nbtree after logical decoding on standby work

2023-05-25 Thread Peter Geoghegan
Commit 61b313e4, which prepared index access method code for the
logical decoding on standbys work, made quite a few mechanical
changes. Many routines were revised in order to make sure that heaprel
was available in _bt_getbuf()'s P_NEW (new page allocation) path. But
this went further than it really had to. Many of the changes to nbtree
seem excessive.

We only need a heaprel in those code paths that might end up calling
_bt_getbuf() with blkno = P_NEW. This includes most callers that pass
access = BT_WRITE, and all callers that pass access = BT_READ. This
doesn't have to be haphazard -- there just aren't that many places
that can allocate new nbtree pages. It's just page splits, and new
root page allocations (which are actually a slightly different kind of
page split). The rule doesn't need to be complicated (to be fair it
looks more complicated than it really is).

Attached patch completely removes the changes to _bt_getbuf()'s
signature from 61b313e4. This is possible without any loss of
functionality. The patch splits _bt_getbuf () in two: the code that
handles BT_READ/BT_WRITE stays in _bt_getbuf(), which is now far
shorter. Handling of new page allocation is moved to a new routine
I've called _bt_alloc(). This is clearer in general, and makes it
clear what the rules really are. Any code that might need to call
_bt_alloc() must be prepared for that, by having a heaprel to pass to
it (the slightly complicated case is interrupted page splits).

It's possible that Bertand would have done it this way to begin with
were it not for the admittedly pretty bad nbtree convention around
P_NEW. It would be nice to get rid of P_NEW in the near future, too --
I gather that there was discussion of that in the context of recent
work in this area.

-- 
Peter Geoghegan


v1-0001-nbtree-Remove-use-of-P_NEW.patch
Description: Binary data