Re: heapgettup refactoring

2023-03-19 Thread David Rowley
On Wed, 8 Feb 2023 at 15:09, David Rowley  wrote:
> Using the tests mentioned in [1], I tested out
> remove_HeapScanDescData_rs_inited_field.patch. It's not looking very
> promising at all.

In light of the performance regression from removing the rs_inited
field, let's just forget doing that for now.  It does not really seem
that important compared to the other work that's already been done.

If one of us gets time during the v17 cycle, then maybe we can revisit it then.

I'll mark the patch as committed in the CF app.

David




Re: heapgettup refactoring

2023-02-07 Thread David Rowley
On Wed, 8 Feb 2023 at 09:41, Melanie Plageman  wrote:
>
> On Tue, Feb 07, 2023 at 05:40:13PM +1300, David Rowley wrote:
> > I ended up adjusting HeapScanDescData more than what is minimally
> > required to remove rs_inited. I wondered if rs_cindex should be closer
> > to rs_cblock in the struct so that we're more likely to be adjusting
> > the same cache line than if that field were closer to the end of the
> > struct.  We don't need rs_coffset and rs_cindex at the same time, so I
> > made it a union. I see that the struct is still 712 bytes before and
> > after this change. I've not yet tested to see if there are any
> > performance gains to this change.
> >
>
> I like the idea of using a union.

Using the tests mentioned in [1], I tested out
remove_HeapScanDescData_rs_inited_field.patch. It's not looking very
promising at all.

seqscan.sql test:

master (e2c78e7ab)
tps = 27.769076 (without initial connection time)
tps = 28.155233 (without initial connection time)
tps = 26.990389 (without initial connection time)

master + remove_HeapScanDescData_rs_inited_field.patch
tps = 23.990490 (without initial connection time)
tps = 23.450662 (without initial connection time)
tps = 23.600194 (without initial connection time)

master + remove_HeapScanDescData_rs_inited_field.patch without union
HeapScanDescData change (just remove rs_inited field)
tps = 24.419007 (without initial connection time)
tps = 24.221389 (without initial connection time)
tps = 24.187756 (without initial connection time)


countall.sql test:

master (e2c78e7ab)

tps = 33.999408 (without initial connection time)
tps = 33.664292 (without initial connection time)
tps = 33.869115 (without initial connection time)

master + remove_HeapScanDescData_rs_inited_field.patch
tps = 31.194316 (without initial connection time)
tps = 30.804987 (without initial connection time)
tps = 30.770236 (without initial connection time)

master + remove_HeapScanDescData_rs_inited_field.patch without union
HeapScanDescData change (just remove rs_inited field)
tps = 32.626187 (without initial connection time)
tps = 32.876362 (without initial connection time)
tps = 32.481729 (without initial connection time)

I don't really have any explanation for why this slows performance so
much. My thoughts are that if the performance of scans is this
sensitive to the order of the fields in the struct then it's an
independent project to learn out why and what we can realistically
change to get the best performance here.

> So, I was wondering what we should do about initializing rs_coffset here
> since it doesn't fall under "don't initialize it because it is only used
> for page-at-a-time mode". It might not be required for us to initialize
> it in initscan, but we do bother to initialize other "current scan
> state" fields. We could check if pagemode is enabled and initialize
> rs_coffset or rs_cindex depending on that.

Maybe master should be initialising this field already. I didn't quite
see it as important as it's never used before rs_inited is set to
true. Maybe setting it to InvalidOffsetNumber is a good idea just in
case something tries to use it before it gets set.

> >   * Note: when we fall off the end of the scan in either direction, we
> > - * reset rs_inited.  This means that a further request with the same
> > - * scan direction will restart the scan, which is a bit odd, but a
> > - * request with the opposite scan direction will start a fresh scan
> > + * reset rs_cblock to InvalidBlockNumber.  This means that a further 
> > request
> > + * with the same scan direction will restart the scan, which is a bit odd, 
> > but
> > + * a request with the opposite scan direction will start a fresh scan
> >   * in the proper direction.  The latter is required behavior for cursors,
> >   * while the former case is generally undefined behavior in Postgres
> >   * so we don't care too much.
>
> Not the fault of this patch, but I am having trouble parsing this
> comment. What does restart the scan mean? I get that it is undefined
> behavior, but it is confusing because it kind of sounds like a rescan
> which is not what it is (right?). And what exactly is a fresh scan? It
> is probably correct, I just don't really understand what it is saying.
> Feel free to ignore this aside, as I think your change is correctly
> updating the comment.

I struggled with this too. It just looks incorrect. As far as I see
it, once the scan ends we do the same thing regardless of what the
scan direction is. Maybe it's worth looking back at when that comment
was added and seeing if it was true when it was written and then see
what changed. I think it's worth improving that independently.

I think I'd like to focus on the cleanup stuff before looking into
getting rid of rs_inited. I'm not sure I'm going to get time to do the
required performance tests to look into why removing rs_inited slows
things down so much.

> Also, a few random other complaints about the comments in
> HeapScanDescData that are 

Re: heapgettup refactoring

2023-02-07 Thread Melanie Plageman
On Tue, Feb 07, 2023 at 05:40:13PM +1300, David Rowley wrote:
> On Fri, 3 Feb 2023 at 15:26, David Rowley  wrote:
> > I've pushed all but the final 2 patches now.
> 
> I just pushed the final patch in the series.

Cool!

> I held back on moving the setting of rs_inited back into the
> heapgettup_initial_block() helper function as I wondered if we should
> even keep that field.
> 
> It seems that rs_cblock == InvalidBlockNumber in all cases where
> rs_inited == false, so maybe it's better just to use that as a
> condition to check if the scan has started or not. I've attached a
> patch which does that.
> 
> I ended up adjusting HeapScanDescData more than what is minimally
> required to remove rs_inited. I wondered if rs_cindex should be closer
> to rs_cblock in the struct so that we're more likely to be adjusting
> the same cache line than if that field were closer to the end of the
> struct.  We don't need rs_coffset and rs_cindex at the same time, so I
> made it a union. I see that the struct is still 712 bytes before and
> after this change. I've not yet tested to see if there are any
> performance gains to this change.
> 

I like the idea of using a union.

> diff --git a/src/backend/access/heap/heapam.c 
> b/src/backend/access/heap/heapam.c
> index 7eb79cee58..e171d6e38b 100644
> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -321,13 +321,15 @@ initscan(HeapScanDesc scan, ScanKey key, bool 
> keep_startblock)
>   }
>  
>   scan->rs_numblocks = InvalidBlockNumber;
> - scan->rs_inited = false;
>   scan->rs_ctup.t_data = NULL;
>   ItemPointerSetInvalid(>rs_ctup.t_self);
>   scan->rs_cbuf = InvalidBuffer;
>   scan->rs_cblock = InvalidBlockNumber;
>  
> - /* page-at-a-time fields are always invalid when not rs_inited */
> + /*
> +  * page-at-a-time fields are always invalid when
> +  * rs_cblock == InvalidBlockNumber
> +  */

So, I was wondering what we should do about initializing rs_coffset here
since it doesn't fall under "don't initialize it because it is only used
for page-at-a-time mode". It might not be required for us to initialize
it in initscan, but we do bother to initialize other "current scan
state" fields. We could check if pagemode is enabled and initialize
rs_coffset or rs_cindex depending on that.

Then maybe the comment should call out the specific page-at-a-time
fields that are automatically invalid? (e.g. rs_ntuples, rs_vistuples)

I presume the point of the comment is to explain why those fields are
not being initialized here, which was a question I had when I looked at
initscan(), so it seems like we should make sure it explains that.

> @@ -717,9 +720,9 @@ heapgettup_advance_block(HeapScanDesc scan, BlockNumber 
> block, ScanDirection dir
>   * the scankeys.
>   *
>   * Note: when we fall off the end of the scan in either direction, we
> - * reset rs_inited.  This means that a further request with the same
> - * scan direction will restart the scan, which is a bit odd, but a
> - * request with the opposite scan direction will start a fresh scan
> + * reset rs_cblock to InvalidBlockNumber.  This means that a further request
> + * with the same scan direction will restart the scan, which is a bit odd, 
> but
> + * a request with the opposite scan direction will start a fresh scan
>   * in the proper direction.  The latter is required behavior for cursors,
>   * while the former case is generally undefined behavior in Postgres
>   * so we don't care too much.

Not the fault of this patch, but I am having trouble parsing this
comment. What does restart the scan mean? I get that it is undefined
behavior, but it is confusing because it kind of sounds like a rescan
which is not what it is (right?). And what exactly is a fresh scan? It
is probably correct, I just don't really understand what it is saying.
Feel free to ignore this aside, as I think your change is correctly
updating the comment.

> @@ -2321,13 +2316,12 @@ heapam_scan_sample_next_block(TableScanDesc scan, 
> SampleScanState *scanstate)
>   ReleaseBuffer(hscan->rs_cbuf);
>   hscan->rs_cbuf = InvalidBuffer;
>   hscan->rs_cblock = InvalidBlockNumber;
> - hscan->rs_inited = false;
>  
>   return false;
>   }
>  
>   heapgetpage(scan, blockno);
> - hscan->rs_inited = true;
> + Assert(hscan->rs_cblock != InvalidBlockNumber);

Quite nice to have this assert.

> diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
> index 8d28bc93ef..c6efd59eb5 100644
> --- a/src/include/access/heapam.h
> +++ b/src/include/access/heapam.h
> @@ -56,9 +56,18 @@ typedef struct HeapScanDescData
>   /* rs_numblocks is usually InvalidBlockNumber, meaning "scan whole rel" 
> */
>  
>   /* scan current state */
> - boolrs_inited;  /* false = scan not init'd yet 
> */
> - OffsetNumber rs_coffset;/* current offset # in 
> 

Re: heapgettup refactoring

2023-02-06 Thread David Rowley
On Fri, 3 Feb 2023 at 15:26, David Rowley  wrote:
> I've pushed all but the final 2 patches now.

I just pushed the final patch in the series.  I held back on moving
the setting of rs_inited back into the heapgettup_initial_block()
helper function as I wondered if we should even keep that field.

It seems that rs_cblock == InvalidBlockNumber in all cases where
rs_inited == false, so maybe it's better just to use that as a
condition to check if the scan has started or not. I've attached a
patch which does that.

I ended up adjusting HeapScanDescData more than what is minimally
required to remove rs_inited. I wondered if rs_cindex should be closer
to rs_cblock in the struct so that we're more likely to be adjusting
the same cache line than if that field were closer to the end of the
struct.  We don't need rs_coffset and rs_cindex at the same time, so I
made it a union. I see that the struct is still 712 bytes before and
after this change. I've not yet tested to see if there are any
performance gains to this change.

David
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7eb79cee58..e171d6e38b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -321,13 +321,15 @@ initscan(HeapScanDesc scan, ScanKey key, bool 
keep_startblock)
}
 
scan->rs_numblocks = InvalidBlockNumber;
-   scan->rs_inited = false;
scan->rs_ctup.t_data = NULL;
ItemPointerSetInvalid(>rs_ctup.t_self);
scan->rs_cbuf = InvalidBuffer;
scan->rs_cblock = InvalidBlockNumber;
 
-   /* page-at-a-time fields are always invalid when not rs_inited */
+   /*
+* page-at-a-time fields are always invalid when
+* rs_cblock == InvalidBlockNumber
+*/
 
/*
 * copy the scan key, if appropriate
@@ -355,7 +357,8 @@ heap_setscanlimits(TableScanDesc sscan, BlockNumber 
startBlk, BlockNumber numBlk
 {
HeapScanDesc scan = (HeapScanDesc) sscan;
 
-   Assert(!scan->rs_inited);   /* else too late to change */
+   /* we can't set any limits when there's a scan already running */
+   Assert(scan->rs_cblock == InvalidBlockNumber);
/* else rs_startblock is significant */
Assert(!(scan->rs_base.rs_flags & SO_ALLOW_SYNC));
 
@@ -493,7 +496,7 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 static BlockNumber
 heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir)
 {
-   Assert(!scan->rs_inited);
+   Assert(scan->rs_cblock == InvalidBlockNumber);
 
/* When there are no pages to scan, return InvalidBlockNumber */
if (scan->rs_nblocks == 0 || scan->rs_numblocks == 0)
@@ -559,7 +562,7 @@ heapgettup_start_page(HeapScanDesc scan, ScanDirection dir, 
int *linesleft,
 {
Pagepage;
 
-   Assert(scan->rs_inited);
+   Assert(scan->rs_cblock != InvalidBlockNumber);
Assert(BufferIsValid(scan->rs_cbuf));
 
/* Caller is responsible for ensuring buffer is locked if needed */
@@ -592,7 +595,7 @@ heapgettup_continue_page(HeapScanDesc scan, ScanDirection 
dir, int *linesleft,
 {
Pagepage;
 
-   Assert(scan->rs_inited);
+   Assert(scan->rs_cblock != InvalidBlockNumber);
Assert(BufferIsValid(scan->rs_cbuf));
 
/* Caller is responsible for ensuring buffer is locked if needed */
@@ -717,9 +720,9 @@ heapgettup_advance_block(HeapScanDesc scan, BlockNumber 
block, ScanDirection dir
  * the scankeys.
  *
  * Note: when we fall off the end of the scan in either direction, we
- * reset rs_inited.  This means that a further request with the same
- * scan direction will restart the scan, which is a bit odd, but a
- * request with the opposite scan direction will start a fresh scan
+ * reset rs_cblock to InvalidBlockNumber.  This means that a further request
+ * with the same scan direction will restart the scan, which is a bit odd, but
+ * a request with the opposite scan direction will start a fresh scan
  * in the proper direction.  The latter is required behavior for cursors,
  * while the former case is generally undefined behavior in Postgres
  * so we don't care too much.
@@ -737,12 +740,11 @@ heapgettup(HeapScanDesc scan,
OffsetNumber lineoff;
int linesleft;
 
-   if (unlikely(!scan->rs_inited))
+   if (unlikely(scan->rs_cblock == InvalidBlockNumber))
{
block = heapgettup_initial_block(scan, dir);
/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
Assert(block != InvalidBlockNumber || 
!BufferIsValid(scan->rs_cbuf));
-   scan->rs_inited = true;
}
else
{
@@ -824,7 +826,6 @@ continue_page:
scan->rs_cbuf = InvalidBuffer;
scan->rs_cblock = InvalidBlockNumber;
tuple->t_data = NULL;
-   scan->rs_inited = false;
 }
 
 /* 
@@ -852,12 +853,11 @@ heapgettup_pagemode(HeapScanDesc scan,
 

Re: heapgettup refactoring

2023-02-02 Thread David Rowley
On Fri, 3 Feb 2023 at 06:23, Melanie Plageman  wrote:
> Your code also switches to saving the OffsetNumber -- just in a separate
> variable of OffsetNumber type. I am fine with this if it your rationale
> is that it is not a good idea to store a smaller number in a larger
> datatype. However, the benefit I saw in reusing rs_cindex is that we
> could someday converge the code for heapgettup() and
> heapgettup_pagemode() even more. Even in my final refactor, there is a
> lot of duplicate code between the two.

I was more concerned about the reuse of an unrelated field.  I'm
struggling to imagine why using the separate field would cause any
issues around not being able to reduce the code duplication any more
than we otherwise would. Surely in one case you need to get the offset
by indexing the rs_vistuples[] array and the other is the offset
directly. The only thing I can think of that would allow us not to
have a condition there would be if we populated the rs_vistuples[]
array with 1..n.  I doubt should do that and if we did, we could just
use the rs_cindex to index that without having to worry that we're
using an unrelated field for something.

I've pushed all but the final 2 patches now.

David




Re: heapgettup refactoring

2023-02-02 Thread Melanie Plageman
On Thu, Feb 02, 2023 at 07:00:37PM +1300, David Rowley wrote:
> I've attached a version of the next patch in the series. I admit to
> making a couple of adjustments. I couldn't bring myself to remove the
> snapshot local variable in this commit. We can deal with that when we
> come to it in whichever patch that needs to be changed in.

That seems fine to keep the diff easy to understand. Also,
heapgettup_pagemode() didn't have a snapshot local variable either.

> The only other thing I really did was question your use of rs_cindex
> to store the last OffsetNumber.  I ended up adding a new field which
> slots into the padding between a bool and BlockNumber field named
> rs_coffset for this purpose. I noticed what Andres wrote [1] earlier
> in this thread about that, so thought we should move away from looking
> at the last tuple to get this number
> 
> [1] https://postgr.es/m/20221101010948.hsf33emgnwzvi...@awork3.anarazel.de

So, what Andres had said was: 

> Hm. Finding the next offset via rs_ctup doesn't seem quite right. For one,
> it's not actually that cheap to extract the offset from an ItemPointer because
> of the the way we pack it into ItemPointerData.

Because I was doing this in an earlier version:

> > + HeapTuple   tuple = &(scan->rs_ctup);

And then in the later part of the code got tuple->t_self.

I did this because the code in master does this:

  lineoff = /* previous offnum */
Min(lines,
  OffsetNumberPrev(ItemPointerGetOffsetNumber(&(tuple->t_self;

So I figured it was the same. Based on Andres' feedback, I switched to
saving the offset number in the scan descriptor and figured I could
reuse rs_cindex since it is larger than an OffsetNumber.

Your code also switches to saving the OffsetNumber -- just in a separate
variable of OffsetNumber type. I am fine with this if it your rationale
is that it is not a good idea to store a smaller number in a larger
datatype. However, the benefit I saw in reusing rs_cindex is that we
could someday converge the code for heapgettup() and
heapgettup_pagemode() even more. Even in my final refactor, there is a
lot of duplicate code between the two.

- Melanie




Re: heapgettup refactoring

2023-02-01 Thread David Rowley
On Thu, 2 Feb 2023 at 10:12, Melanie Plageman  wrote:
> FWIW, I like setting rs_inited in heapgettup_initial_block() better in
> the final refactor, but I agree with you that in this patch on its own
> it is better in the body of heapgettup() and heapgettup_pagemode().

We can reconsider that when we get to that patch. It just felt a bit
ugly to add an InvalidBlockNumber check after calling
table_block_parallelscan_nextpage()

> I believe this else is superfluous since we returned above.

TBH, that's on purpose. I felt that it just looked better that way as
the code all fitted onto my screen.  I think if the function was
longer and people had to scroll down to read it, it can often be
better to return and reduce the nesting. This allows you to mentally
not that a certain case is handled above.  However, since all these
helper functions seem to fit onto a screen without too much trouble,
it just seems better (to me) if they all follow the format that I
mentioned earlier. I might live to regret that as we often see
get-rid-of-useless-else-clause patches coming up. I'm starting to
wonder if someone's got some alarm that goes off every time one gets
committed, but we'll see. I'd much rather have consistency between the
helper functions than save a few bytes on tab characters. It would be
different if the indentation were shifting things too far right, but
that's not going to happen in a function that all fits onto a screen
at once.

I've attached a version of the next patch in the series. I admit to
making a couple of adjustments. I couldn't bring myself to remove the
snapshot local variable in this commit. We can deal with that when we
come to it in whichever patch that needs to be changed in.  The only
other thing I really did was question your use of rs_cindex to store
the last OffsetNumber.  I ended up adding a new field which slots into
the padding between a bool and BlockNumber field named rs_coffset for
this purpose. I noticed what Andres wrote [1] earlier in this thread
about that, so thought we should move away from looking at the last
tuple to get this number

I've attached the rebased and updated patch.

David

[1] https://postgr.es/m/20221101010948.hsf33emgnwzvi...@awork3.anarazel.de
From 32652b312d0c802e3105c064c24ff5b99694399c Mon Sep 17 00:00:00 2001
From: David Rowley 
Date: Thu, 2 Feb 2023 15:07:27 +1300
Subject: [PATCH v9] Further refactor of heapgettup and heapgettup_pagemode

Backward and forward scans share much of the same page acquisition code.
Here we consolidate that code to reduce some duplication.

Additionally, add a new rs_coffset field to HeapScanDescData to track the
offset of the current tuple.  The new field fits nicely into the padding
between a bool and BlockNumber field and saves having to look at the last
returned tuple to figure out which offset we should be looking at for the
current tuple.

Author: Melanie Plageman
Reviewed-by: David Rowley
Discussion: 
https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_t8qeqzqol02rpi...@mail.gmail.com
---
 src/backend/access/heap/heapam.c | 200 ++-
 src/include/access/heapam.h  |   1 +
 2 files changed, 63 insertions(+), 138 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b34e20a71f..ec6b7962c5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -576,101 +576,67 @@ heapgettup(HeapScanDesc scan,
BlockNumber block;
boolfinished;
Pagepage;
-   int lines;
OffsetNumber lineoff;
int linesleft;
ItemId  lpp;
 
-   /*
-* calculate next starting lineoff, given scan direction
-*/
-   if (ScanDirectionIsForward(dir))
+   if (unlikely(!scan->rs_inited))
{
-   if (!scan->rs_inited)
-   {
-   block = heapgettup_initial_block(scan, dir);
+   block = heapgettup_initial_block(scan, dir);
 
-   /*
-* Check if we have reached the end of the scan 
already. This
-* could happen if the table is empty or if the 
parallel workers
-* have already finished the scan before we did 
anything ourselves
-*/
-   if (block == InvalidBlockNumber)
-   {
-   Assert(!BufferIsValid(scan->rs_cbuf));
-   tuple->t_data = NULL;
-   return;
-   }
-   heapgetpage((TableScanDesc) scan, block);
-   lineoff = FirstOffsetNumber;/* first offnum */
-   scan->rs_inited = true;
-   }
-   else
+   /*
+* Check if we have reached the end of the scan already. This 
could
+* 

Re: heapgettup refactoring

2023-02-01 Thread Melanie Plageman
On Thu, Feb 02, 2023 at 12:21:20AM +1300, David Rowley wrote:
> On Tue, 31 Jan 2023 at 12:18, Melanie Plageman
>  wrote:
> > v7 attached
> 
> I've been looking over the v7-0002 patch today and I did make a few
> adjustments to heapgettup_initial_block() as I would prefer to see the
> branching of each of all these helper functions follow the pattern of:
> 
> if ()
> {
> if ()
> 
> else
> 
> }
> else
> {
> 
> }
> 
> which wasn't quite what the function was doing.

I'm fine with this. One code comment about the new version inline.

> Along the way, I noticed that 0002 has a subtle bug that does not seem
> to be present once the remaining patches are applied.  I think I'm
> happier to push these along the lines of how you have them in the
> patches, so I've held off pushing for now due to the bug and the
> change I had to make to fix it.
> 
> The problem is around the setting of scan->rs_inited = true; you've
> moved that into heapgettup_initial_block() and you've correctly not
> initialised the scan for empty tables when you return
> InvalidBlockNumber, however, you've not correctly considered the fact
> that table_block_parallelscan_nextpage() could also return
> InvalidBlockNumber if the parallel workers manage to grab all of the
> blocks before the current process gets the first block. I don't know
> for sure, but it looks like this could cause problems when
> heapgettup() or heapgettup_pagemode() got called again for a rescan.
> We'd have returned the NULL tuple to indicate that no further tuples
> exist, but we'll have left rs_inited set to true which looks like
> it'll cause issues.

Ah, yes. In the later patches in the series, I handle all end of scan
cases (regardless of whether or not there was a beginning) in a single
place at the end of the function. There I release the buffer and reset
all state -- including setting rs_inited to false. So, that made it okay
to set rs_inited to true in heapgettup_initial_block().

When splitting it up, I made a mistake and missed the case you
mentioned. Thanks for catching that!

FWIW, I like setting rs_inited in heapgettup_initial_block() better in
the final refactor, but I agree with you that in this patch on its own
it is better in the body of heapgettup() and heapgettup_pagemode().
 
> I wondered if it might be better to do the scan->rs_inited = true; in
> heapgettup() and heapgettup_pagemode() instead. The attached v8 patch
> does it this way. Despite this fixing that bug, I think this might be
> a slightly better division of duties.

LGTM.

> From cbd37463bdaa96afed4c7c739c8e91b770a9f8a7 Mon Sep 17 00:00:00 2001
> From: David Rowley 
> Date: Wed, 1 Feb 2023 19:35:16 +1300
> Subject: [PATCH v8] Refactor heapam.c adding heapgettup_initial_block function
> 
> Here we adjust heapgettup() and heapgettup_pagemode() to move the code
> that fetches the first block out into a helper function.  This removes
> some code duplication.
> 
> Author: Melanie Plageman
> Reviewed-by: David Rowley
> Discussion: 
> https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_t8qeqzqol02rpi...@mail.gmail.com
> ---
>  src/backend/access/heap/heapam.c | 225 ++-
>  1 file changed, 103 insertions(+), 122 deletions(-)
> 
> diff --git a/src/backend/access/heap/heapam.c 
> b/src/backend/access/heap/heapam.c
> index 0a8bac25f5..40168cc9ca 100644
> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -483,6 +483,67 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
>   scan->rs_ntuples = ntup;
>  }
>  
> +/*
> + * heapgettup_initial_block - return the first BlockNumber to scan
> + *
> + * Returns InvalidBlockNumber when there are no blocks to scan.  This can
> + * occur with empty tables and in parallel scans when parallel workers get 
> all
> + * of the pages before we can get a chance to get our first page.
> + */
> +static BlockNumber
> +heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir)
> +{
> + Assert(!scan->rs_inited);
> +
> + /* When there are no pages to scan, return InvalidBlockNumber */
> + if (scan->rs_nblocks == 0 || scan->rs_numblocks == 0)
> + return InvalidBlockNumber;
> +
> + if (ScanDirectionIsForward(dir))
> + {
> + /* serial scan */
> + if (scan->rs_base.rs_parallel == NULL)
> + return scan->rs_startblock;

I believe this else is superfluous since we returned above.

> + else
> + {
> + /* parallel scan */
> + 
> table_block_parallelscan_startblock_init(scan->rs_base.rs_rd,
> + 
>  scan->rs_parallelworkerdata,
> + 
>  (ParallelBlockTableScanDesc) 
> scan->rs_base.rs_parallel);
> +
> + /* may return InvalidBlockNumber 

Re: heapgettup refactoring

2023-02-01 Thread David Rowley
On Tue, 31 Jan 2023 at 12:18, Melanie Plageman
 wrote:
> v7 attached

I've been looking over the v7-0002 patch today and I did make a few
adjustments to heapgettup_initial_block() as I would prefer to see the
branching of each of all these helper functions follow the pattern of:

if ()
{
if ()

else

}
else
{

}

which wasn't quite what the function was doing.

Along the way, I noticed that 0002 has a subtle bug that does not seem
to be present once the remaining patches are applied.  I think I'm
happier to push these along the lines of how you have them in the
patches, so I've held off pushing for now due to the bug and the
change I had to make to fix it.

The problem is around the setting of scan->rs_inited = true; you've
moved that into heapgettup_initial_block() and you've correctly not
initialised the scan for empty tables when you return
InvalidBlockNumber, however, you've not correctly considered the fact
that table_block_parallelscan_nextpage() could also return
InvalidBlockNumber if the parallel workers manage to grab all of the
blocks before the current process gets the first block. I don't know
for sure, but it looks like this could cause problems when
heapgettup() or heapgettup_pagemode() got called again for a rescan.
We'd have returned the NULL tuple to indicate that no further tuples
exist, but we'll have left rs_inited set to true which looks like
it'll cause issues.

I wondered if it might be better to do the scan->rs_inited = true; in
heapgettup() and heapgettup_pagemode() instead. The attached v8 patch
does it this way. Despite this fixing that bug, I think this might be
a slightly better division of duties.

If you're ok with the attached (and everyone else is too), then I can
push it in the (NZ) morning.  The remaining patches would need to be
rebased due to my changes.

David
From cbd37463bdaa96afed4c7c739c8e91b770a9f8a7 Mon Sep 17 00:00:00 2001
From: David Rowley 
Date: Wed, 1 Feb 2023 19:35:16 +1300
Subject: [PATCH v8] Refactor heapam.c adding heapgettup_initial_block function

Here we adjust heapgettup() and heapgettup_pagemode() to move the code
that fetches the first block out into a helper function.  This removes
some code duplication.

Author: Melanie Plageman
Reviewed-by: David Rowley
Discussion: 
https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_t8qeqzqol02rpi...@mail.gmail.com
---
 src/backend/access/heap/heapam.c | 225 ++-
 1 file changed, 103 insertions(+), 122 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0a8bac25f5..40168cc9ca 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -483,6 +483,67 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
scan->rs_ntuples = ntup;
 }
 
+/*
+ * heapgettup_initial_block - return the first BlockNumber to scan
+ *
+ * Returns InvalidBlockNumber when there are no blocks to scan.  This can
+ * occur with empty tables and in parallel scans when parallel workers get all
+ * of the pages before we can get a chance to get our first page.
+ */
+static BlockNumber
+heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir)
+{
+   Assert(!scan->rs_inited);
+
+   /* When there are no pages to scan, return InvalidBlockNumber */
+   if (scan->rs_nblocks == 0 || scan->rs_numblocks == 0)
+   return InvalidBlockNumber;
+
+   if (ScanDirectionIsForward(dir))
+   {
+   /* serial scan */
+   if (scan->rs_base.rs_parallel == NULL)
+   return scan->rs_startblock;
+   else
+   {
+   /* parallel scan */
+   
table_block_parallelscan_startblock_init(scan->rs_base.rs_rd,
+   
 scan->rs_parallelworkerdata,
+   
 (ParallelBlockTableScanDesc) 
scan->rs_base.rs_parallel);
+
+   /* may return InvalidBlockNumber if there are no more 
blocks */
+   return 
table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
+   
 scan->rs_parallelworkerdata,
+   
 (ParallelBlockTableScanDesc) 
scan->rs_base.rs_parallel);
+   }
+   }
+   else
+   {
+   /* backward parallel scan not supported */
+   Assert(scan->rs_base.rs_parallel == NULL);
+
+   /*
+* Disable reporting to syncscan logic in a backwards scan; 
it's not
+* very likely anyone else is doing the same thing at the same 
time,
+* and much more likely that we'll just bollix things for 

Re: heapgettup refactoring

2023-01-31 Thread David Rowley
On Tue, 31 Jan 2023 at 12:18, Melanie Plageman
 wrote:
>
> On Fri, Jan 27, 2023 at 10:34 PM David Rowley  wrote:
> > 4. I think it might be a good idea to use unlikely() in if
> > (!scan->rs_inited). The idea is to help coax the compiler into moving
> > that code off to a cold path. That's likely especially important if
> > heapgettup_initial_block is inlined, which I see it is marked as.
>
> I've gone ahead and added unlikely. However, should I perhaps skip
> inlining the heapgettup_initial_block() function?

I'm not sure of the exact best combination of functions to mark as
inline. I did try the v7 patchset from 0002 to 0006 on top of c2891175
and I found that the performance is slightly better after removing
inline from all 4 of the helper functions. However, I think if we do
unlikely() and the function is moved into the cold path then it
matters less if it's inlined.

create table a (a int);
insert into a select x from generate_series(1,100)x;
vacuum freeze a;

$ cat seqscan.sql
select * from a where a = 0;
$ cat countall.sql
select count(*) from a;

seqscan.sql filters out all rows and countall.sql returns all rows and
does an aggregate so we don't have to return all those in the query.

max_parallel_workers_per_gather=0;

master
$ psql -c "select pg_prewarm('a')" postgres > /dev/null && for i in
{1..3}; do pgbench -n -f seqscan.sql -M prepared -T 10 postgres | grep
tps; done
tps = 25.464091 (without initial connection time)
tps = 25.117001 (without initial connection time)
tps = 25.141646 (without initial connection time)

$ psql -c "select pg_prewarm('a')" postgres > /dev/null && for i in
{1..3}; do pgbench -n -f countall.sql -M prepared -T 10 postgres |
grep tps; done
tps = 27.906307 (without initial connection time)
tps = 27.527580 (without initial connection time)
tps = 27.563035 (without initial connection time)

master + v7
$ psql -c "select pg_prewarm('a')" postgres > /dev/null && for i in
{1..3}; do pgbench -n -f seqscan.sql -M prepared -T 10 postgres | grep
tps; done
tps = 25.920370 (without initial connection time)
tps = 25.680052 (without initial connection time)
tps = 24.988895 (without initial connection time)

$ psql -c "select pg_prewarm('a')" postgres > /dev/null && for i in
{1..3}; do pgbench -n -f countall.sql -M prepared -T 10 postgres |
grep tps; done
tps = 33.783122 (without initial connection time)
tps = 33.248571 (without initial connection time)
tps = 33.512984 (without initial connection time)

master + v7 + inline removed
$ psql -c "select pg_prewarm('a')" postgres > /dev/null && for i in
{1..3}; do pgbench -n -f seqscan.sql -M prepared -T 10 postgres | grep
tps; done
tps = 27.680115 (without initial connection time)
tps = 26.418562 (without initial connection time)
tps = 26.166800 (without initial connection time)

$ psql -c "select pg_prewarm('a')" postgres > /dev/null && for i in
{1..3}; do pgbench -n -f countall.sql -M prepared -T 10 postgres |
grep tps; done
tps = 33.948588 (without initial connection time)
tps = 33.684966 (without initial connection time)
tps = 33.946700 (without initial connection time)

You can see that v7 helps countall.sql quite a bit. It seems to also
help a little bit with seqscan.sql. v7 + inline removed makes
seqscan.sql a decent amount faster than both master and master + v7.

David




Re: heapgettup refactoring

2023-01-30 Thread Melanie Plageman
v7 attached

On Fri, Jan 27, 2023 at 10:34 PM David Rowley  wrote:
>
> "On Wed, 25 Jan 2023 at 10:17, Melanie Plageman
>  wrote:
> > I've added a comment but I didn't include the function name in it -- I
> > find it repetitive when the comments above functions do that -- however,
> > I'm not strongly attached to that.
>
> I think the general format for header comments is:
>
> /*
>  * 
>  *\t\t
>  *
>  * [Further details]
>  */
>
> We've certainly got places that don't follow that, but I don't think
> that's any reason to have no comment or invent some new format.
>
> heapam.c seems to have some other format where we do: "
> - ". I generally just try to copy
> the style from the surrounding code. I think generally, people won't
> argue if you follow the style from the surrounding code, but there'd
> be exceptions to that, I'm sure.

I have followed the same convention as the other functions in heapam.c
in the various helper functions comments I've added in this version.

> v6-0002:
>
> 1. heapgettup_initial_block() needs a header comment to mention what
> it does and what it returns. It would be good to make it obvious that
> it returns InvalidBlockNumber when there are no blocks to scan.

I've done this.

>
> 2. After heapgettup_initial_block(), you're checking "if (block ==
> InvalidBlockNumber). It might be worth a mention something like
>
> /*
>  * Check if we got to the end of the scan already.  This could happen for
>  * an empty relation or if parallel workers scanned everything before we
>  * got a chance.
>  */
>
> the backward scan comment wouldn't mention parallel workers.

I've done this as well.

>
> v6-0003:
>
> 3. Can you explain why you removed the snapshot local variable in 
> heapgettup()?

In the subsequent commit, the helpers I add call TestForOldSnapshot(),
and I didn't want to pass in the snapshot as a separate parameter since
I already need to pass the scan descriptor. I thought it was confusing
to have a local variable (snapshot) used in some places and the one in
the scan used in others. This "streamlining" commit also reduces the
number of times the snapshot variable is used, making it less necessary
to have a local variable.

I didn't remove the snapshot local variable in the same commit as adding
the helpers because I thought it made the diff harder to understand (for
review, the final commit should likely not be separate patches).

> 4. I think it might be a good idea to use unlikely() in if
> (!scan->rs_inited). The idea is to help coax the compiler into moving
> that code off to a cold path. That's likely especially important if
> heapgettup_initial_block is inlined, which I see it is marked as.

I've gone ahead and added unlikely. However, should I perhaps skip
inlining the heapgettup_initial_block() function?

> v6-0004:
>
> 5. heapgettup_start_page() and heapgettup_continue_page() both need a
> header comment to explain what they do and what the inputs and output
> arguments are.

I've added these. I've also removed an unused parameter to both, block.

>
> 6. I'm not too sure what the following comment means:
>
> /* block and lineoff now reference the physically next tid */
>
> "block" is just a parameter to the function and its value is not
> adjusted. The comment makes it sound like something was changed.
>
> (I think these might be just not well updated from having split this
> out from the 0006 patch as the same comment makes more sense in 0006)

Yes, that is true. I've updated it to just mention lineoff.

> v6-0005:
>
> 7. heapgettup_advance_block() needs a header comment.
>
> 8. Is there a reason why heapgettup_advance_block() handle backward
> scans first? I'd expect you should just follow the lead of the other
> functions and do ScanDirectionIsForward first.

The reason I do this is that backwards scans cannot be parallel, so
handling backwards scans first let me return, then handle parallel
scans, then forward scans. This reduced the level of nesting/if
statements for all of the code in this function.

- Melanie
From 62416fee879a1ea6a7ca7ec132227701d35d7468 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Mon, 9 Jan 2023 17:33:43 -0500
Subject: [PATCH v7 2/6] Add heapgettup_initial_block() helper

---
 src/backend/access/heap/heapam.c | 226 ++-
 1 file changed, 102 insertions(+), 124 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 75f68b4e79..ff2e64822a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -484,6 +484,67 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 }
 
 
+/*
+ * heapgettup_initial_block - helper for heapgettup() and heapgettup_pagemode()
+ *
+ * Determines and returns the first block to scan, given the scan direction and
+ * whether or not it is parallel. If the relation is empty or the current
+ * parallel worker finds the scan has already been completed by other workers,
+ * return InvalidBlockNumber.
+ *
+ * Note 

Re: heapgettup refactoring

2023-01-27 Thread David Rowley
"On Wed, 25 Jan 2023 at 10:17, Melanie Plageman
 wrote:
> I've added a comment but I didn't include the function name in it -- I
> find it repetitive when the comments above functions do that -- however,
> I'm not strongly attached to that.

I think the general format for header comments is:

/*
 * 
 *\t\t
 *
 * [Further details]
 */

We've certainly got places that don't follow that, but I don't think
that's any reason to have no comment or invent some new format.

heapam.c seems to have some other format where we do: "
- ". I generally just try to copy
the style from the surrounding code. I think generally, people won't
argue if you follow the style from the surrounding code, but there'd
be exceptions to that, I'm sure.

I'll skip further review of 0001 here as the whole
ScanDirectionNoMovement case is being discussed on the other thread.

v6-0002:

1. heapgettup_initial_block() needs a header comment to mention what
it does and what it returns. It would be good to make it obvious that
it returns InvalidBlockNumber when there are no blocks to scan.

2. After heapgettup_initial_block(), you're checking "if (block ==
InvalidBlockNumber). It might be worth a mention something like

/*
 * Check if we got to the end of the scan already.  This could happen for
 * an empty relation or if parallel workers scanned everything before we
 * got a chance.
 */

the backward scan comment wouldn't mention parallel workers.

v6-0003:

3. Can you explain why you removed the snapshot local variable in heapgettup()?

4. I think it might be a good idea to use unlikely() in if
(!scan->rs_inited). The idea is to help coax the compiler into moving
that code off to a cold path. That's likely especially important if
heapgettup_initial_block is inlined, which I see it is marked as.

v6-0004:

5. heapgettup_start_page() and heapgettup_continue_page() both need a
header comment to explain what they do and what the inputs and output
arguments are.

6. I'm not too sure what the following comment means:

/* block and lineoff now reference the physically next tid */

"block" is just a parameter to the function and its value is not
adjusted. The comment makes it sound like something was changed.

(I think these might be just not well updated from having split this
out from the 0006 patch as the same comment makes more sense in 0006)

v6-0005:

7. heapgettup_advance_block() needs a header comment.

8. Is there a reason why heapgettup_advance_block() handle backward
scans first? I'd expect you should just follow the lead of the other
functions and do ScanDirectionIsForward first.

v6-0006

David




Re: heapgettup refactoring

2023-01-24 Thread Melanie Plageman
On Tue, Jan 24, 2023 at 04:17:23PM -0500, Melanie Plageman wrote:
> Thanks for taking a look!
> 
> On Mon, Jan 23, 2023 at 6:08 AM David Rowley  wrote:
> >
> > On Thu, 19 Jan 2023 at 00:04, Peter Eisentraut
> >  wrote:
> > > In your v2 patch, you remove these assertions:
> > >
> > > -   /* check that rs_cindex is in sync */
> > > -   Assert(scan->rs_cindex < scan->rs_ntuples);
> > > -   Assert(lineoff == scan->rs_vistuples[scan->rs_cindex]);
> > >
> > > Is that intentional?
> > >
> > > I don't see any explanation, or some other equivalent code appearing
> > > elsewhere to replace this.
> >
> > I guess it's because those asserts are not relevant unless
> > heapgettup_no_movement() is being called from heapgettup_pagemode().
> > Maybe they can be put back along the lines of:
> >
> > Assert((scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE) == 0 ||
> > scan->rs_cindex < scan->rs_ntuples);
> > Assert((scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE) == 0 || lineoff ==
> > scan->rs_vistuples[scan->rs_cindex]);
> >
> > but it probably would be cleaner to just do an: if
> > (scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE) { Assert(...);
> > Assert(...}; }
> 
> I prefer the first method and have implemented that in attached v6.
> 
> > The only issue I see with that is that we don't seem to have anywhere
> > in the regression tests that call heapgettup_no_movement() when
> > rs_flags have SO_ALLOW_PAGEMODE. At least, adding an elog(NOTICE) to
> > heapgettup() just before calling heapgettup_no_movement() does not
> > seem to cause make check to fail.  I wonder if any series of SQL
> > commands would allow us to call heapgettup_no_movement() from
> > heapgettup()?
> 
> So, the places in which we set scan direction to no movement include:
> - explain analyze on a ctas with no data
>   EXPLAIN ANALYZE CREATE TABLE foo AS SELECT 1 WITH NO DATA;
>   However, in standard_ExecutorRun() we only call ExecutePlan() if the
>   ScanDirection is not no movement, so this wouldn't hit our code
> - PortalRunSelect
> - PersistHoldablePortal()
> 
> I can't say I know enough about portals currently to design a test that
> will hit this code, but I will poke around some more.
 
I don't think we can write a test for this afterall. I've started
another thread on the topic over here:

https://www.postgresql.org/message-id/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY%3D_T8QEqZqOL02rpi2bw%40mail.gmail.com




Re: heapgettup refactoring

2023-01-24 Thread Melanie Plageman
Thanks for taking a look!

On Mon, Jan 23, 2023 at 6:08 AM David Rowley  wrote:
>
> On Thu, 19 Jan 2023 at 00:04, Peter Eisentraut
>  wrote:
> > In your v2 patch, you remove these assertions:
> >
> > -   /* check that rs_cindex is in sync */
> > -   Assert(scan->rs_cindex < scan->rs_ntuples);
> > -   Assert(lineoff == scan->rs_vistuples[scan->rs_cindex]);
> >
> > Is that intentional?
> >
> > I don't see any explanation, or some other equivalent code appearing
> > elsewhere to replace this.
>
> I guess it's because those asserts are not relevant unless
> heapgettup_no_movement() is being called from heapgettup_pagemode().
> Maybe they can be put back along the lines of:
>
> Assert((scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE) == 0 ||
> scan->rs_cindex < scan->rs_ntuples);
> Assert((scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE) == 0 || lineoff ==
> scan->rs_vistuples[scan->rs_cindex]);
>
> but it probably would be cleaner to just do an: if
> (scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE) { Assert(...);
> Assert(...}; }

I prefer the first method and have implemented that in attached v6.

> The only issue I see with that is that we don't seem to have anywhere
> in the regression tests that call heapgettup_no_movement() when
> rs_flags have SO_ALLOW_PAGEMODE. At least, adding an elog(NOTICE) to
> heapgettup() just before calling heapgettup_no_movement() does not
> seem to cause make check to fail.  I wonder if any series of SQL
> commands would allow us to call heapgettup_no_movement() from
> heapgettup()?

So, the places in which we set scan direction to no movement include:
- explain analyze on a ctas with no data
  EXPLAIN ANALYZE CREATE TABLE foo AS SELECT 1 WITH NO DATA;
  However, in standard_ExecutorRun() we only call ExecutePlan() if the
  ScanDirection is not no movement, so this wouldn't hit our code
- PortalRunSelect
- PersistHoldablePortal()

I can't say I know enough about portals currently to design a test that
will hit this code, but I will poke around some more.

> I think heapgettup_no_movement() also needs a header comment more
> along the lines of:
>
> /*
>  * heapgettup_no_movement
>  *Helper function for NoMovementScanDirection direction for
> heapgettup() and
>  *heapgettup_pagemode.
>  */

I've added a comment but I didn't include the function name in it -- I
find it repetitive when the comments above functions do that -- however,
I'm not strongly attached to that.

> I pushed the pgindent stuff that v5-0001 did along with some additions
> to typedefs.list so that further runs could be done more easily as
> changes are made to these patches.

Cool!

- Melanie
From 5d2e8cb3388d9fa2d122db7ca8d9319f56abcaf9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Tue, 10 Jan 2023 14:15:15 -0500
Subject: [PATCH v6 5/6] Add heapgettup_advance_block() helper

---
 src/backend/access/heap/heapam.c | 167 +--
 1 file changed, 72 insertions(+), 95 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 08c4fb5228..15e4f3262b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -638,6 +638,74 @@ heapgettup_continue_page(HeapScanDesc scan, BlockNumber block, ScanDirection
 	return page;
 }
 
+static inline BlockNumber
+heapgettup_advance_block(HeapScanDesc scan, BlockNumber block, ScanDirection dir)
+{
+	if (ScanDirectionIsBackward(dir))
+	{
+		if (block == scan->rs_startblock)
+			return InvalidBlockNumber;
+
+		if (scan->rs_numblocks != InvalidBlockNumber)
+		{
+			if (--scan->rs_numblocks == 0)
+return InvalidBlockNumber;
+		}
+
+		if (block == 0)
+			block = scan->rs_nblocks;
+
+		block--;
+
+		return block;
+	}
+	else if (scan->rs_base.rs_parallel != NULL)
+	{
+		Assert(ScanDirectionIsForward(dir));
+
+		block = table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
+  scan->rs_parallelworkerdata, (ParallelBlockTableScanDesc)
+  scan->rs_base.rs_parallel);
+
+		return block;
+	}
+	else
+	{
+		Assert(ScanDirectionIsForward(dir));
+
+		block++;
+
+		if (block >= scan->rs_nblocks)
+			block = 0;
+
+		if (block == scan->rs_startblock)
+			return InvalidBlockNumber;
+
+		if (scan->rs_numblocks != InvalidBlockNumber)
+		{
+			if (--scan->rs_numblocks == 0)
+return InvalidBlockNumber;
+		}
+
+		/*
+		 * Report our new scan position for synchronization purposes. We don't
+		 * do that when moving backwards, however. That would just mess up any
+		 * other forward-moving scanners.
+		 *
+		 * Note: we do this before checking for end of scan so that the final
+		 * state of the position hint is back at the start of the rel.  That's
+		 * not strictly necessary, but otherwise when you run the same query
+		 * multiple times the starting position would shift a little bit
+		 * backwards on every invocation, which is confusing. We don't
+		 * guarantee any specific ordering in general, though.
+		 */
+		if (scan->rs_base.rs_flags & SO_ALLOW_SYNC)
+	

Re: heapgettup refactoring

2023-01-23 Thread David Rowley
On Thu, 19 Jan 2023 at 00:04, Peter Eisentraut
 wrote:
> In your v2 patch, you remove these assertions:
>
> -   /* check that rs_cindex is in sync */
> -   Assert(scan->rs_cindex < scan->rs_ntuples);
> -   Assert(lineoff == scan->rs_vistuples[scan->rs_cindex]);
>
> Is that intentional?
>
> I don't see any explanation, or some other equivalent code appearing
> elsewhere to replace this.

I guess it's because those asserts are not relevant unless
heapgettup_no_movement() is being called from heapgettup_pagemode().
Maybe they can be put back along the lines of:

Assert((scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE) == 0 ||
scan->rs_cindex < scan->rs_ntuples);
Assert((scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE) == 0 || lineoff ==
scan->rs_vistuples[scan->rs_cindex]);

but it probably would be cleaner to just do an: if
(scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE) { Assert(...);
Assert(...}; }

The only issue I see with that is that we don't seem to have anywhere
in the regression tests that call heapgettup_no_movement() when
rs_flags have SO_ALLOW_PAGEMODE. At least, adding an elog(NOTICE) to
heapgettup() just before calling heapgettup_no_movement() does not
seem to cause make check to fail.  I wonder if any series of SQL
commands would allow us to call heapgettup_no_movement() from
heapgettup()?

I think heapgettup_no_movement() also needs a header comment more
along the lines of:

/*
 * heapgettup_no_movement
 *Helper function for NoMovementScanDirection direction for
heapgettup() and
 *heapgettup_pagemode.
 */

I pushed the pgindent stuff that v5-0001 did along with some additions
to typedefs.list so that further runs could be done more easily as
changes are made to these patches.

David




Re: heapgettup refactoring

2023-01-18 Thread Peter Eisentraut

On 10.01.23 21:31, Melanie Plageman wrote:

On Thu, Jan 5, 2023 at 8:52 AM Peter Eisentraut
 wrote:


Ok, let's look through these patches starting from the top then.

v4-0001-Add-no-movement-scan-helper.patch

This makes sense overall; there is clearly some duplicate code that can
be unified.

It appears that during your rebasing you have effectively reverted your
earlier changes that have been committed as
8e1db29cdbbd218ab6ba53eea56624553c3bef8c.  You should undo that.


Thanks. I think I have addressed this.
I've attached a rebased v5.


In your v2 patch, you remove these assertions:

-   /* check that rs_cindex is in sync */
-   Assert(scan->rs_cindex < scan->rs_ntuples);
-   Assert(lineoff == scan->rs_vistuples[scan->rs_cindex]);

Is that intentional?

I don't see any explanation, or some other equivalent code appearing 
elsewhere to replace this.






Re: heapgettup refactoring

2023-01-10 Thread Melanie Plageman
On Thu, Jan 5, 2023 at 8:52 AM Peter Eisentraut
 wrote:
>
> Ok, let's look through these patches starting from the top then.
>
> v4-0001-Add-no-movement-scan-helper.patch
>
> This makes sense overall; there is clearly some duplicate code that can
> be unified.
>
> It appears that during your rebasing you have effectively reverted your
> earlier changes that have been committed as
> 8e1db29cdbbd218ab6ba53eea56624553c3bef8c.  You should undo that.

Thanks. I think I have addressed this.
I've attached a rebased v5.

> I don't understand the purpose of the noinline maker.  If it's not
> necessary to inline, we can just leave it off, but there is no need to
> outright prevent inlining AFAICT.
>

I have removed it.

> I don't know why you changed the if/else sequences.  Before, the
> sequence was effectively
>
> if (forward)
> {
>  ...
> }
> else if (backward)
> {
>  ...
> }
> else
> {
>  /* it's no movement */
> }
>
> Now it's changed to
>
> if (no movement)
> {
>  ...
>  return;
> }
>
> if (forward)
> {
>  ...
> }
> else
> {
>  Assert(backward);
>  ...
> }
>
> Sure, that's the same thing, but it looks less elegant to me.

In this commit, you could keep the original ordering of if statements. I
preferred no movement scan first because then backwards and forwards
scans' code is physically closer to the rest of the code without the
intrusion of the no movement scan code.

Ultimately, the refactor (in later patches) flips the ordering of if
statements at the top from
if (scan direction)
to
if (initial or continue)
and this isn't a very interesting distinction for no movement scans. By
dealing with no movement scan at the top, I didn't have to handle no
movement scans in the initial and continue branches in the new structure.

Also, I will note that patches 4-6 at least and perhaps 4-7 do not make
sense as separate commits. I separated them for ease of review. Each is
correct and passes tests but is not really an improvement without the
others.

- Melanie
From 9b51959c483812c30ca848b66a0e6e3a807ab03f Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Mon, 9 Jan 2023 17:33:43 -0500
Subject: [PATCH v5 3/7] Add heapgettup_initial_block() helper

---
 src/backend/access/heap/heapam.c | 212 ---
 1 file changed, 82 insertions(+), 130 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c80b547809..b9a1aac3ca 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -521,6 +521,57 @@ heapgettup_no_movement(HeapScanDesc scan)
 }
 
 
+static inline BlockNumber
+heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir)
+{
+	Assert(!ScanDirectionIsNoMovement(dir));
+	Assert(!scan->rs_inited);
+
+	/* return null immediately if relation is empty */
+	if (scan->rs_nblocks == 0 || scan->rs_numblocks == 0)
+		return InvalidBlockNumber;
+
+	scan->rs_inited = true;
+
+	/* forward and serial */
+	if (ScanDirectionIsForward(dir) && scan->rs_base.rs_parallel == NULL)
+		return scan->rs_startblock;
+
+	/* forward and parallel */
+	if (ScanDirectionIsForward(dir))
+	{
+		table_block_parallelscan_startblock_init(scan->rs_base.rs_rd,
+ scan->rs_parallelworkerdata,
+ (ParallelBlockTableScanDesc) scan->rs_base.rs_parallel);
+
+		return table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
+ scan->rs_parallelworkerdata,
+ (ParallelBlockTableScanDesc) scan->rs_base.rs_parallel);
+	}
+
+	/* backward parallel scan not supported */
+	Assert(scan->rs_base.rs_parallel == NULL);
+
+	/*
+	 * Disable reporting to syncscan logic in a backwards scan; it's not very
+	 * likely anyone else is doing the same thing at the same time, and much
+	 * more likely that we'll just bollix things for forward scanners.
+	 */
+	scan->rs_base.rs_flags &= ~SO_ALLOW_SYNC;
+
+	/*
+	 * Start from last page of the scan.  Ensure we take into account
+	 * rs_numblocks if it's been adjusted by heap_setscanlimits().
+	 */
+	if (scan->rs_numblocks != InvalidBlockNumber)
+		return (scan->rs_startblock + scan->rs_numblocks - 1) % scan->rs_nblocks;
+
+	if (scan->rs_startblock > 0)
+		return scan->rs_startblock - 1;
+
+	return scan->rs_nblocks - 1;
+}
+
 /* 
  *		heapgettup - fetch next heap tuple
  *
@@ -574,41 +625,16 @@ heapgettup(HeapScanDesc scan,
 	{
 		if (!scan->rs_inited)
 		{
-			/*
-			 * return null immediately if relation is empty
-			 */
-			if (scan->rs_nblocks == 0 || scan->rs_numblocks == 0)
+			block = heapgettup_initial_block(scan, dir);
+
+			if (block == InvalidBlockNumber)
 			{
 Assert(!BufferIsValid(scan->rs_cbuf));
 tuple->t_data = NULL;
 return;
 			}
-			if (scan->rs_base.rs_parallel != NULL)
-			{
-ParallelBlockTableScanDesc pbscan =
-(ParallelBlockTableScanDesc) scan->rs_base.rs_parallel;
-ParallelBlockTableScanWorker pbscanwork =
-scan->rs_parallelworkerdata;
-
-

Re: heapgettup refactoring

2023-01-05 Thread Peter Eisentraut

On 03.01.23 21:39, Melanie Plageman wrote:

On 30.11.22 23:34, Melanie Plageman wrote:

I have attached a patchset with only the code changes contained in the
previous patch 0003. I have broken the refactoring down into many
smaller pieces for ease of review.


To keep this moving along a bit, I have committed your 0002, which I
think is a nice little improvement on its own.


Thanks!
I've attached a rebased patchset - v4.

I also changed heapgettup_no_movement() to noinline (instead of inline).
David Rowley pointed out that this might make more sense given how
comparatively rare no movement scans are.


Ok, let's look through these patches starting from the top then.

v4-0001-Add-no-movement-scan-helper.patch

This makes sense overall; there is clearly some duplicate code that can 
be unified.


It appears that during your rebasing you have effectively reverted your 
earlier changes that have been committed as 
8e1db29cdbbd218ab6ba53eea56624553c3bef8c.  You should undo that.


I don't understand the purpose of the noinline maker.  If it's not 
necessary to inline, we can just leave it off, but there is no need to 
outright prevent inlining AFAICT.


I don't know why you changed the if/else sequences.  Before, the 
sequence was effectively


if (forward)
{
...
}
else if (backward)
{
...
}
else
{
/* it's no movement */
}

Now it's changed to

if (no movement)
{
...
return;
}

if (forward)
{
...
}
else
{
Assert(backward);
...
}

Sure, that's the same thing, but it looks less elegant to me.





Re: heapgettup refactoring

2023-01-03 Thread Melanie Plageman
On Mon, Jan 2, 2023 at 5:23 AM Peter Eisentraut
 wrote:
>
> On 30.11.22 23:34, Melanie Plageman wrote:
> > I have attached a patchset with only the code changes contained in the
> > previous patch 0003. I have broken the refactoring down into many
> > smaller pieces for ease of review.
>
> To keep this moving along a bit, I have committed your 0002, which I
> think is a nice little improvement on its own.

Thanks!
I've attached a rebased patchset - v4.

I also changed heapgettup_no_movement() to noinline (instead of inline).
David Rowley pointed out that this might make more sense given how
comparatively rare no movement scans are.

- Melanie


v4-0004-Add-heapgettup-helpers.patch
Description: Binary data


v4-0002-Add-heapgettup_initial_page-helper.patch
Description: Binary data


v4-0005-Add-heapgettup_advance_page.patch
Description: Binary data


v4-0006-Refactor-heapgettup-and-heapgettup_pagemode.patch
Description: Binary data


v4-0003-Streamline-heapgettup-for-refactor.patch
Description: Binary data


v4-0001-Add-no-movement-scan-helper.patch
Description: Binary data


Re: heapgettup refactoring

2023-01-02 Thread Peter Eisentraut

On 30.11.22 23:34, Melanie Plageman wrote:

I have attached a patchset with only the code changes contained in the
previous patch 0003. I have broken the refactoring down into many
smaller pieces for ease of review.


To keep this moving along a bit, I have committed your 0002, which I 
think is a nice little improvement on its own.






Re: heapgettup refactoring

2022-11-30 Thread Melanie Plageman
On Wed, Nov 16, 2022 at 10:49 AM Peter Eisentraut
 wrote:
>
> On 04.11.22 16:51, Melanie Plageman wrote:
> > Thanks for the review!
> > Attached is v2 with feedback addressed.
>
> Your 0001 had already been pushed.
>
> I have pushed your 0002.
>
> I have also pushed the renaming of page -> block, dp -> page separately.
>   This should reduce the size of your 0003 a bit.
>
> Please produce an updated version of the 0003 page for further review.

Thanks for looking at this!
I have attached a patchset with only the code changes contained in the
previous patch 0003. I have broken the refactoring down into many
smaller pieces for ease of review.

- Melanie
From 2e63656e75b00b24d286533ae8c8ef291876d4a8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Wed, 30 Nov 2022 14:18:08 -0500
Subject: [PATCH v3 5/7] Add heapgettup() helpers

Add heapgettup_start_page() and heapgettup_continue_page(), helper
functions for heapgettup().
---
 src/backend/access/heap/heapam.c | 93 +---
 1 file changed, 61 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a4365beee1..f9b2d5cadb 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -575,6 +575,65 @@ heapgettup_initial_page(HeapScanDesc scan, ScanDirection dir)
 	return scan->rs_nblocks - 1;
 }
 
+static inline Page
+heapgettup_start_page(HeapScanDesc scan, BlockNumber block, ScanDirection dir,
+		int *linesleft, OffsetNumber *lineoff)
+{
+	Page page;
+
+	Assert(scan->rs_inited);
+	Assert(BufferIsValid(scan->rs_cbuf));
+
+	/* Caller is responsible for ensuring buffer is locked if needed */
+	page = BufferGetPage(scan->rs_cbuf);
+
+	TestForOldSnapshot(scan->rs_base.rs_snapshot, scan->rs_base.rs_rd, page);
+
+	*linesleft = PageGetMaxOffsetNumber((Page) page) - FirstOffsetNumber + 1;
+
+	if (ScanDirectionIsForward(dir))
+		*lineoff = FirstOffsetNumber;
+	else
+		*lineoff = (OffsetNumber) (*linesleft);
+
+	/* lineoff now references the physically previous or next tid */
+	return page;
+}
+
+static inline Page
+heapgettup_continue_page(HeapScanDesc scan, BlockNumber block, ScanDirection
+		dir, int *linesleft, OffsetNumber *lineoff)
+{
+	Page page;
+
+	Assert(scan->rs_inited);
+	Assert(BufferIsValid(scan->rs_cbuf));
+
+	/* Caller is responsible for ensuring buffer is locked if needed */
+	page = BufferGetPage(scan->rs_cbuf);
+
+	TestForOldSnapshot(scan->rs_base.rs_snapshot, scan->rs_base.rs_rd, page);
+
+	if (ScanDirectionIsForward(dir))
+	{
+		*lineoff = OffsetNumberNext(scan->rs_cindex);
+		*linesleft = PageGetMaxOffsetNumber(page) - (*lineoff) + 1;
+	}
+	else
+	{
+		/*
+		* The previous returned tuple may have been vacuumed since the
+		* previous scan when we use a non-MVCC snapshot, so we must
+		* re-establish the lineoff <= PageGetMaxOffsetNumber(page)
+		* invariant
+		*/
+		*lineoff = Min(PageGetMaxOffsetNumber(page), OffsetNumberPrev(scan->rs_cindex));
+		*linesleft = *lineoff;
+	}
+	/* block and lineoff now reference the physically next tid */
+	return page;
+}
+
 /* 
  *		heapgettup - fetch next heap tuple
  *
@@ -632,45 +691,15 @@ heapgettup(HeapScanDesc scan,
 		}
 
 		heapgetpage((TableScanDesc) scan, block);
-
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
-		page = BufferGetPage(scan->rs_cbuf);
-		TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, page);
-
-		linesleft = PageGetMaxOffsetNumber((Page) page) - FirstOffsetNumber + 1;
-
-		if (ScanDirectionIsForward(dir))
-			lineoff = FirstOffsetNumber;
-		else
-			lineoff = (OffsetNumber) linesleft;
+		page = heapgettup_start_page(scan, block, dir, , );
 	}
 	else
 	{
 		block = scan->rs_cblock; /* current page */
 
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
-
-		page = BufferGetPage(scan->rs_cbuf);
-		TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, page);
-
-		if (ScanDirectionIsForward(dir))
-		{
-			lineoff = OffsetNumberNext(scan->rs_cindex);
-			/* block and lineoff now reference the physically next tid */
-			linesleft = PageGetMaxOffsetNumber(page) - lineoff + 1;
-		}
-		else
-		{
-			/*
-			 * The previous returned tuple may have been vacuumed since the
-			 * previous scan when we use a non-MVCC snapshot, so we must
-			 * re-establish the lineoff <= PageGetMaxOffsetNumber(page)
-			 * invariant
-			 */
-			lineoff = Min(PageGetMaxOffsetNumber(page), OffsetNumberPrev(scan->rs_cindex));
-			/* block and lineoff now reference the physically previous tid */
-			linesleft = lineoff;
-		}
+		page = heapgettup_continue_page(scan, block, dir, , );
 	}
 
 	/*
-- 
2.34.1

From 8d4106f9f045999c5b3432fc19ee8f216b85df2d Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Wed, 30 Nov 2022 10:42:12 -0500
Subject: [PATCH v3 2/7] Push lpp variable closer to usage in heapgetpage()

---
 src/backend/access/heap/heapam.c | 41 
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/heap/heapam.c 

Re: heapgettup refactoring

2022-11-16 Thread Peter Eisentraut

On 04.11.22 16:51, Melanie Plageman wrote:

Thanks for the review!
Attached is v2 with feedback addressed.


Your 0001 had already been pushed.

I have pushed your 0002.

I have also pushed the renaming of page -> block, dp -> page separately. 
 This should reduce the size of your 0003 a bit.


Please produce an updated version of the 0003 page for further review.





Re: heapgettup refactoring

2022-11-04 Thread Melanie Plageman
Thanks for the review!
Attached is v2 with feedback addressed.

On Mon, Oct 31, 2022 at 9:09 PM Andres Freund  wrote:
> > From 9d8b01960463dc64ff5b111d523ff80fce3017af Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman 
> > Date: Mon, 31 Oct 2022 13:40:06 -0400
> > Subject: [PATCH v1 2/3] Turn HeapKeyTest macro into function
> >
> > This should always be inlined appropriately now. It is easier to read as
> > a function. Also, remove unused include in catcache.c.
> > ---
> >  src/backend/access/heap/heapam.c   | 10 ++--
> >  src/backend/utils/cache/catcache.c |  1 -
> >  src/include/access/valid.h | 76 --
> >  3 files changed, 36 insertions(+), 51 deletions(-)
> >
> > diff --git a/src/backend/access/heap/heapam.c 
> > b/src/backend/access/heap/heapam.c
> > index 12be87efed..1c995faa12 100644
> > --- a/src/backend/access/heap/heapam.c
> > +++ b/src/backend/access/heap/heapam.c
> > @@ -716,8 +716,10 @@ heapgettup(HeapScanDesc scan,
> > 
> >   snapshot);
> >
> >   if (valid && key != NULL)
> > - HeapKeyTest(tuple, 
> > RelationGetDescr(scan->rs_base.rs_rd),
> > - nkeys, key, 
> > valid);
> > + {
> > + valid = HeapKeyTest(tuple, 
> > RelationGetDescr(scan->rs_base.rs_rd),
> > + nkeys, key);
> > + }
> >
> >   if (valid)
> >   {
>
> superfluous parens.

fixed.

> > From a894ce38c488df6546392b9f3bd894b67edf951e Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman 
> > Date: Mon, 31 Oct 2022 13:40:29 -0400
> > Subject: [PATCH v1 3/3] Refactor heapgettup* and heapgetpage
> >
> > Simplify heapgettup(), heapgettup_pagemode(), and heapgetpage(). All
> > three contained several unnecessary local variables, duplicate code, and
> > nested if statements. Streamlining these improves readability and
> > extensibility.
>
> It'd be nice to break this into slightly smaller chunks.

I can do that. Since incorporating feedback will be harder once I break
it up into smaller chunks, I'm inclined to wait to do so until I know
that the structure I have now is the one we will go with. (I know smaller
chunks will make it more reviewable.)

> > +
> > +static inline void heapgettup_no_movement(HeapScanDesc scan)
> > +{
>
> FWIW, for function definitions we keep the return type (and with that also the
> the "static inline") on a separate line.

Fixed

>
> > + ItemId  lpp;
> > + OffsetNumber lineoff;
> > + BlockNumber page;
> > + Page dp;
> > + HeapTuple   tuple = &(scan->rs_ctup);
> > + /*
> > + * ``no movement'' scan direction: refetch prior tuple
> > + */
> > +
> > + /* Since the tuple was previously fetched, needn't lock page here */
> > + if (!scan->rs_inited)
> > + {
> > + Assert(!BufferIsValid(scan->rs_cbuf));
> > + tuple->t_data = NULL;
> > + return;
>
> Is it possible to have a no-movement scan with an uninitialized scan? That
> doesn't really seem to make sense. At least that's how I understand the
> explanation for NoMovement nearby:
>  * dir == NoMovementScanDirection means "re-fetch the tuple indicated
>  * by scan->rs_ctup".
>
> We can't have a rs_ctup without an already started scan, no?
>
> Looks like this is pre-existing code that you just moved, but it still seems
> wrong.

Changed to an assert

>
> > + }
> > + page = ItemPointerGetBlockNumber(&(tuple->t_self));
> > + if (page != scan->rs_cblock)
> > + heapgetpage((TableScanDesc) scan, page);
>
>
> We have a
> BlockNumber page;
> and
> Pagedp;
> in this code which seems almost intentionally confusing. This again is a
> pre-existing issue but perhaps we could clean it up first?

in attached
page -> block
dp -> page
in basically all locations in heapam.c (should that be its own commit?)

> > +static inline Page heapgettup_continue_page(HeapScanDesc scan, BlockNumber 
> > page, ScanDirection dir,
> > + int *linesleft, OffsetNumber *lineoff)
> > +{
> > + HeapTuple   tuple = &(scan->rs_ctup);
>
> Hm. Finding the next offset via rs_ctup doesn't seem quite right. For one,
> it's not actually that cheap to extract the offset from an ItemPointer because
> of the the way we pack it into ItemPointerData.

So, it was like this before [1].
What about saving the lineoff in rs_cindex.

It is smaller, but that seems okay, right?

> > + Page dp = BufferGetPage(scan->rs_cbuf);
> > + TestForOldSnapshot(scan->rs_base.rs_snapshot, scan->rs_base.rs_rd, 
> > dp);
>
> Newlines between definitions and code :)

k

> Perhaps worth asserting that the scan is initialized 

Re: heapgettup refactoring

2022-10-31 Thread Andres Freund
Hi,

On 2022-10-31 14:37:39 -0400, Melanie Plageman wrote:
> and heapgetpage(). heapgettup() and heapgettup_pagemode() have a lot of
> duplicated code, confusingly nested if statements, and unnecessary local
> variables. While working on a feature for the AIO/DIO patchset, I
> noticed that it was difficult to add new code to heapgettup() and
> heapgettup_pagemode() because of how the functions are written.

Thanks for working on this - the current state is quite painful.


> From cde2d6720f4f5ab2531c22ad4a5f0d9e6ec1039d Mon Sep 17 00:00:00 2001
> From: Melanie Plageman 
> Date: Wed, 26 Oct 2022 20:00:34 -0400
> Subject: [PATCH v1 1/3] Remove breaks in HeapTupleSatisfiesVisibility
>
> breaks in HeapTupleSatisfiesVisibility were superfluous
> ---
>  src/backend/access/heap/heapam_visibility.c | 7 ---
>  1 file changed, 7 deletions(-)
>
> diff --git a/src/backend/access/heap/heapam_visibility.c 
> b/src/backend/access/heap/heapam_visibility.c
> index 6e33d1c881..dd5d5da190 100644
> --- a/src/backend/access/heap/heapam_visibility.c
> +++ b/src/backend/access/heap/heapam_visibility.c
> @@ -1769,25 +1769,18 @@ HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot 
> snapshot, Buffer buffer)
>   {
>   case SNAPSHOT_MVCC:
>   return HeapTupleSatisfiesMVCC(htup, snapshot, buffer);
> - break;
>   case SNAPSHOT_SELF:
>   return HeapTupleSatisfiesSelf(htup, snapshot, buffer);
> - break;
>   case SNAPSHOT_ANY:
>   return HeapTupleSatisfiesAny(htup, snapshot, buffer);
> - break;
>   case SNAPSHOT_TOAST:
>   return HeapTupleSatisfiesToast(htup, snapshot, buffer);
> - break;
>   case SNAPSHOT_DIRTY:
>   return HeapTupleSatisfiesDirty(htup, snapshot, buffer);
> - break;
>   case SNAPSHOT_HISTORIC_MVCC:
>   return HeapTupleSatisfiesHistoricMVCC(htup, snapshot, 
> buffer);
> - break;
>   case SNAPSHOT_NON_VACUUMABLE:
>   return HeapTupleSatisfiesNonVacuumable(htup, snapshot, 
> buffer);
> - break;
>   }

Not sure what the author of this code, a certain Mr Freund, was thinking when
he added those returns...


> From 9d8b01960463dc64ff5b111d523ff80fce3017af Mon Sep 17 00:00:00 2001
> From: Melanie Plageman 
> Date: Mon, 31 Oct 2022 13:40:06 -0400
> Subject: [PATCH v1 2/3] Turn HeapKeyTest macro into function
>
> This should always be inlined appropriately now. It is easier to read as
> a function. Also, remove unused include in catcache.c.
> ---
>  src/backend/access/heap/heapam.c   | 10 ++--
>  src/backend/utils/cache/catcache.c |  1 -
>  src/include/access/valid.h | 76 --
>  3 files changed, 36 insertions(+), 51 deletions(-)
>
> diff --git a/src/backend/access/heap/heapam.c 
> b/src/backend/access/heap/heapam.c
> index 12be87efed..1c995faa12 100644
> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -716,8 +716,10 @@ heapgettup(HeapScanDesc scan,
>   
> snapshot);
>
>   if (valid && key != NULL)
> - HeapKeyTest(tuple, 
> RelationGetDescr(scan->rs_base.rs_rd),
> - nkeys, key, 
> valid);
> + {
> + valid = HeapKeyTest(tuple, 
> RelationGetDescr(scan->rs_base.rs_rd),
> + nkeys, key);
> + }
>
>   if (valid)
>   {

superfluous parens.



> --- a/src/include/access/valid.h
> +++ b/src/include/access/valid.h
> @@ -19,51 +19,35 @@
>   *
>   *   Test a heap tuple to see if it satisfies a scan key.
>   */
> -#define HeapKeyTest(tuple, \
> - tupdesc, \
> - nkeys, \
> - keys, \
> - result) \
> -do \
> -{ \
> - /* Use underscores to protect the variables passed in as parameters */ \
> - int __cur_nkeys = (nkeys); \
> - ScanKey __cur_keys = (keys); \
> - \
> - (result) = true; /* may change */ \
> - for (; __cur_nkeys--; __cur_keys++) \
> - { \
> - Datum   __atp; \
> - bool__isnull; \
> - Datum   __test; \
> - \
> - if (__cur_keys->sk_flags & SK_ISNULL) \
> - { \
> - (result) = false; \
> - break; \
> - } \
> - \
> - __atp = heap_getattr((tuple), \
> 

Re: heapgettup refactoring

2022-10-31 Thread Justin Pryzby
FYI:

[18:51:54.707] ../src/backend/access/heap/heapam.c(720): warning C4098: 
'heapgettup': 'void' function returning a value
[18:51:54.707] ../src/backend/access/heap/heapam.c(850): warning C4098: 
'heapgettup_pagemode': 'void' function returning a value

For some reason, MSVC is the only one to complain, and cfbot doesn't
currently tell you about it.  I have a patch to show that, which I'll
send $later.




heapgettup refactoring

2022-10-31 Thread Melanie Plageman
Hi,

Attached is a patchset to refactor heapgettup(), heapgettup_pagemode(),
and heapgetpage(). heapgettup() and heapgettup_pagemode() have a lot of
duplicated code, confusingly nested if statements, and unnecessary local
variables. While working on a feature for the AIO/DIO patchset, I
noticed that it was difficult to add new code to heapgettup() and
heapgettup_pagemode() because of how the functions are written.

I've taken a stab at refactoring them -- without generating less
efficient code or causing regressions. I'm interested if people find it
more readable and if those with more assembly expertise see issues (new
branches added which are not highly predictable, etc). I took a look at
the assembly for those symbols compiled at O2 but am not experienced
enough at analysis to come to any conclusions.

- Melanie


v1-0002-Turn-HeapKeyTest-macro-into-function.patch
Description: Binary data


v1-0001-Remove-breaks-in-HeapTupleSatisfiesVisibility.patch
Description: Binary data


v1-0003-Refactor-heapgettup-and-heapgetpage.patch
Description: Binary data