On Mon, Apr 8, 2024 at 10:26 AM Melanie Plageman
<melanieplage...@gmail.com> wrote:
> On Sun, Apr 07, 2024 at 03:00:00PM -0700, Andres Freund wrote:
> > On 2024-04-07 16:59:26 -0400, Melanie Plageman wrote:
> > > From 862b7ac81cdafcda7b525e02721da14e46265509 Mon Sep 17 00:00:00 2001
> > > From: Melanie Plageman <melanieplage...@gmail.com>
> > > Date: Sun, 7 Apr 2024 15:38:41 -0400
> > > Subject: [PATCH v10 3/3] Obsolete BlockSampler_HasMore()
> > >
> > > A previous commit stopped using BlockSampler_HasMore() for flow control
> > > in acquire_sample_rows(). There seems little use now for
> > > BlockSampler_HasMore(). It should be sufficient to return
> > > InvalidBlockNumber from BlockSampler_Next() when BlockSample_HasMore()
> > > would have returned false. Remove BlockSampler_HasMore().
> > >
> > > Author: Melanie Plageman, Nazir Bilal Yavuz
> > > Discussion: 
> > > https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com
> >
> > The justification here seems somewhat odd. Sure, the previous commit stopped
> > using BlockSampler_HasMore in acquire_sample_rows - but only because it was
> > moved to block_sampling_streaming_read_next()?
>
> It didn't stop using it. It stopped being useful. The reason it existed,
> as far as I can tell, was to use it as the while() loop condition in
> acquire_sample_rows(). I think it makes much more sense for
> BlockSampler_Next() to return InvalidBlockNumber when there are no more
> blocks -- not to assert you don't call it when there aren't any more
> blocks.
>
> I didn't want to change BlockSampler_Next() in the same commit as the
> streaming read user and we can't remove BlockSampler_HasMore() without
> changing BlockSampler_Next().

I agree that the code looks useless if one condition implies the
other, but isn't it good to keep that cross-check, perhaps
reformulated as an assertion?  I didn't look too hard at the maths, I
just saw the words "It is not obvious that this code matches Knuth's
Algorithm S ..." and realised I'm not sure I have time to develop a
good opinion about this today.  So I'll leave the 0002 change out for
now, as it's a tidy-up that can easily be applied in the next cycle.
From c3b8df8e4720d8b0dfb4c892c0aa3ddaef8f401f Mon Sep 17 00:00:00 2001
From: Thomas Munro <tmu...@postgresql.org>
Date: Mon, 8 Apr 2024 14:38:58 +1200
Subject: [PATCH v12] Remove obsolete BlockSampler_HasMore().

Commit 041b9680 stopped using BlockSampler_HasMore() for flow control in
acquire_sample_rows(). There seems to be little use for it now. We can
just return InvalidBlockNumber from BlockSampler_Next() when
BlockSample_HasMore() would have returned false.

Author: Melanie Plageman <melanieplage...@gmail.com>
Author: Nazir Bilal Yavuz <byavu...@gmail.com>
Reviewed-by: Andres Freund <and...@anarazel.de>
Discussion: https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com
---
 src/backend/commands/analyze.c    |  4 +---
 src/backend/utils/misc/sampling.c | 10 +++-------
 src/include/utils/sampling.h      |  1 -
 3 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index da27a13a3f..e9fa3470cf 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1111,9 +1111,7 @@ block_sampling_read_stream_next(ReadStream *stream,
 								void *callback_private_data,
 								void *per_buffer_data)
 {
-	BlockSamplerData *bs = callback_private_data;
-
-	return BlockSampler_HasMore(bs) ? BlockSampler_Next(bs) : InvalidBlockNumber;
+	return BlockSampler_Next(callback_private_data);
 }
 
 /*
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
index 933db06702..6e2bca9739 100644
--- a/src/backend/utils/misc/sampling.c
+++ b/src/backend/utils/misc/sampling.c
@@ -54,12 +54,6 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 	return Min(bs->n, bs->N);
 }
 
-bool
-BlockSampler_HasMore(BlockSampler bs)
-{
-	return (bs->t < bs->N) && (bs->m < bs->n);
-}
-
 BlockNumber
 BlockSampler_Next(BlockSampler bs)
 {
@@ -68,7 +62,9 @@ BlockSampler_Next(BlockSampler bs)
 	double		p;				/* probability to skip block */
 	double		V;				/* random */
 
-	Assert(BlockSampler_HasMore(bs));	/* hence K > 0 and k > 0 */
+	/* Return if no remaining blocks or no blocks to sample */
+	if (K <= 0 || k <= 0)
+		return InvalidBlockNumber;
 
 	if ((BlockNumber) k >= K)
 	{
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index be48ee52ba..fb5d6820a2 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -38,7 +38,6 @@ typedef BlockSamplerData *BlockSampler;
 
 extern BlockNumber BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
 									 int samplesize, uint32 randseed);
-extern bool BlockSampler_HasMore(BlockSampler bs);
 extern BlockNumber BlockSampler_Next(BlockSampler bs);
 
 /* Reservoir sampling methods */
-- 
2.44.0

Reply via email to