On Sat, Aug 24, 2024 at 5:31 AM Thomas Munro <thomas.mu...@gmail.com> wrote:

> On Thu, Aug 22, 2024 at 7:31 PM Mats Kindahl <m...@timescale.com> wrote:
> > The alternate version proposed by Nazir allows you to deide which
> interface to use.
> > Reverting the patch entirely would also solve the problem.
>

After digging through the code a little more I discovered that
there actually is another one: move the ReadStream struct into
read_stream.h.


> > Passing down the block sampler and the strategy to scan_begin() and move
> the ReadStream setup in analyze.c into initscan() in heapam.c, but this
> requires adding new parameters to this function.
> > Having accessors that allow you to get the block sampler and strategy
> from the ReadStream object.
>
> I'm a bit confused about how it can make sense to use the same
> BlockSampler with a side relation/fork.  Could you point me at the
> code?
>

Sorry, that was a bit unclear. Intention was not to re-use the block
sampler but to set a new one up with parameters from the original block
sampler, which would require access to it. (The strategy is less of a
problem since only one is used.)

To elaborate on the situation:

For the TAM in question we have two different storage areas, both are
heaps. Both relations use the same attributes "publicly" (they are
internally different, but we transform them to look the same). One of the
relations is the "default" one and is stored in rd_rel. In order to run
ANALYZE, we need to sample blocks from both relations, in slightly
different ways.

With the old interface, we faked the number of blocks in relation_size()
callback and claimed that there were N + M blocks. When then being asked
about a block by block number, we could easily pick the correct relation
and just forward the call.

With the new ReadStream API, a read-stream is (automatically) set up on the
"default" relation, but we can set up a separate read-stream inside the TAM
for the other relation. However, the difficulty is in setting it up
correctly:

We cannot use the "fake number of block"-trick since the read stream does
not only compute the block number, but actually tries to read the buffer in
the relation provided when setting up the read stream, so a block number
outside the range of this relation will not be found since it is in a
different relation.

If we could create our own read stream with both relations, that could be
solved and we could just implement the same logic, but direct it to the
correct relations depending on where we want to read the block. Unless I am
mistaken, there is already support for this since there is an array of
in-progress I/O and it would be trivial to extend this with more
relations+forks, if you have access to the structure definition. The
ReadStream struct is, however, an opaque struct so it's hard to hack around
with it. Just making the struct declaration public would potentially solve
a lot of problems here. (See attached patch, which is close to the minimum
of what is needed to allow extension writers to tweak the contents.)

Since both relations are using the same attributes with the same
"analyzability", having that information would be useful to compute the
targrows for setting up the additional stream, but it is computed in
do_analyze_rel() and not further propagated, so it needs to be re-computed
if we want to set up a separate read-stream.


> > It would be great if this could be fixed before the PG17 release now
> that 27bc1772fc8 was reverted.
>
> Ack.  Thinking...
>

Right now I think that just making the ReadStream struct available in the
header file is the best approach. It is a safe and low-risk fix (so
something that can be added to a beta) and will allow extension writers to
hack to their hearts' contents. In addition to that, being able to select
what interface to use would also help.



> Random thought: is there a wiki page or something where we can find
> out about all the table AM projects?  For the successor to
> 27bc1772fc8, I hope they'll be following along.
>

At this point, unfortunately not, we are quite early in this. Once I have
something, I'll share.
-- 
Best wishes,
Mats Kindahl, Timescale
From ea4bb194e0dcccac8465b3aa13950f721bde3860 Mon Sep 17 00:00:00 2001
From: Mats Kindahl <mats@timescale.com>
Date: Thu, 29 Aug 2024 10:39:34 +0200
Subject: Make ReadStream struct non-opaque

Move the ReadStream struct and two utility functions from read_stream.c
to read_stream.h to allow extensions to modify the structure if they
need to.
---
 src/backend/storage/aio/read_stream.c |  59 ---------------
 src/include/storage/read_stream.h     | 105 ++++++++++++++++++++++++++
 2 files changed, 105 insertions(+), 59 deletions(-)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index a83c18c2a4..bf2a679037 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -97,65 +97,6 @@
 #include "utils/rel.h"
 #include "utils/spccache.h"
 
-typedef struct InProgressIO
-{
-	int16		buffer_index;
-	ReadBuffersOperation op;
-} InProgressIO;
-
-/*
- * State for managing a stream of reads.
- */
-struct ReadStream
-{
-	int16		max_ios;
-	int16		ios_in_progress;
-	int16		queue_size;
-	int16		max_pinned_buffers;
-	int16		pinned_buffers;
-	int16		distance;
-	bool		advice_enabled;
-
-	/*
-	 * Small buffer of block numbers, useful for 'ungetting' to resolve flow
-	 * control problems when I/Os are split.  Also useful for batch-loading
-	 * block numbers in the fast path.
-	 */
-	BlockNumber blocknums[16];
-	int16		blocknums_count;
-	int16		blocknums_next;
-
-	/*
-	 * The callback that will tell us which block numbers to read, and an
-	 * opaque pointer that will be pass to it for its own purposes.
-	 */
-	ReadStreamBlockNumberCB callback;
-	void	   *callback_private_data;
-
-	/* Next expected block, for detecting sequential access. */
-	BlockNumber seq_blocknum;
-
-	/* The read operation we are currently preparing. */
-	BlockNumber pending_read_blocknum;
-	int16		pending_read_nblocks;
-
-	/* Space for buffers and optional per-buffer private data. */
-	size_t		per_buffer_data_size;
-	void	   *per_buffer_data;
-
-	/* Read operations that have been started but not waited for yet. */
-	InProgressIO *ios;
-	int16		oldest_io_index;
-	int16		next_io_index;
-
-	bool		fast_path;
-
-	/* Circular queue of buffers. */
-	int16		oldest_buffer_index;	/* Next pinned buffer to return */
-	int16		next_buffer_index;	/* Index of next buffer to pin */
-	Buffer		buffers[FLEXIBLE_ARRAY_MEMBER];
-};
-
 /*
  * Return a pointer to the per-buffer data by index.
  */
diff --git a/src/include/storage/read_stream.h b/src/include/storage/read_stream.h
index 4e599904f2..006ec3feb1 100644
--- a/src/include/storage/read_stream.h
+++ b/src/include/storage/read_stream.h
@@ -50,6 +50,111 @@ typedef BlockNumber (*ReadStreamBlockNumberCB) (ReadStream *stream,
 												void *callback_private_data,
 												void *per_buffer_data);
 
+/*
+ * State of an in-progress read buffer operation.
+ */
+typedef struct InProgressIO
+{
+	int16		buffer_index;
+	ReadBuffersOperation op;
+} InProgressIO;
+
+/*
+ * State for managing a stream of reads.
+ */
+struct ReadStream
+{
+	int16		max_ios;
+	int16		ios_in_progress;
+	int16		queue_size;
+	int16		max_pinned_buffers;
+	int16		pinned_buffers;
+	int16		distance;
+	bool		advice_enabled;
+
+	/*
+	 * Small buffer of block numbers, useful for 'ungetting' to resolve flow
+	 * control problems when I/Os are split.  Also useful for batch-loading
+	 * block numbers in the fast path.
+	 */
+	BlockNumber blocknums[16];
+	int16		blocknums_count;
+	int16		blocknums_next;
+
+	/*
+	 * The callback that will tell us which block numbers to read, and an
+	 * opaque pointer that will be pass to it for its own purposes.
+	 */
+	ReadStreamBlockNumberCB callback;
+	void	   *callback_private_data;
+
+	/* Next expected block, for detecting sequential access. */
+	BlockNumber seq_blocknum;
+
+	/* The read operation we are currently preparing. */
+	BlockNumber pending_read_blocknum;
+	int16		pending_read_nblocks;
+
+	/* Space for buffers and optional per-buffer private data. */
+	size_t		per_buffer_data_size;
+	void	   *per_buffer_data;
+
+	/* Read operations that have been started but not waited for yet. */
+	InProgressIO *ios;
+	int16		oldest_io_index;
+	int16		next_io_index;
+
+	bool		fast_path;
+
+	/* Circular queue of buffers. */
+	int16		oldest_buffer_index;	/* Next pinned buffer to return */
+	int16		next_buffer_index;	/* Index of next buffer to pin */
+	Buffer		buffers[FLEXIBLE_ARRAY_MEMBER];
+};
+
+/*
+ * Ask the callback which block it would like us to read next, with a small
+ * buffer in front to allow read_stream_unget_block() to work and to allow the
+ * fast path to skip this function and work directly from the array.
+ */
+static inline BlockNumber
+read_stream_get_block(ReadStream *stream, void *per_buffer_data)
+{
+	if (stream->blocknums_next < stream->blocknums_count)
+		return stream->blocknums[stream->blocknums_next++];
+
+	/*
+	 * We only bother to fetch one at a time here (but see the fast path which
+	 * uses more).
+	 */
+	return stream->callback(stream,
+							stream->callback_private_data,
+							per_buffer_data);
+}
+
+/*
+ * In order to deal with short reads in StartReadBuffers(), we sometimes need
+ * to defer handling of a block until later.
+ */
+static inline void
+read_stream_unget_block(ReadStream *stream, BlockNumber blocknum)
+{
+	if (stream->blocknums_next == stream->blocknums_count)
+	{
+		/* Never initialized or entirely consumed.  Re-initialize. */
+		stream->blocknums[0] = blocknum;
+		stream->blocknums_count = 1;
+		stream->blocknums_next = 0;
+	}
+	else
+	{
+		/* Must be the last value return from blocknums array. */
+		Assert(stream->blocknums_next > 0);
+		stream->blocknums_next--;
+		Assert(stream->blocknums[stream->blocknums_next] == blocknum);
+	}
+}
+
 extern ReadStream *read_stream_begin_relation(int flags,
 											  BufferAccessStrategy strategy,
 											  Relation rel,
-- 
2.34.1

Reply via email to