On Sat, Aug 24, 2024 at 5:31 AM Thomas Munro <thomas.mu...@gmail.com> wrote:
> On Thu, Aug 22, 2024 at 7:31 PM Mats Kindahl <m...@timescale.com> wrote: > > The alternate version proposed by Nazir allows you to deide which > interface to use. > > Reverting the patch entirely would also solve the problem. > After digging through the code a little more I discovered that there actually is another one: move the ReadStream struct into read_stream.h. > > Passing down the block sampler and the strategy to scan_begin() and move > the ReadStream setup in analyze.c into initscan() in heapam.c, but this > requires adding new parameters to this function. > > Having accessors that allow you to get the block sampler and strategy > from the ReadStream object. > > I'm a bit confused about how it can make sense to use the same > BlockSampler with a side relation/fork. Could you point me at the > code? > Sorry, that was a bit unclear. Intention was not to re-use the block sampler but to set a new one up with parameters from the original block sampler, which would require access to it. (The strategy is less of a problem since only one is used.) To elaborate on the situation: For the TAM in question we have two different storage areas, both are heaps. Both relations use the same attributes "publicly" (they are internally different, but we transform them to look the same). One of the relations is the "default" one and is stored in rd_rel. In order to run ANALYZE, we need to sample blocks from both relations, in slightly different ways. With the old interface, we faked the number of blocks in relation_size() callback and claimed that there were N + M blocks. When then being asked about a block by block number, we could easily pick the correct relation and just forward the call. With the new ReadStream API, a read-stream is (automatically) set up on the "default" relation, but we can set up a separate read-stream inside the TAM for the other relation. However, the difficulty is in setting it up correctly: We cannot use the "fake number of block"-trick since the read stream does not only compute the block number, but actually tries to read the buffer in the relation provided when setting up the read stream, so a block number outside the range of this relation will not be found since it is in a different relation. If we could create our own read stream with both relations, that could be solved and we could just implement the same logic, but direct it to the correct relations depending on where we want to read the block. Unless I am mistaken, there is already support for this since there is an array of in-progress I/O and it would be trivial to extend this with more relations+forks, if you have access to the structure definition. The ReadStream struct is, however, an opaque struct so it's hard to hack around with it. Just making the struct declaration public would potentially solve a lot of problems here. (See attached patch, which is close to the minimum of what is needed to allow extension writers to tweak the contents.) Since both relations are using the same attributes with the same "analyzability", having that information would be useful to compute the targrows for setting up the additional stream, but it is computed in do_analyze_rel() and not further propagated, so it needs to be re-computed if we want to set up a separate read-stream. > > It would be great if this could be fixed before the PG17 release now > that 27bc1772fc8 was reverted. > > Ack. Thinking... > Right now I think that just making the ReadStream struct available in the header file is the best approach. It is a safe and low-risk fix (so something that can be added to a beta) and will allow extension writers to hack to their hearts' contents. In addition to that, being able to select what interface to use would also help. > Random thought: is there a wiki page or something where we can find > out about all the table AM projects? For the successor to > 27bc1772fc8, I hope they'll be following along. > At this point, unfortunately not, we are quite early in this. Once I have something, I'll share. -- Best wishes, Mats Kindahl, Timescale
From ea4bb194e0dcccac8465b3aa13950f721bde3860 Mon Sep 17 00:00:00 2001 From: Mats Kindahl <mats@timescale.com> Date: Thu, 29 Aug 2024 10:39:34 +0200 Subject: Make ReadStream struct non-opaque Move the ReadStream struct and two utility functions from read_stream.c to read_stream.h to allow extensions to modify the structure if they need to. --- src/backend/storage/aio/read_stream.c | 59 --------------- src/include/storage/read_stream.h | 105 ++++++++++++++++++++++++++ 2 files changed, 105 insertions(+), 59 deletions(-) diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c index a83c18c2a4..bf2a679037 100644 --- a/src/backend/storage/aio/read_stream.c +++ b/src/backend/storage/aio/read_stream.c @@ -97,65 +97,6 @@ #include "utils/rel.h" #include "utils/spccache.h" -typedef struct InProgressIO -{ - int16 buffer_index; - ReadBuffersOperation op; -} InProgressIO; - -/* - * State for managing a stream of reads. - */ -struct ReadStream -{ - int16 max_ios; - int16 ios_in_progress; - int16 queue_size; - int16 max_pinned_buffers; - int16 pinned_buffers; - int16 distance; - bool advice_enabled; - - /* - * Small buffer of block numbers, useful for 'ungetting' to resolve flow - * control problems when I/Os are split. Also useful for batch-loading - * block numbers in the fast path. - */ - BlockNumber blocknums[16]; - int16 blocknums_count; - int16 blocknums_next; - - /* - * The callback that will tell us which block numbers to read, and an - * opaque pointer that will be pass to it for its own purposes. - */ - ReadStreamBlockNumberCB callback; - void *callback_private_data; - - /* Next expected block, for detecting sequential access. */ - BlockNumber seq_blocknum; - - /* The read operation we are currently preparing. */ - BlockNumber pending_read_blocknum; - int16 pending_read_nblocks; - - /* Space for buffers and optional per-buffer private data. */ - size_t per_buffer_data_size; - void *per_buffer_data; - - /* Read operations that have been started but not waited for yet. */ - InProgressIO *ios; - int16 oldest_io_index; - int16 next_io_index; - - bool fast_path; - - /* Circular queue of buffers. */ - int16 oldest_buffer_index; /* Next pinned buffer to return */ - int16 next_buffer_index; /* Index of next buffer to pin */ - Buffer buffers[FLEXIBLE_ARRAY_MEMBER]; -}; - /* * Return a pointer to the per-buffer data by index. */ diff --git a/src/include/storage/read_stream.h b/src/include/storage/read_stream.h index 4e599904f2..006ec3feb1 100644 --- a/src/include/storage/read_stream.h +++ b/src/include/storage/read_stream.h @@ -50,6 +50,111 @@ typedef BlockNumber (*ReadStreamBlockNumberCB) (ReadStream *stream, void *callback_private_data, void *per_buffer_data); +/* + * State of an in-progress read buffer operation. + */ +typedef struct InProgressIO +{ + int16 buffer_index; + ReadBuffersOperation op; +} InProgressIO; + +/* + * State for managing a stream of reads. + */ +struct ReadStream +{ + int16 max_ios; + int16 ios_in_progress; + int16 queue_size; + int16 max_pinned_buffers; + int16 pinned_buffers; + int16 distance; + bool advice_enabled; + + /* + * Small buffer of block numbers, useful for 'ungetting' to resolve flow + * control problems when I/Os are split. Also useful for batch-loading + * block numbers in the fast path. + */ + BlockNumber blocknums[16]; + int16 blocknums_count; + int16 blocknums_next; + + /* + * The callback that will tell us which block numbers to read, and an + * opaque pointer that will be pass to it for its own purposes. + */ + ReadStreamBlockNumberCB callback; + void *callback_private_data; + + /* Next expected block, for detecting sequential access. */ + BlockNumber seq_blocknum; + + /* The read operation we are currently preparing. */ + BlockNumber pending_read_blocknum; + int16 pending_read_nblocks; + + /* Space for buffers and optional per-buffer private data. */ + size_t per_buffer_data_size; + void *per_buffer_data; + + /* Read operations that have been started but not waited for yet. */ + InProgressIO *ios; + int16 oldest_io_index; + int16 next_io_index; + + bool fast_path; + + /* Circular queue of buffers. */ + int16 oldest_buffer_index; /* Next pinned buffer to return */ + int16 next_buffer_index; /* Index of next buffer to pin */ + Buffer buffers[FLEXIBLE_ARRAY_MEMBER]; +}; + +/* + * Ask the callback which block it would like us to read next, with a small + * buffer in front to allow read_stream_unget_block() to work and to allow the + * fast path to skip this function and work directly from the array. + */ +static inline BlockNumber +read_stream_get_block(ReadStream *stream, void *per_buffer_data) +{ + if (stream->blocknums_next < stream->blocknums_count) + return stream->blocknums[stream->blocknums_next++]; + + /* + * We only bother to fetch one at a time here (but see the fast path which + * uses more). + */ + return stream->callback(stream, + stream->callback_private_data, + per_buffer_data); +} + +/* + * In order to deal with short reads in StartReadBuffers(), we sometimes need + * to defer handling of a block until later. + */ +static inline void +read_stream_unget_block(ReadStream *stream, BlockNumber blocknum) +{ + if (stream->blocknums_next == stream->blocknums_count) + { + /* Never initialized or entirely consumed. Re-initialize. */ + stream->blocknums[0] = blocknum; + stream->blocknums_count = 1; + stream->blocknums_next = 0; + } + else + { + /* Must be the last value return from blocknums array. */ + Assert(stream->blocknums_next > 0); + stream->blocknums_next--; + Assert(stream->blocknums[stream->blocknums_next] == blocknum); + } +} + extern ReadStream *read_stream_begin_relation(int flags, BufferAccessStrategy strategy, Relation rel, -- 2.34.1