On Sat, Jul 19, 2025 at 6:31 AM Tomas Vondra <to...@vondra.me> wrote:
> Perhaps the ReadStream should do something like this? Of course, the
> simple patch resets the stream very often, likely much more often than
> anything else in the code. But wouldn't it be beneficial for streams
> reset because of a rescan? Possibly needs to be optional.

Right, that's also discussed, with a similar patch, here:

https://www.postgresql.org/message-id/CA%2BhUKG%2Bx2BcqWzBC77cN0ewhzMF0kYhC6c4G_T2gJLPbqYQ6Ow%40mail.gmail.com

Resetting the distance was a short-sighted mistake: I was thinking
about rescans, the original use case for the reset operation, and
guessing that the data would remain cached. But all the new users of
_reset() have a completely different motivation, namely temporary
exhaustion in their source data, so that guess was simply wrong.

There was also some discussion at the time about whether "reset so I
can rescan" and "reset so I can continue after a temporary stop"
should be different operations requiring different APIs. It now seems
like one operation is sufficient, but it should preserve the distance
as you showed and then let the algorithm learn about already-cached
data in the rescan case (if the data is even still cached then, which
is also debatable since it depends on the size of the scan).

So, I think we should just go ahead and commit a patch like that.
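
To make the difference between the two reset behaviours concrete, here
is a toy model in C. It is only a sketch under stated assumptions and is
not the code in read_stream.c: the ToyStream type, every toy_* function,
the cap of 16, and the simplified adaptation rule (double the distance on
a cache miss, shrink it by one on a cache hit) are all hypothetical names
and simplifications chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model of a look-ahead distance. This is NOT
 * PostgreSQL's ReadStream; names and the adaptation rule are assumptions.
 */
typedef struct ToyStream
{
	int			distance;		/* current look-ahead distance, in blocks */
	int			max_distance;	/* upper bound on the distance */
} ToyStream;

static void
toy_init(ToyStream *s, int max_distance)
{
	assert(max_distance >= 1);
	s->distance = 1;
	s->max_distance = max_distance;
}

/* Assumed rule: a miss doubles the distance (capped), a hit shrinks it by one. */
static void
toy_observe(ToyStream *s, bool cache_hit)
{
	if (cache_hit)
	{
		if (s->distance > 1)
			s->distance--;
	}
	else
	{
		s->distance *= 2;
		if (s->distance > s->max_distance)
			s->distance = s->max_distance;
	}
}

/* Reset that forgets what was learned: the distance drops back to 1. */
static void
toy_reset_forget(ToyStream *s)
{
	s->distance = 1;
}

/* Reset that preserves the distance: only the stream position would restart. */
static void
toy_reset_preserve(ToyStream *s)
{
	(void) s;					/* nothing to do to the distance */
}

/*
 * Ramp up on five misses (cap 16), hit a temporary stop, reset, and
 * report the distance the stream resumes with.
 */
static int
toy_distance_after_stall(bool preserve)
{
	ToyStream	s;

	toy_init(&s, 16);
	for (int i = 0; i < 5; i++)
		toy_observe(&s, false);
	if (preserve)
		toy_reset_preserve(&s);
	else
		toy_reset_forget(&s);
	return s.distance;
}

/*
 * Rescan case with a preserved distance: if the data really is cached,
 * each hit lets the distance decay again without any special API.
 */
static int
toy_distance_after_cached_rescan(int hits)
{
	ToyStream	s;

	toy_init(&s, 16);
	for (int i = 0; i < 5; i++)
		toy_observe(&s, false);
	toy_reset_preserve(&s);
	for (int i = 0; i < hits; i++)
		toy_observe(&s, true);
	return s.distance;
}
```

In this model a stream that stalls on temporarily exhausted input and is
reset with toy_reset_forget() has to ramp up from 1 again, while
toy_reset_preserve() resumes at the learned distance. In the rescan case,
toy_distance_after_cached_rescan() shows the distance shrinking on its
own as cache hits arrive, which is the argument for a single reset
operation.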