Hi,

On 2024-04-16 08:31:24 -0400, Robert Haas wrote:
> On Tue, Apr 16, 2024 at 6:52 AM Alexander Korotkov <aekorot...@gmail.com>
> wrote:
> > Taking a closer look at acquire_sample_rows(), I think it would be
> > good if table AM implementation would care about block-level (or
> > whatever-level) sampling. So that acquire_sample_rows() just fetches
> > tuples one-by-one from table AM implementation without any care about
> > blocks. Possibly table_beginscan_analyze() could take an argument of
> > target number of tuples, then those tuples are just fetched with
> > table_scan_analyze_next_tuple(). What do you think?
>
> Andres is the expert here, but FWIW, that plan seems reasonable to me.
> One downside is that every block-based tableam is going to end up with
> a very similar implementation, which is kind of something I don't like
> about the tableam API in general: if you want to make something that
> is basically heap plus a little bit of special sauce, you have to copy
> a mountain of code. Right now we don't really care about that problem,
> because we don't have any other tableams in core, but if we ever do, I
> think we're going to find ourselves very unhappy with that aspect of
> things. But maybe now is not the time to start worrying. That problem
> isn't unique to analyze, and giving out-of-core tableams the
> flexibility to do what they want is better than not.
I think that can partially be addressed by having more "block oriented AM"
helpers in core, like we have for table_block_parallelscan*. Doesn't work
for everything, but should for something like analyze.

Greetings,

Andres Freund
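A minimal sketch, in C, of the kind of shared "block oriented AM" helper discussed above. The helper owns block selection (here Knuth's selection sampling, Algorithm S, driven by a toy xorshift PRNG) and exposes a tuple-at-a-time interface, so a caller in the style of acquire_sample_rows() never sees block numbers. All names (BlockSampleScan, block_sample_begin(), block_sample_next_tuple(), the block_tuples_fn callback, toy_block_tuples()) are hypothetical and are not part of PostgreSQL's tableam API. Capping the number of visited blocks at the target row count is likewise an assumption of this sketch. The only AM-specific piece is the per-block callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical AM-specific callback: write the ids of the live tuples in
 * block `blkno` into `out` (at most `max_out`) and return how many there are.
 * None of these names are part of PostgreSQL's tableam API.
 */
typedef int (*block_tuples_fn) (void *am_state, uint32_t blkno,
                                uint64_t *out, int max_out);

#define MAX_TUPLES_PER_BLOCK 64

/* Shared, AM-independent state for a block-sampled ANALYZE-style scan. */
typedef struct BlockSampleScan
{
    uint32_t    nblocks;        /* total blocks in the relation */
    uint32_t    need;           /* blocks still to be selected */
    uint32_t    next_block;     /* next candidate block number */
    uint64_t    rng;            /* xorshift64 state */
    block_tuples_fn fetch;      /* AM-specific per-block fetcher */
    void       *am_state;
    uint64_t    buf[MAX_TUPLES_PER_BLOCK];  /* tuples of the current block */
    int         nbuf;
    int         pos;
} BlockSampleScan;

/* xorshift64: a tiny deterministic PRNG returning a double in [0, 1). */
static double
rng_uniform(uint64_t *state)
{
    uint64_t    x = *state;

    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return (double) (x >> 11) / 9007199254740992.0;    /* divide by 2^53 */
}

/* Algorithm S: walk blocks in order, choosing exactly `need` of them uniformly. */
static bool
next_sampled_block(BlockSampleScan *s, uint32_t *blkno)
{
    while (s->need > 0 && s->next_block < s->nblocks)
    {
        uint32_t    remaining = s->nblocks - s->next_block;
        uint32_t    candidate = s->next_block++;

        if (remaining * rng_uniform(&s->rng) < s->need)
        {
            s->need--;
            *blkno = candidate;
            return true;
        }
    }
    return false;
}

static void
block_sample_begin(BlockSampleScan *s, uint32_t nblocks, uint32_t targrows,
                   block_tuples_fn fetch, void *am_state, uint64_t seed)
{
    s->nblocks = nblocks;
    s->need = targrows < nblocks ? targrows : nblocks;  /* assumed cap */
    s->next_block = 0;
    s->rng = seed != 0 ? seed : 1;  /* xorshift state must be nonzero */
    s->fetch = fetch;
    s->am_state = am_state;
    s->nbuf = 0;
    s->pos = 0;
}

/* Tuple-at-a-time interface: the caller never sees block numbers. */
static bool
block_sample_next_tuple(BlockSampleScan *s, uint64_t *tid)
{
    while (s->pos >= s->nbuf)   /* empty blocks are simply skipped */
    {
        uint32_t    blkno;

        if (!next_sampled_block(s, &blkno))
            return false;
        s->nbuf = s->fetch(s->am_state, blkno, s->buf, MAX_TUPLES_PER_BLOCK);
        assert(s->nbuf >= 0 && s->nbuf <= MAX_TUPLES_PER_BLOCK);
        s->pos = 0;
    }
    *tid = s->buf[s->pos++];
    return true;
}

/* Toy AM: block b holds 1 + b % 3 tuples, with ids b * 1000 + i. */
static int
toy_block_tuples(void *am_state, uint32_t blkno, uint64_t *out, int max_out)
{
    int         n = (int) (1 + blkno % 3);

    (void) am_state;
    if (n > max_out)
        n = max_out;
    for (int i = 0; i < n; i++)
        out[i] = (uint64_t) blkno * 1000 + (uint64_t) i;
    return n;
}
```

A caller would call block_sample_begin() once and then loop on block_sample_next_tuple(), feeding each returned tuple into whatever row-level sampling it performs. Only the callback (toy_block_tuples() here) would differ between block-based AMs, which is the code-sharing point being made.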