On Thu, Jan 4, 2024 at 9:55 AM Tomas Vondra
<tomas.von...@enterprisedb.com> wrote:
> Here's a somewhat reworked version of the patch. My initial goal was to
> see if it could adopt the StreamingRead API proposed in [1], but that
> turned out to be less straight-forward than I hoped, for two reasons:
I guess we need Thomas or Andres or maybe Melanie to comment on this.

> Perhaps a bigger change is that I decided to move this into a separate
> API on top of indexam.c. The original idea was to integrate this into
> index_getnext_tid/index_getnext_slot, so that all callers benefit from
> the prefetching automatically. Which would be nice, but it also meant
> it's need to happen in the indexam.c code, which seemed dirty.

This patch is hard to review right now because there's a bunch of
comment updating that doesn't seem to have been done for the new
design. For instance:

+ * XXX This does not support prefetching of heap pages. When such prefetching is
+ * desirable, use index_getnext_tid().

But not any more.

+ * XXX The prefetching may interfere with the patch allowing us to evaluate
+ * conditions on the index tuple, in which case we may not need the heap
+ * tuple. Maybe if there's such filter, we should prefetch only pages that
+ * are not all-visible (and the same idea would also work for IOS), but
+ * it also makes the indexing a bit "aware" of the visibility stuff (which
+ * seems a somewhat wrong). Also, maybe we should consider the filter selectivity

I'm not sure whether all the problems in this area are solved, but I
think you've solved enough of them that this at least needs rewording,
if not removing.

+ * XXX Comment/check seems obsolete.

This occurs in two places. I'm not sure if it's accurate or not.

+ * XXX Could this be an issue for the prefetching? What if we prefetch something
+ * but the direction changes before we get to the read? If that could happen,
+ * maybe we should discard the prefetched data and go back? But can we even
+ * do that, if we already fetched some TIDs from the index? I don't think
+ * indexorderdir can't change, but es_direction maybe can?

But your email claims that "The patch simply disables prefetching for
such queries, using the same logic that we do for parallelism." FWIW, I
think that's a fine way to handle that case.
+ * XXX Maybe we should enable prefetching, but prefetch only pages that
+ * are not all-visible (but checking that from the index code seems like
+ * a violation of layering etc).

Isn't this fixed now? Note this comment occurs twice.

+ * XXX We need to disable this in some cases (e.g. when using index-only
+ * scans, we don't want to prefetch pages). Or maybe we should prefetch
+ * only pages that are not all-visible, that'd be even better.

Here again.

And now for some comments on other parts of the patch, mostly other
XXX comments:

+ * XXX This does not support prefetching of heap pages. When such prefetching is
+ * desirable, use index_getnext_tid().

There's probably no reason to write XXX here. The comment is fine.

+ * XXX Notice we haven't added the block to the block queue yet, and there
+ * is a preceding block (i.e. blockIndex-1 is valid).

Same here, possibly? If this XXX indicates a defect in the code, I
don't know what the defect is, so I guess it needs to be more clear. If
it is just explaining the code, then there's no reason for the comment
to say XXX.

+ * XXX Could it be harmful that we read the queue backwards? Maybe memory
+ * prefetching works better for the forward direction?

It does. But I don't know whether that matters here or not.

+ * XXX We do add the cache size to the request in order not to
+ * have issues with uint64 underflows.

I don't know what this means.

+ * XXX not sure this correctly handles xs_heap_continue - see index_getnext_slot,
+ * maybe nodeIndexscan needs to do something more to handle this? Although, that
+ * should be in the indexscan next_cb callback, probably.
+ *
+ * XXX If xs_heap_continue=true, we need to return the last TID.
You've got a bunch of comments about xs_heap_continue here -- and I
don't fully understand what the issues are here with respect to this
particular patch, but I think that the general purpose of
xs_heap_continue is to handle the case where we need to return more
than one tuple from the same HOT chain. With an MVCC snapshot that
doesn't happen, but with say SnapshotAny or SnapshotDirty, it could.
As far as possible, the prefetcher shouldn't be involved at all when
xs_heap_continue is set, I believe, because in that case we're just
returning a bunch of tuples from the same page, and the extra fetches
from that heap page shouldn't trigger or require any further
prefetching.

+ * XXX Should this also look at plan.plan_rows and maybe cap the target
+ * to that? Pointless to prefetch more than we expect to use. Or maybe
+ * just reset to that value during prefetching, after reading the next
+ * index page (or rather after rescan)?

It seems questionable to use plan_rows here because (1) I don't think
we have existing cases where we use the estimated row count in the
executor for anything, we just carry it through so EXPLAIN can print
it and (2) row count estimates can be really far off, especially if
we're on the inner side of a nested loop; we might like to figure that
out eventually instead of just DTWT forever. But on the other hand
this does feel like an important case where we have a clue that
prefetching might need to be done less aggressively or not at all, and
it doesn't seem right to ignore that signal either. I wonder if we
want this shaped in some other way, like a Boolean that says
are-we-under-a-potentially-row-limiting-construct e.g. limit or inner
side of a semi-join or anti-join.

+ * We reach here if the index only scan is not parallel, or if we're
+ * serially executing an index only scan that was planned to be
+ * parallel.

Well, this seems sad.
+ * XXX This might lead to IOS being slower than plain index scan, if the
+ * table has a lot of pages that need recheck.

How?

+ /*
+ * XXX Only allow index prefetching when parallelModeOK=true. This is a bit
+ * of a misuse of the flag, but we need to disable prefetching for cursors
+ * (which might change direction), and parallelModeOK does that. But maybe
+ * we might (or should) have a separate flag.
+ */

I think the correct flag to be using here is execute_once, which
captures whether the executor could potentially be invoked a second
time for the same portal. Changes in the fetch direction are possible
if and only if !execute_once.

> Note 1: The IndexPrefetch name is a bit misleading, because it's used
> even with prefetching disabled - all index reads from the index scan
> happen through it. Maybe it should be called IndexReader or something
> like that.

My biggest gripe here is the capitalization. This version adds, inter
alia, IndexPrefetchAlloc, PREFETCH_QUEUE_INDEX, and
index_heap_prefetch_target, which seems like one or two too many
conventions. But maybe the PREFETCH_* macros don't even belong in a
public header.

I do like the index_heap_prefetch_* naming. Possibly that's too
verbose to use for everything, but calling this index-heap-prefetch
rather than index-prefetch seems clearer.

-- 
Robert Haas
EDB: http://www.enterprisedb.com