Michail Nikolaev <michail.nikol...@gmail.com> writes: > [ offset_index_only_v4.patch ]
I started to go through this patch with the intention of committing it, but the more I looked at it the less happy I got. What you've done to IndexNext() is a complete disaster from a modularity standpoint: it now knows all about the interactions between index_getnext, index_getnext_tid, and index_fetch_heap. Or that is, it knows too much and still not enough, because it's flat out wrong for the case that xs_continue_hot is set. You can't call index_getnext_tid when that's still true from last time. I'm not sure about a nicer way to refactor that, but I'd suggest that maybe you need an additional function in indexam.c that hides all this knowledge about the internal behavior of an IndexScanDesc. The PredicateLockPage call also troubles me quite a bit, not only from a modularity standpoint but because that implies a somewhat-user-visible behavioral change when this optimization activates. People who are using serializable mode are not going to find it to be an improvement if their queries fail and need retries a lot more often than they did before. I don't know if that problem is bad enough that we should disable skipping when serializable mode is active, but it's something to think about. You haven't documented the behavior required for tuple-skipping in any meaningful fashion, particularly not the expectation that the child plan node will still return tuples that just need not contain any valid content. Also, this particular implementation of that: + ExecClearTuple(slot); + slot->tts_isempty = false; + slot->tts_nvalid = 0; is a gross hack and probably wrong. You could use ExecStoreAllNullTuple, perhaps. I'm disturbed by the fact that you've not taught the planner about the potential cost saving from this, so that it won't have any particular reason to pick a regular indexscan over some other plan type when this optimization applies. Maybe there's no practical way to do that, or maybe it wouldn't really matter in practice; I've not looked into it. But not doing anything feels like a hack. Setting this back to Waiting on Author. regards, tom lane