Hello. Tom, thanks a lot for your thorough review.
> What you've done to > IndexNext() is a complete disaster from a modularity standpoint: it now > knows all about the interactions between index_getnext, index_getnext_tid, > and index_fetch_heap. I was looking into the current IndexOnlyNext implementation as a starting point - and it knows about index_getnext_tid and index_fetch_heap already. At the same time I was trying to keep patch non-invasive. Patched IndexNext now only knowns about index_getnext_tid and index_fetch_heap - the same as IndexOnlyNext. But yes, it probably could be done better. > I'm not sure about a nicer way to refactor that, but I'd suggest that > maybe you need an additional function in indexam.c that hides all this > knowledge about the internal behavior of an IndexScanDesc. I'll try to split index_getnext into two functions. A new one will do everything index_getnext does except index_fetch_heap. > Or that is, it knows too much and still not enough, > because it's flat out wrong for the case that xs_continue_hot is set. > You can't call index_getnext_tid when that's still true from last time. Oh.. Yes, clear error here. < The PredicateLockPage call also troubles me quite a bit, not only from < a modularity standpoint but because that implies a somewhat-user-visible < behavioral change when this optimization activates. People who are using < serializable mode are not going to find it to be an improvement if their < queries fail and need retries a lot more often than they did before. < I don't know if that problem is bad enough that we should disable skipping < when serializable mode is active, but it's something to think about. Current IndexOnlyScan already does that. And I think a user should expect such a change in serializable mode. > You haven't documented the behavior required for tuple-skipping in any > meaningful fashion, particularly not the expectation that the child plan > node will still return tuples that just need not contain any valid > content. Only nodeLimit could receive such tuples and they are immediately discarded. I'll add some comment to it. > is a gross hack and probably wrong. You could use ExecStoreAllNullTuple, > perhaps. Oh, nice, missed it. > I'm disturbed by the fact that you've not taught the planner about the > potential cost saving from this, so that it won't have any particular > reason to pick a regular indexscan over some other plan type when this > optimization applies. Maybe there's no practical way to do that, or maybe > it wouldn't really matter in practice; I've not looked into it. But not > doing anything feels like a hack. I was trying to do it. But current planner architecture does not provide a nice way to achive it due to the way it handles limit and offset. So, I think it is better to to be avoided for now. > Setting this back to Waiting on Author. I'll try to make the required changes in a few days. Thanks.
