Andres Freund <and...@anarazel.de> writes:
> A bit of food, a coke and a talk later, here's a first draft *prototype*
> of how this could be solved. ...
> Obviously this is far from clean enough, but what do you think about the
> basic approach?  It does, in my limited testing, indeed solve the "could
> not read block" issue.

A couple of thoughts after reading and reflecting for a while:

1. I don't much like the pending_rebuilds list, mainly because of this
consideration: what happens if we hit an OOM error trying to add an entry
to that list?  As you've drafted the patch, we don't even mark the
relevant relcache entry rd_invalid before that fails, so that's surely
bad.  Now, I'm not sure how bulletproof relcache inval is in general
with respect to OOM failures, but let's not add more hazards.

2. I think we may need to address the same order-of-operations hazards
as RelationCacheInvalidate() worries about.  Alternatively, maybe we
could simplify that function by making it use the same
delayed-revalidation logic as we're going to develop for this.

3. I don't at all like the ProcessPendingRelcacheRebuilds call you added
to ProcessInvalidationMessages.  That's effectively assuming that the
"func" *must* be LocalExecuteInvalidationMessage and not anything else;
likewise, the lack of such a call inside ProcessInvalidationMessagesMulti
presumes that that one is never called to actually execute invalidations.
(While those are true statements, it's a horrible violation of modularity
for these two functions to know it.)  Probably better to put this into the
callers, which will know what the actual semantics are.

4. The call added to the middle of ReceiveSharedInvalidMessages doesn't
seem all that safe either; the problem is its relationship to the
"catchup" processing.  We are caught up at the moment we exit the loop,
but are we sure we still would be after doing assorted work for relcache
rebuild?  Swapping the order of the two steps might help, but then we
have to consider what happens if we error out from SICleanupQueue.
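
To make point 3 a bit more concrete, here is a toy sketch in standalone C.
It is not PostgreSQL code: every toy_ name is invented for illustration and
the bodies are placeholders.  The only thing it tries to show is that the
generic message iterator stays ignorant of what "func" does, while the one
caller that knows it is really executing invalidations is the one that asks
for the deferred rebuilds afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an invalidation message. */
typedef struct ToyMsg
{
    int         id;
} ToyMsg;

typedef void (*ToyMsgFunc) (const ToyMsg *msg);

/* Counters used only so the behaviour can be observed. */
int         toy_pending_count = 0;  /* entries awaiting rebuild */
int         toy_rebuilt_count = 0;  /* entries rebuilt so far */
int         toy_sent_count = 0;     /* messages merely forwarded */

/*
 * Generic iterator: applies func to each message and does nothing else.
 * It makes no assumption about what func is.
 */
void
toy_process_messages(const ToyMsg *msgs, size_t n, ToyMsgFunc func)
{
    assert(func != NULL);
    for (size_t i = 0; i < n; i++)
        func(&msgs[i]);
}

/* Stand-in for locally executing a message: queues a rebuild. */
void
toy_local_execute(const ToyMsg *msg)
{
    (void) msg;
    toy_pending_count++;
}

/* Stand-in for a func that only forwards a message elsewhere. */
void
toy_forward(const ToyMsg *msg)
{
    (void) msg;
    toy_sent_count++;
}

/* Stand-in for processing the deferred rebuilds. */
void
toy_process_pending_rebuilds(void)
{
    toy_rebuilt_count += toy_pending_count;
    toy_pending_count = 0;
}

/* Caller that knows it is executing invalidations: it triggers rebuilds. */
void
toy_execute_and_rebuild(const ToyMsg *msgs, size_t n)
{
    toy_process_messages(msgs, n, toy_local_execute);
    toy_process_pending_rebuilds();
}

/* Caller that only forwards messages: no rebuild step belongs here. */
void
toy_send_only(const ToyMsg *msgs, size_t n)
{
    toy_process_messages(msgs, n, toy_forward);
}
```

With this split, adding a third caller with different semantics does not
require touching toy_process_messages at all.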

(In general, the hard part of all this stuff is being sure that sane
things happen if you error out part way through ...)
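
As a small illustration of the ordering concern in point 1, here is another
toy sketch in standalone C.  Again, none of this is the real relcache code:
ToyRelation, ToyPendingList, the "invalid" flag and the simulate_oom hook are
all made up, and a boolean failure return stands in for an OOM error.  The
idea is simply to do the step that cannot fail (marking the entry invalid)
before the step that can (growing the list), so that a failure part way
through still leaves the entry flagged as needing a rebuild.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a relcache entry. */
typedef struct ToyRelation
{
    unsigned int relid;
    bool        invalid;        /* true means "rebuild before next use" */
} ToyRelation;

/* Hypothetical pending-rebuild list, modelled as a growable array. */
typedef struct ToyPendingList
{
    ToyRelation **items;
    size_t      nitems;
    size_t      capacity;
    bool        simulate_oom;   /* test hook: make the next growth fail */
} ToyPendingList;

/* Returns false if memory could not be obtained. */
bool
toy_pending_append(ToyPendingList *list, ToyRelation *rel)
{
    if (list->nitems == list->capacity)
    {
        size_t      newcap = list->capacity ? list->capacity * 2 : 4;
        ToyRelation **newitems;

        if (list->simulate_oom)
            return false;
        newitems = realloc(list->items, newcap * sizeof(ToyRelation *));
        if (newitems == NULL)
            return false;
        list->items = newitems;
        list->capacity = newcap;
    }
    list->items[list->nitems++] = rel;
    return true;
}

/*
 * Invalidate first, enqueue second.  If the enqueue fails, the entry is
 * already marked invalid, so it cannot be mistaken for a good one later.
 */
bool
toy_invalidate(ToyPendingList *list, ToyRelation *rel)
{
    assert(list != NULL && rel != NULL);
    rel->invalid = true;        /* plain store, cannot fail */
    return toy_pending_append(list, rel);
}
```

Whether the real code can get away with something this simple depends on
what else has to happen between the two steps; the sketch only shows the
ordering.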

                        regards, tom lane
