I wrote:
> OK, I think I see it.  The problem is that the code in slru.c is careful
> about not modifying state when it doesn't hold the proper lock, but not
> so careful about not *inspecting* state without the proper lock.
> ...
> I'm still thinking about how to make a real fix without introducing
> another lock/unlock cycle --- we can do that if we have to, but I think
> maybe it's possible to fix it without.

Attached is a proposed patch to fix up slru.c so that it's not playing
any games by either reading or writing shared state without holding
the ControlLock for the SLRU set.

The main problem with the existing code, as I now see it, was a poor
choice of representation of page state: we had states CLEAN, DIRTY, and
WRITE_IN_PROGRESS, which meant that re-dirtying a page while a write was
in progress required setting the state back to DIRTY, which hid the fact
that indeed a write was in progress.  So the I/O code was designed to
cope with not knowing whether another write was already in progress.  We
can make it a whole lot cleaner by changing the state representation so
that we can tell the difference --- then, we can know before releasing
the ControlLock whether we need to write the page or just wait for
someone else to finish writing it.  And that means we can do all the
state-manipulation before releasing the lock.

I could have just added an additional state WRITE_IN_PROGRESS_REDIRTIED,
or some such, but it seemed notationally cleaner to create a separate
"page_dirty" boolean, and reduce the number of states to four (empty,
read-in-progress, valid, write-in-progress).  This lets outside code
such as clog.c just set "page_dirty = true" rather than doing a complex
bit of logic to change the state value properly.

The patch is a bit bulky because of the representation change, but the
key changes are localized in SimpleLruReadPage and SimpleLruWritePage.

I think this code is a whole lot cleaner than it was before, but it's
a bit of a large change to be making so late in the 8.1 cycle (not to
mention that we really ought to back-patch similar changes all the way
back, because the bug exists as far back as 7.2).  I am going to take
another look to see if there is a less invasive change that will fix
the problem at some performance cost; if so, we might want to do it that
way in 8.1 and back branches.

Any comments on the patch, or thoughts on how to proceed?

                        regards, tom lane

Attachment: binYlP4HAkp8I.bin
Description: slru-race-1.patch

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not

Reply via email to