On Wed, Apr 4, 2012 at 1:00 PM, Robert Haas <robertmh...@gmail.com> wrote:

>> I'll do some testing to try to confirm whether this theory is correct
>> and whether the above fix helps.

 Very interesting work.


> Having performed this investigation, I've discovered a couple of
> interesting things.  First, SlruRecentlyUsed() is an ineffective way
> of keeping a page from getting reused, because it's called extremely
> frequently, and on these high-velocity tests it takes almost no time
> at all for the most recently used buffer to become the least recently
> used buffer.

Measurement?

> Therefore, SlruRecentlyUsed() doesn't prevent the lock
> pile-up.  In the unpatched code, once a long buffer I/O starts,
> everybody immediately goes into the tank until the I/O finishes.  If
> you patch the code so that the page is marked recently-used before
> beginning the I/O, everybody's next few CLOG requests hit some other
> buffer but eventually the long-I/O-in-progress buffer again becomes
> least recently used and the next CLOG eviction causes a second backend
> to begin waiting for that buffer.  Lather, rinse, repeat, until
> literally every backend is once again waiting on that buffer I/O.  You
> still get the same problem; it just takes slightly longer to develop.

Sounds believable, I just want to make sure we have measured things.

> On reflection, it seems to me that the right fix here is to make
> SlruSelectLRUPage() to avoid selecting a page on which an I/O is
> already in progress.  In general, those I/Os are all writes.  We don't
> end up waiting for reads because all the old CLOG pages we might want
> to read are still in the OS cache.  So reads complete quickly, on this
> test.

I believe that, but if all buffers are I/O busy we should avoid
waiting on a write I/O if possible.

>  Writes take a long time, because there we have to actually get
> the data down to disk, and the disk is busy.  But there's no reason
> for a backend doing a replacement to wait for either a read or a write
> that is in progress: once the read or write completes, we're going to
> loop around and repeat the buffer selection process, and most likely
> pick a buffer completely unrelated to the one whose I/O we waited for.
>  We might as well just skip the wait and select that other buffer
> immediately.  The attached patch implements that.

That seems much smarter. I'm thinking this should be back patched
because it appears to be fairly major, so I'm asking for some more
certainty that every thing you say here is valid. No doubt much of it
is valid, but that's not enough.

> Applying this patch does in fact eliminate the stalls.

I'd like to see that measured from a user perspective. It would be
good to see the response time distribution for run with and without
the patch.

> 2. I think we might want to revisit Simon's idea of background-writing
> SLRU pages.

Agreed. No longer anywhere near as important. I'll take a little
credit for identifying the right bottleneck, since you weren't a
believer before.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to