On Mon, Apr 2, 2012 at 12:00 AM, Greg Stark <st...@mit.edu> wrote:
> On Sun, Apr 1, 2012 at 4:05 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> My guess based on previous testing is
>> that what's happening here is (1) we examine a tuple on an old page
>> and decide we must look up its XID, (2) the relevant CLOG page isn't
>> in cache so we decide to read it, but (3) the page we decide to evict
>> happens to be dirty, so we have to write it first.
>
> Reading the code one possibility is that in the time we write the
> oldest slru page another process has come along and redirtied it. So
> we pick a new oldest slru page and write that. By the time we've
> written it another process could have redirtied it again. On a loaded
> system where the writes are taking 100ms or more it's conceivable --
> barely -- that could happen over and over again hundreds of times.

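For concreteness, the retry scenario quoted above can be sketched as a
toy simulation. This is only an illustration under stated assumptions:
`Stats`, `find_victim`, and all timings are hypothetical, simulated
time stands in for real I/O, and it is not the actual slru.c logic. It
records each lock acquisition's wait separately from the total time
the backend spends stalled.

```python
from dataclasses import dataclass, field


@dataclass
class Stats:
    # One entry per lock acquisition (what per-lock instrumentation would see).
    lock_waits_ms: list = field(default_factory=list)
    # Total simulated time the backend spent trying to obtain a clean page.
    total_stall_ms: int = 0


def find_victim(write_ms, lock_wait_ms, redirties):
    """Simulate evicting a dirty page that keeps getting redirtied.

    write_ms      -- assumed cost of writing one dirty page
    lock_wait_ms  -- assumed wait for each (re)acquisition of the lock
    redirties     -- how many times another process redirties the page
                     while our write is in progress (hypothetical input)
    """
    stats = Stats()
    dirty = True  # the chosen victim page starts out dirty
    redirties_left = redirties

    def acquire_lock():
        stats.lock_waits_ms.append(lock_wait_ms)
        stats.total_stall_ms += lock_wait_ms

    acquire_lock()
    while dirty:
        # The lock is released for the duration of the write.
        stats.total_stall_ms += write_ms
        dirty = False
        # Another process may redirty the page before we get the lock back.
        if redirties_left > 0:
            redirties_left -= 1
            dirty = True
        # Retake the lock and re-check the page.
        acquire_lock()
    return stats
```

With the made-up inputs `find_victim(100, 5, 200)`, the longest single
lock wait is 5 ms while the total stall is 21110 ms, so a
per-acquisition maximum alone would not reveal the stall.
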
That's a valid concern, but I don't think the instrumentation would
show that as a single long wait, because the locks would be released
and retaken each time around the loop. I guess it's for Robert to
explain how it would show up. If the instrumentation doesn't show it,
then the actual maximum wait time could be even higher. ;-(

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services