On Fri, May 13, 2011 at 3:48 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> what's changed:
> *) as advertised, i'm no longer bothering to cache invalid bits. hint
> bit i/o via rolled-back transactions is not a big deal IMNSHO, so they
> will work as they have always done.
> *) all the tuple visibility routines are now participating in the cache
> *) xmax commit bits are now being cached, not just xmin. this
> prevents hint bit i/o following deletes.
> *) juggled the cache interaction a bit so the changes to the
> visibility routines could be a lot more surgical. specifically, I
> reverted SetHintBits() to return void and put the cache adjustment
> inside 'SetHintBitsCache()' which sets the bits, dirties the page, and
> adjusts the cache. SetHintBitsNonDirty() only sets the bits without
> dirtying the page, and is called when you get a cache hit.
> *) various minor cleanups; decided to keep pageindex as type
> TransactionId since that's what clog does.
> *) made a proper init function, put cache data into
> CacheMemoryContext, and moved the cache data pointer references into
> the page structure
>
> performance testing done:
> *) select-only pgbench and standard i/o pgbench tests don't show a
> performance difference beyond statistical noise.
>
> The test I still need to do, a cache and clog thrashing test, I haven't
> found a clean way to put together yet. I'm really skeptical that any
> problems will show up there. That said, I am satisfied the patch does
> what it is supposed to do -- eliminates hint bit i/o for cases where it
> is a big headache, without penalizing other cases. Anyone who would
> like to comment on the technique or other points of the patch, please
> do so (otherwise it's time for the 'fest).
OK, so I read through this patch. I agree with Simon's comment on the other thread that it's probably not ready for commit right at the moment, but I also want to offer a few other thoughts. The patch fails to conform to our coding style in a variety of ways, but I don't want to get bogged down in that at this point. The first question we need to answer here is: What is this patch doing? And do we want that?

With regard to the first question, it seems to me that the patch is doing two separate but related things.

1. It allocates four 8kB pages in backend-local memory, each of which can store one bit for each XID in a 64K range. The bit is set if the XID is known to be committed. If we find this bit set, then we needn't consult CLOG. This reduces CLOG traffic, at the cost of potentially doing more work in the case where the cache misses. Every so often, we reconsider which XID ranges should be represented in our cache and consider replacing an existing page in the cache with some other one.

2. When we probe the cache and hit, we set the hint bit on the tuple *without dirtying the page*. Thus, the hint bit will only get written back to disk if the page is dirtied for some other reason. This will frequently reduce the amount of write I/O generated by large table scans, at the cost of possibly generating additional work for other backends that try to read this data later.

Now, the interesting thing about the patch is that the downsides of #2 are considerably mitigated by #1. If we assume, for the sake of argument, that the backend-local cache is really, really fast, and that the cache replacement policy is sound, then one is just as good as the other, and the only time we need the hint bits on disk is when the cache overflows -- which just so happens to be exactly when the patch forces them to be written out to disk. That's pretty clever. The exact way this is implemented is a little bit grotty, but I feel like it can work.
So I'm inclined to think that, at 10,000 feet, if we can convince ourselves that the basic idea of sticking a cache in here is OK, then the additional refinement of declining to dirty the page when the cache, rather than CLOG, tells us that the XID is committed is probably OK too.

With respect to the cache itself, the part that concerns me most is the cache replacement algorithm. It's obvious that straight LRU can't work here, since that could easily result in horrible thrashing behavior. But it's not clear to me that the actual algorithm, which considers evicting a page after every 100 misses, is all that great either. Each page stores 64K transaction IDs, and after being in cache for a while, you might have 25% or 50% of those bits set, which is quite an asset. Now you hit a run of 100 tuples with some XID not represented in that cache, and you chuck that page out the window and replace it with a page that has just that one bit set. Then you hit another run of 100 tuples with XIDs that would have hit the old page, and you flip back; but now you've lost all those valuable bits you had set. I'm not sure how likely that is to happen in real life, but it certainly seems like it could be painful if it did. The cache replacement algorithm is not cheap, either.

Rather than just fiddling with the thresholds, it would be nice to use some sort of algorithm here that would be inherently resistant to cache thrashing. For example, if we always just cached the four most recent pages of XIDs, and only ever replaced the oldest page we had in the cache with one newer than any page we had in the cache, that would make cache thrashing basically impossible. Fairly obviously, that particular algorithm is probably not a good idea, because it makes it impossible for us to adapt the cache contents to the data that is actually in the table, so we need something a bit smarter.
It might be worth doing a search of the academic literature on cache replacement policies to see if anyone else has come up with a good algorithm for this kind of situation. Just to be clear, I'm not saying your algorithm is definitely bad; just that it seems possible that it might be bad, and that it feels like there's not a lot of justification for the particular method you've chosen rather than anything else.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers