On Thu, Nov 15, 2012 at 7:42 PM, Jeff Davis <pg...@j-davis.com> wrote:
> But the other tuple hint bits seem to be there just for symmetry,
> because they shouldn't last long. If HEAP_XMIN_INVALID or
> HEAP_XMAX_COMMITTED is set, then it's (hopefully) going to be vacuumed
> soon, and gone completely. And if HEAP_XMAX_INVALID is set, then it
> should just be changed to InvalidTransactionId.

"Soon" is a relative term.  I doubt that we can really rely on vacuum
to be timely enough to avoid pain here - you can easily have tens of
thousands of hits on the same tuple before vacuum gets around to
dealing with it.  Now, we might be able to rejigger things to avoid
that.  For example, maybe it'd be possible to arrange things so that
when we see an invalid xmin, we set the flag that triggers a HOT prune
instead of setting the hint bit.  That would probably be good enough
to dispense with the hint bit, and maybe altogether better than the
current system, because now the next time someone (including us)
locks the buffer we'll nuke the entire tuple, which would not only
make it cheaper to scan but also free up space in the buffer sooner.

However, that solution only works for invalid-xmin.  For
committed-xmax, there could actually be quite a long time before the
page can be pruned, because there can be some other backend holding an
old snapshot open.  A one-minute reporting query in another database,
which is hardly an unreasonable scenario, could result in many, many
additional CLOG lookups, which are already a major contention point at
high concurrencies.  I think that bit is probably pretty important,
and I don't see a viable way to get rid of it, though maybe someone
can think of one.  For invalid-xmax, I agree that we could probably
just change xmax to InvalidTransactionId, if we need to save
bit-space.  In the past Tom and I think also Alvaro have been
skeptical about anything that would overwrite xmin/xmax values too
quickly for forensic reasons, but maybe it's worth considering.

> Also, I am wondering about PD_ALL_VISIBLE. It was originally introduced
> in the visibility map patch, apparently as a way to know when to clear
> the VM bit when doing an update. It was then also used for scans, which
> showed a significant speedup. But I wonder: why not just use the
> visibilitymap directly from those places?

Well, you'd have to look up, lock and pin the page to do that.  I
suspect that overhead is pretty significant.  The benefit of noticing
that the flag is set is that you need not call HeapTupleSatisfiesMVCC
for each tuple on the page: checking one bit in the page header is a
lot cheaper than calling that function for every tuple.  However, if
you had to lock and pin a second page in order to check whether the
page is all-visible, I suspect it wouldn't be a win; you'd probably be
better off just doing the HeapTupleSatisfiesMVCC checks for each
tuple.

One of the main advantages of PD_ALL_VISIBLE is that if you do an
insert, update, or delete on a page where that bit isn't set, you need
not lock and pin the visibility map page, because you already know
that the bit will be clear in the visibility map.   If the data is
being rapidly modified, you'll get the benefit of this optimization
most of the time, only losing it when vacuum has visited recently.  I
hope that's not premature optimization because I sure sweat a lot of
blood last release cycle to keep it working like that.  I had a few
doubts at the time about how much we were winning there, but I don't
actually have any hard data either way, so I would be reluctant to
assume it doesn't matter.

Even if it doesn't, the sequential-scan optimization definitely
matters a LOT, as you can easily verify.

One approach that I've been hoping to pursue is to find a way to make
CLOG lookups cheaper and more concurrent.  I started to work on some
concurrent hash table code, which you can find here:

http://git.postgresql.org/gitweb/?p=users/rhaas/postgres.git;a=shortlog;h=refs/heads/chash

The concurrency properties of this code are vastly better than what we
have now, but there are cases where it loses vs. dynahash when there's
no concurrency.  That might be fixable or just not a big deal, though.
 A bigger problem is that I got sucked off into other things before I
was able to get as far with it as I wanted to; in particular, I have
only unit test results for it, and haven't tried to integrate it into
the SLRU code yet.

But I'm not sure any of this is going to fundamentally chip away at
the need for hint bits all that much.  Making CLOG lookups cheaper or
less frequent is all to the good, but the prognosis for improving
things enough that we can dump some or all of the hint bits completely
seems uncertain at best.  Even if we COULD dump everything but
heap-xmin-committed, how much would that really help with the
disk-write problem?   I bet heap-xmin-committed gets set far more
often than the other three put together.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

