Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

Sokolov Yura Sun, 13 Aug 2017 08:06:06 -0700

В Fri, 11 Aug 2017 14:05:08 -0400
Robert Haas <[email protected]> пишет:

> On Thu, Aug 10, 2017 at 11:12 AM, Alexander Korotkov
> <[email protected]> wrote:
> > These results look very cool!
> > I think CSN is eventually inevitable, but it's a long distance
> > feature. Thus, this optimization could make us a good serve before
> > we would have CSN. Do you expect there are cases when this patch
> > causes slowdown?  What about case when we scan each xip array only
> > once (not sure how to simulate that using pgbench)?  
> 
> Just a random thought here:
> 
> The statements pgbench executes are pretty simple: they touch one row
> in one table.  You wouldn't expect them to scan the xip array very
> many times.  If even those statements touch the array enough for this
> to win, it seems like it might be hard to construct an even worse
> case.  I might be missing something, though.

With zipfian distribution, many concurrent transactions tries to update
the same row. Every transaction eventually updates this row (may be
after waiting), so there are a lot of concurrent row version, and
transaction scans through its snapshot many times.

> 
> We're not going to accept code like this, though:
> 
> +            d = xip[i] >> 6;
> +            j = k = xip[i] & mask;
> +            while (xiphash[j] != InvalidTransactionId)
> +            {
> +                j = (k + d) & mask;
> +                d = d*5 + 1;
> +            }
> 
> Single-character variable names, hard-coded constants, and no
> comments!

I totally agree. First version (which you've cited) was clear POC.
Second version (I've sent in Thursday) looks a bit better, but I agree
there is room for improvement. I'll try to make it prettier.

BTW, I have a question: there is CopySnapshot function, so snapshot is
clearly changed after copy. But I could not follow: is xip and subxip
array changes in a copy, or it remains the same to original, but only
other snapshot fields could be changed?

> I kind of doubt as a general point that we really want another
> open-coded hash table -- I wonder if this could be made to use
> simplehash.

I like simplehash very much (although I'm missing couple of features).
Unfortunately, siplehash is not simple enough for this use case:
- it will use at least twice more memory for hash table itself (cause
  it have to have 'entry->status' field),
- it has large header (ad-hoc hash-table consumes 1 byte in
  SnapshotData structure),
- its allocation will be trickier (custom hash-table is co-allocated
  after xip-array itself),
- its insert algorithm looks to be much more expensive (at least, it is
  more complex in instructions count).

I thing, using simplehash here will lead to much more invasive patch,
than this ad-hoc table. And I believe it will be slower (and this place
is very performance critical), though I will not bet on.

BTW, ad-hoc hash tables already exist: at least recourse owner uses one.

-- 
With regards,
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

Reply via email to