On 09/15/2014 02:42 AM, Arthur Silva wrote:
On 14/09/2014 12:21, "Andres Freund" <and...@2ndquadrant.com> wrote:

On 2014-09-13 20:27:51 -0500, k...@rice.edu wrote:

What we are looking for here is uniqueness, and thus better error
detection; not avalanche effect, nor cryptographic security, nor bit
distribution. As far as I'm aware, CRC32C is unbeaten collision-wise
and time-proven.

I couldn't find tests of xxhash and crc32 on the same hardware, so I
spent some time putting together a benchmark (see attachment; to run
it just start run.sh).

I included a crc32 implementation using SSE4.2 instructions (which
work on pretty much any Intel processor built after 2008 and any AMD
processor built after 2012), a portable slice-by-8 software
implementation, and xxhash, since it's the fastest software 32-bit
hash I know of.
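
For illustration, the hardware path boils down to something like this
(a minimal sketch, not the code from the attachment; it assumes x86-64
with SSE4.2 and the gcc/clang intrinsics, built with -msse4.2):

#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <nmmintrin.h>          /* SSE4.2 crc32 intrinsics */

/* CRC-32C over a buffer with the SSE4.2 crc32 instruction:
 * eight bytes per instruction, then the tail one byte at a time. */
static uint32_t
crc32c_hw(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len >= 8)
    {
        uint64_t chunk;

        memcpy(&chunk, p, 8);   /* safe unaligned load */
        crc = (uint32_t) _mm_crc32_u64(crc, chunk);
        p += 8;
        len -= 8;
    }
    while (len-- > 0)
        crc = _mm_crc32_u8(crc, *p++);
    return crc;
}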

Here are the results from running the test program on my i5-4200M:

crc sb8: 90444623
elapsed: 0.513688s
speed: 1.485220 GB/s

crc hw: 90444623
elapsed: 0.048327s
speed: 15.786877 GB/s

xxhash: 7f4a8d5
elapsed: 0.182100s
speed: 4.189663 GB/s

The hardware version is insanely fast and works on the majority of
Postgres setups, and the fallback software implementation is 2.8x
slower than the fastest 32-bit hash around.

Hopefully it'll be useful in the discussion.

Note that these numbers aren't fully relevant to the use case
here. For the WAL, which is what we're talking about and the only
place where CRC32 is used with high throughput, the individual parts
of a record are pretty darn small on average. So the performance of
checksumming small amounts of data is more relevant. Mind, that's not
likely to favor CRC32, especially not slice-by-8: the cache footprint
of the large lookup tables is likely to be noticeable outside of
micro-benchmarks.
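
Something along these lines (a rough, untested sketch with a plain
byte-at-a-time CRC-32C) would be closer to the interesting case:
restart the CRC for every small record instead of streaming one big
buffer:

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* 1kB lookup table for byte-at-a-time CRC-32C (reflected polynomial
 * shown; the construction is the same for any reflected CRC). */
static uint32_t crc_table[256];

static void
init_table(void)
{
    for (uint32_t n = 0; n < 256; n++)
    {
        uint32_t c = n;

        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ ((c & 1) ? 0x82F63B78 : 0);
        crc_table[n] = c;
    }
}

static uint32_t
crc32c(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len-- > 0)
        crc = (crc >> 8) ^ crc_table[(crc ^ *p++) & 0xFF];
    return crc;
}

int
main(void)
{
    enum { NREC = 1 << 20, RECSZ = 64 };    /* a million tiny "records" */
    static unsigned char data[NREC][RECSZ];
    uint32_t crc = 0;
    clock_t t0;

    init_table();
    t0 = clock();
    for (int i = 0; i < NREC; i++)
        crc ^= crc32c(0xFFFFFFFF, data[i], RECSZ) ^ 0xFFFFFFFF;
    printf("crc %08x, %.2f MB/s\n", crc,
           (double) NREC * RECSZ / (1024 * 1024) /
           ((double) (clock() - t0) / CLOCKS_PER_SEC));
    return 0;
}

Even that flatters the lookup table, mind you, since nothing else is
competing for the cache; on a busy server the table may well get
evicted between records.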

Indeed, the small input sizes are something I was missing. Something
more cache-friendly would be better; it's just a matter of finding a
better candidate.

It's worth noting that the extra tables slicing-by-4 requires are *in
addition to* the lookup table we already have, and slicing-by-8 builds
on the slicing-by-4 tables. Our current algorithm uses a 1kB lookup
table, slicing-by-4 a 4kB one, and slicing-by-8 an 8kB one. But the
first 1kB of the slicing-by-4 tables is identical to the current 1kB
lookup table, and the first 4kB of the slicing-by-8 tables is
identical to the slicing-by-4 tables.
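
To illustrate how the tables nest, here's a rough sketch of the usual
reflected-table construction (not actual Postgres code; the CRC-32C
polynomial is shown for concreteness, but the scheme is the same for
any reflected CRC):

#include <stdint.h>

#define CRC32C_POLY 0x82F63B78  /* reflected CRC-32C polynomial */

/* 8kB in total; row 0 alone is today's 1kB byte-at-a-time table. */
static uint32_t crc_table[8][256];

static void
init_slicing_tables(void)
{
    /* Row 0 is exactly the current lookup table. */
    for (uint32_t n = 0; n < 256; n++)
    {
        uint32_t c = n;

        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ ((c & 1) ? CRC32C_POLY : 0);
        crc_table[0][n] = c;
    }

    /* Each later row folds one more byte through row 0.  Slicing-by-4
     * uses rows 0-3 and slicing-by-8 rows 0-7, so the smaller tables
     * are literal prefixes of the larger ones. */
    for (int row = 1; row < 8; row++)
        for (int n = 0; n < 256; n++)
            crc_table[row][n] = (crc_table[row - 1][n] >> 8)
                ^ crc_table[0][crc_table[row - 1][n] & 0xFF];
}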

It would be pretty straightforward to use the current algorithm when the WAL record is very small, and slicing-by-4 or slicing-by-8 for larger records (like FPWs), where the larger table is more likely to pay off. I have no idea where the break-even point is with the current algorithm vs. slicing-by-4 and a cold cache, but maybe we can get a handle on that with some micro-benchmarking.
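
The dispatch itself could be as simple as this (a hypothetical shape
reusing crc_table[][] from the sketch above; BREAK_EVEN is a made-up
placeholder pending measurements, and the slicing loop assumes
little-endian loads for brevity):

#include <stddef.h>
#include <string.h>

#define BREAK_EVEN 128          /* placeholder, needs to be measured */

static uint32_t
crc32c_hybrid(uint32_t crc, const unsigned char *p, size_t len)
{
    /* Small records: byte-at-a-time, touching only the existing
     * 1kB row of the table. */
    if (len < BREAK_EVEN)
    {
        while (len-- > 0)
            crc = (crc >> 8) ^ crc_table[0][(crc ^ *p++) & 0xFF];
        return crc;
    }

    /* Large records (e.g. FPWs): slicing-by-8, eight bytes at a time. */
    while (len >= 8)
    {
        uint32_t a, b;

        memcpy(&a, p, 4);
        memcpy(&b, p + 4, 4);
        a ^= crc;
        crc = crc_table[7][a & 0xFF] ^ crc_table[6][(a >> 8) & 0xFF]
            ^ crc_table[5][(a >> 16) & 0xFF] ^ crc_table[4][a >> 24]
            ^ crc_table[3][b & 0xFF] ^ crc_table[2][(b >> 8) & 0xFF]
            ^ crc_table[1][(b >> 16) & 0xFF] ^ crc_table[0][b >> 24];
        p += 8;
        len -= 8;
    }
    while (len-- > 0)           /* tail */
        crc = (crc >> 8) ^ crc_table[0][(crc ^ *p++) & 0xFF];
    return crc;
}

One nice property: the small-record path never touches rows 1-7, so a
workload of tiny records pays no extra cache cost just because the
larger tables exist.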

Although this is complicated by the fact that slicing-by-4 or -8 might well be a win even with very small records, if you generate a lot of them.

- Heikki


