On 09/15/2014 02:42 AM, Arthur Silva wrote:
On 14/09/2014 12:21, "Andres Freund" <and...@2ndquadrant.com> wrote:

On 2014-09-13 20:27:51 -0500, k...@rice.edu wrote:

What we are looking for here is uniqueness, and thus better error
detection; not avalanche effect, nor cryptographic security, nor bit
distribution. As far as I'm aware, CRC32C is unbeaten collision-wise
and time-proven.

I couldn't find tests of xxhash and crc32 on the same hardware, so I
spent some time putting together a benchmark (see attachment; to run
it just start run.sh).

I included a crc32 implementation using SSE4.2 instructions (which
work on pretty much any Intel processor built after 2008 and any AMD
processor built after 2012), a portable slice-by-8 software
implementation, and xxhash, since it's the fastest software 32-bit
hash I know of.
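
For illustration, the hardware path boils down to something like this
(a minimal sketch, not the code from the attachment; it assumes x86-64
with SSE4.2 and the gcc/clang intrinsics, built with -msse4.2):

#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <nmmintrin.h>          /* SSE4.2 crc32 intrinsics */

/* CRC-32C over a buffer with the SSE4.2 crc32 instruction:
 * eight bytes per instruction, then the tail one byte at a time. */
static uint32_t
crc32c_hw(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len >= 8)
    {
        uint64_t chunk;

        memcpy(&chunk, p, 8);   /* safe unaligned load */
        crc = (uint32_t) _mm_crc32_u64(crc, chunk);
        p += 8;
        len -= 8;
    }
    while (len-- > 0)
        crc = _mm_crc32_u8(crc, *p++);
    return crc;
}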

Here are the results from running the test program on my i5-4200M:

crc sb8: 90444623
elapsed: 0.513688s
speed: 1.485220 GB/s

crc hw: 90444623
elapsed: 0.048327s
speed: 15.786877 GB/s

xxhash: 7f4a8d5
elapsed: 0.182100s
speed: 4.189663 GB/s

The hardware version is insanely fast and works on the majority of
Postgres setups, and the fallback software implementation is 2.8x
slower than the fastest 32-bit hash around.

Hopefully it'll be useful in the discussion.

Note that these numbers aren't fully relevant to the use case
here. For the WAL, which is what we're talking about and the only
place where CRC32 is used with high throughput, the individual parts
of a record are pretty darn small on average. So the performance of
checksumming small amounts of data is more relevant. Mind, that's not
likely to favor CRC32, especially not slice-by-8: the cache footprint
of the large lookup tables is likely to be noticeable outside of
micro-benchmarks.
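
Something along these lines (a rough, untested sketch with a plain
byte-at-a-time CRC-32C) would be closer to the interesting case:
restart the CRC for every small record instead of streaming one big
buffer:

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* 1kB lookup table for byte-at-a-time CRC-32C (reflected polynomial
 * shown; the construction is the same for any reflected CRC). */
static uint32_t crc_table[256];

static void
init_table(void)
{
    for (uint32_t n = 0; n < 256; n++)
    {
        uint32_t c = n;

        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ ((c & 1) ? 0x82F63B78 : 0);
        crc_table[n] = c;
    }
}

static uint32_t
crc32c(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len-- > 0)
        crc = (crc >> 8) ^ crc_table[(crc ^ *p++) & 0xFF];
    return crc;
}

int
main(void)
{
    enum { NREC = 1 << 20, RECSZ = 64 };    /* a million tiny "records" */
    static unsigned char data[NREC][RECSZ];
    uint32_t crc = 0;
    clock_t t0;

    init_table();
    t0 = clock();
    for (int i = 0; i < NREC; i++)
        crc ^= crc32c(0xFFFFFFFF, data[i], RECSZ) ^ 0xFFFFFFFF;
    printf("crc %08x, %.2f MB/s\n", crc,
           (double) NREC * RECSZ / (1024 * 1024) /
           ((double) (clock() - t0) / CLOCKS_PER_SEC));
    return 0;
}

Even that flatters the lookup table, mind you, since nothing else is
competing for the cache; on a busy server the table may well get
evicted between records.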

Indeed, the small input sizes are something I was missing. Something
more cache-friendly would be better; it's just a matter of finding a
better candidate.

It's worth noting that the extra tables slicing-by-4 requires are *in
addition to* the lookup table we already have, and slicing-by-8 builds
on the slicing-by-4 tables. Our current algorithm uses a 1kB lookup
table, slicing-by-4 a 4kB one, and slicing-by-8 an 8kB one. But the
first 1kB of the slicing-by-4 tables is identical to the current 1kB
lookup table, and the first 4kB of the slicing-by-8 tables is
identical to the slicing-by-4 tables.
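
To illustrate how the tables nest, here's a rough sketch of the usual
reflected-table construction (not actual Postgres code; the CRC-32C
polynomial is shown for concreteness, but the scheme is the same for
any reflected CRC):

#include <stdint.h>

#define CRC32C_POLY 0x82F63B78  /* reflected CRC-32C polynomial */

/* 8kB in total; row 0 alone is today's 1kB byte-at-a-time table. */
static uint32_t crc_table[8][256];

static void
init_slicing_tables(void)
{
    /* Row 0 is exactly the current lookup table. */
    for (uint32_t n = 0; n < 256; n++)
    {
        uint32_t c = n;

        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ ((c & 1) ? CRC32C_POLY : 0);
        crc_table[0][n] = c;
    }

    /* Each later row folds one more byte through row 0.  Slicing-by-4
     * uses rows 0-3 and slicing-by-8 rows 0-7, so the smaller tables
     * are literal prefixes of the larger ones. */
    for (int row = 1; row < 8; row++)
        for (int n = 0; n < 256; n++)
            crc_table[row][n] = (crc_table[row - 1][n] >> 8)
                ^ crc_table[0][crc_table[row - 1][n] & 0xFF];
}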

It would be pretty straightforward to use the current algorithm when the WAL record is very small, and slicing-by-4 or slicing-by-8 for larger records (like FPWs), where the larger table is more likely to pay off. I have no idea where the break-even point is with the current algorithm vs. slicing-by-4 and a cold cache, but maybe we can get a handle on that with some micro-benchmarking.
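
The dispatch itself could be as simple as this (a hypothetical shape
reusing crc_table[][] from the sketch above; BREAK_EVEN is a made-up
placeholder pending measurements, and the slicing loop assumes
little-endian loads for brevity):

#include <stddef.h>
#include <string.h>

#define BREAK_EVEN 128          /* placeholder, needs to be measured */

static uint32_t
crc32c_hybrid(uint32_t crc, const unsigned char *p, size_t len)
{
    /* Small records: byte-at-a-time, touching only the existing
     * 1kB row of the table. */
    if (len < BREAK_EVEN)
    {
        while (len-- > 0)
            crc = (crc >> 8) ^ crc_table[0][(crc ^ *p++) & 0xFF];
        return crc;
    }

    /* Large records (e.g. FPWs): slicing-by-8, eight bytes at a time. */
    while (len >= 8)
    {
        uint32_t a, b;

        memcpy(&a, p, 4);
        memcpy(&b, p + 4, 4);
        a ^= crc;
        crc = crc_table[7][a & 0xFF] ^ crc_table[6][(a >> 8) & 0xFF]
            ^ crc_table[5][(a >> 16) & 0xFF] ^ crc_table[4][a >> 24]
            ^ crc_table[3][b & 0xFF] ^ crc_table[2][(b >> 8) & 0xFF]
            ^ crc_table[1][(b >> 16) & 0xFF] ^ crc_table[0][b >> 24];
        p += 8;
        len -= 8;
    }
    while (len-- > 0)           /* tail */
        crc = (crc >> 8) ^ crc_table[0][(crc ^ *p++) & 0xFF];
    return crc;
}

One nice property: the small-record path never touches rows 1-7, so a
workload of tiny records pays no extra cache cost just because the
larger tables exist.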

Although this is complicated by the fact that slicing-by-4 or -8 might well be a win even with very small records, if you generate a lot of them.

- Heikki


