On 09/15/2014 02:42 AM, Arthur Silva wrote:
On 14/09/2014 12:21, "Andres Freund" <and...@2ndquadrant.com> wrote:
On 2014-09-13 20:27:51 -0500, k...@rice.edu wrote:
What we are looking for here is uniqueness, and thus better error
detection. Not avalanche effect, nor cryptographic security, nor bit
distribution. As far as I'm aware CRC32C is unbeaten collision-wise and
time proven.
I couldn't find tests with xxhash and crc32 on the same hardware, so I
spent some time putting together a benchmark (see attachment; to run it,
just compile and execute it). I included a crc32 implementation using
sse4.2 instructions (which are available on pretty much any Intel
processor built after 2008 and AMD built after 2011), a portable
Slice-By-8 software implementation, and xxhash, since it's the fastest
software 32bit hash I know of.
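For reference, the hardware variant boils down to a few intrinsics. A
minimal sketch (my illustration here, not the code in the attachment),
using the SSE4.2 intrinsics from nmmintrin.h and compiled with
-msse4.2; note the instruction computes CRC-32C (the Castagnoli
polynomial), not the polynomial the current WAL code uses:

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <nmmintrin.h>      /* _mm_crc32_u8 / _mm_crc32_u64 */

static uint32_t
crc32c_hw(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t    crc = 0xFFFFFFFF;       /* conventional CRC-32C init */

    /* the 64-bit form consumes 8 bytes per instruction */
    while (len >= 8)
    {
        uint64_t    chunk;

        memcpy(&chunk, p, 8);           /* avoid unaligned access */
        crc = _mm_crc32_u64(crc, chunk);
        p += 8;
        len -= 8;
    }
    /* remaining tail, one byte at a time */
    while (len-- > 0)
        crc = _mm_crc32_u8((uint32_t) crc, *p++);

    return (uint32_t) crc ^ 0xFFFFFFFF; /* conventional final xor */
}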
Here're the results running the test program on my i5-4200M:

crc sb8: 90444623
speed: 1.485220 GB/s

crc hw: 90444623
speed: 15.786877 GB/s

xxhash:
speed: 4.189663 GB/s
The hardware version is insanely fast and works on the majority of
Postgres setups, and the fallback software implementation is 2.8x slower
than the fastest 32bit hash around.
Hopefully it'll be useful in the discussion.
Note that all these numbers aren't fully relevant to the use case
here. For the WAL - which is what we're talking about and the only place
where CRC32 is used with high throughput - the individual parts of a
record are pretty darn small on average. So performance of checksumming
small amounts of data is more relevant. Mind, that's not likely to go
in favor of CRC32, especially not slice-by-8. The cache footprint of the
large tables is likely going to be noticeable in non-micro benchmarks.
Indeed, the small input sizes are something I was missing. Something
more cache friendly would be better; it's just a matter of finding a
better candidate.
It's worth noting that the extra tables that slicing-by-4 requires are
*in addition to* the lookup table we already have. And slicing-by-8
builds on the slicing-by-4 lookup tables. Our current algorithm uses a
1kB lookup table, slicing-by-4 a 4kB one, and slicing-by-8 an 8kB one.
But the
first 1kB of the slicing-by-4 lookup table is identical to the current
1kB lookup table, and the first 4kB of the slicing-by-8 are identical to
the slicing-by-4 tables.
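To illustrate the nesting, here's roughly how the tables relate (a
sketch with made-up names, not the actual pg_crc.c symbols): row 0 is
the byte-at-a-time table we have today, and each further row is derived
from it, which is why the smaller tables are strict prefixes of the
larger ones.

#include <stdint.h>

/* 8 x 256 x 4 bytes = 8kB total; slicing-by-4 uses rows 0..3 (4kB),
 * the current algorithm only row 0 (1kB) */
static uint32_t crc_table[8][256];

static void
init_crc_tables(uint32_t poly)  /* reflected polynomial, e.g.
                                 * 0xEDB88320 for traditional CRC-32,
                                 * 0x82F63B78 for CRC-32C */
{
    int         i, t, bit;

    /* row 0: the classic byte-at-a-time lookup table */
    for (i = 0; i < 256; i++)
    {
        uint32_t    crc = i;

        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1) ? poly : 0);
        crc_table[0][i] = crc;
    }

    /* each further row pushes one more zero byte through the CRC,
     * so rows 0..3 are exactly the slicing-by-4 tables, and so on */
    for (t = 1; t < 8; t++)
        for (i = 0; i < 256; i++)
            crc_table[t][i] = (crc_table[t - 1][i] >> 8) ^
                crc_table[0][crc_table[t - 1][i] & 0xFF];
}

With that layout the binary only carries the one 8kB array; which
prefix of it gets touched depends solely on the algorithm used.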
It would be pretty straightforward to use the current algorithm when the
WAL record is very small, and slicing-by-4 or slicing-by-8 for larger
records (like FPWs), where the larger table is more likely to pay off. I
have no idea where the break-even point is with the current algorithm
vs. slicing-by-4 and a cold cache, but maybe we can get a handle on that
with some micro-benchmarking.
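Roughly along these lines (a sketch only; the cutoff is a made-up
placeholder until we have micro-benchmark numbers, and both function
names are illustrative, not existing routines):

#include <stddef.h>
#include <stdint.h>

/* assumed to exist elsewhere; names are placeholders */
extern uint32_t crc32_byte_at_a_time(const void *data, size_t len);
extern uint32_t crc32_slicing_by_8(const void *data, size_t len);

/* made-up cutoff, to be replaced by a measured break-even point */
#define CRC_SLICING_CUTOFF  64

static inline uint32_t
crc32_dispatch(const void *data, size_t len)
{
    if (len < CRC_SLICING_CUTOFF)
        return crc32_byte_at_a_time(data, len); /* 1kB table stays hot */
    return crc32_slicing_by_8(data, len);       /* 8kB tables pay off */
}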
Although this is complicated by the fact that slicing-by-4 or -8 might
well be a win even with very small records, if you generate a lot of them.