On 2014-09-13 20:27:51 -0500, k...@rice.edu wrote:
> >
> > What we are looking for here is uniqueness thus better error detection. Not
> > avalanche effect, nor cryptographically secure, nor bit distribution.
> > As far as I'm aware CRC32C is unbeaten collision wise and time proven.
> >
> > I couldn't find tests with xxhash and crc32 on the same hardware so I spent
> > some time putting together a benchmark (see attachment, to run it just
> > start run.sh)
> >
> > I included a crc32 implementation using SSE4.2 instructions (which works on
> > pretty much any Intel processor built after 2008 and AMD built after 2012),
> > a portable Slice-By-8 software implementation and xxhash since it's the
> > fastest software 32bit hash I know of.
> >
> > Here are the results of running the test program on my i5-4200M
> >
> > crc sb8: 90444623
> > elapsed: 0.513688s
> > speed: 1.485220 GB/s
> >
> > crc hw: 90444623
> > elapsed: 0.048327s
> > speed: 15.786877 GB/s
> >
> > xxhash: 7f4a8d5
> > elapsed: 0.182100s
> > speed: 4.189663 GB/s
> >
> > The hardware version is insanely fast and works on the majority of Postgres
> > setups and the fallback software implementation is 2.8x slower than the
> > fastest 32bit hash around.
> >
> > Hopefully it'll be useful in the discussion.

Note that all these numbers aren't fully relevant to the use case
here. For the WAL - which is what we're talking about and the only place
where CRC32 is used with high throughput - the individual parts of a
record are pretty darn small on average. So performance of checksumming
small amounts of data is more relevant. Mind, that's not likely to work
in CRC32's favor, especially not slice-by-8's. The cache footprint of the
large tables is likely going to be noticeable in non-micro benchmarks.

> Also, while I understand that CRC has a very venerable history and
> is well studied for transmission type errors, I have been unable to find
> any research on its applicability to validating file/block writes to a
> disk drive.

Which incidentally doesn't really match what the CRC is used for
here. It's used for individual WAL records. Usually these are pretty
small, far smaller than disk/postgres' blocks on average. There are a
couple of scenarios where they can get large, true, but most of them are
small.
The primary reason they're important is to correctly detect the end of
the WAL. That is, to ensure we're not interpreting half-written records,
or records from before the WAL file was overwritten.

> While it is to quote you "unbeaten collision wise", xxhash,
> both the 32-bit and 64-bit version are its equal.

Aha? You take that from the smhasher results?

> Since there seems to be a lack of research on disk based error
> detection versus CRC polynomials, it seems likely that any of the
> proposed hash functions are on an equal footing in this regard. As
> Andres commented up-thread, xxhash comes along for "free" with lz4.

This is pure handwaving.

Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
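
To make the end-of-WAL point concrete, here is a minimal sketch of how a
per-record CRC rejects a half-written or stale record. It is only an
illustration under stated assumptions: DemoRecord and the demo_* functions
are hypothetical names, the layout is NOT PostgreSQL's actual XLogRecord
format, and the CRC-32C routine is a plain bit-at-a-time version chosen for
brevity, not the slice-by-8 or SSE4.2 code from the benchmark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit-at-a-time CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
 * Slow, but table-free, so there is no cache footprint to speak of. */
static uint32_t crc32c_update(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* One-shot helper: standard init and final XOR of 0xFFFFFFFF. */
static uint32_t crc32c(const void *data, size_t len)
{
    return crc32c_update(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu;
}

/* Hypothetical record layout, for illustration only. */
typedef struct {
    uint32_t      len;          /* payload length in bytes */
    uint32_t      crc;          /* CRC-32C over len + payload */
    unsigned char payload[64];  /* records are typically small */
} DemoRecord;

/* The record is checksummed in small parts, fed incrementally.
 * len is hashed in native byte order, which is fine for a demo. */
static uint32_t demo_record_crc(const DemoRecord *r)
{
    uint32_t crc = 0xFFFFFFFFu;

    crc = crc32c_update(crc, &r->len, sizeof r->len);
    crc = crc32c_update(crc, r->payload, r->len);
    return crc ^ 0xFFFFFFFFu;
}

static void demo_record_fill(DemoRecord *r, const void *data, uint32_t len)
{
    assert(len <= sizeof r->payload);
    memset(r, 0, sizeof *r);
    r->len = len;
    memcpy(r->payload, data, len);
    r->crc = demo_record_crc(r);
}

/* Returns 1 if the record looks intact, 0 if replay should stop here. */
static int demo_record_is_valid(const DemoRecord *r)
{
    if (r->len > sizeof r->payload)
        return 0;               /* garbage length: treat as end of log */
    return r->crc == demo_record_crc(r);
}
```

A reader loop would call demo_record_is_valid() on each record in turn and
treat the first failure as the end of the log. Note that every call only
touches a handful of bytes, which is why small-input performance matters
more here than bulk GB/s figures.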