On 2013-04-13 18:14:28 +0300, Ants Aasma wrote: > On Sat, Apr 13, 2013 at 5:58 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > > Andres Freund <and...@2ndquadrant.com> writes: > >> On 2013-04-13 09:14:26 -0400, Bruce Momjian wrote: > >>> As I understand it, SIMD is just a CPU-optimized method for producing a > >>> CRC checksum. Is that right? Does it produce the same result as a > >>> non-CPU-optimized CRC calculation? > > > >> No we are talking about a different algorithm that results in different > >> results, thats why its important to choose now since we can't change it > >> later without breaking pg_upgrade in further releases. > >> http://en.wikipedia.org/wiki/SIMD_%28hash_function%29 > > > > [ squint... ] We're talking about a *cryptographic* hash function? > > Why in the world was this considered a good idea for page checksums? > > > > In the first place, it's probably not very fast compared to some > > alternatives, and in the second place, the criteria by which people > > would consider it a good crypto hash function have approximately nothing > > to do with what we need for a checksum function. What we want for a > > checksum function is high probability of detection of common hardware > > failure modes, such as burst errors and all-zeroes. This is > > particularly critical when we're going with only a 16-bit checksum --- > > the probabilities need to be skewed in the right direction, or it's not > > going to be all that terribly useful. > > > > CRCs are known to be good for that sort of thing; it's what they were > > designed for. I'd like to see some evidence that any substitute > > algorithm has similar properties. Without that, I'm going to vote > > against this idea. > > Sorry for creating confusion here by playing fast and loose with the > terminology. We are not talking about that hash function at all. What > we are talking about here is Fowler-Noll-Vo-ish > (http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function) > hash function that is restructured to be parallelisable with SIMD > instructions with the explicit goal of being as fast as possible. The > resulting hash function is roughly two orders of magnitude faster than > 1-byte-at-a-time CRC32 currently in use. Performance is about > comparable with optimized fixed size memcpy running in cache.
Gah, one shouldn't look to quick for a reference, sorry. > Based on current analysis, it is particularly good at detecting single > bit errors, as good at detecting burst errors as can be expected from > 16 bits and not horrible at detecting burst writes of zeroes. It is > quite bad at detecting multiple uncorrelated single bit errors and > extremely bad at detecting repeating patterns of errors in low order > bits. > All in all I would say that the performance is worth the loss in > detection capability as we are not talking about using the checksum to > prove correctness. Is it actually a loss compared to our 16bit flavor of crc32 we now use? I didn't think so far from the properties you described? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers