On Apr13, 2013, at 17:14 , Ants Aasma <a...@cybertec.at> wrote: > Based on current analysis, it is particularly good at detecting single > bit errors, as good at detecting burst errors as can be expected from > 16 bits and not horrible at detecting burst writes of zeroes. It is > quite bad at detecting multiple uncorrelated single bit errors and > extremely bad at detecting repeating patterns of errors in low order > bits.

I've read the patch and tried to understand why it's that bad at detecting repeating patterns of errors in low order bits, and to see if there might be a way to fix that without too much of a performance impact. Here's what I gather the algorithm does: It treats the input data, a page of L bytes, as a Nx64 matrix V of 16-bit quantities (N = L/64, obviously). It then first computes (using two primes p (PRIME1) and q (PRIME2)) S = V[1,1]*p^63*q^63 + V[1,2]*p^63*q^62 + … + V[1,64]*p^63*q^0 + V[2,1]*p^62*q^63 + V[2,2]*p^62*q^62 + … + V[2,64]*p^62*q^0 + … + V[N,1]*p^0 *q^63 + V[N,2]*p^0 *q^62 + … + V[N,64]*p^0 *q^0 (mod 2^16) = sum V[i,j]*p^(64-i)*q^(64-j) Note that it does that by first computing the row-wise sums without the q^i coefficient, and then (in what the code calls the aggregation phase) combines those row-wise sums into a total, adding the q^i- coefficients along the way. The final hash value is then H = S * p + B * q mod 2^16 where B is a salt value intended to detect swapped pages (currently B is simply the page index) This raises two question. First, why are there two primes? You could just as well using a single prime q and set p=q^64 mod 2^16. You then get S = sum V[i,j] * q^(64*(64-i) + (64-j) = sum V[i,j] * q^(4096 - 64*(i-1) - j) You get higher prime powers that way, but you can easily choose a prime that yields distinct values mod 2^16 for exponents up to 16383. Your PRIME2, for example, does. (It wraps around for 16384, i.e. PRIME2^16384 = 1 mod 2^16, but that's true for every possible prime since 16384 is the Carmichael function's value at 2^16) Second, why does it use addition instead of XOR? It seems that FNV usually XORs the terms together instead of adding them? Regarding the bad behaviour for multiple low-bit errors - can you explain why it behaves badly in that case? I currently fail to see why that would be. I *can* see that the lowest bit of the hash depends only on the lowest bit of the input words, but as long as the lowest bits of the input words also affect other bits of the hash, that shouldn't matter. Which I think they do, but I might be missing something... Here, btw, is a page on FNV hashing. It mentions a few rules for picking suitable primes http://www.isthe.com/chongo/tech/comp/fnv best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers