On Apr 23, 2013 10:17 AM, "Jeff Davis" <pg...@j-davis.com> wrote: > Attached is my reorganization of Ants's patch here: > > http://www.postgresql.org/message-id/CA > +CSw_vinyf-w45i=M1m__MpJZY=e8s4nt_knnpebtwjtoa...@mail.gmail.com
Thanks for your help. Some notes below. > My changes: > > * wrest the core FNV algorithm from the specific concerns of a data page > - PageCalcChecksum16 mixes the block number, reduces to 16 bits, > and avoids the pd_checksum field > - the checksum algorithm is just a pure block checksum with a 32-bit > result > * moved the FNV checksum into a separate file, checksum.c I think the function should not be called checksum_fnv as it implies that we use the well known straightforward implementation. Maybe checksum_block or some other generic name. > * added Ants's suggested compilation flags for better optimization -msse4.1 is not safe to use in default builds. On the other hand it doesn't hurt to just specify it in CFLAGS for the whole compile (possibly as -march=native). We should just omit it and mention somewhere that SSE4.1 enabled builds will have better checksum performance. > * slight update to the paragraph in the README that discusses concerns > specific to a data page > > I do have a couple questions/concerns about Ants's patch though: > > * The README mentions a slight bias; does that come from the mod > (2^16-1)? That's how I updated the README, so I wanted to make sure. Yes. > * How was the FNV_PRIME chosen? I still haven't found the actual source for this value. It's specified as the value to use for 32bit FNV-1a. > * I can't match the individual algorithm step as described in the README > to the actual code. And the comments in the README don't make it clear > enough which one is right (or maybe they both are, and I'm just tired). I will try to reword. > The README says: > > hash = (hash ^ value) * ((hash ^ value) >> 3) > > (which is obviously missing the FNV_PRIME factor) and the code says: > > +#define CHECKSUM_COMP(checksum, value) do {\ > + uint32 __tmp = (checksum) ^ (value);\ > + (checksum) = __tmp * FNV_PRIME ^ (__tmp >> 3);\ > +} while (0) > > I'm somewhat on the fence about the "shift right". It was discussed in > this part of the thread: > > http://www.postgresql.org/message-id/99343716-5f5a-45c8-b2f6-74b9ba357...@phlo.org > > I think we should be able to state with a little more clarity in the > README why there is a problem with plain FNV-1a, and why this > modification is both effective and safe. Florian already mentioned why it's effective. I have an intuition why it's safe, will try to come up with a well reasoned argument. Regards, Antd Aasma