Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)

Ants Aasma Tue, 23 Apr 2013 01:45:02 -0700

On Apr 23, 2013 10:17 AM, "Jeff Davis" <pg...@j-davis.com> wrote:
> Attached is my reorganization of Ants's patch here:
>
> http://www.postgresql.org/message-id/CA
> +CSw_vinyf-w45i=M1m__MpJZY=e8s4nt_knnpebtwjtoa...@mail.gmail.com


Thanks for your help. Some notes below.

> My changes:
>
> * wrest the core FNV algorithm from the specific concerns of a data page
>    - PageCalcChecksum16 mixes the block number, reduces to 16 bits,
>      and avoids the pd_checksum field
>    - the checksum algorithm is just a pure block checksum with a 32-bit
>      result
> * moved the FNV checksum into a separate file, checksum.c

I think the function should not be called checksum_fnv as it implies that
we use the well known straightforward implementation. Maybe checksum_block
or some other generic name.

> * added Ants's suggested compilation flags for better optimization

-msse4.1 is not safe to use in default builds. On the other hand it doesn't
hurt to just specify it in CFLAGS for the whole compile (possibly as
-march=native). We should just omit it and mention somewhere that SSE4.1
enabled builds will have better checksum performance.

> * slight update to the paragraph in the README that discusses concerns
> specific to a data page
>
> I do have a couple questions/concerns about Ants's patch though:
>
> * The README mentions a slight bias; does that come from the mod
> (2^16-1)? That's how I updated the README, so I wanted to make sure.

Yes.

> * How was the FNV_PRIME chosen?

I still haven't found the actual source for this value. It's specified as
the value to use for 32bit FNV-1a.

> * I can't match the individual algorithm step as described in the README
> to the actual code. And the comments in the README don't make it clear
> enough which one is right (or maybe they both are, and I'm just tired).

I will try to reword.

> The README says:
>
>    hash = (hash ^ value) * ((hash ^ value) >> 3)
>
> (which is obviously missing the FNV_PRIME factor) and the code says:
>
>    +#define CHECKSUM_COMP(checksum, value) do {\
>    +       uint32 __tmp = (checksum) ^ (value);\
>    +       (checksum) = __tmp * FNV_PRIME ^ (__tmp >> 3);\
>    +} while (0)
>
> I'm somewhat on the fence about the "shift right". It was discussed in
> this part of the thread:
>
>
http://www.postgresql.org/message-id/99343716-5f5a-45c8-b2f6-74b9ba357...@phlo.org
>
> I think we should be able to state with a little more clarity in the
> README why there is a problem with plain FNV-1a, and why this
> modification is both effective and safe.

Florian already mentioned why it's effective. I have an intuition why it's
safe, will try to come up with a well reasoned argument.

Regards,
Antd Aasma

Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)

Reply via email to