On Sat, Sep 13, 2014 at 09:50:55PM -0300, Arthur Silva wrote:
> On Sat, Sep 13, 2014 at 1:55 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Andres Freund <and...@2ndquadrant.com> writes:
> > > On 2014-09-13 08:52:33 +0300, Ants Aasma wrote:
> > >> On Sat, Sep 13, 2014 at 6:59 AM, Arthur Silva <arthur...@gmail.com>
> > wrote:
> > >>> That's not entirely true. CRC-32C beats pretty much everything with
> > the same
> > >>> length quality-wise and has both hardware implementations and highly
> > >>> optimized software versions.
> > >> For better or for worse CRC is biased by detecting all single bit
> > >> errors, the detection capability of larger errors is slightly
> > >> diminished. The quality of the other algorithms I mentioned is also
> > >> very good, while producing uniformly varying output.
> > > There's also much more literature about the various CRCs in comparison
> > > to some of these hash allgorithms.
> > Indeed. CRCs have well-understood properties for error detection.
> > Have any of these new algorithms been analyzed even a hundredth as
> > thoroughly? No. I'm unimpressed by evidence-free claims that
> > something else is "also very good".
> > Now, CRCs are designed for detecting the sorts of short burst errors
> > that are (or were, back in the day) common on phone lines. You could
> > certainly make an argument that that's not the type of threat we face
> > for PG data. However, I've not seen anyone actually make such an
> > argument, let alone demonstrate that some other algorithm would be better.
> > To start with, you'd need to explain precisely what other error pattern
> > is more important to defend against, and why.
> > regards, tom lane
> Mysql went this way as well, changing the CRC polynomial in 5.6.
> What we are looking for here is uniqueness thus better error detection. Not
> avalanche effect, nor cryptographically secure, nor bit distribution.
> As far as I'm aware CRC32C is unbeaten collision wise and time proven.
> I couldn't find tests with xxhash and crc32 on the same hardware so I spent
> some time putting together a benchmark (see attachment, to run it just
> start run.sh)
> I included a crc32 implementation using ssr4.2 instructions (which works on
> pretty much any Intel processor built after 2008 and AMD built after 2012),
> a portable Slice-By-8 software implementation and xxhash since it's the
> fastest software 32bit hash I know of.
> Here're the results running the test program on my i5-4200M
> crc sb8: 90444623
> elapsed: 0.513688s
> speed: 1.485220 GB/s
> crc hw: 90444623
> elapsed: 0.048327s
> speed: 15.786877 GB/s
> xxhash: 7f4a8d5
> elapsed: 0.182100s
> speed: 4.189663 GB/s
> The hardware version is insanely and works on the majority of Postgres
> setups and the fallback software implementations is 2.8x slower than the
> fastest 32bit hash around.
> Hopefully it'll be useful in the discussion.
Thank you for running this sample benchmark. It definitely shows that the
hardware version of the CRC is very fast, unfortunately it is really only
available on x64 Intel/AMD processors which leaves all the rest lacking.
For current 64-bit hardware, it might be instructive to also try using
the XXH64 version and just take one half of the hash. It should come in
at around 8.5 GB/s, or very nearly the speed of the hardware accelerated
CRC. Also, while I understand that CRC has a very venerable history and
is well studied for transmission type errors, I have been unable to find
any research on its applicability to validating file/block writes to a
disk drive. While it is to quote you "unbeaten collision wise", xxhash,
both the 32-bit and 64-bit version are its equal. Since there seems to
be a lack of research on disk based error detection versus CRC polynomials,
it seems likely that any of the proposed hash functions are on an equal
footing in this regard. As Andres commented up-thread, xxhash comes along
for "free" with lz4.
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription: