On Thu, Apr 18, 2013 at 1:32 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Ants Aasma <a...@cybertec.at> writes: >> I was thinking about something similar too. The big issue here is that >> the parallel checksums already hide each other latencies effectively >> executing one each of movdqu/pmullw/paddw each cycle, that's why the >> N_SUMS adds up to 128 bytes not 16 bytes. > > The more I read of this thread, the more unhappy I get. It appears that > the entire design process is being driven by micro-optimization for CPUs > being built by Intel in 2013. That ought to be, at best, a fifth-order > consideration, with full recognition that it'll be obsolete in two years, > and is already irrelevant to anyone not running one of those CPUs.
The large scale structure takes into account the trends in computer architecture. A lot more so than using anything straight out of the literature. Specifically, computer architectures have hit a wall in terms of sequential throughput, so the linear dependency chain in the checksum algorithm will be the bottleneck soon if it isn't already. From that it follows that a fast and future proof algorithm should not calculate the checksum in a single log chain. The proposed algorithms divide the input into 64x64 and 32x64 chunks. It's easy to show that both convert the dependency chain from O(n) to O(sqrt(n)). Secondly, unless we pick something really popular, CPUs are unlikely to provide specifically for us, so the algorithm should be built from general purpose computational pieces. Vector integer multiply and xor are pretty much guaranteed to be there and fast on future CPUs. In my view it's much more probable to be available and fast on future CPU's than something like the Intel CRC32 acceleration. > I would like to ban all discussion of assembly-language optimizations > until after 9.3 is out, so that we can concentrate on what actually > matters. Which IMO is mostly the error detection rate and the probable > nature of false successes. I'm glad to see that you're paying at least > some attention to that, but the priorities in this discussion are > completely backwards. I approached it from the angle that what needs to be done to get a fundamentally fast approach have a good enough error detection rate and not have a way of generating false positives that will give a likely error. The algorithms are simple enough and well studied enough that the rewards from tweaking them are negligible. I think the resulting performance speaks for itself. Now the question is what is a good enough algorithm. In my view, the checksum is more like a canary in the coal mine, not something that can be relied upon, and so ultimate efficiency is not that important if there are no obvious horrible cases. I can see that there are other views and so am exploring different tradeoffs between performance and quality. > And I reiterate that there is theory out there about the error detection > capabilities of CRCs. I'm not seeing any theory here, which leaves me > with very little confidence that we know what we're doing. I haven't found much literature that is of use here. There is theory underlying here coming from basic number theory and distilled into rules for hash functions. For the FNV hash the prime supposedly is carefully chosen, although all literature so far is saying "it is a good choice, but here is not the place to explain why". Regards, Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers