On 3/18/13 8:17 PM, Ants Aasma wrote:
I looked for fast CRC implementations on the net. The fastest plain C
variant I could find was one produced by Intel's R&D department
(available with a BSD license [1], requires some porting).

Very specifically, it references http://opensource.org/licenses/bsd-license.html as the 2-clause BSD license it is released under. If PostgreSQL wanted to use that as its implementation, the source file would need to carry Intel's copyright, and there's this ugly thing:

"Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution."

I don't remember if there's any good precedent for whether this form of BSD-licensed code can be assimilated into PostgreSQL without having to give credit to Intel in impractical places. I hate these licenses with the binary restrictions in them.

For CRC32C there is also an option to use the crc32 instructions
available on newer Intel machines and run 3 parallel CRC calculations
to cover for the 3 cycle latency on that instruction, combining them
in the end [2]. Skimming the paper it looks like there are some
patents in this area, so if we wish to implement this, we would have
to see how we can navigate around them.
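To make the "run 3 CRC calculations and combine them at the end" idea concrete, here is a minimal sketch in plain Python. It is not Intel's code and does not use the crc32 instruction: it uses an ordinary table-driven CRC32C, and the combine step simply runs the first CRC through as many zero bytes as the following piece is long. The function names (crc32c_three_way, shift_zeros, combine) are made up for illustration. A real implementation would replace shift_zeros with precomputed constants.

```python
POLY = 0x82F63B78  # CRC32C (Castagnoli) polynomial, bit-reflected form


def _make_table():
    """Build the standard 256-entry byte-at-a-time lookup table."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = _make_table()


def crc32c(data):
    """Plain table-driven CRC32C with the usual 0xFFFFFFFF init and final XOR."""
    c = 0xFFFFFFFF
    for b in data:
        c = TABLE[(c ^ b) & 0xFF] ^ (c >> 8)
    return c ^ 0xFFFFFFFF


def shift_zeros(crc, nbytes):
    """Advance a CRC value as if nbytes zero bytes had been appended.

    O(nbytes) on purpose, to keep the sketch obvious.
    """
    for _ in range(nbytes):
        crc = TABLE[crc & 0xFF] ^ (crc >> 8)
    return crc


def combine(crc_a, crc_b, len_b):
    """CRC of A||B from crc(A), crc(B) and len(B), using CRC linearity."""
    return shift_zeros(crc_a, len_b) ^ crc_b


def crc32c_three_way(data):
    """Split the buffer in three, CRC each piece independently, then combine."""
    n = len(data) // 3
    a, b, c = data[:n], data[n:2 * n], data[2 * n:]
    # These three calls do not depend on each other, which is what lets
    # hardware keep three crc32 instructions in flight at once.
    crc_a, crc_b, crc_c = crc32c(a), crc32c(b), crc32c(c)
    ab = combine(crc_a, crc_b, len(b))
    return combine(ab, crc_c, len(c))
```

The point of the split is only that the three per-piece computations have no data dependency on one another; the combine step is what makes the result identical to a single pass over the whole buffer.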

Discussing patent issues on list, especially how someone else implemented a feature, is generally bad news. But since, as you noted, Intel has already interacted with other open-source communities with code related to those patents, I think it's OK to talk about that for a bit.

Yes, there are two Intel patents on how they actually implement the CRC32C in their processor. I just read them both, and they have many very specific claims. I suspect their purpose is to keep AMD from knocking off the exact way Intel does this in hardware. But they also contributed CRC32C code to Linux:

https://lwn.net/Articles/292984/
http://git.kernel.org/cgit/linux/kernel/git/herbert/cryptodev-2.6.git/tree/arch/x86/crypto/crc32c-pcl-intel-asm_64.S

with a dual license, GPLv2 and the 2-clause BSD again. In theory, any CRC32C implementation might get dragged into court over Intel's patents if Intel wanted to shake someone down.

But they would bring a world of hurt upon themselves for asserting a CRC32C patent claim against any open-source project, considering that they contributed this code themselves under a pair of liberal licenses. This doesn't set off any of my "beware of patents" alarms. Intel wants projects to use this approach, detect their acceleration when it's available, and run faster on Intel than AMD. Dragging free software packages into court over code they submitted would create a PR disaster for Intel. That would practically be entrapment on their part.

perf accounted 25% of execution time to
rdtsc instructions in the measurement loop for the handcoded variant;
not all of that is from the pipeline flush.

To clarify this part, rdtsc is the instruction that gets timing information from the processor: "Read Time Stamp Counter". So Ants is saying a lot of the runtime is the timing itself. rdtsc execution time is the overhead that the pg_test_timing utility estimates in some cases.
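As a rough illustration of what "the timing itself costs time" means, the sketch below estimates the per-call cost of reading a clock by calling it back to back. This is only an analogy: Python cannot issue rdtsc directly, so it uses time.perf_counter_ns() from the standard library, and the figure it reports includes interpreter loop overhead. The function name estimate_timer_overhead_ns is made up for this example.

```python
import time


def estimate_timer_overhead_ns(samples=100_000):
    """Average cost in nanoseconds of one clock read, measured back to back.

    The result also includes Python loop overhead, so treat it as an
    upper bound on the clock read itself, not a precise figure.
    """
    start = time.perf_counter_ns()
    for _ in range(samples):
        time.perf_counter_ns()  # the call whose cost we are estimating
    end = time.perf_counter_ns()
    return (end - start) / samples


if __name__ == "__main__":
    print(f"approx. {estimate_timer_overhead_ns():.1f} ns per clock read")
```

If the code being benchmarked takes only a few times longer than one clock read, a large share of the measured runtime is the measurement, which is the situation described above.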

http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf

The main message I took away from this paper is that it's possible to speed up CRC computation if you fix a) the CRC polynomial and b) the size of the input buffer. There may be some good optimization possibilities in both those, given I'd only expect Postgres to use one polynomial and the typical database page sizes. Intel's processor acceleration has optimizations for running against 1K blocks for example. I don't think requiring the database page size to be a multiple of 1K is ever going to be an unreasonable limitation, if that's what it takes to get useful hardware acceleration.
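Here is a small sketch of why fixing both the polynomial and the block size helps, under assumptions of my own rather than anything taken from the paper's code. With the polynomial fixed, the byte lookup table is a constant. With the block size fixed (1024 bytes here, chosen to match the 1K figure above), the "advance this CRC past one block" step also becomes a constant linear map, so it can be precomputed once into four 256-entry tables and applied with four lookups. All names (CHUNK, make_shift_tables, crc32c_page) are hypothetical.

```python
POLY = 0x82F63B78  # CRC32C polynomial, bit-reflected form (fixed)
CHUNK = 1024       # assumed fixed block size in bytes


def _make_table():
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = _make_table()


def crc32c(data):
    """Reference table-driven CRC32C."""
    c = 0xFFFFFFFF
    for b in data:
        c = TABLE[(c ^ b) & 0xFF] ^ (c >> 8)
    return c ^ 0xFFFFFFFF


def make_shift_tables(nbytes):
    """Precompute 'append nbytes zero bytes' as four byte-indexed tables."""
    # The shift is linear over GF(2), so it is enough to shift each of the
    # 32 single-bit values and XOR the results together.
    basis = []
    for bit in range(32):
        c = 1 << bit
        for _ in range(nbytes):
            c = TABLE[c & 0xFF] ^ (c >> 8)
        basis.append(c)
    tables = []
    for k in range(4):
        t = []
        for byte in range(256):
            v = 0
            for bit in range(8):
                if (byte >> bit) & 1:
                    v ^= basis[8 * k + bit]
            t.append(v)
        tables.append(t)
    return tables


SHIFT = make_shift_tables(CHUNK)


def shift_chunk(crc):
    """Advance a CRC past exactly CHUNK bytes using four table lookups."""
    return (SHIFT[0][crc & 0xFF] ^ SHIFT[1][(crc >> 8) & 0xFF]
            ^ SHIFT[2][(crc >> 16) & 0xFF] ^ SHIFT[3][crc >> 24])


def crc32c_page(page):
    """CRC32C of a page whose size is a multiple of CHUNK."""
    if len(page) % CHUNK != 0:
        raise ValueError("page size must be a multiple of CHUNK")
    crc = 0
    for off in range(0, len(page), CHUNK):
        # Each block's CRC is independent of the others; only the cheap
        # combine step is sequential.
        crc = shift_chunk(crc) ^ crc32c(page[off:off + CHUNK])
    return crc
```

For an 8K page this gives eight independent block CRCs plus eight constant-time combines, and the result matches a single pass over the whole page. The multiple-of-1K requirement shows up directly as the length check.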

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
