RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2025-02-05 Thread Devulapalli, Raghuveer
Hi John, > Further, we verified upthread that Intel's current and near-future product > line > includes server chips (some with over 100 cores, so not exactly low-end) that > don't support AVX-512 at all. I have no idea how common they will be, but they > will certainly be found in cloud datacen

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2025-01-28 Thread John Naylor
On Sat, Jan 25, 2025 at 3:35 AM Devulapalli, Raghuveer wrote: > > #1 - The choice of AVX-512. There is no such thing as a "CRC instruction > > operating > > on 8 bytes", and the proposed algorithm is a multistep process using > > carryless > > multiplication and requiring at least 256 bytes of

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2025-01-24 Thread Devulapalli, Raghuveer
Hi John, Thanks for your summary and here are responses: > #1 - The choice of AVX-512. There is no such thing as a "CRC instruction > operating > on 8 bytes", and the proposed algorithm is a multistep process using carryless > multiplication and requiring at least 256 bytes of input. The Chrom

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2025-01-22 Thread John Naylor
On Wed, Jan 22, 2025 at 12:46 AM Devulapalli, Raghuveer wrote: > > Is there any additional feedback for this patch? Hi Raghuveer, I raised one question and one concern upthread. I will repeat them here for convenience. #1 - The choice of AVX-512. There is no such thing as a "CRC instruction ope

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2025-01-21 Thread Devulapalli, Raghuveer
> Hello! I'm Matthew Sterrett and I'm a coworker of Raghuveer; he asked me to > look into the Windows build failures related to pg_comp_crc32c. > > It seems that the only thing that was required to fix that is to mark > pg_comp_crc32c as PGDLLIMPORT, so I added a patch that does just that. > I'm n

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-20 Thread Sterrett, Matthew
On 12/7/2024 12:42 AM, Devulapalli, Raghuveer wrote: [0] https://cirrus-ci.com/task/6023394207989760 [1] https://cirrus-ci.com/task/5460444254568448 [2] https://cirrus-ci.com/task/6586344161411072 I was able to fix [0] and [1], but I can't think of why [2] fails. When I tried to reproduce th

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-15 Thread John Naylor
On Sat, Dec 14, 2024 at 10:24 PM Andres Freund wrote: > > Hi, > > On 2024-12-14 12:08:57 +0700, John Naylor wrote: > > On Thu, Jun 13, 2024 at 2:37 AM Andres Freund wrote: > > > > > > It's hard to understand, but a nonetheless helpful page is > > > https://users.ece.cmu.edu/~koopman/crc/crc32.htm

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-14 Thread Andres Freund
Hi, On 2024-12-14 12:08:57 +0700, John Naylor wrote: > On Thu, Jun 13, 2024 at 2:37 AM Andres Freund wrote: > > > > It's hard to understand, but a nonetheless helpful page is > > https://users.ece.cmu.edu/~koopman/crc/crc32.html which lists properties for > > crc32c: > > https://users.ece.cmu.edu

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-13 Thread John Naylor
On Thu, Jun 13, 2024 at 2:37 AM Andres Freund wrote: > > It's hard to understand, but a nonetheless helpful page is > https://users.ece.cmu.edu/~koopman/crc/crc32.html which lists properties for > crc32c: > https://users.ece.cmu.edu/~koopman/crc/c32/0x8f6e37a0_len.txt > which lists > (0x8f6e37a0;

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-13 Thread Ants Aasma
On Fri, 13 Dec 2024 at 00:14, Nathan Bossart wrote: > > On Thu, Dec 12, 2024 at 10:45:29AM -0500, Andres Freund wrote: > > Frankly, we should just move away from using CRCs. They're good for cases > > where short runs of bit flips are much more likely than other kinds of > > errors > > and where

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-12 Thread Nathan Bossart
On Thu, Dec 12, 2024 at 10:45:29AM -0500, Andres Freund wrote: > Frankly, we should just move away from using CRCs. They're good for cases > where short runs of bit flips are much more likely than other kinds of errors > and where the amount of data covered by them has a low upper bound. That's not

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-12 Thread Andres Freund
Hi, On 2024-12-12 18:32:20 +0700, John Naylor wrote: > I went and looked at the Chromium source, and found the following > snippet that uses the same technique, but only requires 128-bit CLMUL > and has a minimum input size of 64 bytes, rather than 256. This seems > like it might be better suited

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-12 Thread John Naylor
+ * For This Function: + * Copyright 2015 The Chromium Authors I went and looked at the Chromium source, and found the following snippet that uses the same technique, but only requires 128-bit CLMUL and has a minimum input size of 64 bytes, rather than 256. This seems like it might be better suite

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-09 Thread John Naylor
On Sat, Dec 7, 2024 at 10:16 PM Devulapalli, Raghuveer wrote: > > There is another technique that computes CRC on 3 separate chunks and > > combines them at the end, so about 3x faster on large-enough chunks. > > That's the way used for the Arm proposal [0], coincidentally also citing a > > whit

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-07 Thread Devulapalli, Raghuveer
> Sorry for going back so far, but this thread was pointed out to me, and this > aspect > of the design could use some more discussion: > > + * pg_crc32c_avx512(): compute the crc32c of the buffer, where the > + * buffer length must be at least 256, and a multiple of 64. Based > > There is anoth

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-07 Thread Devulapalli, Raghuveer
> [0] https://cirrus-ci.com/task/6023394207989760 > [1] https://cirrus-ci.com/task/5460444254568448 > [2] https://cirrus-ci.com/task/6586344161411072 I was able to fix [0] and [1], but I can't think of why [2] fails. When I tried to reproduce this locally, I get a different unrelated error. Any i

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-04 Thread John Naylor
On Thu, Jun 13, 2024 at 3:11 AM Andres Freund wrote: > > On 2024-05-01 15:56:08 +, Amonson, Paul D wrote: > > Workload call size distribution details (write heavy): > >* Average was approximately around 1,010 bytes per call > >* ~80% of the calls were under 256 bytes > >* ~20% of

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-03 Thread Devulapalli, Raghuveer
> Thanks! cfbot is showing a couple of errors [0] [1] [2]. Oh yikes, the CI had passed with an earlier version. Wonder if I made a mess of the rebase. I will take a look and fix them. Raghuveer

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-03 Thread Nathan Bossart
On Tue, Dec 03, 2024 at 03:46:16PM +, Devulapalli, Raghuveer wrote: >> Raghuveer, would you mind rebasing this patch set now that the SSE4.2 patch >> is >> committed? > > Rebased to master branch. Thanks! cfbot is showing a couple of errors [0] [1] [2]. 32-bit Linux is failing to compile

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-03 Thread Devulapalli, Raghuveer
> Raghuveer, would you mind rebasing this patch set now that the SSE4.2 patch is > committed? Rebased to master branch. Raghuveer v8-0001-Add-a-Postgres-SQL-function-for-crc32c-benchmarki.patch Description: v8-0001-Add-a-Postgres-SQL-function-for-crc32c-benchmarki.patch v8-0002-Refactor-cons

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-02 Thread Nathan Bossart
On Mon, Nov 25, 2024 at 08:54:48PM +, Devulapalli, Raghuveer wrote: > As Nathan suggested, we moved this to a separate thread. The latest set > of patches here need to applied on top of patches in that thread. Raghuveer, would you mind rebasing this patch set now that the SSE4.2 patch is commi

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-11-25 Thread Devulapalli, Raghuveer
> > create mode 100644 src/test/modules/test_crc32c/test_crc32c.c > > create mode 100644 src/test/modules/test_crc32c/test_crc32c.control > > Needs to be integrated with the meson based build as well. Done. > > +drive_crc32c(PG_FUNCTION_ARGS) > > +{ > > + int64 count = P

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-11-07 Thread Devulapalli, Raghuveer
> Would you mind moving the function attribute change for the existing SSE > 4.2 code to its own patch? I think that is pretty straightforward, and IMHO > it'd be > nice to take care of it first so that we can focus on the new stuff. Just submitted a separate patch for this. Will update the CRC3

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-11-07 Thread Nathan Bossart
On Thu, Nov 07, 2024 at 11:05:14AM -0500, Andres Freund wrote: > On 2024-10-30 21:03:20 +, Devulapalli, Raghuveer wrote: >> From a495124ee42cb8f9f206f719b9f2235aff715963 Mon Sep 17 00:00:00 2001 >> From: Nathan Bossart >> Date: Wed, 16 Oct 2024 15:57:55 -0500 >> Subject: [PATCH v6 5/6] use __a

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-11-07 Thread Andres Freund
Hi, On 2024-10-30 21:03:20 +, Devulapalli, Raghuveer wrote: > v6: Fixing build failure on Windows/MSVC. > > Raghuveer > From b601e7b4ee9f25fd32e9d8d056bb20a03d755a8a Mon Sep 17 00:00:00 2001 > From: Paul Amonson > Date: Mon, 6 May 2024 08:34:17 -0700 > Subject: [PATCH v6 1/6] Add a Postgre

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-11-07 Thread Nathan Bossart
On Tue, Oct 29, 2024 at 09:00:17PM +, Devulapalli, Raghuveer wrote: > (1) The SSE42 and AVX-512 CRC32C also use function attributes to build > with ISA specific flag.. Would you mind moving the function attribute change for the existing SSE 4.2 code to its own patch? I think that is pretty st

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-10-18 Thread Devulapalli, Raghuveer
> I've proposed a patch to move the existing AVX-512 code in Postgres to use > __attribute__((target("..."))) instead of per-translation-unit compiler flags > [0]. We > should likely do something similar for this one. > > [0] https://postgr.es/m/ZxAqRG1-8fJLMRUY%40nathan I assume this will be

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-10-18 Thread Nathan Bossart
On Tue, Oct 08, 2024 at 08:19:27PM +, Devulapalli, Raghuveer wrote: > Hi all, I'm currently in the process of reviewing and analyzing Paul's > patch. In the meantime, I'm open to addressing any questions or feedback > you may have. I've proposed a patch to move the existing AVX-512 code in Pos

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-10-08 Thread Devulapalli, Raghuveer
Thank you for the introduction, Paul. Hi all, I'm currently in the process of reviewing and analyzing Paul's patch. In the meantime, I'm open to addressing any questions or feedback you may have. > Hi all, > > I will be retiring from Intel at the end of this week. I wanted to introduce > the

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-09-24 Thread Amonson, Paul D
Hi all, I will be retiring from Intel at the end of this week. I wanted to introduce the engineer who will be taking over the CRC32c proposal and commit fest entry. Devulapalli, Raghuveer I have brought him up to speed and he will be the go-to for technical review comments and questions. Plea

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-27 Thread Amonson, Paul D
> Things like sizeof() and offsetof() are known at compile time, so the compiler > will recognize when a condition is always true or false and optimize it out > accordingly. In cases where the value cannot be known at compile time, > checking the length in the macro and dispatching to a different

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Nathan Bossart
On Mon, Aug 26, 2024 at 07:15:47PM +, Amonson, Paul D wrote: >> +#define COMP_CRC32C_SMALL(crc, data, len) \ >> +((crc) = pg_comp_crc32c_sse42((crc), (data), (len))) >> >> My interpretation of Andres's upthread suggestion is that we'd add the length >> check within the macro instead of int

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Amonson, Paul D
> IMHO that would be useful to establish the current state of the patch set from > a performance standpoint, especially since you've added code intended to > mitigate the regression. Ok. > +#define COMP_CRC32C_SMALL(crc, data, len) \ > + ((crc) = pg_comp_crc32c_sse42((crc), (data), (len))) >

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Nathan Bossart
On Mon, Aug 26, 2024 at 06:54:58PM +, Amonson, Paul D wrote: >> And this still shows the ~14% regression in your original post? > > At the small buffer sizes the margin of error or "noise" is larger, > 7-11%. My average could be just bad luck. It will take me a while to > re-setup for full dat

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Amonson, Paul D
> And this still shows the ~14% regression in your original post? At the small buffer sizes the margin of error or "noise" is larger, 7-11%. My average could be just bad luck. It will take me a while to re-setup for full data collection runs but I can try it again if you like. Paul

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Nathan Bossart
On Mon, Aug 26, 2024 at 06:44:55PM +, Amonson, Paul D wrote: >> I'm curious about where exactly the regression is coming from. Is it >> possible >> that your build for the SSE 4.2 tests was using it unconditionally, i.e., >> optimizing away the function pointer? > > I am calling the SSE 4.2

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Amonson, Paul D
> I'm curious about where exactly the regression is coming from. Is it possible > that your build for the SSE 4.2 tests was using it unconditionally, i.e., > optimizing away the function pointer? I am calling the SSE 4.2 implementation directly; I am not even building the pg_sse42_*_choose.c fil

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Nathan Bossart
On Mon, Aug 26, 2024 at 05:09:35PM +, Amonson, Paul D wrote: > Ok I added a patch that exposed a new macro CRC32C_COMP_SMALL for > targeted fixed size < 256 use cases in Postgres. As for mitigating the > regression in general, I have not been able to work up a fallback (i.e. > <256 bytes) that

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Amonson, Paul D
> Upthread [0], Andres suggested dispatching to a different implementation for > compile-time-known small lengths. Have you looked into that? In your > original post, you noted a 14% regression for records smaller than 256 bytes, > which is not an uncommon case for Postgres. IMO we should try to

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-22 Thread Amonson, Paul D
> Upthread [0], Andres suggested dispatching to a different implementation for > compile-time-known small lengths. Have you looked into that? In your > original post, you noted a 14% regression for records smaller than 256 bytes, > which is not an uncommon case for Postgres. IMO we should try to

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-22 Thread Nathan Bossart
Thanks for the new patches. On Thu, Aug 22, 2024 at 03:14:32PM +, Amonson, Paul D wrote: > I reran all the basic tests again to make sure that the performance > numbers were within the margin of error when compared to my original > finding. This step showed similar numbers (see origin post) ar

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-22 Thread Amonson, Paul D
Hi, Here are the latest patches for the accelerated CRC32c algorithm. I did the following to create these refactored patches: 1) From the main branch I moved all x86_64 hardware checks from the various locations into a single location. I did not move any ARM tests as I would have no way to tes

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-08 Thread Nathan Bossart
On Wed, Jun 12, 2024 at 12:37:46PM -0700, Andres Freund wrote: > I'm wonder if this isn't going in the wrong direction. We're using CRCs for > something they're not well suited for in my understanding - and are paying a > reasonably high price for it, given that even hardware accelerated CRCs aren'

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-07-18 Thread Amonson, Paul D
> Okay, that is very interesting. Yes, we will have no problem reproducing the > exact license text in the source code. I think we can remove the license > issue > as a blocker for this patch. Hi, I was wondering if I can I get a review please. I am interested in the refactor question for the

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-25 Thread Bruce Momjian
On Tue, Jun 25, 2024 at 05:41:12PM +, Amonson, Paul D wrote: > > It would be good to know exactly what, if any, changes the Intel > > lawyers want us to make to our license if we accept this patch. > > I asked about this and there is nothing Intel requires here license > wise. They believe that

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-25 Thread Amonson, Paul D
> It would be good to know exactly what, if any, changes the Intel lawyers want > us to make to our license if we accept this patch. I asked about this and there is nothing Intel requires here license wise. They believe that there is nothing wrong with including Clause-3 BSD like licenses under

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-19 Thread Bruce Momjian
On Tue, Jun 18, 2024 at 02:00:34PM -0400, Bruce Momjian wrote: > While the license we are concerned about does not have this clause, it > does have: > >2. Redistributions in binary form must reproduce the above > copyright notice, this list of conditions and the following > dis

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-18 Thread Bruce Momjian
On Tue, Jun 18, 2024 at 01:20:50PM -0400, Bruce Momjian wrote: > On Tue, Jun 18, 2024 at 05:14:08PM +, Amonson, Paul D wrote: > > > And this bit doesn't look good. The LICENSE file says: > > ... > > > > //* Redistributions in binary form must reproduce the above > > > > // copyright notice

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-18 Thread Bruce Momjian
On Tue, Jun 18, 2024 at 05:14:08PM +, Amonson, Paul D wrote: > > And this bit doesn't look good. The LICENSE file says: > ... > > > //* Redistributions in binary form must reproduce the above > > > // copyright notice, this list of conditions and the following > > > disclaimer // in the do

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-18 Thread Amonson, Paul D
> Hmm, I wonder if the "(c) 2024 Intel" line is going to bring us trouble. > (I bet it's not really necessary anyway.) Our lawyer agrees, copyright is covered by the "PostgreSQL Global Development Group" copyright line as a contributor. > And this bit doesn't look good. The LICENSE file says: .

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-18 Thread Alvaro Herrera
On 2024-Jun-12, Amonson, Paul D wrote: > +/*- > + * > + * pg_crc32c_avx512.c > + * Compute CRC-32C checksum using Intel AVX-512 instructions. > + * > + * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Grou

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-17 Thread Amonson, Paul D
> This is extremely workload dependent, it's not hard to find workloads with > lots of very small record and very few big ones... What you observed might > have "just" been the warmup behaviour where more full page writes have to > be written. Can you tell me how to avoid capturing this "warm-up"

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-12 Thread Amonson, Paul D
> -Original Message- > From: Andres Freund > Sent: Wednesday, June 12, 2024 1:12 PM > To: Amonson, Paul D > FWIW, I tried the v2 patch on my Xeon Gold 5215 workstation, and dies early > on with SIGILL: Nice catch!!! I was testing the bit for the vpclmulqdq in EBX instead of the corre

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-12 Thread Andres Freund
Hi, On 2024-05-01 15:56:08 +, Amonson, Paul D wrote: > Comparing the current SSE4.2 implementation of the CRC32C algorithm in > Postgres, to an optimized AVX-512 algorithm [0] we observed significant > gains. The result was a ~6.6X average multiplier of increased performance > measured on 3 di

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-12 Thread Andres Freund
Hi, I'm wonder if this isn't going in the wrong direction. We're using CRCs for something they're not well suited for in my understanding - and are paying a reasonably high price for it, given that even hardware accelerated CRCs aren't blazingly fast. CRCs are used for things like ethernet, iSCSI

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-12 Thread Bruce Momjian
On Wed, Jun 12, 2024 at 02:08:02PM -0400, Tom Lane wrote: > "Amonson, Paul D" writes: > > I had our OSS internal team, who are experts in OSS licensing, review > > possible conflicts between the PostgreSQL license and the BSD-Clause 3-like > > license for the CRC32C AVX-512 code, and they found

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-12 Thread Tom Lane
"Amonson, Paul D" writes: > I had our OSS internal team, who are experts in OSS licensing, review > possible conflicts between the PostgreSQL license and the BSD-Clause 3-like > license for the CRC32C AVX-512 code, and they found no issues. Therefore, > including the new license into the Postgr

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-12 Thread Amonson, Paul D
> The project is currently in feature-freeze in preparation for the next major > release so new development and ideas are not the top priority right now. > Additionally there is a large developer meeting shortly which many are busy > preparing for. Excercise some patience, and I'm sure there will

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-05-20 Thread Daniel Gustafsson
> On 17 May 2024, at 18:21, Amonson, Paul D wrote: > Hi, forgive the top-post but I have not seen any response to this post? The project is currently in feature-freeze in preparation for the next major release so new development and ideas are not the top priority right now. Additionally there is

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-05-17 Thread Amonson, Paul D
Proposal for Updating CRC32C with AVX-512 Algorithm. > > Hi, > > Comparing the current SSE4.2 implementation of the CRC32C algorithm in > Postgres, to an optimized AVX-512 algorithm [0] we observed significant > gains. The result was a ~6.6X average multiplier of increased perf

Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-05-01 Thread Amonson, Paul D
Hi, Comparing the current SSE4.2 implementation of the CRC32C algorithm in Postgres, to an optimized AVX-512 algorithm [0] we observed significant gains. The result was a ~6.6X average multiplier of increased performance measured on 3 different Intel products. Details below. The AVX-512 algorit