Hi John,
> Further, we verified upthread that Intel's current and near-future product
> line
> includes server chips (some with over 100 cores, so not exactly low-end) that
> don't support AVX-512 at all. I have no idea how common they will be, but they
> will certainly be found in cloud datacen
On Sat, Jan 25, 2025 at 3:35 AM Devulapalli, Raghuveer
wrote:
> > #1 - The choice of AVX-512. There is no such thing as a "CRC instruction
> > operating
> > on 8 bytes", and the proposed algorithm is a multistep process using
> > carryless
> > multiplication and requiring at least 256 bytes of
Hi John,
Thanks for your summary and here are responses:
> #1 - The choice of AVX-512. There is no such thing as a "CRC instruction
> operating
> on 8 bytes", and the proposed algorithm is a multistep process using carryless
> multiplication and requiring at least 256 bytes of input. The Chrom
On Wed, Jan 22, 2025 at 12:46 AM Devulapalli, Raghuveer
wrote:
>
> Is there any additional feedback for this patch?
Hi Raghuveer,
I raised one question and one concern upthread. I will repeat them
here for convenience.
#1 - The choice of AVX-512. There is no such thing as a "CRC
instruction ope
> Hello! I'm Matthew Sterrett and I'm a coworker of Raghuveer; he asked me to
> look into the Windows build failures related to pg_comp_crc32c.
>
> It seems that the only thing that was required to fix that is to mark
> pg_comp_crc32c as PGDLLIMPORT, so I added a patch that does just that.
> I'm n
On 12/7/2024 12:42 AM, Devulapalli, Raghuveer wrote:
[0] https://cirrus-ci.com/task/6023394207989760
[1] https://cirrus-ci.com/task/5460444254568448
[2] https://cirrus-ci.com/task/6586344161411072
I was able to fix [0] and [1], but I can't think of why [2] fails. When I tried
to reproduce th
On Sat, Dec 14, 2024 at 10:24 PM Andres Freund wrote:
>
> Hi,
>
> On 2024-12-14 12:08:57 +0700, John Naylor wrote:
> > On Thu, Jun 13, 2024 at 2:37 AM Andres Freund wrote:
> > >
> > > It's hard to understand, but a nonetheless helpful page is
> > > https://users.ece.cmu.edu/~koopman/crc/crc32.htm
Hi,
On 2024-12-14 12:08:57 +0700, John Naylor wrote:
> On Thu, Jun 13, 2024 at 2:37 AM Andres Freund wrote:
> >
> > It's hard to understand, but a nonetheless helpful page is
> > https://users.ece.cmu.edu/~koopman/crc/crc32.html which lists properties for
> > crc32c:
> > https://users.ece.cmu.edu
On Thu, Jun 13, 2024 at 2:37 AM Andres Freund wrote:
>
> It's hard to understand, but a nonetheless helpful page is
> https://users.ece.cmu.edu/~koopman/crc/crc32.html which lists properties for
> crc32c:
> https://users.ece.cmu.edu/~koopman/crc/c32/0x8f6e37a0_len.txt
> which lists
> (0x8f6e37a0;
On Fri, 13 Dec 2024 at 00:14, Nathan Bossart wrote:
>
> On Thu, Dec 12, 2024 at 10:45:29AM -0500, Andres Freund wrote:
> > Frankly, we should just move away from using CRCs. They're good for cases
> > where short runs of bit flips are much more likely than other kinds of
> > errors
> > and where
On Thu, Dec 12, 2024 at 10:45:29AM -0500, Andres Freund wrote:
> Frankly, we should just move away from using CRCs. They're good for cases
> where short runs of bit flips are much more likely than other kinds of errors
> and where the amount of data covered by them has a low upper bound. That's not
Hi,
On 2024-12-12 18:32:20 +0700, John Naylor wrote:
> I went and looked at the Chromium source, and found the following
> snippet that uses the same technique, but only requires 128-bit CLMUL
> and has a minimum input size of 64 bytes, rather than 256. This seems
> like it might be better suited
+ * For This Function:
+ * Copyright 2015 The Chromium Authors
I went and looked at the Chromium source, and found the following
snippet that uses the same technique, but only requires 128-bit CLMUL
and has a minimum input size of 64 bytes, rather than 256. This seems
like it might be better suite
On Sat, Dec 7, 2024 at 10:16 PM Devulapalli, Raghuveer
wrote:
> > There is another technique that computes CRC on 3 separate chunks and
> > combines them at the end, so about 3x faster on large-enough chunks.
> > That's the way used for the Arm proposal [0], coincidentally also citing a
> > whit
> Sorry for going back so far, but this thread was pointed out to me, and this
> aspect
> of the design could use some more discussion:
>
> + * pg_crc32c_avx512(): compute the crc32c of the buffer, where the
> + * buffer length must be at least 256, and a multiple of 64. Based
>
> There is anoth
> [0] https://cirrus-ci.com/task/6023394207989760
> [1] https://cirrus-ci.com/task/5460444254568448
> [2] https://cirrus-ci.com/task/6586344161411072
I was able to fix [0] and [1], but I can't think of why [2] fails. When I tried
to reproduce this locally, I get a different unrelated error. Any i
On Thu, Jun 13, 2024 at 3:11 AM Andres Freund wrote:
>
> On 2024-05-01 15:56:08 +, Amonson, Paul D wrote:
> > Workload call size distribution details (write heavy):
> >* Average was approximately around 1,010 bytes per call
> >* ~80% of the calls were under 256 bytes
> >* ~20% of
> Thanks! cfbot is showing a couple of errors [0] [1] [2].
Oh yikes, the CI had passed with an earlier version. Wonder if I made a mess of
the rebase. I will take a look and fix them.
Raghuveer
On Tue, Dec 03, 2024 at 03:46:16PM +, Devulapalli, Raghuveer wrote:
>> Raghuveer, would you mind rebasing this patch set now that the SSE4.2 patch
>> is
>> committed?
>
> Rebased to master branch.
Thanks! cfbot is showing a couple of errors [0] [1] [2]. 32-bit Linux is
failing to compile
> Raghuveer, would you mind rebasing this patch set now that the SSE4.2 patch is
> committed?
Rebased to master branch.
Raghuveer
v8-0001-Add-a-Postgres-SQL-function-for-crc32c-benchmarki.patch
Description: v8-0001-Add-a-Postgres-SQL-function-for-crc32c-benchmarki.patch
v8-0002-Refactor-cons
On Mon, Nov 25, 2024 at 08:54:48PM +, Devulapalli, Raghuveer wrote:
> As Nathan suggested, we moved this to a separate thread. The latest set
> of patches here need to applied on top of patches in that thread.
Raghuveer, would you mind rebasing this patch set now that the SSE4.2 patch
is commi
> > create mode 100644 src/test/modules/test_crc32c/test_crc32c.c
> > create mode 100644 src/test/modules/test_crc32c/test_crc32c.control
>
> Needs to be integrated with the meson based build as well.
Done.
> > +drive_crc32c(PG_FUNCTION_ARGS)
> > +{
> > + int64 count = P
> Would you mind moving the function attribute change for the existing SSE
> 4.2 code to its own patch? I think that is pretty straightforward, and IMHO
> it'd be
> nice to take care of it first so that we can focus on the new stuff.
Just submitted a separate patch for this. Will update the CRC3
On Thu, Nov 07, 2024 at 11:05:14AM -0500, Andres Freund wrote:
> On 2024-10-30 21:03:20 +, Devulapalli, Raghuveer wrote:
>> From a495124ee42cb8f9f206f719b9f2235aff715963 Mon Sep 17 00:00:00 2001
>> From: Nathan Bossart
>> Date: Wed, 16 Oct 2024 15:57:55 -0500
>> Subject: [PATCH v6 5/6] use __a
Hi,
On 2024-10-30 21:03:20 +, Devulapalli, Raghuveer wrote:
> v6: Fixing build failure on Windows/MSVC.
>
> Raghuveer
> From b601e7b4ee9f25fd32e9d8d056bb20a03d755a8a Mon Sep 17 00:00:00 2001
> From: Paul Amonson
> Date: Mon, 6 May 2024 08:34:17 -0700
> Subject: [PATCH v6 1/6] Add a Postgre
On Tue, Oct 29, 2024 at 09:00:17PM +, Devulapalli, Raghuveer wrote:
> (1) The SSE42 and AVX-512 CRC32C also use function attributes to build
> with ISA specific flag..
Would you mind moving the function attribute change for the existing SSE
4.2 code to its own patch? I think that is pretty st
> I've proposed a patch to move the existing AVX-512 code in Postgres to use
> __attribute__((target("..."))) instead of per-translation-unit compiler flags
> [0]. We
> should likely do something similar for this one.
>
> [0] https://postgr.es/m/ZxAqRG1-8fJLMRUY%40nathan
I assume this will be
On Tue, Oct 08, 2024 at 08:19:27PM +, Devulapalli, Raghuveer wrote:
> Hi all, I'm currently in the process of reviewing and analyzing Paul's
> patch. In the meantime, I'm open to addressing any questions or feedback
> you may have.
I've proposed a patch to move the existing AVX-512 code in Pos
Thank you for the introduction, Paul.
Hi all, I'm currently in the process of reviewing and analyzing Paul's patch.
In the meantime, I'm open to addressing any questions or feedback you may have.
> Hi all,
>
> I will be retiring from Intel at the end of this week. I wanted to introduce
> the
Hi all,
I will be retiring from Intel at the end of this week. I wanted to introduce
the engineer who will be taking over the CRC32c proposal and commit fest entry.
Devulapalli, Raghuveer
I have brought him up to speed and he will be the go-to for technical review
comments and questions. Plea
> Things like sizeof() and offsetof() are known at compile time, so the compiler
> will recognize when a condition is always true or false and optimize it out
> accordingly. In cases where the value cannot be known at compile time,
> checking the length in the macro and dispatching to a different
On Mon, Aug 26, 2024 at 07:15:47PM +, Amonson, Paul D wrote:
>> +#define COMP_CRC32C_SMALL(crc, data, len) \
>> +((crc) = pg_comp_crc32c_sse42((crc), (data), (len)))
>>
>> My interpretation of Andres's upthread suggestion is that we'd add the length
>> check within the macro instead of int
> IMHO that would be useful to establish the current state of the patch set from
> a performance standpoint, especially since you've added code intended to
> mitigate the regression.
Ok.
> +#define COMP_CRC32C_SMALL(crc, data, len) \
> + ((crc) = pg_comp_crc32c_sse42((crc), (data), (len)))
>
On Mon, Aug 26, 2024 at 06:54:58PM +, Amonson, Paul D wrote:
>> And this still shows the ~14% regression in your original post?
>
> At the small buffer sizes the margin of error or "noise" is larger,
> 7-11%. My average could be just bad luck. It will take me a while to
> re-setup for full dat
> And this still shows the ~14% regression in your original post?
At the small buffer sizes the margin of error or "noise" is larger, 7-11%. My
average could be just bad luck. It will take me a while to re-setup for full
data collection runs but I can try it again if you like.
Paul
On Mon, Aug 26, 2024 at 06:44:55PM +, Amonson, Paul D wrote:
>> I'm curious about where exactly the regression is coming from. Is it
>> possible
>> that your build for the SSE 4.2 tests was using it unconditionally, i.e.,
>> optimizing away the function pointer?
>
> I am calling the SSE 4.2
> I'm curious about where exactly the regression is coming from. Is it possible
> that your build for the SSE 4.2 tests was using it unconditionally, i.e.,
> optimizing away the function pointer?
I am calling the SSE 4.2 implementation directly; I am not even building the
pg_sse42_*_choose.c fil
On Mon, Aug 26, 2024 at 05:09:35PM +, Amonson, Paul D wrote:
> Ok I added a patch that exposed a new macro CRC32C_COMP_SMALL for
> targeted fixed size < 256 use cases in Postgres. As for mitigating the
> regression in general, I have not been able to work up a fallback (i.e.
> <256 bytes) that
> Upthread [0], Andres suggested dispatching to a different implementation for
> compile-time-known small lengths. Have you looked into that? In your
> original post, you noted a 14% regression for records smaller than 256 bytes,
> which is not an uncommon case for Postgres. IMO we should try to
> Upthread [0], Andres suggested dispatching to a different implementation for
> compile-time-known small lengths. Have you looked into that? In your
> original post, you noted a 14% regression for records smaller than 256 bytes,
> which is not an uncommon case for Postgres. IMO we should try to
Thanks for the new patches.
On Thu, Aug 22, 2024 at 03:14:32PM +, Amonson, Paul D wrote:
> I reran all the basic tests again to make sure that the performance
> numbers were within the margin of error when compared to my original
> finding. This step showed similar numbers (see origin post) ar
Hi,
Here are the latest patches for the accelerated CRC32c algorithm. I did the
following to create these refactored patches:
1) From the main branch I moved all x86_64 hardware checks from the various
locations into a single location. I did not move any ARM tests as I would have
no way to tes
On Wed, Jun 12, 2024 at 12:37:46PM -0700, Andres Freund wrote:
> I'm wonder if this isn't going in the wrong direction. We're using CRCs for
> something they're not well suited for in my understanding - and are paying a
> reasonably high price for it, given that even hardware accelerated CRCs aren'
> Okay, that is very interesting. Yes, we will have no problem reproducing the
> exact license text in the source code. I think we can remove the license
> issue
> as a blocker for this patch.
Hi,
I was wondering if I can I get a review please. I am interested in the refactor
question for the
On Tue, Jun 25, 2024 at 05:41:12PM +, Amonson, Paul D wrote:
> > It would be good to know exactly what, if any, changes the Intel
> > lawyers want us to make to our license if we accept this patch.
>
> I asked about this and there is nothing Intel requires here license
> wise. They believe that
> It would be good to know exactly what, if any, changes the Intel lawyers want
> us to make to our license if we accept this patch.
I asked about this and there is nothing Intel requires here license wise. They
believe that there is nothing wrong with including Clause-3 BSD like licenses
under
On Tue, Jun 18, 2024 at 02:00:34PM -0400, Bruce Momjian wrote:
> While the license we are concerned about does not have this clause, it
> does have:
>
>2. Redistributions in binary form must reproduce the above
> copyright notice, this list of conditions and the following
> dis
On Tue, Jun 18, 2024 at 01:20:50PM -0400, Bruce Momjian wrote:
> On Tue, Jun 18, 2024 at 05:14:08PM +, Amonson, Paul D wrote:
> > > And this bit doesn't look good. The LICENSE file says:
> > ...
> > > > //* Redistributions in binary form must reproduce the above
> > > > // copyright notice
On Tue, Jun 18, 2024 at 05:14:08PM +, Amonson, Paul D wrote:
> > And this bit doesn't look good. The LICENSE file says:
> ...
> > > //* Redistributions in binary form must reproduce the above
> > > // copyright notice, this list of conditions and the following
> > > disclaimer // in the do
> Hmm, I wonder if the "(c) 2024 Intel" line is going to bring us trouble.
> (I bet it's not really necessary anyway.)
Our lawyer agrees, copyright is covered by the "PostgreSQL Global Development
Group" copyright line as a contributor.
> And this bit doesn't look good. The LICENSE file says:
.
On 2024-Jun-12, Amonson, Paul D wrote:
> +/*-
> + *
> + * pg_crc32c_avx512.c
> + * Compute CRC-32C checksum using Intel AVX-512 instructions.
> + *
> + * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Grou
> This is extremely workload dependent, it's not hard to find workloads with
> lots of very small record and very few big ones... What you observed might
> have "just" been the warmup behaviour where more full page writes have to
> be written.
Can you tell me how to avoid capturing this "warm-up"
> -Original Message-
> From: Andres Freund
> Sent: Wednesday, June 12, 2024 1:12 PM
> To: Amonson, Paul D
> FWIW, I tried the v2 patch on my Xeon Gold 5215 workstation, and dies early
> on with SIGILL:
Nice catch!!! I was testing the bit for the vpclmulqdq in EBX instead of the
corre
Hi,
On 2024-05-01 15:56:08 +, Amonson, Paul D wrote:
> Comparing the current SSE4.2 implementation of the CRC32C algorithm in
> Postgres, to an optimized AVX-512 algorithm [0] we observed significant
> gains. The result was a ~6.6X average multiplier of increased performance
> measured on 3 di
Hi,
I'm wonder if this isn't going in the wrong direction. We're using CRCs for
something they're not well suited for in my understanding - and are paying a
reasonably high price for it, given that even hardware accelerated CRCs aren't
blazingly fast.
CRCs are used for things like ethernet, iSCSI
On Wed, Jun 12, 2024 at 02:08:02PM -0400, Tom Lane wrote:
> "Amonson, Paul D" writes:
> > I had our OSS internal team, who are experts in OSS licensing, review
> > possible conflicts between the PostgreSQL license and the BSD-Clause 3-like
> > license for the CRC32C AVX-512 code, and they found
"Amonson, Paul D" writes:
> I had our OSS internal team, who are experts in OSS licensing, review
> possible conflicts between the PostgreSQL license and the BSD-Clause 3-like
> license for the CRC32C AVX-512 code, and they found no issues. Therefore,
> including the new license into the Postgr
> The project is currently in feature-freeze in preparation for the next major
> release so new development and ideas are not the top priority right now.
> Additionally there is a large developer meeting shortly which many are busy
> preparing for. Excercise some patience, and I'm sure there will
> On 17 May 2024, at 18:21, Amonson, Paul D wrote:
> Hi, forgive the top-post but I have not seen any response to this post?
The project is currently in feature-freeze in preparation for the next major
release so new development and ideas are not the top priority right now.
Additionally there is
Proposal for Updating CRC32C with AVX-512 Algorithm.
>
> Hi,
>
> Comparing the current SSE4.2 implementation of the CRC32C algorithm in
> Postgres, to an optimized AVX-512 algorithm [0] we observed significant
> gains. The result was a ~6.6X average multiplier of increased perf
Hi,
Comparing the current SSE4.2 implementation of the CRC32C algorithm in
Postgres, to an optimized AVX-512 algorithm [0] we observed significant gains.
The result was a ~6.6X average multiplier of increased performance measured on
3 different Intel products. Details below. The AVX-512 algorit
61 matches
Mail list logo