Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-03 Thread Tom Lane
Andrew Gierth writes: > "Tom" == Tom Lane writes: > Tom> I also noticed that we'd been sloppy about making the file safe to > Tom> compile for both frontend and backend, so I cleaned that up. > In a frontend, wouldn't it be more kosher to

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-03 Thread Andrew Gierth
> "Tom" == Tom Lane writes: Tom> I also noticed that we'd been sloppy about making the file safe to Tom> compile for both frontend and backend, so I cleaned that up. In a frontend, wouldn't it be more kosher to restore the previous SIGILL handler rather than

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-03 Thread Tom Lane
Thomas Munro writes: > Let me try that again with that stupid typo (crc2) fixed... I didn't like that too much as-is, because it was capable of calling elog(ERROR) without having reset the SIGILL trap first. That's just trouble waiting to happen, so I rearranged

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-03 Thread Thomas Munro
On Thu, May 3, 2018 at 5:18 PM, Thomas Munro wrote: > 2018-05-03 05:07:25.904 UTC [19677] DEBUG: using armv8 crc2 hardware = 1 Let me try that again with that stupid typo (crc2) fixed... -- Thomas Munro http://www.enterprisedb.com

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-02 Thread Thomas Munro
On Thu, May 3, 2018 at 4:48 PM, Tom Lane wrote: > Thomas Munro writes: >> On Thu, May 3, 2018 at 4:04 PM, Tom Lane wrote: >>> It strikes me also that, at least for debugging purposes, it's seriously >>> awful that you can't

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-02 Thread Tom Lane
Thomas Munro writes: > On Thu, May 3, 2018 at 4:04 PM, Tom Lane wrote: >> It strikes me also that, at least for debugging purposes, it's seriously >> awful that you can't tell from outside what result this function got. > I don't think *broken*

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-02 Thread Thomas Munro
On Thu, May 3, 2018 at 4:04 PM, Tom Lane wrote: > I wrote: >> Andrew Gierth writes: >>> Isn't there a hidden assumption about endianness there? Right, thanks. >> Yeah, I was wondering about that too, but does anyone actually run >> ARMs

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-02 Thread Tom Lane
I wrote: > Andrew Gierth writes: >> Isn't there a hidden assumption about endianness there? > Yeah, I was wondering about that too, but does anyone actually run > ARMs big-endian? After a bit more thought ... we could remove the magic constant, along with any

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-02 Thread Tom Lane
Andrew Gierth writes: > "Thomas" == Thomas Munro writes: > + uint64 data = 42; > Isn't there a hidden assumption about endianness there? Yeah, I was wondering about that too, but does anyone actually run ARMs

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-02 Thread Andrew Gierth
> "Thomas" == Thomas Munro writes: + uint64 data = 42; Isn't there a hidden assumption about endianness there? -- Andrew (irc:RhodiumToad)

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-02 Thread Thomas Munro
On Thu, May 3, 2018 at 10:08 AM, Tom Lane wrote: > I don't have any way to test this, but it looks plausible, so pushed. > > (I note you forgot to run autoheader, btw.) Thanks! -- Thomas Munro http://www.enterprisedb.com

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-02 Thread Tom Lane
Thomas Munro writes: > On Thu, May 3, 2018 at 2:30 AM, Tom Lane wrote: >> Do you really need the pg_crc32c_armv8_choose_dummy global variable? > True. Of course I needed an interesting length too... I don't have any way to test this, but it

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-02 Thread Tom Lane
Thomas Munro writes: > Ahh, OpenSSL's armcap.c shows how to do this. You need to > siglongjmp() out of there. Here's a patch that does it that way. > Isn't this better? Do you really need the pg_crc32c_armv8_choose_dummy global variable? That seems pretty ugly.

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-05-02 Thread Thomas Munro
On Thu, Apr 5, 2018 at 12:00 AM, Thomas Munro wrote: > ... trying to figure out how to detect the instruction portably using SIGILL > ... Ahh, OpenSSL's armcap.c shows how to do this. You need to siglongjmp() out of there. Here's a patch that does it that way.

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-04-04 Thread Thomas Munro
On Wed, Apr 4, 2018 at 11:47 PM, Thomas Munro wrote: > BTW I did some googling just now and found out that some libraries use > a technique they call "CPU probing": just try it and see if you get > SIGILL. Is that a bad idea for some reason? Here is a quick hack

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-04-04 Thread Thomas Munro
On Wed, Apr 4, 2018 at 11:21 PM, Heikki Linnakangas wrote: > Yep. I got the code before the main loop, to handle the first 1-7 unaligned > bytes, wrong. Apparently those are the only tests that call the CRC function > with very short and unaligned input. BTW I did some googling

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-04-04 Thread Heikki Linnakangas
On 04/04/18 14:13, Thomas Munro wrote: On Wed, Apr 4, 2018 at 9:23 PM, Heikki Linnakangas wrote: Pushed, thanks everyone! On eelpout two test_decoding tests failed: test ddl ... FAILED test rewrite ... FAILED The output has

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-04-04 Thread Thomas Munro
On Wed, Apr 4, 2018 at 9:23 PM, Heikki Linnakangas wrote: > Pushed, thanks everyone! On eelpout two test_decoding tests failed: test ddl ... FAILED test rewrite ... FAILED The output has several cases where

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-04-04 Thread Heikki Linnakangas
On 03/04/18 19:43, Andres Freund wrote: Architecture manual time? They're available freely IIRC and should answer this. Yeah. The best reference I could find was "ARM Cortex-A Series Programmer’s Guide for ARMv8-A" (http://infocenter.arm.com/help/topic/com.arm.doc.den0024a/ch08s01.html).

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-04-04 Thread Heikki Linnakangas
On 03/04/18 21:56, Daniel Gustafsson wrote: The following line should say SSE and not SSSE (and the same typo is in src/include/pg_config.h.in and src/include/pg_config.h.win32). While not introduced in this patch, it’s adjacent to the patched codepath and on topic so may well be fixed while in

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-04-03 Thread Thomas Munro
On Wed, Apr 4, 2018 at 4:38 AM, Heikki Linnakangas wrote: > [armv8-crc32c-2.patch] This tests OK on my Debian buster aarch64 system (the machine that runs "eelpout" in the buildfarm), configure tests as expected and I confirmed that pg_comp_crc32c_armv8 is reached at runtime. I

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-04-03 Thread Daniel Gustafsson
> On 03 Apr 2018, at 18:38, Heikki Linnakangas wrote: > > On 03/04/18 19:09, Andres Freund wrote: >> Hi, >> On 2018-04-03 19:05:19 +0300, Heikki Linnakangas wrote: >>> On 01/04/18 20:32, Andres Freund wrote: On 2018-03-06 02:44:35 +0800, Heikki Linnakangas wrote: > * I

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-04-03 Thread Tom Lane
Heikki Linnakangas writes: > I was just about to commit this, when I started to wonder: Do we need to > worry about alignment? As the patch stands, it will merrily do unaligned > 8-byte loads. Is that OK on ARM? It seems to work on the system I've > been testing on, but I

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-04-03 Thread Andres Freund
Hi, On 2018-04-03 19:38:42 +0300, Heikki Linnakangas wrote: > I was just about to commit this, when I started to wonder: Do we need to > worry about alignment? As the patch stands, it will merrily do unaligned > 8-byte loads. Is that OK on ARM? It seems to work on the system I've been > testing

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-04-03 Thread Heikki Linnakangas
On 03/04/18 19:09, Andres Freund wrote: Hi, On 2018-04-03 19:05:19 +0300, Heikki Linnakangas wrote: On 01/04/18 20:32, Andres Freund wrote: On 2018-03-06 02:44:35 +0800, Heikki Linnakangas wrote: * I tested this on Linux, with gcc and clang, on an ARM64 virtual machine that I had available

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-04-03 Thread Andres Freund
Hi, On 2018-04-03 19:05:19 +0300, Heikki Linnakangas wrote: > On 01/04/18 20:32, Andres Freund wrote: > > On 2018-03-06 02:44:35 +0800, Heikki Linnakangas wrote: > > > * I tested this on Linux, with gcc and clang, on an ARM64 virtual machine > > > that I had available (not an emulator, but a VM

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-04-03 Thread Heikki Linnakangas
On 01/04/18 20:32, Andres Freund wrote: On 2018-03-06 02:44:35 +0800, Heikki Linnakangas wrote: * I tested this on Linux, with gcc and clang, on an ARM64 virtual machine that I had available (not an emulator, but a VM on a shared ARM64 server). Have you seen actual postgres performance

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-04-01 Thread Andres Freund
Hi, On 2018-03-06 02:44:35 +0800, Heikki Linnakangas wrote: > On 02/03/18 06:42, Andres Freund wrote: > > On 2018-03-02 11:37:52 +1300, Thomas Munro wrote: > > > So... that stuff probably needs either a configure check for the > > > getauxval function and/or those headers, or an OS check? > > >

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-03-05 Thread Heikki Linnakangas
On 02/03/18 06:42, Andres Freund wrote: On 2018-03-02 11:37:52 +1300, Thomas Munro wrote: So... that stuff probably needs either a configure check for the getauxval function and/or those headers, or an OS check? It'd probably be better to not rely on os specific headers, and instead directly

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-03-02 Thread Andres Freund
Hi, On 2018-03-02 09:42:22 +, Yuqi Gu wrote: > > Could you show whether it actually improves performance? Usually bulk > > loading data with parallel COPYs is a good way to hit this code path. > > The mini benchmark code: I'd be more interested in a benchmark using postgres itself... > >

RE: Optimize Arm64 crc32c implementation in Postgresql

2018-03-02 Thread Yuqi Gu
onfigure script be made by specific version autoconf ? Thanks! BRs, Yuqi -Original Message- From: Andres Freund [mailto:and...@anarazel.de] Sent: Friday, March 2, 2018 5:36 AM To: Yuqi Gu <yuqi...@arm.com> Cc: pgsql-hack...@postgresql.org Subject: Re: Optimize Arm64 crc32c im

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-03-01 Thread Michael Paquier
On Thu, Mar 01, 2018 at 01:36:23PM -0800, Andres Freund wrote: > On 2018-01-10 15:09:16 +0900, Michael Paquier wrote: >> There are not enough patches for ARM. > > Nah, not needing arch specific patches is good ;) You know already that I am always impressed by your skills in normalizing things

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-03-01 Thread Andres Freund
On 2018-03-02 11:37:52 +1300, Thomas Munro wrote: > So... that stuff probably needs either a configure check for the > getauxval function and/or those headers, or an OS check? It'd probably be better to not rely on os specific headers, and instead directly access the capabilities. > While I'm

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-03-01 Thread Thomas Munro
On Fri, Mar 2, 2018 at 10:36 AM, Andres Freund wrote: > On 2018-01-10 05:58:19 +, Yuqi Gu wrote: >> +#ifdef USE_ARMCE_CRC32C_WITH_RUNTIME_CHECK >> +#include >> +#include >> +#ifndef HWCAP_CRC32 >> +#define HWCAP_CRC32 (1 << 7) >> +#endif > >> +static bool >>

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-03-01 Thread Andres Freund
Hi, On 2018-01-10 05:58:19 +, Yuqi Gu wrote: > Currently PostgreSQL only implements hardware support for CRC32 checksums for > the x86_64 architecture. > Some ARMv8 (AArch64) CPUs implement the CRC32 extension which is implemented > by inline assembly, > so they can also benefit from

Re: Optimize Arm64 crc32c implementation in Postgresql

2018-01-09 Thread Michael Paquier
On Wed, Jan 10, 2018 at 05:58:19AM +, Yuqi Gu wrote: > I would like to propose the patch to optimize crc32c calculation with > Arm64 specific instructions. The hardware-specific code > implementation is used under #if defined > USE_ARMCE_CRC32C_WITH_RUNTIME_CHECK. And the performance is