Re: support for SSE2 intrinsics

2022-08-04 Thread John Naylor
On Thu, Aug 4, 2022 at 12:38 PM Masahiko Sawada 
wrote:
>
> I also think it's a good start. There is a typo in the commit message:
>
> s/hepler/helper/
>
> The rest looks good to me.

Fixed, and pushed, thanks to you both! I'll polish a small patch I have
that actually uses this.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: support for SSE2 intrinsics

2022-08-03 Thread Masahiko Sawada
Hi,

On Wed, Aug 3, 2022 at 2:01 PM John Naylor  wrote:
>
>
> On Tue, Aug 2, 2022 at 11:53 PM Nathan Bossart  
> wrote:
> > I did a bit of cross-checking, and AFAICT this is a reasonable starting
> > point.  emmintrin.h appears to be sufficient for one of my patches that
> > makes use of SSE2 instructions.  That being said, I imagine it'll be
> > especially important to keep an eye on the buildfarm when this change is
> > committed.
>
> Thanks for checking! Here's a concrete patch for testing.

I also think it's a good start. There is a typo in the commit message:

s/hepler/helper/

The rest looks good to me.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: support for SSE2 intrinsics

2022-08-03 Thread Nathan Bossart
On Wed, Aug 03, 2022 at 12:00:39PM +0700, John Naylor wrote:
> Thanks for checking! Here's a concrete patch for testing.

LGTM

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: support for SSE2 intrinsics

2022-08-02 Thread John Naylor
On Tue, Aug 2, 2022 at 11:53 PM Nathan Bossart 
wrote:
> I did a bit of cross-checking, and AFAICT this is a reasonable starting
> point.  emmintrin.h appears to be sufficient for one of my patches that
> makes use of SSE2 instructions.  That being said, I imagine it'll be
> especially important to keep an eye on the buildfarm when this change is
> committed.

Thanks for checking! Here's a concrete patch for testing.

--
John Naylor
EDB: http://www.enterprisedb.com
From 0e01cc8f5f1c2c27631953091a1d657c5e5486be Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Wed, 3 Aug 2022 11:07:40 +0700
Subject: [PATCH v1] Support SSE2 intrinsics where available

SSE2 vector instructions are part of the spec for the 64-bit x86
architecture. Until now we have relied on the compiler to autovectorize
in some limited situations, but some useful coding idioms can only be
expressed explicitly via compiler intrinsics. To this end, add a header
that defines USE_SSE2 when available. While x86-only for now, we can
add other architectures in the future. This will also be the intended
place for low-level hepler functions that use vector operations.

Reviewed by Nathan Bossart

Discussion: https://www.postgresql.org/message-id/CAFBsxsE2G_H_5Wbw%2BNOPm70-BK4xxKf86-mRzY%3DL2sLoQqM%2B-Q%40mail.gmail.com
---
 src/include/port/simd.h | 30 ++
 1 file changed, 30 insertions(+)
 create mode 100644 src/include/port/simd.h

diff --git a/src/include/port/simd.h b/src/include/port/simd.h
new file mode 100644
index 00..a571e79f57
--- /dev/null
+++ b/src/include/port/simd.h
@@ -0,0 +1,30 @@
+/*-
+ *
+ * simd.h
+ *	  Support for platform-specific vector operations.
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/port/simd.h
+ *
+ *-
+ */
+#ifndef SIMD_H
+#define SIMD_H
+
+/*
+ * SSE2 instructions are part of the spec for the 64-bit x86 ISA. We assume
+ * that compilers targeting this architecture understand SSE2 intrinsics.
+ *
+ * We use emmintrin.h rather than the comprehensive header immintrin.h in
+ * order to exclude extensions beyond SSE2. This is because MSVC, at least,
+ * will allow the use of intrinsics that haven't been enabled at compile
+ * time.
+ */
+#if (defined(__x86_64__) || defined(_M_AMD64))
+#include 
+#define USE_SSE2
+#endif
+
+#endif			/* SIMD_H */
-- 
2.36.1



Re: support for SSE2 intrinsics

2022-08-02 Thread Nathan Bossart
On Tue, Aug 02, 2022 at 05:22:52PM +0700, John Naylor wrote:
> Given all this, the anti-climax is: it seems we can start with something
> like src/include/port/simd.h with:
> 
> #if (defined(__x86_64__) || defined(_M_AMD64))
> #include 
> #define USE_SSE2
> #endif
> 
> (plus a comment summarizing the above)
> 
> That we can include into other files, and would be the place to put helper
> functions. Thoughts?

+1

I did a bit of cross-checking, and AFAICT this is a reasonable starting
point.  emmintrin.h appears to be sufficient for one of my patches that
makes use of SSE2 instructions.  That being said, I imagine it'll be
especially important to keep an eye on the buildfarm when this change is
committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




support for SSE2 intrinsics

2022-08-02 Thread John Naylor
Recently there have been several threads where the problem at hand lends
itself to using SSE2 SIMD intrinsics. These are convenient because on
64-bit x86 the instructions are always present and so don't need a runtime
check. To integrate them into our code base, we will need to take some
measures for portability, but after looking around it seems fairly
lightweight:

1. Compiler invocation and symbols

Since SSE2 is part of the AMD64 spec, gcc enables it always:

$ gcc -dM -E- < /dev/null | grep SSE | sort
$ gcc -dM -E -msse2 - < /dev/null | grep SSE | sort
#define __MMX_WITH_SSE__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1

Passing -m32 discards the "MATH" macros but keeps the rest:

$ gcc -dM -E -m32 - < /dev/null | grep SSE | sort
#define __SSE__ 1
#define __SSE2__ 1

Clang behaves similarly.

MSVC doesn't define __SSE2__ (although it does define __AVX__ etc), but we
can just test for _M_X64 or _M_AMD64 (they are equivalent according to [1],
and we have both in our code base already). We could test for __SSE2__ for
32-bit gcc-alikes in the build farm, but I don't think that would tell us
anything interesting, so we can just test for __x86_64__.

2. The intrinsics header

>From Peter Cordes on StackOverflow [2]:

```
immintrin.h is portable across all compilers, and includes all Intel SIMD
intrinsics, and some scalar extensions like BMI2 _pdep_u32. (For AMD SSE4a
and XOP (Bulldozer-family only, dropped for Zen), you need to include a
different header as well.)

The only reason I can think of for including  specifically
would be if you're using MSVC and want to leave intrinsics undefined for
ISA extensions you don't want to depend on.
```

It seems then that MSVC will compile intrinsics without prompting, so to be
safe we'd need to take the latter advice and use .

3. Support for SSE2 intrinsics

This seems to be well-nigh universal AFAICT and doesn't need to be tested
for at configure time. A quick search doesn't turn up anything weird for
Msys or Cygwin. From [2] again, gcc older than 4.4 can generate poor code,
but there is no mention that correctness is a problem.

4. Helper functions

In a couple proposed patches, there has been some interest in abstracting
some SIMD functionality into functions to hide implementation details away.
I agree there are cases where that would help readability and avoid
duplication.

Given all this, the anti-climax is: it seems we can start with something
like src/include/port/simd.h with:

#if (defined(__x86_64__) || defined(_M_AMD64))
#include 
#define USE_SSE2
#endif

(plus a comment summarizing the above)

That we can include into other files, and would be the place to put helper
functions. Thoughts?

[1] https://docs.microsoft.com/en-us/archive/blogs/reiley/macro-revisited
[2]
https://stackoverflow.com/questions/56049110/including-the-correct-intrinsic-header

-- 
John Naylor
EDB: http://www.enterprisedb.com