[Bug target/94832] AVX512 scatter/gather macros lack parentheses when unoptimized

2020-09-17 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94832

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #10 from Jakub Jelinek  ---
Fixed for 8.5 in r8-10498-ga0159c30c19a1271f6b6ba6bc489c2c1c59954a3 and by the
above commit for 9.4+ too.

[Bug target/94832] AVX512 scatter/gather macros lack parentheses when unoptimized

2020-09-16 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94832

--- Comment #8 from CVS Commits  ---
The releases/gcc-9 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:f97bf9657cecaaf8afd14b43e5ca9be294ab870c

commit r9-8891-gf97bf9657cecaaf8afd14b43e5ca9be294ab870c
Author: Jakub Jelinek 
Date:   Wed Apr 29 17:30:22 2020 +0200

x86: Fix -O0 intrinsic *gather*/*scatter* macros [PR94832]

As reported in the PR, most -O0 intrinsic macros either wrap their argument
uses in ()s or use them in a context where a complex expression passed as the
argument poses no problem (e.g. when the macro argument use is between commas,
between ( and a comma, or between a comma and )).  The gather/scatter macros
in particular don't do this.  If one passes e.g. x + y as an argument, the
corresponding inline function casts the whole argument, but the macro does
(int) ARG, which expands to (int) x + y rather than (int) (x + y).
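
The precedence problem described above can be reproduced without any intrinsics. The following is a minimal sketch, not the GCC definitions: `CAST_UNPARENTHESIZED`, `CAST_PARENTHESIZED` and the helper functions are hypothetical names invented for illustration, and a `double` argument is used only so that the difference becomes observable.

```cpp
#include <cassert>

// Hypothetical macros for illustration only; these are not the GCC definitions.
// Here the cast applies to the first operand of ARG only, because a cast
// binds more tightly than binary +.
#define CAST_UNPARENTHESIZED(ARG) ((int) ARG)
// Parenthesizing the parameter makes the cast apply to the whole expression.
#define CAST_PARENTHESIZED(ARG) ((int) (ARG))

// What an inline-function version does: the argument expression is evaluated
// first and then converted as a whole.
inline int cast_inline(double arg) { return (int) arg; }

// Expands to ((int) x + y): x is truncated, then y is added as a double.
inline double via_unparenthesized(double x, double y) {
  return CAST_UNPARENTHESIZED(x + y);
}

// Expands to ((int) (x + y)): the sum is truncated, as in cast_inline.
inline int via_parenthesized(double x, double y) {
  return CAST_PARENTHESIZED(x + y);
}
```

With `x = 0.5` and `y = 0.75`, the unparenthesized form yields `0.75`, while the parenthesized form and the inline function both yield `1`.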

The following patch fixes those issues in *gather*/*scatter*; additionally,
the AVX2 macros were passing an incorrect mask of e.g.
(__v2df)_mm_set1_pd((double)(long long int) -1)
which is IMHO equivalent to
(__v2df){-1.0, -1.0}
when it really wants to pass a __v2df vector with all bits set.
I've used what the inline functions use for those cases.
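
To see why the converted value is not the intended mask, it helps to look at the bit patterns. This is a rough sketch assuming IEEE 754 binary64 doubles; `bits_of`, `old_mask_element` and `all_bits_element` are hypothetical helpers, not part of any GCC header.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns the raw 64-bit pattern of a double (memcpy avoids aliasing issues).
inline std::uint64_t bits_of(double d) {
  static_assert(sizeof(std::uint64_t) == sizeof(double),
                "double is assumed to be 64 bits wide");
  std::uint64_t u;
  std::memcpy(&u, &d, sizeof u);
  return u;
}

// The element value the old macros produced: integer -1 converted to double,
// which is the number -1.0, not a reinterpretation of the integer's bits.
inline double old_mask_element() { return (double) (long long) -1; }

// The element value that was intended: every bit set.
inline double all_bits_element() {
  const std::uint64_t ones = ~std::uint64_t{0};
  double d;
  std::memcpy(&d, &ones, sizeof d);
  return d;
}
```

Under that assumption, `-1.0` has only the sign bit and part of the exponent set (`0xBFF0000000000000`), whereas the all-ones pattern is a NaN. As far as I understand the AVX2 gather instructions, they consult only the most significant bit of each mask element, so the old value would probably still have selected every lane; the patch nevertheless switches to the all-ones vector that the inline functions already use.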

2020-04-29  Jakub Jelinek  

PR target/94832
* config/i386/avx2intrin.h (_mm_mask_i32gather_pd,
_mm256_mask_i32gather_pd, _mm_mask_i64gather_pd,
_mm256_mask_i64gather_pd, _mm_mask_i32gather_ps,
_mm256_mask_i32gather_ps, _mm_mask_i64gather_ps,
_mm256_mask_i64gather_ps, _mm_i32gather_epi64,
_mm_mask_i32gather_epi64, _mm256_i32gather_epi64,
_mm256_mask_i32gather_epi64, _mm_i64gather_epi64,
_mm_mask_i64gather_epi64, _mm256_i64gather_epi64,
_mm256_mask_i64gather_epi64, _mm_i32gather_epi32,
_mm_mask_i32gather_epi32, _mm256_i32gather_epi32,
_mm256_mask_i32gather_epi32, _mm_i64gather_epi32,
_mm_mask_i64gather_epi32, _mm256_i64gather_epi32,
_mm256_mask_i64gather_epi32): Surround macro parameter uses with
parens.
(_mm_i32gather_pd, _mm256_i32gather_pd, _mm_i64gather_pd,
_mm256_i64gather_pd, _mm_i32gather_ps, _mm256_i32gather_ps,
_mm_i64gather_ps, _mm256_i64gather_ps): Likewise.  Don't use
as mask vector containing -1.0 or -1.0f elts, but instead vector
with all bits set using _mm*_cmpeq_p? with zero operands.
* config/i386/avx512fintrin.h (_mm512_i32gather_ps,
_mm512_mask_i32gather_ps, _mm512_i32gather_pd,
_mm512_mask_i32gather_pd, _mm512_i64gather_ps,
_mm512_mask_i64gather_ps, _mm512_i64gather_pd,
_mm512_mask_i64gather_pd, _mm512_i32gather_epi32,
_mm512_mask_i32gather_epi32, _mm512_i32gather_epi64,
_mm512_mask_i32gather_epi64, _mm512_i64gather_epi32,
_mm512_mask_i64gather_epi32, _mm512_i64gather_epi64,
_mm512_mask_i64gather_epi64, _mm512_i32scatter_ps,
_mm512_mask_i32scatter_ps, _mm512_i32scatter_pd,
_mm512_mask_i32scatter_pd, _mm512_i64scatter_ps,
_mm512_mask_i64scatter_ps, _mm512_i64scatter_pd,
_mm512_mask_i64scatter_pd, _mm512_i32scatter_epi32,
_mm512_mask_i32scatter_epi32, _mm512_i32scatter_epi64,
_mm512_mask_i32scatter_epi64, _mm512_i64scatter_epi32,
_mm512_mask_i64scatter_epi32, _mm512_i64scatter_epi64,
_mm512_mask_i64scatter_epi64): Surround macro parameter uses with
parens.
* config/i386/avx512pfintrin.h (_mm512_prefetch_i32gather_pd,
_mm512_prefetch_i32gather_ps, _mm512_mask_prefetch_i32gather_pd,
_mm512_mask_prefetch_i32gather_ps, _mm512_prefetch_i64gather_pd,
_mm512_prefetch_i64gather_ps, _mm512_mask_prefetch_i64gather_pd,
_mm512_mask_prefetch_i64gather_ps, _mm512_prefetch_i32scatter_pd,
_mm512_prefetch_i32scatter_ps, _mm512_mask_prefetch_i32scatter_pd,
_mm512_mask_prefetch_i32scatter_ps, _mm512_prefetch_i64scatter_pd,
_mm512_prefetch_i64scatter_ps, _mm512_mask_prefetch_i64scatter_pd,
_mm512_mask_prefetch_i64scatter_ps): Likewise.
* config/i386/avx512vlintrin.h (_mm256_mmask_i32gather_ps,
_mm_mmask_i32gather_ps, _mm256_mmask_i32gather_pd,
_mm_mmask_i32gather_pd, _mm256_mmask_i64gather_ps,
_mm_mmask_i64gather_ps, _mm256_mmask_i64gather_pd,
_mm_mmask_i64gather_pd, _mm256_mmask_i32gather_epi32,
_mm_mmask_i32gather_epi32, _mm256_mmask_i32gather_epi64,
_mm_mmask_i32gather_epi64, _mm256_mmask_i64gather_epi32,
_mm_mmask_i64gather_epi32, _mm256_mmask_i64gather_epi64,
_mm_mmask_i64gather_epi64, _mm256_i32scatter_ps,
  

[Bug target/94832] AVX512 scatter/gather macros lack parentheses when unoptimized

2020-09-16 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94832

--- Comment #9 from CVS Commits  ---
The releases/gcc-9 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:ccee0511abf6e0bb679fa6b4941e5a71a6521b12

commit r9-8892-gccee0511abf6e0bb679fa6b4941e5a71a6521b12
Author: Jakub Jelinek 
Date:   Wed Apr 29 17:31:26 2020 +0200

x86: Fix -O0 remaining intrinsic macros [PR94832]

A few other macros seem to suffer from the same issue.  What I've done was:
cat gcc/config/i386/*intrin.h | sed -e ':x /\\$/ { N; s/\\\n//g ; bx }' \
| grep '^[[:blank:]]*#[[:blank:]]*define[[:blank:]].*(' | sed 's/[ ]\+/ /g' \
> /tmp/macros
and then looking for regexps:
)[a-zA-Z]
) [a-zA-Z]
[a-zA-Z][-+*/%]
[a-zA-Z] [-+*/%]
[-+*/%][a-zA-Z]
[-+*/%] [a-zA-Z]
in the resulting file.
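
As an illustration of what those six patterns catch, the sketch below folds them into a single ECMAScript regex and applies it to flattened macro definitions. It is a heuristic only, and `looks_suspicious` is a hypothetical helper rather than anything used in the patch; hits still need to be inspected by hand.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if a flattened macro definition matches one of the six
// patterns listed above, i.e. it deserves a manual look.
inline bool looks_suspicious(const std::string& macro_line) {
  // ")" is literal in grep's basic syntax but must be escaped here;
  // " ?" covers the variants with and without a separating space.
  static const std::regex pattern(
      R"(\) ?[a-zA-Z]|[a-zA-Z] ?[-+*/%]|[-+*/%] ?[a-zA-Z])");
  return std::regex_search(macro_line, pattern);
}
```

For example, `#define BAD(X) ((int) X)` is flagged because a letter follows `) `, while `#define GOOD(X) ((int) (X))` is not.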

2020-04-29  Jakub Jelinek  

PR target/94832
* config/i386/avx512bwintrin.h (_mm512_alignr_epi8,
_mm512_mask_alignr_epi8, _mm512_maskz_alignr_epi8): Wrap macro operands
used in casts into parens.
* config/i386/avx512fintrin.h (_mm512_cvt_roundps_ph, _mm512_cvtps_ph,
_mm512_mask_cvt_roundps_ph, _mm512_mask_cvtps_ph,
_mm512_maskz_cvt_roundps_ph, _mm512_maskz_cvtps_ph,
_mm512_mask_cmp_epi64_mask, _mm512_mask_cmp_epi32_mask,
_mm512_mask_cmp_epu64_mask, _mm512_mask_cmp_epu32_mask,
_mm512_mask_cmp_round_pd_mask, _mm512_mask_cmp_round_ps_mask,
_mm512_mask_cmp_pd_mask, _mm512_mask_cmp_ps_mask): Likewise.
* config/i386/avx512vlbwintrin.h (_mm256_mask_alignr_epi8,
_mm256_maskz_alignr_epi8, _mm_mask_alignr_epi8, _mm_maskz_alignr_epi8,
_mm256_mask_cmp_epu8_mask): Likewise.
* config/i386/avx512vlintrin.h (_mm_mask_cvtps_ph, _mm_maskz_cvtps_ph,
_mm256_mask_cvtps_ph, _mm256_maskz_cvtps_ph): Likewise.
* config/i386/f16cintrin.h (_mm_cvtps_ph, _mm256_cvtps_ph): Likewise.
* config/i386/shaintrin.h (_mm_sha1rnds4_epu32): Likewise.

(cherry picked from commit 0c8217b16f307c3eedce8f22354714938613f701)

[Bug target/94832] AVX512 scatter/gather macros lack parentheses when unoptimized

2020-04-29 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94832

--- Comment #7 from Jakub Jelinek  ---
Fixed for 10+ so far.

[Bug target/94832] AVX512 scatter/gather macros lack parentheses when unoptimized

2020-04-29 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94832

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:0c8217b16f307c3eedce8f22354714938613f701

commit r10-8055-g0c8217b16f307c3eedce8f22354714938613f701
Author: Jakub Jelinek 
Date:   Wed Apr 29 17:31:26 2020 +0200

x86: Fix -O0 remaining intrinsic macros [PR94832]

A few other macros seem to suffer from the same issue.  What I've done was:
cat gcc/config/i386/*intrin.h | sed -e ':x /\\$/ { N; s/\\\n//g ; bx }' \
| grep '^[[:blank:]]*#[[:blank:]]*define[[:blank:]].*(' | sed 's/[ ]\+/ /g' \
> /tmp/macros
and then looking for regexps:
)[a-zA-Z]
) [a-zA-Z]
[a-zA-Z][-+*/%]
[a-zA-Z] [-+*/%]
[-+*/%][a-zA-Z]
[-+*/%] [a-zA-Z]
in the resulting file.

2020-04-29  Jakub Jelinek  

PR target/94832
* config/i386/avx512bwintrin.h (_mm512_alignr_epi8,
_mm512_mask_alignr_epi8, _mm512_maskz_alignr_epi8): Wrap macro operands
used in casts into parens.
* config/i386/avx512fintrin.h (_mm512_cvt_roundps_ph, _mm512_cvtps_ph,
_mm512_mask_cvt_roundps_ph, _mm512_mask_cvtps_ph,
_mm512_maskz_cvt_roundps_ph, _mm512_maskz_cvtps_ph,
_mm512_mask_cmp_epi64_mask, _mm512_mask_cmp_epi32_mask,
_mm512_mask_cmp_epu64_mask, _mm512_mask_cmp_epu32_mask,
_mm512_mask_cmp_round_pd_mask, _mm512_mask_cmp_round_ps_mask,
_mm512_mask_cmp_pd_mask, _mm512_mask_cmp_ps_mask): Likewise.
* config/i386/avx512vlbwintrin.h (_mm256_mask_alignr_epi8,
_mm256_maskz_alignr_epi8, _mm_mask_alignr_epi8, _mm_maskz_alignr_epi8,
_mm256_mask_cmp_epu8_mask): Likewise.
* config/i386/avx512vlintrin.h (_mm_mask_cvtps_ph, _mm_maskz_cvtps_ph,
_mm256_mask_cvtps_ph, _mm256_maskz_cvtps_ph): Likewise.
* config/i386/f16cintrin.h (_mm_cvtps_ph, _mm256_cvtps_ph): Likewise.
* config/i386/shaintrin.h (_mm_sha1rnds4_epu32): Likewise.

[Bug target/94832] AVX512 scatter/gather macros lack parentheses when unoptimized

2020-04-29 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94832

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:78cef09019cc9c80d1b39a49861f8827a2ee2e60

commit r10-8054-g78cef09019cc9c80d1b39a49861f8827a2ee2e60
Author: Jakub Jelinek 
Date:   Wed Apr 29 17:30:22 2020 +0200

x86: Fix -O0 intrinsic *gather*/*scatter* macros [PR94832]

As reported in the PR, most -O0 intrinsic macros either wrap their argument
uses in ()s or use them in a context where a complex expression passed as the
argument poses no problem (e.g. when the macro argument use is between commas,
between ( and a comma, or between a comma and )).  The gather/scatter macros
in particular don't do this.  If one passes e.g. x + y as an argument, the
corresponding inline function casts the whole argument, but the macro does
(int) ARG, which expands to (int) x + y rather than (int) (x + y).

The following patch fixes those issues in *gather*/*scatter*; additionally,
the AVX2 macros were passing an incorrect mask of e.g.
(__v2df)_mm_set1_pd((double)(long long int) -1)
which is IMHO equivalent to
(__v2df){-1.0, -1.0}
when it really wants to pass a __v2df vector with all bits set.
I've used what the inline functions use for those cases.

2020-04-29  Jakub Jelinek  

PR target/94832
* config/i386/avx2intrin.h (_mm_mask_i32gather_pd,
_mm256_mask_i32gather_pd, _mm_mask_i64gather_pd,
_mm256_mask_i64gather_pd, _mm_mask_i32gather_ps,
_mm256_mask_i32gather_ps, _mm_mask_i64gather_ps,
_mm256_mask_i64gather_ps, _mm_i32gather_epi64,
_mm_mask_i32gather_epi64, _mm256_i32gather_epi64,
_mm256_mask_i32gather_epi64, _mm_i64gather_epi64,
_mm_mask_i64gather_epi64, _mm256_i64gather_epi64,
_mm256_mask_i64gather_epi64, _mm_i32gather_epi32,
_mm_mask_i32gather_epi32, _mm256_i32gather_epi32,
_mm256_mask_i32gather_epi32, _mm_i64gather_epi32,
_mm_mask_i64gather_epi32, _mm256_i64gather_epi32,
_mm256_mask_i64gather_epi32): Surround macro parameter uses with
parens.
(_mm_i32gather_pd, _mm256_i32gather_pd, _mm_i64gather_pd,
_mm256_i64gather_pd, _mm_i32gather_ps, _mm256_i32gather_ps,
_mm_i64gather_ps, _mm256_i64gather_ps): Likewise.  Don't use
as mask vector containing -1.0 or -1.0f elts, but instead vector
with all bits set using _mm*_cmpeq_p? with zero operands.
* config/i386/avx512fintrin.h (_mm512_i32gather_ps,
_mm512_mask_i32gather_ps, _mm512_i32gather_pd,
_mm512_mask_i32gather_pd, _mm512_i64gather_ps,
_mm512_mask_i64gather_ps, _mm512_i64gather_pd,
_mm512_mask_i64gather_pd, _mm512_i32gather_epi32,
_mm512_mask_i32gather_epi32, _mm512_i32gather_epi64,
_mm512_mask_i32gather_epi64, _mm512_i64gather_epi32,
_mm512_mask_i64gather_epi32, _mm512_i64gather_epi64,
_mm512_mask_i64gather_epi64, _mm512_i32scatter_ps,
_mm512_mask_i32scatter_ps, _mm512_i32scatter_pd,
_mm512_mask_i32scatter_pd, _mm512_i64scatter_ps,
_mm512_mask_i64scatter_ps, _mm512_i64scatter_pd,
_mm512_mask_i64scatter_pd, _mm512_i32scatter_epi32,
_mm512_mask_i32scatter_epi32, _mm512_i32scatter_epi64,
_mm512_mask_i32scatter_epi64, _mm512_i64scatter_epi32,
_mm512_mask_i64scatter_epi32, _mm512_i64scatter_epi64,
_mm512_mask_i64scatter_epi64): Surround macro parameter uses with
parens.
* config/i386/avx512pfintrin.h (_mm512_prefetch_i32gather_pd,
_mm512_prefetch_i32gather_ps, _mm512_mask_prefetch_i32gather_pd,
_mm512_mask_prefetch_i32gather_ps, _mm512_prefetch_i64gather_pd,
_mm512_prefetch_i64gather_ps, _mm512_mask_prefetch_i64gather_pd,
_mm512_mask_prefetch_i64gather_ps, _mm512_prefetch_i32scatter_pd,
_mm512_prefetch_i32scatter_ps, _mm512_mask_prefetch_i32scatter_pd,
_mm512_mask_prefetch_i32scatter_ps, _mm512_prefetch_i64scatter_pd,
_mm512_prefetch_i64scatter_ps, _mm512_mask_prefetch_i64scatter_pd,
_mm512_mask_prefetch_i64scatter_ps): Likewise.
* config/i386/avx512vlintrin.h (_mm256_mmask_i32gather_ps,
_mm_mmask_i32gather_ps, _mm256_mmask_i32gather_pd,
_mm_mmask_i32gather_pd, _mm256_mmask_i64gather_ps,
_mm_mmask_i64gather_ps, _mm256_mmask_i64gather_pd,
_mm_mmask_i64gather_pd, _mm256_mmask_i32gather_epi32,
_mm_mmask_i32gather_epi32, _mm256_mmask_i32gather_epi64,
_mm_mmask_i32gather_epi64, _mm256_mmask_i64gather_epi32,
_mm_mmask_i64gather_epi32, _mm256_mmask_i64gather_epi64,
_mm_mmask_i64gather_epi64, _mm256_i32scatter_ps,
 

[Bug target/94832] AVX512 scatter/gather macros lack parentheses when unoptimized

2020-04-29 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94832

--- Comment #4 from Jakub Jelinek  ---
(In reply to Kenneth Heafield from comment #3)
> Being a macro some of the time also causes trouble with template commas and
> the C preprocessor.  
> 
> #include <immintrin.h>
> template  int *TemplatedFunction();
> void Fail() {
>   _mm512_mask_i32scatter_epi32(TemplatedFunction(), 0x,
> _mm512_set1_epi32(1), _mm512_set1_epi32(1), 1);
> }

You need to wrap the arguments in ()s then; I'm afraid there is nothing else
that can be done about that.  The reason for the macros rather than inline
functions is that those particular intrinsics require at least one compile-time
constant argument, and at -O0 there is no guarantee the compile-time constant
would be propagated into the builtin that is used under the hood for the
intrinsic.
It is the same thing as with, say, C header APIs; those can also be implemented
as functions or as macros.
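
The constant-argument constraint can be illustrated by analogy in standard C++. This is only a sketch: `scaled_offset` is a hypothetical stand-in that models "operand must be a compile-time constant" as a template parameter, and `OFFSET_MACRO` is likewise invented; neither reflects how GCC's builtins are actually implemented.

```cpp
#include <cassert>

// Hypothetical stand-in for a builtin whose last operand must be a
// compile-time constant (modelled here as a template parameter).
template <int Scale>
inline long scaled_offset(long index) {
  static_assert(Scale == 1 || Scale == 2 || Scale == 4 || Scale == 8,
                "scale must be 1, 2, 4 or 8");
  return index * Scale;
}

// A wrapper function cannot forward its parameter as a constant:
//   inline long offset_fn(long index, int scale)
//   { return scaled_offset<scale>(index); }  // error: 'scale' is not constant
//
// A macro substitutes the caller's tokens textually, so the literal survives.
// Both parameters are parenthesized, in line with the fix for this PR.
#define OFFSET_MACRO(INDEX, SCALE) (scaled_offset<(SCALE)>((INDEX)))

// Uses a compound expression as INDEX to show the parentheses doing their job.
inline long demo_offset() { return OFFSET_MACRO(3 + 4, 8); }
```

Here `demo_offset()` computes `(3 + 4) * 8`, because the macro hands the literal `8` straight to the template while the parentheses keep `3 + 4` together.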

[Bug target/94832] AVX512 scatter/gather macros lack parentheses when unoptimized

2020-04-29 Thread gcc at kheafield dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94832

--- Comment #3 from Kenneth Heafield  ---
Being a macro some of the time also causes trouble with template commas and the
C preprocessor.  

#include <immintrin.h>
template  int *TemplatedFunction();
void Fail() {
  _mm512_mask_i32scatter_epi32(TemplatedFunction(), 0x,
_mm512_set1_epi32(1), _mm512_set1_epi32(1), 1);
}

Without optimization, this is an error because the comma in the template argument list is interpreted as a macro argument separator.
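
The argument-splitting effect can be shown in isolation, without AVX-512 headers. In this sketch every macro name, the template parameter names and the `0xffff` value are hypothetical choices made for illustration; only `TemplatedFunction` echoes the name used in the report.

```cpp
#include <cassert>

// Hypothetical helpers: COUNT_ARGS reports how many arguments the
// preprocessor sees (works for one to five arguments).
#define PICK_SIXTH(a, b, c, d, e, n, ...) n
#define COUNT_ARGS(...) PICK_SIXTH(__VA_ARGS__, 5, 4, 3, 2, 1, 0)

// Hypothetical two-parameter macro standing in for an intrinsic macro.
#define FIRST_OF_TWO(ADDR, MASK) (ADDR)

template <class T, class U> int *TemplatedFunction() { return nullptr; }

// The preprocessor does not treat <> as brackets, so the comma inside the
// template argument list separates two macro arguments.
inline int unprotected_count() {
  return COUNT_ARGS(TemplatedFunction<int, long>());
}

// Parentheses do nest for the preprocessor, so this is a single argument.
inline int protected_count() {
  return COUNT_ARGS((TemplatedFunction<int, long>()));
}

// The workaround: wrap the offending argument in an extra pair of parens.
inline int *call_with_parens() {
  return FIRST_OF_TWO((TemplatedFunction<int, long>()), 0xffff);
}
```

Writing `FIRST_OF_TWO(TemplatedFunction<int, long>(), 0xffff)` without the extra parentheses would hand the macro three arguments and fail to compile, which mirrors the "passed 6 arguments, but takes just 5" diagnostic in the report.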

g++ -mavx512f -c template.cc 
template.cc:6:118: error: macro "_mm512_mask_i32scatter_epi32" passed 6 arguments, but takes just 5
    6 |   _mm512_mask_i32scatter_epi32(TemplatedFunction(), 0x, _mm512_set1_epi32(1), _mm512_set1_epi32(1), 1);
      | ^
In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/include/immintrin.h:55,
                 from template.cc:1:
/usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/include/avx512fintrin.h:10475: note: macro "_mm512_mask_i32scatter_epi32" defined here
10475 | #define _mm512_mask_i32scatter_epi32(ADDR, MASK, INDEX, V1, SCALE) \
      |
template.cc: In function ‘void Fail()’:
template.cc:6:3: error: ‘_mm512_mask_i32scatter_epi32’ was not declared in this scope
    6 |   _mm512_mask_i32scatter_epi32(TemplatedFunction(), 0x, _mm512_set1_epi32(1), _mm512_set1_epi32(1), 1);
      |   ^~~~


With optimization, no output.  
g++ -mavx512f -O3 -c template.cc

[Bug target/94832] AVX512 scatter/gather macros lack parentheses when unoptimized

2020-04-29 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94832

--- Comment #2 from Jakub Jelinek  ---
Created attachment 48405
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48405&action=edit
gcc10-pr94832.patch

Untested fix for the -O0 gather/scatter macros.