Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread Uros Bizjak via Gcc-patches
On Fri, Aug 21, 2020 at 6:29 PM Hongtao Liu  wrote:
>
> On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak  wrote:
> >
> > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu  wrote:
> > >
> > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  wrote:
> > > >
> > > > > > > gcc/
> > > > > > > PR target/88808
> > > > > > > * config/i386/i386.c (ix86_preferred_reload_class): Allow
> > > > > > > QImode data go into mask registers.
> > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust 
> > > > > > > constraints
> > > > > > > for mask registers.
> > > > > > > (*movqi_internal): Ditto.
> > > > > > > (*anddi_1): Support mask register operations
> > > > > > > (*and_1): Ditto.
> > > > > > > (*andqi_1): Ditto.
> > > > > > > (*andn_1): Ditto.
> > > > > > > (*_1): Ditto.
> > > > > > > (*qi_1): Ditto.
> > > > > > > (*one_cmpl2_1): Ditto.
> > > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > > (*one_cmplqi2_1): Ditto.
> > > > > > > (define_peephole2): Move constant 0/-1 directly into mask
> > > > > > > registers.
> > > > > > > * config/i386/predicates.md (mask_reg_operand): New 
> > > > > > > predicate.
> > > > > > > * config/i386/sse.md (define_split): Add post-reload 
> > > > > > > splitters
> > > > > > > that would convert "generic" patterns to mask patterns.
> > > > > > > (*knotsi_1_zext): New define_insn.
> > > > > > >
> > > > > > > gcc/testsuite/
> > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > > >
> > > > > > A little nit, please put new splitters after the instruction 
> > > > > > pattern.
> > > > > >
> > > > > > OK for the whole patch set with the above change,
> > > > > >
> > > > >
> > > > > Yes, thanks for the review.
> > > >
> > > > Please note that your patch introduces several testsuite fails with 
> > > > -m32:
> > > >
> > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c
> > > >
> > >
> > > I can't reproduce this failure.
> >
> > Because you are running it on AVX512 enabled target.
> >
> > > > Program received signal SIGILL, Illegal instruction.
> > > > 0x080490ac in __get_cpuid_count (__edx=,
> > > > __ecx=, __ebx=, __eax= > > > pointer>,
> > > > __subleaf=0, __leaf=7) at 
> > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> > > > 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, 
> > > > *__edx);
> > > >
> > > >0x080490a3 <+51>:cpuid
> > > >0x080490a5 <+53>:mov$0x1,%eax
> > > >0x080490aa <+58>:mov%ecx,%esi
> > > > => 0x080490ac <+60>:kmovd  %ebx,%k0
> > > >0x080490b0 <+64>:mov%edi,%ecx
> > > >0x080490b2 <+66>:mov%edi,%ebx
> > > >
> > > > kmov insn is generated for __cpuid_count function, where the binary
> > > > determines, if the new instructions are supported. The binary will
> > > > crash in the detection code if the processor lacks AVX512
> > > > instructions.
> > > >
> > >
> > > IMHO, the testcase shouldn't be run on processors without AVX512BW.
> >
> > No, it could run, because it checks for AVX512BW at runtime.
> >
>
> Got it.
>
> > > Because in  avx512bitalg-vpopcntb-1.c, there's /*
> > > dg-require-effective-target avx512bw } */.
> >
> > This is to check the toolchain for support.
> >
> > > what's the version of your assembler?
> >
> > GNU assembler version 2.34-4.fc32
> >
>
> If assembler supports avx512bw, but processor not, the test would pass
> condition `dg-require-effective-target avx512bw` and be runned.
> then crashed for illegal instruction.

Please look at avx512-check.h. This is where main function lives.

> > Please add something like
> > X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK
> > and enable them only for m_CORE_AVX512 (or perhaps m_INTEL).
> >
> > Handle this in inline_secondary_memory_needed to reject direct moves
> > for all other targets. This should disable direct moves for generic
> > targets.
> >
>
> Yes, I'll add it.

Thanks.

Uros.


Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread Hongtao Liu via Gcc-patches
On Sat, Aug 22, 2020 at 1:08 AM H.J. Lu  wrote:
>
> On Fri, Aug 21, 2020 at 10:02 AM H.J. Lu  wrote:
> >
> > On Fri, Aug 21, 2020 at 9:46 AM Hongtao Liu  wrote:
> > >
> > > On Sat, Aug 22, 2020 at 12:36 AM H.J. Lu  wrote:
> > > >
> > > > On Fri, Aug 21, 2020 at 9:29 AM Hongtao Liu  wrote:
> > > > >
> > > > > On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > > > gcc/
> > > > > > > > > > > PR target/88808
> > > > > > > > > > > * config/i386/i386.c 
> > > > > > > > > > > (ix86_preferred_reload_class): Allow
> > > > > > > > > > > QImode data go into mask registers.
> > > > > > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust 
> > > > > > > > > > > constraints
> > > > > > > > > > > for mask registers.
> > > > > > > > > > > (*movqi_internal): Ditto.
> > > > > > > > > > > (*anddi_1): Support mask register operations
> > > > > > > > > > > (*and_1): Ditto.
> > > > > > > > > > > (*andqi_1): Ditto.
> > > > > > > > > > > (*andn_1): Ditto.
> > > > > > > > > > > (*_1): Ditto.
> > > > > > > > > > > (*qi_1): Ditto.
> > > > > > > > > > > (*one_cmpl2_1): Ditto.
> > > > > > > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > > > > > > (*one_cmplqi2_1): Ditto.
> > > > > > > > > > > (define_peephole2): Move constant 0/-1 directly 
> > > > > > > > > > > into mask
> > > > > > > > > > > registers.
> > > > > > > > > > > * config/i386/predicates.md (mask_reg_operand): 
> > > > > > > > > > > New predicate.
> > > > > > > > > > > * config/i386/sse.md (define_split): Add 
> > > > > > > > > > > post-reload splitters
> > > > > > > > > > > that would convert "generic" patterns to mask 
> > > > > > > > > > > patterns.
> > > > > > > > > > > (*knotsi_1_zext): New define_insn.
> > > > > > > > > > >
> > > > > > > > > > > gcc/testsuite/
> > > > > > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > > > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New 
> > > > > > > > > > > testcase.
> > > > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust 
> > > > > > > > > > > testcase.
> > > > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > > > > > > >
> > > > > > > > > > A little nit, please put new splitters after the 
> > > > > > > > > > instruction pattern.
> > > > > > > > > >
> > > > > > > > > > OK for the whole patch set with the above change,
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes, thanks for the review.
> > > > > > > >
> > > > > > > > Please note that your patch introduces several testsuite fails 
> > > > > > > > with -m32:
> > > > > > > >
> > > > > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g 
> > > > > > > > avx512bitalg-vpopcntb-1.c
> > > > > > > >
> > > > > > >
> > > > > > > I can't reproduce this failure.
> > > > > >
> > > > > > Because you are running it on AVX512 enabled target.
> > > > > >
> > > > > > > > Program received signal SIGILL, Illegal instruction.
> > > > > > > > 0x080490ac in __get_cpuid_count (__edx=,
> > > > > > > > __ecx=, __ebx=, 
> > > > > > > > __eax= > > > > > > > pointer>,
> > > > > > > > __subleaf=0, __leaf=7) at 
> > > > > > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> > > > > > > > 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, 
> > > > > > > > *__ecx, *__edx);
> > > > > > > >
> > > > > > > >0x080490a3 <+51>:cpuid
> > > > > > > >0x080490a5 <+53>:mov$0x1,%eax
> > > > > > > >0x080490aa <+58>:mov%ecx,%esi
> > > > > > > > => 0x080490ac <+60>:kmovd  %ebx,%k0
> > > > > > > >0x080490b0 <+64>:mov%edi,%ecx
> > > > > > > >0x080490b2 <+66>:mov%edi,%ebx
> > > > > > > >
> > > > > > > > kmov insn is generated for __cpuid_count function, where the 
> > > > > > > > binary
> > > > > > > > determines, if the new instructions are supported. The binary 
> > > > > > > > will
> > > > > > > > crash in the detection code if the processor lacks AVX512
> > > > > > > > instructions.
> > > > > > > >
> > > > > > >
> > > > > > > IMHO, the testcase shouldn't be run on processors without 
> > > > > > > AVX512BW.
> > > > > >
> > > > > > No, it could run, because it checks for AVX512BW at runtime.
> > > > > >
> > > > >
> > > > > Got it.
> > > > >
> > > > > > > Because in  avx512bitalg-vpopcntb-1.c, there's /*
> > > > > > > dg-require-effective-target 

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread H.J. Lu via Gcc-patches
On Fri, Aug 21, 2020 at 10:02 AM H.J. Lu  wrote:
>
> On Fri, Aug 21, 2020 at 9:46 AM Hongtao Liu  wrote:
> >
> > On Sat, Aug 22, 2020 at 12:36 AM H.J. Lu  wrote:
> > >
> > > On Fri, Aug 21, 2020 at 9:29 AM Hongtao Liu  wrote:
> > > >
> > > > On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak  wrote:
> > > > >
> > > > > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  
> > > > > > wrote:
> > > > > > >
> > > > > > > > > > gcc/
> > > > > > > > > > PR target/88808
> > > > > > > > > > * config/i386/i386.c (ix86_preferred_reload_class): 
> > > > > > > > > > Allow
> > > > > > > > > > QImode data go into mask registers.
> > > > > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust 
> > > > > > > > > > constraints
> > > > > > > > > > for mask registers.
> > > > > > > > > > (*movqi_internal): Ditto.
> > > > > > > > > > (*anddi_1): Support mask register operations
> > > > > > > > > > (*and_1): Ditto.
> > > > > > > > > > (*andqi_1): Ditto.
> > > > > > > > > > (*andn_1): Ditto.
> > > > > > > > > > (*_1): Ditto.
> > > > > > > > > > (*qi_1): Ditto.
> > > > > > > > > > (*one_cmpl2_1): Ditto.
> > > > > > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > > > > > (*one_cmplqi2_1): Ditto.
> > > > > > > > > > (define_peephole2): Move constant 0/-1 directly 
> > > > > > > > > > into mask
> > > > > > > > > > registers.
> > > > > > > > > > * config/i386/predicates.md (mask_reg_operand): New 
> > > > > > > > > > predicate.
> > > > > > > > > > * config/i386/sse.md (define_split): Add 
> > > > > > > > > > post-reload splitters
> > > > > > > > > > that would convert "generic" patterns to mask 
> > > > > > > > > > patterns.
> > > > > > > > > > (*knotsi_1_zext): New define_insn.
> > > > > > > > > >
> > > > > > > > > > gcc/testsuite/
> > > > > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust 
> > > > > > > > > > testcase.
> > > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > > > > > >
> > > > > > > > > A little nit, please put new splitters after the instruction 
> > > > > > > > > pattern.
> > > > > > > > >
> > > > > > > > > OK for the whole patch set with the above change,
> > > > > > > > >
> > > > > > > >
> > > > > > > > Yes, thanks for the review.
> > > > > > >
> > > > > > > Please note that your patch introduces several testsuite fails 
> > > > > > > with -m32:
> > > > > > >
> > > > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g 
> > > > > > > avx512bitalg-vpopcntb-1.c
> > > > > > >
> > > > > >
> > > > > > I can't reproduce this failure.
> > > > >
> > > > > Because you are running it on AVX512 enabled target.
> > > > >
> > > > > > > Program received signal SIGILL, Illegal instruction.
> > > > > > > 0x080490ac in __get_cpuid_count (__edx=,
> > > > > > > __ecx=, __ebx=, 
> > > > > > > __eax= > > > > > > pointer>,
> > > > > > > __subleaf=0, __leaf=7) at 
> > > > > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> > > > > > > 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, 
> > > > > > > *__ecx, *__edx);
> > > > > > >
> > > > > > >0x080490a3 <+51>:cpuid
> > > > > > >0x080490a5 <+53>:mov$0x1,%eax
> > > > > > >0x080490aa <+58>:mov%ecx,%esi
> > > > > > > => 0x080490ac <+60>:kmovd  %ebx,%k0
> > > > > > >0x080490b0 <+64>:mov%edi,%ecx
> > > > > > >0x080490b2 <+66>:mov%edi,%ebx
> > > > > > >
> > > > > > > kmov insn is generated for __cpuid_count function, where the 
> > > > > > > binary
> > > > > > > determines, if the new instructions are supported. The binary will
> > > > > > > crash in the detection code if the processor lacks AVX512
> > > > > > > instructions.
> > > > > > >
> > > > > >
> > > > > > IMHO, the testcase shouldn't be run on processors without AVX512BW.
> > > > >
> > > > > No, it could run, because it checks for AVX512BW at runtime.
> > > > >
> > > >
> > > > Got it.
> > > >
> > > > > > Because in  avx512bitalg-vpopcntb-1.c, there's /*
> > > > > > dg-require-effective-target avx512bw } */.
> > > > >
> > > > > This is to check the toolchain for support.
> > > > >
> > > > > > what's the version of your assembler?
> > > > >
> > > > > GNU assembler version 2.34-4.fc32
> > > > >
> > > >
> > > > If assembler supports avx512bw, but processor not, the test would pass
> > > > condition `dg-require-effective-target 

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread H.J. Lu via Gcc-patches
On Fri, Aug 21, 2020 at 9:46 AM Hongtao Liu  wrote:
>
> On Sat, Aug 22, 2020 at 12:36 AM H.J. Lu  wrote:
> >
> > On Fri, Aug 21, 2020 at 9:29 AM Hongtao Liu  wrote:
> > >
> > > On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak  wrote:
> > > >
> > > > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu  wrote:
> > > > >
> > > > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  wrote:
> > > > > >
> > > > > > > > > gcc/
> > > > > > > > > PR target/88808
> > > > > > > > > * config/i386/i386.c (ix86_preferred_reload_class): 
> > > > > > > > > Allow
> > > > > > > > > QImode data go into mask registers.
> > > > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust 
> > > > > > > > > constraints
> > > > > > > > > for mask registers.
> > > > > > > > > (*movqi_internal): Ditto.
> > > > > > > > > (*anddi_1): Support mask register operations
> > > > > > > > > (*and_1): Ditto.
> > > > > > > > > (*andqi_1): Ditto.
> > > > > > > > > (*andn_1): Ditto.
> > > > > > > > > (*_1): Ditto.
> > > > > > > > > (*qi_1): Ditto.
> > > > > > > > > (*one_cmpl2_1): Ditto.
> > > > > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > > > > (*one_cmplqi2_1): Ditto.
> > > > > > > > > (define_peephole2): Move constant 0/-1 directly into 
> > > > > > > > > mask
> > > > > > > > > registers.
> > > > > > > > > * config/i386/predicates.md (mask_reg_operand): New 
> > > > > > > > > predicate.
> > > > > > > > > * config/i386/sse.md (define_split): Add post-reload 
> > > > > > > > > splitters
> > > > > > > > > that would convert "generic" patterns to mask 
> > > > > > > > > patterns.
> > > > > > > > > (*knotsi_1_zext): New define_insn.
> > > > > > > > >
> > > > > > > > > gcc/testsuite/
> > > > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust 
> > > > > > > > > testcase.
> > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > > > > >
> > > > > > > > A little nit, please put new splitters after the instruction 
> > > > > > > > pattern.
> > > > > > > >
> > > > > > > > OK for the whole patch set with the above change,
> > > > > > > >
> > > > > > >
> > > > > > > Yes, thanks for the review.
> > > > > >
> > > > > > Please note that your patch introduces several testsuite fails with 
> > > > > > -m32:
> > > > > >
> > > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c
> > > > > >
> > > > >
> > > > > I can't reproduce this failure.
> > > >
> > > > Because you are running it on AVX512 enabled target.
> > > >
> > > > > > Program received signal SIGILL, Illegal instruction.
> > > > > > 0x080490ac in __get_cpuid_count (__edx=,
> > > > > > __ecx=, __ebx=, 
> > > > > > __eax= > > > > > pointer>,
> > > > > > __subleaf=0, __leaf=7) at 
> > > > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> > > > > > 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, 
> > > > > > *__edx);
> > > > > >
> > > > > >0x080490a3 <+51>:cpuid
> > > > > >0x080490a5 <+53>:mov$0x1,%eax
> > > > > >0x080490aa <+58>:mov%ecx,%esi
> > > > > > => 0x080490ac <+60>:kmovd  %ebx,%k0
> > > > > >0x080490b0 <+64>:mov%edi,%ecx
> > > > > >0x080490b2 <+66>:mov%edi,%ebx
> > > > > >
> > > > > > kmov insn is generated for __cpuid_count function, where the binary
> > > > > > determines, if the new instructions are supported. The binary will
> > > > > > crash in the detection code if the processor lacks AVX512
> > > > > > instructions.
> > > > > >
> > > > >
> > > > > IMHO, the testcase shouldn't be run on processors without AVX512BW.
> > > >
> > > > No, it could run, because it checks for AVX512BW at runtime.
> > > >
> > >
> > > Got it.
> > >
> > > > > Because in  avx512bitalg-vpopcntb-1.c, there's /*
> > > > > dg-require-effective-target avx512bw } */.
> > > >
> > > > This is to check the toolchain for support.
> > > >
> > > > > what's the version of your assembler?
> > > >
> > > > GNU assembler version 2.34-4.fc32
> > > >
> > >
> > > If assembler supports avx512bw, but processor not, the test would pass
> > > condition `dg-require-effective-target avx512bw` and be runned.
> > > then crashed for illegal instruction.
> > >
> > > > Please add something like
> > > > X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK
> > > > and enable them only for m_CORE_AVX512 (or perhaps m_INTEL).
> > > >
> > > > Handle this in inline_secondary_memory_needed to reject 

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread Hongtao Liu via Gcc-patches
On Sat, Aug 22, 2020 at 12:36 AM H.J. Lu  wrote:
>
> On Fri, Aug 21, 2020 at 9:29 AM Hongtao Liu  wrote:
> >
> > On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak  wrote:
> > >
> > > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu  wrote:
> > > >
> > > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  wrote:
> > > > >
> > > > > > > > gcc/
> > > > > > > > PR target/88808
> > > > > > > > * config/i386/i386.c (ix86_preferred_reload_class): 
> > > > > > > > Allow
> > > > > > > > QImode data go into mask registers.
> > > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust 
> > > > > > > > constraints
> > > > > > > > for mask registers.
> > > > > > > > (*movqi_internal): Ditto.
> > > > > > > > (*anddi_1): Support mask register operations
> > > > > > > > (*and_1): Ditto.
> > > > > > > > (*andqi_1): Ditto.
> > > > > > > > (*andn_1): Ditto.
> > > > > > > > (*_1): Ditto.
> > > > > > > > (*qi_1): Ditto.
> > > > > > > > (*one_cmpl2_1): Ditto.
> > > > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > > > (*one_cmplqi2_1): Ditto.
> > > > > > > > (define_peephole2): Move constant 0/-1 directly into 
> > > > > > > > mask
> > > > > > > > registers.
> > > > > > > > * config/i386/predicates.md (mask_reg_operand): New 
> > > > > > > > predicate.
> > > > > > > > * config/i386/sse.md (define_split): Add post-reload 
> > > > > > > > splitters
> > > > > > > > that would convert "generic" patterns to mask patterns.
> > > > > > > > (*knotsi_1_zext): New define_insn.
> > > > > > > >
> > > > > > > > gcc/testsuite/
> > > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust 
> > > > > > > > testcase.
> > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > > > >
> > > > > > > A little nit, please put new splitters after the instruction 
> > > > > > > pattern.
> > > > > > >
> > > > > > > OK for the whole patch set with the above change,
> > > > > > >
> > > > > >
> > > > > > Yes, thanks for the review.
> > > > >
> > > > > Please note that your patch introduces several testsuite fails with 
> > > > > -m32:
> > > > >
> > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c
> > > > >
> > > >
> > > > I can't reproduce this failure.
> > >
> > > Because you are running it on AVX512 enabled target.
> > >
> > > > > Program received signal SIGILL, Illegal instruction.
> > > > > 0x080490ac in __get_cpuid_count (__edx=,
> > > > > __ecx=, __ebx=, __eax= > > > > pointer>,
> > > > > __subleaf=0, __leaf=7) at 
> > > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> > > > > 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, 
> > > > > *__edx);
> > > > >
> > > > >0x080490a3 <+51>:cpuid
> > > > >0x080490a5 <+53>:mov$0x1,%eax
> > > > >0x080490aa <+58>:mov%ecx,%esi
> > > > > => 0x080490ac <+60>:kmovd  %ebx,%k0
> > > > >0x080490b0 <+64>:mov%edi,%ecx
> > > > >0x080490b2 <+66>:mov%edi,%ebx
> > > > >
> > > > > kmov insn is generated for __cpuid_count function, where the binary
> > > > > determines, if the new instructions are supported. The binary will
> > > > > crash in the detection code if the processor lacks AVX512
> > > > > instructions.
> > > > >
> > > >
> > > > IMHO, the testcase shouldn't be run on processors without AVX512BW.
> > >
> > > No, it could run, because it checks for AVX512BW at runtime.
> > >
> >
> > Got it.
> >
> > > > Because in  avx512bitalg-vpopcntb-1.c, there's /*
> > > > dg-require-effective-target avx512bw } */.
> > >
> > > This is to check the toolchain for support.
> > >
> > > > what's the version of your assembler?
> > >
> > > GNU assembler version 2.34-4.fc32
> > >
> >
> > If assembler supports avx512bw, but processor not, the test would pass
> > condition `dg-require-effective-target avx512bw` and be runned.
> > then crashed for illegal instruction.
> >
> > > Please add something like
> > > X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK
> > > and enable them only for m_CORE_AVX512 (or perhaps m_INTEL).
> > >
> > > Handle this in inline_secondary_memory_needed to reject direct moves
> > > for all other targets. This should disable direct moves for generic
> > > targets.
> > >
> >
> > Yes, I'll add it.
> >
>
>
> (define_insn "*movsi_internal"
>   [(set (match_operand:SI 0 "nonimmediate_operand"
> "=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
> 

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread H.J. Lu via Gcc-patches
On Fri, Aug 21, 2020 at 9:35 AM H.J. Lu  wrote:
>
> On Fri, Aug 21, 2020 at 9:29 AM Hongtao Liu  wrote:
> >
> > On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak  wrote:
> > >
> > > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu  wrote:
> > > >
> > > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  wrote:
> > > > >
> > > > > > > > gcc/
> > > > > > > > PR target/88808
> > > > > > > > * config/i386/i386.c (ix86_preferred_reload_class): 
> > > > > > > > Allow
> > > > > > > > QImode data go into mask registers.
> > > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust 
> > > > > > > > constraints
> > > > > > > > for mask registers.
> > > > > > > > (*movqi_internal): Ditto.
> > > > > > > > (*anddi_1): Support mask register operations
> > > > > > > > (*and_1): Ditto.
> > > > > > > > (*andqi_1): Ditto.
> > > > > > > > (*andn_1): Ditto.
> > > > > > > > (*_1): Ditto.
> > > > > > > > (*qi_1): Ditto.
> > > > > > > > (*one_cmpl2_1): Ditto.
> > > > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > > > (*one_cmplqi2_1): Ditto.
> > > > > > > > (define_peephole2): Move constant 0/-1 directly into 
> > > > > > > > mask
> > > > > > > > registers.
> > > > > > > > * config/i386/predicates.md (mask_reg_operand): New 
> > > > > > > > predicate.
> > > > > > > > * config/i386/sse.md (define_split): Add post-reload 
> > > > > > > > splitters
> > > > > > > > that would convert "generic" patterns to mask patterns.
> > > > > > > > (*knotsi_1_zext): New define_insn.
> > > > > > > >
> > > > > > > > gcc/testsuite/
> > > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust 
> > > > > > > > testcase.
> > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > > > >
> > > > > > > A little nit, please put new splitters after the instruction 
> > > > > > > pattern.
> > > > > > >
> > > > > > > OK for the whole patch set with the above change,
> > > > > > >
> > > > > >
> > > > > > Yes, thanks for the review.
> > > > >
> > > > > Please note that your patch introduces several testsuite fails with 
> > > > > -m32:
> > > > >
> > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c
> > > > >
> > > >
> > > > I can't reproduce this failure.
> > >
> > > Because you are running it on AVX512 enabled target.
> > >
> > > > > Program received signal SIGILL, Illegal instruction.
> > > > > 0x080490ac in __get_cpuid_count (__edx=,
> > > > > __ecx=, __ebx=, __eax= > > > > pointer>,
> > > > > __subleaf=0, __leaf=7) at 
> > > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> > > > > 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, 
> > > > > *__edx);
> > > > >
> > > > >0x080490a3 <+51>:cpuid
> > > > >0x080490a5 <+53>:mov$0x1,%eax
> > > > >0x080490aa <+58>:mov%ecx,%esi
> > > > > => 0x080490ac <+60>:kmovd  %ebx,%k0
> > > > >0x080490b0 <+64>:mov%edi,%ecx
> > > > >0x080490b2 <+66>:mov%edi,%ebx
> > > > >
> > > > > kmov insn is generated for __cpuid_count function, where the binary
> > > > > determines, if the new instructions are supported. The binary will
> > > > > crash in the detection code if the processor lacks AVX512
> > > > > instructions.
> > > > >
> > > >
> > > > IMHO, the testcase shouldn't be run on processors without AVX512BW.
> > >
> > > No, it could run, because it checks for AVX512BW at runtime.
> > >
> >
> > Got it.
> >
> > > > Because in  avx512bitalg-vpopcntb-1.c, there's /*
> > > > dg-require-effective-target avx512bw } */.
> > >
> > > This is to check the toolchain for support.
> > >
> > > > what's the version of your assembler?
> > >
> > > GNU assembler version 2.34-4.fc32
> > >
> >
> > If assembler supports avx512bw, but processor not, the test would pass
> > condition `dg-require-effective-target avx512bw` and be runned.
> > then crashed for illegal instruction.
> >
> > > Please add something like
> > > X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK
> > > and enable them only for m_CORE_AVX512 (or perhaps m_INTEL).
> > >
> > > Handle this in inline_secondary_memory_needed to reject direct moves
> > > for all other targets. This should disable direct moves for generic
> > > targets.
> > >
> >
> > Yes, I'll add it.
> >
>
>
> (define_insn "*movsi_internal"
>   [(set (match_operand:SI 0 "nonimmediate_operand"
> "=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
> 

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread H.J. Lu via Gcc-patches
On Fri, Aug 21, 2020 at 9:29 AM Hongtao Liu  wrote:
>
> On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak  wrote:
> >
> > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu  wrote:
> > >
> > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  wrote:
> > > >
> > > > > > > gcc/
> > > > > > > PR target/88808
> > > > > > > * config/i386/i386.c (ix86_preferred_reload_class): Allow
> > > > > > > QImode data go into mask registers.
> > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust 
> > > > > > > constraints
> > > > > > > for mask registers.
> > > > > > > (*movqi_internal): Ditto.
> > > > > > > (*anddi_1): Support mask register operations
> > > > > > > (*and_1): Ditto.
> > > > > > > (*andqi_1): Ditto.
> > > > > > > (*andn_1): Ditto.
> > > > > > > (*_1): Ditto.
> > > > > > > (*qi_1): Ditto.
> > > > > > > (*one_cmpl2_1): Ditto.
> > > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > > (*one_cmplqi2_1): Ditto.
> > > > > > > (define_peephole2): Move constant 0/-1 directly into mask
> > > > > > > registers.
> > > > > > > * config/i386/predicates.md (mask_reg_operand): New 
> > > > > > > predicate.
> > > > > > > * config/i386/sse.md (define_split): Add post-reload 
> > > > > > > splitters
> > > > > > > that would convert "generic" patterns to mask patterns.
> > > > > > > (*knotsi_1_zext): New define_insn.
> > > > > > >
> > > > > > > gcc/testsuite/
> > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > > >
> > > > > > A little nit, please put new splitters after the instruction 
> > > > > > pattern.
> > > > > >
> > > > > > OK for the whole patch set with the above change,
> > > > > >
> > > > >
> > > > > Yes, thanks for the review.
> > > >
> > > > Please note that your patch introduces several testsuite fails with 
> > > > -m32:
> > > >
> > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c
> > > >
> > >
> > > I can't reproduce this failure.
> >
> > Because you are running it on AVX512 enabled target.
> >
> > > > Program received signal SIGILL, Illegal instruction.
> > > > 0x080490ac in __get_cpuid_count (__edx=,
> > > > __ecx=, __ebx=, __eax= > > > pointer>,
> > > > __subleaf=0, __leaf=7) at 
> > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> > > > 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, 
> > > > *__edx);
> > > >
> > > >0x080490a3 <+51>:cpuid
> > > >0x080490a5 <+53>:mov$0x1,%eax
> > > >0x080490aa <+58>:mov%ecx,%esi
> > > > => 0x080490ac <+60>:kmovd  %ebx,%k0
> > > >0x080490b0 <+64>:mov%edi,%ecx
> > > >0x080490b2 <+66>:mov%edi,%ebx
> > > >
> > > > kmov insn is generated for __cpuid_count function, where the binary
> > > > determines, if the new instructions are supported. The binary will
> > > > crash in the detection code if the processor lacks AVX512
> > > > instructions.
> > > >
> > >
> > > IMHO, the testcase shouldn't be run on processors without AVX512BW.
> >
> > No, it could run, because it checks for AVX512BW at runtime.
> >
>
> Got it.
>
> > > Because in  avx512bitalg-vpopcntb-1.c, there's /*
> > > dg-require-effective-target avx512bw } */.
> >
> > This is to check the toolchain for support.
> >
> > > what's the version of your assembler?
> >
> > GNU assembler version 2.34-4.fc32
> >
>
> If assembler supports avx512bw, but processor not, the test would pass
> condition `dg-require-effective-target avx512bw` and be runned.
> then crashed for illegal instruction.
>
> > Please add something like
> > X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK
> > and enable them only for m_CORE_AVX512 (or perhaps m_INTEL).
> >
> > Handle this in inline_secondary_memory_needed to reject direct moves
> > for all other targets. This should disable direct moves for generic
> > targets.
> >
>
> Yes, I'll add it.
>


(define_insn "*movsi_internal"
  [(set (match_operand:SI 0 "nonimmediate_operand"
"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
(match_operand:SI 1 "general_operand"
"g ,re,C ,*y,m  ,*y,*y,r  ,C ,*v,m ,*v,*v,r  ,*r,*km,*k ,CBC"))]
  "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
...
 [(set (attr "isa")
 (cond [(eq_attr "alternative" "12,13")
  (const_string "sse2")
   ]
   (const_string "*")))

is wrong.   mask register alternatives should be 

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread Hongtao Liu via Gcc-patches
On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak  wrote:
>
> On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu  wrote:
> >
> > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  wrote:
> > >
> > > > > > gcc/
> > > > > > PR target/88808
> > > > > > * config/i386/i386.c (ix86_preferred_reload_class): Allow
> > > > > > QImode data go into mask registers.
> > > > > > * config/i386/i386.md: (*movhi_internal): Adjust constraints
> > > > > > for mask registers.
> > > > > > (*movqi_internal): Ditto.
> > > > > > (*anddi_1): Support mask register operations
> > > > > > (*and_1): Ditto.
> > > > > > (*andqi_1): Ditto.
> > > > > > (*andn_1): Ditto.
> > > > > > (*_1): Ditto.
> > > > > > (*qi_1): Ditto.
> > > > > > (*one_cmpl2_1): Ditto.
> > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > (*one_cmplqi2_1): Ditto.
> > > > > > (define_peephole2): Move constant 0/-1 directly into mask
> > > > > > registers.
> > > > > > * config/i386/predicates.md (mask_reg_operand): New 
> > > > > > predicate.
> > > > > > * config/i386/sse.md (define_split): Add post-reload 
> > > > > > splitters
> > > > > > that would convert "generic" patterns to mask patterns.
> > > > > > (*knotsi_1_zext): New define_insn.
> > > > > >
> > > > > > gcc/testsuite/
> > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > >
> > > > > A little nit, please put new splitters after the instruction pattern.
> > > > >
> > > > > OK for the whole patch set with the above change,
> > > > >
> > > >
> > > > Yes, thanks for the review.
> > >
> > > Please note that your patch introduces several testsuite fails with -m32:
> > >
> > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c
> > >
> >
> > I can't reproduce this failure.
>
> Because you are running it on AVX512 enabled target.
>
> > > Program received signal SIGILL, Illegal instruction.
> > > 0x080490ac in __get_cpuid_count (__edx=,
> > > __ecx=, __ebx=, __eax= > > pointer>,
> > > __subleaf=0, __leaf=7) at 
> > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> > > 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, 
> > > *__edx);
> > >
> > >0x080490a3 <+51>:cpuid
> > >0x080490a5 <+53>:mov$0x1,%eax
> > >0x080490aa <+58>:mov%ecx,%esi
> > > => 0x080490ac <+60>:kmovd  %ebx,%k0
> > >0x080490b0 <+64>:mov%edi,%ecx
> > >0x080490b2 <+66>:mov%edi,%ebx
> > >
> > > kmov insn is generated for __cpuid_count function, where the binary
> > > determines, if the new instructions are supported. The binary will
> > > crash in the detection code if the processor lacks AVX512
> > > instructions.
> > >
> >
> > IMHO, the testcase shouldn't be run on processors without AVX512BW.
>
> No, it could run, because it checks for AVX512BW at runtime.
>

Got it.

> > Because in  avx512bitalg-vpopcntb-1.c, there's /*
> > dg-require-effective-target avx512bw } */.
>
> This is to check the toolchain for support.
>
> > what's the version of your assembler?
>
> GNU assembler version 2.34-4.fc32
>

If assembler supports avx512bw, but processor not, the test would pass
condition `dg-require-effective-target avx512bw` and be runned.
then crashed for illegal instruction.

> Please add something like
> X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK
> and enable them only for m_CORE_AVX512 (or perhaps m_INTEL).
>
> Handle this in inline_secondary_memory_needed to reject direct moves
> for all other targets. This should disable direct moves for generic
> targets.
>

Yes, I'll add it.

> Uros.



-- 
BR,
Hongtao


Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread Uros Bizjak via Gcc-patches
On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu  wrote:
>
> On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  wrote:
> >
> > > > > gcc/
> > > > > PR target/88808
> > > > > * config/i386/i386.c (ix86_preferred_reload_class): Allow
> > > > > QImode data go into mask registers.
> > > > > * config/i386/i386.md: (*movhi_internal): Adjust constraints
> > > > > for mask registers.
> > > > > (*movqi_internal): Ditto.
> > > > > (*anddi_1): Support mask register operations
> > > > > (*and_1): Ditto.
> > > > > (*andqi_1): Ditto.
> > > > > (*andn_1): Ditto.
> > > > > (*_1): Ditto.
> > > > > (*qi_1): Ditto.
> > > > > (*one_cmpl2_1): Ditto.
> > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > (*one_cmplqi2_1): Ditto.
> > > > > (define_peephole2): Move constant 0/-1 directly into mask
> > > > > registers.
> > > > > * config/i386/predicates.md (mask_reg_operand): New predicate.
> > > > > * config/i386/sse.md (define_split): Add post-reload splitters
> > > > > that would convert "generic" patterns to mask patterns.
> > > > > (*knotsi_1_zext): New define_insn.
> > > > >
> > > > > gcc/testsuite/
> > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > >
> > > > A little nit, please put new splitters after the instruction pattern.
> > > >
> > > > OK for the whole patch set with the above change,
> > > >
> > >
> > > Yes, thanks for the review.
> >
> > Please note that your patch introduces several testsuite fails with -m32:
> >
> > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c
> >
>
> I can't reproduce this failure.

Because you are running it on AVX512 enabled target.

> > Program received signal SIGILL, Illegal instruction.
> > 0x080490ac in __get_cpuid_count (__edx=,
> > __ecx=, __ebx=, __eax= > pointer>,
> > __subleaf=0, __leaf=7) at 
> > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> > 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, *__edx);
> >
> >0x080490a3 <+51>:cpuid
> >0x080490a5 <+53>:mov$0x1,%eax
> >0x080490aa <+58>:mov%ecx,%esi
> > => 0x080490ac <+60>:kmovd  %ebx,%k0
> >0x080490b0 <+64>:mov%edi,%ecx
> >0x080490b2 <+66>:mov%edi,%ebx
> >
> > kmov insn is generated for __cpuid_count function, where the binary
> > determines, if the new instructions are supported. The binary will
> > crash in the detection code if the processor lacks AVX512
> > instructions.
> >
>
> IMHO, the testcase shouldn't be run on processors without AVX512BW.

No, it could run, because it checks for AVX512BW at runtime.

> Because in  avx512bitalg-vpopcntb-1.c, there's /*
> dg-require-effective-target avx512bw } */.

This is to check the toolchain for support.

> what's the version of your assembler?

GNU assembler version 2.34-4.fc32

Please add something like
X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK
and enable them only for m_CORE_AVX512 (or perhaps m_INTEL).

Handle this in inline_secondary_memory_needed to reject direct moves
for all other targets. This should disable direct moves for generic
targets.

Uros.


Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread H.J. Lu via Gcc-patches
On Fri, Aug 21, 2020 at 8:41 AM Hongtao Liu  wrote:
>
> On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  wrote:
> >
> > > > > gcc/
> > > > > PR target/88808
> > > > > * config/i386/i386.c (ix86_preferred_reload_class): Allow
> > > > > QImode data go into mask registers.
> > > > > * config/i386/i386.md: (*movhi_internal): Adjust constraints
> > > > > for mask registers.
> > > > > (*movqi_internal): Ditto.
> > > > > (*anddi_1): Support mask register operations
> > > > > (*and_1): Ditto.
> > > > > (*andqi_1): Ditto.
> > > > > (*andn_1): Ditto.
> > > > > (*_1): Ditto.
> > > > > (*qi_1): Ditto.
> > > > > (*one_cmpl2_1): Ditto.
> > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > (*one_cmplqi2_1): Ditto.
> > > > > (define_peephole2): Move constant 0/-1 directly into mask
> > > > > registers.
> > > > > * config/i386/predicates.md (mask_reg_operand): New predicate.
> > > > > * config/i386/sse.md (define_split): Add post-reload splitters
> > > > > that would convert "generic" patterns to mask patterns.
> > > > > (*knotsi_1_zext): New define_insn.
> > > > >
> > > > > gcc/testsuite/
> > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > >
> > > > A little nit, please put new splitters after the instruction pattern.
> > > >
> > > > OK for the whole patch set with the above change,
> > > >
> > >
> > > Yes, thanks for the review.
> >
> > Please note that your patch introduces several testsuite fails with -m32:
> >
> > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c
> >
>
> I can't reproduce this failure.
>
> > Program received signal SIGILL, Illegal instruction.
> > 0x080490ac in __get_cpuid_count (__edx=,
> > __ecx=, __ebx=, __eax= > pointer>,
> > __subleaf=0, __leaf=7) at 
> > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> > 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, *__edx);
> >
> >0x080490a3 <+51>:cpuid
> >0x080490a5 <+53>:mov$0x1,%eax
> >0x080490aa <+58>:mov%ecx,%esi
> > => 0x080490ac <+60>:kmovd  %ebx,%k0
> >0x080490b0 <+64>:mov%edi,%ecx
> >0x080490b2 <+66>:mov%edi,%ebx
> >
> > kmov insn is generated for __cpuid_count function, where the binary
> > determines, if the new instructions are supported. The binary will
> > crash in the detection code if the processor lacks AVX512
> > instructions.
> >
>
> IMHO, the testcase shouldn't be run on processors without AVX512BW.
> Because in  avx512bitalg-vpopcntb-1.c, there's /* {
> dg-require-effective-target avx512bw } */.
>

dg-require-effective-target avx512bw checks assembler support.
__cpuid_count can't use any mask instructions.   Please run this test
on CPU without AVX512.


-- 
H.J.


Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread Hongtao Liu via Gcc-patches
On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak  wrote:
>
> > > > gcc/
> > > > PR target/88808
> > > > * config/i386/i386.c (ix86_preferred_reload_class): Allow
> > > > QImode data go into mask registers.
> > > > * config/i386/i386.md: (*movhi_internal): Adjust constraints
> > > > for mask registers.
> > > > (*movqi_internal): Ditto.
> > > > (*anddi_1): Support mask register operations
> > > > (*and_1): Ditto.
> > > > (*andqi_1): Ditto.
> > > > (*andn_1): Ditto.
> > > > (*_1): Ditto.
> > > > (*qi_1): Ditto.
> > > > (*one_cmpl2_1): Ditto.
> > > > (*one_cmplsi2_1_zext): Ditto.
> > > > (*one_cmplqi2_1): Ditto.
> > > > (define_peephole2): Move constant 0/-1 directly into mask
> > > > registers.
> > > > * config/i386/predicates.md (mask_reg_operand): New predicate.
> > > > * config/i386/sse.md (define_split): Add post-reload splitters
> > > > that would convert "generic" patterns to mask patterns.
> > > > (*knotsi_1_zext): New define_insn.
> > > >
> > > > gcc/testsuite/
> > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > >
> > > A little nit, please put new splitters after the instruction pattern.
> > >
> > > OK for the whole patch set with the above change,
> > >
> >
> > Yes, thanks for the review.
>
> Please note that your patch introduces several testsuite fails with -m32:
>
> gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c
>

I can't reproduce this failure.

> Program received signal SIGILL, Illegal instruction.
> 0x080490ac in __get_cpuid_count (__edx=,
> __ecx=, __ebx=, __eax= pointer>,
> __subleaf=0, __leaf=7) at /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> 316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, *__edx);
>
>0x080490a3 <+51>:cpuid
>0x080490a5 <+53>:mov$0x1,%eax
>0x080490aa <+58>:mov%ecx,%esi
> => 0x080490ac <+60>:kmovd  %ebx,%k0
>0x080490b0 <+64>:mov%edi,%ecx
>0x080490b2 <+66>:mov%edi,%ebx
>
> kmov insn is generated for __cpuid_count function, where the binary
> determines, if the new instructions are supported. The binary will
> crash in the detection code if the processor lacks AVX512
> instructions.
>

IMHO, the testcase shouldn't be run on processors without AVX512BW.
Because in  avx512bitalg-vpopcntb-1.c, there's /* {
dg-require-effective-target avx512bw } */.

what's the version of your assembler?

> Uros.



-- 
BR,
Hongtao


Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-21 Thread Uros Bizjak via Gcc-patches
> > > gcc/
> > > PR target/88808
> > > * config/i386/i386.c (ix86_preferred_reload_class): Allow
> > > QImode data go into mask registers.
> > > * config/i386/i386.md: (*movhi_internal): Adjust constraints
> > > for mask registers.
> > > (*movqi_internal): Ditto.
> > > (*anddi_1): Support mask register operations
> > > (*and_1): Ditto.
> > > (*andqi_1): Ditto.
> > > (*andn_1): Ditto.
> > > (*_1): Ditto.
> > > (*qi_1): Ditto.
> > > (*one_cmpl2_1): Ditto.
> > > (*one_cmplsi2_1_zext): Ditto.
> > > (*one_cmplqi2_1): Ditto.
> > > (define_peephole2): Move constant 0/-1 directly into mask
> > > registers.
> > > * config/i386/predicates.md (mask_reg_operand): New predicate.
> > > * config/i386/sse.md (define_split): Add post-reload splitters
> > > that would convert "generic" patterns to mask patterns.
> > > (*knotsi_1_zext): New define_insn.
> > >
> > > gcc/testsuite/
> > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> >
> > A little nit, please put new splitters after the instruction pattern.
> >
> > OK for the whole patch set with the above change,
> >
>
> Yes, thanks for the review.

Please note that your patch introduces several testsuite fails with -m32:

gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c

Program received signal SIGILL, Illegal instruction.
0x080490ac in __get_cpuid_count (__edx=,
__ecx=, __ebx=, __eax=,
__subleaf=0, __leaf=7) at /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
316   __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, *__edx);

   0x080490a3 <+51>:cpuid
   0x080490a5 <+53>:mov$0x1,%eax
   0x080490aa <+58>:mov%ecx,%esi
=> 0x080490ac <+60>:kmovd  %ebx,%k0
   0x080490b0 <+64>:mov%edi,%ecx
   0x080490b2 <+66>:mov%edi,%ebx

kmov insn is generated for __cpuid_count function, where the binary
determines, if the new instructions are supported. The binary will
crash in the detection code if the processor lacks AVX512
instructions.

Uros.


Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-20 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 20, 2020 at 3:40 PM Uros Bizjak  wrote:
>
> On Thu, Aug 20, 2020 at 9:31 AM Hongtao Liu  wrote:
> >
> > On Thu, Aug 20, 2020 at 3:24 PM Hongtao Liu  wrote:
> > >
> > > On Wed, Aug 19, 2020 at 3:05 PM Uros Bizjak  wrote:
> > > >
> > > > On Wed, Aug 19, 2020 at 4:25 AM Hongtao Liu  wrote:
> > > > >
> > > > > On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak  wrote:
> > > > > >
> > > > > > On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu  
> > > > > > wrote:
> > > > > > >
> > > > > > > Enable operator or/xor/and/andn/not for mask register, kxnor is 
> > > > > > > not
> > > > > > > enabled since there's no corresponding instruction for general
> > > > > > > registers.
> > > > > > >
> > > > > > > gcc/
> > > > > > > PR target/88808
> > > > > > > * config/i386/i386.md: (*movsi_internal): Adjust 
> > > > > > > constraints
> > > > > > > for mask registers.
> > > > > > > (*movhi_internal): Ditto.
> > > > > > > (*movqi_internal): Ditto.
> > > > > > > (*anddi_1): Support mask register operations
> > > > > > > (*and_1): Ditto.
> > > > > > > (*andqi_1): Ditto.
> > > > > > > (*andn_1): Ditto.
> > > > > > > (*_1): Ditto.
> > > > > > > (*qi_1): Ditto.
> > > > > > > (*one_cmpl2_1): Ditto.
> > > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > > (*one_cmplqi2_1): Ditto.
> > > > > > >
> > > > > > > gcc/testsuite/
> > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > > >
> > > > > > index 74d207c3711..e8ad79d1b0a 100644
> > > > > > --- a/gcc/config/i386/i386.md
> > > > > > +++ b/gcc/config/i386/i386.md
> > > > > > @@ -2294,7 +2294,7 @@
> > > > > >
> > > > > >  (define_insn "*movsi_internal"
> > > > > >[(set (match_operand:SI 0 "nonimmediate_operand"
> > > > > > -"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
> > > > > > +"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k")
> > > > > >  (match_operand:SI 1 "general_operand"
> > > > > >  "g ,re,C ,*y,m  ,*y,*y,r  ,C ,*v,m ,*v,*v,r  ,*r,*km,*k 
> > > > > > ,CBC"))]
> > > > > >"!(MEM_P (operands[0]) && MEM_P (operands[1]))"
> > > > > >
> > > > > > I'd rather see *k everywhere, also with *movqi_internal and
> > > > > > *movhi_internal patterns. The "*" means that the allocator won't
> > > > > > allocate a mask register by default, but it will be used to optimize
> > > > > > moves. With the above change, you are risking that during integer
> > > > > > register pressure, the register allocator will allocate zero to a 
> > > > > > mask
> > > > > > register, and later "optimize" the move with a direct maskreg-intreg
> > > > > > move.
> > > > > >
> > > > > > The current strategy is that only general registers get allocated 
> > > > > > for
> > > > > > integer modes. Let's keep it this way for now.
> > > > > >
> > > > >
> > > > > Yes,  though it would fail gcc.target/i386/avx512dq-pr88465.c and
> > > > > gcc.target/i386/avx512f-pr88465.c, i think it's more reasonable not to
> > > > > move zero into mask register directly.
> > > >
> > > > Although it would be nice if the register allocator was smart enough,
> > > > the current strategy is to introduce peephole2 patterns to fix these
> > > > problems, similar to [1]. These peepholes can be introduced in a
> > > > follow-up patch.
> > > >
> > > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551744.html
> > > >
> > >
> > > peephole2 added.
> > >
> > > > > > Otherwise, the patchset LGTM, but please test the suggested changes 
> > > > > > and repost.
> > > > > >
> > > > > > BTW: Do you plan to remove mask operations from sse.md? ATM, they 
> > > > > > are
> > > > > > used to distinguish mask operations, generated from builtins from
> > > > > > generic operations, so I'd like to keep them for a while. The 
> > > > > > drawback
> > > > > > is, that they are not combined with other operations, but at the end
> > > > > > of the day, this is what the programmer asked for by using builtins.
> > > > >
> > > > > Agree, I prefer to keep them.
> > > >
> > > > Thinking some more about the approach, it looks to me that the optimal
> > > > solution is a post-reload splitter that would convert "generic"
> > > > patterns to mask operations from sse.md. The mask operations don't set
> > > > flags, so we can substantially improve post reload scheduling of these
> > > > instructions by removing flags clobber.
> > > >
> > > > So, simply add "#" to relevant alternatives of logic patterns and add
> > > > something like:
> > > >
> > > > --cut here--
> > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > 

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-20 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 20, 2020 at 9:31 AM Hongtao Liu  wrote:
>
> On Thu, Aug 20, 2020 at 3:24 PM Hongtao Liu  wrote:
> >
> > On Wed, Aug 19, 2020 at 3:05 PM Uros Bizjak  wrote:
> > >
> > > On Wed, Aug 19, 2020 at 4:25 AM Hongtao Liu  wrote:
> > > >
> > > > On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak  wrote:
> > > > >
> > > > > On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu  
> > > > > wrote:
> > > > > >
> > > > > > Enable operator or/xor/and/andn/not for mask register, kxnor is not
> > > > > > enabled since there's no corresponding instruction for general
> > > > > > registers.
> > > > > >
> > > > > > gcc/
> > > > > > PR target/88808
> > > > > > * config/i386/i386.md: (*movsi_internal): Adjust constraints
> > > > > > for mask registers.
> > > > > > (*movhi_internal): Ditto.
> > > > > > (*movqi_internal): Ditto.
> > > > > > (*anddi_1): Support mask register operations
> > > > > > (*and_1): Ditto.
> > > > > > (*andqi_1): Ditto.
> > > > > > (*andn_1): Ditto.
> > > > > > (*_1): Ditto.
> > > > > > (*qi_1): Ditto.
> > > > > > (*one_cmpl2_1): Ditto.
> > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > (*one_cmplqi2_1): Ditto.
> > > > > >
> > > > > > gcc/testsuite/
> > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > >
> > > > > index 74d207c3711..e8ad79d1b0a 100644
> > > > > --- a/gcc/config/i386/i386.md
> > > > > +++ b/gcc/config/i386/i386.md
> > > > > @@ -2294,7 +2294,7 @@
> > > > >
> > > > >  (define_insn "*movsi_internal"
> > > > >[(set (match_operand:SI 0 "nonimmediate_operand"
> > > > > -"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
> > > > > +"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k")
> > > > >  (match_operand:SI 1 "general_operand"
> > > > >  "g ,re,C ,*y,m  ,*y,*y,r  ,C ,*v,m ,*v,*v,r  ,*r,*km,*k ,CBC"))]
> > > > >"!(MEM_P (operands[0]) && MEM_P (operands[1]))"
> > > > >
> > > > > I'd rather see *k everywhere, also with *movqi_internal and
> > > > > *movhi_internal patterns. The "*" means that the allocator won't
> > > > > allocate a mask register by default, but it will be used to optimize
> > > > > moves. With the above change, you are risking that during integer
> > > > > register pressure, the register allocator will allocate zero to a mask
> > > > > register, and later "optimize" the move with a direct maskreg-intreg
> > > > > move.
> > > > >
> > > > > The current strategy is that only general registers get allocated for
> > > > > integer modes. Let's keep it this way for now.
> > > > >
> > > >
> > > > Yes,  though it would fail gcc.target/i386/avx512dq-pr88465.c and
> > > > gcc.target/i386/avx512f-pr88465.c, i think it's more reasonable not to
> > > > move zero into mask register directly.
> > >
> > > Although it would be nice if the register allocator was smart enough,
> > > the current strategy is to introduce peephole2 patterns to fix these
> > > problems, similar to [1]. These peepholes can be introduced in a
> > > follow-up patch.
> > >
> > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551744.html
> > >
> >
> > peephole2 added.
> >
> > > > > Otherwise, the patchset LGTM, but please test the suggested changes 
> > > > > and repost.
> > > > >
> > > > > BTW: Do you plan to remove mask operations from sse.md? ATM, they are
> > > > > used to distinguish mask operations, generated from builtins from
> > > > > generic operations, so I'd like to keep them for a while. The drawback
> > > > > is, that they are not combined with other operations, but at the end
> > > > > of the day, this is what the programmer asked for by using builtins.
> > > >
> > > > Agree, I prefer to keep them.
> > >
> > > Thinking some more about the approach, it looks to me that the optimal
> > > solution is a post-reload splitter that would convert "generic"
> > > patterns to mask operations from sse.md. The mask operations don't set
> > > flags, so we can substantially improve post reload scheduling of these
> > > instructions by removing flags clobber.
> > >
> > > So, simply add "#" to relevant alternatives of logic patterns and add
> > > something like:
> > >
> > > --cut here--
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > index 41c6dbfa668..ad49bdc7583 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -1470,6 +1470,18 @@
> > >]
> > >(const_string "")))])
> > >
> > > +(define_split
> > > +  [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand")
> > > +   (any_logic:SWI1248_AVX512BW
> > > + 

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-20 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 20, 2020 at 3:24 PM Hongtao Liu  wrote:
>
> On Wed, Aug 19, 2020 at 3:05 PM Uros Bizjak  wrote:
> >
> > On Wed, Aug 19, 2020 at 4:25 AM Hongtao Liu  wrote:
> > >
> > > On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak  wrote:
> > > >
> > > > On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu  wrote:
> > > > >
> > > > > Enable operator or/xor/and/andn/not for mask register, kxnor is not
> > > > > enabled since there's no corresponding instruction for general
> > > > > registers.
> > > > >
> > > > > gcc/
> > > > > PR target/88808
> > > > > * config/i386/i386.md: (*movsi_internal): Adjust constraints
> > > > > for mask registers.
> > > > > (*movhi_internal): Ditto.
> > > > > (*movqi_internal): Ditto.
> > > > > (*anddi_1): Support mask register operations
> > > > > (*and_1): Ditto.
> > > > > (*andqi_1): Ditto.
> > > > > (*andn_1): Ditto.
> > > > > (*_1): Ditto.
> > > > > (*qi_1): Ditto.
> > > > > (*one_cmpl2_1): Ditto.
> > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > (*one_cmplqi2_1): Ditto.
> > > > >
> > > > > gcc/testsuite/
> > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > >
> > > > index 74d207c3711..e8ad79d1b0a 100644
> > > > --- a/gcc/config/i386/i386.md
> > > > +++ b/gcc/config/i386/i386.md
> > > > @@ -2294,7 +2294,7 @@
> > > >
> > > >  (define_insn "*movsi_internal"
> > > >[(set (match_operand:SI 0 "nonimmediate_operand"
> > > > -"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
> > > > +"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k")
> > > >  (match_operand:SI 1 "general_operand"
> > > >  "g ,re,C ,*y,m  ,*y,*y,r  ,C ,*v,m ,*v,*v,r  ,*r,*km,*k ,CBC"))]
> > > >"!(MEM_P (operands[0]) && MEM_P (operands[1]))"
> > > >
> > > > I'd rather see *k everywhere, also with *movqi_internal and
> > > > *movhi_internal patterns. The "*" means that the allocator won't
> > > > allocate a mask register by default, but it will be used to optimize
> > > > moves. With the above change, you are risking that during integer
> > > > register pressure, the register allocator will allocate zero to a mask
> > > > register, and later "optimize" the move with a direct maskreg-intreg
> > > > move.
> > > >
> > > > The current strategy is that only general registers get allocated for
> > > > integer modes. Let's keep it this way for now.
> > > >
> > >
> > > Yes,  though it would fail gcc.target/i386/avx512dq-pr88465.c and
> > > gcc.target/i386/avx512f-pr88465.c, i think it's more reasonable not to
> > > move zero into mask register directly.
> >
> > Although it would be nice if the register allocator was smart enough,
> > the current strategy is to introduce peephole2 patterns to fix these
> > problems, similar to [1]. These peepholes can be introduced in a
> > follow-up patch.
> >
> > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551744.html
> >
>
> peephole2 added.
>
> > > > Otherwise, the patchset LGTM, but please test the suggested changes and 
> > > > repost.
> > > >
> > > > BTW: Do you plan to remove mask operations from sse.md? ATM, they are
> > > > used to distinguish mask operations, generated from builtins from
> > > > generic operations, so I'd like to keep them for a while. The drawback
> > > > is, that they are not combined with other operations, but at the end
> > > > of the day, this is what the programmer asked for by using builtins.
> > >
> > > Agree, I prefer to keep them.
> >
> > Thinking some more about the approach, it looks to me that the optimal
> > solution is a post-reload splitter that would convert "generic"
> > patterns to mask operations from sse.md. The mask operations don't set
> > flags, so we can substantially improve post reload scheduling of these
> > instructions by removing flags clobber.
> >
> > So, simply add "#" to relevant alternatives of logic patterns and add
> > something like:
> >
> > --cut here--
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index 41c6dbfa668..ad49bdc7583 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -1470,6 +1470,18 @@
> >]
> >(const_string "")))])
> >
> > +(define_split
> > +  [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand")
> > +   (any_logic:SWI1248_AVX512BW
> > + (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand")
> > + (match_operand:SWI1248_AVX512BW 2 "mask_reg_operand")))
> > +   (clobber (reg:CC FLAGS_REG))]
> > +  "TARGET_AVX512F && reload_completed"
> > +  [(parallel
> > + [(set (match_dup 0)
> > +  

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-20 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 19, 2020 at 3:05 PM Uros Bizjak  wrote:
>
> On Wed, Aug 19, 2020 at 4:25 AM Hongtao Liu  wrote:
> >
> > On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak  wrote:
> > >
> > > On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu  wrote:
> > > >
> > > > Enable operator or/xor/and/andn/not for mask register, kxnor is not
> > > > enabled since there's no corresponding instruction for general
> > > > registers.
> > > >
> > > > gcc/
> > > > PR target/88808
> > > > * config/i386/i386.md: (*movsi_internal): Adjust constraints
> > > > for mask registers.
> > > > (*movhi_internal): Ditto.
> > > > (*movqi_internal): Ditto.
> > > > (*anddi_1): Support mask register operations
> > > > (*and_1): Ditto.
> > > > (*andqi_1): Ditto.
> > > > (*andn_1): Ditto.
> > > > (*_1): Ditto.
> > > > (*qi_1): Ditto.
> > > > (*one_cmpl2_1): Ditto.
> > > > (*one_cmplsi2_1_zext): Ditto.
> > > > (*one_cmplqi2_1): Ditto.
> > > >
> > > > gcc/testsuite/
> > > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > >
> > > index 74d207c3711..e8ad79d1b0a 100644
> > > --- a/gcc/config/i386/i386.md
> > > +++ b/gcc/config/i386/i386.md
> > > @@ -2294,7 +2294,7 @@
> > >
> > >  (define_insn "*movsi_internal"
> > >[(set (match_operand:SI 0 "nonimmediate_operand"
> > > -"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
> > > +"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k")
> > >  (match_operand:SI 1 "general_operand"
> > >  "g ,re,C ,*y,m  ,*y,*y,r  ,C ,*v,m ,*v,*v,r  ,*r,*km,*k ,CBC"))]
> > >"!(MEM_P (operands[0]) && MEM_P (operands[1]))"
> > >
> > > I'd rather see *k everywhere, also with *movqi_internal and
> > > *movhi_internal patterns. The "*" means that the allocator won't
> > > allocate a mask register by default, but it will be used to optimize
> > > moves. With the above change, you are risking that during integer
> > > register pressure, the register allocator will allocate zero to a mask
> > > register, and later "optimize" the move with a direct maskreg-intreg
> > > move.
> > >
> > > The current strategy is that only general registers get allocated for
> > > integer modes. Let's keep it this way for now.
> > >
> >
> > Yes,  though it would fail gcc.target/i386/avx512dq-pr88465.c and
> > gcc.target/i386/avx512f-pr88465.c, i think it's more reasonable not to
> > move zero into mask register directly.
>
> Although it would be nice if the register allocator was smart enough,
> the current strategy is to introduce peephole2 patterns to fix these
> problems, similar to [1]. These peepholes can be introduced in a
> follow-up patch.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551744.html
>

peephole2 added.

> > > Otherwise, the patchset LGTM, but please test the suggested changes and 
> > > repost.
> > >
> > > BTW: Do you plan to remove mask operations from sse.md? ATM, they are
> > > used to distinguish mask operations, generated from builtins from
> > > generic operations, so I'd like to keep them for a while. The drawback
> > > is, that they are not combined with other operations, but at the end
> > > of the day, this is what the programmer asked for by using builtins.
> >
> > Agree, I prefer to keep them.
>
> Thinking some more about the approach, it looks to me that the optimal
> solution is a post-reload splitter that would convert "generic"
> patterns to mask operations from sse.md. The mask operations don't set
> flags, so we can substantially improve post reload scheduling of these
> instructions by removing flags clobber.
>
> So, simply add "#" to relevant alternatives of logic patterns and add
> something like:
>
> --cut here--
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 41c6dbfa668..ad49bdc7583 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -1470,6 +1470,18 @@
>]
>(const_string "")))])
>
> +(define_split
> +  [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand")
> +   (any_logic:SWI1248_AVX512BW
> + (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand")
> + (match_operand:SWI1248_AVX512BW 2 "mask_reg_operand")))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "TARGET_AVX512F && reload_completed"
> +  [(parallel
> + [(set (match_dup 0)
> +  (any_logic:SWI1248_AVX512BW (match_dup 1) (match_dup 2)))
> +  (unspec [(const_int 0)] UNSPEC_MASKOP)])])
> +
>  (define_insn "kandn"
>[(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k")
> (and:SWI1248_AVX512BW
> --cut here--
>
> and similar for kandn and knot in 

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-19 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 19, 2020 at 4:25 AM Hongtao Liu  wrote:
>
> On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak  wrote:
> >
> > On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu  wrote:
> > >
> > > Enable operator or/xor/and/andn/not for mask register, kxnor is not
> > > enabled since there's no corresponding instruction for general
> > > registers.
> > >
> > > gcc/
> > > PR target/88808
> > > * config/i386/i386.md: (*movsi_internal): Adjust constraints
> > > for mask registers.
> > > (*movhi_internal): Ditto.
> > > (*movqi_internal): Ditto.
> > > (*anddi_1): Support mask register operations
> > > (*and_1): Ditto.
> > > (*andqi_1): Ditto.
> > > (*andn_1): Ditto.
> > > (*_1): Ditto.
> > > (*qi_1): Ditto.
> > > (*one_cmpl2_1): Ditto.
> > > (*one_cmplsi2_1_zext): Ditto.
> > > (*one_cmplqi2_1): Ditto.
> > >
> > > gcc/testsuite/
> > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> >
> > index 74d207c3711..e8ad79d1b0a 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -2294,7 +2294,7 @@
> >
> >  (define_insn "*movsi_internal"
> >[(set (match_operand:SI 0 "nonimmediate_operand"
> > -"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
> > +"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k")
> >  (match_operand:SI 1 "general_operand"
> >  "g ,re,C ,*y,m  ,*y,*y,r  ,C ,*v,m ,*v,*v,r  ,*r,*km,*k ,CBC"))]
> >"!(MEM_P (operands[0]) && MEM_P (operands[1]))"
> >
> > I'd rather see *k everywhere, also with *movqi_internal and
> > *movhi_internal patterns. The "*" means that the allocator won't
> > allocate a mask register by default, but it will be used to optimize
> > moves. With the above change, you are risking that during integer
> > register pressure, the register allocator will allocate zero to a mask
> > register, and later "optimize" the move with a direct maskreg-intreg
> > move.
> >
> > The current strategy is that only general registers get allocated for
> > integer modes. Let's keep it this way for now.
> >
>
> Yes,  though it would fail gcc.target/i386/avx512dq-pr88465.c and
> gcc.target/i386/avx512f-pr88465.c, i think it's more reasonable not to
> move zero into mask register directly.

Although it would be nice if the register allocator was smart enough,
the current strategy is to introduce peephole2 patterns to fix these
problems, similar to [1]. These peepholes can be introduced in a
follow-up patch.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551744.html

> > Otherwise, the patchset LGTM, but please test the suggested changes and 
> > repost.
> >
> > BTW: Do you plan to remove mask operations from sse.md? ATM, they are
> > used to distinguish mask operations, generated from builtins from
> > generic operations, so I'd like to keep them for a while. The drawback
> > is, that they are not combined with other operations, but at the end
> > of the day, this is what the programmer asked for by using builtins.
>
> Agree, I prefer to keep them.

Thinking some more about the approach, it looks to me that the optimal
solution is a post-reload splitter that would convert "generic"
patterns to mask operations from sse.md. The mask operations don't set
flags, so we can substantially improve post reload scheduling of these
instructions by removing flags clobber.

So, simply add "#" to relevant alternatives of logic patterns and add
something like:

--cut here--
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 41c6dbfa668..ad49bdc7583 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1470,6 +1470,18 @@
   ]
   (const_string "")))])

+(define_split
+  [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand")
+   (any_logic:SWI1248_AVX512BW
+ (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand")
+ (match_operand:SWI1248_AVX512BW 2 "mask_reg_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F && reload_completed"
+  [(parallel
+ [(set (match_dup 0)
+  (any_logic:SWI1248_AVX512BW (match_dup 1) (match_dup 2)))
+  (unspec [(const_int 0)] UNSPEC_MASKOP)])])
+
 (define_insn "kandn"
   [(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k")
(and:SWI1248_AVX512BW
--cut here--

and similar for kandn and knot in sse.md. You will have to add
mask_reg_operand predicate, see e.g. sse_reg_operand in predicates.md
for example.

We don't lose anything, because all important transformations,
propagations and simplifications with these patterns happen before
reload.

Uros.


Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-18 Thread Hongtao Liu via Gcc-patches
On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak  wrote:
>
> On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu  wrote:
> >
> > Enable operator or/xor/and/andn/not for mask register, kxnor is not
> > enabled since there's no corresponding instruction for general
> > registers.
> >
> > gcc/
> > PR target/88808
> > * config/i386/i386.md: (*movsi_internal): Adjust constraints
> > for mask registers.
> > (*movhi_internal): Ditto.
> > (*movqi_internal): Ditto.
> > (*anddi_1): Support mask register operations
> > (*and_1): Ditto.
> > (*andqi_1): Ditto.
> > (*andn_1): Ditto.
> > (*_1): Ditto.
> > (*qi_1): Ditto.
> > (*one_cmpl2_1): Ditto.
> > (*one_cmplsi2_1_zext): Ditto.
> > (*one_cmplqi2_1): Ditto.
> >
> > gcc/testsuite/
> > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
>
> index 74d207c3711..e8ad79d1b0a 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -2294,7 +2294,7 @@
>
>  (define_insn "*movsi_internal"
>[(set (match_operand:SI 0 "nonimmediate_operand"
> -"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
> +"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k")
>  (match_operand:SI 1 "general_operand"
>  "g ,re,C ,*y,m  ,*y,*y,r  ,C ,*v,m ,*v,*v,r  ,*r,*km,*k ,CBC"))]
>"!(MEM_P (operands[0]) && MEM_P (operands[1]))"
>
> I'd rather see *k everywhere, also with *movqi_internal and
> *movhi_internal patterns. The "*" means that the allocator won't
> allocate a mask register by default, but it will be used to optimize
> moves. With the above change, you are risking that during integer
> register pressure, the register allocator will allocate zero to a mask
> register, and later "optimize" the move with a direct maskreg-intreg
> move.
>
> The current strategy is that only general registers get allocated for
> integer modes. Let's keep it this way for now.
>

Yes,  though it would fail gcc.target/i386/avx512dq-pr88465.c and
gcc.target/i386/avx512f-pr88465.c, i think it's more reasonable not to
move zero into mask register directly.

> Otherwise, the patchset LGTM, but please test the suggested changes and 
> repost.
>
> BTW: Do you plan to remove mask operations from sse.md? ATM, they are
> used to distinguish mask operations, generated from builtins from
> generic operations, so I'd like to keep them for a while. The drawback
> is, that they are not combined with other operations, but at the end
> of the day, this is what the programmer asked for by using builtins.

Agree, I prefer to keep them.

>
> Uros.

Bootstrap is ok, regression test is ok for i386/x86-64 backend(After
adjusting testcase).

impact for SPEC2017 on SKL.

500.perlbench_r 0.00%
502.gcc_r 1.59%
505.mcf_r 1.49%
520.omnetpp_r 1.91%
523.xalancbmk_r -1.22%
525.x264_r 0.00%
531.deepsjeng_r 0.00%
541.leela_r -0.22%
548.exchange2_r 2.27%
557.xz_r 0.63%
INT geomean 0.64%

503.bwaves_r 3.68%
507.cactuBSSN_r -0.62%
508.namd_r 0.51%
510.parest_r -0.16%
511.povray_r 0.57%
519.lbm_r 0.50%
521.wrf_r 0.00%
526.blender_r 0.00%
527.cam4_r 0.00%
538.imagick_r -0.41%
544.nab_r 0.00%
549.fotonik3d_r -0.20%
554.roms_r 4.19%
FP geomean 0.66%

-- 
BR,
Hongtao
From e546516449ec4ed9301b83a063efdefbf0f7e75a Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Thu, 13 Aug 2020 14:20:43 +0800
Subject: [PATCH 4/4] Enable bitwise operation for type mask.

Enable operator or/xor/and/andn/not for mask register, kxnor is not
enabled since there's no corresponding instruction for general
registers.

gcc/
	PR target/88808
	* config/i386/i386.md: (*movsi_internal): Adjust constraints
	for mask registers.
	(*movhi_internal): Ditto.
	(*movqi_internal): Ditto.
	(*anddi_1): Support mask register operations
	(*and_1): Ditto.
	(*andqi_1): Ditto.
	(*andn_1): Ditto.
	(*_1): Ditto.
	(*qi_1): Ditto.
	(*one_cmpl2_1): Ditto.
	(*one_cmplsi2_1_zext): Ditto.
	(*one_cmplqi2_1): Ditto.

gcc/testsuite/
	* gcc.target/i386/bitwise_mask_op-1.c: New test.
	* gcc.target/i386/bitwise_mask_op-2.c: New test.
	* gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
	* gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
	* gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
	* gcc.target/i386/avx512f-kmovw-5.c: Ditto.
	* gcc.target/i386/avx512bw-pr88465.c: Ditto.
	* gcc.target/i386/avx512f-pr88465.c: Ditto.
---
 gcc/config/i386/i386.md   | 260 +-
 .../gcc.target/i386/avx512bw-kunpckwd-1.c |   2 +-
 .../gcc.target/i386/avx512bw-kunpckwd-3.c |   2 +-
 .../gcc.target/i386/avx512dq-kmovb-5.c|   2 +-
 .../gcc.target/i386/avx512dq-pr88465.c|   4 +-
 .../gcc.target/i386/avx512f-kmovw-5.c |  

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-17 Thread Uros Bizjak via Gcc-patches
On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu  wrote:
>
> Enable operator or/xor/and/andn/not for mask register, kxnor is not
> enabled since there's no corresponding instruction for general
> registers.
>
> gcc/
> PR target/88808
> * config/i386/i386.md: (*movsi_internal): Adjust constraints
> for mask registers.
> (*movhi_internal): Ditto.
> (*movqi_internal): Ditto.
> (*anddi_1): Support mask register operations
> (*and_1): Ditto.
> (*andqi_1): Ditto.
> (*andn_1): Ditto.
> (*_1): Ditto.
> (*qi_1): Ditto.
> (*one_cmpl2_1): Ditto.
> (*one_cmplsi2_1_zext): Ditto.
> (*one_cmplqi2_1): Ditto.
>
> gcc/testsuite/
> * gcc.target/i386/bitwise_mask_op-1.c: New test.
> * gcc.target/i386/bitwise_mask_op-2.c: New test.
> * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> * gcc.target/i386/avx512f-kmovw-5.c: Ditto.

index 74d207c3711..e8ad79d1b0a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2294,7 +2294,7 @@

 (define_insn "*movsi_internal"
   [(set (match_operand:SI 0 "nonimmediate_operand"
-"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
+"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k")
 (match_operand:SI 1 "general_operand"
 "g ,re,C ,*y,m  ,*y,*y,r  ,C ,*v,m ,*v,*v,r  ,*r,*km,*k ,CBC"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"

I'd rather see *k everywhere, also with *movqi_internal and
*movhi_internal patterns. The "*" means that the allocator won't
allocate a mask register by default, but it will be used to optimize
moves. With the above change, you are risking that during integer
register pressure, the register allocator will allocate zero to a mask
register, and later "optimize" the move with a direct maskreg-intreg
move.

The current strategy is that only general registers get allocated for
integer modes. Let's keep it this way for now.

Otherwise, the patchset LGTM, but please test the suggested changes and repost.

BTW: Do you plan to remove mask operations from sse.md? ATM, they are
used to distinguish mask operations, generated from builtins from
generic operations, so I'd like to keep them for a while. The drawback
is, that they are not combined with other operations, but at the end
of the day, this is what the programmer asked for by using builtins.

Uros.