Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.
On Fri, Aug 21, 2020 at 6:29 PM Hongtao Liu wrote: > > On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak wrote: > > > > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu wrote: > > > > > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak wrote: > > > > > > > > > > > gcc/ > > > > > > > PR target/88808 > > > > > > > * config/i386/i386.c (ix86_preferred_reload_class): Allow > > > > > > > QImode data go into mask registers. > > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust > > > > > > > constraints > > > > > > > for mask registers. > > > > > > > (*movqi_internal): Ditto. > > > > > > > (*anddi_1): Support mask register operations > > > > > > > (*and_1): Ditto. > > > > > > > (*andqi_1): Ditto. > > > > > > > (*andn_1): Ditto. > > > > > > > (*_1): Ditto. > > > > > > > (*qi_1): Ditto. > > > > > > > (*one_cmpl2_1): Ditto. > > > > > > > (*one_cmplsi2_1_zext): Ditto. > > > > > > > (*one_cmplqi2_1): Ditto. > > > > > > > (define_peephole2): Move constant 0/-1 directly into mask > > > > > > > registers. > > > > > > > * config/i386/predicates.md (mask_reg_operand): New > > > > > > > predicate. > > > > > > > * config/i386/sse.md (define_split): Add post-reload > > > > > > > splitters > > > > > > > that would convert "generic" patterns to mask patterns. > > > > > > > (*knotsi_1_zext): New define_insn. > > > > > > > > > > > > > > gcc/testsuite/ > > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test. > > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test. > > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test. > > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase. > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase. > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. > > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. > > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto. > > > > > > > > > > > > A little nit, please put new splitters after the instruction > > > > > > pattern. > > > > > > > > > > > > OK for the whole patch set with the above change, > > > > > > > > > > > > > > > > Yes, thanks for the review. > > > > > > > > Please note that your patch introduces several testsuite fails with > > > > -m32: > > > > > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c > > > > > > > > > > I can't reproduce this failure. > > > > Because you are running it on AVX512 enabled target. > > > > > > Program received signal SIGILL, Illegal instruction. > > > > 0x080490ac in __get_cpuid_count (__edx=, > > > > __ecx=, __ebx=, __eax= > > > pointer>, > > > > __subleaf=0, __leaf=7) at > > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316 > > > > 316 __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, > > > > *__edx); > > > > > > > >0x080490a3 <+51>:cpuid > > > >0x080490a5 <+53>:mov$0x1,%eax > > > >0x080490aa <+58>:mov%ecx,%esi > > > > => 0x080490ac <+60>:kmovd %ebx,%k0 > > > >0x080490b0 <+64>:mov%edi,%ecx > > > >0x080490b2 <+66>:mov%edi,%ebx > > > > > > > > kmov insn is generated for __cpuid_count function, where the binary > > > > determines, if the new instructions are supported. The binary will > > > > crash in the detection code if the processor lacks AVX512 > > > > instructions. > > > > > > > > > > IMHO, the testcase shouldn't be run on processors without AVX512BW. > > > > No, it could run, because it checks for AVX512BW at runtime. > > > > Got it. > > > > Because in avx512bitalg-vpopcntb-1.c, there's /* > > > dg-require-effective-target avx512bw } */. > > > > This is to check the toolchain for support. > > > > > what's the version of your assembler? > > > > GNU assembler version 2.34-4.fc32 > > > > If assembler supports avx512bw, but processor not, the test would pass > condition `dg-require-effective-target avx512bw` and be runned. > then crashed for illegal instruction. Please look at avx512-check.h. This is where main function lives. > > Please add something like > > X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK > > and enable them only for m_CORE_AVX512 (or perhaps m_INTEL). > > > > Handle this in inline_secondary_memory_needed to reject direct moves > > for all other targets. This should disable direct moves for generic > > targets. > > > > Yes, I'll add it. Thanks. Uros.
Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.
On Sat, Aug 22, 2020 at 1:08 AM H.J. Lu wrote: > > On Fri, Aug 21, 2020 at 10:02 AM H.J. Lu wrote: > > > > On Fri, Aug 21, 2020 at 9:46 AM Hongtao Liu wrote: > > > > > > On Sat, Aug 22, 2020 at 12:36 AM H.J. Lu wrote: > > > > > > > > On Fri, Aug 21, 2020 at 9:29 AM Hongtao Liu wrote: > > > > > > > > > > On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak > > > > > wrote: > > > > > > > > > > > > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu > > > > > > wrote: > > > > > > > > > > > > > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > gcc/ > > > > > > > > > > > PR target/88808 > > > > > > > > > > > * config/i386/i386.c > > > > > > > > > > > (ix86_preferred_reload_class): Allow > > > > > > > > > > > QImode data go into mask registers. > > > > > > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust > > > > > > > > > > > constraints > > > > > > > > > > > for mask registers. > > > > > > > > > > > (*movqi_internal): Ditto. > > > > > > > > > > > (*anddi_1): Support mask register operations > > > > > > > > > > > (*and_1): Ditto. > > > > > > > > > > > (*andqi_1): Ditto. > > > > > > > > > > > (*andn_1): Ditto. > > > > > > > > > > > (*_1): Ditto. > > > > > > > > > > > (*qi_1): Ditto. > > > > > > > > > > > (*one_cmpl2_1): Ditto. > > > > > > > > > > > (*one_cmplsi2_1_zext): Ditto. > > > > > > > > > > > (*one_cmplqi2_1): Ditto. > > > > > > > > > > > (define_peephole2): Move constant 0/-1 directly > > > > > > > > > > > into mask > > > > > > > > > > > registers. > > > > > > > > > > > * config/i386/predicates.md (mask_reg_operand): > > > > > > > > > > > New predicate. > > > > > > > > > > > * config/i386/sse.md (define_split): Add > > > > > > > > > > > post-reload splitters > > > > > > > > > > > that would convert "generic" patterns to mask > > > > > > > > > > > patterns. > > > > > > > > > > > (*knotsi_1_zext): New define_insn. > > > > > > > > > > > > > > > > > > > > > > gcc/testsuite/ > > > > > > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test. > > > > > > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test. > > > > > > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test. > > > > > > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New > > > > > > > > > > > testcase. > > > > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust > > > > > > > > > > > testcase. > > > > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. > > > > > > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. > > > > > > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto. > > > > > > > > > > > > > > > > > > > > A little nit, please put new splitters after the > > > > > > > > > > instruction pattern. > > > > > > > > > > > > > > > > > > > > OK for the whole patch set with the above change, > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, thanks for the review. > > > > > > > > > > > > > > > > Please note that your patch introduces several testsuite fails > > > > > > > > with -m32: > > > > > > > > > > > > > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g > > > > > > > > avx512bitalg-vpopcntb-1.c > > > > > > > > > > > > > > > > > > > > > > I can't reproduce this failure. > > > > > > > > > > > > Because you are running it on AVX512 enabled target. > > > > > > > > > > > > > > Program received signal SIGILL, Illegal instruction. > > > > > > > > 0x080490ac in __get_cpuid_count (__edx=, > > > > > > > > __ecx=, __ebx=, > > > > > > > > __eax= > > > > > > > pointer>, > > > > > > > > __subleaf=0, __leaf=7) at > > > > > > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316 > > > > > > > > 316 __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, > > > > > > > > *__ecx, *__edx); > > > > > > > > > > > > > > > >0x080490a3 <+51>:cpuid > > > > > > > >0x080490a5 <+53>:mov$0x1,%eax > > > > > > > >0x080490aa <+58>:mov%ecx,%esi > > > > > > > > => 0x080490ac <+60>:kmovd %ebx,%k0 > > > > > > > >0x080490b0 <+64>:mov%edi,%ecx > > > > > > > >0x080490b2 <+66>:mov%edi,%ebx > > > > > > > > > > > > > > > > kmov insn is generated for __cpuid_count function, where the > > > > > > > > binary > > > > > > > > determines, if the new instructions are supported. The binary > > > > > > > > will > > > > > > > > crash in the detection code if the processor lacks AVX512 > > > > > > > > instructions. > > > > > > > > > > > > > > > > > > > > > > IMHO, the testcase shouldn't be run on processors without > > > > > > > AVX512BW. > > > > > > > > > > > > No, it could run, because it checks for AVX512BW at runtime. > > > > > > > > > > > > > > > > Got it. > > > > > > > > > > > > Because in avx512bitalg-vpopcntb-1.c, there's /* > > > > > > > dg-require-effective-target
Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.
On Fri, Aug 21, 2020 at 10:02 AM H.J. Lu wrote: > > On Fri, Aug 21, 2020 at 9:46 AM Hongtao Liu wrote: > > > > On Sat, Aug 22, 2020 at 12:36 AM H.J. Lu wrote: > > > > > > On Fri, Aug 21, 2020 at 9:29 AM Hongtao Liu wrote: > > > > > > > > On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak wrote: > > > > > > > > > > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu > > > > > wrote: > > > > > > > > > > > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak > > > > > > wrote: > > > > > > > > > > > > > > > > > gcc/ > > > > > > > > > > PR target/88808 > > > > > > > > > > * config/i386/i386.c (ix86_preferred_reload_class): > > > > > > > > > > Allow > > > > > > > > > > QImode data go into mask registers. > > > > > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust > > > > > > > > > > constraints > > > > > > > > > > for mask registers. > > > > > > > > > > (*movqi_internal): Ditto. > > > > > > > > > > (*anddi_1): Support mask register operations > > > > > > > > > > (*and_1): Ditto. > > > > > > > > > > (*andqi_1): Ditto. > > > > > > > > > > (*andn_1): Ditto. > > > > > > > > > > (*_1): Ditto. > > > > > > > > > > (*qi_1): Ditto. > > > > > > > > > > (*one_cmpl2_1): Ditto. > > > > > > > > > > (*one_cmplsi2_1_zext): Ditto. > > > > > > > > > > (*one_cmplqi2_1): Ditto. > > > > > > > > > > (define_peephole2): Move constant 0/-1 directly > > > > > > > > > > into mask > > > > > > > > > > registers. > > > > > > > > > > * config/i386/predicates.md (mask_reg_operand): New > > > > > > > > > > predicate. > > > > > > > > > > * config/i386/sse.md (define_split): Add > > > > > > > > > > post-reload splitters > > > > > > > > > > that would convert "generic" patterns to mask > > > > > > > > > > patterns. > > > > > > > > > > (*knotsi_1_zext): New define_insn. > > > > > > > > > > > > > > > > > > > > gcc/testsuite/ > > > > > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test. > > > > > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test. > > > > > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test. > > > > > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase. > > > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust > > > > > > > > > > testcase. > > > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. > > > > > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. > > > > > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto. > > > > > > > > > > > > > > > > > > A little nit, please put new splitters after the instruction > > > > > > > > > pattern. > > > > > > > > > > > > > > > > > > OK for the whole patch set with the above change, > > > > > > > > > > > > > > > > > > > > > > > > > Yes, thanks for the review. > > > > > > > > > > > > > > Please note that your patch introduces several testsuite fails > > > > > > > with -m32: > > > > > > > > > > > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g > > > > > > > avx512bitalg-vpopcntb-1.c > > > > > > > > > > > > > > > > > > > I can't reproduce this failure. > > > > > > > > > > Because you are running it on AVX512 enabled target. > > > > > > > > > > > > Program received signal SIGILL, Illegal instruction. > > > > > > > 0x080490ac in __get_cpuid_count (__edx=, > > > > > > > __ecx=, __ebx=, > > > > > > > __eax= > > > > > > pointer>, > > > > > > > __subleaf=0, __leaf=7) at > > > > > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316 > > > > > > > 316 __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, > > > > > > > *__ecx, *__edx); > > > > > > > > > > > > > >0x080490a3 <+51>:cpuid > > > > > > >0x080490a5 <+53>:mov$0x1,%eax > > > > > > >0x080490aa <+58>:mov%ecx,%esi > > > > > > > => 0x080490ac <+60>:kmovd %ebx,%k0 > > > > > > >0x080490b0 <+64>:mov%edi,%ecx > > > > > > >0x080490b2 <+66>:mov%edi,%ebx > > > > > > > > > > > > > > kmov insn is generated for __cpuid_count function, where the > > > > > > > binary > > > > > > > determines, if the new instructions are supported. The binary will > > > > > > > crash in the detection code if the processor lacks AVX512 > > > > > > > instructions. > > > > > > > > > > > > > > > > > > > IMHO, the testcase shouldn't be run on processors without AVX512BW. > > > > > > > > > > No, it could run, because it checks for AVX512BW at runtime. > > > > > > > > > > > > > Got it. > > > > > > > > > > Because in avx512bitalg-vpopcntb-1.c, there's /* > > > > > > dg-require-effective-target avx512bw } */. > > > > > > > > > > This is to check the toolchain for support. > > > > > > > > > > > what's the version of your assembler? > > > > > > > > > > GNU assembler version 2.34-4.fc32 > > > > > > > > > > > > > If assembler supports avx512bw, but processor not, the test would pass > > > > condition `dg-require-effective-target
Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.
On Fri, Aug 21, 2020 at 9:46 AM Hongtao Liu wrote: > > On Sat, Aug 22, 2020 at 12:36 AM H.J. Lu wrote: > > > > On Fri, Aug 21, 2020 at 9:29 AM Hongtao Liu wrote: > > > > > > On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak wrote: > > > > > > > > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu wrote: > > > > > > > > > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak wrote: > > > > > > > > > > > > > > > gcc/ > > > > > > > > > PR target/88808 > > > > > > > > > * config/i386/i386.c (ix86_preferred_reload_class): > > > > > > > > > Allow > > > > > > > > > QImode data go into mask registers. > > > > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust > > > > > > > > > constraints > > > > > > > > > for mask registers. > > > > > > > > > (*movqi_internal): Ditto. > > > > > > > > > (*anddi_1): Support mask register operations > > > > > > > > > (*and_1): Ditto. > > > > > > > > > (*andqi_1): Ditto. > > > > > > > > > (*andn_1): Ditto. > > > > > > > > > (*_1): Ditto. > > > > > > > > > (*qi_1): Ditto. > > > > > > > > > (*one_cmpl2_1): Ditto. > > > > > > > > > (*one_cmplsi2_1_zext): Ditto. > > > > > > > > > (*one_cmplqi2_1): Ditto. > > > > > > > > > (define_peephole2): Move constant 0/-1 directly into > > > > > > > > > mask > > > > > > > > > registers. > > > > > > > > > * config/i386/predicates.md (mask_reg_operand): New > > > > > > > > > predicate. > > > > > > > > > * config/i386/sse.md (define_split): Add post-reload > > > > > > > > > splitters > > > > > > > > > that would convert "generic" patterns to mask > > > > > > > > > patterns. > > > > > > > > > (*knotsi_1_zext): New define_insn. > > > > > > > > > > > > > > > > > > gcc/testsuite/ > > > > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test. > > > > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test. > > > > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test. > > > > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase. > > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust > > > > > > > > > testcase. > > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. > > > > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. > > > > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto. > > > > > > > > > > > > > > > > A little nit, please put new splitters after the instruction > > > > > > > > pattern. > > > > > > > > > > > > > > > > OK for the whole patch set with the above change, > > > > > > > > > > > > > > > > > > > > > > Yes, thanks for the review. > > > > > > > > > > > > Please note that your patch introduces several testsuite fails with > > > > > > -m32: > > > > > > > > > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c > > > > > > > > > > > > > > > > I can't reproduce this failure. > > > > > > > > Because you are running it on AVX512 enabled target. > > > > > > > > > > Program received signal SIGILL, Illegal instruction. > > > > > > 0x080490ac in __get_cpuid_count (__edx=, > > > > > > __ecx=, __ebx=, > > > > > > __eax= > > > > > pointer>, > > > > > > __subleaf=0, __leaf=7) at > > > > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316 > > > > > > 316 __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, > > > > > > *__edx); > > > > > > > > > > > >0x080490a3 <+51>:cpuid > > > > > >0x080490a5 <+53>:mov$0x1,%eax > > > > > >0x080490aa <+58>:mov%ecx,%esi > > > > > > => 0x080490ac <+60>:kmovd %ebx,%k0 > > > > > >0x080490b0 <+64>:mov%edi,%ecx > > > > > >0x080490b2 <+66>:mov%edi,%ebx > > > > > > > > > > > > kmov insn is generated for __cpuid_count function, where the binary > > > > > > determines, if the new instructions are supported. The binary will > > > > > > crash in the detection code if the processor lacks AVX512 > > > > > > instructions. > > > > > > > > > > > > > > > > IMHO, the testcase shouldn't be run on processors without AVX512BW. > > > > > > > > No, it could run, because it checks for AVX512BW at runtime. > > > > > > > > > > Got it. > > > > > > > > Because in avx512bitalg-vpopcntb-1.c, there's /* > > > > > dg-require-effective-target avx512bw } */. > > > > > > > > This is to check the toolchain for support. > > > > > > > > > what's the version of your assembler? > > > > > > > > GNU assembler version 2.34-4.fc32 > > > > > > > > > > If assembler supports avx512bw, but processor not, the test would pass > > > condition `dg-require-effective-target avx512bw` and be runned. > > > then crashed for illegal instruction. > > > > > > > Please add something like > > > > X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK > > > > and enable them only for m_CORE_AVX512 (or perhaps m_INTEL). > > > > > > > > Handle this in inline_secondary_memory_needed to reject
Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.
On Sat, Aug 22, 2020 at 12:36 AM H.J. Lu wrote: > > On Fri, Aug 21, 2020 at 9:29 AM Hongtao Liu wrote: > > > > On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak wrote: > > > > > > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu wrote: > > > > > > > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak wrote: > > > > > > > > > > > > > gcc/ > > > > > > > > PR target/88808 > > > > > > > > * config/i386/i386.c (ix86_preferred_reload_class): > > > > > > > > Allow > > > > > > > > QImode data go into mask registers. > > > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust > > > > > > > > constraints > > > > > > > > for mask registers. > > > > > > > > (*movqi_internal): Ditto. > > > > > > > > (*anddi_1): Support mask register operations > > > > > > > > (*and_1): Ditto. > > > > > > > > (*andqi_1): Ditto. > > > > > > > > (*andn_1): Ditto. > > > > > > > > (*_1): Ditto. > > > > > > > > (*qi_1): Ditto. > > > > > > > > (*one_cmpl2_1): Ditto. > > > > > > > > (*one_cmplsi2_1_zext): Ditto. > > > > > > > > (*one_cmplqi2_1): Ditto. > > > > > > > > (define_peephole2): Move constant 0/-1 directly into > > > > > > > > mask > > > > > > > > registers. > > > > > > > > * config/i386/predicates.md (mask_reg_operand): New > > > > > > > > predicate. > > > > > > > > * config/i386/sse.md (define_split): Add post-reload > > > > > > > > splitters > > > > > > > > that would convert "generic" patterns to mask patterns. > > > > > > > > (*knotsi_1_zext): New define_insn. > > > > > > > > > > > > > > > > gcc/testsuite/ > > > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test. > > > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test. > > > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test. > > > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase. > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust > > > > > > > > testcase. > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. > > > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. > > > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto. > > > > > > > > > > > > > > A little nit, please put new splitters after the instruction > > > > > > > pattern. > > > > > > > > > > > > > > OK for the whole patch set with the above change, > > > > > > > > > > > > > > > > > > > Yes, thanks for the review. > > > > > > > > > > Please note that your patch introduces several testsuite fails with > > > > > -m32: > > > > > > > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c > > > > > > > > > > > > > I can't reproduce this failure. > > > > > > Because you are running it on AVX512 enabled target. > > > > > > > > Program received signal SIGILL, Illegal instruction. > > > > > 0x080490ac in __get_cpuid_count (__edx=, > > > > > __ecx=, __ebx=, __eax= > > > > pointer>, > > > > > __subleaf=0, __leaf=7) at > > > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316 > > > > > 316 __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, > > > > > *__edx); > > > > > > > > > >0x080490a3 <+51>:cpuid > > > > >0x080490a5 <+53>:mov$0x1,%eax > > > > >0x080490aa <+58>:mov%ecx,%esi > > > > > => 0x080490ac <+60>:kmovd %ebx,%k0 > > > > >0x080490b0 <+64>:mov%edi,%ecx > > > > >0x080490b2 <+66>:mov%edi,%ebx > > > > > > > > > > kmov insn is generated for __cpuid_count function, where the binary > > > > > determines, if the new instructions are supported. The binary will > > > > > crash in the detection code if the processor lacks AVX512 > > > > > instructions. > > > > > > > > > > > > > IMHO, the testcase shouldn't be run on processors without AVX512BW. > > > > > > No, it could run, because it checks for AVX512BW at runtime. > > > > > > > Got it. > > > > > > Because in avx512bitalg-vpopcntb-1.c, there's /* > > > > dg-require-effective-target avx512bw } */. > > > > > > This is to check the toolchain for support. > > > > > > > what's the version of your assembler? > > > > > > GNU assembler version 2.34-4.fc32 > > > > > > > If assembler supports avx512bw, but processor not, the test would pass > > condition `dg-require-effective-target avx512bw` and be runned. > > then crashed for illegal instruction. > > > > > Please add something like > > > X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK > > > and enable them only for m_CORE_AVX512 (or perhaps m_INTEL). > > > > > > Handle this in inline_secondary_memory_needed to reject direct moves > > > for all other targets. This should disable direct moves for generic > > > targets. > > > > > > > Yes, I'll add it. > > > > > (define_insn "*movsi_internal" > [(set (match_operand:SI 0 "nonimmediate_operand" > "=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k") >
Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.
On Fri, Aug 21, 2020 at 9:35 AM H.J. Lu wrote: > > On Fri, Aug 21, 2020 at 9:29 AM Hongtao Liu wrote: > > > > On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak wrote: > > > > > > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu wrote: > > > > > > > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak wrote: > > > > > > > > > > > > > gcc/ > > > > > > > > PR target/88808 > > > > > > > > * config/i386/i386.c (ix86_preferred_reload_class): > > > > > > > > Allow > > > > > > > > QImode data go into mask registers. > > > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust > > > > > > > > constraints > > > > > > > > for mask registers. > > > > > > > > (*movqi_internal): Ditto. > > > > > > > > (*anddi_1): Support mask register operations > > > > > > > > (*and_1): Ditto. > > > > > > > > (*andqi_1): Ditto. > > > > > > > > (*andn_1): Ditto. > > > > > > > > (*_1): Ditto. > > > > > > > > (*qi_1): Ditto. > > > > > > > > (*one_cmpl2_1): Ditto. > > > > > > > > (*one_cmplsi2_1_zext): Ditto. > > > > > > > > (*one_cmplqi2_1): Ditto. > > > > > > > > (define_peephole2): Move constant 0/-1 directly into > > > > > > > > mask > > > > > > > > registers. > > > > > > > > * config/i386/predicates.md (mask_reg_operand): New > > > > > > > > predicate. > > > > > > > > * config/i386/sse.md (define_split): Add post-reload > > > > > > > > splitters > > > > > > > > that would convert "generic" patterns to mask patterns. > > > > > > > > (*knotsi_1_zext): New define_insn. > > > > > > > > > > > > > > > > gcc/testsuite/ > > > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test. > > > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test. > > > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test. > > > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase. > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust > > > > > > > > testcase. > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. > > > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. > > > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto. > > > > > > > > > > > > > > A little nit, please put new splitters after the instruction > > > > > > > pattern. > > > > > > > > > > > > > > OK for the whole patch set with the above change, > > > > > > > > > > > > > > > > > > > Yes, thanks for the review. > > > > > > > > > > Please note that your patch introduces several testsuite fails with > > > > > -m32: > > > > > > > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c > > > > > > > > > > > > > I can't reproduce this failure. > > > > > > Because you are running it on AVX512 enabled target. > > > > > > > > Program received signal SIGILL, Illegal instruction. > > > > > 0x080490ac in __get_cpuid_count (__edx=, > > > > > __ecx=, __ebx=, __eax= > > > > pointer>, > > > > > __subleaf=0, __leaf=7) at > > > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316 > > > > > 316 __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, > > > > > *__edx); > > > > > > > > > >0x080490a3 <+51>:cpuid > > > > >0x080490a5 <+53>:mov$0x1,%eax > > > > >0x080490aa <+58>:mov%ecx,%esi > > > > > => 0x080490ac <+60>:kmovd %ebx,%k0 > > > > >0x080490b0 <+64>:mov%edi,%ecx > > > > >0x080490b2 <+66>:mov%edi,%ebx > > > > > > > > > > kmov insn is generated for __cpuid_count function, where the binary > > > > > determines, if the new instructions are supported. The binary will > > > > > crash in the detection code if the processor lacks AVX512 > > > > > instructions. > > > > > > > > > > > > > IMHO, the testcase shouldn't be run on processors without AVX512BW. > > > > > > No, it could run, because it checks for AVX512BW at runtime. > > > > > > > Got it. > > > > > > Because in avx512bitalg-vpopcntb-1.c, there's /* > > > > dg-require-effective-target avx512bw } */. > > > > > > This is to check the toolchain for support. > > > > > > > what's the version of your assembler? > > > > > > GNU assembler version 2.34-4.fc32 > > > > > > > If assembler supports avx512bw, but processor not, the test would pass > > condition `dg-require-effective-target avx512bw` and be runned. > > then crashed for illegal instruction. > > > > > Please add something like > > > X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK > > > and enable them only for m_CORE_AVX512 (or perhaps m_INTEL). > > > > > > Handle this in inline_secondary_memory_needed to reject direct moves > > > for all other targets. This should disable direct moves for generic > > > targets. > > > > > > > Yes, I'll add it. > > > > > (define_insn "*movsi_internal" > [(set (match_operand:SI 0 "nonimmediate_operand" > "=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k") >
Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.
On Fri, Aug 21, 2020 at 9:29 AM Hongtao Liu wrote: > > On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak wrote: > > > > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu wrote: > > > > > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak wrote: > > > > > > > > > > > gcc/ > > > > > > > PR target/88808 > > > > > > > * config/i386/i386.c (ix86_preferred_reload_class): Allow > > > > > > > QImode data go into mask registers. > > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust > > > > > > > constraints > > > > > > > for mask registers. > > > > > > > (*movqi_internal): Ditto. > > > > > > > (*anddi_1): Support mask register operations > > > > > > > (*and_1): Ditto. > > > > > > > (*andqi_1): Ditto. > > > > > > > (*andn_1): Ditto. > > > > > > > (*_1): Ditto. > > > > > > > (*qi_1): Ditto. > > > > > > > (*one_cmpl2_1): Ditto. > > > > > > > (*one_cmplsi2_1_zext): Ditto. > > > > > > > (*one_cmplqi2_1): Ditto. > > > > > > > (define_peephole2): Move constant 0/-1 directly into mask > > > > > > > registers. > > > > > > > * config/i386/predicates.md (mask_reg_operand): New > > > > > > > predicate. > > > > > > > * config/i386/sse.md (define_split): Add post-reload > > > > > > > splitters > > > > > > > that would convert "generic" patterns to mask patterns. > > > > > > > (*knotsi_1_zext): New define_insn. > > > > > > > > > > > > > > gcc/testsuite/ > > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test. > > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test. > > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test. > > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase. > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase. > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. > > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. > > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto. > > > > > > > > > > > > A little nit, please put new splitters after the instruction > > > > > > pattern. > > > > > > > > > > > > OK for the whole patch set with the above change, > > > > > > > > > > > > > > > > Yes, thanks for the review. > > > > > > > > Please note that your patch introduces several testsuite fails with > > > > -m32: > > > > > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c > > > > > > > > > > I can't reproduce this failure. > > > > Because you are running it on AVX512 enabled target. > > > > > > Program received signal SIGILL, Illegal instruction. > > > > 0x080490ac in __get_cpuid_count (__edx=, > > > > __ecx=, __ebx=, __eax= > > > pointer>, > > > > __subleaf=0, __leaf=7) at > > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316 > > > > 316 __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, > > > > *__edx); > > > > > > > >0x080490a3 <+51>:cpuid > > > >0x080490a5 <+53>:mov$0x1,%eax > > > >0x080490aa <+58>:mov%ecx,%esi > > > > => 0x080490ac <+60>:kmovd %ebx,%k0 > > > >0x080490b0 <+64>:mov%edi,%ecx > > > >0x080490b2 <+66>:mov%edi,%ebx > > > > > > > > kmov insn is generated for __cpuid_count function, where the binary > > > > determines, if the new instructions are supported. The binary will > > > > crash in the detection code if the processor lacks AVX512 > > > > instructions. > > > > > > > > > > IMHO, the testcase shouldn't be run on processors without AVX512BW. > > > > No, it could run, because it checks for AVX512BW at runtime. > > > > Got it. > > > > Because in avx512bitalg-vpopcntb-1.c, there's /* > > > dg-require-effective-target avx512bw } */. > > > > This is to check the toolchain for support. > > > > > what's the version of your assembler? > > > > GNU assembler version 2.34-4.fc32 > > > > If assembler supports avx512bw, but processor not, the test would pass > condition `dg-require-effective-target avx512bw` and be runned. > then crashed for illegal instruction. > > > Please add something like > > X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK > > and enable them only for m_CORE_AVX512 (or perhaps m_INTEL). > > > > Handle this in inline_secondary_memory_needed to reject direct moves > > for all other targets. This should disable direct moves for generic > > targets. > > > > Yes, I'll add it. > (define_insn "*movsi_internal" [(set (match_operand:SI 0 "nonimmediate_operand" "=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k") (match_operand:SI 1 "general_operand" "g ,re,C ,*y,m ,*y,*y,r ,C ,*v,m ,*v,*v,r ,*r,*km,*k ,CBC"))] "!(MEM_P (operands[0]) && MEM_P (operands[1]))" ... [(set (attr "isa") (cond [(eq_attr "alternative" "12,13") (const_string "sse2") ] (const_string "*"))) is wrong. mask register alternatives should be
Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.
On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak wrote: > > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu wrote: > > > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak wrote: > > > > > > > > > gcc/ > > > > > > PR target/88808 > > > > > > * config/i386/i386.c (ix86_preferred_reload_class): Allow > > > > > > QImode data go into mask registers. > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust constraints > > > > > > for mask registers. > > > > > > (*movqi_internal): Ditto. > > > > > > (*anddi_1): Support mask register operations > > > > > > (*and_1): Ditto. > > > > > > (*andqi_1): Ditto. > > > > > > (*andn_1): Ditto. > > > > > > (*_1): Ditto. > > > > > > (*qi_1): Ditto. > > > > > > (*one_cmpl2_1): Ditto. > > > > > > (*one_cmplsi2_1_zext): Ditto. > > > > > > (*one_cmplqi2_1): Ditto. > > > > > > (define_peephole2): Move constant 0/-1 directly into mask > > > > > > registers. > > > > > > * config/i386/predicates.md (mask_reg_operand): New > > > > > > predicate. > > > > > > * config/i386/sse.md (define_split): Add post-reload > > > > > > splitters > > > > > > that would convert "generic" patterns to mask patterns. > > > > > > (*knotsi_1_zext): New define_insn. > > > > > > > > > > > > gcc/testsuite/ > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test. > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test. > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test. > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase. > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase. > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto. > > > > > > > > > > A little nit, please put new splitters after the instruction pattern. > > > > > > > > > > OK for the whole patch set with the above change, > > > > > > > > > > > > > Yes, thanks for the review. > > > > > > Please note that your patch introduces several testsuite fails with -m32: > > > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c > > > > > > > I can't reproduce this failure. > > Because you are running it on AVX512 enabled target. > > > > Program received signal SIGILL, Illegal instruction. > > > 0x080490ac in __get_cpuid_count (__edx=, > > > __ecx=, __ebx=, __eax= > > pointer>, > > > __subleaf=0, __leaf=7) at > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316 > > > 316 __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, > > > *__edx); > > > > > >0x080490a3 <+51>:cpuid > > >0x080490a5 <+53>:mov$0x1,%eax > > >0x080490aa <+58>:mov%ecx,%esi > > > => 0x080490ac <+60>:kmovd %ebx,%k0 > > >0x080490b0 <+64>:mov%edi,%ecx > > >0x080490b2 <+66>:mov%edi,%ebx > > > > > > kmov insn is generated for __cpuid_count function, where the binary > > > determines, if the new instructions are supported. The binary will > > > crash in the detection code if the processor lacks AVX512 > > > instructions. > > > > > > > IMHO, the testcase shouldn't be run on processors without AVX512BW. > > No, it could run, because it checks for AVX512BW at runtime. > Got it. > > Because in avx512bitalg-vpopcntb-1.c, there's /* > > dg-require-effective-target avx512bw } */. > > This is to check the toolchain for support. > > > what's the version of your assembler? > > GNU assembler version 2.34-4.fc32 > If assembler supports avx512bw, but processor not, the test would pass condition `dg-require-effective-target avx512bw` and be runned. then crashed for illegal instruction. > Please add something like > X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK > and enable them only for m_CORE_AVX512 (or perhaps m_INTEL). > > Handle this in inline_secondary_memory_needed to reject direct moves > for all other targets. This should disable direct moves for generic > targets. > Yes, I'll add it. > Uros. -- BR, Hongtao
Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.
On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu wrote: > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak wrote: > > > > > > > gcc/ > > > > > PR target/88808 > > > > > * config/i386/i386.c (ix86_preferred_reload_class): Allow > > > > > QImode data go into mask registers. > > > > > * config/i386/i386.md: (*movhi_internal): Adjust constraints > > > > > for mask registers. > > > > > (*movqi_internal): Ditto. > > > > > (*anddi_1): Support mask register operations > > > > > (*and_1): Ditto. > > > > > (*andqi_1): Ditto. > > > > > (*andn_1): Ditto. > > > > > (*_1): Ditto. > > > > > (*qi_1): Ditto. > > > > > (*one_cmpl2_1): Ditto. > > > > > (*one_cmplsi2_1_zext): Ditto. > > > > > (*one_cmplqi2_1): Ditto. > > > > > (define_peephole2): Move constant 0/-1 directly into mask > > > > > registers. > > > > > * config/i386/predicates.md (mask_reg_operand): New predicate. > > > > > * config/i386/sse.md (define_split): Add post-reload splitters > > > > > that would convert "generic" patterns to mask patterns. > > > > > (*knotsi_1_zext): New define_insn. > > > > > > > > > > gcc/testsuite/ > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test. > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test. > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test. > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase. > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase. > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto. > > > > > > > > A little nit, please put new splitters after the instruction pattern. > > > > > > > > OK for the whole patch set with the above change, > > > > > > > > > > Yes, thanks for the review. > > > > Please note that your patch introduces several testsuite fails with -m32: > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c > > > > I can't reproduce this failure. Because you are running it on AVX512 enabled target. > > Program received signal SIGILL, Illegal instruction. > > 0x080490ac in __get_cpuid_count (__edx=, > > __ecx=, __ebx=, __eax= > pointer>, > > __subleaf=0, __leaf=7) at > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316 > > 316 __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, *__edx); > > > >0x080490a3 <+51>:cpuid > >0x080490a5 <+53>:mov$0x1,%eax > >0x080490aa <+58>:mov%ecx,%esi > > => 0x080490ac <+60>:kmovd %ebx,%k0 > >0x080490b0 <+64>:mov%edi,%ecx > >0x080490b2 <+66>:mov%edi,%ebx > > > > kmov insn is generated for __cpuid_count function, where the binary > > determines, if the new instructions are supported. The binary will > > crash in the detection code if the processor lacks AVX512 > > instructions. > > > > IMHO, the testcase shouldn't be run on processors without AVX512BW. No, it could run, because it checks for AVX512BW at runtime. > Because in avx512bitalg-vpopcntb-1.c, there's /* > dg-require-effective-target avx512bw } */. This is to check the toolchain for support. > what's the version of your assembler? GNU assembler version 2.34-4.fc32 Please add something like X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK and enable them only for m_CORE_AVX512 (or perhaps m_INTEL). Handle this in inline_secondary_memory_needed to reject direct moves for all other targets. This should disable direct moves for generic targets. Uros.
Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.
On Fri, Aug 21, 2020 at 8:41 AM Hongtao Liu wrote: > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak wrote: > > > > > > > gcc/ > > > > > PR target/88808 > > > > > * config/i386/i386.c (ix86_preferred_reload_class): Allow > > > > > QImode data go into mask registers. > > > > > * config/i386/i386.md: (*movhi_internal): Adjust constraints > > > > > for mask registers. > > > > > (*movqi_internal): Ditto. > > > > > (*anddi_1): Support mask register operations > > > > > (*and_1): Ditto. > > > > > (*andqi_1): Ditto. > > > > > (*andn_1): Ditto. > > > > > (*_1): Ditto. > > > > > (*qi_1): Ditto. > > > > > (*one_cmpl2_1): Ditto. > > > > > (*one_cmplsi2_1_zext): Ditto. > > > > > (*one_cmplqi2_1): Ditto. > > > > > (define_peephole2): Move constant 0/-1 directly into mask > > > > > registers. > > > > > * config/i386/predicates.md (mask_reg_operand): New predicate. > > > > > * config/i386/sse.md (define_split): Add post-reload splitters > > > > > that would convert "generic" patterns to mask patterns. > > > > > (*knotsi_1_zext): New define_insn. > > > > > > > > > > gcc/testsuite/ > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test. > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test. > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test. > > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase. > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase. > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto. > > > > > > > > A little nit, please put new splitters after the instruction pattern. > > > > > > > > OK for the whole patch set with the above change, > > > > > > > > > > Yes, thanks for the review. > > > > Please note that your patch introduces several testsuite fails with -m32: > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c > > > > I can't reproduce this failure. > > > Program received signal SIGILL, Illegal instruction. > > 0x080490ac in __get_cpuid_count (__edx=, > > __ecx=, __ebx=, __eax= > pointer>, > > __subleaf=0, __leaf=7) at > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316 > > 316 __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, *__edx); > > > >0x080490a3 <+51>:cpuid > >0x080490a5 <+53>:mov$0x1,%eax > >0x080490aa <+58>:mov%ecx,%esi > > => 0x080490ac <+60>:kmovd %ebx,%k0 > >0x080490b0 <+64>:mov%edi,%ecx > >0x080490b2 <+66>:mov%edi,%ebx > > > > kmov insn is generated for __cpuid_count function, where the binary > > determines, if the new instructions are supported. The binary will > > crash in the detection code if the processor lacks AVX512 > > instructions. > > > > IMHO, the testcase shouldn't be run on processors without AVX512BW. > Because in avx512bitalg-vpopcntb-1.c, there's /* { > dg-require-effective-target avx512bw } */. > dg-require-effective-target avx512bw checks assembler support. __cpuid_count can't use any mask instructions. Please run this test on CPU without AVX512. -- H.J.
Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.
On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak wrote: > > > > > gcc/ > > > > PR target/88808 > > > > * config/i386/i386.c (ix86_preferred_reload_class): Allow > > > > QImode data go into mask registers. > > > > * config/i386/i386.md: (*movhi_internal): Adjust constraints > > > > for mask registers. > > > > (*movqi_internal): Ditto. > > > > (*anddi_1): Support mask register operations > > > > (*and_1): Ditto. > > > > (*andqi_1): Ditto. > > > > (*andn_1): Ditto. > > > > (*_1): Ditto. > > > > (*qi_1): Ditto. > > > > (*one_cmpl2_1): Ditto. > > > > (*one_cmplsi2_1_zext): Ditto. > > > > (*one_cmplqi2_1): Ditto. > > > > (define_peephole2): Move constant 0/-1 directly into mask > > > > registers. > > > > * config/i386/predicates.md (mask_reg_operand): New predicate. > > > > * config/i386/sse.md (define_split): Add post-reload splitters > > > > that would convert "generic" patterns to mask patterns. > > > > (*knotsi_1_zext): New define_insn. > > > > > > > > gcc/testsuite/ > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test. > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test. > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test. > > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase. > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase. > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto. > > > > > > A little nit, please put new splitters after the instruction pattern. > > > > > > OK for the whole patch set with the above change, > > > > > > > Yes, thanks for the review. > > Please note that your patch introduces several testsuite fails with -m32: > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c > I can't reproduce this failure. > Program received signal SIGILL, Illegal instruction. > 0x080490ac in __get_cpuid_count (__edx=, > __ecx=, __ebx=, __eax= pointer>, > __subleaf=0, __leaf=7) at /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316 > 316 __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, *__edx); > >0x080490a3 <+51>:cpuid >0x080490a5 <+53>:mov$0x1,%eax >0x080490aa <+58>:mov%ecx,%esi > => 0x080490ac <+60>:kmovd %ebx,%k0 >0x080490b0 <+64>:mov%edi,%ecx >0x080490b2 <+66>:mov%edi,%ebx > > kmov insn is generated for __cpuid_count function, where the binary > determines, if the new instructions are supported. The binary will > crash in the detection code if the processor lacks AVX512 > instructions. > IMHO, the testcase shouldn't be run on processors without AVX512BW. Because in avx512bitalg-vpopcntb-1.c, there's /* { dg-require-effective-target avx512bw } */. what's the version of your assembler? > Uros. -- BR, Hongtao
Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.
> > > gcc/ > > > PR target/88808 > > > * config/i386/i386.c (ix86_preferred_reload_class): Allow > > > QImode data go into mask registers. > > > * config/i386/i386.md: (*movhi_internal): Adjust constraints > > > for mask registers. > > > (*movqi_internal): Ditto. > > > (*anddi_1): Support mask register operations > > > (*and_1): Ditto. > > > (*andqi_1): Ditto. > > > (*andn_1): Ditto. > > > (*_1): Ditto. > > > (*qi_1): Ditto. > > > (*one_cmpl2_1): Ditto. > > > (*one_cmplsi2_1_zext): Ditto. > > > (*one_cmplqi2_1): Ditto. > > > (define_peephole2): Move constant 0/-1 directly into mask > > > registers. > > > * config/i386/predicates.md (mask_reg_operand): New predicate. > > > * config/i386/sse.md (define_split): Add post-reload splitters > > > that would convert "generic" patterns to mask patterns. > > > (*knotsi_1_zext): New define_insn. > > > > > > gcc/testsuite/ > > > * gcc.target/i386/bitwise_mask_op-1.c: New test. > > > * gcc.target/i386/bitwise_mask_op-2.c: New test. > > > * gcc.target/i386/bitwise_mask_op-3.c: New test. > > > * gcc.target/i386/avx512bw-pr88465.c: New testcase. > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase. > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto. > > > > A little nit, please put new splitters after the instruction pattern. > > > > OK for the whole patch set with the above change, > > > > Yes, thanks for the review. Please note that your patch introduces several testsuite fails with -m32: gcc -O2 -mavx512bitalg -mavx512bw -m32 -g avx512bitalg-vpopcntb-1.c Program received signal SIGILL, Illegal instruction. 0x080490ac in __get_cpuid_count (__edx=, __ecx=, __ebx=, __eax=, __subleaf=0, __leaf=7) at /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316 316 __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, *__ecx, *__edx); 0x080490a3 <+51>:cpuid 0x080490a5 <+53>:mov$0x1,%eax 0x080490aa <+58>:mov%ecx,%esi => 0x080490ac <+60>:kmovd %ebx,%k0 0x080490b0 <+64>:mov%edi,%ecx 0x080490b2 <+66>:mov%edi,%ebx kmov insn is generated for __cpuid_count function, where the binary determines, if the new instructions are supported. The binary will crash in the detection code if the processor lacks AVX512 instructions. Uros.
Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.
On Thu, Aug 20, 2020 at 3:40 PM Uros Bizjak wrote: > > On Thu, Aug 20, 2020 at 9:31 AM Hongtao Liu wrote: > > > > On Thu, Aug 20, 2020 at 3:24 PM Hongtao Liu wrote: > > > > > > On Wed, Aug 19, 2020 at 3:05 PM Uros Bizjak wrote: > > > > > > > > On Wed, Aug 19, 2020 at 4:25 AM Hongtao Liu wrote: > > > > > > > > > > On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak wrote: > > > > > > > > > > > > On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu > > > > > > wrote: > > > > > > > > > > > > > > Enable operator or/xor/and/andn/not for mask register, kxnor is > > > > > > > not > > > > > > > enabled since there's no corresponding instruction for general > > > > > > > registers. > > > > > > > > > > > > > > gcc/ > > > > > > > PR target/88808 > > > > > > > * config/i386/i386.md: (*movsi_internal): Adjust > > > > > > > constraints > > > > > > > for mask registers. > > > > > > > (*movhi_internal): Ditto. > > > > > > > (*movqi_internal): Ditto. > > > > > > > (*anddi_1): Support mask register operations > > > > > > > (*and_1): Ditto. > > > > > > > (*andqi_1): Ditto. > > > > > > > (*andn_1): Ditto. > > > > > > > (*_1): Ditto. > > > > > > > (*qi_1): Ditto. > > > > > > > (*one_cmpl2_1): Ditto. > > > > > > > (*one_cmplsi2_1_zext): Ditto. > > > > > > > (*one_cmplqi2_1): Ditto. > > > > > > > > > > > > > > gcc/testsuite/ > > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test. > > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test. > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase. > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. > > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. > > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto. > > > > > > > > > > > > index 74d207c3711..e8ad79d1b0a 100644 > > > > > > --- a/gcc/config/i386/i386.md > > > > > > +++ b/gcc/config/i386/i386.md > > > > > > @@ -2294,7 +2294,7 @@ > > > > > > > > > > > > (define_insn "*movsi_internal" > > > > > >[(set (match_operand:SI 0 "nonimmediate_operand" > > > > > > -"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k") > > > > > > +"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k") > > > > > > (match_operand:SI 1 "general_operand" > > > > > > "g ,re,C ,*y,m ,*y,*y,r ,C ,*v,m ,*v,*v,r ,*r,*km,*k > > > > > > ,CBC"))] > > > > > >"!(MEM_P (operands[0]) && MEM_P (operands[1]))" > > > > > > > > > > > > I'd rather see *k everywhere, also with *movqi_internal and > > > > > > *movhi_internal patterns. The "*" means that the allocator won't > > > > > > allocate a mask register by default, but it will be used to optimize > > > > > > moves. With the above change, you are risking that during integer > > > > > > register pressure, the register allocator will allocate zero to a > > > > > > mask > > > > > > register, and later "optimize" the move with a direct maskreg-intreg > > > > > > move. > > > > > > > > > > > > The current strategy is that only general registers get allocated > > > > > > for > > > > > > integer modes. Let's keep it this way for now. > > > > > > > > > > > > > > > > Yes, though it would fail gcc.target/i386/avx512dq-pr88465.c and > > > > > gcc.target/i386/avx512f-pr88465.c, i think it's more reasonable not to > > > > > move zero into mask register directly. > > > > > > > > Although it would be nice if the register allocator was smart enough, > > > > the current strategy is to introduce peephole2 patterns to fix these > > > > problems, similar to [1]. These peepholes can be introduced in a > > > > follow-up patch. > > > > > > > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551744.html > > > > > > > > > > peephole2 added. > > > > > > > > > Otherwise, the patchset LGTM, but please test the suggested changes > > > > > > and repost. > > > > > > > > > > > > BTW: Do you plan to remove mask operations from sse.md? ATM, they > > > > > > are > > > > > > used to distinguish mask operations, generated from builtins from > > > > > > generic operations, so I'd like to keep them for a while. The > > > > > > drawback > > > > > > is, that they are not combined with other operations, but at the end > > > > > > of the day, this is what the programmer asked for by using builtins. > > > > > > > > > > Agree, I prefer to keep them. > > > > > > > > Thinking some more about the approach, it looks to me that the optimal > > > > solution is a post-reload splitter that would convert "generic" > > > > patterns to mask operations from sse.md. The mask operations don't set > > > > flags, so we can substantially improve post reload scheduling of these > > > > instructions by removing flags clobber. > > > > > > > > So, simply add "#" to relevant alternatives of logic patterns and add > > > > something like: > > > > > > > > --cut here-- > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > >
Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.
On Thu, Aug 20, 2020 at 9:31 AM Hongtao Liu wrote: > > On Thu, Aug 20, 2020 at 3:24 PM Hongtao Liu wrote: > > > > On Wed, Aug 19, 2020 at 3:05 PM Uros Bizjak wrote: > > > > > > On Wed, Aug 19, 2020 at 4:25 AM Hongtao Liu wrote: > > > > > > > > On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak wrote: > > > > > > > > > > On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu > > > > > wrote: > > > > > > > > > > > > Enable operator or/xor/and/andn/not for mask register, kxnor is not > > > > > > enabled since there's no corresponding instruction for general > > > > > > registers. > > > > > > > > > > > > gcc/ > > > > > > PR target/88808 > > > > > > * config/i386/i386.md: (*movsi_internal): Adjust constraints > > > > > > for mask registers. > > > > > > (*movhi_internal): Ditto. > > > > > > (*movqi_internal): Ditto. > > > > > > (*anddi_1): Support mask register operations > > > > > > (*and_1): Ditto. > > > > > > (*andqi_1): Ditto. > > > > > > (*andn_1): Ditto. > > > > > > (*_1): Ditto. > > > > > > (*qi_1): Ditto. > > > > > > (*one_cmpl2_1): Ditto. > > > > > > (*one_cmplsi2_1_zext): Ditto. > > > > > > (*one_cmplqi2_1): Ditto. > > > > > > > > > > > > gcc/testsuite/ > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test. > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test. > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase. > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto. > > > > > > > > > > index 74d207c3711..e8ad79d1b0a 100644 > > > > > --- a/gcc/config/i386/i386.md > > > > > +++ b/gcc/config/i386/i386.md > > > > > @@ -2294,7 +2294,7 @@ > > > > > > > > > > (define_insn "*movsi_internal" > > > > >[(set (match_operand:SI 0 "nonimmediate_operand" > > > > > -"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k") > > > > > +"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k") > > > > > (match_operand:SI 1 "general_operand" > > > > > "g ,re,C ,*y,m ,*y,*y,r ,C ,*v,m ,*v,*v,r ,*r,*km,*k ,CBC"))] > > > > >"!(MEM_P (operands[0]) && MEM_P (operands[1]))" > > > > > > > > > > I'd rather see *k everywhere, also with *movqi_internal and > > > > > *movhi_internal patterns. The "*" means that the allocator won't > > > > > allocate a mask register by default, but it will be used to optimize > > > > > moves. With the above change, you are risking that during integer > > > > > register pressure, the register allocator will allocate zero to a mask > > > > > register, and later "optimize" the move with a direct maskreg-intreg > > > > > move. > > > > > > > > > > The current strategy is that only general registers get allocated for > > > > > integer modes. Let's keep it this way for now. > > > > > > > > > > > > > Yes, though it would fail gcc.target/i386/avx512dq-pr88465.c and > > > > gcc.target/i386/avx512f-pr88465.c, i think it's more reasonable not to > > > > move zero into mask register directly. > > > > > > Although it would be nice if the register allocator was smart enough, > > > the current strategy is to introduce peephole2 patterns to fix these > > > problems, similar to [1]. These peepholes can be introduced in a > > > follow-up patch. > > > > > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551744.html > > > > > > > peephole2 added. > > > > > > > Otherwise, the patchset LGTM, but please test the suggested changes > > > > > and repost. > > > > > > > > > > BTW: Do you plan to remove mask operations from sse.md? ATM, they are > > > > > used to distinguish mask operations, generated from builtins from > > > > > generic operations, so I'd like to keep them for a while. The drawback > > > > > is, that they are not combined with other operations, but at the end > > > > > of the day, this is what the programmer asked for by using builtins. > > > > > > > > Agree, I prefer to keep them. > > > > > > Thinking some more about the approach, it looks to me that the optimal > > > solution is a post-reload splitter that would convert "generic" > > > patterns to mask operations from sse.md. The mask operations don't set > > > flags, so we can substantially improve post reload scheduling of these > > > instructions by removing flags clobber. > > > > > > So, simply add "#" to relevant alternatives of logic patterns and add > > > something like: > > > > > > --cut here-- > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > > > index 41c6dbfa668..ad49bdc7583 100644 > > > --- a/gcc/config/i386/sse.md > > > +++ b/gcc/config/i386/sse.md > > > @@ -1470,6 +1470,18 @@ > > >] > > >(const_string "")))]) > > > > > > +(define_split > > > + [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand") > > > + (any_logic:SWI1248_AVX512BW > > > +
Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.
On Thu, Aug 20, 2020 at 3:24 PM Hongtao Liu wrote: > > On Wed, Aug 19, 2020 at 3:05 PM Uros Bizjak wrote: > > > > On Wed, Aug 19, 2020 at 4:25 AM Hongtao Liu wrote: > > > > > > On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak wrote: > > > > > > > > On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu wrote: > > > > > > > > > > Enable operator or/xor/and/andn/not for mask register, kxnor is not > > > > > enabled since there's no corresponding instruction for general > > > > > registers. > > > > > > > > > > gcc/ > > > > > PR target/88808 > > > > > * config/i386/i386.md: (*movsi_internal): Adjust constraints > > > > > for mask registers. > > > > > (*movhi_internal): Ditto. > > > > > (*movqi_internal): Ditto. > > > > > (*anddi_1): Support mask register operations > > > > > (*and_1): Ditto. > > > > > (*andqi_1): Ditto. > > > > > (*andn_1): Ditto. > > > > > (*_1): Ditto. > > > > > (*qi_1): Ditto. > > > > > (*one_cmpl2_1): Ditto. > > > > > (*one_cmplsi2_1_zext): Ditto. > > > > > (*one_cmplqi2_1): Ditto. > > > > > > > > > > gcc/testsuite/ > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test. > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test. > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase. > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto. > > > > > > > > index 74d207c3711..e8ad79d1b0a 100644 > > > > --- a/gcc/config/i386/i386.md > > > > +++ b/gcc/config/i386/i386.md > > > > @@ -2294,7 +2294,7 @@ > > > > > > > > (define_insn "*movsi_internal" > > > >[(set (match_operand:SI 0 "nonimmediate_operand" > > > > -"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k") > > > > +"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k") > > > > (match_operand:SI 1 "general_operand" > > > > "g ,re,C ,*y,m ,*y,*y,r ,C ,*v,m ,*v,*v,r ,*r,*km,*k ,CBC"))] > > > >"!(MEM_P (operands[0]) && MEM_P (operands[1]))" > > > > > > > > I'd rather see *k everywhere, also with *movqi_internal and > > > > *movhi_internal patterns. The "*" means that the allocator won't > > > > allocate a mask register by default, but it will be used to optimize > > > > moves. With the above change, you are risking that during integer > > > > register pressure, the register allocator will allocate zero to a mask > > > > register, and later "optimize" the move with a direct maskreg-intreg > > > > move. > > > > > > > > The current strategy is that only general registers get allocated for > > > > integer modes. Let's keep it this way for now. > > > > > > > > > > Yes, though it would fail gcc.target/i386/avx512dq-pr88465.c and > > > gcc.target/i386/avx512f-pr88465.c, i think it's more reasonable not to > > > move zero into mask register directly. > > > > Although it would be nice if the register allocator was smart enough, > > the current strategy is to introduce peephole2 patterns to fix these > > problems, similar to [1]. These peepholes can be introduced in a > > follow-up patch. > > > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551744.html > > > > peephole2 added. > > > > > Otherwise, the patchset LGTM, but please test the suggested changes and > > > > repost. > > > > > > > > BTW: Do you plan to remove mask operations from sse.md? ATM, they are > > > > used to distinguish mask operations, generated from builtins from > > > > generic operations, so I'd like to keep them for a while. The drawback > > > > is, that they are not combined with other operations, but at the end > > > > of the day, this is what the programmer asked for by using builtins. > > > > > > Agree, I prefer to keep them. > > > > Thinking some more about the approach, it looks to me that the optimal > > solution is a post-reload splitter that would convert "generic" > > patterns to mask operations from sse.md. The mask operations don't set > > flags, so we can substantially improve post reload scheduling of these > > instructions by removing flags clobber. > > > > So, simply add "#" to relevant alternatives of logic patterns and add > > something like: > > > > --cut here-- > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > > index 41c6dbfa668..ad49bdc7583 100644 > > --- a/gcc/config/i386/sse.md > > +++ b/gcc/config/i386/sse.md > > @@ -1470,6 +1470,18 @@ > >] > >(const_string "")))]) > > > > +(define_split > > + [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand") > > + (any_logic:SWI1248_AVX512BW > > + (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand") > > + (match_operand:SWI1248_AVX512BW 2 "mask_reg_operand"))) > > + (clobber (reg:CC FLAGS_REG))] > > + "TARGET_AVX512F && reload_completed" > > + [(parallel > > + [(set (match_dup 0) > > +
Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.
On Wed, Aug 19, 2020 at 3:05 PM Uros Bizjak wrote: > > On Wed, Aug 19, 2020 at 4:25 AM Hongtao Liu wrote: > > > > On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak wrote: > > > > > > On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu wrote: > > > > > > > > Enable operator or/xor/and/andn/not for mask register, kxnor is not > > > > enabled since there's no corresponding instruction for general > > > > registers. > > > > > > > > gcc/ > > > > PR target/88808 > > > > * config/i386/i386.md: (*movsi_internal): Adjust constraints > > > > for mask registers. > > > > (*movhi_internal): Ditto. > > > > (*movqi_internal): Ditto. > > > > (*anddi_1): Support mask register operations > > > > (*and_1): Ditto. > > > > (*andqi_1): Ditto. > > > > (*andn_1): Ditto. > > > > (*_1): Ditto. > > > > (*qi_1): Ditto. > > > > (*one_cmpl2_1): Ditto. > > > > (*one_cmplsi2_1_zext): Ditto. > > > > (*one_cmplqi2_1): Ditto. > > > > > > > > gcc/testsuite/ > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test. > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test. > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase. > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto. > > > > > > index 74d207c3711..e8ad79d1b0a 100644 > > > --- a/gcc/config/i386/i386.md > > > +++ b/gcc/config/i386/i386.md > > > @@ -2294,7 +2294,7 @@ > > > > > > (define_insn "*movsi_internal" > > >[(set (match_operand:SI 0 "nonimmediate_operand" > > > -"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k") > > > +"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k") > > > (match_operand:SI 1 "general_operand" > > > "g ,re,C ,*y,m ,*y,*y,r ,C ,*v,m ,*v,*v,r ,*r,*km,*k ,CBC"))] > > >"!(MEM_P (operands[0]) && MEM_P (operands[1]))" > > > > > > I'd rather see *k everywhere, also with *movqi_internal and > > > *movhi_internal patterns. The "*" means that the allocator won't > > > allocate a mask register by default, but it will be used to optimize > > > moves. With the above change, you are risking that during integer > > > register pressure, the register allocator will allocate zero to a mask > > > register, and later "optimize" the move with a direct maskreg-intreg > > > move. > > > > > > The current strategy is that only general registers get allocated for > > > integer modes. Let's keep it this way for now. > > > > > > > Yes, though it would fail gcc.target/i386/avx512dq-pr88465.c and > > gcc.target/i386/avx512f-pr88465.c, i think it's more reasonable not to > > move zero into mask register directly. > > Although it would be nice if the register allocator was smart enough, > the current strategy is to introduce peephole2 patterns to fix these > problems, similar to [1]. These peepholes can be introduced in a > follow-up patch. > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551744.html > peephole2 added. > > > Otherwise, the patchset LGTM, but please test the suggested changes and > > > repost. > > > > > > BTW: Do you plan to remove mask operations from sse.md? ATM, they are > > > used to distinguish mask operations, generated from builtins from > > > generic operations, so I'd like to keep them for a while. The drawback > > > is, that they are not combined with other operations, but at the end > > > of the day, this is what the programmer asked for by using builtins. > > > > Agree, I prefer to keep them. > > Thinking some more about the approach, it looks to me that the optimal > solution is a post-reload splitter that would convert "generic" > patterns to mask operations from sse.md. The mask operations don't set > flags, so we can substantially improve post reload scheduling of these > instructions by removing flags clobber. > > So, simply add "#" to relevant alternatives of logic patterns and add > something like: > > --cut here-- > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > index 41c6dbfa668..ad49bdc7583 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -1470,6 +1470,18 @@ >] >(const_string "")))]) > > +(define_split > + [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand") > + (any_logic:SWI1248_AVX512BW > + (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand") > + (match_operand:SWI1248_AVX512BW 2 "mask_reg_operand"))) > + (clobber (reg:CC FLAGS_REG))] > + "TARGET_AVX512F && reload_completed" > + [(parallel > + [(set (match_dup 0) > + (any_logic:SWI1248_AVX512BW (match_dup 1) (match_dup 2))) > + (unspec [(const_int 0)] UNSPEC_MASKOP)])]) > + > (define_insn "kandn" >[(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k") > (and:SWI1248_AVX512BW > --cut here-- > > and similar for kandn and knot in
Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.
On Wed, Aug 19, 2020 at 4:25 AM Hongtao Liu wrote: > > On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak wrote: > > > > On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu wrote: > > > > > > Enable operator or/xor/and/andn/not for mask register, kxnor is not > > > enabled since there's no corresponding instruction for general > > > registers. > > > > > > gcc/ > > > PR target/88808 > > > * config/i386/i386.md: (*movsi_internal): Adjust constraints > > > for mask registers. > > > (*movhi_internal): Ditto. > > > (*movqi_internal): Ditto. > > > (*anddi_1): Support mask register operations > > > (*and_1): Ditto. > > > (*andqi_1): Ditto. > > > (*andn_1): Ditto. > > > (*_1): Ditto. > > > (*qi_1): Ditto. > > > (*one_cmpl2_1): Ditto. > > > (*one_cmplsi2_1_zext): Ditto. > > > (*one_cmplqi2_1): Ditto. > > > > > > gcc/testsuite/ > > > * gcc.target/i386/bitwise_mask_op-1.c: New test. > > > * gcc.target/i386/bitwise_mask_op-2.c: New test. > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase. > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto. > > > > index 74d207c3711..e8ad79d1b0a 100644 > > --- a/gcc/config/i386/i386.md > > +++ b/gcc/config/i386/i386.md > > @@ -2294,7 +2294,7 @@ > > > > (define_insn "*movsi_internal" > >[(set (match_operand:SI 0 "nonimmediate_operand" > > -"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k") > > +"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k") > > (match_operand:SI 1 "general_operand" > > "g ,re,C ,*y,m ,*y,*y,r ,C ,*v,m ,*v,*v,r ,*r,*km,*k ,CBC"))] > >"!(MEM_P (operands[0]) && MEM_P (operands[1]))" > > > > I'd rather see *k everywhere, also with *movqi_internal and > > *movhi_internal patterns. The "*" means that the allocator won't > > allocate a mask register by default, but it will be used to optimize > > moves. With the above change, you are risking that during integer > > register pressure, the register allocator will allocate zero to a mask > > register, and later "optimize" the move with a direct maskreg-intreg > > move. > > > > The current strategy is that only general registers get allocated for > > integer modes. Let's keep it this way for now. > > > > Yes, though it would fail gcc.target/i386/avx512dq-pr88465.c and > gcc.target/i386/avx512f-pr88465.c, i think it's more reasonable not to > move zero into mask register directly. Although it would be nice if the register allocator was smart enough, the current strategy is to introduce peephole2 patterns to fix these problems, similar to [1]. These peepholes can be introduced in a follow-up patch. [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551744.html > > Otherwise, the patchset LGTM, but please test the suggested changes and > > repost. > > > > BTW: Do you plan to remove mask operations from sse.md? ATM, they are > > used to distinguish mask operations, generated from builtins from > > generic operations, so I'd like to keep them for a while. The drawback > > is, that they are not combined with other operations, but at the end > > of the day, this is what the programmer asked for by using builtins. > > Agree, I prefer to keep them. Thinking some more about the approach, it looks to me that the optimal solution is a post-reload splitter that would convert "generic" patterns to mask operations from sse.md. The mask operations don't set flags, so we can substantially improve post reload scheduling of these instructions by removing flags clobber. So, simply add "#" to relevant alternatives of logic patterns and add something like: --cut here-- diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 41c6dbfa668..ad49bdc7583 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -1470,6 +1470,18 @@ ] (const_string "")))]) +(define_split + [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand") + (any_logic:SWI1248_AVX512BW + (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand") + (match_operand:SWI1248_AVX512BW 2 "mask_reg_operand"))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_AVX512F && reload_completed" + [(parallel + [(set (match_dup 0) + (any_logic:SWI1248_AVX512BW (match_dup 1) (match_dup 2))) + (unspec [(const_int 0)] UNSPEC_MASKOP)])]) + (define_insn "kandn" [(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k") (and:SWI1248_AVX512BW --cut here-- and similar for kandn and knot in sse.md. You will have to add mask_reg_operand predicate, see e.g. sse_reg_operand in predicates.md for example. We don't lose anything, because all important transformations, propagations and simplifications with these patterns happen before reload. Uros.
Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.
On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak wrote: > > On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu wrote: > > > > Enable operator or/xor/and/andn/not for mask register, kxnor is not > > enabled since there's no corresponding instruction for general > > registers. > > > > gcc/ > > PR target/88808 > > * config/i386/i386.md: (*movsi_internal): Adjust constraints > > for mask registers. > > (*movhi_internal): Ditto. > > (*movqi_internal): Ditto. > > (*anddi_1): Support mask register operations > > (*and_1): Ditto. > > (*andqi_1): Ditto. > > (*andn_1): Ditto. > > (*_1): Ditto. > > (*qi_1): Ditto. > > (*one_cmpl2_1): Ditto. > > (*one_cmplsi2_1_zext): Ditto. > > (*one_cmplqi2_1): Ditto. > > > > gcc/testsuite/ > > * gcc.target/i386/bitwise_mask_op-1.c: New test. > > * gcc.target/i386/bitwise_mask_op-2.c: New test. > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase. > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto. > > index 74d207c3711..e8ad79d1b0a 100644 > --- a/gcc/config/i386/i386.md > +++ b/gcc/config/i386/i386.md > @@ -2294,7 +2294,7 @@ > > (define_insn "*movsi_internal" >[(set (match_operand:SI 0 "nonimmediate_operand" > -"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k") > +"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k") > (match_operand:SI 1 "general_operand" > "g ,re,C ,*y,m ,*y,*y,r ,C ,*v,m ,*v,*v,r ,*r,*km,*k ,CBC"))] >"!(MEM_P (operands[0]) && MEM_P (operands[1]))" > > I'd rather see *k everywhere, also with *movqi_internal and > *movhi_internal patterns. The "*" means that the allocator won't > allocate a mask register by default, but it will be used to optimize > moves. With the above change, you are risking that during integer > register pressure, the register allocator will allocate zero to a mask > register, and later "optimize" the move with a direct maskreg-intreg > move. > > The current strategy is that only general registers get allocated for > integer modes. Let's keep it this way for now. > Yes, though it would fail gcc.target/i386/avx512dq-pr88465.c and gcc.target/i386/avx512f-pr88465.c, i think it's more reasonable not to move zero into mask register directly. > Otherwise, the patchset LGTM, but please test the suggested changes and > repost. > > BTW: Do you plan to remove mask operations from sse.md? ATM, they are > used to distinguish mask operations, generated from builtins from > generic operations, so I'd like to keep them for a while. The drawback > is, that they are not combined with other operations, but at the end > of the day, this is what the programmer asked for by using builtins. Agree, I prefer to keep them. > > Uros. Bootstrap is ok, regression test is ok for i386/x86-64 backend(After adjusting testcase). impact for SPEC2017 on SKL. 500.perlbench_r 0.00% 502.gcc_r 1.59% 505.mcf_r 1.49% 520.omnetpp_r 1.91% 523.xalancbmk_r -1.22% 525.x264_r 0.00% 531.deepsjeng_r 0.00% 541.leela_r -0.22% 548.exchange2_r 2.27% 557.xz_r 0.63% INT geomean 0.64% 503.bwaves_r 3.68% 507.cactuBSSN_r -0.62% 508.namd_r 0.51% 510.parest_r -0.16% 511.povray_r 0.57% 519.lbm_r 0.50% 521.wrf_r 0.00% 526.blender_r 0.00% 527.cam4_r 0.00% 538.imagick_r -0.41% 544.nab_r 0.00% 549.fotonik3d_r -0.20% 554.roms_r 4.19% FP geomean 0.66% -- BR, Hongtao From e546516449ec4ed9301b83a063efdefbf0f7e75a Mon Sep 17 00:00:00 2001 From: liuhongt Date: Thu, 13 Aug 2020 14:20:43 +0800 Subject: [PATCH 4/4] Enable bitwise operation for type mask. Enable operator or/xor/and/andn/not for mask register, kxnor is not enabled since there's no corresponding instruction for general registers. gcc/ PR target/88808 * config/i386/i386.md: (*movsi_internal): Adjust constraints for mask registers. (*movhi_internal): Ditto. (*movqi_internal): Ditto. (*anddi_1): Support mask register operations (*and_1): Ditto. (*andqi_1): Ditto. (*andn_1): Ditto. (*_1): Ditto. (*qi_1): Ditto. (*one_cmpl2_1): Ditto. (*one_cmplsi2_1_zext): Ditto. (*one_cmplqi2_1): Ditto. gcc/testsuite/ * gcc.target/i386/bitwise_mask_op-1.c: New test. * gcc.target/i386/bitwise_mask_op-2.c: New test. * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase. * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. * gcc.target/i386/avx512f-kmovw-5.c: Ditto. * gcc.target/i386/avx512bw-pr88465.c: Ditto. * gcc.target/i386/avx512f-pr88465.c: Ditto. --- gcc/config/i386/i386.md | 260 +- .../gcc.target/i386/avx512bw-kunpckwd-1.c | 2 +- .../gcc.target/i386/avx512bw-kunpckwd-3.c | 2 +- .../gcc.target/i386/avx512dq-kmovb-5.c| 2 +- .../gcc.target/i386/avx512dq-pr88465.c| 4 +- .../gcc.target/i386/avx512f-kmovw-5.c |
Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.
On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu wrote: > > Enable operator or/xor/and/andn/not for mask register, kxnor is not > enabled since there's no corresponding instruction for general > registers. > > gcc/ > PR target/88808 > * config/i386/i386.md: (*movsi_internal): Adjust constraints > for mask registers. > (*movhi_internal): Ditto. > (*movqi_internal): Ditto. > (*anddi_1): Support mask register operations > (*and_1): Ditto. > (*andqi_1): Ditto. > (*andn_1): Ditto. > (*_1): Ditto. > (*qi_1): Ditto. > (*one_cmpl2_1): Ditto. > (*one_cmplsi2_1_zext): Ditto. > (*one_cmplqi2_1): Ditto. > > gcc/testsuite/ > * gcc.target/i386/bitwise_mask_op-1.c: New test. > * gcc.target/i386/bitwise_mask_op-2.c: New test. > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase. > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. > * gcc.target/i386/avx512f-kmovw-5.c: Ditto. index 74d207c3711..e8ad79d1b0a 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -2294,7 +2294,7 @@ (define_insn "*movsi_internal" [(set (match_operand:SI 0 "nonimmediate_operand" -"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k") +"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k") (match_operand:SI 1 "general_operand" "g ,re,C ,*y,m ,*y,*y,r ,C ,*v,m ,*v,*v,r ,*r,*km,*k ,CBC"))] "!(MEM_P (operands[0]) && MEM_P (operands[1]))" I'd rather see *k everywhere, also with *movqi_internal and *movhi_internal patterns. The "*" means that the allocator won't allocate a mask register by default, but it will be used to optimize moves. With the above change, you are risking that during integer register pressure, the register allocator will allocate zero to a mask register, and later "optimize" the move with a direct maskreg-intreg move. The current strategy is that only general registers get allocated for integer modes. Let's keep it this way for now. Otherwise, the patchset LGTM, but please test the suggested changes and repost. BTW: Do you plan to remove mask operations from sse.md? ATM, they are used to distinguish mask operations, generated from builtins from generic operations, so I'd like to keep them for a while. The drawback is, that they are not combined with other operations, but at the end of the day, this is what the programmer asked for by using builtins. Uros.