[Bug testsuite/116635] new test case gcc.dg/opt-ordered-and-nonequal-1.c from r15-3463-g91421e21e8f0f0 fails

2024-09-09 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116635

--- Comment #7 from Hu Lin  ---
Thanks for the explanation.

[Bug testsuite/116635] new test case gcc.dg/opt-ordered-and-nonequal-1.c from r15-3463-g91421e21e8f0f0 fails

2024-09-09 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116635

--- Comment #3 from Hu Lin  ---
(In reply to Thomas Schwinge from comment #2)
> 
> "Match: Fix ordered and nonequal: Fix 'gcc.dg/opt-ordered-and-nonequal-1.c'
> re 'LOGICAL_OP_NON_SHORT_CIRCUIT' [PR116635]"

Thanks, the option is valid, https://godbolt.org/z/7xqEKTn7Y.

[Bug other/116635] new test case gcc.dg/opt-ordered-and-nonequal-1.c from r15-3463-g91421e21e8f0f0 fails

2024-09-08 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116635

--- Comment #1 from Hu Lin  ---
According to the results from https://godbolt.org/z/eKnvraP8T and
https://godbolt.org/z/G6MTWKf4P, certain options such as -march=armv8-m.base
and -mtune=cortex-m23 change the shape of the code produced by the ccp1 pass,
so the optimization no longer applies in the forwprop1 pass.

Since the patch is intended for general optimization, I would prefer not to
impose an architecture-specific restriction for this test.

Do you have any ideas for resolving this issue, or could you include this test
case in the list of stable failures?
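The test's name suggests it checks that an "ordered and not equal" floating-point comparison gets simplified. As background only (the test's actual source is not shown here), this sketch states those semantics in C: the two functions agree for every input, including NaNs and signed zeros. isunordered and islessgreater are the standard C99 comparison macros from <math.h>; the function names are illustrative.

```c
#include <math.h>
#include <stdbool.h>

/* "Ordered and not equal": neither operand is a NaN and they differ.  */
bool
ordered_and_nonequal (double a, double b)
{
  return !isunordered (a, b) && a != b;
}

/* Single-comparison form: true iff a < b or a > b, and false whenever
   either operand is a NaN.  */
bool
less_or_greater (double a, double b)
{
  return islessgreater (a, b);
}
```

Note that 0.0 and -0.0 compare equal, so both functions return false for that pair.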

[Bug testsuite/116608] i386/xorsign.c, i386/vect-double-2.c fail with -march=x86-64-v2(-msse4).

2024-09-05 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116608

Hu Lin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Hu Lin  ---
This has been fixed.

[Bug testsuite/116608] New: i386/xorsign.c, i386/vect-double-2.c fail with -march=x86-64-v2(-msse4).

2024-09-04 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116608

Bug ID: 116608
   Summary: i386/xorsign.c, i386/vect-double-2.c fail with
-march=x86-64-v2(-msse4).
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: lin1.hu at intel dot com
  Target Milestone: ---

gcc.target/i386/xorsign.c and gcc.target/i386/vect-double-2.c fail with
-march=x86-64-v2:

PASS: gcc.target/i386/xorsign.c execution test
gcc.target/i386/xorsign.c: pattern found 0 times
FAIL: gcc.target/i386/xorsign.c scan-tree-dump-times vect "vectorized 2 loops" 1
PASS: gcc.target/i386/xorsign.c scan-assembler [ \t]xor
PASS: gcc.target/i386/xorsign.c scan-assembler [ \t]and


With SSE4 enabled, GCC vectorizes 4 loops instead of the expected 2, so the
scan-tree-dump-times check fails. I want to add -mno-sse4 to the options of
these test cases.
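For context on the passing scan-assembler checks: the xor and and patterns reflect GCC's xorsign transformation, which lowers x * copysign (1.0, y) to operations on the sign bit. The sketch below shows that equivalence in portable C; the function names are illustrative, not from the testsuite, and NaN inputs are not considered.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Source-level form: multiply x by +1.0 or -1.0, taking the sign of y.  */
double
mul_copysign (double x, double y)
{
  return x * copysign (1.0, y);
}

/* Bit-level form: AND isolates y's sign bit, XOR flips x's sign bit
   when y's is set.  */
double
xorsign_bits (double x, double y)
{
  uint64_t xb, yb;
  memcpy (&xb, &x, sizeof xb);
  memcpy (&yb, &y, sizeof yb);
  xb ^= yb & UINT64_C (0x8000000000000000);
  memcpy (&x, &xb, sizeof x);
  return x;
}
```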

[Bug middle-end/115863] [15 Regression] zlib-1.3.1 miscompilation since r15-1936-g80e446e829d818

2024-07-17 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115863

--- Comment #18 from Hu Lin  ---
(In reply to Uroš Bizjak from comment #17)
> (In reply to Hongtao Liu from comment #16)
> > > Unfortunately, x86 has no vector mode .SAT_TRUNC instruction.
> > No, AVX512 supports both signed and unsigned saturation
> Indeed.
> 
> BTW: PACKUSmn (despite the name) is not what we are looking for.

Indeed.

[Bug target/115931] New: mips: vec_pack_usat_m's pattern is wrong at gcc/config/mips/loongson-mmi.md:167

2024-07-14 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115931

Bug ID: 115931
   Summary: mips: vec_pack_usat_m's pattern is wrong at
gcc/config/mips/loongson-mmi.md:167
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: lin1.hu at intel dot com
  Target Milestone: ---

If I understand correctly, MIPS's packuswh (vec_pack_usat_) converts signed
integers to unsigned integers with unsigned saturation, but GCC's us_saturate
means converting unsigned integers to unsigned integers with unsigned
saturation. So the pattern should use an UNSPEC instead.
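To make the distinction concrete, here is a scalar sketch (function names are illustrative) of the two 16-to-8-bit saturating narrowings being contrasted. A PACKUS-style narrowing reads the input as signed and clamps to [0, 255]; an unsigned-to-unsigned saturation reads the same bits as unsigned. They agree on 0x0042 and 0x0100 but differ on 0xFFFF (0 versus 255), which is why an unsigned saturation code cannot describe packuswh, and likewise why a PACKUS-style instruction cannot implement an unsigned-input saturating truncation.

```c
#include <stdint.h>

/* PACKUS-style narrowing: the 16-bit input is signed, and the result
   is clamped to [0, 255].  */
uint8_t
pack_signed_to_u8 (int16_t x)
{
  if (x < 0)
    return 0;
  if (x > UINT8_MAX)
    return UINT8_MAX;
  return (uint8_t) x;
}

/* Unsigned-to-unsigned saturating narrowing: the same 16 bits read as
   unsigned, so only an upper clamp is needed.  */
uint8_t
sat_trunc_u16_to_u8 (uint16_t x)
{
  return x > UINT8_MAX ? UINT8_MAX : (uint8_t) x;
}
```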

[Bug tree-optimization/115753] [15 Regression] ICE: tree check: expected ssa_name, have integer_cst in supportable_indirect_convert_operation, at tree-vect-stmts.cc:14680

2024-07-02 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115753

--- Comment #6 from Hu Lin  ---
(In reply to Andrew Pinski from comment #5)
> Note the correct way to have a testcase that is able to handle float16 is to
> do:
> /* { dg-add-options float16 } */
> /* { dg-require-effective-target float16 } */
> 
> 
> This will allow it to work on 32bit x86 and arm (32bit) too.

Thanks for your advice.

I will add the test case from comment #4 and put it in
gcc/testsuite/gcc.dg/vect/.

[Bug tree-optimization/115753] [15 Regression] ICE: tree check: expected ssa_name, have integer_cst in supportable_indirect_convert_operation, at tree-vect-stmts.cc:14680

2024-07-02 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115753

--- Comment #2 from Hu Lin  ---
Created attachment 58572
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58572&action=edit
Untested fix.

Confirmed. I need to check that the operand's TREE_CODE is SSA_NAME before using SSA_NAME_RANGE_INFO.
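The fix follows the usual guard-before-access pattern for trees: SSA_NAME_RANGE_INFO is only valid on SSA_NAME nodes, and in a GCC built with tree checking, applying it to an INTEGER_CST aborts with the "expected ssa_name, have integer_cst" message seen in the summary. Because GCC internals cannot be shown in a standalone program, the sketch below uses hypothetical mock types that imitate a checked accessor; none of these names exist in GCC.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for GCC trees; these are NOT GCC's types.  */
enum mock_tree_code { MOCK_SSA_NAME, MOCK_INTEGER_CST };

struct mock_range { long min, max; };

struct mock_tree
{
  enum mock_tree_code code;
  const struct mock_range *range_info;  /* meaningful only for SSA names */
};

/* Checked accessor: like a tree-checking build, abort when the node is
   not an SSA name.  */
static const struct mock_range *
mock_ssa_name_range_info (const struct mock_tree *t)
{
  if (t->code != MOCK_SSA_NAME)
    abort ();  /* analogue of "tree check: expected ssa_name" */
  return t->range_info;
}

/* Guarded query: test the node's code first, so a constant operand
   yields "no range info" instead of a checking failure.  */
const struct mock_range *
range_info_or_null (const struct mock_tree *op)
{
  if (op->code == MOCK_SSA_NAME)
    return mock_ssa_name_range_info (op);
  return NULL;
}
```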

[Bug middle-end/115675] [15 Regression] truncv4hiv4qi affect r14-1402-gd8545fb2c71683's optimization.

2024-06-27 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115675

--- Comment #3 from Hu Lin  ---
192t.slp2

Previous:

  char * vectp.10;
  vector(4) char * vectp_a.9;
  short int _1;
  short int _2;
  char _3;
  char _4;
  short int _5;
  short int _6;
  char _7;
  char _8;
  vector(4) char _16;

  [local count: 1073741824]:
  _1 = *b_10(D);
  _2 = _1 >> 8;
  _3 = (char) _2;
  _4 = (char) _1;
  _5 = MEM[(short int *)b_10(D) + 2B];
  _6 = _5 >> 8;
  _7 = (char) _6;
  _8 = (char) _5;
  _16 = {_3, _4, _7, _8};
  vectp.10_17 = a_11(D);
  MEM  [(char *)vectp.10_17] = _16;

Current:

  char * vectp.11;
  vector(4) char * vectp_a.10;
  vector(4) char vect__3.9;
  short int _1;
  short int _2;
  char _3;
  char _4;
  short int _5;
  short int _6;
  char _7;
  char _8;
  vector(4) short int _16;

  [local count: 1073741824]:
  _1 = *b_10(D);
  _2 = _1 >> 8;
  _3 = (char) _2;
  _4 = (char) _1;
  _5 = MEM[(short int *)b_10(D) + 2B];
  _6 = _5 >> 8;
  _16 = {_2, _1, _6, _5};
  vect__3.9_17 = (vector(4) char) _16;
  _7 = (char) _6;
  _8 = (char) _5;
  vectp.11_18 = a_11(D);
  MEM  [(char *)vectp.11_18] = vect__3.9_17;

[Bug middle-end/115675] New: truncv4hiv4qi affect r14-1402-gd8545fb2c71683's optimization.

2024-06-27 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115675

Bug ID: 115675
   Summary: truncv4hiv4qi affect r14-1402-gd8545fb2c71683's
optimization.
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: lin1.hu at intel dot com
  Target Milestone: ---

After r15-1678-ge5f8a39941f6f0, truncv4hiv4qi changes the dump and interferes
with r14-1402-gd8545fb2c71683's optimization.

When I compile pr108938-3.c with -mavx or -mavx512bw -mavx512vl, GCC no longer
generates bswap r32. I've discussed this with Hongtao, and we haven't found an
easy way to fix it yet. I think it might be possible to match the current form
to restore the bswap optimization with -mavx.
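For reference, on a little-endian target the scalar code in the slp2 dumps above (a[0] = b[0] >> 8, a[1] = b[0], a[2] = b[1] >> 8, a[3] = b[1]) is equivalent to one 32-bit load, a 32-bit byte swap, a rotate by 16 and one 32-bit store, which is why a bswap-based sequence can implement it. The sketch below, with illustrative function names, only demonstrates that equivalence; it is not necessarily the exact code GCC emitted before r15-1678. __builtin_bswap32 is available in both GCC and Clang.

```c
#include <stdint.h>
#include <string.h>

/* Scalar form of the dumped code: high byte, then low byte, of each of
   two shorts.  */
void
store_bytes_swapped (char *a, const short *b)
{
  a[0] = (char) (b[0] >> 8);
  a[1] = (char) b[0];
  a[2] = (char) (b[1] >> 8);
  a[3] = (char) b[1];
}

/* Same result on little-endian: 32-bit load, bswap, rotate by 16,
   32-bit store.  */
void
store_bytes_swapped_bswap (char *a, const short *b)
{
  uint32_t x;
  memcpy (&x, b, sizeof x);
  x = __builtin_bswap32 (x);
  x = (x >> 16) | (x << 16);
  memcpy (a, &x, sizeof x);
}
```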

[Bug target/115462] [15 regression] 416.gamess regressed 4-6% on x86_64 since r15-882-g1d6199e5f8c1c0

2024-06-19 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115462

--- Comment #4 from Hu Lin  ---
Created attachment 58470
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58470&action=edit
A short case

I tested the file with
1) -Ofast -flto -march=skylake-avx512 -mfpmath=sse -funroll-loops
2) -O2 -march=native (on an Icelake server)

Both generate the redundant mov instructions.

[Bug target/115462] [15 regression] 416.gamess regressed 4-6% on x86_64 since r15-882-g1d6199e5f8c1c0

2024-06-19 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115462

Hu Lin  changed:

   What|Removed |Added

 CC||lin1.hu at intel dot com

--- Comment #3 from Hu Lin  ---
I looked into the hotspot for this test.

At int2a.F:570 (the listing below is from the preprocessed file int2a.fppized.f), the source code is:

 566   DO 200 K = 1,MAX
 567   MX = NX+KLX(K)
 568   MY = NY+KLY(K)
 569   MZ = NZ+KLZ(K)
 570   N = N1+KLGT(K)
 571   200 GHONDO(N) = ( XIN(MX )*YIN(MY )*ZIN(MZ ) +XIN(MX+625)*YIN(MY+625)*
 572  + ZIN(MZ+625) +XIN(MX+1250)*YIN(MY+1250)*ZIN(MZ+1250) )*D1*
 573  + DKL(K)+GHONDO(N)
.

At this loop's beginning, the original ASM code is  
mov 0x271e3c98(,%rdx,4),%edi
mov 0x271e401c(,%rdx,4),%esi
mov 0x271e43a0(,%rdx,4),%ecx
mov 0x271e3914(,%rdx,4),%r8d
.
But after r15-882-g1d6199e5f8c1c0, the ASM code is
mov $0x27bf6c98, %r10d
mov $0x27bf701c, %r9d
mov $0x27bf73a0, %esi
movl  (%rbx,%rdx,4), %ecx
movl  (%r10,%rdx,4), %edi
movl  (%r9,%rdx,4), %r8d
movl  (%rsi,%rdx,4), %esi
.
Other places besides this loop also have similar extra instructions. These
instructions increase the number of retired instructions by roughly the same
percentage as the regression.
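To make the access pattern easier to see, here is a rough C transliteration of the DO 200 loop. It is a sketch under stated assumptions: the arrays are globals (consistent with the absolute addresses in the original assembly), XIN/YIN/ZIN/DKL/GHONDO are double, KLX/KLY/KLZ/KLGT are 32-bit int (consistent with the 4-byte loads scaled by 4), indices are 0-based, and the array sizes and the function name hondo_inner are hypothetical.

```c
/* Hypothetical sizes; the real arrays in int2a.F may differ.  */
#define BLK 625   /* stride between the three blocks of XIN/YIN/ZIN */
#define MAXK 64

double xin[3 * BLK], yin[3 * BLK], zin[3 * BLK];
double dkl[MAXK], ghondo[4096];
int klx[MAXK], kly[MAXK], klz[MAXK], klgt[MAXK];

/* 0-based transliteration of int2a.F lines 566-573.  */
void
hondo_inner (int max, int nx, int ny, int nz, int n1, double d1)
{
  for (int k = 0; k < max; k++)
    {
      /* Presumably the four index loads at the top of the loop.  */
      int mx = nx + klx[k];
      int my = ny + kly[k];
      int mz = nz + klz[k];
      int n = n1 + klgt[k];
      ghondo[n] = (xin[mx] * yin[my] * zin[mz]
                   + xin[mx + BLK] * yin[my + BLK] * zin[mz + BLK]
                   + xin[mx + 2 * BLK] * yin[my + 2 * BLK]
                     * zin[mz + 2 * BLK])
                  * d1 * dkl[k] + ghondo[n];
    }
}
```

These four index loads are where the addressing changed: before r15-882 each used a 32-bit absolute displacement, while afterwards three of the base addresses are first materialized in registers with mov $imm.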

[Bug target/115029] [14/15 regression] FFT computation performance regression, x86, between gcc-14 and gcc-13 on skylake platform

2024-05-22 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115029

Hu Lin  changed:

   What|Removed |Added

 CC||lin1.hu at intel dot com

--- Comment #3 from Hu Lin  ---
According to my investigation, the regression is about 0.9% on Cascade Lake,
while on Sapphire Rapids GCC 14 shows about a 4% improvement.

[Bug rtl-optimization/115021] [14/15 regression] unnecessary spill for vpternlog

2024-05-20 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021

Hu Lin  changed:

   What|Removed |Added

 CC||lin1.hu at intel dot com

--- Comment #3 from Hu Lin  ---
I found that after commit f55cdce3f8dd8503e080e35be59c5f5390f6d95e, IRA allocates
memory for the third source operand of vpternlog, so the generated code becomes:

        .cfi_startproc
        movl    $4, %eax
        vpsraw  $5, %xmm0, %xmm2
        vpbroadcastb    %eax, %xmm1
        movl    $7, %eax
        vpbroadcastb    %eax, %xmm3
        vmovdqa %xmm1, %xmm0
        vpternlogd      $120, %xmm3, %xmm2, %xmm0
        vmovdqa %xmm3, -24(%rsp)
        vpsubb  %xmm1, %xmm0, %xmm0
        ret

Then 6a67fdcb3f0cc8be47b49ddd246d0c50c3770800 changed the vector type from v16qi
to v4si, so the movv4si can no longer be combined with the vpternlog in
postreload, which gives the result you see now.
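To read the vpternlogd line: AT&T syntax reverses Intel's operand order, so `vpternlogd $120, %xmm3, %xmm2, %xmm0` is `vpternlogd xmm0, xmm2, xmm3, 0x78`, with A = %xmm0 (the broadcast 4), B = %xmm2 (the shifted input) and C = %xmm3 (the broadcast 7, the operand also stored to -24(%rsp)). The bitwise model below (hypothetical name ternlog32) follows the instruction's definition, where each result bit is imm8[(A<<2)|(B<<1)|C]. It confirms that imm8 = 0x78 computes A ^ (B & C), i.e. 4 ^ (B & 7) here, before the vpsubb of 4.

```c
#include <stdint.h>

/* Bitwise model of VPTERNLOG on 32 bits: for each bit position, the
   bits of A, B and C form an index into imm8, and that imm8 bit is the
   result bit.  */
uint32_t
ternlog32 (uint8_t imm, uint32_t a, uint32_t b, uint32_t c)
{
  uint32_t r = 0;
  for (int bit = 0; bit < 32; bit++)
    {
      unsigned idx = (((a >> bit) & 1u) << 2)
                     | (((b >> bit) & 1u) << 1)
                     | ((c >> bit) & 1u);
      r |= (uint32_t) ((imm >> idx) & 1u) << bit;
    }
  return r;
}
```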

[Bug target/54174] Missed optimization: Unnecessary vmovaps generated for __builtin_ia32_vextractf128_ps256(v, 0)

2024-05-15 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54174

Hu Lin  changed:

   What|Removed |Added

 CC||lin1.hu at intel dot com

--- Comment #4 from Hu Lin  ---
I tried to modify vec_extract_lo_ to:

 (define_insn "vec_extract_lo_"
   [(set (match_operand: 0 "nonimmediate_operand" "=v,v,vm,v")
 (vec_select:
   (match_operand:VI4F_256 1 "nonimmediate_operand" "0,v,v,vm")
   (parallel [(const_int 0) (const_int 1)
  (const_int 2) (const_int 3)])))]

and 
 (define_insn "vec_extract_lo_"
   [(set (match_operand: 0 "nonimmediate_operand"
"=v,?v,?vm,?v")
 (vec_select:
   (match_operand:VI4F_256 1 "nonimmediate_operand" "0,v,v,vm")
   (parallel [(const_int 0) (const_int 1)
  (const_int 2) (const_int 3)])))]

In 315r.reload 
 Considering alt=0 of insn 7:   (0) =v  (1) 0
1 Matching alt: reject+=2
  overall=8,losers=1,rld_nregs=1
 Considering alt=1 of insn 7:   (0) ?v  (1) v
Staticly defined alt reject+=6
  overall=0,losers=0,rld_nregs=0
  Choosing alt 1 in insn 7:  (0) ?v  (1) v {vec_extract_lo_v8sf}
I also tried using "!" instead, but alt=0 is still rejected.

I even tried changing it to:
 (define_insn "vec_extract_lo_"
   [(set (match_operand: 0 "nonimmediate_operand" "=v")
 (vec_select:
   (match_operand:VI4F_256 1 "nonimmediate_operand" "0")
   (parallel [(const_int 0) (const_int 1)
  (const_int 2) (const_int 3)])))]

Although vec_extract_lo_v8sf then uses the same register, %xmm2, the compiler
adds an extra insn "vmovaps %ymm0, %ymm2" after reload.

On the other hand, we tried splitting the pattern before reload into
  [(set (match_dup 0) (match_dup 1))]
{
   operands[1] = gen_lowpart (mode, operands[1]);
}
But GCC cannot run a register coalescer the way Clang does.

[Bug middle-end/114700] middle-end optimization generates causes -fsanitize=undefined not to happen in some cases

2024-04-16 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114700

--- Comment #20 from Hu Lin  ---
Created attachment 57967
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57967&action=edit
A new version

When I tested this patch, I ran into another problem: g++.dg/ubsan/vla-1.C
raises an ICE without the (TREE_TYPE (@2) == ssizetype) condition at
match.pd:3497.

Starting from 005t.original, GCC generates code that uses a value defined in
another block, e.g. `_5 = _1 - 1` in the dumps below, where `_1` is defined only
on the path that calls the UBSan handler; this leads to the ICE in 022t.ssa. I
don't know whether this is a bug, and I didn't find a test that raises this ICE
on trunk. So I added a condition to avoid this optimization in this case. Do
you have any comments on the newly added condition?



Here is the information I think is important.

vla-1.C.005t.original:

Originally, line 12 is:
__builtin___ubsan_handle_vla_bound_not_positive (&*.Lubsan_data0, (unsigned long) ((ssizetype) SAVE_EXPR ));

Currently, it is:
__builtin___ubsan_handle_vla_bound_not_positive (&*.Lubsan_data0, (unsigned long) (((ssizetype) SAVE_EXPR  - 1) + 1));


vla-1.C.006t.gimple
original:
  i.0 = i;
  if (i.0 <= 0) goto ; else goto ;
  :
  _1 = (unsigned long) i.0;
  __builtin___ubsan_handle_vla_bound_not_positive (&*.Lubsan_data0, _1);
  goto ;
  :
  :
  _2 = (ssizetype) i.0;
  _3 = _2 - 1;

current:
  i.0 = i;
  if (i.0 <= 0) goto ; else goto ;
  :
  _1 = (ssizetype) i.0;
  _2 = _1 - 1;
  _3 = _2 + 1;
  _4 = (unsigned long) _3;
  __builtin___ubsan_handle_vla_bound_not_positive (&*.Lubsan_data0, _4);
  goto ;
  :
  :
  _5 = _1 - 1;


vla-1.C.015t.cfg
original:
  int i.0;

  :
  saved_stack.5 = __builtin_stack_save ();
  i.0 = i;
  if (i.0 <= 0)
    goto ; [INV]
  else
    goto ; [INV]

  :
  _1 = (unsigned long) i.0;
  __builtin___ubsan_handle_vla_bound_not_positive (&*.Lubsan_data0, _1);

  :
  _2 = (ssizetype) i.0;
  _3 = _2 - 1;

current:
  int i.0;

  :
  saved_stack.5 = __builtin_stack_save ();
  i.0 = i;
  if (i.0 <= 0)
    goto ; [INV]
  else
    goto ; [INV]

  :
  _1 = (ssizetype) i.0;
  _2 = _1 - 1;
  _3 = _2 + 1;
  _4 = (unsigned long) _3;
  __builtin___ubsan_handle_vla_bound_not_positive (&*.Lubsan_data0, _4);

  :
  _5 = _1 - 1;
  _6 = (sizetype) _5;
  D.3273 = _6;
 57   D.3273 = _6;

[Bug middle-end/114700] middle-end optimization generates causes -fsanitize=undefined not to happen in some cases

2024-04-15 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114700

--- Comment #19 from Hu Lin  ---
(In reply to Jakub Jelinek from comment #18)
> (In reply to Hu Lin from comment #17)
> > (In reply to Jakub Jelinek from comment #16)
> > > 
> > > No, -ftrapv isn't a debugging tool.  There is no overflow in the 
> > > expression
> > > that GCC actually evaluates (into which the expression has been 
> > > optimized).
> > > If you have overflow in an expression that is never used, GCC with -ftrapv
> > > will also
> > > eliminate it as unused and won't diagnose the trap.
> > > -fsanitize=undefined behaves in that case actually the same with -O1 and
> > > higher (intentionally, to decrease the cost of the sanitization).  So, one
> > > needs to use -O0 -fsanitize=undefined to get as many cases of UB in the
> > > program diagnosed as possible.
> > 
> > OK, that look like GCC's -ftrapv is not the same as clang's. Then my added
> > condition should be (optimize || !TYPE_OVERFLOW_SANITIZED (type)). 
> 
> Why?  Just !TYPE_OVERFLOW_SANITIZED (type).
> 

OK, so that part is advice on how to catch as many cases of UB in a program as
possible. I have another question: -fsanitize=undefined disables this
optimization, but you said -ftrapv won't diagnose the trap. Why is the logic
different for these two options?

> 
> TYPE_OVERFLOW_SANITIZED is
> #define TYPE_OVERFLOW_SANITIZED(TYPE)   \
>   (INTEGRAL_TYPE_P (TYPE)   \
>&& !TYPE_OVERFLOW_WRAPS (TYPE)   \
>&& (flag_sanitize & SANITIZE_SI_OVERFLOW))
> so, it isn't true for non-integral types, nor for TYPE_OVERFLOW_WRAPS types.
> So, if you want to avoid the (view_convert (negate @1)), just add (if
> !TYPE_OVERFLOW_SANITIZED (type)) above the (view_convert (negate @1)).  But
> in each case, you want to be careful which exact type you want to check,
> type is the type of
> the outermost expression, otherwise TREE_TYPE (@0) etc.

Thanks for your advice.

[Bug middle-end/114700] middle-end optimization generates causes -fsanitize=undefined not to happen in some cases

2024-04-14 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114700

--- Comment #17 from Hu Lin  ---
(In reply to Jakub Jelinek from comment #16)
> (In reply to Hu Lin from comment #11)
> > I think it doesn't mean that's not a bug with -ftrapv, it should preserve
> > all overflow traps. Because it doesn't work, we use -fsanitize=undefined
> > instead of it.
> > 
> > refer: Gcc's trapv is known not always to work correctly.
> 
> No, -ftrapv isn't a debugging tool.  There is no overflow in the expression
> that GCC actually evaluates (into which the expression has been optimized).
> If you have overflow in an expression that is never used, GCC with -ftrapv
> will also
> eliminate it as unused and won't diagnose the trap.
> -fsanitize=undefined behaves in that case actually the same with -O1 and
> higher (intentionally, to decrease the cost of the sanitization).  So, one
> needs to use -O0 -fsanitize=undefined to get as many cases of UB in the
> program diagnosed as possible.

OK, it looks like GCC's -ftrapv is not the same as Clang's. Then my added
condition should be (optimize || !TYPE_OVERFLOW_SANITIZED (type)).

> When a pattern already has one if, can't you just add that to the preexisting 
> if rather than adding yet another one.

I made a mistake on this line; it should be:
+   (if (!TYPE_OVERFLOW_SANITIZED (type))
 (if (!ANY_INTEGRAL_TYPE_P (type)
 || TYPE_OVERFLOW_WRAPS (type))
  (negate (view_convert @1))
  (view_convert (negate @1)))))

I can't just modify the preexisting if: it only chooses between the two
replacement forms, whereas with -fsanitize=undefined the optimization shouldn't
be applied at all.

[Bug middle-end/114700] Front-end optimization generates wrong code with -fsanitize=undefined

2024-04-12 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114700

--- Comment #14 from Hu Lin  ---
Created attachment 57933
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57933&action=edit
Untested fix.

[Bug middle-end/114700] Front-end optimization generates wrong code with -fsanitize=undefined

2024-04-12 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114700

--- Comment #12 from Hu Lin  ---
(In reply to Hu Lin from comment #11)
> (In reply to Richard Biener from comment #9)
> > That that GCC doesn't promise that -ftrapv preserves all overflows and
> > traps, it merely guarantees that all overflows that actually happen trap. 
> > So GCC is fine to contract some expressions where the overall number of
> > overflows can only
> > decrease.
> > 
> > That's not a bug with -ftrapv.
> > 
> > It is considered a bug with -fsanitize=undefined though.
> 
> I think it doesn't mean that's not a bug with -ftrapv, it should preserve
> all overflow traps. Because it doesn't work, we use -fsanitize=undefined
> instead of it.
> 
> refer: Gcc's trapv is known not always to work correctly.
> 
> The current behavior is correct for -fsanitize=undefined, because the
> integer signed overflow is well-defined, so GCC can eliminate some
> variables. I just think GCC can optimize `z = c  - y  - c + a  + y - b` to
> `z = a - b`. But it doesn't mean is a bug for -fsanitize=undefined.

I was wrong about signed overflow: it is undefined behavior in C++, including
C++20 (https://en.cppreference.com/w/cpp/language/ub). If I'm reading that
source correctly, this is a bug.

[Bug middle-end/114700] Front-end optimization generates wrong code with -fsanitize=undefined

2024-04-11 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114700

--- Comment #11 from lin1.hu at intel dot com ---
(In reply to Richard Biener from comment #9)
> That that GCC doesn't promise that -ftrapv preserves all overflows and
> traps, it merely guarantees that all overflows that actually happen trap. 
> So GCC is fine to contract some expressions where the overall number of
> overflows can only
> decrease.
> 
> That's not a bug with -ftrapv.
> 
> It is considered a bug with -fsanitize=undefined though.

I don't think that means it isn't a bug with -ftrapv; it should preserve all
overflow traps. Because it doesn't, we use -fsanitize=undefined instead.

(See: "Gcc's trapv is known not always to work correctly.")

The current behavior is correct for -fsanitize=undefined, because signed integer
overflow is well-defined, so GCC can eliminate some variables. I just think GCC
could optimize `z = c - y - c + a + y - b` all the way to `z = a - b`. But that
doesn't mean it is a bug for -fsanitize=undefined.

[Bug middle-end/114700] Front-end optimization generates wrong code with -ftrapv.

2024-04-11 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114700

--- Comment #8 from lin1.hu at intel dot com ---
(In reply to Andrew Pinski from comment #6)
> Note `c  - y  - c` to become `-y` reduces the possible of an overflow and is
> well defined for wrapping so this might be still on purpose as there will
> never be an overflow that causes difference if assuming wrapping ...

Indeed, so for -fsanitize=undefined, turning `c - y - c` into `-y` is correct.
What's missing is only the optimization that turns `z = c - y - c + a + y - b`
into `z = a - b`.
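Pinski's point that the fold is "well defined for wrapping" can be checked directly: in modulo-2^32 arithmetic, (c - y) - c and -y are always equal, so contracting them never changes the result. This minimal illustration uses uint32_t as a stand-in for wrapping semantics (as with -fwrapv); the function names are illustrative. In signed arithmetic, what the fold drops is only the intermediate overflow in c - y that a sanitizer would report.

```c
#include <stdint.h>

/* Both functions return the same value for every input, because
   uint32_t arithmetic is defined modulo 2^32.  */
uint32_t
sub_then_sub (uint32_t c, uint32_t y)
{
  return c - y - c;
}

uint32_t
negate_only (uint32_t y)
{
  return -y;
}
```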

[Bug middle-end/114700] Front-end optimization generates wrong code with -ftrapv.

2024-04-11 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114700

--- Comment #7 from lin1.hu at intel dot com ---
(In reply to Andrew Pinski from comment #5)
> From match.pd:
>   /* Match patterns that allow contracting a plus-minus pair
>  irrespective of overflow issues.  */
>   /* (A +- B) - A   ->  +- B */
>   /* (A +- B) -+ B  ->  A */
>   /* A - (A +- B)   -> -+ B */
>   /* A +- (B -+ A)  ->  +- B */
>   (simplify
>(minus (nop_convert1? (plus:c (nop_convert2? @0) @1)) @0)
>(view_convert @1))
>   (simplify
>(minus (nop_convert1? (minus (nop_convert2? @0) @1)) @0)
>(if (!ANY_INTEGRAL_TYPE_P (type)
> || TYPE_OVERFLOW_WRAPS (type))
>(negate (view_convert @1))
>(view_convert (negate @1))))
> 
> Looks like missing a TYPE_OVERFLOW_SANITIZED check.

OK, this looks like the same reason why z = c - y + a - c is not optimized.

[Bug middle-end/114700] Front-end optimization generates wrong code with -ftrapv.

2024-04-11 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114700

--- Comment #3 from lin1.hu at intel dot com ---
(In reply to lin1.hu from comment #2)
> (In reply to Andrew Pinski from comment #1)
> > Gcc's trapv is known not always to work correctly.
> > 
> > Try -fsanitize=undefined instead.

Thanks, that solves the problem to some extent. But c is eliminated, even though
I think c - y may cause signed overflow.

https://godbolt.org/z/Wbzx5Edsj

[Bug middle-end/114700] Front-end optimization generates wrong code with -ftrapv.

2024-04-11 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114700

--- Comment #2 from lin1.hu at intel dot com ---
(In reply to Andrew Pinski from comment #1)
> Gcc's trapv is known not always to work correctly.
> 
> Try -fsanitize=undefined instead.

Thanks, that solves the problem to some extent. But c is eliminated, even though
I think c - y may cause signed overflow.

https://godbolt.org/z/Wbzx5Edsj

[Bug c/114700] New: Front-end optimization generates wrong code with -ftrapv.

2024-04-11 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114700

Bug ID: 114700
   Summary: Front-end optimization generates wrong code with
-ftrapv.
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: lin1.hu at intel dot com
  Target Milestone: ---

We compared GCC and Clang with -ftrapv using this test:
z = c  - y  - c + a  + y - b;
https://godbolt.org/z/EW1xTsazG

We think Clang is right: the overflow check should be performed after each
operation. But the front end generates a - b directly, so it looks like there's
a bug in the front end's handling of -ftrapv.

-ftrapv
This option generates traps for signed overflow on addition, subtraction,
multiplication operations. The options -ftrapv and -fwrapv override each other,
so using -ftrapv -fwrapv on the command-line results in -fwrapv being
effective. Note that only active options override, so using -ftrapv -fwrapv
-fno-wrapv on the command-line results in -ftrapv being effective.


We have another question: we found that the front end doesn't optimize
z = c - y + a - c, while it does optimize z = c - y - c + a. Is there a
particular reason for this, or is it a bug?
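What "performing the overflow check after each operation" means can be sketched with the GCC/Clang builtins __builtin_sub_overflow and __builtin_add_overflow, which report whether the exact result fits in the destination type. The function name eval_checked is illustrative. With c = INT_MAX and y = -1, the very first step c - y overflows, even though the simplified a - b does not; that is the difference between the per-operation semantics argued for here and folding the expression to a - b.

```c
#include <limits.h>
#include <stdbool.h>

/* Evaluate z = c - y - c + a + y - b left to right, checking every
   intermediate result for signed overflow.  Returns true if any step
   overflows; otherwise stores the result in *z.
   Example: eval_checked (5, 3, INT_MAX, -1, &z) reports overflow in
   c - y, although a - b == 2.  */
bool
eval_checked (int a, int b, int c, int y, int *z)
{
  int t;
  if (__builtin_sub_overflow (c, y, &t))
    return true;
  if (__builtin_sub_overflow (t, c, &t))
    return true;
  if (__builtin_add_overflow (t, a, &t))
    return true;
  if (__builtin_add_overflow (t, y, &t))
    return true;
  if (__builtin_sub_overflow (t, b, &t))
    return true;
  *z = t;
  return false;
}
```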

[Bug target/109117] "__builtin_ia32_vaesdec_v16qi" compiled only with option -mvaes report ICE.

2023-03-14 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109117

--- Comment #4 from lin1.hu at intel dot com ---
(In reply to lin1.hu from comment #2)
> Created attachment 54659 [details]
> 0001-i386-Add-missing-OPTION_MASK_ISA_AVX512VL-in-i386-bu.patch

Regtested on x86_64-pc-linux-gnu.

[Bug target/109117] "__builtin_ia32_vaesdec_v16qi" compiled only with option -mvaes report ICE.

2023-03-13 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109117

lin1.hu at intel dot com changed:

                What|Removed                      |Added
----------------------------------------------------------------------------
   Attachment #54659|No need AVX512VL for 256bit, |0001-i386-Add-missing-OPTION_MASK_ISA_AVX512VL-in-i386-bu.patch
         description|so I modify the original     |
                    |patch.                       |

--- Comment #3 from lin1.hu at intel dot com ---
Comment on attachment 54659
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54659
0001-i386-Add-missing-OPTION_MASK_ISA_AVX512VL-in-i386-bu.patch

AVX512VL is not needed for the 256-bit variant, so I modified the original patch.

[Bug target/109117] "__builtin_ia32_vaesdec_v16qi" compiled only with option -mvaes report ICE.

2023-03-13 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109117

--- Comment #2 from lin1.hu at intel dot com ---
Created attachment 54659
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54659&action=edit
No need AVX512VL for 256bit, so I modify the original patch.

[Bug target/109117] "__builtin_ia32_vaesdec_v16qi" compiled only with option -mvaes report ICE.

2023-03-13 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109117

--- Comment #1 from lin1.hu at intel dot com ---
Created attachment 54657
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54657&action=edit
Untested fix.

[Bug target/109117] New: "__builtin_ia32_vaesdec_v16qi" compiled only with option -mvaes report ICE.

2023-03-13 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109117

Bug ID: 109117
   Summary: "__builtin_ia32_vaesdec_v16qi" compiled only with
option -mvaes report ICE.
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: lin1.hu at intel dot com
  Target Milestone: ---

When the compiler compiles "__builtin_ia32_vaesdec_v16qi" with -mvaes but
without -mavx512vl, it reports an ICE.

For details, see https://godbolt.org/z/fEGavbGWz.

Test-case:

typedef char __v16qi __attribute__ ((__vector_size__(16)));
typedef long long __m128i __attribute__((__vector_size__(16),
__aligned__(16)));
volatile __v16qi x, y;
volatile __m128i res;

void
foo (void)
{
res = (__m128i) __builtin_ia32_vaesdec_v16qi (x, y);
}

[Bug target/108881] New: "__builtin_ia32_cvtne2ps2bf16_v16hi" compiled only with option -mavx512bf16 report ICE.

2023-02-22 Thread lin1.hu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108881

Bug ID: 108881
   Summary: "__builtin_ia32_cvtne2ps2bf16_v16hi" compiled only
with option -mavx512bf16 report ICE.
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: lin1.hu at intel dot com
  Target Milestone: ---

When the compiler compiles "__builtin_ia32_cvtne2ps2bf16_v16hi" with only
-mavx512bf16, it reports an ICE.

For details, see https://godbolt.org/z/fEGavbGWz.
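__builtin_ia32_cvtne2ps2bf16_v16hi corresponds to VCVTNE2PS2BF16, which converts two vectors of single-precision floats into one vector of bfloat16 values using round-to-nearest-even (the "NE" in the name). As background, the scalar sketch below (hypothetical name float_to_bf16_rne) shows that rounding for finite inputs; it deliberately does not model the instruction's handling of NaNs or denormals.

```c
#include <stdint.h>
#include <string.h>

/* Round-to-nearest-even float -> bfloat16 for finite inputs: add a bias
   of 0x7FFF plus the lowest kept bit, so exact ties round to even, then
   keep the upper 16 bits.  */
uint16_t
float_to_bf16_rne (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  uint32_t lsb = (bits >> 16) & 1u;
  bits += 0x7FFFu + lsb;
  return (uint16_t) (bits >> 16);
}
```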