[Bug target/95750] [x86] Use dummy atomic insn instead of mfence in __atomic_thread_fence(seq_cst)

2020-07-23 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95750

--- Comment #15 from Uroš Bizjak  ---
(In reply to Joseph C. Sible from comment #14)
> I notice this change affects -Os too, even though "lock orq $0,(%rsp)" is 6
> bytes and "mfence" is only 3 bytes.

Yes, we can emit mfence for -Os. I'm testing the following patch:

--cut here--
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index c88750d3664..ed17bb00205 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -123,7 +123,8 @@
   rtx mem;

   if ((TARGET_64BIT || TARGET_SSE2)
- && !TARGET_AVOID_MFENCE)
+ && (optimize_function_for_size_p (cfun)
+ || !TARGET_AVOID_MFENCE))
mfence_insn = gen_mfence_sse2;
   else
mfence_insn = gen_mfence_nosse;
--cut here--

[Bug tree-optimization/96272] Failure to optimize overflow check

2020-07-22 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96272

--- Comment #3 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #2)
> Well, it needs the addition too, so I think this can't be done in match.pd,
> but would need to be done in some other pass (not sure which, perhaps
> phiopt?).

No, I was referring to the first step of the optimization. The converted source
would read something like:

unsigned
bar (unsigned a, unsigned b)
{
  int dummy;
  if (__builtin_uadd_overflow (a, b, &dummy))
    return UINT_MAX;
  return a + b;
}

The RTL CSE pass is able to eliminate one addition, resulting in:

bar:
addl    %esi, %edi
jc      .L5
movl    %edi, %eax
ret
.L5:
orl $-1, %eax
ret

Eventually, some tree pass could convert the above source to:

unsigned
bar (unsigned a, unsigned b)
{
  unsigned res;
  if (__builtin_uadd_overflow (a, b, &res))
    return UINT_MAX;
  return res;
}

which results in:

bar:
addl    %esi, %edi
movl    $-1, %eax
cmovnc  %edi, %eax
ret

[Bug tree-optimization/96272] Failure to optimize overflow check

2020-07-22 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96272

Uroš Bizjak  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org
 Ever confirmed|0   |1
   Last reconfirmed||2020-07-22
 Status|UNCONFIRMED |NEW

--- Comment #1 from Uroš Bizjak  ---
Confirmed, pattern to convert:

a > UINT_MAX - b;

to

__builtin_uadd_overflow

should be added to match.pd.
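
For reference, a minimal sketch (mine, not taken from the PR) of the two source
forms such a match.pd rule would have to treat as equivalent:

--cut here--
#include <limits.h>

/* Open-coded overflow check that should be recognized...  */
unsigned
add_sat_open (unsigned a, unsigned b)
{
  if (a > UINT_MAX - b)
    return UINT_MAX;
  return a + b;
}

/* ... and the equivalent form using the overflow builtin directly.  */
unsigned
add_sat_builtin (unsigned a, unsigned b)
{
  unsigned sum;
  if (__builtin_uadd_overflow (a, b, &sum))
    return UINT_MAX;
  return sum;
}
--cut here--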

[Bug target/96273] ice in extract_insn, at recog.c:2294, unrecognizable insn:

2020-07-22 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96273

Uroš Bizjak  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Uroš Bizjak  ---
Already fixed.

[Bug target/95750] [x86] Use dummy atomic insn instead of mfence in __atomic_thread_fence(seq_cst)

2020-07-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95750

Uroš Bizjak  changed:

   What|Removed |Added

  Attachment #48756|0   |1
is obsolete||
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |ubizjak at gmail dot com
 Ever confirmed|0   |1
   Last reconfirmed||2020-07-20

--- Comment #11 from Uroš Bizjak  ---
Created attachment 48897
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48897&action=edit
Proposed patch

Patch in testing.

[Bug tree-optimization/96226] Failure to optimize shift+not to rotate

2020-07-17 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96226

--- Comment #1 from Uroš Bizjak  ---
The combine produces:

Trying 7, 8 -> 9:
7: r89:SI=0x1
8: {r88:SI=r89:SI<<...
[...]

There is a similar pattern "*<insn><mode>3_mask" in i386.md:

  [(set (match_operand:SWI48 0 "nonimmediate_operand")
        (any_rotate:SWI48
          (match_operand:SWI48 1 "nonimmediate_operand")
          (subreg:QI
            (and:SI
              (match_operand:SI 2 "register_operand" "c")
              (match_operand:SI 3 "const_int_operand")) 0)))
   (clobber (reg:CC FLAGS_REG))]
  "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
      == GET_MODE_BITSIZE (<MODE>mode)-1"

However, the above pattern doesn't allow an immediate operand.
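
For reference, a reduced example (my reconstruction, not quoted from the PR) of
the kind of source involved:

--cut here--
/* ~(1u << (n & 31)) is all-ones except bit (n & 31), i.e. a rotate-left of
   ~1u by n; matching it would need the masked-count rotate pattern above to
   accept an immediate for the rotated operand.  */
unsigned
clear_bit_mask (unsigned n)
{
  return ~(1u << (n & 31));
}
--cut here--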

[Bug target/96189] Failure to use eflags from cmpxchg on x86

2020-07-16 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96189

Uroš Bizjak  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org
 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #5 from Uroš Bizjak  ---
Hm...

Please note that peephole2 scanning require exact RTL sequences, and already
fails for e.g.:

_Bool
foo (unsigned int *x, unsigned int z)
{
  unsigned int y = 0;
  __atomic_compare_exchange_n (x, , z, 0, __ATOMIC_RELAXED,
__ATOMIC_RELAXED);
  return y == 0;
}

(which is used in a couple of places throughout glibc), due to early peephole2
optimization that converts:

(insn 7 4 8 2 (set (reg:SI 0 ax [90])
(const_int 0 [0])) "cmpx0.c":5:3 75 {*movsi_internal}

to:

(insn 31 4 8 2 (parallel [
(set (reg:DI 0 ax [90])
(const_int 0 [0]))
(clobber (reg:CC 17 flags))

Other than that, the required sequence is broken quite often by various
reloads, due to the complexity of CMPXCHG insn.

However, __atomic_compare_exchange_n returns a boolean value that is exactly
what the first function is testing, so the following two functions are
equivalent:

--cut here--
_Bool
foo (unsigned int *x, unsigned int y, unsigned int z)
{
  unsigned int old_y = y;
  __atomic_compare_exchange_n (x, , z, 0, __ATOMIC_RELAXED,
__ATOMIC_RELAXED);
  return y == old_y;
}

_Bool
bar (unsigned int *x, unsigned int y, unsigned int z)
{
  return __atomic_compare_exchange_n (x, , z, 0, __ATOMIC_RELAXED,
__ATOMIC_RELAXED);
}
--cut here--

I wonder, if the above transformation can happen on the tree level, so it would
apply universally for all targets, and would also handle CMPXCHG[8,16]B
doubleword instructions on x86 targets.

Let's ask experts.

[Bug target/96189] Failure to use eflags from cmpxchg on x86

2020-07-15 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96189

Uroš Bizjak  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |11.0
 Status|ASSIGNED|RESOLVED

--- Comment #4 from Uroš Bizjak  ---
Implemented for gcc-11.

[Bug target/96189] Failure to use eflags from cmpxchg on x86

2020-07-15 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96189

--- Comment #3 from Uroš Bizjak  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:6c2848ad02feef5ac094d1158be3861819b3bb49

commit r11-2140-g6c2848ad02feef5ac094d1158be3861819b3bb49
Author: Uros Bizjak 
Date:   Wed Jul 15 21:27:00 2020 +0200

i386: Introduce peephole2 to use flags from CMPXCHG more [PR96189]

CMPXCHG instruction sets ZF flag if the values in the destination operand
and EAX register are equal; otherwise the ZF flag is cleared and value
from destination operand is loaded to EAX. Following assembly:

movl    %esi, %eax
lock cmpxchgl   %edx, (%rdi)
cmpl    %esi, %eax
sete    %al

can be optimized by removing the unneeded comparison, since set ZF flag
signals that no update to EAX happened.

2020-07-15  Uroš Bizjak  

gcc/ChangeLog:
PR target/95355
* config/i386/sync.md
(peephole2 to remove unneeded compare after CMPXCHG): New pattern.

gcc/testsuite/ChangeLog:
PR target/95355
* gcc.target/i386/pr96189.c: New test.

[Bug target/95355] [11 Regression] Assembler messages: Error: operand size mismatch for `vpmovzxbd' with -masm=intel since r11-485-gf6e40195ec3d3b402a5f6c58dbf359479bc4cbfa

2020-07-15 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95355

Uroš Bizjak  changed:

   What|Removed |Added

   Target Milestone|11.0|10.2
 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #10 from Uroš Bizjak  ---
(In reply to Martin Liška from comment #5)
> Can the bug be marked as resolved?

Yes, this particular problem is fixed for gcc-10.2+.

[Bug target/95355] [11 Regression] Assembler messages: Error: operand size mismatch for `vpmovzxbd' with -masm=intel since r11-485-gf6e40195ec3d3b402a5f6c58dbf359479bc4cbfa

2020-07-15 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95355

--- Comment #9 from Uroš Bizjak  ---
(In reply to CVS Commits from comment #8)
> The master branch has been updated by Uros Bizjak :

Bah. Wrong PR reference, should be PR96189.

[Bug target/96189] Failure to use eflags from cmpxchg on x86

2020-07-15 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96189

Uroš Bizjak  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2020-07-15
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |ubizjak at gmail dot com

--- Comment #2 from Uroš Bizjak  ---
Created attachment 48877
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48877&action=edit
Prototype patch

Introduce a peephole2 pattern that removes the comparison in certain cases.
Doubleword cmpxchg is not handled; the doubleword comparison sequence is just
too complicated at this late stage of compilation.

[Bug target/96062] Partial register stall caused by avoidable use of SETcc, and useless MOVZBL

2020-07-05 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96062

--- Comment #1 from Uroš Bizjak  ---
(In reply to Joseph C. Sible from comment #0)
> Consider this C code:
> 
> long ps4_syscall0(long n) {
> long ret;
> int carry;
> __asm__ __volatile__(
> "syscall"
> : "=a"(ret), "=@ccc"(carry)
> : "a"(n)
> : "rcx", "r8", "r9", "r10", "r11", "memory"
> );
> return carry ? -ret : ret;
> }
> 
> With "-O3", it results in this assembly:
> 
> ps4_syscall0:
> movq    %rdi, %rax
> syscall
> setc    %dl
> movq    %rax, %rdi
> movzbl  %dl, %edx
> negq    %rdi
> testl   %edx, %edx
> cmovne  %rdi, %rax
> ret
> 
> On modern Intel CPUs, doing "setc %dl" creates a false dependency on rdx.
> Doing "movzbl %dl, %edx" doesn't do anything to fix that. Here's some ways
> that we could improve this code, without having to fall back to a
> conditional branch:
> 
> 1. Get rid of "movzbl %dl, %edx" (since it doesn't help), and then do "testb
> %dl, %dl" instead of "testl %edx, %edx".

Just declare "_Bool carry". There is no need for int.

[Bug tree-optimization/95839] Failure to optimize addition of vector elements to vector addition

2020-06-24 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95839

--- Comment #2 from Uroš Bizjak  ---
What I find interesting is a similar case with the division instead of the
addition. Clang compiles it to:

divps   %xmm1, %xmm0
retq

Considering that we have [a0, a1, 0, 0] / [b0, b1, 0, 0], this will surely fire
an invalid-operation exception. I have explicitly avoided generating division
using the 4-element DIVPS for v2sf operands exactly due to this issue.

[Bug target/95750] [x86] Use dummy atomic insn instead of mfence in __atomic_thread_fence(seq_cst)

2020-06-19 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95750

--- Comment #10 from Uroš Bizjak  ---
Created attachment 48756
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48756=edit
Proposed patch

Patch in testing, survives GOMP testcases.

On a related note, the patch uses TARGET_USE_XCHG_FOR_ATOMIC_STORE, which
should probably be renamed to something more appropriate.

[Bug target/95750] [x86] Use dummy atomic insn instead of mfence in __atomic_thread_fence(seq_cst)

2020-06-19 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95750

--- Comment #9 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #8)

> The culprit is the %esp here, that adds the 0x67 prefix to the insn and
> will only work if %rsp is below 4GB.

Ah, indeed... I was in a bit of a hurry and didn't notice.

[Bug target/95750] [x86] Use dummy atomic insn instead of mfence in __atomic_thread_fence(seq_cst)

2020-06-19 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95750

--- Comment #7 from Uroš Bizjak  ---
Actually, x86_64 (at least my Fedora 32) does not like operations on the stack:

Starting program: /sdd/uros/git/gcc/gcc/testsuite/gcc.dg/atomic/a.out 

Program received signal SIGSEGV, Segmentation fault.
0x0040110a in main ()
(gdb) disass
Dump of assembler code for function main:
   0x00401106 <+0>:     push   %rbp
   0x00401107 <+1>:     mov    %rsp,%rbp
=> 0x0040110a <+4>:     lock orq $0x0,(%esp)
   0x00401111 <+11>:    mov    $0x0,%eax
   0x00401116 <+16>:    pop    %rbp
   0x00401117 <+17>:    retq
End of assembler dump.

I didn't investigate further, but the 32-bit executable works OK.

[Bug target/95750] [x86] Use dummy atomic insn instead of mfence in __atomic_thread_fence(seq_cst)

2020-06-18 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95750

--- Comment #4 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #3)
> How about the following patch:

Surely, mfence_nosse also needs to be enabled for
TARGET_USE_XCHG_FOR_ATOMIC_STORE.

> This will generate "lock orl $0, (%rsp)" instead of mfence.

Please also read [1] on why we avoid -4(%%esp).

[1] https://gcc.gnu.org/pipermail/gcc-patches/2017-February/469630.html

[Bug target/95750] [x86] Use dummy atomic insn instead of mfence in __atomic_thread_fence(seq_cst)

2020-06-18 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95750

--- Comment #3 from Uroš Bizjak  ---
How about the following patch:

--cut here--
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index 9ab5456b227..7d9442d45b7 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -117,10 +117,11 @@
   rtx (*mfence_insn)(rtx);
   rtx mem;

-  if (TARGET_64BIT || TARGET_SSE2)
-   mfence_insn = gen_mfence_sse2;
-  else
+  if (!(TARGET_64BIT || TARGET_SSE2)
+ || TARGET_USE_XCHG_FOR_ATOMIC_STORE)
mfence_insn = gen_mfence_nosse;
+  else
+   mfence_insn = gen_mfence_sse2;

   mem = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
   MEM_VOLATILE_P (mem) = 1;
--cut here--

This will generate "lock orl $0, (%rsp)" instead of mfence.
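
A minimal example of the affected code (my sketch, not from the PR): with the
patch above, on targets where gen_mfence_nosse is selected this is expected to
expand to "lock orl $0, (%rsp)" rather than mfence.

--cut here--
void
fence_seq_cst (void)
{
  __atomic_thread_fence (__ATOMIC_SEQ_CST);
}
--cut here--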

[Bug target/95632] Redundant zero extension

2020-06-16 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95632

--- Comment #6 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #5)
> (In reply to Mel Chen from comment #2)
> > Is it possible to pretend that we have a pattern that can match xor (reg:SI
> > 80), (reg: SI 72), 0xa001 in combine pass?
> > And then, if the constant part is too large to put in to the immediate part,
> > it can be split to 2 xor in split pass.
> 
> Please note that the combine pass has its own (rather limited) splitter; it
> is documented in the second part of the "Defining How to Split Instructions"
> section. The example deals with an instruction whose immediate part is too
> large, and looks similar to your problem.

Oh, I missed the discussion above. In this case, x86 implements pre-reload
splits; please see the patterns decorated with the ix86_pre_reload_split
condition.

[Bug target/95632] Redundant zero extension

2020-06-16 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95632

--- Comment #5 from Uroš Bizjak  ---
(In reply to Mel Chen from comment #2)
> Is it possible to pretend that we have a pattern that can match xor (reg:SI
> 80), (reg: SI 72), 0xa001 in combine pass?
> And then, if the constant part is too large to put in to the immediate part,
> it can be split to 2 xor in split pass.

Please note that the combine pass has its own (rather limited) splitter; it is
documented in the second part of the "Defining How to Split Instructions"
section. The example deals with an instruction whose immediate part is too
large, and looks similar to your problem.

[Bug target/95652] GCC 8.3.1 generates syntactically incorrect assembly code with -masm=intel

2020-06-12 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95652

--- Comment #6 from Uroš Bizjak  ---
(In reply to Teo Samarzija from comment #5)
> (In reply to Uroš Bizjak from comment #4)
> > (In reply to Teo Samarzija from comment #3)
> > > Besides, how does CLANG compile that same code fine, also under Linux and
> > > with "-masm=intel"? Maybe you can copy the way CLANG does that.
> > 
> > Clang outputs:
> > 
> > mov dword ptr [rip + eax], 1088421888
> > 
> > which gld does't understand:
> > 
> > pr95652.s: Assembler messages:
> > pr95652.s:17: Error: `dword ptr [rip+eax]' is not a valid base/index
> > expression
> 
> Which version of CLANG? CLANG 9.0 compiles the code in the opening post
> correctly (under Linux and using Intel Syntax).

Try to assemble the produced assembly with gas.

[Bug target/95652] GCC 8.3.1 generates syntactically incorrect assembly code with -masm=intel

2020-06-12 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95652

--- Comment #4 from Uroš Bizjak  ---
(In reply to Teo Samarzija from comment #3)
> Besides, how does CLANG compile that same code fine, also under Linux and
> with "-masm=intel"? Maybe you can copy the way CLANG does that.

Clang outputs:

mov dword ptr [rip + eax], 1088421888

which gld doesn't understand:

pr95652.s: Assembler messages:
pr95652.s:17: Error: `dword ptr [rip+eax]' is not a valid base/index expression

[Bug target/95531] Failure to use TZCNT for __builtin_ffs

2020-06-04 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95531

--- Comment #4 from Uroš Bizjak  ---
(In reply to Segher Boessenkool from comment #3)
> What is the question?  4+4 = 16?

Ah, indeed - the question is why combine changes CCCmode compare to CCZmode
compare.

[Bug middle-end/95528] [10/11 Regression] internal compiler error: in emit_move_insn, at expr.c:3814

2020-06-04 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95528

--- Comment #3 from Uroš Bizjak  ---
Introduced by [1].

[1] https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=bea408857a7d

[Bug middle-end/95528] [10/11 Regression] internal compiler error: in emit_move_insn, at expr.c:3814

2020-06-04 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95528

Uroš Bizjak  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-06-04
 Ever confirmed|0   |1

--- Comment #2 from Uroš Bizjak  ---
Something is wrong with the handling of:

(define_expand "vec_pack_trunc_"
  [(set (match_operand: 0 "register_operand")
(ior:
  (ashift:
(zero_extend:
  (match_operand:SWI24 2 "register_operand"))
(match_dup 3))
  (zero_extend:
(match_operand:SWI24 1 "register_operand"]
  "TARGET_AVX512BW"
{
  operands[3] = GEN_INT (GET_MODE_BITSIZE (mode));
})

Disabling this expander "fixes" the testcase.

"-O2 -mavx512bw" is needed.

[Bug target/95531] Failure to use TZCNT for __builtin_ffs

2020-06-04 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95531

Uroš Bizjak  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #2 from Uroš Bizjak  ---
Adding Segher to CC.

[Bug target/95531] Failure to use TZCNT for __builtin_ffs

2020-06-04 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95531

--- Comment #1 from Uroš Bizjak  ---
So, what kind of sorcery is this?

(insn 2 4 3 2 (set (reg/v:SI 83 [ x ])
(reg:SI 5 di [ x ])) "pr95531.c":2:1 67 {*movsi_internal}
 (expr_list:REG_DEAD (reg:SI 5 di [ x ])
(nil)))

(...)

(insn 7 6 8 2 (parallel [
(set (reg:CCC 17 flags)
(compare:CCC (reg/v:SI 83 [ x ])
(const_int 0 [0])))
(set (reg:SI 82 [ <retval> ])
(ctz:SI (reg/v:SI 83 [ x ])))
]) "pr95531.c":3:12 786 {*tzcntsi_1}
 (expr_list:REG_DEAD (reg/v:SI 83 [ x ])
(nil)))
(insn 8 7 9 2 (set (reg:SI 82 [ <retval> ])
(if_then_else:SI (eq (reg:CCC 17 flags)
(const_int 0 [0]))
(reg:SI 84)
(reg:SI 82 [ <retval> ]))) "pr95531.c":3:12 1029 {*movsicc_noc}
 (expr_list:REG_DEAD (reg:SI 84)
(expr_list:REG_DEAD (reg:CCC 17 flags)
(nil

Combine does:

Trying 2 -> 7:
2: r83:SI=r86:SI
  REG_DEAD r86:SI
7: {flags:CCC=cmp(r83:SI,0);r82:SI=ctz(r83:SI);}
  REG_DEAD r83:SI
Successfully matched this instruction:
(parallel [
(set (reg:CCZ 17 flags)
(compare:CCZ (reg:SI 86)
(const_int 0 [0])))
(set (reg:SI 82 [ <retval> ])
(ctz:SI (reg:SI 86)))
])
Successfully matched this instruction:
(set (reg:SI 82 [ <retval> ])
(if_then_else:SI (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
(reg:SI 84)
(reg:SI 82 [ <retval> ])))
allowing combination of insns 2 and 7
original costs 4 + 4 = 16
replacement cost 12
deferring deletion of insn with uid = 2.
modifying other_insn 8: r82:SI={(flags:CCZ==0)?r84:SI:r82:SI}
  REG_DEAD r84:SI
  REG_DEAD flags:CCC
deferring rescan insn with uid = 8.
modifying insn i3 7: {flags:CCZ=cmp(r86:SI,0);r82:SI=ctz(r86:SI);}
  REG_DEAD r86:SI
deferring rescan insn with uid = 7.

Combine changes CCCmode comparison to CCZmode.

[Bug target/95529] Failure to reuse flags generated by TZCNT for cmovcc on BMI-capable targets

2020-06-04 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95529

--- Comment #6 from Uroš Bizjak  ---
(In reply to Richard Biener from comment #5)

> > We can do a peephole that would convert REP BSF + TEST to BSF. However, on
> > BMI capable targets, REP BSF decodes as TZCNT, so the question is if one BSF
> > is faster than TZCNT + TEST?
> I would expect so, yes.

Not universally. While Intel is agnostic to either insn, on Ryzen:

TZCNT: latency 2, reciprocal throughput 0.5
BSF:   latency 3, reciprocal throughput 3

> With -mbmi and TZCNT we could also use the carry flag to elide the test.

We would have to change the mode of the flags reg on a follow up flags user
(CMOVE, *movsicc_noc) from reg:CCZ to reg:CCC. This can't be done during
combine.

> > (Please note that the conversion to CMOVE comes a bit late in the pass
> > sequence, so we can't convert TZCNT + TEST + CMOVE to TZCNT + CMOVC.)

[Bug target/95529] Failure to reuse flags generated by TZCNT for cmovcc on BMI-capable targets

2020-06-04 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95529

Uroš Bizjak  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2020-06-04
 Status|UNCONFIRMED |NEW

--- Comment #4 from Uroš Bizjak  ---
(In reply to Gabriel Ravier from comment #2)
> If using `-mbmi`, shouldn't GCC be able to assume the target is, in fact, a
> BMI-capable CPU ? I understand that this bug report may be invalid for `bsf`
> (which would mean Clang has invalid behaviour, which seems odd but ok), but
> should I reopen this report/make a new report for `-mbmi` ?

We can do a peephole that would convert REP BSF + TEST to BSF. However, on BMI
capable targets, REP BSF decodes as TZCNT, so the question is if one BSF is
faster than TZCNT + TEST?

(Please note that the conversion to CMOVE comes a bit late in the pass
sequence, so we can't convert TZCNT + TEST + CMOVE to TZCNT + CMOVC.)

[Bug target/95529] Failure to reuse flags generated by trailing zeros instruction for cmovcc

2020-06-04 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95529

Uroš Bizjak  changed:

   What|Removed |Added

 Resolution|--- |WONTFIX
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Uroš Bizjak  ---
Please note that "REP BSF" on BMI capable targets decodes as TZCNT, which is
not equal to BSF as far as flags are concerned. BSF sets zero flag on zero
input, where TZCNT sets carry flag on zero input.

GCC emits TZCNT by default on the premise that the same binary can benefit from
TZCNT runing faster than BSF on BMI capable targets. Unfortunatelly, flags
can't be reused due to the difference, explained above.
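
For illustration, a minimal sketch (my example, not the PR testcase) of the
shape of code involved; the x == 0 selection is what a CMOVcc reusing the
BSF/TZCNT flags would have to cover:

--cut here--
/* With BSF the zero-input case shows up in ZF, with TZCNT in CF, so the flags
   produced by the trailing-zero insn cannot feed the same CMOVcc condition
   for both encodings.  */
unsigned
ctz_or_default (unsigned x, unsigned def)
{
  return x ? __builtin_ctz (x) : def;
}
--cut here--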

[Bug target/95525] Bitmask conflict between PTA_AVX512VP2INTERSECT and PTA_WAITPKG

2020-06-04 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95525

Uroš Bizjak  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Target Milestone|--- |10.2
   Last reconfirmed||2020-06-04
 Status|UNCONFIRMED |NEW

[Bug target/95525] Bitmask conflict between PTA_AVX512VP2INTERSECT and PTA_WAITPKG

2020-06-04 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95525

--- Comment #1 from Uroš Bizjak  ---
PTA_WAITPKG is currently unused. Please just move PTA_WAITPKG and the
subsequent bits one place to the left.

[Bug libfortran/95418] [11 Regression] Static assert going off on MinGW

2020-06-01 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95418

Uroš Bizjak  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED
 Target||x86_64-w64-mingw32

--- Comment #11 from Uroš Bizjak  ---
Fixed.

[Bug target/95435] bad builtin memcpy performance with znver1/znver2 and 32bit

2020-06-01 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95435

--- Comment #9 from Uroš Bizjak  ---
(In reply to Alexander Monakov from comment #8)
> There's no tuning tables for memcmp at all, existing structs cover only
> memset and memcpy. So as far as I see retuning memset/memcpy doesn't need to
> wait for [1], because there's no infrastructure in place for memcmp tuning,
> and adding that can be done independently. Updating Ryzen tables would not
> touch any code updated by H.J.Lu's patchset at all.

Agreed.

[Bug target/95435] bad builtin memcpy performance with znver1/znver2 and 32bit

2020-06-01 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95435

Uroš Bizjak  changed:

   What|Removed |Added

 CC||hjl.tools at gmail dot com

--- Comment #7 from Uroš Bizjak  ---
I think that stringops (including memcmp) for x86 targets should be retuned for
the new glibc, once [1] is approved and committed. Please note that currently
bench-stringop doesn't benchmark memcmp.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546919.html

[Bug target/95237] LOCAL_DECL_ALIGNMENT shrinks alignment, FAIL gcc.target/i386/pr69454-2.c

2020-06-01 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95237

--- Comment #3 from Uroš Bizjak  ---
ICEs are "fixed" by the first hunk, the testcase in Comment #0 by the second:

--cut here--
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 060e2df62ea..cd7abaf7e04 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -16752,6 +16752,7 @@ ix86_local_alignment (tree exp, machine_mode mode,
   decl = NULL;
 }

+#if 0
   /* Don't do dynamic stack realignment for long long objects with
  -mpreferred-stack-boundary=2.  */
   if (!TARGET_64BIT
@@ -16761,6 +16762,7 @@ ix86_local_alignment (tree exp, machine_mode mode,
   && (!type || !TYPE_USER_ALIGN (type))
   && (!decl || !DECL_USER_ALIGN (decl)))
 align = 32;
+#endif

   /* If TYPE is NULL, we are allocating a stack slot for caller-save
  register in MODE.  We will return the largest alignment of XF
@@ -16868,6 +16870,7 @@ ix86_minimum_alignment (tree exp, machine_mode mode,
   if (TARGET_64BIT || align != 64 || ix86_preferred_stack_boundary >= 64)
 return align;

+#if 0
   /* Don't do dynamic stack realignment for long long objects with
  -mpreferred-stack-boundary=2.  */
   if ((mode == DImode || (type && TYPE_MODE (type) == DImode))
@@ -16877,6 +16880,7 @@ ix86_minimum_alignment (tree exp, machine_mode mode,
   gcc_checking_assert (!TARGET_STV);
   return 32;
 }
+#endif

   return align;
 }
--cut here--

[Bug libfortran/95418] [11 Regression] Static assert going off on MinGW

2020-05-31 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95418

--- Comment #7 from Uroš Bizjak  ---
Created attachment 48649
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48649&action=edit
Untested patch.

Can someone with access to a MinGW target please test the attached patch?

The layout is defined by the hardware, and gcc_struct reflects this layout.

BTW: I also doubt that defining _FP_STRUCT_LAYOUT in sfp-machine.h has any
effect; we have to use __attribute__ ((gcc_struct)) directly on the structure
definition.
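
I.e., something along these lines (a sketch only; the field layout here is
illustrative, not the real fpu-387.h structure):

--cut here--
/* Apply the attribute directly on the structure definition, instead of
   relying on a _FP_STRUCT_LAYOUT-style macro defined elsewhere.  */
typedef struct
{
  unsigned short cw;
  unsigned short sw;
} __attribute__ ((gcc_struct)) my_x87_env;
--cut here--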

[Bug libfortran/95418] [11 Regression] Static assert going off on MinGW

2020-05-31 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95418

--- Comment #6 from Uroš Bizjak  ---
(In reply to Thomas Koenig from comment #3)
> Adding the author of the patch.
> 
> Uros: I find no discussion of this patch on the fortran mailing list.
> Please remember to do so in the future if you touch the libgfortran
> or gcc/fortran directories.

Thomas,

Contrary to my other libgfortran contribution, I was under the impression that
the patch touches only deep architectural details of the x87 chip, and should
be (and in fact is) independent of libgfortran implementation.

I would like to point out that the part, referred in Comment #4 unifies the
structure definition with the ones in libgcc soft-fp and libatomic. So, if this
change turns out to be problematic for MinGW, then the existing definitions in
libgcc in libatomic are wrong as well. Actually, libgcc sfp-machine.h defines:

#ifdef __MINGW32__
  /* Make sure we are using gnu-style bitfield handling.  */
#define _FP_STRUCT_LAYOUT  __attribute__ ((gcc_struct))
#endif

which should probably be added to libgfortran fpu-387.h (and libatomic fenv.c).

[Bug target/95355] [11 Regression] Assembler messages: Error: operand size mismatch for `vpmovzxbd' with -masm=intel since r11-485-gf6e40195ec3d3b402a5f6c58dbf359479bc4cbfa

2020-05-27 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95355

--- Comment #2 from Uroš Bizjak  ---
This is a pre-existing problem.

There are a couple of wrong %q modifiers in vpmov* insn templates:

--cut here--
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index fde65391d7d..1cf1b8cea3b 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -17559,7 +17559,7 @@
(any_extend:V16SI
  (match_operand:V16QI 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vpmovbd\t{%1, %0|%0, %q1}"
+  "vpmovbd\t{%1, %0|%0, %1}"
   [(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
(set_attr "mode" "XI")])
@@ -17935,7 +17935,7 @@
(any_extend:V8DI
  (match_operand:V8HI 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vpmovwq\t{%1, %0|%0, %q1}"
+  "vpmovwq\t{%1, %0|%0, %1}"
   [(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
(set_attr "mode" "XI")])
--cut here--

[Bug target/95211] [11 Regression] ICE in emit_unop_insn, at optabs.c:3622

2020-05-25 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95211

--- Comment #5 from Uroš Bizjak  ---
This testcase is fixed by [1]

[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546408.html

[Bug target/95255] [10/11 Regression] ICE in gen_roundevendf2, at config/i386/i386.md:16328 since r10-2809-gd3b92f35d84f44a8

2020-05-24 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95255

Uroš Bizjak  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #6 from Uroš Bizjak  ---
Fixed for gcc-10.2+.

[Bug target/95255] [10/11 Regression] ICE in gen_roundevendf2, at config/i386/i386.md:16328 since r10-2809-gd3b92f35d84f44a8

2020-05-22 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95255

Uroš Bizjak  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |ubizjak at gmail dot com

--- Comment #3 from Uroš Bizjak  ---
Created attachment 48580
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48580&action=edit
patch in testing.

[Bug target/95125] Unoptimal code for vectorized conversions

2020-05-22 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

Uroš Bizjak  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #6 from Uroš Bizjak  ---
(In reply to Hongtao.liu from comment #5)
> (In reply to Uroš Bizjak from comment #3)
> > It turns out that a bunch of patterns have to be renamed (and testcases
> > added).
> > 
> > Easyhack, waiting for someone to show some love to conversion patterns in
> > sse.md.
> 
> expander for floatv4siv4df2, fix_truncv4dfv4si2 already exists.
> 
> if change **float_double fix_double** to
> ---
> void
> float_double (void)
> {
> d[0] = i[0];
> d[1] = i[1];
> d[2] = i[2];
> d[3] = i[3];
> }

Hm, the above is vectorized, but the equivalent:

void
float_double (void)
{
  for (int n = 0; n < 4; n++)
d[n] = i[n];
}

is not?

[Bug target/95169] [10/11 Regression] i386 comparison between nan and 0.0 triggers Invalid operation exception

2020-05-21 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95169

Uroš Bizjak  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #9 from Uroš Bizjak  ---
Fixed for gcc-10.2+.

[Bug target/95256] [11 Regression] ICE in convert_move, at expr.c:278 since r11-263-g7c355156aa20eaec

2020-05-21 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95256

Uroš Bizjak  changed:

   What|Removed |Added

 Depends on||95125
 CC||crazylht at gmail dot com

--- Comment #3 from Uroš Bizjak  ---
This PR and PR95211 will be fixed by the patch for PR95125. Here, the offending
pattern should be renamed to something like

avx512dq_fix<fixunssuffix>_truncv2sfv2di2

and an expander should be added:

(define_expand "fix_truncv2sfv2di2"
  [(set (match_operand:V2DI 0 "register_operand")
(any_fix:V2DI
  (match_operand:V2SF 1 "nonimmediate_operand")))]
  "TARGET_AVX512DQ && TARGET_AVX512VL"
{
  if (!MEM_P (operands[1]))
{
  operands[1] = simplify_gen_subreg (V4SFmode, operands[1], V2SFmode, 0);
  emit_insn (gen_avx512dq_fix<fixunssuffix>_truncv2sfv2di2 (operands[0],
                                                            operands[1]));
  DONE;
}
})


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125
[Bug 95125] Unoptimal code for vectorized conversions

[Bug target/95256] [11 Regression] ICE in convert_move, at expr.c:278 since r11-263-g7c355156aa20eaec

2020-05-21 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95256

Uroš Bizjak  changed:

   What|Removed |Added

   Target Milestone|--- |11.0

--- Comment #2 from Uroš Bizjak  ---
The patch exposes another case of a named pattern with a non-conforming
operand mode.

This time, it is:

(define_insn "fix_truncv2sfv2di2"
  [(set (match_operand:V2DI 0 "register_operand" "=v")
(any_fix:V2DI
  (vec_select:V2SF
(match_operand:V4SF 1 "nonimmediate_operand" "vm")
(parallel [(const_int 0) (const_int 1)]]
  "TARGET_AVX512DQ && TARGET_AVX512VL"
  "vcvttps2qq\t{%1, %0|%0, %q1}"
  [(set_attr "type" "ssecvt")
   (set_attr "prefix" "evex")
   (set_attr "mode" "TI")])

[Bug target/95229] [11 Regression] in mark_jump_label_1

2020-05-21 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95229

Uroš Bizjak  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |ubizjak at gmail dot com
 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #10 from Uroš Bizjak  ---
Fixed. (thanks Martin for adjusting the testcase!).

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2020-05-21 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 95229, which changed state.

Bug 95229 Summary: [11 Regression] in mark_jump_label_1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95229

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug target/95255] [10/11 Regression] ICE in gen_roundevendf2, at config/i386/i386.md:16328 since r10-2809-gd3b92f35d84f44a8

2020-05-21 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95255

--- Comment #1 from Uroš Bizjak  ---
Please post your compile flags.

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

--- Comment #18 from Uroš Bizjak  ---
Created attachment 48575
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48575=edit
Patch in testing.

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

--- Comment #17 from Uroš Bizjak  ---
The problem is with the commutative operands; these somehow confuse the
postreload pass.

I'll commit a partial revert that basically puts back:

 (define_insn_and_split "*<code><mode>2"
-  [(set (match_operand:VF 0 "register_operand" "=x,v")
+  [(set (match_operand:VF 0 "register_operand" "=x,x,v,v")
(absneg:VF
- (match_operand:VF 1 "vector_operand" "%0,v")))
-   (use (match_operand:VF 2 "vector_operand" "xBm,vm"))]
+ (match_operand:VF 1 "vector_operand" "0,xBm,v,m")))
+   (use (match_operand:VF 2 "vector_operand" "xBm,0,vm,v"))]

with manual swapping of operands.

[Bug target/95238] [11 Regression] Invalid *pushsi2_rex64

2020-05-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95238

--- Comment #4 from Uroš Bizjak  ---
(In reply to H.J. Lu from comment #3)
> (In reply to Uroš Bizjak from comment #2)
> > (In reply to H.J. Lu from comment #1)
> > > The "i" constraint shouldn't be used for flag_pic since symbolic constant
> > > leads to writable text in 32-bit mode and invalid in 64-bit mode.
> > 
> > Just a typo. "i" should be changed back to "e".
> 
> There are other "ri" in push patterns.  The 32 bit linker won't complain
> but will add DT_TEXTREL for "push $symbol" when generating shared object.

(define_insn "*push2"
  [(set (match_operand:DWI 0 "push_operand" "=<,<")
(match_operand:DWI 1 "general_no_elim_operand" "riF*o,*v"))]

This will never match a symbol, so "i" can be "n" as well.

;; For TARGET_64BIT we always round up to 8 bytes.
(define_insn "*pushsi2_rex64"
  [(set (match_operand:SI 0 "push_operand" "=X,X")
(match_operand:SI 1 "nonmemory_no_elim_operand" "re,*v"))]

This is changed to "e", as was before.

(define_insn "*pushsi2"
  [(set (match_operand:SI 0 "push_operand" "=<,<")
(match_operand:SI 1 "general_no_elim_operand" "ri*m,*v"))]

This was "i" before my patch.

(define_insn "*push2_prologue"
  [(set (match_operand:W 0 "push_operand" "=<")
(match_operand:W 1 "general_no_elim_operand" "r*m"))

This one is in effect "e".

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

--- Comment #15 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #13)
> So perhaps pre-reload splitter of that into the UNSPEC form?

Vector insns should be able to use a pre-reload splitter, but scalar
instructions depend on a post-reload splitter, because they are split in
different ways depending on the register set of the allocated register
(FP, XMM or even integer).

So, it really needs to be a post-reload splitter.

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

Uroš Bizjak  changed:

   What|Removed |Added

 CC|uros at gcc dot gnu.org|

--- Comment #14 from Uroš Bizjak  ---
It's interesting to note that only *some* of the insns with memory uses get
removed.

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

--- Comment #12 from Uroš Bizjak  ---
(In reply to Richard Biener from comment #11)
> Note a 'use' is not something that needs to be preserved, so
> 
> (define_insn_and_split "*2"
>   [(set (match_operand:VF 0 "register_operand" "=x,v")
> (absneg:VF
>   (match_operand:VF 1 "vector_operand" "%0,v")))
>(use (match_operand:VF 2 "vector_operand" "xBm,vm"))]
>   "TARGET_SSE"
>   "#"
>   "&& reload_completed"
>   [(set (match_dup 0)
> (<absneg_op>:VF (match_dup 1) (match_dup 2)))]
>   ""
>   [(set_attr "isa" "noavx,avx")])
> 
> doesn't make much sense (before reload).  To me, that is.  Why do
> we go that obfuscated way at all?  I think a clean solution is to
> use an UNSPEC here (well, "clean"...).

The reason for this approach was that combine still processes the pattern as
abs/neg. Please see how *nabstf2_1 is defined.

[Bug target/95238] [11 Regression] Invalid *pushsi2_rex64

2020-05-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95238

--- Comment #2 from Uroš Bizjak  ---
(In reply to H.J. Lu from comment #1)
> The "i" constraint shouldn't be used for flag_pic since symbolic constant
> leads to writable text in 32-bit mode and invalid in 64-bit mode.

Just a typo. "i" should be changed back to "e".

[Bug target/95229] [11 Regression] in mark_jump_label_1

2020-05-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95229

--- Comment #7 from Uroš Bizjak  ---
(In reply to Richard Biener from comment #6)

> That fixes the testcase.  But simplify_subreg is used in a lot more places
> so leaving to Uros to match up with expectations.

Oh, yes... We don't have hard regs here, so all of them should be changed to
simplify_gen_subreg. I have a patch.

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

Uroš Bizjak  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org,
   ||rsandifo at gcc dot gnu.org

--- Comment #9 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #7)
> Ooh, yes :(
> 
> '(use X)'
>  Represents the use of the value of X.  It indicates that the value
>  in X at this point in the program is needed, even though it may not
>  be apparent why this is so.  Therefore, the compiler will not
>  attempt to delete previous instructions whose only effect is to
>  store a value in X.  X must be a 'reg' expression.
> 
> Partial revert is in works.

Actually, no. The above applies to a standalone (use ...) RTX, not to a
(use ...) as part of a parallel. There are plenty of uses of memory_operand in
i386.md:

(define_insn "fix_truncdi_i387"
  [(set (match_operand:DI 0 "nonimmediate_operand" "=m")
(fix:DI (match_operand 1 "register_operand" "f")))
   (use (match_operand:HI 2 "memory_operand" "m"))
   (use (match_operand:HI 3 "memory_operand" "m"))
   (clobber (match_scratch:XF 4 "=&f"))]

Let's ask experts.

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

Uroš Bizjak  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |ubizjak at gmail dot com

--- Comment #7 from Uroš Bizjak  ---
Ooh, yes :(

'(use X)'
 Represents the use of the value of X.  It indicates that the value
 in X at this point in the program is needed, even though it may not
 be apparent why this is so.  Therefore, the compiler will not
 attempt to delete previous instructions whose only effect is to
 store a value in X.  X must be a 'reg' expression.

Partial revert is in works.

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

--- Comment #6 from Uroš Bizjak  ---
I think I found the issue.

Before the patch, we had:

(insn 375 373 2574 7 (parallel [
(set (reg:V4DF 21 xmm1 [orig:1681 vect__45.441 ] [1681])
(neg:V4DF (mem/c:V4DF (plus:DI (reg/f:DI 7 sp)
(const_int 160 [0xa0])) [3 %sfp+-1184 S32 A256])))
(use (reg:V4DF 20 xmm0 [3332]))
]) "fma_1.h":20:10 1487 {*negv4df2}
 (nil))

after the patch, reload is free to create:

(insn 375 3216 2578 7 (parallel [
(set (reg:V4DF 21 xmm1 [orig:1681 vect__45.441 ] [1681])
(neg:V4DF (reg:V4DF 20 xmm0 [3332])))
(use (mem/c:V4DF (plus:DI (reg/f:DI 7 sp)
(const_int 160 [0xa0])) [3 %sfp+-1184 S32 A256]))
]) "fma_1.h":20:10 1487 {*negv4df2}
 (nil))

which the postreload pass does not like, and simply deletes:

deleting insn with uid = 375.

Just like that. No substitution whatsoever.

So, is there some limitation with the (use) RTX, such that we can't have a
memory operand here?

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

--- Comment #4 from Uroš Bizjak  ---
(In reply to Martin Liška from comment #3)
> Started with r11-455-g94f687bd9ae37ece.

It is not obvious from the referenced patch what is going wrong here.

Unfortunately, I have no FMA-capable machine; can someone please isolate one
small test that fails?

[Bug target/95211] [11 Regression] ICE in emit_unop_insn, at optabs.c:3622

2020-05-19 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95211

Uroš Bizjak  changed:

   What|Removed |Added

 Status|WAITING |NEW
 CC||jakub at gcc dot gnu.org

--- Comment #3 from Uroš Bizjak  ---
Confirmed.

sse.md includes a named pattern defined with non-conforming operands:

(define_expand "floatv2div2sf2"
  [(set (match_operand:V4SF 0 "register_operand" "=v")
(vec_concat:V4SF
(any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" "vm"))
(match_dup 2)))]
  "TARGET_AVX512DQ && TARGET_AVX512VL"
  "operands[2] = CONST0_RTX (V2SFmode);")

V2SF vectorization now triggers this expander.

CC author.

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2020-05-19 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 89386, which changed state.

Bug 89386 Summary: Generation of vectorized MULHRS (Multiply High with Round 
and Scale) instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89386

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/89386] Generation of vectorized MULHRS (Multiply High with Round and Scale) instruction

2020-05-19 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89386

Uroš Bizjak  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED
   Target Milestone|--- |10.0

--- Comment #3 from Uroš Bizjak  ---
Fixed also for x86 targets.

[Bug target/82261] x86: missing peephole for SHLD / SHRD

2020-05-19 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82261

--- Comment #3 from Uroš Bizjak  ---
(In reply to Michael Clark from comment #2)
> Just refreshing this issue. I found it while testing some code-gen on
> Godbolt:

The combiner creates:

Failed to match this instruction:
(parallel [
(set (reg:SI 89)
(ior:SI (ashift:SI (reg:SI 94)
(subreg:QI (reg/v:SI 88 [ n ]) 0))
(lshiftrt:SI (reg:SI 95)
(minus:QI (subreg:QI (reg:SI 91) 0)
(subreg:QI (reg/v:SI 88 [ n ]) 0)
(clobber (reg:CC 17 flags))
])

This is *almost* matched by:

(define_insn "x86_shld"
  [(set (match_operand:SI 0 "nonimmediate_operand" "+r*m")
(ior:SI (ashift:SI (match_dup 0)
  (match_operand:QI 2 "nonmemory_operand" "Ic"))
(lshiftrt:SI (match_operand:SI 1 "register_operand" "r")
  (minus:QI (const_int 32) (match_dup 2)))))
   (clobber (reg:CC FLAGS_REG))]

but the RTL combiner doesn't propagate (const_int 32) into the pattern.

I wonder if the tree combiner can help here.
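
For reference, a C form (my reduction, not quoted from the PR) of the kind of
code that produces the RTL above:

--cut here--
/* A 32-bit double shift by a variable count (assumed 1 <= n <= 31); the ior
   of the two shifts, with the (32 - n) counterpart, is what the x86_shld
   pattern above should match once (const_int 32) is propagated.  */
unsigned
shld32 (unsigned hi, unsigned lo, unsigned n)
{
  return (hi << n) | (lo >> (32 - n));
}
--cut here--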

[Bug tree-optimization/95201] New: Some x86 vector-extend patterns are not exercised.

2020-05-19 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95201

Bug ID: 95201
   Summary: Some x86 vector-extend patterns are not exercised.
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ubizjak at gmail dot com
  Target Milestone: ---

Some of the x86 vector-extend patterns are not exercised by the middle end.
Currently, they are XFAILed in gcc.target/i386/pr92658-*.c:


pr92658-avx2.c:/* { dg-final { scan-assembler-times "pmovzxbq" 2 { xfail *-*-* } } } */
pr92658-sse4.c:/* { dg-final { scan-assembler-times "pmovzxbd" 2 { xfail *-*-* } } } */
pr92658-sse4.c:/* { dg-final { scan-assembler-times "pmovzxbq" 2 { xfail *-*-* } } } */
pr92658-sse4.c:/* { dg-final { scan-assembler-times "pmovzxwq" 2 { xfail *-*-* } } } */

These correspond to:

-O2 -ftree-vectorize -mavx2 is required:

--cut here--
typedef unsigned char v32qi __attribute__((vector_size (32)));
typedef unsigned short v16hi __attribute__((vector_size (32)));
typedef unsigned int v8si __attribute__((vector_size (32)));
typedef unsigned long long v4di __attribute__((vector_size (32)));

void
foo_u8_u64 (v4di * dst, v32qi * __restrict src)
{
  unsigned long long tem[4];
  tem[0] = (*src)[0];
  tem[1] = (*src)[1];
  tem[2] = (*src)[2];
  tem[3] = (*src)[3];
  dst[0] = *(v4di *) tem;
}

void
bar_u8_u64 (v4di * dst, v32qi src)
{
  unsigned long long tem[4];
  tem[0] = src[0];
  tem[1] = src[1];
  tem[2] = src[2];
  tem[3] = src[3];
  dst[0] = *(v4di *) tem;
}

/* { dg-final { scan-assembler-times "pmovzxbq" 2 { xfail *-*-* } } } */
--cut here--

-O2 -ftree-vectorize -msse4.1 is required:

--cut here--
typedef unsigned char v16qi __attribute__((vector_size (16)));
typedef unsigned short v8hi __attribute__((vector_size (16)));
typedef unsigned int v4si __attribute__((vector_size (16)));
typedef unsigned long long v2di __attribute__((vector_size (16)));

void
foo_u8_u32 (v4si * dst, v16qi * __restrict src)
{
  unsigned int tem[4];
  tem[0] = (*src)[0];
  tem[1] = (*src)[1];
  tem[2] = (*src)[2];
  tem[3] = (*src)[3];
  dst[0] = *(v4si *) tem;
}

void
bar_u8_u32 (v4si * dst, v16qi src)
{
  unsigned int tem[4];
  tem[0] = src[0];
  tem[1] = src[1];
  tem[2] = src[2];
  tem[3] = src[3];
  dst[0] = *(v4si *) tem;
}

/* { dg-final { scan-assembler-times "pmovzxbd" 2 { xfail *-*-* } } } */

void
foo_u8_u64 (v2di * dst, v16qi * __restrict src)
{
  unsigned long long tem[2];
  tem[0] = (*src)[0];
  tem[1] = (*src)[1];
  dst[0] = *(v2di *) tem;
}

void
bar_u8_u64 (v2di * dst, v16qi src)
{
  unsigned long long tem[2];
  tem[0] = src[0];
  tem[1] = src[1];
  dst[0] = *(v2di *) tem;
}

/* { dg-final { scan-assembler-times "pmovzxbq" 2 { xfail *-*-* } } } */

void
foo_u16_u64 (v2di * dst, v8hi * __restrict src)
{
  unsigned long long tem[2];
  tem[0] = (*src)[0];
  tem[1] = (*src)[1];
  dst[0] = *(v2di *) tem;
}

void
bar_u16_u64 (v2di * dst, v8hi src)
{
  unsigned long long tem[2];
  tem[0] = src[0];
  tem[1] = src[1];
  dst[0] = *(v2di *) tem;
}

/* { dg-final { scan-assembler-times "pmovzxwq" 2 { xfail *-*-* } } } */

Please note that these testcases also fail to vectorize in their loop forms,
e.g.:

--cut here--
void
foo_u8_u64 (v4di * dst, v32qi * __restrict src)
{
  unsigned long long tem[4];

  for (int i = 0; i < 4; i++)
tem[i] = (*src)[i];

  dst[0] = *(v4di *) tem;
}

void
bar_u8_u64 (v4di * dst, v32qi src)
{
  unsigned long long tem[4];

  for (int i = 0; i < 4; i++)
tem[i] = src[i];

  dst[0] = *(v4di *) tem;
}
--cut here--

Please see also PR 92658#c8 for some analysis.

[Bug target/92658] x86 lacks vector extend / truncate

2020-05-19 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92658

Uroš Bizjak  changed:

   What|Removed |Added

   Keywords||easyhack
   Assignee|ubizjak at gmail dot com   |unassigned at gcc dot gnu.org
 CC||ubizjak at gmail dot com
 Status|ASSIGNED|NEW

--- Comment #15 from Uroš Bizjak  ---
I will leave truncations (Down Converts in Intel speak) which are AVX512F
instructions to someone else. It should be easy to add missing patterns and
tests following the example of committed patch.

[Bug target/95169] i386 comparison between nan and 0.0 triggers Invalid operation exception

2020-05-17 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95169

Uroš Bizjak  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |ubizjak at gmail dot com
 Status|NEW |ASSIGNED

--- Comment #6 from Uroš Bizjak  ---
Created attachment 48551
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48551&action=edit
Prototype patch

[Bug target/95169] i386 comparison between nan and 0.0 triggers Invalid operation exception

2020-05-17 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95169

Uroš Bizjak  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2020-05-17
   Target Milestone|--- |10.2

--- Comment #5 from Uroš Bizjak  ---
There are two places in i386-expand.c that say:

  /* We may be reversing unordered compare to normal compare, that
 is not valid in general (we may convert non-trapping condition
 to trapping one), however on i386 we currently emit all
 comparisons unordered.  */

This is not the case anymore.

The compilation hits this place.

[Bug target/92658] x86 lacks vector extend / truncate

2020-05-14 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92658

--- Comment #10 from Uroš Bizjak  ---
The patch is ready to be pushed, it is waiting for a decision what to do with
failed cases.

Richi, should this patch move forward (eventually XFAILing failed cases), or do
you plan to look at the fails from the generic vectorizer POV?

[Bug target/95125] Unoptimal code for vectorized conversions

2020-05-14 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #3 from Uroš Bizjak  ---
It turns out that a bunch of patterns have to be renamed (and testcases added).

Easyhack, waiting for someone to show some love to conversion patterns in
sse.md.

[Bug target/95125] Unoptimal code for vectorized conversions

2020-05-14 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #2 from Uroš Bizjak  ---
(In reply to Richard Biener from comment #1)
> ISTR I filed a duplicate 10 years ago or so.  The issue is the vectorizer
> could not handle V4DFmode -> V4SFmode conversions.
> 
> Could, because for SVE we added the capability but this requires
> additional instruction patterns (IIRC I filed a but about this last
> year).  Yep.  PR92658 it is.

Oh... yes. And it is even assigned to me. And there is a patch... ;)

Anyway, I was surprised, since my soon-to-be-committed v2sf-v2df conversion
patch was able to fully vectorize a similar testcase involving double[2] and
float[2], while the code involving four elements compiled to the mess below.

[Bug target/95125] New: Unoptimal code for vectorized conversions

2020-05-14 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

Bug ID: 95125
   Summary: Unoptimal code for vectorized conversions
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ubizjak at gmail dot com
  Target Milestone: ---

Following testcase

--cut here--
float f[4];
double d[4];
int i[4];

void
float_truncate (void)
{
  for (int n = 0; n < 4; n++)
f[n] = d[n];
}

void
float_extend (void)
{
  for (int n = 0; n < 4; n++)
d[n] = f[n];
}

void
float_float (void)
{
  for (int n = 0; n < 4; n++)
f[n] = i[n];
}

void
fix_float (void)
{
  for (int n = 0; n < 4; n++)
i[n] = f[n];
}

void
float_double (void)
{
  for (int n = 0; n < 4; n++)
d[n] = i[n];
}

void
fix_double (void)
{
  for (int n = 0; n < 4; n++)
i[n] = d[n];
}
--cut here--

when compiled with "-O3 -mavx" should result in a single conversion
instruction.

float_truncate:
vxorps  %xmm0, %xmm0, %xmm0
vcvtsd2ss   d+8(%rip), %xmm0, %xmm2
vmovaps %xmm2, %xmm3
vcvtsd2ss   d(%rip), %xmm0, %xmm1
vcvtsd2ss   d+16(%rip), %xmm0, %xmm2
vcvtsd2ss   d+24(%rip), %xmm0, %xmm0
vunpcklps   %xmm0, %xmm2, %xmm2
vunpcklps   %xmm3, %xmm1, %xmm0
vmovlhps%xmm2, %xmm0, %xmm0
vmovaps %xmm0, f(%rip)
ret

float_extend:
vcvtps2pd   f(%rip), %xmm0
vmovapd %xmm0, d(%rip)
vxorps  %xmm0, %xmm0, %xmm0
vmovlps f+8(%rip), %xmm0, %xmm0
vcvtps2pd   %xmm0, %xmm0
vmovapd %xmm0, d+16(%rip)
ret

float_float:
vcvtdq2ps   i(%rip), %xmm0
vmovaps %xmm0, f(%rip)
ret

fix_float:
vcvttps2dq  f(%rip), %xmm0
vmovdqa %xmm0, i(%rip)
ret

float_double:
vcvtdq2pd   i(%rip), %xmm0
vmovapd %xmm0, d(%rip)
vpshufd $238, i(%rip), %xmm0
vcvtdq2pd   %xmm0, %xmm0
vmovapd %xmm0, d+16(%rip)
ret

fix_double:
pushq   %rbp
vmovapd d(%rip), %xmm1
vinsertf128 $0x1, d+16(%rip), %ymm1, %ymm0
movq    %rsp, %rbp
vcvttpd2dqy %ymm0, %xmm0
vmovdqa %xmm0, i(%rip)
vzeroupper
popq    %rbp
ret

Clang manages to emit optimal code.

[Bug target/95083] x86 fp_movcc expansion depends on real_cst sharing

2020-05-13 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95083

--- Comment #2 from Uroš Bizjak  ---
It looks to me that a couple of (scalar) splitters are missing in sse.md.

There is vector

(define_insn_and_split "*_blendv_lt"

Defined as:

  [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
(unspec:VF_128_256
  [(match_operand:VF_128_256 1 "register_operand" "0,0,x")
   (match_operand:VF_128_256 2 "vector_operand" "YrBm,*xBm,xm")
   (lt:VF_128_256
 (match_operand: 3 "register_operand" "Yz,Yz,x")
 (match_operand: 4 "const0_operand" "C,C,C"))]
  UNSPEC_BLENDV))]

(please note const0 operand 4).

Probably a similar pattern that would degrade to MIN/MAX is missing, for
both the vector and the scalar versions.
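
For illustration, a hedged sketch of the kind of scalar FP select this is
about (an assumed shape, not the PR testcase): a selection on a "< 0"
comparison, which maps to a scalar blendv and, in special cases like the one
below, can degrade to MIN/MAX (up to signed-zero/NaN details):

--cut here--
double
clamp_negative (double x)
{
  return x < 0.0 ? 0.0 : x;   /* effectively max (x, 0.0) */
}
--cut here--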

[Bug tree-optimization/95060] vfnmsub132ps is not generated with -ffast-math

2020-05-11 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95060

Uroš Bizjak  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #1 from Uroš Bizjak  ---
Related to PR86999.

[Bug tree-optimization/95060] New: vfnmsub132ps is not generated with -ffast-math

2020-05-11 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95060

Bug ID: 95060
   Summary: vfnmsub132ps is not generated with -ffast-math
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ubizjak at gmail dot com
  Target Milestone: ---

Following testcase:

--cut here--
float r[8], a[8], b[8], c[8];

void
test_fnms (void)
{
  for (int i = 0; i < 8; i++)
r[i] = -(a[i] * b[i]) - c[i];
}
--cut here--

compiles on x86_64 with "-O3 -mfma" to

vmovaps b(%rip), %ymm0
vmovaps c(%rip), %ymm1
vfnmsub132psa(%rip), %ymm1, %ymm0
vmovaps %ymm0, r(%rip)
vzeroupper
ret

However, when -ffast-math is added, the negation gets moved out of the insn:

vmovaps b(%rip), %ymm0
vmovaps c(%rip), %ymm1
vfmadd132ps a(%rip), %ymm1, %ymm0
->  vxorps  .LC0(%rip), %ymm0, %ymm0
vmovaps %ymm0, r(%rip)
vzeroupper
ret
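
For reference, the -ffast-math code above corresponds to computing the FMA
first and negating afterwards, i.e. (a sketch of the mathematically
equivalent source, not taken from the PR):

--cut here--
float r[8], a[8], b[8], c[8];   /* same arrays as in the testcase above */

void
test_fnms_fast (void)
{
  for (int i = 0; i < 8; i++)
    r[i] = -(a[i] * b[i] + c[i]);   /* same value as -(a[i] * b[i]) - c[i] */
}
--cut here--

so the missed optimization is recognizing that the negation can be folded
back into the multiply-add to form a single FNMS.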

[Bug target/95046] Vectorize V2SFmode operations

2020-05-11 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95046

Uroš Bizjak  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Target Milestone|--- |11.0
   Assignee|unassigned at gcc dot gnu.org  |ubizjak at gmail dot com

[Bug target/95046] Vectorize V2SFmode operations

2020-05-11 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95046

Uroš Bizjak  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2020-05-11
 Target||x86_64
 Status|UNCONFIRMED |NEW
   Severity|normal  |enhancement

[Bug target/95046] New: Vectorize V2SFmode operations

2020-05-11 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95046

Bug ID: 95046
   Summary: Vectorize V2SFmode operations
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ubizjak at gmail dot com
  Target Milestone: ---

The compiler should vectorize V2SF operations using XMM registers.

The same principles that apply to integer MMX operations (mmx-with-sse)
should also apply to V2SFmode operations, but to avoid unwanted secondary
effects (e.g. spurious exceptions) extra care should be taken to load values
into registers with the parts outside of V2SFmode cleared.

Following testcase:

--cut here--
float r[2], a[2], b[2];

void foo (void)
{
  for (int i = 0; i < 2; i++)
r[i] = a[i] + b[i];
}
--cut here--

should vectorize to:

movqa(%rip), %xmm0
movqb(%rip), %xmm1
addps   %xmm1, %xmm0
movlps  %xmm0, r(%rip)

Please note the movq insn, which ensures that the top 64 bits of the 128-bit
xmm register are cleared.
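
A hedged intrinsics sketch of the desired sequence (for illustration only;
it additionally assumes SSE2 for the zero-extending 64-bit integer load,
which is not required by the report itself):

--cut here--
#include <immintrin.h>

float r[2], a[2], b[2];

void
foo_intrin (void)
{
  /* movq loads clear bits 64..127 of the xmm registers.  */
  __m128 va = _mm_castsi128_ps (_mm_loadl_epi64 ((const __m128i *) a));
  __m128 vb = _mm_castsi128_ps (_mm_loadl_epi64 ((const __m128i *) b));
  __m128 vr = _mm_add_ps (va, vb);
  /* movlps stores only the low 64 bits.  */
  _mm_storel_pi ((__m64 *) r, vr);
}
--cut here--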

[Bug target/91188] strict_low_part operations do not work

2020-05-07 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91188

Uroš Bizjak  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
   Target Milestone|10.2|10.0
 Resolution|--- |FIXED

--- Comment #6 from Uroš Bizjak  ---
Fixed in gcc-10.

[Bug tree-optimization/94877] Failure to simplify ~(x + 1) to -2 - x

2020-05-06 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94877

--- Comment #4 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #3)
> I'm not sure why this is considered a simplification, two insns vs. two, and
> on the subtraction it isn't specific to just one target, but I think for
> most the constant will need to be forced into register, the immediates the
> instructions have is mostly for the second operand.

Two ALU operations are merged into one, assuming that the move is "free".
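
For illustration, the identity behind the simplification, using the usual
two's complement rule ~y == -y - 1:

  ~(x + 1) == -(x + 1) - 1 == -2 - x

so a hypothetical pair of functions like

--cut here--
int f1 (int x) { return ~(x + 1); }   /* currently: add + not */
int f2 (int x) { return -2 - x; }     /* single subtraction from a constant */
--cut here--

should both end up as a single ALU operation plus a move.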

[Bug tree-optimization/94913] Failure to optimize not+cmp into overflow check

2020-05-05 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94913

Uroš Bizjak  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 CC||jakub at gcc dot gnu.org
 Ever confirmed|0   |1
   Last reconfirmed||2020-05-05

--- Comment #3 from Uroš Bizjak  ---
(In reply to Gabriel Ravier from comment #1)
> The same thing happens for this code :
> 
> bool f(unsigned x, unsigned y)
> {
> return (x - y - 1) >= x;
> }

This transformation is a job for the tree combiner.

[Bug tree-optimization/94913] Failure to optimize not+cmp into overflow check

2020-05-05 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94913

--- Comment #2 from Uroš Bizjak  ---
Created attachment 48458
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48458&action=edit
Prototype patch

Prototype patch for the missed optimization described in comment #0.

Following testcase:

--cut here--
int
foo (unsigned int x, unsigned int y)
{
  return ~x < y;
}

void f1 (void);
void f2 (void);

void
bar (unsigned int x, unsigned int y)
{
  if (~x < y)
f1 ();
  else
f2 ();
}
--cut here--

compiles (-O2) to:

foo:
xorl%eax, %eax
addl%esi, %edi
setc%al
ret

bar:
addl%esi, %edi
jnc .L4
jmp f1
.L4:
jmp f2
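
For reference, the reasoning behind the transformation: for unsigned x,
~x == UINT_MAX - x, so ~x < y is equivalent to x + y > UINT_MAX, i.e. to
the carry out of x + y.  A hedged sketch of the equivalent builtin form
(for illustration, not part of the patch):

--cut here--
int
foo_builtin (unsigned int x, unsigned int y)
{
  unsigned int sum;
  return __builtin_uadd_overflow (x, y, &sum);
}
--cut here--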

[Bug target/94795] Failure to use fast sbb method on x86 for spreading any set bit to all bits

2020-05-04 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94795

Uroš Bizjak  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |11.0
 Status|ASSIGNED|RESOLVED
  Component|rtl-optimization|target

--- Comment #6 from Uroš Bizjak  ---
Implemented for gcc-11.

[Bug target/94650] Missed x86-64 peephole optimization: x >= large power of two

2020-05-04 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94650

Uroš Bizjak  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |11.0

--- Comment #4 from Uroš Bizjak  ---
Implemented for gcc-11.

[Bug tree-optimization/94914] Failure to optimize check of high part of 64-bit result of 32 by 32 multiplication into overflow check

2020-05-04 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94914

Uroš Bizjak  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
 CC||jakub at gcc dot gnu.org
   Last reconfirmed||2020-05-04

--- Comment #1 from Uroš Bizjak  ---
Confirmed.

llvm:

movl%edi, %eax
xorl%ecx, %ecx
mull%esi
seto%cl
movl%ecx, %eax
retq

gcc:

movl%esi, %esi
movl%edi, %edi
xorl%eax, %eax
imulq   %rsi, %rdi
shrq$32, %rdi
setne   %al
ret

GCC does have overflow-checking arithmetic builtins, but I'm not sure
whether they are currently used for anything other than UBSAN checking.
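
A hedged sketch of the source form that uses the overflow builtin directly
(for illustration; the question here is whether the middle end should
synthesize this from the shift-based check):

--cut here--
_Bool
mul_overflows (unsigned int a, unsigned int b)
{
  unsigned int prod;
  return __builtin_umul_overflow (a, b, &prod);
}
--cut here--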

[Bug tree-optimization/94921] Failure to optimize nots with sub into single add

2020-05-04 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94921

Uroš Bizjak  changed:

   What|Removed |Added

   Last reconfirmed||2020-05-04
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #2 from Uroš Bizjak  ---
(In reply to Marc Glisse from comment #1)
> x + y ?

Correct.

llvm:

leal(%rdi,%rsi), %eax
retq

gcc:

notl%edi
subl%esi, %edi
movl%edi, %eax
notl%eax
ret

Confirmed, this looks like a job for the tree combiner.
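
Judging from the code above, the function computes something like
~(~x - y), and with the usual rule ~a == -a - 1 this folds as

  ~(~x - y) == -(~x - y) - 1 == (x + 1) + y - 1 == x + y

so the whole sequence is a single addition.  A hedged reconstruction of the
testcase shape (the PR's exact code may differ):

--cut here--
unsigned int
g (unsigned int x, unsigned int y)
{
  return ~(~x - y);   /* equivalent to x + y */
}
--cut here--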

[Bug c/94902] internal compiler error: output_operand: invalid use of register 'frame'

2020-05-04 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94902

--- Comment #3 from Uroš Bizjak  ---
This is the third time I have seen this type of bug report, and I really
don't know what is so magical about register number "19" that everybody
wants to use exactly that register.

If using this register number crashes the compiler, then just don't use it.

[Bug rtl-optimization/94837] Failure to optimize out spurious movbe into bswap

2020-04-29 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94837

--- Comment #5 from Uroš Bizjak  ---
Probably some secondary effect of subregs on register allocation; changing
"float" to "int" in the original testcase gets us the expected alternative
and optimal code using BSWAP.

[Bug rtl-optimization/94837] Failure to optimize out spurious movbe into bswap

2020-04-29 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94837

Uroš Bizjak  changed:

   What|Removed |Added

   Keywords|missed-optimization |ra
 CC||vmakarov at gcc dot gnu.org
   Last reconfirmed||2020-04-29
 Resolution|DUPLICATE   |---
 Status|RESOLVED|NEW
 Ever confirmed|0   |1

--- Comment #4 from Uroš Bizjak  ---
Looks like an RA (tuning?) problem.

We enter reload (-O2 -mmovbe -mtune=intel) with:

(insn 14 4 2 2 (set (reg:SF 87)
(reg:SF 20 xmm0 [ x ])) "pr94837.c":2:1 112 {*movsf_internal}
 (expr_list:REG_DEAD (reg:SF 20 xmm0 [ x ])
(nil)))
(insn 7 6 11 2 (set (subreg:SI (reg:SF 84 [  ]) 0)
(bswap:SI (subreg:SI (reg:SF 87) 0))) "pr94837.c":11:19 869
{*bswapsi2_movbe}
 (expr_list:REG_DEAD (reg:SF 87)
(nil)))
(insn 11 7 12 2 (set (reg/i:SF 20 xmm0)
(reg:SF 84 [  ])) "pr94837.c":12:1 112 {*movsf_internal}
 (expr_list:REG_DEAD (reg:SF 84 [  ])
(nil)))

and this sequence gets reloaded to:

(insn 17 6 7 2 (set (mem/c:SI (plus:DI (reg/f:DI 7 sp)
(const_int -4 [0xfffc])) [1 %sfp+-4 S4 A32])
(reg:SI 20 xmm0 [87])) "pr94837.c":11:19 67 {*movsi_internal}
 (nil))
(insn 7 17 16 2 (set (reg:SI 0 ax [88])
(bswap:SI (mem/c:SI (plus:DI (reg/f:DI 7 sp)
(const_int -4 [0xfffc])) [1 %sfp+-4 S4 A32])))
"pr94837.c":11:19 869 {*bswapsi2_movbe}
 (nil))
(insn 16 7 12 2 (set (reg:SI 20 xmm0 [orig:84  ] [84])
(reg:SI 0 ax [88])) "pr94837.c":11:19 67 {*movsi_internal}
 (nil))

One would expect the register allocator to choose alternative 0 from:

(define_insn "*bswap2_movbe"
  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=r,r,m")
(bswap:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,m,r")))]
  "TARGET_MOVBE
   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
  "@
bswap\t%0
movbe{}\t{%1, %0|%0, %1}
movbe{}\t{%1, %0|%0, %1}"

but for some reason this is not the case.
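
For reference, a hedged reconstruction of the testcase shape from the RTL
above (an SFmode argument whose bit pattern is byte-swapped as SImode and
returned as SFmode again; the PR's exact code may differ):

--cut here--
float
f (float x)
{
  unsigned int u;
  __builtin_memcpy (&u, &x, sizeof (u));
  u = __builtin_bswap32 (u);
  __builtin_memcpy (&x, &u, sizeof (x));
  return x;
}
--cut here--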

[Bug rtl-optimization/94838] Failure to optimize out useless zero-ing after register was already zero-ed

2020-04-29 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94838

Uroš Bizjak  changed:

   What|Removed |Added

  Component|target  |rtl-optimization

--- Comment #5 from Uroš Bizjak  ---
(In reply to Gabriel Ravier from comment #0)
> int f(bool b, int *p)
> {
> return b && *p;
> }
> 
> GCC generates this with -O3:
> 
> f(bool, int*):
>   xor eax, eax
>   test dil, dil
>   je .L1
>   mov edx, DWORD PTR [rsi]
>   xor eax, eax ; This can be removed, since eax is already 0 here
>   test edx, edx
>   setne al
> .L1:
>   ret

The first xor is the return value load and the second one comes from the
peephole2 pass that converts:

   11: NOTE_INSN_BASIC_BLOCK 3
   12: flags:CCZ=cmp([si:DI],0)
  REG_DEAD si:DI
   13: NOTE_INSN_DELETED
   32: ax:QI=flags:CCZ!=0
  REG_DEAD flags:CCZ
   33: ax:SI=zero_extend(ax:QI)

to:

   11: NOTE_INSN_BASIC_BLOCK 3
   40: {ax:SI=0;clobber flags:CC;}
   43: dx:SI=[si:DI]
   44: flags:CCZ=cmp(dx:SI,0)
   42: strict_low_part(ax:QI)=flags:CCZ!=0

The follow-up cprop-hardreg pass does not notice that zero is already
loaded into a register.

There is nothing the target-dependent part can do here; a follow-up RTL
hardreg propagation pass should fix this.

[Bug rtl-optimization/94795] Failure to use fast sbb method on x86 for spreading any set bit to all bits

2020-04-27 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94795

Uroš Bizjak  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |ubizjak at gmail dot com

--- Comment #4 from Uroš Bizjak  ---
Created attachment 48386
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48386&action=edit
Proof of concept patch

Proof of concept patch that implements both suggestions and results in:

negl%edi
sbbl%eax, %eax
ret

for the first case and:

cmpl$1, %edi
sbbl%eax, %eax
ret

for the second.

For the record, the transformation triggers:

- for linux x86_64 defconfig: 338 times neg/sbb and 28 times cmp/sbb

- for GCC bootstrap: 296 times neg/sbb and 1246 times cmp/sbb
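
For reference, hedged reconstructions of the two cases (inferred from the
generated code above; the PR's exact functions may differ):

--cut here--
int
spread_any_set_bit (unsigned int x)
{
  return x ? -1 : 0;    /* neg; sbb */
}

int
spread_if_zero (unsigned int x)
{
  return x ? 0 : -1;    /* cmp $1; sbb */
}
--cut here--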

[Bug rtl-optimization/94795] Failure to use fast sbb method on x86 for spreading any set bit to all bits

2020-04-27 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94795

--- Comment #3 from Uroš Bizjak  ---
(In reply to Gabriel Ravier from comment #2)
> Also, I can also provide this a very similar function for which such an

This optimization could be implemented with a simple combine splitter:

--cut here--
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b426c21d3dd..8ea3a4a141a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -17979,6 +18045,18 @@
  (clobber (reg:CC FLAGS_REG))])]
   "operands[2] = GEN_INT (INTVAL (operands[2]) + 1);")

+(define_split
+  [(set (match_operand:SWI48 0 "register_operand")
+   (neg:SWI48
+ (eq:SWI48
+   (match_operand:SWI 1 "nonimmediate_operand")
+   (const_int 0]
+  ""
+  [(set (reg:CC FLAGS_REG) (compare:CC (match_dup 1) (const_int 1)))
+   (parallel [(set (match_dup 0)
+  (neg:SWI48 (ltu:SWI48 (reg:CC FLAGS_REG) (const_int 0))))
+ (clobber (reg:CC FLAGS_REG))])])
+
 (define_insn "*movcc_noc"
   [(set (match_operand:SWI248 0 "register_operand" "=r,r")
(if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator"
--cut here--

(QImode and HImode have to be added to the *x86_movcc_0_m1_neg pattern
for the above splitter to also handle QImode and HImode operands.)

[Bug target/94650] Missed x86-64 peephole optimization: x >= large power of two

2020-04-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94650

Uroš Bizjak  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |ubizjak at gmail dot com
 Status|NEW |ASSIGNED

--- Comment #2 from Uroš Bizjak  ---
Created attachment 48315
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48315&action=edit
Prototype patch

Using this patch, the following asm is created (-O2):

--cut here--
check:
xorl%eax, %eax
shrq$40, %rdi
setne   %al
ret

test0:
shrq$40, %rdi
jne .L5
ret
.L5:
xorl%edi, %edi
jmp g

test1:
movq%rdi, %rax
shrq$40, %rax
jne .L8
ret
.L8:
jmp g
--cut here--
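
For reference, a hedged reconstruction of the testcase shapes from the asm
above (the threshold 1ULL << 40 matches the shrq $40; the PR's exact code
may differ):

--cut here--
void g (unsigned long long);

int
check (unsigned long long x)
{
  return x >= (1ULL << 40);
}

void
test0 (unsigned long long x)
{
  if (x >= (1ULL << 40))
    g (0);
}

void
test1 (unsigned long long x)
{
  if (x >= (1ULL << 40))
    g (x);
}
--cut here--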

[Bug target/94603] ICE: in extract_insn, at recog.c:2343 (unrecognizable insn) with -mno-sse2 and __builtin_ia32_movq128

2020-04-15 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94603

Uroš Bizjak  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |8.5
 Status|ASSIGNED|RESOLVED

--- Comment #10 from Uroš Bizjak  ---
Fixed for gcc-8.5+.

[Bug target/94603] ICE: in extract_insn, at recog.c:2343 (unrecognizable insn) with -mno-sse2 and __builtin_ia32_movq128

2020-04-15 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94603

--- Comment #6 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #5)
> (In reply to Uroš Bizjak from comment #4)
> > (In reply to Jakub Jelinek from comment #3)
> > > The testcase will need -msse -mno-sse2.
> > 
> > Yes, but the testcase is invalid, because __builtin_ia32_movq128 should not
> > be used without SSE2. Fixed compiler reports:
> > 
> > pr94603.c: In function ‘foo’:
> > pr94603.c:6:10: warning: implicit declaration of function
> > ‘__builtin_ia32_movq128’; did you mean ‘__builtin_ia32_movntps’?
> > [-Wimplicit-function-declaration]
> > pr94603.c:6:10: error: incompatible types when returning type ‘int’ but ‘V’
> > {aka ‘__vector(2) long long int’} was expected
> 
> I know.  But we (often) include even invalid testcases, perhaps with just
> dg-error "" and dg-warning "" (or use -w too) if we don't care about exact
> wording but just want to verify there is no ICE.

This is the testcase:

--cut here--
/* PR target/94603 */
/* { dg-do compile } */
/* { dg-options "-Wno-implicit-function-declaration -msse -mno-sse2" } */

typedef long long __attribute__ ((__vector_size__ (16))) V;

V
foo (V v)
{
  return __builtin_ia32_movq128 (v);  /* { dg-error "" } */
}
--cut here--

[Bug target/94603] ICE: in extract_insn, at recog.c:2343 (unrecognizable insn) with -mno-sse2 and __builtin_ia32_movq128

2020-04-15 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94603

--- Comment #4 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #3)
> The testcase will need -msse -mno-sse2.

Yes, but the testcase is invalid, because __builtin_ia32_movq128 should not
be used without SSE2. The fixed compiler reports:

pr94603.c: In function ‘foo’:
pr94603.c:6:10: warning: implicit declaration of function
‘__builtin_ia32_movq128’; did you mean ‘__builtin_ia32_movntps’?
[-Wimplicit-function-declaration]
pr94603.c:6:10: error: incompatible types when returning type ‘int’ but ‘V’
{aka ‘__vector(2) long long int’} was expected

[Bug target/94603] ICE: in extract_insn, at recog.c:2343 (unrecognizable insn) with -mno-sse2 and __builtin_ia32_movq128

2020-04-15 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94603

Uroš Bizjak  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |ubizjak at gmail dot com
 Status|NEW |ASSIGNED

--- Comment #2 from Uroš Bizjak  ---
Created attachment 48278
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48278&action=edit
Patch in testing.

[Bug target/94561] [10 Regression] ICE in ix86_get_ssemov

2020-04-11 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94561

Uroš Bizjak  changed:

   What|Removed |Added

   Last reconfirmed||2020-04-11
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
 CC||hjl.tools at gmail dot com

--- Comment #1 from Uroš Bizjak  ---
Confirmed, CC author.
