[Bug target/69693] Wrong mode is used to load spilled register

2024-04-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69693

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |10.0

--- Comment #9 from Andrew Pinski  ---
Fixed by r10-2665-gf386ca41386215, which uses `(vec_merge (vec_duplicate ...))`
instead of paradoxical subregs.
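
For reference, the RTL that approach generates for loading a DImode value into
the low half of a V2DI register has roughly this shape (an illustrative sketch
only; xmm0 and spill_slot are placeholders, and the exact pattern, constraints
and condition in sse.md differ):

  ;; Element 0 is taken from the DImode operand via vec_duplicate, element 1
  ;; from the zero vector, selected by the vec_merge mask (const_int 1);
  ;; no paradoxical subreg is involved.
  (set (reg:V2DI xmm0)
       (vec_merge:V2DI
         (vec_duplicate:V2DI (mem:DI spill_slot))
         (const_vector:V2DI [(const_int 0) (const_int 0)])
         (const_int 1)))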

[Bug target/69693] Wrong mode is used to load spilled register

2019-03-12 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69693

--- Comment #8 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #3)
> Your patch will just paper over the real issue in this particular testcase.

This can be illustrated with an example from PR89654:

--cut here--
unsigned long long
foo (unsigned long long i, int z)
{
  return i << 3;
}
--cut here--

with a patch:

--cut here--
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d4c01407f4a2..6142f5272a2e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1290,6 +1290,16 @@
(set_attr "prefix" "maybe_vex")
(set_attr "mode" "TI")])

+;; Used by STV to load a DI into an xmm register.
+(define_insn "*movdi_to_v2di"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+        (subreg:V2DI (match_operand:DI 1 "nonimmediate_operand" "xm") 0))]
+  "!TARGET_64BIT && TARGET_SSE2"
+  "%vmovq\t{%1, %0|%0, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "maybe_vex")
+   (set_attr "mode" "DI")])
+
 ;; Move a DI from a 32-bit register pair (e.g. %edx:%eax) to an xmm.
 ;; We'd rather avoid this entirely; if the 32-bit reg pair was loaded
 ;; from memory, we'd prefer to load the memory directly into the %xmm
--cut here--

./cc1 -O2 -m32 -march=skylake indeed creates:

        movl    32(%esp), %eax          # 24    [c=8 l=4]  *movsi_internal/0
        movl    36(%esp), %edx          # 25    [c=8 l=4]  *movsi_internal/0
        movl    %eax, (%esp)            # 26    [c=4 l=3]  *movsi_internal/1
        movl    %edx, 4(%esp)           # 27    [c=4 l=4]  *movsi_internal/1
        vmovq   (%esp), %xmm1           # 21    [c=24 l=5]  *movdi_to_v2di
        vpsllq  $3, %xmm1, %xmm0        # 7     [c=4 l=5]  ashlv2di3/1

but ./cc1 -O2 -m32 -march=skylake-avx512 shows:

        movl    32(%esp), %eax          # 23    [c=8 l=4]  *movsi_internal/0
        movl    36(%esp), %edx          # 24    [c=8 l=4]  *movsi_internal/0
        movl    %eax, (%esp)            # 25    [c=4 l=3]  *movsi_internal/1
        movl    %edx, 4(%esp)           # 26    [c=4 l=4]  *movsi_internal/1
        vpsllq  $3, (%esp), %xmm0       # 7     [c=20 l=6]  *ashlv2di3/1

where VPSLLQ still performs a 16-byte V2DImode load from a stack slot that was
only written as two 4-byte halves.
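
For comparison, what one would like to see here is the 64-bit argument going
straight into an %xmm register, with no DImode spill/reload through (%esp) at
all. An illustrative sketch (not compiler output; only the load and shift are
shown, and the 32(%esp) offset simply reuses the argument location from the
dump above):

        vmovq   32(%esp), %xmm0         # load the 64-bit argument i directly
        vpsllq  $3, %xmm0, %xmm0        # shift it in the vector unit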

[Bug target/69693] Wrong mode is used to load spilled register

2019-03-12 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69693

--- Comment #7 from H.J. Lu  ---
*** Bug 89654 has been marked as a duplicate of this bug. ***

[Bug target/69693] Wrong mode is used to load spilled register

2016-02-06 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69693

--- Comment #6 from H.J. Lu  ---
(In reply to Uroš Bizjak from comment #5)
> (In reply to H.J. Lu from comment #4)
> 
> > It looks like it is done on purpose.
> 
> In this case, our planned transition to generic unaligned SSE loads should
> "fix" this issue.  The realignment will be necessary only for performance
> reasons, not for correctness.  Maybe we should also look at whether
> SLOW_UNALIGNED_ACCESS affects moves that may result in movups insns.
> 
> (However, since movups from an unaligned address is slow, I still think
> that realignment should be fixed for the STV pass.)

STV paradoxical V2DI subregs only need 8-byte alignment.  LRA generates an
aligned V2DI move for a paradoxical V2DI subreg because we don't provide a
paradoxical V2DI subreg move pattern and SLOW_UNALIGNED_ACCESS is 0.  STV
doesn't need 16-byte alignment.  I bootstrapped 32-bit GCC with my patch
using --with-arch=corei7 --with-cpu=corei7; there are no regressions.
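
To spell out the alignment point (an illustrative contrast, assuming the
*movdi_to_v2di pattern from comment #2; both insn forms are condensed from
dumps quoted in this bug): with the pattern, the reload that touches memory
stays an 8-byte DImode read, so the spill slot only needs 8-byte alignment;
without it, LRA folds the paradoxical subreg into the MEM and the read
becomes a 16-byte V2DImode access:

  ;; with *movdi_to_v2di: 8-byte read, DImode alignment is enough
  (set (reg:V2DI xmm) (subreg:V2DI (mem/c:DI (reg:SI sp)) 0))

  ;; without it: 16-byte read from an 8-byte spill slot
  (set (reg:V2DI xmm) (mem/c:V2DI (reg:SI sp)))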

[Bug target/69693] Wrong mode is used to load spilled register

2016-02-06 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69693

--- Comment #5 from Uroš Bizjak  ---
(In reply to H.J. Lu from comment #4)

> It looks like it is done on purpose.

In this case, our planned transition to generic unaligned SSE loads should
"fix" this issue.  The realignment will be necessary only for performance
reasons, not for correctness.  Maybe we should also look at whether
SLOW_UNALIGNED_ACCESS affects moves that may result in movups insns.

(However, since movups from an unaligned address is slow, I still think
that realignment should be fixed for the STV pass.)

[Bug target/69693] Wrong mode is used to load spilled register

2016-02-05 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69693

--- Comment #2 from H.J. Lu  ---
Created attachment 37598
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37598&action=edit
A patch

It is a backend bug. We need to add

; Used by STV to load a DI into an xmm register.
(define_insn "*movdi_to_v2di"
  [(set (match_operand:V2DI 0 "register_operand" "=x")
        (subreg:V2DI (match_operand:DI 1 "nonimmediate_operand" "xm") 0))]
  "!TARGET_64BIT && TARGET_SSE2"
  "%vmovq\t{%1, %0|%0, %1}"
  [(set_attr "type" "ssemov")
   (set_attr "prefix" "maybe_vex")
   (set_attr "mode" "DI")])

I am testing this patch.

[Bug target/69693] Wrong mode is used to load spilled register

2016-02-05 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69693

Uroš Bizjak  changed:

   What|Removed |Added

 CC||vmakarov at gcc dot gnu.org

--- Comment #3 from Uroš Bizjak  ---
(In reply to H.J. Lu from comment #2)

> It is a backend bug. We need to add

It is not.

Please see the reload dump, where:
3: r109:DI=[argp:SI+0x8]
Inserting insn reload after:
   43: r95:DI=r109:DI
...
   15: r111:V2DI=r111:V2DI:DI#0
  REG_DEAD r108:DI
  REG_DEAD r95:DI
Inserting insn reload before:
   45: r111:V2DI=r95:DI#0
Inserting insn reload after:
   46: r99:DI#0=r111:V2DI

which results in:

(insn 3 2 43 2 (set (reg/v:DI 2 cx [orig:95 p2 ] [95])
        (mem/c:DI (plus:SI (reg/f:SI 7 sp)
                (const_int 36 [0x24])) [2 p2+0 S8 A32])) pr69693.c:6 85 {*movdi_internal}
     (nil))
(insn 43 3 4 2 (set (mem/c:DI (reg/f:SI 7 sp) [3 %sfp+-16 S8 A128])
        (reg/v:DI 2 cx [orig:95 p2 ] [95])) pr69693.c:6 85 {*movdi_internal}
     (nil))
...
(insn 45 40 15 3 (set (reg:V2DI 23 xmm2 [111])
        (mem/c:V2DI (reg/f:SI 7 sp) [3 %sfp+-16 S16 A128])) pr69693.c:13 1223 {*movv2di_internal}
     (nil))
(insn 15 45 46 3 (set (reg:V2DI 23 xmm2 [111])
        (and:V2DI (reg:V2DI 23 xmm2 [111])
            (reg:V2DI 21 xmm0 [108]))) pr69693.c:13 3471 {*andv2di3}
     (nil))

Please note how (insn 45) reads 16 bytes (V2DImode) from a stack slot to which
(insn 43) stored only 8 bytes (DImode), so the upper half of %xmm2 is
uninitialized.

Your patch will just paper over the real issue in this particular testcase.

[Bug target/69693] Wrong mode is used to load spilled register

2016-02-05 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69693

H.J. Lu  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-02-05
 Ever confirmed|0   |1

--- Comment #4 from H.J. Lu  ---
LRA generates this reload:

(insn 45 0 0 (set (reg:V2DI 111)
        (subreg:V2DI (reg/v:DI 95 [ p2 ]) 0)) -1
     (nil))

With movdi_to_v2di, we get

(insn 45 40 15 3 (set (reg:V2DI 23 xmm2 [111])
        (subreg:V2DI (mem/c:DI (reg/f:SI 7 sp) [3 %sfp+-16 S8 A64]) 0)) x.i:13 1288 {*movdi_to_v2di}
     (nil))

Without movdi_to_v2di, this code in simplify_operand_subreg in
lra-constraints.c:

      /* If we change address for paradoxical subreg of memory, the
         address might violate the necessary alignment or the access might
         be slow.  So take this into consideration.  We should not worry
         about access beyond allocated memory for paradoxical memory
         subregs as we don't substitute such equiv memory (see processing
         equivalences in function lra_constraints) and because for spilled
         pseudos we allocate stack memory enough for the biggest
         corresponding paradoxical subreg.  */
      if (MEM_P (reg)
          && (! SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (reg))
              || MEM_ALIGN (reg) >= GET_MODE_ALIGNMENT (mode)))
        {
          rtx subst, old = *curr_id->operand_loc[nop];

          alter_subreg (curr_id->operand_loc[nop], false);
          subst = *curr_id->operand_loc[nop];
          lra_assert (MEM_P (subst));
          if (! valid_address_p (innermode, XEXP (reg, 0),
                                 MEM_ADDR_SPACE (reg))
              || valid_address_p (GET_MODE (subst), XEXP (subst, 0),
                                  MEM_ADDR_SPACE (subst)))
            return true;

turns

(subreg:V2DI (reg/v:DI 95 [ p2 ]) 0)

into

(mem/c:V2DI (reg/f:SI 7 sp) [3 %sfp+-16 S16 A128])

It looks like it is done on purpose.
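
Plugging the values from this testcase into the condition above (an
illustrative walk-through based on the dumps, not actual GCC code):

  /* mode = V2DImode, so GET_MODE_ALIGNMENT (mode) = 128.
     MEM_ALIGN (reg) = 128 for the %sfp+-16 spill slot ("A128" in the dump).
     SLOW_UNALIGNED_ACCESS (mode, 128) = 0 on this target.
     Hence the condition is true, alter_subreg folds the paradoxical subreg
     into the MEM, and the 8-byte DImode spill slot is re-read as a 16-byte
     V2DImode load.  */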

[Bug target/69693] Wrong mode is used to load spilled register

2016-02-05 Thread ienkovich at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69693

--- Comment #1 from Ilya Enkovich  ---
We should be able to revert r233167 if this issue is fixed.