[Bug target/80080] S390: Isses with emitted cs-instructions for __atomic builtins.

2018-12-03 Thread iii at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

--- Comment #16 from iii at gcc dot gnu.org ---
Author: iii
Date: Mon Dec  3 09:49:02 2018
New Revision: 266734

URL: https://gcc.gnu.org/viewcvs?rev=266734&root=gcc&view=rev
Log:
Repeat jump threading after combine

Consider the following RTL:

(insn (set (reg 65) (if_then_else (eq %cc 0) 1 0)))
(insn (parallel [(set %cc (compare (reg 65) 0)) (clobber %scratch)]))
(jump_insn (set %pc (if_then_else (ne %cc 0) (label_ref 23) %pc)))

Combine simplifies this into:

(note NOTE_INSN_DELETED)
(note NOTE_INSN_DELETED)
(jump_insn (set %pc (if_then_else (eq %cc 0) (label_ref 23) %pc)))

opening up the possibility to perform jump threading.

gcc/ChangeLog:

2018-12-03  Ilya Leoshkevich  

PR target/80080
* cfgcleanup.c (class pass_postreload_jump): New pass.
(pass_postreload_jump::execute): Likewise.
(make_pass_postreload_jump): Likewise.
* passes.def: Add pass_postreload_jump before
pass_postreload_cse.
* tree-pass.h (make_pass_postreload_jump): New pass.

gcc/testsuite/ChangeLog:

2018-12-03  Ilya Leoshkevich  

PR target/80080
* gcc.target/s390/pr80080-4.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/s390/pr80080-4.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/cfgcleanup.c
trunk/gcc/passes.def
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-pass.h

[Bug target/80080] S390: Isses with emitted cs-instructions for __atomic builtins.

2018-09-24 Thread iii at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

--- Comment #15 from iii at gcc dot gnu.org ---
Author: iii
Date: Mon Sep 24 14:21:03 2018
New Revision: 264535

URL: https://gcc.gnu.org/viewcvs?rev=264535&root=gcc&view=rev
Log:
S/390: Fix conditional returns on z196+

S/390 epilogue ends with (parallel [(return) (use %r14)]) instead of
the more usual (return) or (simple_return).  This sequence is not
recognized by the conditional return logic in try_optimize_cfg ().

This was introduced for processors older than z196, where it is
sometimes profitable to use call-clobbered register for returning
instead of %r14.  On newer processors we always return via %r14,
for which the fact that it's used is already reflected by
EPILOGUE_USES.  In this case a simple (return) suffices.

This patch changes return_use () to emit simple (return)s when
returning via %r14.  The resulting sequences are recognized by the
conditional return logic in try_optimize_cfg ().

gcc/ChangeLog:

2018-09-24  Ilya Leoshkevich  

PR target/80080
* config/s390/s390.c (s390_emit_epilogue): Do not use PARALLEL
RETURN+USE when returning via %r14.

gcc/testsuite/ChangeLog:

2018-09-24  Ilya Leoshkevich  

PR target/80080
* gcc.target/s390/risbg-ll-3.c: Expect conditional returns.
* gcc.target/s390/zvector/vec-cmp-2.c: Likewise.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/s390/s390.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/s390/risbg-ll-3.c
trunk/gcc/testsuite/gcc.target/s390/zvector/vec-cmp-2.c

[Bug target/80080] S390: Isses with emitted cs-instructions for __atomic builtins.

2018-09-06 Thread krebbel at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

--- Comment #14 from Andreas Krebbel  ---
Author: krebbel
Date: Thu Sep  6 07:38:42 2018
New Revision: 264143

URL: https://gcc.gnu.org/viewcvs?rev=264143&root=gcc&view=rev
Log:
S/390: Prohibit SYMBOL_REF in UNSPECV_CAS

Inhibit constant propagation inlining SYMBOL_REF loads into
UNSPECV_CAS.  Even though reload can later undo it, the resulting
code will be less efficient.

gcc/ChangeLog:

2018-09-06  Ilya Leoshkevich  

PR target/80080
* config/s390/predicates.md: Add nonsym_memory_operand.
* config/s390/s390.c (s390_legitimize_cs_operand): If operand
contains a SYMBOL_REF, load it into an intermediate pseudo.
(s390_emit_compare_and_swap): Legitimize operand.
* config/s390/s390.md: Use the new nonsym_memory_operand
with UNSPECV_CAS patterns.

gcc/testsuite/ChangeLog:

2018-09-06  Ilya Leoshkevich  

PR target/80080
* gcc.target/s390/pr80080-3.c: New test.
* gcc.target/s390/s390.exp: Make sure the new test passes
on all optimization levels.



Added:
trunk/gcc/testsuite/gcc.target/s390/pr80080-3.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/s390/predicates.md
trunk/gcc/config/s390/s390.c
trunk/gcc/config/s390/s390.md
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/s390/s390.exp

[Bug target/80080] S390: Isses with emitted cs-instructions for __atomic builtins.

2018-09-06 Thread krebbel at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

--- Comment #13 from Andreas Krebbel  ---
Author: krebbel
Date: Thu Sep  6 07:35:35 2018
New Revision: 264142

URL: https://gcc.gnu.org/viewcvs?rev=264142&root=gcc&view=rev
Log:
S/390: Register pass_s390_early_mach statically

The dump file used to come at the end of the sorted dump file list,
because the pass was registered dynamically. This did not reflect the
order in which passes are executed. Static registration fixes this:

* foo4.c.277r.split2
* foo4.c.281r.early_mach
* foo4.c.282r.pro_and_epilogue

gcc/ChangeLog:

2018-09-06  Ilya Leoshkevich  

PR target/80080
* config/s390/s390-passes.def: New file.
* config/s390/s390-protos.h (class rtl_opt_pass): Add forward
declaration.
(make_pass_s390_early_mach): Add declaration.
* config/s390/s390.c (make_pass_s390_early_mach):
(s390_option_override): Remove dynamic registration.
* config/s390/t-s390: Add s390-passes.def.



Added:
trunk/gcc/config/s390/s390-passes.def
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/s390/s390-protos.h
trunk/gcc/config/s390/s390.c
trunk/gcc/config/s390/t-s390

[Bug target/80080] S390: Isses with emitted cs-instructions for __atomic builtins.

2018-08-22 Thread iii at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

--- Comment #12 from Ilya Leoshkevich  ---
I've investigated foo3, foo4 and foo5, and came to the following
conclusions:

When foo3 is compiled with -march=z10 or later, cprop1 pass propagates
global's SYMBOL_REF value into UNSPECV_CAS.  On previous machines it
does not happen, because the result is rejected by insn_invalid_p ().
Then, reload realizes that SYMBOL_REF cannot be a legitimate UNSPECV_CAS
argument, and loads it into a pseudo right before.  The net result is
that loading of SYMBOL_REF is moved from outside of the loop into the
loop.  So we need to somehow inhibit constant propagation for this case.

Jump threading in foo4 does not work, because it's done only during
`jump' pass, at which point there are insns with side-effects in the
basic block of the 2nd jump.  They are later deleted by the `combine'
pass, but we don't request CLEANUP_THREADING after that.  I wonder if we
could introduce it?

In addition, when foo4 is compiled with -O2 or -O3, we don't use
conditional return, because our return sequence contains a PARALLEL,
which is rejected by bb_is_just_return ().  This can also be improved.

Finally, in foo5 `cs' is generated by s390_expand_cs_tdsi (), and
comparison is generated by common expansion logic, so it doesn't look
possible to improve the situation solely in the back-end.  We need to
somehow make gcc aware that (oldval == 0) and (retval != 0) are
equivalent after `cs', but I'm not sure at which point we could and
should do this - in theory doing this on tree rather than RTL level can
help other architectures.
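
The equivalence mentioned for foo5 can be checked in plain C. The following
is only a minimal sketch, not part of any patch: the names
cas_conditions_agree and demo_cas_conditions are made up for illustration,
and it assumes the strong form of the builtin (weak == 0) so that a failure
is never spurious, which differs from the weak form used in foo5.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, for illustration only.  Returns true when
   (oldval == 0) and (retval != 0) agree after the compare-and-swap.  */
static bool
cas_conditions_agree (int *mem)
{
  int oldval = 0;
  /* Strong compare-exchange: on success oldval stays 0, on failure it
     receives the (non-zero) value found in *mem.  */
  bool retval = __atomic_compare_exchange_n (mem, &oldval, 1, 0,
                                             __ATOMIC_ACQUIRE,
                                             __ATOMIC_RELAXED);
  return (oldval == 0) == (retval != 0);
}

/* Exercise both the success path and the failure path.  */
static void
demo_cas_conditions (void)
{
  int zero = 0, nonzero = 5;
  assert (cas_conditions_agree (&zero));     /* swap happens */
  assert (zero == 1);
  assert (cas_conditions_agree (&nonzero));  /* swap does not happen */
  assert (nonzero == 5);
}
```

Since both conditions carry the same information, a single branch on the
condition code set by `cs' would in principle be enough.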

[Bug target/80080] S390: Isses with emitted cs-instructions for __atomic builtins.

2018-08-06 Thread stli at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

--- Comment #11 from stli at linux dot ibm.com  ---
Hi,
I've retested the samples with gcc 7, 8 and head from 2018-07-20, but there are
still issues:
The examples foo1 and foo2 are okay.

The issue in example foo3 is still present (see description of the bug-report):

00a0 <foo3>:
  a0:   a7 18 00 05         lhi   %r1,5
  a4:   c4 2d 00 00 00 00   lrl   %r2,a4 <foo3+0x4>
                        a6: R_390_PC32DBL   foo3_mem+0x2

  aa:   c0 30 00 00 00 00   larl  %r3,aa <foo3+0xa>
                        ac: R_390_PC32DBL   foo3_mem+0x2
  b0:   ba 21 30 00         cs    %r2,%r1,0(%r3)
  b4:   a7 74 ff fb         jne   aa <foo3+0xa>

The address of the global variable is still reloaded within the loop. If the
value was not swapped with cs, the jne could jump directly to the cs
instruction instead of the larl instruction.

  b8:   b9 14 00 22         lgfr  %r2,%r2
  bc:   07 fe               br    %r14
  be:   07 07               nopr  %r7
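
The foo3 source itself is not quoted in this comment. Judging only from the
disassembly above (the constant 5, the relocations against foo3_mem, the cs
loop and the sign extension of the result), it is presumably an atomic
exchange on a global int. A hypothetical reconstruction follows; the names
foo3_sketch and demo_foo3_sketch and the memory order are assumptions, not
the reporter's code.

```c
#include <assert.h>

int foo3_mem;

/* Hypothetical reconstruction: store 5 into the global and return its
   previous contents.  The disassembly above implements this kind of
   operation as a cs loop.  */
int
foo3_sketch (void)
{
  return __atomic_exchange_n (&foo3_mem, 5, __ATOMIC_ACQUIRE);
}

/* Single-threaded check of the exchange semantics.  */
static void
demo_foo3_sketch (void)
{
  foo3_mem = 7;
  assert (foo3_sketch () == 7);
  assert (foo3_mem == 5);
}
```

With code of this shape the address of foo3_mem is loop-invariant, which is
why the larl would ideally sit outside the retry loop.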

I've found a further issue which is observable with the following two examples.
See the questions in the disassembly:

void foo4(int *mem)
{
  int oldval = 0;
  if (!__atomic_compare_exchange_n (mem, (void *) &oldval, 1,
                                    1, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
    {
      bar (mem);
    }
  /*
  <foo4>:
     0:   e3 10 20 00 00 12   lt    %r1,0(%r2)
     6:   a7 74 00 06         jne   12 <foo4+0x12>

  Why do we need to jump to 0x12 first instead of directly jumping to 0x18?

     a:   a7 38 00 01         lhi   %r3,1
     e:   ba 13 20 00         cs    %r1,%r3,0(%r2)
    12:   a7 74 00 03         jne   18 <foo4+0x18>
    16:   07 fe               br    %r14
    18:   c0 f4 00 00 00 00   jg    18 <foo4+0x18>
                        1a: R_390_PC32DBL   bar+0x2
    1e:   07 07               nopr  %r7
  */
}


void foo5(int *mem)
{
  int oldval = 0;
  __atomic_compare_exchange_n (mem, (void *) &oldval, 1,
                               1, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
  if (oldval != 0)
    bar (mem);
  /*
  0040 <foo5>:
    40:   e3 10 20 00 00 12   lt    %r1,0(%r2)
    46:   a7 74 00 06         jne   52 <foo5+0x12>

  This is similar to foo4, but the variable oldval is compared against zero
  instead of using the return value of __atomic_compare_exchange_n.
  Can't we jump directly to 0x5a instead of 0x52?

    4a:   a7 38 00 01         lhi   %r3,1
    4e:   ba 13 20 00         cs    %r1,%r3,0(%r2)
    52:   12 11               ltr   %r1,%r1
    54:   a7 74 00 03         jne   5a <foo5+0x1a>
    58:   07 fe               br    %r14
    5a:   c0 f4 00 00 00 00   jg    5a <foo5+0x1a>
                        5c: R_390_PC32DBL   bar+0x2
  */
}

[Bug target/80080] S390: Isses with emitted cs-instructions for __atomic builtins.

2017-04-25 Thread krebbel at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

--- Comment #10 from Andreas Krebbel  ---
Author: krebbel
Date: Tue Apr 25 11:11:48 2017
New Revision: 247189

URL: https://gcc.gnu.org/viewcvs?rev=247189&root=gcc&view=rev
Log:
S/390: PR80080: Optimize atomic patterns.

The attached patch optimizes the atomic_exchange and atomic_compare
patterns on s390 and s390x (mostly limited to SImode and DImode).
Besides general optimization, the changes fix most of the problems
reported in PR 80080:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

gcc/ChangeLog:

2017-04-25  Dominik Vogt  

Backport from mainline
2017-04-25  Dominik Vogt  

PR target/80080
* s390-protos.h (s390_expand_cs_hqi): Removed.
(s390_expand_cs, s390_expand_atomic_exchange_tdsi): New prototypes.
* config/s390/s390.c (s390_emit_compare_and_swap): Handle all integer
modes as well as CCZ1mode and CCZmode.
(s390_expand_atomic_exchange_tdsi, s390_expand_atomic): Adapt to new
signature of s390_emit_compare_and_swap.
(s390_expand_cs_hqi): Likewise, make static.
(s390_expand_cs_tdsi): Generate an explicit compare before trying
compare-and-swap, in some cases.
(s390_expand_cs): Wrapper function.
(s390_expand_atomic_exchange_tdsi): New backend specific expander for
atomic_exchange.
(s390_match_ccmode_set): Allow CCZmode <-> CCZ1 mode.
* config/s390/s390.md ("atomic_compare_and_swap"): Merge the
patterns for small and large integers.  Forbid symref memory operands.
Move expander to s390.c.  Require cc register.
("atomic_compare_and_swap_internal")
("*atomic_compare_and_swap_1")
("*atomic_compare_and_swapdi_2")
("*atomic_compare_and_swapsi_3"): Use s_operand to forbid
symref memory operands.  Remove CC mode and call s390_match_ccmode
instead.
("atomic_exchange"): Allow and implement all integer modes.

gcc/testsuite/ChangeLog:

2017-04-25  Dominik Vogt  

Backport from mainline
2017-04-25  Dominik Vogt  

PR target/80080
* gcc.target/s390/md/atomic_compare_exchange-1.c: New test.
* gcc.target/s390/md/atomic_compare_exchange-1.inc: New test.
* gcc.target/s390/md/atomic_exchange-1.inc: New test.


Added:
branches/gcc-7-branch/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
branches/gcc-7-branch/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
branches/gcc-7-branch/gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c
Modified:
branches/gcc-7-branch/gcc/ChangeLog
branches/gcc-7-branch/gcc/config/s390/s390-protos.h
branches/gcc-7-branch/gcc/config/s390/s390.c
branches/gcc-7-branch/gcc/config/s390/s390.md
branches/gcc-7-branch/gcc/testsuite/ChangeLog

[Bug target/80080] S390: Isses with emitted cs-instructions for __atomic builtins.

2017-04-25 Thread krebbel at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

--- Comment #9 from Andreas Krebbel  ---
Author: krebbel
Date: Tue Apr 25 07:37:50 2017
New Revision: 247132

URL: https://gcc.gnu.org/viewcvs?rev=247132&root=gcc&view=rev
Log:
S/390: PR80080: Optimize atomic patterns.

The attached patch optimizes the atomic_exchange and atomic_compare
patterns on s390 and s390x (mostly limited to SImode and DImode).
Besides general optimization, the changes fix most of the problems
reported in PR 80080:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

gcc/ChangeLog:

2017-04-25  Dominik Vogt  

PR target/80080
* s390-protos.h (s390_expand_cs_hqi): Removed.
(s390_expand_cs, s390_expand_atomic_exchange_tdsi): New prototypes.
* config/s390/s390.c (s390_emit_compare_and_swap): Handle all integer
modes as well as CCZ1mode and CCZmode.
(s390_expand_atomic_exchange_tdsi, s390_expand_atomic): Adapt to new
signature of s390_emit_compare_and_swap.
(s390_expand_cs_hqi): Likewise, make static.
(s390_expand_cs_tdsi): Generate an explicit compare before trying
compare-and-swap, in some cases.
(s390_expand_cs): Wrapper function.
(s390_expand_atomic_exchange_tdsi): New backend specific expander for
atomic_exchange.
(s390_match_ccmode_set): Allow CCZmode <-> CCZ1 mode.
* config/s390/s390.md ("atomic_compare_and_swap"): Merge the
patterns for small and large integers.  Forbid symref memory operands.
Move expander to s390.c.  Require cc register.
("atomic_compare_and_swap_internal")
("*atomic_compare_and_swap_1")
("*atomic_compare_and_swapdi_2")
("*atomic_compare_and_swapsi_3"): Use s_operand to forbid
symref memory operands.  Remove CC mode and call s390_match_ccmode
instead.
("atomic_exchange"): Allow and implement all integer modes.

gcc/testsuite/ChangeLog:

2017-04-25  Dominik Vogt  

PR target/80080
* gcc.target/s390/md/atomic_compare_exchange-1.c: New test.
* gcc.target/s390/md/atomic_compare_exchange-1.inc: New test.
* gcc.target/s390/md/atomic_exchange-1.inc: New test.


Added:
trunk/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
trunk/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
trunk/gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/s390/s390-protos.h
trunk/gcc/config/s390/s390.c
trunk/gcc/config/s390/s390.md
trunk/gcc/testsuite/ChangeLog

[Bug target/80080] S390: Isses with emitted cs-instructions for __atomic builtins.

2017-03-22 Thread vogt at linux dot vnet.ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

--- Comment #8 from Dominik Vogt  ---
The patch has a performance regression on s390x.

.L1:
        lr      %r3,%r1
        cs      %r1,%r4,0(%r2)
        jne     .L1

becomes

.L1:
        cs      %r1,%r3,0(%r2)
        ipm     %r4
        sra     %r4,28
        cijne   %r4,0,.L1

Although this can be fixed on s390/s390x in a follow-up patch, I cannot really
prove that something similar won't happen on other targets without testing
them.

[Bug target/80080] S390: Isses with emitted cs-instructions for __atomic builtins.

2017-03-21 Thread vogt at linux dot vnet.ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

--- Comment #7 from Dominik Vogt  ---
(In reply to Jakub Jelinek from comment #6)
> I think it depends on what
> (success, old_reg) = compare-and-swap(mem, old_reg, new_reg)
> sets if success is true.  Is there a guarantee that old_reg will be assigned
> whatever has been passed as the second argument in that case on all targets?

Isn't that part of the interface of the atomic_compare_and_swap pattern?

Gccinternals:
> Operand 1 is an output operand which is set to the contents of the memory
> before the operation was attempted. Operand 3 is the value that is
> expected to be in memory.

If the CS instruction on a target does not overwrite the register old_reg with
the memory value before swapping, the target's backend must do that manually, I
think.
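
The contract quoted above can be illustrated at the C level. The following
is a minimal sketch, assuming the GCC __atomic builtins; fetch_add_via_cas
and demo_fetch_add_via_cas are hypothetical names, and the code only mirrors
the shape of the loop discussed for expand_compare_and_swap_loop rather than
reproducing it.

```c
#include <assert.h>

/* Hypothetical helper: atomically add ADDEND to *MEM and return the
   previous contents, using only a load and compare-and-swap.  */
static int
fetch_add_via_cas (int *mem, int addend)
{
  /* old_reg = mem; done once, before the loop.  */
  int old = __atomic_load_n (mem, __ATOMIC_RELAXED);

  /* A failed compare-and-swap writes the current contents of *mem into
     old, so the next iteration needs no separate reload or copy.  */
  while (!__atomic_compare_exchange_n (mem, &old, old + addend, 0,
                                       __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
    ;

  /* On success old still holds the value that was in memory before.  */
  return old;
}

/* Single-threaded check of the returned value and the stored result.  */
static void
demo_fetch_add_via_cas (void)
{
  int x = 10;
  assert (fetch_add_via_cas (&x, 5) == 10);
  assert (x == 15);
}
```

The builtin exposes the same behaviour through its "expected" argument, so
the loop can keep a single variable for the old value.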

[Bug target/80080] S390: Isses with emitted cs-instructions for __atomic builtins.

2017-03-21 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

--- Comment #6 from Jakub Jelinek  ---
I think it depends on what
(success, old_reg) = compare-and-swap(mem, old_reg, new_reg)
sets if success is true.  Is there a guarantee that old_reg will be assigned
whatever has been passed as the second argument in that case on all targets?
If yes, then it can work.  If not, then it will return a different value.
So I think you need to analyze all targets that have HW compare-and-swap and
verify that.

[Bug target/80080] S390: Isses with emitted cs-instructions for __atomic builtins.

2017-03-21 Thread vogt at linux dot vnet.ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

--- Comment #5 from Dominik Vogt  ---
What case do you mean?  The

+  if (oldval != old_reg)
+emit_move_insn (old_reg, oldval);

at the end should make sure that the oldval-rtx is either not changed by the
call, or its value is copied into old_reg at runtime.  In both cases, the value
of old_reg should be the old value to be returned (as it was hopefully before
the patch).  Stefan and I have spent quite some time proofreading this patch;
do you see something we might have overlooked?

[Bug target/80080] S390: Isses with emitted cs-instructions for __atomic builtins.

2017-03-21 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
(In reply to Dominik Vogt from comment #3)
> Most of this can be fixed in the backend, but all attempts to get rid of the
> additional load ("lr %r3,%r1") in "bar" within the backend have failed.
> This is caused by overly complicated code being generated in
> expand_compare_and_swap_loop().  I'm currently testing the patch below which
> eliminates the superfluous variable cmp_reg (and the register allocated for
> it).  Is there any chance to get something like this into Gcc7?  The patch
> could possibly also help other targets.  (Will post on gcc-patches when the
> tests are complete.)
> 
> 
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -5798,17 +5798,15 @@ find_cc_set (rtx x, const_rtx pat, void *data)
>  static bool
>  expand_compare_and_swap_loop (rtx mem, rtx old_reg, rtx new_reg, rtx seq)
>  {
> -  machine_mode mode = GET_MODE (mem);
>rtx_code_label *label;
> -  rtx cmp_reg, success, oldval;
> +  rtx success, oldval;
>  
>/* The loop we want to generate looks like
>  
> - cmp_reg = mem;
> + old_reg = mem;
>label:
> -old_reg = cmp_reg;
>   seq;
> - (success, cmp_reg) = compare-and-swap(mem, old_reg, new_reg)
> + (success, old_reg) = compare-and-swap(mem, old_reg, new_reg)
>   if (success)
> goto label;
>  
> @@ -5816,23 +5814,21 @@ expand_compare_and_swap_loop (rtx mem, rtx old_reg, rtx new_reg, rtx seq)
>   iterations use the value loaded by the compare-and-swap pattern.  */
>  
>label = gen_label_rtx ();
> -  cmp_reg = gen_reg_rtx (mode);
>  
> -  emit_move_insn (cmp_reg, mem);
> +  emit_move_insn (old_reg, mem);
>emit_label (label);
> -  emit_move_insn (old_reg, cmp_reg);
>if (seq)
>  emit_insn (seq);
>  
>success = NULL_RTX;
> -  oldval = cmp_reg;
> +  oldval = old_reg;
>if (!expand_atomic_compare_and_swap (&success, &oldval, mem, old_reg,
>  new_reg, false, MEMMODEL_SYNC_SEQ_CST,
>  MEMMODEL_RELAXED))
>  return false;
>  
> -  if (oldval != cmp_reg)
> -emit_move_insn (cmp_reg, oldval);
> +  if (oldval != old_reg)
> +emit_move_insn (old_reg, oldval);
>  
>/* Mark this jump predicted not taken.  */
>emit_cmp_and_jump_insns (success, const0_rtx, EQ, const0_rtx,

That doesn't look correct.  expand_compare_and_swap_loop is called from two
places and at least in the first case it wants old_reg to contain on success
something that is the target that will be returned.
While with your patch it might be something different.

[Bug target/80080] S390: Isses with emitted cs-instructions for __atomic builtins.

2017-03-21 Thread vogt at linux dot vnet.ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

Dominik Vogt  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vogt at linux dot vnet.ibm.com

--- Comment #3 from Dominik Vogt  ---
Most of this can be fixed in the backend, but all attempts to get rid of the
additional load ("lr %r3,%r1") in "bar" within the backend have failed.  This
is caused by overly complicated code being generated in
expand_compare_and_swap_loop().  I'm currently testing the patch below which
eliminates the superfluous variable cmp_reg (and the register allocated for
it).  Is there any chance to get something like this into Gcc7?  The patch
could possibly also help other targets.  (Will post on gcc-patches when the
tests are complete.)


--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -5798,17 +5798,15 @@ find_cc_set (rtx x, const_rtx pat, void *data)
 static bool
 expand_compare_and_swap_loop (rtx mem, rtx old_reg, rtx new_reg, rtx seq)
 {
-  machine_mode mode = GET_MODE (mem);
   rtx_code_label *label;
-  rtx cmp_reg, success, oldval;
+  rtx success, oldval;

   /* The loop we want to generate looks like

-   cmp_reg = mem;
+   old_reg = mem;
   label:
-old_reg = cmp_reg;
seq;
-   (success, cmp_reg) = compare-and-swap(mem, old_reg, new_reg)
+   (success, old_reg) = compare-and-swap(mem, old_reg, new_reg)
if (success)
  goto label;

@@ -5816,23 +5814,21 @@ expand_compare_and_swap_loop (rtx mem, rtx old_reg, rtx new_reg, rtx seq)
  iterations use the value loaded by the compare-and-swap pattern.  */

   label = gen_label_rtx ();
-  cmp_reg = gen_reg_rtx (mode);

-  emit_move_insn (cmp_reg, mem);
+  emit_move_insn (old_reg, mem);
   emit_label (label);
-  emit_move_insn (old_reg, cmp_reg);
   if (seq)
 emit_insn (seq);

   success = NULL_RTX;
-  oldval = cmp_reg;
+  oldval = old_reg;
   if (!expand_atomic_compare_and_swap (&success, &oldval, mem, old_reg,
   new_reg, false, MEMMODEL_SYNC_SEQ_CST,
   MEMMODEL_RELAXED))
 return false;

-  if (oldval != cmp_reg)
-emit_move_insn (cmp_reg, oldval);
+  if (oldval != old_reg)
+emit_move_insn (old_reg, oldval);

   /* Mark this jump predicted not taken.  */
   emit_cmp_and_jump_insns (success, const0_rtx, EQ, const0_rtx,

[Bug target/80080] S390: Isses with emitted cs-instructions for __atomic builtins.

2017-03-17 Thread krebbel at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2017-03-17
     Ever confirmed|0                           |1

--- Comment #2 from Andreas Krebbel  ---
(In reply to Richard Biener from comment #1)
> Is this a duplicate of PR79981 (and thus fixed in trunk)?

No it is a different problem. We are working on a fix.

[Bug target/80080] S390: Isses with emitted cs-instructions for __atomic builtins.

2017-03-17 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener  ---
Is this a duplicate of PR79981 (and thus fixed in trunk)?