[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization

2021-12-20 Thread hp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600

Hans-Peter Nilsson  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #14 from Hans-Peter Nilsson  ---
If there was a reason to keep this open, then I've forgot to mention it.
Closing as fixed.

[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization

2020-12-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600

--- Comment #13 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Hans-Peter Nilsson
:

https://gcc.gnu.org/g:ac2347289d4d8000a078b540b6c9c2c74bb33471

commit r10-9121-gac2347289d4d8000a078b540b6c9c2c74bb33471
Author: Hans-Peter Nilsson 
Date:   Fri Dec 4 20:27:23 2020 +0100

doc/implement-c.texi: About same-as-scalar-type volatile aggregate
accesses, PR94600

We say very little about reads and writes to aggregate /
compound objects, just scalar objects (i.e. assignments don't
cause reads).  Let's lets say something safe about aggregate
objects, but only for those that are the same size as a scalar
type.

There's an equal-sounding section (Volatiles) in extend.texi,
but this seems a more appropriate place, as specifying the
behavior of a standard qualifier.

gcc:

2020-12-04  Hans-Peter Nilsson  
Martin Sebor  

PR middle-end/94600
* doc/implement-c.texi (Qualifiers implementation): Add blurb
about access to the whole of a volatile aggregate object, only for
same-size as a scalar object.

(cherry picked from commit eb79f4db49c5f5a807555e9d374524664eb537bf)

[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization

2020-12-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600

--- Comment #12 from CVS Commits  ---
The master branch has been updated by Hans-Peter Nilsson :

https://gcc.gnu.org/g:eb79f4db49c5f5a807555e9d374524664eb537bf

commit r11-5749-geb79f4db49c5f5a807555e9d374524664eb537bf
Author: Hans-Peter Nilsson 
Date:   Fri Dec 4 20:27:23 2020 +0100

doc/implement-c.texi: About same-as-scalar-type volatile aggregate
accesses, PR94600

We say very little about reads and writes to aggregate /
compound objects, just scalar objects (i.e. assignments don't
cause reads).  Let's lets say something safe about aggregate
objects, but only for those that are the same size as a scalar
type.

There's an equal-sounding section (Volatiles) in extend.texi,
but this seems a more appropriate place, as specifying the
behavior of a standard qualifier.

gcc:

2020-12-04  Hans-Peter Nilsson  
Martin Sebor  

PR middle-end/94600
* doc/implement-c.texi (Qualifiers implementation): Add blurb
about access to the whole of a volatile aggregate object, only for
same-size as a scalar object.

[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization

2020-08-12 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600

--- Comment #11 from CVS Commits  ---
The master branch has been updated by Hans-Peter Nilsson :

https://gcc.gnu.org/g:5db1fa9bc69dd58ce2f93dd707d05efcfe89ffa8

commit r11-2682-g5db1fa9bc69dd58ce2f93dd707d05efcfe89ffa8
Author: Hans-Peter Nilsson 
Date:   Thu Aug 13 05:12:23 2020 +0200

gcc.dg/pr94600-5.c .. -8.c: Align struct t0 explictly, as a type, PR
middle-end/94600

The bitfield-struct t0 in gcc.dg/pr94600-1.c ..-4.c is assigned to a
pointer that is a (volatile-and-pointer-)cast literal, so gcc doesn't
need to be otherwise told that the address is aligned.  But, variants
pr94600-5.c ..-8.c are assigned through a "volatile t0 *", and rely on
the *type* being naturally aligned, or that the machine has
non-strict-alignment moves.

Unfortunately, systems exist (for some definitions of exist) where
such structs aren't always naturally aligned, for example if it
contains only (small) bitfields, even though the size is a naturally
accessible size.  Specifically, the mmix-knuth-mmixware port has only
*byte* alignment for this struct.  (If an int is added to the struct,
alignment is promoted.)  IOW, a prerequisite of the test is false: the
struct doesn't have the same alignment as an integer of the same size.
The effect is assignment in byte-size pieces, and the test fails.
(For a non-volatile assignment, memcpy is called.)  That's easily
fixable by defining the type as having a specific alignment.  This is
also closer to the type in the original code, and also as the first
variants aren't affected, no second thought or re-visit of pre-fixed
compiler is needed.  I don't plan to back-port this to gcc-10 branch
however.  I did sanity-check that the tests still pass on
ppc64le-linux.

gcc/testsuite:

PR middle-end/94600
* gcc.dg/pr94600-5.c, gcc.dg/pr94600-6.c, gcc.dg/pr94600-7.c,
gcc.dg/pr94600-8.c: Align t0 to 4-byte boundary.

[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization

2020-07-13 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600

--- Comment #10 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Hans-Peter Nilsson
:

https://gcc.gnu.org/g:6f49c66ed4e060c333d8bcd5e4ae127ab7bca15b

commit r10-8468-g6f49c66ed4e060c333d8bcd5e4ae127ab7bca15b
Author: Hans-Peter Nilsson 
Date:   Sun Jul 5 20:50:52 2020 +0200

PR94600: fix volatile access to the whole of a compound object.

The store to the whole of each volatile object was picked apart
like there had been an individual assignment to each of the
fields.  Reads were added as part of that; see PR for details.
The reads from volatile memory were a clear bug; individual
stores questionable.  A separate patch clarifies the docs.

gcc:

2020-07-09  Richard Biener  

PR middle-end/94600
* expr.c (expand_constructor): Make a temporary also if we're
storing to volatile memory.

gcc/testsuite:

2020-07-09  Hans-Peter Nilsson  

PR middle-end/94600
* gcc.dg/pr94600-1.c, gcc.dg/pr94600-2.c, gcc.dg/pr94600-3.c,
gcc.dg/pr94600-4.c, gcc.dg/pr94600-5.c, gcc.dg/pr94600-6.c,
gcc.dg/pr94600-7.c, gcc.dg/pr94600-8.c: New tests.

(cherry picked from commit a4aca1edaf37d43b2b7e9111825837a7a317b1b0)

[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization

2020-07-13 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600

--- Comment #9 from CVS Commits  ---
The master branch has been updated by Hans-Peter Nilsson :

https://gcc.gnu.org/g:a4aca1edaf37d43b2b7e9111825837a7a317b1b0

commit r11-2045-ga4aca1edaf37d43b2b7e9111825837a7a317b1b0
Author: Hans-Peter Nilsson 
Date:   Sun Jul 5 20:50:52 2020 +0200

PR94600: fix volatile access to the whole of a compound object.

The store to the whole of each volatile object was picked apart
like there had been an individual assignment to each of the
fields.  Reads were added as part of that; see PR for details.
The reads from volatile memory were a clear bug; individual
stores questionable.  A separate patch clarifies the docs.

gcc:

2020-07-09  Richard Biener  

PR middle-end/94600
* expr.c (expand_constructor): Make a temporary also if we're
storing to volatile memory.

gcc/testsuite:

2020-07-09  Hans-Peter Nilsson  

PR middle-end/94600
* gcc.dg/pr94600-1.c, gcc.dg/pr94600-2.c, gcc.dg/pr94600-3.c,
gcc.dg/pr94600-4.c, gcc.dg/pr94600-5.c, gcc.dg/pr94600-6.c,
gcc.dg/pr94600-7.c, gcc.dg/pr94600-8.c: New tests.

[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization

2020-04-15 Thread hp at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600

--- Comment #8 from Hans-Peter Nilsson  ---
(In reply to jos...@codesourcery.com from comment #7)
> The Arm AAPCS has detailed rules for operations on individual volatile 
> bit-fields, but not for this case where the whole struct is volatile and 
> the operation is on the whole struct.  I think you can consider this issue 
> in general as quality-of-implementation.

Can we extrapolate this statement to the language level, that this is
implementation- or port-specific?

I'm curious to see how those reads from volatile LHS can be justified from a
correctness point of view, in the context of this being an aggregate
assignment.

[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization

2020-04-15 Thread joseph at codesourcery dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600

--- Comment #7 from joseph at codesourcery dot com  ---
The Arm AAPCS has detailed rules for operations on individual volatile 
bit-fields, but not for this case where the whole struct is volatile and 
the operation is on the whole struct.  I think you can consider this issue 
in general as quality-of-implementation.

[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization

2020-04-15 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600

--- Comment #6 from Richard Biener  ---
So for example

diff --git a/gcc/expr.c b/gcc/expr.c
index b97c217e86d..a980811c1e9 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -8263,7 +8263,8 @@ expand_constructor (tree exp, rtx target, enum
expand_modifier modifier,
   /* Handle calls that pass values in multiple non-contiguous
  locations.  The Irix 6 ABI has examples of this.  */
   if (target == 0 || ! safe_from_p (target, exp, 1)
-  || GET_CODE (target) == PARALLEL || modifier == EXPAND_STACK_PARM)
+  || GET_CODE (target) == PARALLEL || modifier == EXPAND_STACK_PARM
+  || (GET_CODE (target) == MEM && MEM_VOLATILE_P (target)))
 {
   if (avoid_temp_mem)
return NULL_RTX;

avoids the constant folding and produces

foo:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
mov r3, #655360
ldr r2, .L4
push{r4, lr}
ldm r2, {r4, lr}
ldr ip, [r2, #8]
add r0, r2, #12
str r4, [r3, #44]
ldm r0, {r0, r1, r2}
str lr, [r3, #48]
str ip, [r3, #52]
str r0, [r3, #56]
str r1, [r3, #60]
str r2, [r3, #64]
pop {r4, pc}

on arm.

[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization

2020-04-15 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600

Richard Biener  changed:

   What|Removed |Added

 CC||jsm28 at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org

--- Comment #5 from Richard Biener  ---
Joseph, what's the constraints on an aggregate assignment through a volatile
qualified pointer using a type like

typedef struct {
  unsigned int f0 : 4;
  unsigned int f1 : 11;
  unsigned int f2 : 10;
  unsigned int f3 : 7;
} t0;

?

[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization

2020-04-15 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600

--- Comment #4 from Richard Biener  ---
We're going through

rtx
store_expr (tree exp, rtx target, int call_param_p,
bool nontemporal, bool reverse)
{
...
  normal_expr:
  /* If we want to use a nontemporal or a reverse order store, force the
 value into a register first.  */
  tmp_target = nontemporal || reverse ? NULL_RTX : target;
  temp = expand_expr_real (exp, tmp_target, GET_MODE (target),
   (call_param_p
? EXPAND_STACK_PARM : EXPAND_NORMAL),
   _rtl, false);

which expands directly into target via

else if (optimize >= 1
 && modifier != EXPAND_CONST_ADDRESS
 && modifier != EXPAND_INITIALIZER
 && modifier != EXPAND_MEMORY
 && TREE_READONLY (array) && ! TREE_SIDE_EFFECTS (array)
 && TREE_CODE (index) == INTEGER_CST
 && (VAR_P (array) || TREE_CODE (array) == CONST_DECL)
 && (init = ctor_for_folding (array)) != error_mark_node)
  { 
...
  /* If VALUE is a CONSTRUCTOR, this
 optimization is only useful if
 this doesn't store the CONSTRUCTOR
 into memory.  If it does, it is more
 efficient to just load the data from
 the array directly.  */
  rtx ret = expand_constructor (value, target,
modifier, true);

and then store_constructor (exp, target, 0, int_expr_size (exp), false);

store_constructor cannot anymore fail and since we're invoking
expand_constructor with avoid_temp_mem == true that's the place to
fixup I think, possibly by amending

  /* Handle calls that pass values in multiple non-contiguous
 locations.  The Irix 6 ABI has examples of this.  */
  if (target == 0 || ! safe_from_p (target, exp, 1)
  || GET_CODE (target) == PARALLEL || modifier == EXPAND_STACK_PARM)
{ 
  if (avoid_temp_mem)
return NULL_RTX;

with the case for volatile target.  Or of course at the very toplevel
by passing down a different expand modifier or by skipping ctor folding.

Note that an aggregate volatile assignment is not very well specified
when bitfields are involved - C says the copy is elementwise and then
strict volatile bitfield semantics would suggest that N stores happen
together with the required reads for RMW cycles.

So I'm not sure if this is really a bug or rather wrong user expectation.

[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization

2020-04-15 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600

--- Comment #3 from Richard Biener  ---
Confirmed on arm.  The odd thing is that the optimized GIMPLE for foo() is
much nicer:

foo ()
{
   [local count: 153437707]:
  MEM[(volatile struct t0 *)655404B] ={v} a0[0];
  MEM[(volatile struct t0 *)655408B] ={v} a0[1];
  MEM[(volatile struct t0 *)655412B] ={v} a0[2];
  MEM[(volatile struct t0 *)655416B] ={v} a0[3];
  MEM[(volatile struct t0 *)655420B] ={v} a0[4];
  MEM[(volatile struct t0 *)655424B] ={v} a0[5];
  return;

while for bar() we have additional stmts to construct the stack objects
though the actual stores are the same:

   [local count: 1073741824]:
  a01 = {};
  a02 = {};
  a03 = {};
  a04 = {};
  a05 = {};
  MEM[(struct  *)] = 33556023;
  MEM[(volatile struct t0 *)655404B] ={v} a00;
  _38 = MEM[(struct  *)];
  _39 = _38 & 33521664;
  _40 = _39 | 33558455;
  MEM[(struct  *)] = _40;
  MEM[(volatile struct t0 *)655408B] ={v} a01;
  _41 = MEM[(struct  *)];
  _42 = _41 & 33521664;
  _43 = _42 | 167774200;
  MEM[(struct  *)] = _43;
  MEM[(volatile struct t0 *)655412B] ={v} a02;
  _44 = MEM[(struct  *)];
  _45 = _44 & 33521664;
  _46 = _45 | 33554453;
  MEM[(struct  *)] = _46;
  MEM[(volatile struct t0 *)655416B] ={v} a03;
  _47 = MEM[(struct  *)];
  _48 = _47 & 33521664;
  _49 = _48 | 33554453;
  MEM[(struct  *)] = _49;
  MEM[(volatile struct t0 *)655420B] ={v} a04;
  _50 = MEM[(struct  *)];
  _51 = _50 & 33521664;
  _52 = _51 | 33554453;
  MEM[(struct  *)] = _52;
  MEM[(volatile struct t0 *)655424B] ={v} a05;
  a00 ={v} {CLOBBER};
  a01 ={v} {CLOBBER};
  a02 ={v} {CLOBBER};
  a03 ={v} {CLOBBER};
  a04 ={v} {CLOBBER};
  a05 ={v} {CLOBBER};
  return;

I suspect that the RTL expansion for foo() applies "premature" optimization
via constant folding the aggregate initializer and thus compiling this into
partly initializing the destination which isn't valid for volatile qualified
LHSs.  For bar() there's nothing to optimize for it.

[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization

2020-04-14 Thread hp at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600

--- Comment #2 from Hans-Peter Nilsson  ---
For various targets and gcc versions I've noticed this bug as far back as
gcc-4.7.

[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization

2020-04-14 Thread hp at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600

Hans-Peter Nilsson  changed:

   What|Removed |Added

   Last reconfirmed||2020-04-14
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from Hans-Peter Nilsson  ---
Confirmed for arm-eabi at a615ea71bc8:dda9278f192:bb87d5cc77db1 and cris-elf at
fa9a57ed91d:b46f6c8e2e8:a126a1577ffcbf.