[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600 Hans-Peter Nilsson changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #14 from Hans-Peter Nilsson --- If there was a reason to keep this open, then I've forgot to mention it. Closing as fixed.
[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600 --- Comment #13 from CVS Commits --- The releases/gcc-10 branch has been updated by Hans-Peter Nilsson : https://gcc.gnu.org/g:ac2347289d4d8000a078b540b6c9c2c74bb33471 commit r10-9121-gac2347289d4d8000a078b540b6c9c2c74bb33471 Author: Hans-Peter Nilsson Date: Fri Dec 4 20:27:23 2020 +0100 doc/implement-c.texi: About same-as-scalar-type volatile aggregate accesses, PR94600 We say very little about reads and writes to aggregate / compound objects, just scalar objects (i.e. assignments don't cause reads). Let's lets say something safe about aggregate objects, but only for those that are the same size as a scalar type. There's an equal-sounding section (Volatiles) in extend.texi, but this seems a more appropriate place, as specifying the behavior of a standard qualifier. gcc: 2020-12-04 Hans-Peter Nilsson Martin Sebor PR middle-end/94600 * doc/implement-c.texi (Qualifiers implementation): Add blurb about access to the whole of a volatile aggregate object, only for same-size as a scalar object. (cherry picked from commit eb79f4db49c5f5a807555e9d374524664eb537bf)
[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600 --- Comment #12 from CVS Commits --- The master branch has been updated by Hans-Peter Nilsson : https://gcc.gnu.org/g:eb79f4db49c5f5a807555e9d374524664eb537bf commit r11-5749-geb79f4db49c5f5a807555e9d374524664eb537bf Author: Hans-Peter Nilsson Date: Fri Dec 4 20:27:23 2020 +0100 doc/implement-c.texi: About same-as-scalar-type volatile aggregate accesses, PR94600 We say very little about reads and writes to aggregate / compound objects, just scalar objects (i.e. assignments don't cause reads). Let's lets say something safe about aggregate objects, but only for those that are the same size as a scalar type. There's an equal-sounding section (Volatiles) in extend.texi, but this seems a more appropriate place, as specifying the behavior of a standard qualifier. gcc: 2020-12-04 Hans-Peter Nilsson Martin Sebor PR middle-end/94600 * doc/implement-c.texi (Qualifiers implementation): Add blurb about access to the whole of a volatile aggregate object, only for same-size as a scalar object.
[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600 --- Comment #11 from CVS Commits --- The master branch has been updated by Hans-Peter Nilsson : https://gcc.gnu.org/g:5db1fa9bc69dd58ce2f93dd707d05efcfe89ffa8 commit r11-2682-g5db1fa9bc69dd58ce2f93dd707d05efcfe89ffa8 Author: Hans-Peter Nilsson Date: Thu Aug 13 05:12:23 2020 +0200 gcc.dg/pr94600-5.c .. -8.c: Align struct t0 explictly, as a type, PR middle-end/94600 The bitfield-struct t0 in gcc.dg/pr94600-1.c ..-4.c is assigned to a pointer that is a (volatile-and-pointer-)cast literal, so gcc doesn't need to be otherwise told that the address is aligned. But, variants pr94600-5.c ..-8.c are assigned through a "volatile t0 *", and rely on the *type* being naturally aligned, or that the machine has non-strict-alignment moves. Unfortunately, systems exist (for some definitions of exist) where such structs aren't always naturally aligned, for example if it contains only (small) bitfields, even though the size is a naturally accessible size. Specifically, the mmix-knuth-mmixware port has only *byte* alignment for this struct. (If an int is added to the struct, alignment is promoted.) IOW, a prerequisite of the test is false: the struct doesn't have the same alignment as an integer of the same size. The effect is assignment in byte-size pieces, and the test fails. (For a non-volatile assignment, memcpy is called.) That's easily fixable by defining the type as having a specific alignment. This is also closer to the type in the original code, and also as the first variants aren't affected, no second thought or re-visit of pre-fixed compiler is needed. I don't plan to back-port this to gcc-10 branch however. I did sanity-check that the tests still pass on ppc64le-linux. gcc/testsuite: PR middle-end/94600 * gcc.dg/pr94600-5.c, gcc.dg/pr94600-6.c, gcc.dg/pr94600-7.c, gcc.dg/pr94600-8.c: Align t0 to 4-byte boundary.
[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600 --- Comment #10 from CVS Commits --- The releases/gcc-10 branch has been updated by Hans-Peter Nilsson : https://gcc.gnu.org/g:6f49c66ed4e060c333d8bcd5e4ae127ab7bca15b commit r10-8468-g6f49c66ed4e060c333d8bcd5e4ae127ab7bca15b Author: Hans-Peter Nilsson Date: Sun Jul 5 20:50:52 2020 +0200 PR94600: fix volatile access to the whole of a compound object. The store to the whole of each volatile object was picked apart like there had been an individual assignment to each of the fields. Reads were added as part of that; see PR for details. The reads from volatile memory were a clear bug; individual stores questionable. A separate patch clarifies the docs. gcc: 2020-07-09 Richard Biener PR middle-end/94600 * expr.c (expand_constructor): Make a temporary also if we're storing to volatile memory. gcc/testsuite: 2020-07-09 Hans-Peter Nilsson PR middle-end/94600 * gcc.dg/pr94600-1.c, gcc.dg/pr94600-2.c, gcc.dg/pr94600-3.c, gcc.dg/pr94600-4.c, gcc.dg/pr94600-5.c, gcc.dg/pr94600-6.c, gcc.dg/pr94600-7.c, gcc.dg/pr94600-8.c: New tests. (cherry picked from commit a4aca1edaf37d43b2b7e9111825837a7a317b1b0)
[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600 --- Comment #9 from CVS Commits --- The master branch has been updated by Hans-Peter Nilsson : https://gcc.gnu.org/g:a4aca1edaf37d43b2b7e9111825837a7a317b1b0 commit r11-2045-ga4aca1edaf37d43b2b7e9111825837a7a317b1b0 Author: Hans-Peter Nilsson Date: Sun Jul 5 20:50:52 2020 +0200 PR94600: fix volatile access to the whole of a compound object. The store to the whole of each volatile object was picked apart like there had been an individual assignment to each of the fields. Reads were added as part of that; see PR for details. The reads from volatile memory were a clear bug; individual stores questionable. A separate patch clarifies the docs. gcc: 2020-07-09 Richard Biener PR middle-end/94600 * expr.c (expand_constructor): Make a temporary also if we're storing to volatile memory. gcc/testsuite: 2020-07-09 Hans-Peter Nilsson PR middle-end/94600 * gcc.dg/pr94600-1.c, gcc.dg/pr94600-2.c, gcc.dg/pr94600-3.c, gcc.dg/pr94600-4.c, gcc.dg/pr94600-5.c, gcc.dg/pr94600-6.c, gcc.dg/pr94600-7.c, gcc.dg/pr94600-8.c: New tests.
[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600 --- Comment #8 from Hans-Peter Nilsson --- (In reply to jos...@codesourcery.com from comment #7) > The Arm AAPCS has detailed rules for operations on individual volatile > bit-fields, but not for this case where the whole struct is volatile and > the operation is on the whole struct. I think you can consider this issue > in general as quality-of-implementation. Can we extrapolate this statement to the language level, that this is implementation- or port-specific? I'm curious to see how those reads from volatile LHS can be justified from a correctness point of view, in the context of this being an aggregate assignment.
[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600 --- Comment #7 from joseph at codesourcery dot com --- The Arm AAPCS has detailed rules for operations on individual volatile bit-fields, but not for this case where the whole struct is volatile and the operation is on the whole struct. I think you can consider this issue in general as quality-of-implementation.
[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600 --- Comment #6 from Richard Biener --- So for example diff --git a/gcc/expr.c b/gcc/expr.c index b97c217e86d..a980811c1e9 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -8263,7 +8263,8 @@ expand_constructor (tree exp, rtx target, enum expand_modifier modifier, /* Handle calls that pass values in multiple non-contiguous locations. The Irix 6 ABI has examples of this. */ if (target == 0 || ! safe_from_p (target, exp, 1) - || GET_CODE (target) == PARALLEL || modifier == EXPAND_STACK_PARM) + || GET_CODE (target) == PARALLEL || modifier == EXPAND_STACK_PARM + || (GET_CODE (target) == MEM && MEM_VOLATILE_P (target))) { if (avoid_temp_mem) return NULL_RTX; avoids the constant folding and produces foo: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 mov r3, #655360 ldr r2, .L4 push{r4, lr} ldm r2, {r4, lr} ldr ip, [r2, #8] add r0, r2, #12 str r4, [r3, #44] ldm r0, {r0, r1, r2} str lr, [r3, #48] str ip, [r3, #52] str r0, [r3, #56] str r1, [r3, #60] str r2, [r3, #64] pop {r4, pc} on arm.
[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600 Richard Biener changed: What|Removed |Added CC||jsm28 at gcc dot gnu.org, ||rguenth at gcc dot gnu.org --- Comment #5 from Richard Biener --- Joseph, what's the constraints on an aggregate assignment through a volatile qualified pointer using a type like typedef struct { unsigned int f0 : 4; unsigned int f1 : 11; unsigned int f2 : 10; unsigned int f3 : 7; } t0; ?
[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600 --- Comment #4 from Richard Biener --- We're going through rtx store_expr (tree exp, rtx target, int call_param_p, bool nontemporal, bool reverse) { ... normal_expr: /* If we want to use a nontemporal or a reverse order store, force the value into a register first. */ tmp_target = nontemporal || reverse ? NULL_RTX : target; temp = expand_expr_real (exp, tmp_target, GET_MODE (target), (call_param_p ? EXPAND_STACK_PARM : EXPAND_NORMAL), _rtl, false); which expands directly into target via else if (optimize >= 1 && modifier != EXPAND_CONST_ADDRESS && modifier != EXPAND_INITIALIZER && modifier != EXPAND_MEMORY && TREE_READONLY (array) && ! TREE_SIDE_EFFECTS (array) && TREE_CODE (index) == INTEGER_CST && (VAR_P (array) || TREE_CODE (array) == CONST_DECL) && (init = ctor_for_folding (array)) != error_mark_node) { ... /* If VALUE is a CONSTRUCTOR, this optimization is only useful if this doesn't store the CONSTRUCTOR into memory. If it does, it is more efficient to just load the data from the array directly. */ rtx ret = expand_constructor (value, target, modifier, true); and then store_constructor (exp, target, 0, int_expr_size (exp), false); store_constructor cannot anymore fail and since we're invoking expand_constructor with avoid_temp_mem == true that's the place to fixup I think, possibly by amending /* Handle calls that pass values in multiple non-contiguous locations. The Irix 6 ABI has examples of this. */ if (target == 0 || ! safe_from_p (target, exp, 1) || GET_CODE (target) == PARALLEL || modifier == EXPAND_STACK_PARM) { if (avoid_temp_mem) return NULL_RTX; with the case for volatile target. Or of course at the very toplevel by passing down a different expand modifier or by skipping ctor folding. Note that an aggregate volatile assignment is not very well specified when bitfields are involved - C says the copy is elementwise and then strict volatile bitfield semantics would suggest that N stores happen together with the required reads for RMW cycles. So I'm not sure if this is really a bug or rather wrong user expectation.
[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600 --- Comment #3 from Richard Biener --- Confirmed on arm. The odd thing is that the optimized GIMPLE for foo() is much nicer: foo () { [local count: 153437707]: MEM[(volatile struct t0 *)655404B] ={v} a0[0]; MEM[(volatile struct t0 *)655408B] ={v} a0[1]; MEM[(volatile struct t0 *)655412B] ={v} a0[2]; MEM[(volatile struct t0 *)655416B] ={v} a0[3]; MEM[(volatile struct t0 *)655420B] ={v} a0[4]; MEM[(volatile struct t0 *)655424B] ={v} a0[5]; return; while for bar() we have additional stmts to construct the stack objects though the actual stores are the same: [local count: 1073741824]: a01 = {}; a02 = {}; a03 = {}; a04 = {}; a05 = {}; MEM[(struct *)] = 33556023; MEM[(volatile struct t0 *)655404B] ={v} a00; _38 = MEM[(struct *)]; _39 = _38 & 33521664; _40 = _39 | 33558455; MEM[(struct *)] = _40; MEM[(volatile struct t0 *)655408B] ={v} a01; _41 = MEM[(struct *)]; _42 = _41 & 33521664; _43 = _42 | 167774200; MEM[(struct *)] = _43; MEM[(volatile struct t0 *)655412B] ={v} a02; _44 = MEM[(struct *)]; _45 = _44 & 33521664; _46 = _45 | 33554453; MEM[(struct *)] = _46; MEM[(volatile struct t0 *)655416B] ={v} a03; _47 = MEM[(struct *)]; _48 = _47 & 33521664; _49 = _48 | 33554453; MEM[(struct *)] = _49; MEM[(volatile struct t0 *)655420B] ={v} a04; _50 = MEM[(struct *)]; _51 = _50 & 33521664; _52 = _51 | 33554453; MEM[(struct *)] = _52; MEM[(volatile struct t0 *)655424B] ={v} a05; a00 ={v} {CLOBBER}; a01 ={v} {CLOBBER}; a02 ={v} {CLOBBER}; a03 ={v} {CLOBBER}; a04 ={v} {CLOBBER}; a05 ={v} {CLOBBER}; return; I suspect that the RTL expansion for foo() applies "premature" optimization via constant folding the aggregate initializer and thus compiling this into partly initializing the destination which isn't valid for volatile qualified LHSs. For bar() there's nothing to optimize for it.
[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600 --- Comment #2 from Hans-Peter Nilsson --- For various targets and gcc versions I've noticed this bug as far back as gcc-4.7.
[Bug middle-end/94600] Ignored volatile specifier on loop unrolling and bitfield misoptimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94600 Hans-Peter Nilsson changed: What|Removed |Added Last reconfirmed||2020-04-14 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #1 from Hans-Peter Nilsson --- Confirmed for arm-eabi at a615ea71bc8:dda9278f192:bb87d5cc77db1 and cris-elf at fa9a57ed91d:b46f6c8e2e8:a126a1577ffcbf.