[Bug middle-end/104800] reodering of potentially trapping operations and volatile stores

2022-03-07 Thread paulmckrcu at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104800

--- Comment #7 from Paul McKenney  ---
(In reply to Richard Biener from comment #6)
> Generally GCC's middle-end considers volatile stores (or loads) to not have
> any side-effects that are not visible in the IL.  That includes (synchronous)
> raise of signals (and thus effects on control flow), effects on other
> (non-volatile) memory, etc.  If a volatile access has to be considered having
> an effect on control flow then the standard should explicitly mention that.
> I don't think it does that at the moment; it merely says those statements
> invoke "changes in the state of the execution environment".
> 
> So volatile "issues" like this are no different from issues that arise
> with respect to observability when you consider asynchronous events and
> the effect of re-ordering of statements.  In fact C17 especially notes
> that objects that are not volatile sig_atomic_t have unspecified value
> on such events, so that IMHO also covers generating a trap from a volatile
> access.
> 
> Volatile accesses are deemed observable but we do not re-order those, this
> bug is about re-ordering unrelated stmts with respect to such accesses.
> 
> I don't think the standard requires us to fix this reported behavior.
> 
> A mitigation in the middle-end requires volatile accesses to behave as
> possibly altering control flow.  That's iffy if they continue to be
> represented as simple assign statements, the closest would be to always
> mark them as possibly trapping (something we cannot do right now).
> 
> The PRE "fix" is just covering up a single place in the compiler that fails
> to consider volatile accesses as altering control flow.

Understood that normal accesses can be reordered with volatile accesses.  Code
for which this reordering is a problem can prevent it, for example, by using an
asm with the "memory" clobber.

The concern here is not the memory access, but the potential for trapping,
which in this case is the potential division by zero.
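
The "memory" clobber mentioned above can be sketched as follows.  This is only
an illustration: the names barrier(), device_reg, payload, and publish() are
hypothetical, and the asm syntax is the GCC/Clang extension rather than
standard C.

```c
/* Compiler-only barrier: emits no instructions, but the "memory" clobber
 * tells the compiler that memory may have been read or written, so
 * ordinary memory accesses are not moved across it. */
#define barrier() __asm__ __volatile__("" ::: "memory")

volatile int device_reg;   /* hypothetical stand-in for a device register */
int payload;               /* ordinary (non-volatile) data */

void publish(int value)
{
    payload = value;       /* ordinary store */
    barrier();             /* keeps the ordinary store above this point */
    device_reg = 1;        /* volatile store */
}
```

This constrains only the compiler's ordering of memory accesses.  It is not a
hardware fence.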

[Bug middle-end/104800] reodering of potentially trapping operations and volatile stores

2022-03-05 Thread paulmckrcu at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104800

Paul McKenney  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |paulmckrcu at gmail dot com

--- Comment #3 from Paul McKenney  ---
Expanding on Martin's description...

The volatile write's device semantics are unknown to the compiler.  Those
semantics can include generating a trap.  This trap need not return; for
example, it could resume execution in a different place via
setjmp()/longjmp().  The overall design (including the
unknown-to-the-compiler device semantics) might guarantee that when b is equal
to zero, execution will resume somewhere else, that is, the trap handler is
guaranteed not to return in that case.

And this is one reason why possibly trapping operations must not be moved
across volatile accesses.  Doing so breaks device drivers.
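
A minimal sketch of the pattern at issue follows.  It is not the test case
from the bug report: the names device_reg and store_then_divide() are
hypothetical, chosen only to show a volatile store followed by a potentially
trapping division.

```c
volatile int device_reg;   /* hypothetical stand-in for a device register */

int store_then_divide(int a, int b)
{
    /* Volatile store whose device semantics are unknown to the compiler.
     * In the scenario described above, it might trap and never return
     * when b == 0. */
    device_reg = b;

    /* Potentially trapping operation (traps on targets such as x86 when
     * b == 0).  The concern is this division being moved above the store. */
    return a / b;
}
```

If the division is hoisted above the store, the trap taken when b is zero is
the division's rather than the device's, so the handler that was meant to
redirect execution never gets the chance.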

[Bug target/96327] Inefficient increment through pointer to volatile on x86

2020-07-30 Thread paulmckrcu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96327

--- Comment #6 from Paul McKenney  ---
(In reply to Marc Glisse from comment #5)
> I don't think bug 3506 has been fixed (its status seems wrong to me). But
> don't worry, there are several other duplicates that still have status NEW
> (bug 50677 for instance).
> This is a sensible enhancement request, I think some gcc backends already do
> a similar optimization, it simply isn't a priority, because volatile almost
> means "don't optimize this".
> At least the difference between the gcc and clang codes matches those other
> PRs. Not sure why you are talking of address computations.

Probably because I was confused by the addressing mode and by wishful thinking,
and yes, you are quite correct.

Anyway, if you look at https://godbolt.org/z/fGze8E, you can see that
Clang/LLVM is using a to-memory addl rather than loading, adding, and storing.

[Bug target/96327] Inefficient increment through pointer to volatile on x86

2020-07-26 Thread paulmckrcu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96327

--- Comment #4 from Paul McKenney  ---
Bug 3506 has since been fixed, at least for the example shown in this bug
report, as you can see if you look at the godbolt, which shows that both
compilers generate a single addl instruction, which is exactly what the
submitter of 3506 requested.

This bug is different, instead asking that the calculation of the address of
the volatile object not be split into multiple instructions.

[Bug c/96327] Inefficient increment through pointer to volatile on x86

2020-07-26 Thread paulmckrcu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96327

--- Comment #1 from Paul McKenney  ---
This manifests on GCC trunk (see the godbolt.org URL), but was first noted in
gcc version 7.5.0.  This is specific to x86, but might apply to any other
architecture that provides increment-memory instructions.  This behavior does
not seem to be affected by GCC options.

This can be reproduced by placing the sample code in a file "rrl.c" and
running:

cc -o rrl rrl.c

This completes successfully with no errors or warnings.

Running "cc -o rrl rrl.c --save-temps" generates the following file:

# 1 "rrl.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 31 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 32 "<command-line>" 2
# 1 "rrl.c"
struct task {
int other;
int rcu_count;
};

struct task *current;

void rcu_read_lock()
{
(*(volatile int*)&current->rcu_count)++;
}

[Bug c/96327] New: Inefficient increment through pointer to volatile on x86

2020-07-26 Thread paulmckrcu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96327

            Bug ID: 96327
           Summary: Inefficient increment through pointer to volatile on x86
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: paulmckrcu at gmail dot com
  Target Milestone: ---

Although the code generation for increment and decrement (++, --) through a pointer to
volatile has improved greatly over the past 15 years, there is a case in which
the address calculation is needlessly done separately instead of by the x86
increment instruction itself.  Here is some example code:

struct task {
int other;
int rcu_count;
};

struct task *current;

void rcu_read_lock()
{
(*(volatile int*)&current->rcu_count)++;
}

As can be seen in godbolt.org (https://godbolt.org/z/fGze8E), the address
calculation is split by GCC. The shorter code sequence generated by clang/LLVM
is preferable.
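
For illustration only, the shape of the desired code (a single add with a
memory operand) can be written by hand with GCC/Clang inline asm.  This is a
sketch, not the output of either compiler: the names task0 and
rcu_read_lock_sketch() are hypothetical, and the asm path assumes the default
AT&T assembler syntax on x86.

```c
struct task {
    int other;
    int rcu_count;
};

struct task task0;               /* hypothetical task, for illustration */
struct task *current = &task0;

void rcu_read_lock_sketch(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* One read-modify-write instruction operating directly on memory;
     * the "+m" constraint lets the compiler pick the addressing mode. */
    __asm__ __volatile__("addl $1, %0"
                         : "+m" (current->rcu_count)
                         :
                         : "cc");
#else
    /* Portable fallback: the volatile increment from the report. */
    (*(volatile int *)&current->rcu_count)++;
#endif
}
```

Hand-written asm of this kind is what the volatile increment would avoid
needing if the compiler emitted the to-memory form itself.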

Fixing this would allow the Linux kernel to use safer code sequences for
certain fastpaths, in this example, rcu_read_lock() and rcu_read_unlock() for
kernels built with CONFIG_PREEMPT=y.