[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-27 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #52 from post+gcc at ralfj dot de ---
For the point discussed earlier with the `restrict` in the musl memcpy, I had
another look at the definition of `restrict` and it's not entirely clear to me
any more that there is UB here. The restrict rules only apply to objects that
are "also modified (by any means)". Now the question is, does "*X = *X" modify
the object? Is a write always a modification, or only if the stored
representation changes, or only if the stored value changes?

If it requires a representation change, then "memcpy(x, x, n)" does not modify
anything, and hence there is no UB from "restrict".

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-27 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #51 from post+gcc at ralfj dot de ---
Oh great, I love it when one part of the C standard just adds exceptions to
statements made elsewhere. It's almost as if the authors want this to be as
hard to understand as possible...

That then raises the question of which version of the signature is actually used
for building (and optimizing) the function: the one in the declaration or the
one in the definition. Does the standard have an answer to that?

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-27 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #49 from post+gcc at ralfj dot de ---
Even glibc itself seems to use `restrict`:

https://codebrowser.dev/glibc/glibc/string/string.h.html#43

So the compiler building glibc might inadvertently rely on the memory written
through dst and the memory read through src being disjoint, making even the
perfectly-overlapping case UB (unless the implementation has a guard somewhere
that skips the copy when src==dst, but I was not able to find such a guard).

(The implementation at https://codebrowser.dev/glibc/glibc/string/memcpy.c.html
does not have the `restrict`, but it includes the string.h header and I think
the compiler is allowed to apply attributes from the declaration to the
definition. Or, alternatively, it might even be UB to have `restrict` in one
place and not the other: "All declarations that refer to the same object or
function shall have compatible type; otherwise, the behavior is undefined" [C23
§6.2.7.2] and "For two qualified types to be compatible, both shall have the
identically qualified version of a compatible type; the order of type
qualifiers within a list of specifiers or qualifiers does not affect the
specified type" [C23 §6.7.3.11].)

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-24 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #48 from post+gcc at ralfj dot de ---
> Note, clang makes the same assumption apparently (while MSVC emits rep movs 
> inline and ICC either that, or calls _intel_fast_memcpy).

MSVC does the same thing as clang and GCC, if godbolt is to be trusted:
https://rust.godbolt.org/z/o7TevfvcY

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-23 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #43 from post+gcc at ralfj dot de ---
That is not my reading of the standard, but absent a proper (formal,
mathematical) spec I guess it is hard to tell.

With your reading, "if ((uintptr_t)src == 0x400400)" is UB, since changing the
"src" argument to a different copy located at that address would change the
execution. I strongly doubt that is the intent of the standard.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-23 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #41 from post+gcc at ralfj dot de ---
> This entitles a compiler to emit asm equivalent to if (src==dest) system("rm 
> -rf /") if it likes.

No it does not. restrict causes UB if the two pointers are used to access the
same memory. It has nothing to do with whether the pointers are equal. So it
would have to be "if (src==dest && n>0)" and the compiler would have to first
prove that "n>0" implies that later accesses through both pointers occur at
offset 0 (and at least one of them is a write).

But it's still UB to call this the way GCC does, that much I agree with.
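
A minimal sketch of that distinction, assuming a hypothetical byte-wise copy
`copy_bytes` (not musl's or glibc's actual code). With equal pointers and
`n == 0` no byte is accessed, so `restrict` places no requirement on the call;
equal pointers with `n > 0` is the contested case, since the same bytes would be
read and written through both pointers.

```c
#include <stddef.h>

/* Hypothetical byte-wise copy with restrict-qualified parameters. */
static void copy_bytes(unsigned char *restrict dst,
                       const unsigned char *restrict src, size_t n)
{
    /* Each iteration reads through src and writes through dst.
       With n == 0 the loop body never runs, so nothing is accessed. */
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

Calling `copy_bytes(buf, buf, 0)` is therefore fine, as is any call with
disjoint buffers; `copy_bytes(buf, buf, 4)` is the call whose status is under
discussion here.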

> Our memcpy is not written in asm but in C, and it has the restrict qualifier 
> on src and dest.

The question is, does that qualifier help? If you remove it, does the generated
assembly change in any way, does the performance change? If not, it clearly
doesn't matter and can just be removed. If yes, then yeah compilers clearly
shouldn't call this with identical ranges.

Basically, compiler devs are claiming that libc can support the src==dest case
"for free", without any noticeable cost to other uses of the function. libc
devs are claiming that compilers can insert a branch that tests for equality,
without any noticeable cost. Both of these are testable hypotheses. I'm not a
compiler dev nor a libc dev, I just want to make sure that my compiler and my
libc use the same contract when talking to each other -- but I hope someone who
is a compiler dev or a libc dev can go and actually test these hypotheses,
rather than just speculating about it as has been happening so far.
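
For concreteness, the compiler-side option amounts to something like the
following sketch. The helper name `copy_unless_same` is made up for
illustration; a compiler would emit the comparison inline rather than call a
wrapper, and whether the extra branch is measurable is exactly the open
question.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: skip memcpy when source and destination coincide. */
static void *copy_unless_same(void *dst, const void *src, size_t n)
{
    if (dst != src)          /* the extra branch under discussion */
        memcpy(dst, src, n); /* only reached for distinct pointers */
    return dst;
}
```

This only covers the exactly-overlapping case; partially overlapping ranges
would still violate the memcpy contract.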

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-22 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #30 from post+gcc at ralfj dot de ---
There have been several assertions above that a certain way to solve this
either has no performance cost at all or a severe performance cost. That sounds
like we are missing data -- ideally, someone would benchmark the actual cost of
emitting that branch. It seems kind of pointless to just make assertions about
the impact of this change without real data.

> On the other hand, expecting the libc memcpy to make this check greatly 
> pessimizes every reasonable small use of memcpy with a gratuitous branch for 
> what is undefined behavior and should never appear in any valid program.

I don't think this is true. As far as I can see, the performance impact of
having memcpy support the src==dest case is zero -- the assembly generated by
the current implementations already supports that case. (At least I have not
seen any evidence to the contrary.) No new check in memcpy is required.

[Bug c/112449] Arithmetic operations can produce signaling NaNs

2023-11-10 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112449

--- Comment #12 from post+gcc at ralfj dot de ---
> GCC will not create an sNaN out of nowhere.

That's the part I was hoping for. :)  I just don't think it obviously follows
from any docs (the C standard or GCC docs).

[Bug c/112449] Arithmetic operations can produce signaling NaNs

2023-11-09 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112449

--- Comment #10 from post+gcc at ralfj dot de ---
The standard says

> This annex does not require the full support for signaling NaNs specified in
> IEC 60559. This annex uses the term NaN, unless explicitly qualified, to
> denote quiet NaNs. Where specification of signaling NaNs is not provided, the
> behavior of signaling NaNs is implementation-defined (either treated as an
> IEC 60559 quiet NaN or treated as an IEC 60559 signaling NaN).

I have no idea how that allows a situation where the *output* of an operation
becomes signaling -- that can't usually happen no matter whether the inputs are
signaling or quiet.

But that seems to be the common interpretation.

Still, it seems important that `pow(1.0, 0.0/0.0)` returns `1.0` and not a NaN.
That's what the `pow` docs say. So for this there must be a guarantee that
`0.0/0.0` is a quiet NaN.
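
A small sketch of that expectation, assuming an implementation that follows
Annex F for `pow` (the helper name `pow_one_nan` is made up, and the program
needs to be linked against the math library, e.g. with `-lm`):

```c
#include <math.h>

/* Returns pow(1.0, NaN), with the NaN produced at run time by 0.0/0.0. */
static double pow_one_nan(void)
{
    volatile double zero = 0.0;      /* volatile: discourage constant folding */
    double nan_value = zero / zero;  /* invalid operation; expected to be a quiet NaN */
    return pow(1.0, nan_value);      /* Annex F: pow(+1, y) is 1 even for a NaN y */
}
```

If the division produced a signaling NaN instead, the result could differ on
libms that treat `pow(1, sNaN)` differently from `pow(1, qNaN)`.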

[Bug c/112449] Arithmetic operations can produce signaling NaNs

2023-11-08 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112449

--- Comment #7 from post+gcc at ralfj dot de ---
I guess the idea is that by passing a signaling NaN to a float operation, I am
already entering unspecified behavior, so it's okay for that float operation to
violate its usual contract and return a signaling NaN?

[Bug c/112449] Arithmetic operations can produce signaling NaNs

2023-11-08 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112449

--- Comment #6 from post+gcc at ralfj dot de ---
Hm, OTOH the C standard says

> The expressions 1×x, x/1, and x are equivalent (on IEC 60559 machines, among
> others).

So, it seems like when they say "The +, -, *, and / operators provide the IEC
60559 add, subtract, multiply, and divide operations.", they don't quite mean
that.

This seems internally inconsistent in the C standard, since C also permits
`pow(1, sNaN)` to behave differently from `pow(1, qNaN)` -- and in fact they do
behave differently in GNU's libm. So on the one hand `pow(1, x * y)` must
always be `1` but on the other hand it can return a NaN when `x` is an sNaN and
`y` is `1`?

[Bug c/112449] Arithmetic operations can produce signaling NaNs

2023-11-08 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112449

--- Comment #5 from post+gcc at ralfj dot de ---
> See 
> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-fsignaling-nans

That's unrelated. That's about whether operations on signaling NaNs can trap. I
am asking when operations can output a signaling NaN.

So, for code like

float x = y OP z;
return is_signaling_nan(x);

when can that code return `true`? Normal IEEE semantics would say "never". And
yet if "z" is the constant 1, OP is `*`, and "y" is a signaling NaN, then
this evidently can output a signaling NaN.

I would hope the answer is "this can output a signaling NaN only if one of the
inputs is a signaling NaN", but is that documented anywhere?
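
A self-contained way to try this out might look like the sketch below. All
helper names are made up for illustration, and the classifier assumes IEEE 754
binary32 with the convention that a set top mantissa bit means "quiet" (as on
x86 and ARM; the targets with the opposite quiet bit would need the test
inverted).

```c
#include <stdint.h>
#include <string.h>

/* Classify a binary32 bit pattern; assumes "top mantissa bit set = quiet". */
static int is_signaling_nan_bits(uint32_t bits)
{
    uint32_t exponent  = bits & 0x7f800000u;
    uint32_t mantissa  = bits & 0x007fffffu;
    uint32_t quiet_bit = bits & 0x00400000u;
    return exponent == 0x7f800000u && mantissa != 0 && quiet_bit == 0;
}

/* Bit-exact conversions that avoid pointer-cast aliasing problems. */
static uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Multiply an sNaN by 1 and report whether the result is still signaling.
   The answer depends on whether the compiler folds `y * 1.0f` into `y`. */
static int mul_by_one_keeps_snan(void)
{
    float y = bits_to_float(0x7fa00000u); /* one possible sNaN bit pattern */
    float x = y * 1.0f;
    return is_signaling_nan_bits(float_to_bits(x));
}
```

Under strict IEEE 754 semantics `mul_by_one_keeps_snan()` would return 0; a
result of 1 indicates that the multiplication was optimized away.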

> Note mips and sh and a few other targets have the quiet bit meaning the 
> opposite.

I know. LLVM is currently buggy on those targets.

> GCC does document some of this on 
> https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Floating-point-implementation.html
>  but not the signaling nan part.

This seems to list a bunch of implementation-defined aspects of C? To my
knowledge, my question is not implementation defined. C (with the annex for
floating-point arithmetic) requires the above operations to always return
"false". GCC violates the C spec here (since it defines __STDC_IEC_559__,
declaring support for the annex), and it'd be good to know how far it is going
in that violation.

[Bug c/112449] New: Arithmetic operations can produce signaling NaNs

2023-11-08 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112449

            Bug ID: 112449
           Summary: Arithmetic operations can produce signaling NaNs
           Product: gcc
           Version: 13.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: post+gcc at ralfj dot de
  Target Milestone: ---

According to the IEEE 754 specification, the output of an arithmetic operation
can never be a signaling NaN. However, GCC performs optimizations that turn `x
* 1.0` into just `x`, and if `x` is a signaling NaN, that means that the
multiplication will now (seem to) return a signaling NaN. (proof of GCC
performing that transformation: https://godbolt.org/z/scPhn1d8s)

It is very common for C compilers to violate this IEEE 754 requirement, but it
does open the door to a great many questions. Since GCC evidently does not
implement the original IEEE 754 semantics, it would be great to have some
documentation on what exactly GCC *does* implement, and in particular under
which conditions operations are allowed to return a signaling NaN.

So currently, GCC is either buggy because it violates the IEEE 754 spec, or
there's a documentation bug in that the actual floating point spec GCC intends
to implement is not documented. At least, all I was able to find is
https://gcc.gnu.org/wiki/FloatingPointMath, which just says "does not care
about signalling NaNs". (I hope this does not mean that any arithmetic
operation may arbitrarily produce signaling NaNs. That would be an issue for
operations which are sensitive to the difference between quiet NaN and
signaling NaN, such as `pow`.)

As a point of comparison, LLVM recently added this to their documentation to
answer these kinds of questions:
https://llvm.org/docs/LangRef.html#behavior-of-floating-point-nan-values. (That
PR was authored by me but received input from a lot of people.) LLVM goes
further than just documenting signaling vs quiet NaN there, since in practice
there's some critical code that would break if arithmetic operations returned
NaNs with arbitrary bits in their payload (specifically, that would break NaN
boxing as performed by some JavaScript engines, or at least make it a lot less
efficient since engines would have to re-normalize NaNs after every single
operation -- which to my knowledge, they don't actually do in practice).
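
To show why payload bits matter for NaN boxing, here is a toy sketch. The tag
value and helper names are invented for illustration and do not correspond to
any particular engine's layout: a 32-bit integer is stored in the low payload
bits of a quiet NaN, and the scheme only works if ordinary arithmetic never
produces a NaN that happens to carry the tag.

```c
#include <stdint.h>
#include <string.h>

/* Made-up tag: quiet NaN (0x7ff8...) plus one extra payload bit as marker. */
#define BOX_TAG  0x7ff8000100000000ull
#define TAG_MASK 0xffffffff00000000ull

/* Store a 32-bit integer in the low payload bits of a tagged quiet NaN. */
static double box_int32(int32_t value)
{
    uint64_t bits = BOX_TAG | (uint64_t)(uint32_t)value;
    double boxed;
    memcpy(&boxed, &bits, sizeof boxed);
    return boxed;
}

/* A double is treated as a boxed integer iff its upper 32 bits equal the tag. */
static int is_boxed_int32(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (bits & TAG_MASK) == BOX_TAG;
}

/* Recover the integer; only meaningful when is_boxed_int32(d) is true. */
static int32_t unbox_int32(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (int32_t)(uint32_t)(bits & 0xffffffffull);
}
```

If an arithmetic operation could return a NaN with arbitrary payload bits, a
genuine floating-point result might be mistaken for a boxed integer unless
every result were re-normalized first.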

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-10-28 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

post+gcc at ralfj dot de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |post+gcc at ralfj dot de

--- Comment #23 from post+gcc at ralfj dot de ---
>  Is glibc community ready to provide such guarantee?

This is indeed a key question here I think. Currently GCC makes assumptions
that *even the libc produced by the same project* does not document as stable
guarantees. That's rather dissonant. The GNU project should at least within
itself come to a proper conclusion on the question of whether memcpy should be
UB or not when both pointers are equal. Right now we have everyone pointing at
everyone else, and users are left in the rain with their valgrind errors.

Ideally, of course, the C standard would be updated so that, slowly but
steadily, the memcpy contract comes to match reality. That will take a
while, but given that this issue was filed 16 years ago (!), there clearly
would have been enough time. (If someone does take this up, please join forces
with the clang people who are interested in getting C updated:
https://reviews.llvm.org/D86993#4585590).

But GNU controls glibc so there's not really any excuse for not updating those
docs, I think? glibc making such a move would be a great step towards
convincing valgrind and the C committee that memcpy should have defined
behavior when both pointers are equal.