[Bug libstdc++/77776] C++17 std::hypot implementation is poor

2024-04-10 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6

--- Comment #29 from g.peterh...@t-online.de ---
(In reply to Jakub Jelinek from comment #28)
> As long as the scale is a power of two or 1.0 / power of two, I don't see
> why any version wouldn't be inaccurate.

Yes, but the constant scale_up is chosen incorrectly:
scale_up = std::exp2(Type(limits::max_exponent-1)) --> ok
scale_up = std::exp2(Type(limits::max_exponent/2)) --> error
scale_up = prev_power2(sqrt_max) --> error

scale_down = std::exp2(Type(limits::min_exponent-1))
also seems preferable to me.
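Here is a minimal double-precision sketch of why the choice of scale_up matters. It assumes IEEE binary64 (max_exponent == 1024) and that subnormals are not flushed to zero (no -ffast-math). scaled_norm3, scale_up_half and scale_up_full are hypothetical names, and std::ldexp is used in place of std::exp2 because it scales by an exact power of two. For three copies of denorm_min(), 2^512 only lifts each value to 2^-562, whose square (2^-1124) still underflows to zero. 2^1023 lifts each value to 2^-51, so the squares stay representable.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: scale x, y, z by a power of two, take the
// Euclidean norm, then undo the scaling. Only `scale` varies.
double scaled_norm3(double x, double y, double z, double scale)
{
    x *= scale;
    y *= scale;
    z *= scale;
    return std::sqrt(x*x + y*y + z*z) / scale;
}

// 2^(max_exponent/2) == 2^512 for double (the constant used in Comment #25).
double scale_up_half()
{
    return std::ldexp(1.0, std::numeric_limits<double>::max_exponent / 2);
}

// 2^(max_exponent-1) == 2^1023 for double (the constant proposed above).
double scale_up_full()
{
    return std::ldexp(1.0, std::numeric_limits<double>::max_exponent - 1);
}
```

For example, with d = std::numeric_limits<double>::denorm_min(), scaled_norm3(d, d, d, scale_up_half()) returns 0.0, whereas scaled_norm3(d, d, d, scale_up_full()) returns a nonzero subnormal.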

PS:
There seems to be a problem with random numbers and std::float16_t, which is
why I use std::uniform_real_distribution. I have not yet found
out exactly where the error lies.

thx
Gero


template <typename Type>
inline constexpr Type   hypot_exp(Type x, Type y, Type z)   noexcept
{
using limits = std::numeric_limits<Type>;

constexpr Type
zero = 0;

x = std::abs(x);
y = std::abs(y);
z = std::abs(z);

if (!(std::isnormal(x) && std::isnormal(y) && std::isnormal(z)))
[[unlikely]]
{
if  (std::isinf(x) | std::isinf(y) | std::isinf(z))
return limits::infinity();
else if (std::isnan(x) | std::isnan(y) | std::isnan(z)) return
limits::quiet_NaN();
else
{
const bool
xz{x == zero},
yz{y == zero},
zz{z == zero};

if (xz)
{
if (yz) return zz ? zero : z;
else if (zz)return y;
}
else if (yz && zz)  return x;
}
}

if (x > z) std::swap(x, z);
if (y > z) std::swap(y, z);

int
exp;

z = std::frexp(z, &exp);
y = std::ldexp(y, -exp);
x = std::ldexp(x, -exp);
return std::ldexp(std::sqrt(__builtin_assoc_barrier(x*x + y*y) + z*z),
exp);
}

template <typename Type>
inline constexpr Type   hypot_gp(Type x, Type y, Type z) noexcept
{
using limits = std::numeric_limits<Type>;

constexpr Type
sqrt_min= std::sqrt(limits::min()),
sqrt_max= std::sqrt(limits::max()),
scale_up= std::exp2(Type(limits::max_exponent-1)),
scale_down  = std::exp2(Type(limits::min_exponent-1)),
zero= 0;

x = std::abs(x);
y = std::abs(y);
z = std::abs(z);

if (!(std::isnormal(x) && std::isnormal(y) && std::isnormal(z)))
[[unlikely]]
{
if  (std::isinf(x) | std::isinf(y) | std::isinf(z))
return limits::infinity();
else if (std::isnan(x) | std::isnan(y) | std::isnan(z)) return
limits::quiet_NaN();
else
{
const bool
xz{x == zero},
yz{y == zero},
zz{z == zero};

if (xz)
{
if (yz) return zz ? zero : z;
else if (zz)return y;
}
else if (yz && zz)  return x;
}
}

if (x > z) std::swap(x, z);
if (y > z) std::swap(y, z);

if (const bool b{z>=sqrt_min}; b && z<=sqrt_max) [[likely]]
{
//  no scale
return std::sqrt(__builtin_assoc_barrier(x*x + y*y) + z*z);
}
else
{
const Type
scale = b ? scale_down : scale_up;

x *= scale;
y *= scale;
z *= scale;
return std::sqrt(__builtin_assoc_barrier(x*x + y*y) + z*z) /
scale;
}
}

template <typename Type>
void test(const size_t count, const Type min, const Type max, const Type factor)
{
std::random_device rd{};
std::mt19937 gen{rd()};
std::uniform_real_distribution dis{min, max};

auto rnd = [&]() noexcept -> Type { return Type(dis(gen) * factor); };

for (size_t i=0; i;

test(1024*1024, 0.5, 1, limits::max());
test(1024*1024, 0, 1, limits::min());

return EXIT_SUCCESS;
}

[Bug libstdc++/77776] C++17 std::hypot implementation is poor

2024-04-10 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6

--- Comment #27 from g.peterh...@t-online.de ---
Hi Matthias,
thanks for your benchmark. I still have 2 questions:

1) Accuracy
The frexp/ldexp variant seems to be the most accurate; is that correct? If so,
other constants would have to be used in hypot_gp:
scale_up   = std::exp2(Type(limits::max_exponent-1))
scale_down = std::exp2(Type(limits::min_exponent-1))

2) Speed
Your benchmark outputs several columns: (Δ)Latency/(Δ)Throughput/Speedup. What
exactly do these values mean, and what should be optimized for?

thx
Gero

[Bug libquadmath/114623] sqrtq and std::numeric_limits<__float128>::max()

2024-04-06 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114623

--- Comment #4 from g.peterh...@t-online.de ---
That is precisely the design flaw of C/C++/etc. There should be no
float/double/long double/__float128/etc., but *only* floatN_t. Then there
wouldn't be these discrepancies (if necessary, a type would be emulated in software).
But that's just my humble opinion ... and now we have to face reality and make
the best of it.
One step might be to put std::float128_t and __float128 on a common/uniform
code base :-)

cu
Gero

[Bug libquadmath/114623] sqrtq and std::numeric_limits<__float128>::max()

2024-04-06 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114623

--- Comment #2 from g.peterh...@t-online.de ---
#include 
#include 
#include 
#include 
#include 
#include 
#include 

void print_hex(const std::float128_t value)
{
std::array
buffer{};
const std::to_chars_result
result{std::to_chars(buffer.data(),
buffer.data()+buffer.size(), value, std::chars_format::hex)};

std::cout << std::string_view{buffer.data(), result.ptr} << std::endl;
}

template <typename Type>
void print_sqrt_max_hex()
{
using limits = std::numeric_limits<Type>;

print_hex(std::sqrt(limits::max()));
}

int main()
{
print_sqrt_max_hex();
print_sqrt_max_hex<__float128>();

return EXIT_SUCCESS;
}

prints
1.p+8191
1p+8192
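For comparison, here is the same property in double precision, where IEEE 754 requires sqrt to be correctly rounded. max() lies just below 2^1024, so sqrt(max()) must lie just below 2^512, and its square must stay finite, while 2^512 squared overflows. This is only an analogy and does not exercise libquadmath. By the same reasoning, one would expect a result just below 2^8192 for __float128 rather than 0x1p+8192. sqrt_of_max and sqrt_of_max_is_correctly_rounded are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: sqrt of the largest finite double.
double sqrt_of_max()
{
    return std::sqrt(std::numeric_limits<double>::max());
}

// The correctly rounded result is the double just below 2^512
// (0x1.fffffffffffffp+511), not 2^512 itself.
bool sqrt_of_max_is_correctly_rounded()
{
    const double two_pow_512 = std::ldexp(1.0, 512);
    return sqrt_of_max() == std::nextafter(two_pow_512, 0.0);
}
```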

[Bug libquadmath/114623] New: sqrt

2024-04-06 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114623

Bug ID: 114623
   Summary: sqrt
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libquadmath
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

Hello,
sqrt does not return the correct result for std::numeric_limits<__float128>::max().
I have not checked other (special) values; the problem may also occur
there.

Please see https://godbolt.org/z/bx8or94v7

regards
Gero

[Bug libstdc++/77776] C++17 std::hypot implementation is poor

2024-04-04 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6

--- Comment #26 from g.peterh...@t-online.de ---
The return statement must of course be "... / scale".
How can I edit posts after submitting them?

[Bug libstdc++/77776] C++17 std::hypot implementation is poor

2024-04-04 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6

--- Comment #25 from g.peterh...@t-online.de ---
Hi Matthias,
to get good results on average (all FP types: (B)FP16..FP128;
scalar/vectorized (SIMD)/parallel/...), this algorithm seems suitable to me
(so far):

template <typename Type>
inline constexpr Type   hypot_gp(Type x, Type y, Type z) noexcept
{
using limits = std::numeric_limits<Type>;

constexpr Type
sqrt_min= std::sqrt(limits::min()),
sqrt_max= std::sqrt(limits::max()),
scale_up= std::exp2( Type(limits::max_exponent/2)),
scale_down  = std::exp2(-Type(limits::max_exponent/2)),
zero= 0;

x = std::abs(x);
y = std::abs(y);
z = std::abs(z);

if (!(std::isnormal(x) && std::isnormal(y) && std::isnormal(z)))
[[unlikely]]
{
if (std::isinf(x) | std::isinf(y) | std::isinf(z))  return
limits::infinity();
else if (std::isnan(x) | std::isnan(y) | std::isnan(z)) return
limits::quiet_NaN();
else
{
const bool
xz{x == zero},
yz{y == zero},
zz{z == zero};

if (xz)
{
if (yz) return zz ? zero : z;
else if (zz)return y;
}
else if (yz && zz)  return x;
}
}

if (x > z) std::swap(x, z);
if (y > z) std::swap(y, z);

if ((z >= sqrt_min) && (z <= sqrt_max)) [[likely]]
{
//  no scale
return std::sqrt(__builtin_assoc_barrier(x*x + y*y) + z*z);
}
else
{
const Type
scale = (z >= sqrt_min) ? scale_down : scale_up;

x *= scale;
y *= scale;
z *= scale;
return std::sqrt(__builtin_assoc_barrier(x*x + y*y) + z*z);
}
}

[Bug libstdc++/77776] C++17 std::hypot implementation is poor

2024-03-18 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6

--- Comment #23 from g.peterh...@t-online.de ---
Hello Matthias,
you've given me new ideas. I think we agree on implementing hypot3 with a
scaling factor, but the correct scale value is not implemented here yet either;
do you have a suggestion?
A version is here: https://godbolt.org/z/Gd53cG9YG
I've intentionally split hypot_gp into small pieces so that you can experiment
with it. This is of course unnecessary for a final version.

General
* The function must of course work efficiently with all FP types.

Questions
* Sorting: In theory it is sufficient to sort x,y,z only far enough that the
condition x,y <= z holds (HYPOT_SORT_FULL).
* Accuracy: This is better with fma (HYPOT_FMA).
* How do you create the benchmarks? Then I could run them myself without
getting on your nerves.
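The first two points can be sketched together for double. Two compare-and-swap steps are enough to guarantee x, y <= z, with x and y left in unspecified order. The sum of squares can then be accumulated with std::fma, so that y*y and z*z are each added without rounding the product first. move_max_last and norm3_fma are hypothetical names. The sketch omits the overflow/underflow scaling discussed in this thread, so it is only valid for moderate inputs.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Hypothetical sketch: move the largest value into z with two
// compare-and-swap steps; afterwards x <= z and y <= z, but x and y
// are not ordered relative to each other.
void move_max_last(double& x, double& y, double& z)
{
    if (x > z) std::swap(x, z);
    if (y > z) std::swap(y, z);
}

// Sum of squares via fused multiply-add; z*z (the largest term) is added last.
// No scaling: large or tiny inputs can overflow or underflow.
double norm3_fma(double x, double y, double z)
{
    x = std::fabs(x);
    y = std::fabs(y);
    z = std::fabs(z);
    move_max_last(x, y, z);
    return std::sqrt(std::fma(z, z, std::fma(y, y, x * x)));
}
```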

thx
Gero

[Bug libstdc++/77776] C++17 std::hypot implementation is poor

2024-03-05 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6

--- Comment #19 from g.peterh...@t-online.de ---
> So, no need to use frexp/ldexp, just comparisons of hi above against sqrt of
> (max finite / 3), in that case scale by multiplying all 3 args by some
> appropriate scale constant, and similarly otherwise if lo1 is too small by
> some large scale.

I'm not sure. With frexp/ldexp you probably get the highest accuracy (even if
it is probably slower) compared to scaling manually. The problem is to
determine suitable scaling factors and to adjust the (return) values
accordingly. I have implemented both variants.
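Here is a minimal double-precision sketch of the frexp/ldexp step. It assumes z is already the largest value and is finite and nonzero; the special cases are handled beforehand in the versions in this thread. Because the scaling is by a power of two, it introduces no rounding error unless a small argument becomes subnormal. norm3_frexp and norm3_naive are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch (compare hypot3_exp): z must be the largest of the
// three non-negative values and must be finite and nonzero.
double norm3_frexp(double x, double y, double z)
{
    int exp = 0;
    z = std::frexp(z, &exp);   // z now in [0.5, 1); original z == z * 2^exp
    y = std::ldexp(y, -exp);   // exact power-of-two scaling unless y underflows
    x = std::ldexp(x, -exp);
    return std::ldexp(std::sqrt(x*x + y*y + z*z), exp);
}

// Naive version for comparison: the squares overflow for large inputs.
double norm3_naive(double x, double y, double z)
{
    return std::sqrt(x*x + y*y + z*z);
}
```

With big = std::ldexp(1.0, 1000), norm3_frexp(3*big, 4*big, 12*big) returns exactly 13*big. norm3_naive overflows to infinity because (12*big)^2 exceeds max().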

Error
* In the case (x==y && y==z), x*std::sqrt(T(3)) must not simply be returned, as
this can lead to an overflow (inf).

Generally
* Instead of using fmin/fmax to determine the values hi,lo1,lo0, it is better
to sort x,y,z. This is faster and clearer and no additional variables need to
be introduced.
* It also makes sense to consider the case (x==0 && y==0 && z==0).

Optimizations
* You may have wondered why I wrote, for example, "if (std::isinf(x) |
std::isinf(y) | std::isinf(z))". This is intentional. The problem is that gcc
almost always generates branching code for logical operators, i.e. *a lot* of
conditional jumps. By using bitwise operators instead (| & instead of || &&), I
can get it to generate only the conditional jumps or cmoves that are actually
necessary. Branch-free code is always better.
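The following sketch shows that the two forms are interchangeable for these tests. std::isinf has no side effects and returns bool, so evaluating all three operands with | gives the same result as the short-circuit ||. Whether fewer branches are actually emitted depends on the compiler version and flags, and this sketch does not check that. any_inf_logical and any_inf_bitwise are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Short-circuit form: the compiler may emit a conditional jump per operand.
bool any_inf_logical(double x, double y, double z)
{
    return std::isinf(x) || std::isinf(y) || std::isinf(z);
}

// Bitwise form as used in the thread: all three tests are always evaluated
// and combined with |, which leaves the compiler free to avoid branches.
bool any_inf_bitwise(double x, double y, double z)
{
    return std::isinf(x) | std::isinf(y) | std::isinf(z);
}
```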


template <typename T>
constexpr T hypot3_exp(T x, T y, T z) noexcept
{
using limits = std::numeric_limits<T>;

constexpr T
zero = 0;

x = std::abs(x);
y = std::abs(y);
z = std::abs(z);

if (std::isinf(x) | std::isinf(y) | std::isinf(z))  [[unlikely]]
return limits::infinity();
if (std::isnan(x) | std::isnan(y) | std::isnan(z))  [[unlikely]]
return limits::quiet_NaN();
if ((x==zero) & (y==zero) & (z==zero))  [[unlikely]]
return zero;
if ((y==zero) & (z==zero))  [[unlikely]]
return x;
if ((x==zero) & (z==zero))  [[unlikely]]
return y;
if ((x==zero) & (y==zero))  [[unlikely]]
return z;

auto sort = [](T& a, T& b, T& c)constexpr noexcept -> void
{
if (a > b) std::swap(a, b);
if (b > c) std::swap(b, c);
if (a > b) std::swap(a, b);
};

sort(x, y, z);  //  x <= y <= z

int
exp = 0;

z = std::frexp(z, &exp);
y = std::ldexp(y, -exp);
x = std::ldexp(x, -exp);

T
sum = x*x + y*y;

sum += z*z;
return std::ldexp(std::sqrt(sum), exp);
}

template <typename T>
constexpr T hypot3_scale(T x, T y, T z) noexcept
{
using limits = std::numeric_limits<T>;

auto prev_power2 = [](const T value)constexpr noexcept -> T
{
return std::exp2(std::floor(std::log2(value)));
};

constexpr T
sqrtmax = std::sqrt(limits::max()),
scale_up= prev_power2(sqrtmax),
scale_down  = T(1) / scale_up,
zero= 0;

x = std::abs(x);
y = std::abs(y);
z = std::abs(z);

if (std::isinf(x) | std::isinf(y) | std::isinf(z))  [[unlikely]]
return limits::infinity();
if (std::isnan(x) | std::isnan(y) | std::isnan(z))  [[unlikely]]
return limits::quiet_NaN();
if ((x==zero) & (y==zero) & (z==zero))  [[unlikely]]
return zero;
if ((y==zero) & (z==zero))  [[unlikely]]
return x;
if ((x==zero) & (z==zero))  [[unlikely]]
return y;
if ((x==zero) & (y==zero))  [[unlikely]]
return z;

auto sort = [](T& a, T& b, T& c)constexpr noexcept -> void
{
if (a > b) std::swap(a, b);
if (b > c) std::swap(b, c);
if (a > b) std::swap(a, b);
};

sort(x, y, z);  //  x <= y <= z

const T
scale = (z > sqrtmax) ? scale_down : (z < 1) ? scale_up : 1;

x *= scale;
y *= scale;
z *= scale;

T
sum = x*x + y*y;

sum += z*z;
return std::sqrt(sum) / scale;
}


regards
Gero

[Bug libstdc++/77776] C++17 std::hypot implementation is poor

2024-03-02 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6

--- Comment #13 from g.peterh...@t-online.de ---
Thanks for the suggestions:
template <typename _Tp>
constexpr _Tp __hypot3(_Tp __x, _Tp __y, _Tp __z)   noexcept
{
if (std::isinf(__x) | std::isinf(__y) | std::isinf(__z))
[[__unlikely__]]
return _Tp(INFINITY);
__x = std::fabs(__x);
__y = std::fabs(__y);
__z = std::fabs(__z);
const _Tp __max = std::fmax(std::fmax(__x, __y), __z);
if (__max == _Tp{}) [[__unlikely__]]
return __max;
__x /= __max;
__y /= __max;
__z /= __max;
return std::sqrt(__x*__x + __y*__y + __z*__z) * __max;
}

The functions are then declared constexpr/noexcept.

regards
Gero

[Bug c/114181] issubnormal is a macro

2024-02-29 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114181

g.peterh...@t-online.de changed:

   What|Removed |Added

 Resolution|INVALID |FIXED

--- Comment #12 from g.peterh...@t-online.de ---
If this makes it into the C++ standard, I would have to rewrite it anyway. Why
not now, since I have reported this issue?

Are there already plans for how to deal with
https://en.cppreference.com/w/c/experimental/fpext1
https://en.cppreference.com/w/c/experimental/fpext4
in C++?

thx
Gero

[Bug c/114181] issubnormal is a macro

2024-02-29 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114181

g.peterh...@t-online.de changed:

   What|Removed |Added

 Resolution|INVALID |FIXED

--- Comment #10 from g.peterh...@t-online.de ---
Exactly that does not work, because issubnormal is a simple macro.
It only works if
#undef issubnormal
is done before the implementation: https://godbolt.org/z/z3PG3hYev

That is not correct.
* I have no idea what happens if math.h has already been included somewhere and
I subsequently undefine issubnormal.
* It is not my job to work around (compiler-specific) problems. The
compiler has to get it right or not support it at all.

Therefore issubnormal must be provided as a "real" function or via a builtin.

[Bug c/114181] issubnormal is a macro

2024-02-29 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114181

g.peterh...@t-online.de changed:

   What|Removed |Added

 Resolution|INVALID |FIXED

--- Comment #8 from g.peterh...@t-online.de ---
Of course, std::issubnormal is not yet available at the moment. To be able to
implement this at all, issubnormal from math.h must not be a macro!

[Bug c/114181] issubnormal is a macro

2024-02-29 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114181

g.peterh...@t-online.de changed:

   What|Removed |Added

 Resolution|MOVED   |FIXED

--- Comment #5 from g.peterh...@t-online.de ---
> If you are implementing a cmath for a C++ implementation, you need to a 
> similar thing and `#undef` it.
> The math.h that defines issubnormal comes from glibc.

That's what I mean. See also e.g.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77925
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77926

[Bug c/114181] issubnormal is a macro

2024-02-29 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114181

--- Comment #3 from g.peterh...@t-online.de ---
Of course issubnormal is defined in math.h (in my case line 1088, gcc 13.2).

[Bug c/114181] New: issubnormal is a macro

2024-02-29 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114181

Bug ID: 114181
   Summary: issubnormal is a macro
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

issubnormal is a macro and therefore not a (builtin) function.
This is incorrect, because it prevents implementing further issubnormal
functions, e.g. for C++:
namespace std
{
bool issubnormal(...);
}

thx
Gero

[Bug libstdc++/77776] C++17 std::hypot implementation is poor

2024-02-29 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6

g.peterh...@t-online.de changed:

   What|Removed |Added

 CC||g.peterh...@t-online.de

--- Comment #11 from g.peterh...@t-online.de ---
Would this be a good implementation for hypot3 in cmath?

#define GCC_UNLIKELY(x) __builtin_expect(x, 0)
#define GCC_LIKELY(x) __builtin_expect(x, 1)

namespace __detail
{
template <typename _Tp>
inline _GLIBCXX_CONSTEXPR typename
enable_if::value, bool>::type __isinf3(const _Tp __x,
const _Tp __y, const _Tp __z)   noexcept
{
return bool(int(std::isinf(__x)) | int(std::isinf(__y)) |
int(std::isinf(__z)));
}

template <typename _Tp>
inline _GLIBCXX_CONSTEXPR typename
enable_if::value, _Tp>::type  __hypot3(_Tp __x, _Tp __y,
_Tp __z) noexcept
{
__x = std::fabs(__x);
__y = std::fabs(__y);
__z = std::fabs(__z);

const _Tp
__max = std::fmax(std::fmax(__x, __y), __z);

if (GCC_UNLIKELY(__max == _Tp{}))
{
return __max;
}
else
{
__x /= __max;
__y /= __max;
__z /= __max;
return std::sqrt(__x*__x + __y*__y + __z*__z) * __max;
}
}
}   //  __detail


template <typename _Tp>
inline _GLIBCXX_CONSTEXPR typename
enable_if::value, _Tp>::type  __hypot3(const _Tp __x,
const _Tp __y, const _Tp __z)   noexcept
{
return (GCC_UNLIKELY(__detail::__isinf3(__x, __y, __z))) ?
numeric_limits<_Tp>::infinity() : __detail::__hypot3(__x, __y, __z);
}

#undef GCC_UNLIKELY
#undef GCC_LIKELY

How does it work?
* Basically, I first pull out the special case INFINITY (see
https://en.cppreference.com/w/cpp/numeric/math/hypot).
* As an additional safety measure (to prevent misuse), the functions are
constrained with enable_if.

constexpr
* The hypot3 functions can therefore be declared _GLIBCXX_CONSTEXPR.

Questions
* To get better runtime behavior, I define GCC_(UN)LIKELY. Do such macros
already exist (and I have overlooked them)?
* The functions are noexcept. Does that make sense? If so, why are the math
functions not noexcept?

thx
Gero

[Bug libquadmath/114140] different results for std::fmin/std::fmax and quadmath fminq/fmaxq if one argument=signaling_NaN

2024-02-27 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114140

--- Comment #13 from g.peterh...@t-online.de ---
> The cppreference page is wrong.
But then *all* of your implementations for fmin/fmax (float, double, long
double, std::floatN_t) would be wrong, because they give exactly the results as
described on cppreference.
Is this really the case (which I don't believe)? And if so, that still doesn't
solve the original problem: std::math-functions and quadmath-functions *must*
of course return the same results - no matter which implementation is correct.

[Bug middle-end/114140] different results for std::fmin/std::fmax and quadmath fminq/fmaxq if one argument=signaling_NaN

2024-02-27 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114140

--- Comment #7 from g.peterh...@t-online.de ---
I think there is a misunderstanding. The problem is that std::fmin/std::fmax
and quadmath fminq/fmaxq give different results when only *one* argument is
signaling_NaN.
The standard (https://en.cppreference.com/w/cpp/numeric/math/fmin +
https://en.cppreference.com/w/cpp/numeric/math/fmax) says:
* If one of the two arguments is NaN, the value of the other argument is
returned
* Only if both arguments are NaN, NaN is returned

quadmath fminq/fmaxq also return NaN if only *one* argument is signaling_NaN.
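The quiet-NaN part of the rule quoted above can be checked for double with a small sketch. fmin_fmax_quiet_nan_rule_holds is a hypothetical name. The sketch deliberately leaves out signaling NaNs, because their treatment is exactly what this report disputes.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical check: with a quiet NaN, fmin/fmax return the other argument,
// and return NaN only if both arguments are NaN. Signaling NaNs are not tested.
bool fmin_fmax_quiet_nan_rule_holds()
{
    const double qnan = std::numeric_limits<double>::quiet_NaN();
    return std::fmin(qnan, 1.0) == 1.0
        && std::fmin(1.0, qnan) == 1.0
        && std::fmax(qnan, 2.0) == 2.0
        && std::fmax(2.0, qnan) == 2.0
        && std::isnan(std::fmin(qnan, qnan))
        && std::isnan(std::fmax(qnan, qnan));
}
```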

[Bug target/50597] printf_fp.o: relocation R_X86_64_PC32 against `hack_digit.6607' can not be used when making a shared object; recompile with -fPIC

2024-02-27 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50597

g.peterh...@t-online.de changed:

   What|Removed |Added

 CC||g.peterh...@t-online.de

--- Comment #2 from g.peterh...@t-online.de ---
I think there is a misunderstanding. The problem is that std::fmin/std::fmax
and quadmath fminq/fmaxq give different results when only *one* argument is
signaling_NaN.
The standard (https://en.cppreference.com/w/cpp/numeric/math/fmin +
https://en.cppreference.com/w/cpp/numeric/math/fmax) says:
* If one of the two arguments is NaN, the value of the other argument is
returned
* Only if both arguments are NaN, NaN is returned

quadmath fminq/fmaxq also return NaN if only *one* argument is signaling_NaN.

thx
Gero

[Bug libquadmath/114140] New: quadmath fminq/fmaxq with signaling_NaN not work

2024-02-27 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114140

Bug ID: 114140
   Summary: quadmath fminq/fmaxq with signaling_NaN not work
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libquadmath
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

please see https://godbolt.org/z/T4W8Mejxz

Notes:
* std::numeric_limits<__float128> (from boost) does not work properly, so I
fall back to builtins.
* std::fmin/fmax for __float128 calls fminq/fmaxq (boost, this works)

thx
Gero

[Bug libgcc/114131] New: std::isinf(std::float128_t) generates superfluous nan-checks

2024-02-27 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114131

Bug ID: 114131
   Summary: std::isinf(std::float128_t) generates superfluous
nan-checks
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgcc
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

please see https://godbolt.org/z/djc9q1vcv
test1 (default): includes NaN checks (__unordtf2)
test2: no NaN checks, but calls __eqtf2
test3: only checks for inf (via bit_cast); no additional function calls and
branch-free. Of course, this only works if (unsigned) __int128 is available.

thx
Gero

[Bug libstdc++/114018] New: std::nexttoward is not implemented for C++23-FP-Types

2024-02-20 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114018

Bug ID: 114018
   Summary: std::nexttoward is not implemented for C++23-FP-Types
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

please see https://godbolt.org/z/EoKnEE8eT

thx
Gero

[Bug libstdc++/113260] missing from_chars/to_chars for __float128

2024-01-07 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113260

--- Comment #7 from g.peterh...@t-online.de ---
Thank you. My question was whether these two functions could be added.
At the moment I'm using boost.charconv https://github.com/cppalliance/charconv
https://develop.charconv.cpp.al (not official yet), but it is still riddled
with bugs.

[Bug libstdc++/113260] missing from_chars/to_chars for __float128

2024-01-07 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113260

--- Comment #5 from g.peterh...@t-online.de ---
??? I asked for std::from_chars/std::to_chars - which of course doesn't work:
https://godbolt.org/z/n34dTajoc

[Bug libquadmath/113259] quadmath::nanq not support payload

2024-01-07 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113259

--- Comment #2 from g.peterh...@t-online.de ---
I'm currently fiddling around with a library for/with boost. I don't need this
kind of incompatibility.

[Bug libstdc++/113260] missing from_chars/to_chars for __float128

2024-01-07 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113260

--- Comment #3 from g.peterh...@t-online.de ---
My problem is that I need from_chars/to_chars for __float128 also for older C++
standards that do not yet support _Float128/std::float128_t.

[Bug libquadmath/113260] New: missing from_chars/to_chars for __float128

2024-01-07 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113260

Bug ID: 113260
   Summary: missing from_chars/to_chars for __float128
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libquadmath
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

Hello,
can you add this?

thx
Gero

[Bug libquadmath/113259] New: quadmath::nanq not support payload

2024-01-07 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113259

Bug ID: 113259
   Summary: quadmath::nanq not support payload
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libquadmath
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

Hello,
in https://github.com/gcc-mirror/gcc/blob/master/libquadmath/math/nanq.c there
is only a comment stating that payloads are not supported, so nanq is
incompatible with the standard.
Will this be fixed?

thx
Gero

[Bug c++/109924] missing __builtin_nanf16b

2023-06-03 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109924

--- Comment #3 from g.peterh...@t-online.de ---
But your documentation
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html states that the
__builtin functions are available for all FP types.

For upcoming standards (https://en.cppreference.com/w/c/experimental/fpext1)
this is needed anyway (setpayload etc.).

thx
Gero

[Bug c++/109928] New: std::abs(long/long long) are not constexpr

2023-05-22 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109928

Bug ID: 109928
   Summary: std::abs(long/long long) are not constexpr
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

see std_abs.h

regards
Gero

[Bug c++/109924] New: missing __builtin_nanf16b

2023-05-21 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109924

Bug ID: 109924
   Summary: missing __builtin_nanf16b
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

analogous to __builtin_nansf16b

regards
Gero

[Bug c++/109884] __builtin_Xq returns _Float128 instead of __float128

2023-05-17 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109884

--- Comment #3 from g.peterh...@t-online.de ---
But these are different types (even if they are mathematically/behaviorally
equivalent)
std::is_same_v --> false

[Bug c++/109884] New: __builtin_Xq returns _Float128 instead of __float128

2023-05-17 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109884

Bug ID: 109884
   Summary: __builtin_Xq returns _Float128 instead of __float128
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

#include 
#include 
#include 
#include 
#include 

template <typename Type>
inline std::string nameof()
{
 return boost::core::demangle(typeid(Type).name());
}

int main()
{
 std::cout << nameof() << std::endl;
 std::cout << nameof() << std::endl;
 std::cout << nameof() << std::endl;
}

Compiled with GCC 13, this prints the incorrect type
_Float128
_Float128
_Float128
Compiled with GCC 12 or older, it prints the correct type
__float128
__float128
__float128

regards
Gero

[Bug libstdc++/109758] std::abs(__float128) doesn't support NaN

2023-05-06 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109758

--- Comment #7 from g.peterh...@t-online.de ---
1) Could you please submit a proposal to the ISO C++ committee so that abs
(in addition to copysign/signbit) ALWAYS works?
2) What do you think about my proposal for a C++ interface header quadmath.hpp?

[Bug libstdc++/109758] std::abs(__float128) doesn't support NaN

2023-05-06 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109758

--- Comment #5 from g.peterh...@t-online.de ---
>> Again, what do you mean by "quadmath"?

__float128 https://github.com/gcc-mirror/gcc/tree/master/libquadmath
This is not to be confused with C++23 std::float128_t.

[Bug libstdc++/109758] std::abs(__float128) doesn't support NaN

2023-05-06 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109758

--- Comment #3 from g.peterh...@t-online.de ---
>> libstdc++ doesn't depend on libquadmath and the __float128 support is there 
>> very limited.
Yes, exactly. There should be nothing from quadmath in the standard C/C++
library implementations, but bits/std_abs.h does contain such code.

>> Use std::float128_t instead (in GCC 13.1)?
std::float128_t can only be used from C++23 on, but quadmath can also be used
with older standards/compiler versions.

[Bug libquadmath/109758] New: quadmath abs

2023-05-06 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109758

Bug ID: 109758
   Summary: quadmath abs
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libquadmath
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

Hello gcc-team,
Problem:
#include 
#include 
#include 
#include 

using T = __float128;

int main()
{
const T 
neg_nan_v = -std::numeric_limits<T>::quiet_NaN();

std::cout << neg_nan_v << std::endl;

std::cout << "std::abs " << std::abs(neg_nan_v) << std::endl;
std::cout << "fabsq " << fabsq(neg_nan_v) << std::endl;
std::cout << "builtin " << __builtin_fabsf128(neg_nan_v) << std::endl;
}

-nan
std::abs -nan
fabsq nan
builtin nan

The problem can be found in bits/std_abs.h:
#if !defined(__STRICT_ANSI__) && defined(_GLIBCXX_USE_FLOAT128)
  __extension__ inline _GLIBCXX_CONSTEXPR
  __float128
  abs(__float128 __x)
  {
   return __x < 0 ? -__x : __x;
  }
#endif

Is this actually correct? If I compile with -U__STRICT_ANSI__ or remove/comment
out abs in bits/std_abs.h, abs falls back to fabsq, which then works.
With std::abs(float/double/...) this problem does not occur.
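The same effect can be reproduced for double with a sketch of the comparison-based abs shown above. For -NaN the comparison x < 0 is false, so x is returned with its sign bit still set. std::fabs and std::copysign(x, 1.0) clear the sign bit unconditionally. abs_by_compare and abs_by_copysign are hypothetical names. The sketch assumes default floating-point flags (no -ffast-math), under which the compiler is not expected to replace the comparison with fabs.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Same shape as the __float128 overload in bits/std_abs.h, but for double.
// For x = -NaN, x < 0 is false, so x is returned with the sign bit still set.
double abs_by_compare(double x)
{
    return x < 0 ? -x : x;
}

// Hypothetical alternative: clear the sign bit unconditionally.
double abs_by_copysign(double x)
{
    return std::copysign(x, 1.0);
}
```

The comparison form also keeps the sign of -0.0: abs_by_compare(-0.0) returns -0.0, because -0.0 < 0 is false as well.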

Wouldn't it make sense in principle to also provide a C++ header
(quadmath.hpp)?
#include 
namespace std
{
math-functions
to_string/to_wstring
to_chars/from_chars
operator<<
operator>>
...
}

thx
Gero

[Bug c++/109378] new builtin like __builtin_sqrt but does not set errno

2023-04-02 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109378

g.peterh...@t-online.de changed:

   What|Removed |Added

 Resolution|INVALID |FIXED

--- Comment #11 from g.peterh...@t-online.de ---
Ok, in detail:
std::sqrt/__builtin_sqrt performs the check for NaN in the calling context.
This causes the following problems:
* the calling context contains error handling/conditional jumps that do not
belong there and should instead be handled inside std::sqrt's own error handling
* Because this does NOT happen inside your implementation of std::sqrt, the code
gets bloated, especially once a function contains more than one std::sqrt.

Therefore
* do the complete error handling inside std::sqrt/__builtin_sqrt
* so that there is only a single call to std::sqrt, which can/must be vectorized.

[Bug c++/109378] new builtin like __builtin_sqrt but does not set errno

2023-04-02 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109378

--- Comment #8 from g.peterh...@t-online.de ---
But I neither want to nor can use a version of std::sqrt that requires
compiler-specific flags/options/__builtins and injects internals of
std::sqrt/__builtin_sqrt into the calling context/function.
I just want a very dumb std::sqrt that does its error handling
internally.
Sorry, but is that too much to ask?

[Bug c++/109378] improve __builtin_sqrt

2023-04-02 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109378

--- Comment #4 from g.peterh...@t-online.de ---
Hm. Maybe we misunderstood each other, or I don't understand. I don't want to
set -fno-math-errno or any other compiler-specific flag. My intention is that
__builtin_sqrt doesn't "contaminate" the calling context with its own
internals, but simply returns the result.

[Bug c++/109378] improve __builtin_sqrt

2023-04-02 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109378

--- Comment #2 from g.peterh...@t-online.de ---
But this is of no use if I want to compile something "normally", without
compiler-specific options.

[Bug c++/109379] New: improve __builtin_fmal

2023-04-02 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109379

Bug ID: 109379
   Summary: improve __builtin_fmal
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

Hello gcc team,
__builtin_fmal generates quite a lot of overhead. Can you please optimize this
or make it an inline function?

thx
Gero

[Bug c++/109378] New: improve __builtin_sqrt

2023-04-02 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109378

Bug ID: 109378
   Summary: improve __builtin_sqrt
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

Hello gcc team,
https://godbolt.org/z/Wa1rfxrPo
When I write a function that contains std::sqrt, it always contains the NaN(?)
tests for the argument, e.g. for sqrtf64.
If I use my_sqrt, the tests are done inside my_sqrt and not in the calling
function, as expected (because it is noinline).

Wouldn't it be better to rewrite __builtin_sqrt so that these tests are done
inside __builtin_sqrt and not already in the calling context?
This would have the advantage that std::sqrt would not "contaminate" the
calling function with conditional jumps and thus inflate it.
foo vs. bar illustrates this.
And of course __builtin_sqrt must still be automatically vectorizable and
must be inlined in certain contexts (e.g. __FAST_MATH__).
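Here is a minimal sketch of the noinline workaround described above. The real my_sqrt exists only behind the godbolt link, so this reconstruction (and the caller name sum_of_roots) is hypothetical. Because the wrapper is never inlined, whatever argument checks GCC emits for std::sqrt end up once inside my_sqrt rather than in every caller. The price is a real call, which generally prevents vectorization. This is therefore not a substitute for the requested builtin behaviour.

```cpp
#include <cmath>

// Hypothetical reconstruction: a wrapper that GCC is told never to inline,
// so the NaN/errno handling for std::sqrt stays inside this function.
[[gnu::noinline]] double my_sqrt(double x) noexcept
{
    return std::sqrt(x);
}

// Hypothetical caller with two square roots: each is a plain call to my_sqrt,
// with no sqrt-specific branches in this function's own body.
double sum_of_roots(double a, double b) noexcept
{
    return my_sqrt(a) + my_sqrt(b);
}
```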

regards
Gero

[Bug c++/109029] std::signbit(double) generates very inefficient code

2023-03-05 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109029

--- Comment #1 from g.peterh...@t-online.de ---
Ok in english
std::signbit(double) generates very inefficient code and thus cannot be
vectorized (https://godbolt.org/z/se6Ea8bo9).

[Bug c++/109029] New: std::signbit(double) generates very inefficient code

2023-03-05 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109029

Bug ID: 109029
   Summary: std::signbit(double) generates very inefficient code
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

Hello,
std::signbit(double) generates very inefficient code and therefore cannot be
vectorized (https://godbolt.org/z/se6Ea8bo9).

thx
Gero

-std=c++20 -march=x86-64-v3 -O3 -mno-vzeroupper

#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <numbers>

static constexpr std::size_t Size = 1024;

using float80_t = long double;
using float64_t = double;
using float32_t = float;

template <typename Type>
inline constexpr bool   foo(const Type x)   noexcept
{
    return std::signbit(x);
}

template <typename Type>
inline constexpr Type   bar(const Type x)   noexcept
{
    return std::signbit(x) ? std::numbers::pi_v<Type> : 0;
}

template <typename Container, typename Function>
inline constexpr void for_all(Container& cnt, Function&& f) noexcept
{
    std::transform(cnt.begin(), cnt.end(), cnt.begin(), f);
}

template <typename ContainerRes, typename ContainerArg, typename Function>
inline constexpr void for_all(ContainerRes& res, const ContainerArg& arg, Function&& f) noexcept
{
    std::transform(arg.begin(), arg.end(), res.begin(), f);
}

float64_t foo64(const float64_t x)   noexcept { return foo(x); }
float32_t foo32(const float32_t x)   noexcept { return foo(x); }

float64_t bar64(const float64_t x)   noexcept { return bar(x); }
float32_t bar32(const float32_t x)   noexcept { return bar(x); }

void foos64(std::array<bool, Size>& res, const std::array<float64_t, Size>& arg)   noexcept { for_all(res, arg, foo<float64_t>); }
void foos32(std::array<bool, Size>& res, const std::array<float32_t, Size>& arg)   noexcept { for_all(res, arg, foo<float32_t>); }

void bars64(std::array<float64_t, Size>& cnt)   noexcept { for_all(cnt, bar<float64_t>); }
void bars32(std::array<float32_t, Size>& cnt)   noexcept { for_all(cnt, bar<float32_t>); }

[Bug target/109028] fcmov will not be generated

2023-03-05 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109028

--- Comment #2 from g.peterh...@t-online.de ---
> X87 code generation is definitely not as optimized as other code really.
Ok
> Also fcmov is newish.
New?
fcmov was introduced with the Pentium Pro (1995) - that's 27 years ago. :-)

[Bug target/109028] New: fcmov will not be generated

2023-03-05 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109028

Bug ID: 109028
   Summary: fcmov will not be generated
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

Hello,
fcmov instructions are generated only very rarely (https://godbolt.org/z/qE6f76Gda)

thx
Gero

#include <array>
#include <cmath>
#include <cstddef>
#include <numbers>

static constexpr std::size_t Size = 1024;

using float80_t = long double;
using float64_t = double;
using float32_t = float;

template <typename Type>
inline constexpr Type   foo(const Type x)   noexcept
{
    return (x > 42) ? std::numbers::pi_v<Type> : std::numbers::e_v<Type>;
}

template <typename Type>
inline constexpr Type   bar(const Type x)   noexcept
{
    return std::signbit(x) ? std::numbers::pi_v<Type> : 0;
}

template <typename Type>
inline constexpr Type   baz(const Type x)   noexcept
{
    return std::copysign(std::numbers::pi_v<Type>, x);
}

template <typename Container, typename Function>
inline constexpr void for_all(Container& cnt, Function&& f) noexcept
{
    for (auto& val : cnt)
    {
        val = f(val);
    }
}

float80_t foo80(const float80_t x)   noexcept { return foo(x); }
float80_t bar80(const float80_t x)   noexcept { return bar(x); }
float80_t baz80(const float80_t x)   noexcept { return baz(x); }

void foos80(std::array<float80_t, Size>& cnt)   noexcept { for_all(cnt, foo<float80_t>); }
void bars80(std::array<float80_t, Size>& cnt)   noexcept { for_all(cnt, bar<float80_t>); }
void bazs80(std::array<float80_t, Size>& cnt)   noexcept { for_all(cnt, baz<float80_t>); }

[Bug target/108902] Conversions std::float16_t<->float with FP16C are not vectorized

2023-02-23 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108902

--- Comment #5 from g.peterh...@t-online.de ---
Added a test case (https://godbolt.org/z/q65cWKhWx)

void inc_builtin(array_t& arr)noexcept
{
auto load_cvt = [](const std::float16_t*const ptr) noexcept
{
return __builtin_convertvector(*((const __m128h*const)ptr), __m256);
};

auto save_cvt = [](std::float16_t* ptr, const __m256 arg)noexcept
{
*((__m128h*)ptr) = __builtin_convertvector(arg, __m128h);
};

for (std::size_t i=0; i

[Bug c++/108902] New: Conversions std::float16_t<->float with FP16C are not vectorized

2023-02-23 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108902

Bug ID: 108902
   Summary: Conversions std::float16_t<->float with FP16C are not
vectorized
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

Please see
https://godbolt.org/z/dGn4qhPef

thx
Gero

[Bug c++/107458] New: std::fma generates slow scalar-call

2022-10-29 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107458

Bug ID: 107458
   Summary: std::fma generates slow scalar-call
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

Please see https://godbolt.org/z/bxxc9ezeM

thx
Gero

[Bug target/107432] __builtin_convertvector generates inefficient code

2022-10-27 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432

--- Comment #2 from g.peterh...@t-online.de ---
Another example. I want to convert an array to array.
There are basically 3 options:
- Copy
- Test (b2f64_default)
- optimized version (b2f64_manually)

gcc 12.2 + gcc trunk
convertSIZE_copy only generates scalar code (_mm_cvtsi64_sd)
convertSIZE_default always generates conditional jumps

convertSIZE_manually
gcc trunk always generates branch-free scalar code
gcc 12.2
convert1024_manually generates vector code, but does not use the HW conversion
int8->int64 (_mm(256)_cvtepi8_epi64) and instead converts int8->int16->int32->int64
manually
convert8_manually generates branch-free scalar code
convert4_manually generates vector code and uses the HW conversion int8->int64


NONE of these conversions is transformed/optimized to the point that, in all
cases,
- all available intrinsics are used
- no "normal" (general-purpose) registers are used
- branch-free code is generated

https://godbolt.org/z/f74vK79of

thx
Gero

[Bug c++/107432] New: __builtin_convertvector generates inefficient code

2022-10-27 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432

Bug ID: 107432
   Summary: __builtin_convertvector generates inefficient code
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

Example: conversion int64_t -> int32_t

avx512f + avx512vl
HW conversions are available.

avx2
There is a correctly working 32-bit-permutation
(_mm256_permutevar8x32_epi32/vpermd) that can be used.

I have not (yet) evaluated whether other conversions (larger int -> smaller
int) are also affected.
PS: On x86 it's already hell to optimize all cases depending on the instruction
set.
PPS: What about -march=znver4 ?

https://godbolt.org/z/3s79bnh7v
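For readers without the godbolt link, here is a minimal sketch of the kind of conversion in question, written with GCC's vector extensions. The type and function names are hypothetical. __builtin_convertvector converts element-wise like a C cast, so for int64_t -> int32_t each lane keeps its low 32 bits. AVX-512F has a dedicated narrowing instruction for this (vpmovqd). On AVX2, the vpermd shuffle mentioned above could select the low halves instead.

```cpp
// GCC vector-extension types (names are hypothetical):
// four signed 64-bit lanes and four signed 32-bit lanes.
typedef long long v4i64 __attribute__((vector_size(32)));
typedef int       v4i32 __attribute__((vector_size(16)));

// Element-wise int64 -> int32 narrowing; each lane keeps its low 32 bits.
// Taken by const reference to avoid ABI warnings for 32-byte vectors
// when compiling without AVX.
v4i32 narrow(const v4i64& v) noexcept
{
    return __builtin_convertvector(v, v4i32);
}
```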

thx
Gero

[Bug tree-optimization/107283] conversions u/int64_t to float64/32_t are not vectorized

2022-10-16 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107283

--- Comment #2 from g.peterh...@t-online.de ---
That is probably right. I reported something similar many years ago, but it
was not fixed.

thx
Gero

[Bug c++/107283] New: conversions u/int64_t to float64/32_t are not vectorized

2022-10-16 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107283

Bug ID: 107283
   Summary: conversions u/int64_t to float64/32_t are not
vectorized
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

The conversions u/int64_t to float64/32_t are not vectorized if no HW support
(e.g. AVX512) is available.

But we can do that manually
https://stackoverflow.com/questions/41144668/how-to-efficiently-perform-double-int64-conversions-with-sse-avx

In the case u/int64_t -> float32_t, I first convert to float64_t and then to
float32_t. There might be a better way to implement this.

With HW support, the standard implementation is of course faster.
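Here is one caveat with the detour through float64_t, shown with a hand-picked value. This is a minimal sketch: the function names are hypothetical, and it assumes IEEE-754 types and round-to-nearest-even. Rounding twice can give a different result than rounding once. The value x = 2^60 + 2^36 + 1 first rounds to 2^60 + 2^36 in double. That lies exactly halfway between two neighbouring floats, so ties-to-even then gives 2^60. A direct conversion still sees the extra +1 and rounds up to 2^60 + 2^37.

```cpp
#include <cstdint>

// One rounding step: uint64 -> float directly.
float direct(std::uint64_t x) noexcept
{
    return static_cast<float>(x);
}

// Two rounding steps: uint64 -> double -> float (the detour described above).
float via_double(std::uint64_t x) noexcept
{
    return static_cast<float>(static_cast<double>(x));
}
```

A faster replacement for the two-step path would need to handle such halfway cases if it is meant to match a direct conversion exactly.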

https://godbolt.org/z/WTa663PrK

thx
Gero

[Bug c++/107281] New: comparisons with u/int64_t constants do not generate vector results

2022-10-16 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107281

Bug ID: 107281
   Summary: comparisons with u/int64_t constants do not generate
vector results
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

If no 64-bit vector comparisons are available, no vectorized results are
produced for the cases <=, >=, < and >.

The cases == and != work. The comparisons themselves are then carried out
individually, but the results are combined with unpcklqdq.

It would be better if this worked for all comparisons, so that code can be
(auto)vectorized better.

It might be possible to optimize this further so that no scalar comparisons
are necessary - especially for the frequent case constant=0.

https://godbolt.org/z/cj8n9TenK
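To make the semantics concrete, here is a minimal sketch with GCC's vector extensions. The type and function names are hypothetical. A lane-wise comparison yields -1 (all bits set) or 0 per lane. For the frequent constant-0 case, the signed mask for x < 0 is just the sign bit broadcast across the lane. That is an arithmetic right shift by 63, so in principle no comparison is needed. Whether this is cheaper depends on the target: SSE2 has no 64-bit arithmetic shift, while AVX-512 has vpsraq.

```cpp
#include <cstdint>

// GCC vector-extension type (name is hypothetical): two signed 64-bit lanes.
typedef std::int64_t v2i64 __attribute__((vector_size(16)));

// General signed comparison: each lane becomes -1 (true) or 0 (false).
v2i64 greater(v2i64 a, v2i64 b) noexcept
{
    return a > b;
}

// Special case "x < 0": broadcast the sign bit with an arithmetic shift,
// giving the same -1/0 mask without any comparison.
v2i64 less_than_zero(v2i64 x) noexcept
{
    return x >> 63;
}
```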

thx
Gero

[Bug libquadmath/104695] different bit patterns in __builtin_nans and libquadmath::nanq

2022-02-25 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104695

--- Comment #2 from g.peterh...@t-online.de ---
Yes, that is very vaguely worded. However, the std functions or builtins must
always return the same values on the same platform.
quiet nan:
libquadmath::nanq != __builtin_nanf128
signaling nan:
__builtin_nansf64x != __builtin_nansl
__builtin_nansf64 != __builtin_nans
__builtin_nansf32 != __builtin_nansf

[Bug c++/104695] New: different bit patterns in __builtin_nans and libquadmath::nanq

2022-02-25 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104695

Bug ID: 104695
   Summary: different bit patterns in __builtin_nans and
libquadmath::nanq
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

Hello gcc team,
the related __builtin_nans* variants return different values, and
libquadmath::nanq ignores its parameter.
Please see my test case https://godbolt.org/z/fda5vevPe

regards
Gero

[Bug target/100627] missing optimization

2021-05-21 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100627

--- Comment #2 from g.peterh...@t-online.de ---
Hello,
I found a better solution here
https://stackoverflow.com/questions/41144668/how-to-efficiently-perform-double-int64-conversions-with-sse-avx
and ported it to "normal" C++ code (no intrinsics)
https://godbolt.org/z/scjEdze99. This has these advantages:
- constexpr
- flexible - can be vectorized (autovectorization)

These implementations require C++20 (std::bit_cast and constexpr std::exp2),
but can easily be implemented with older C++ versions. Possibly this trick can
also be used for s/uint64 -> float32, which would avoid the detour s/uint64 ->
float64 -> float32.

However, I have noticed:
- with -march=skylake-avx512 no AVX512 code is generated
- only with -march=skylake-avx512 -mprefer-vector-width=512 or -mavx512f
-mavx512dq -mavx512vl does that work
- for s/uint64 -> float32 no correct AVX512 code is generated either
(_mm512_cvtepi64_ps, _mm512_cvtepu64_ps)

thx
Gero

[Bug tree-optimization/100627] New: missing optimization

2021-05-16 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100627

Bug ID: 100627
   Summary: missing optimization
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

Hello gcc team,
I think I wrote something like this a long time ago, but I'm not sure. I think
the standard conversion uint64_t -> float/double is inefficient when AVX512 is
not available. At least on x86; with SVE or other CPUs this may not be the
case. Problems:
- a lot of conditional jumps are generated, which is not BPU-friendly
- and therefore not branch-free
- larger code size
I briefly implemented a few conversions for SSE/SSE2
(https://godbolt.org/z/n63WedKT9). Advantages:
- branch-free
- mostly smaller code size
- faster
Wouldn't it make sense to implement the standard conversion this way
(including for AVX/AVX2)?

thx
Gero

[Bug c++/100171] New: autovectorizer

2021-04-20 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100171

Bug ID: 100171
   Summary: autovectorizer
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

Hello gcc team,
I wrote a small test case to show the problems with the autovectorizer
https://godbolt.org/z/xs35P45MM . In particular, the += operator is not
vectorized, while the + operator works in the same context. I do not
understand why.
If you decrease the array size in foo from 2 to 1, it doesn't work at all
anymore - scalar operations are always generated for ARR_2x.
In general, my experience is that the autovectorizer kicks in much too late.
It should always vectorize from 2 values onward, even if these are much
smaller than a simd register. This also saves a lot of memory accesses -
especially when the data is contiguous in memory (as in the example). Usually,
however, vectorization is only carried out when the data is at least as large
as a simd register, and often only when it is twice or even four times as
large.
I think you should urgently update/optimize the autovectorizer.
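Here is a hypothetical test case of the kind described above (the actual one is only behind the godbolt link). It defines a two-lane double type whose + and += are both written as plain element-wise loops, independently of each other. Compiling it with -O3 and comparing the code for the two operators is a quick way to check whether GCC emits a single packed add for each. The outcome is not asserted here.

```cpp
#include <array>
#include <cstddef>

// Hypothetical two-lane type: both operators are plain element-wise loops.
struct vec2
{
    std::array<double, 2> v{};

    vec2& operator+=(const vec2& rhs) noexcept
    {
        for (std::size_t i = 0; i < v.size(); ++i)
            v[i] += rhs.v[i];
        return *this;
    }

    friend vec2 operator+(const vec2& a, const vec2& b) noexcept
    {
        vec2 r;
        for (std::size_t i = 0; i < r.v.size(); ++i)
            r.v[i] = a.v[i] + b.v[i];
        return r;
    }
};
```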

thx & regards
Gero

[Bug c++/99841] (temporary) refs

2021-03-30 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99841

--- Comment #2 from g.peterh...@t-online.de ---
That is not the problem. I only added "using type = ..." and type(x) in the
constructor calls so that I can test different types. You can remove them -
it makes no difference.

[Bug c++/99841] New: (temporary) refs

2021-03-30 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99841

Bug ID: 99841
   Summary: (temporary) refs
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

please see https://godbolt.org/z/Ez1K7eofr
gcc gives different (wrong?) results than clang/icc. With -O0, or without any
-O option, it gives the same results.

[Bug target/99228] blend/shuffle

2021-03-02 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99228

--- Comment #5 from g.peterh...@t-online.de ---
Here is a better test case. https://godbolt.org/z/3Gq783
I've found:

sgn_complex
- always inefficient code, TYPE and SIZE do not matter, even with -Ofast or
-ffast-math

for TYPE=double
SIZE=1
- abs/mul/div/pow2_complex ok
- zero_complex not vectorized, also with -Ofast or -ffast-math

SIZE=2
- abs/mul/div/pow2/zero_complex only with scalar operations, never vectorized

SIZE=4 and larger
- abs/mul/div/pow2/zero_complex ok


for TYPE=float
SIZE=1
- abs/mul/pow2_complex ok
- div/zero_complex not vectorized, also with -Ofast or -ffast-math

SIZE=2
- abs/mul/div/pow2/zero_complex only with scalar operations, never vectorized

SIZE=4
- abs/pow2/zero_complex ok
- mul_complex inefficient, xmm instead of ymm, also with -Ofast or -ffast-math
- div_complex ok with O3, but with Ofast/fast-math only xmm instead of ymm

SIZE=8 and larger
- abs/mul/div/pow2_complex ok

[Bug target/99228] blend/shuffle

2021-02-23 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99228

--- Comment #2 from g.peterh...@t-online.de ---
I only use boost's types here. You can remove boost and use:
using float80_t = long double;
using float64_t = double;
using float32_t = float;

[Bug c++/99228] New: blend/shuffle

2021-02-23 Thread g.peterhoff--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99228

Bug ID: 99228
   Summary: blend/shuffle
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: g.peterh...@t-online.de
  Target Milestone: ---

Hello gcc team,
the compiler generates very inefficient code for the sgn functions (scalar and
complex arguments)
https://godbolt.org/z/zvE3Mf

scalar
- float32/64: 2 conditional jumps instead of blend/shuffle
- float80: no fcmov
- integer: only cmov instead of blend/shuffle

complex
- float32/64: 4 conditional jumps instead of blend/shuffle
- float80: no fcmov
- integer: only cmov instead of blend/shuffle

For testing I have 3 versions each:
v1: total disaster
v2: better, only half of the jumps each time, but clang can't really handle
that
v3: like v2, but clang seems to handle it too. If you remove [[likely]] from
conditional_move, it behaves like v1.
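For comparison, here is the usual branch-free formulation of a scalar sign function, as a minimal sketch. The name sgn is assumed here and may differ from the test case. The two comparisons yield 0/1 values, and their difference is -1, 0 or 1, so there is no explicit branch in the source. With this formulation, NaN yields 0. Whether GCC then emits blends or cmov instead of jumps would need to be checked on godbolt.

```cpp
// Branch-free sign: (0 < x) and (x < 0) are bool (0 or 1), so their
// difference is 1 for positive, -1 for negative and 0 for zero (and NaN).
template <typename T>
constexpr T sgn(T x) noexcept
{
    return static_cast<T>((T(0) < x) - (x < T(0)));
}
```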

regards
Gero