https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #29 from g.peterh...@t-online.de ---
(In reply to Jakub Jelinek from comment #28)
> As long as the scale is a power of two or 1.0 / power of two, I don't see
> why any version would be inaccurate.
Yes, but the constant scale_up is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #28 from Jakub Jelinek ---
As long as the scale is a power of two or 1.0 / power of two, I don't see why
any version would be inaccurate.
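The reasoning behind this remark is that, in binary floating point, multiplying by a power of two changes only the exponent, so no rounding occurs unless the result overflows or becomes subnormal. A minimal sketch of that property follows. The helper names are hypothetical and do not come from the bug report, and `double` is assumed to be IEEE 754 binary64.

```cpp
#include <cmath>

// Hypothetical helper (not from the thread): scale x up by 2^k and back down.
// Multiplying by a power of two only adjusts the exponent, so the round trip
// is exact as long as neither step overflows or lands in the subnormal range.
inline double scale_round_trip_pow2(double x, int k)
{
  const double up = std::ldexp(1.0, k);    // exactly 2^k
  const double down = std::ldexp(1.0, -k); // exactly 2^-k
  return (x * up) * down;
}

// For contrast: with an arbitrary scale, both the multiplication and the
// division may round, so the result is not guaranteed to equal x.
inline double scale_round_trip(double x, double scale)
{
  return (x * scale) / scale;
}
```

Calling `scale_round_trip_pow2(0.1, 40)` gives back `0.1` bit for bit, whereas `scale_round_trip` carries no such guarantee for a scale like `3.0`.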
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #27 from g.peterh...@t-online.de ---
Hi Matthias,
thanks for your benchmark. I still have 2 questions:
1) Accuracy
The frexp/ldexp variant seems to be the most accurate; is that correct? Then
other constants would have to be used in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #26 from g.peterh...@t-online.de ---
must of course be "... / scale".
How can I still edit posts?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #25 from g.peterh...@t-online.de ---
Hi Matthias,
to get good results on average (all FP-types: (B)FP16..FP128,
scalar/vectorized(SIMD)/parallel/...) this algorithm seems to me (so far) to be
suitable:
template
inline constexpr Type
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #24 from Matthias Kretz (Vir) ---
(In reply to g.peterhoff from comment #23)
> * How do you create the benchmarks?
https://github.com/mattkretz/simd-benchmarks
Look at hypot3.cpp :)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #23 from g.peterh...@t-online.de ---
Hello Matthias,
you've given me new ideas. I think we agree on implementing hypot3 using a
scaling factor. But the correct value is not yet implemented here either; do
you have a suggestion?
A
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #22 from Matthias Kretz (Vir) ---
I took your hypot3_scale and reduced latency and throughput. I don't think the
sqrtmax/sqrtmin limits are correct (sqrtmax² * 3 -> infinity).
TYPE Latency
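The objection to the sqrtmax limit can be illustrated with a small sketch: if all three inputs sit at sqrt(max), each square is close to max, and their sum overflows. The function names below are hypothetical, `double` stands in for whichever type is under discussion, and this is not the hypot3_scale code from the thread.

```cpp
#include <cmath>
#include <limits>

// Plain sum of squares with no scaling at all.
inline double sum_sq(double x, double y, double z)
{
  return x * x + y * y + z * z;
}

// Hypothetical name for the limit in question: sqrt of the largest finite value.
inline double limit_sqrt_max()
{
  return std::sqrt(std::numeric_limits<double>::max());
}

// True if three inputs equal to `limit` make the unscaled sum overflow.
inline bool overflows_at(double limit)
{
  return std::isinf(sum_sq(limit, limit, limit));
}
```

Here `overflows_at(limit_sqrt_max())` is true, while half that limit stays finite. A tighter bound such as the sqrt of (max finite / 3) quoted in comment #19 addresses the factor of three, though this sketch does not check its behaviour under rounding.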
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #21 from Jonathan Wakely ---
(In reply to g.peterhoff from comment #19)
> * You were probably wondering why I wrote "if (std::isinf(x) | std::isinf(y)
> | std::isinf(z))", for example. This is intentional. The problem is that gcc
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #20 from Matthias Kretz (Vir) ---
Thanks, I'd be very happy if such a relatively clear implementation could make
it!
> branchfree code is always better.
Don't say it like that. Smart branching, making use of how static
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #19 from g.peterh...@t-online.de ---
> So, no need to use frexp/ldexp, just comparisons of hi above against sqrt of
> (max finite / 3), in that case scale by multiplying all 3 args by some
> appropriate scale constant, and similarly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #18 from Jakub Jelinek ---
I was looking at the sysdeps/ieee754/ldbl-128/ version, i.e. what is used for
hypotf128.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #17 from Matthias Kretz (Vir) ---
hypotf(a, b) is implemented using double precision and hypot(a, b) uses 80-bit
long double on i386 and x86_64 hypot does what you describe, right?
std::experimental::simd benchmarks of hypot(a, b),
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
Jakub Jelinek changed:
What|Removed |Added
CC||jakub at gcc dot gnu.org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #15 from Matthias Kretz (Vir) ---
Your implementation still needs to solve:
1. Loss of precision because of division & subsequent scaling by max. Users
comparing std::hypot(x, y, z) against a simple std::sqrt(x * x + y * y + z * z)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
Jiang An changed:
What|Removed |Added
CC||de34 at live dot cn
--- Comment #14 from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #13 from g.peterh...@t-online.de ---
Thanks for the suggestions:
template<typename _Tp>
  constexpr _Tp __hypot3(_Tp __x, _Tp __y, _Tp __z) noexcept
  {
    if (std::isinf(__x) | std::isinf(__y) | std::isinf(__z))
      [[__unlikely__]]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #12 from Jonathan Wakely ---
(In reply to g.peterhoff from comment #11)
> Would this be a good implementation for hypot3 in cmath?
Thanks for the suggestion!
> #define GCC_UNLIKELY(x) __builtin_expect(x, 0)
> #define GCC_LIKELY(x)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
g.peterh...@t-online.de changed:
What|Removed |Added
CC||g.peterh...@t-online.de
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
Richard Biener changed:
What|Removed |Added
Status|NEW |ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #10 from Matthias Kretz ---
Experience from testing my simd implementation:
I had failures (2 ULP deviation from long double result) when using
auto __xx = abs(__x);
auto __yy = abs(__y);
auto __zz =
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #9 from Matthias Kretz ---
(In reply to emsr from comment #7)
> What does this do?
>
> auto __hi_exp =
> __hi & simd<_T, _Abi>(std::numeric_limits<_T>::infinity()); // no error
component-wise bitwise and of __hi and +inf. Or
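As a scalar illustration of the bitwise AND with +inf, the sketch below masks a `float` with the bit pattern of infinity. Since infinity has all exponent bits set and zero sign and mantissa bits, the result keeps only the exponent, i.e. the largest power of two not exceeding |x| for normal inputs. This is a stand-in written for this note under the assumption of IEEE 754 binary32, not the simd code being discussed, and `exponent_only` is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "sketch assumes IEEE 754 binary32");

// Scalar stand-in for "__hi & infinity": clear sign and mantissa bits,
// keep the exponent bits. Subnormal inputs yield 0, infinity stays infinity.
inline float exponent_only(float hi)
{
  const float inf = std::numeric_limits<float>::infinity();
  std::uint32_t hi_bits, inf_bits;
  std::memcpy(&hi_bits, &hi, sizeof hi_bits);
  std::memcpy(&inf_bits, &inf, sizeof inf_bits);
  const std::uint32_t masked = hi_bits & inf_bits; // exponent field only
  float result;
  std::memcpy(&result, &masked, sizeof result);
  return result;
}
```

For example, `exponent_only(5.5f)` yields `4.0f`. Because the result is a power of two, dividing by it does not disturb the mantissa, which matches the "zeroing the mantissa bits" remark in comment #3.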
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #8 from emsr at gcc dot gnu.org ---
(In reply to Matthias Kretz from comment #6)
> > How precise is hypot supposed to be? I know it is supposed to try and avoid
> > spurious overflow/underflow, but I am not convinced that it should
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #7 from emsr at gcc dot gnu.org ---
What does this do?
auto __hi_exp =
__hi & simd<_T, _Abi>(std::numeric_limits<_T>::infinity()); // no error
Sorry, I have no simd knowledge yet.
Anyway, doesn't the large scale risk overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #6 from Matthias Kretz ---
(In reply to Marc Glisse from comment #4)
> Your "reference" number seems strange. Why not do the computation with
> double (or long double or mpfr) or use __builtin_hypotf? Note that it
> changes the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #5 from emsr at gcc dot gnu.org ---
Right. fma is irrelevant.
I will wind up with sqrt(1 + __lo).
I won't hope that max * __scale == 1 here but just add 1. And why waste the
partial sort?
New patch tomorrow a.m. (I guess I'm too
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #4 from Marc Glisse ---
(In reply to Matthias Kretz from comment #3)
> Did you consider the error introduced by scaling with __amax? I made sure
> that the division is without error by zeroing the mantissa bits. Here's a
> motivating
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #3 from Matthias Kretz ---
Did you consider the error introduced by scaling with __amax? I made sure that
the division is without error by zeroing the mantissa bits. Here's a motivating
example that shows an error of 1 ulp otherwise:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #2 from emsr at gcc dot gnu.org ---
I have this in another tree which solves the inf issue:
namespace __detail
{
// Avoid including all of
  template<typename _Tp>
    constexpr _Tp
    __fmax3(_Tp __x, _Tp __y, _Tp __z)
    {
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
Matthias Kretz changed:
What|Removed |Added
CC||kretz at kde dot org
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
Jonathan Wakely changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Last reconfirmed|