On Friday, 3 November 2023 at 15:11:31 UTC, Bogdan wrote:
Can anyone help me to understand what I am missing?
Your loop is likely dominated by the sin() calls, and the rest of the
loop is too simple for hand-written SIMD to beat what the compiler
already generates.
What you could do is use the intrinsics to implement a _m
On Friday, 3 November 2023 at 15:32:08 UTC, Sergey wrote:
On Friday, 3 November 2023 at 15:11:31 UTC, Bogdan wrote:
Hi everyone,
I was playing around with the intel-intrinsics library, trying
to improve the speed of a simple area function. I could not
see any performance improvements over the
On Friday, 3 November 2023 at 15:11:31 UTC, Bogdan wrote:
Hi everyone,
I was playing around with the intel-intrinsics library, trying
to improve the speed of a simple area function. I could not see
any performance improvements over the non-SIMD implementation.
The SIMD version is a little bit
On Friday, 3 November 2023 at 15:17:43 UTC, Imperatorn wrote:
On Friday, 3 November 2023 at 15:11:31 UTC, Bogdan wrote:
Hi everyone,
I was playing around with the intel-intrinsics library, trying
to improve the speed of a simple area function. I could not
see any performance improvements over
Hi everyone,
I was playing around with the intel-intrinsics library, trying to
improve the speed of a simple area function. I could not see any
performance improvements over the non-SIMD implementation. The
SIMD version is a little bit slower even with LDC2 and -O3. Can
anyone help me to und