Faster Euclidean algorithm

2024-08-26 Thread demotomohiro
Thank you for suggesting new GCD and sharing benchmark results from two compilers! `gcd_impulse` is the fastest with Clang, and the slowest with GCC, looks strange. So I compared generated assembly code. * Clang version 18.1.8 * GCC version 13.3.1 `var u = u shr u.countTrailingZeroBits()

Faster Euclidean algorithm

2024-08-23 Thread mratsim
This implementation I made in is faster on my machine func gcd_impulse(u, v: int): int = # From https://github.com/SciNim/impulse/blob/49b813232507470a04772

Faster Euclidean algorithm

2024-08-23 Thread ingo
multiple runs of @mratsim version. Win10, VCC gcd in stdlib: 1540 micro second gcdLAR:1294 micro second gcdLAR2: 1794 micro second gcdLAR3: 1650 micro second gcdLAR4: 1289 micro second gcdSub:918 micro second gcdSub2: 161

Faster Euclidean algorithm

2024-08-23 Thread demotomohiro
According to following resource: > Instruction tables: Lists of instruction latencies, throughputs and > micro-operation breakdowns for Intel, AMD and VIA CPUs > CPU| Latency of idiv with 64bit register ---|--- Ivy Bridge| 28-103 Skylake

Faster Euclidean algorithm

2024-08-23 Thread demotomohiro
Thank you for your advices and running benchmark code on many CPUs with different compiler options and shared results. > Possibly harmful to result interpretation esp. cross-CPUs with things like > RPi/ARM: The minimum over 101 runs in your bench template is good to reduce > noise from CPU spin

Faster Euclidean algorithm

2024-08-23 Thread demotomohiro
Thank you for your advices and running benchmark code and shared results! I just wanted to see if LAR makes gcd in Nim's stdlib faster. If so, I would test it with negative inputs and other int types and if it still works correctly and faster, make PR that replace stdlib's gcd with the new gcd.

Faster Euclidean algorithm

2024-08-23 Thread cblake
Mostly I wrote all that to show how big an impact the "table's corner" effect you mention can be and to encourage moving away from anti-thermal sleep()s in favor of small times which require more care. Just to be clear, there are 9*9 inner loops by 500 pairs in that last batch, so 40_500 gcd pai

Faster Euclidean algorithm

2024-08-23 Thread dlesnoff
If timings are dependent from the order placement within the block of tests, I guess it reveals that timings are heavily impacted by branch prediction. Looking at these new timings, it seems though that gcdSub, while not always the optimal one, is always a good option to take and performs much b

Faster Euclidean algorithm

2024-08-22 Thread cblake
Possibly harmful to result interpretation esp. cross-CPUs with things like RPi/ARM: The minimum over `101` runs in your `bench` template is good to reduce noise from CPU spin up to higher clock rates, BUT the two `sleep()` s are bad since you are probably giving the CPU/OS time to put the CPU ba

Faster Euclidean algorithm

2024-08-22 Thread dlesnoff
Here are my timings: OS: Ubuntu 22.04.4 LTS x86_64 Kernel: 5.15.0-118-generic CPU: 11th Gen Intel i5-1135G7 (8) @ 4.200GHz (neofetch provides wrong value for CPU frequency) nim c -d:danger -d:release demotohiro_gcd_bench.nim gcd in stdlib: 2373 micro second gcdLAR: 2506 micro second gcdLAR2: 203

Faster Euclidean algorithm

2024-08-22 Thread dlesnoff
> In Nim (and most of programming languages I know), a mod b returns r such > that a = b * q + r (0 <= r < b). Not completely true. -4 mod 3 # = -1 Run Is your goal to optimize stdlib's gcd? Then you should care about negative inputs as well. Which algorithm do othe

Faster Euclidean algorithm

2024-08-22 Thread dlesnoff
The timings are faster on the Rasperry Pi 3 than on your desktop?

Faster Euclidean algorithm

2024-08-21 Thread demotomohiro
I recently found several ways to implement Euclidean algorithm. This code has 7 different gcd funcs implements Euclidean algorithm and measures the runtime speed of them. * gcd: gcd copied from math module in Nim's stdlib