Fast-math is not a magic "break everything" compiler option.
For example, one of the biggest speedups fast-math brings is assuming that `(a +
b) + c` is equivalent to `a + (b + c)` (associativity), which is not true in
floating-point math due to rounding.
This is key for vectorizing reductions like sums.
In the comments to the original gist, I do use fastmath (although I acknowledge
the potential problem), and I also spent some time optimising...
But you could also use fast-math in Julia; it would be faster too. Anyway, you
shouldn't use that option in any language, because it can produce unexpected
results. Your Julia program can also be optimized.
Yes, it's to make sure that if the CPU governor is "ondemand", benchmarking
starts with the CPU running at full speed and not at ~800 MHz for the first few
iterations.
Branch prediction, CPU powersaving I guess.
@mratsim Why does Nim need warm-up? I've never heard about that.
@luked2 I see, this confirms what I suspected in my post above (which only
appeared recently due to moderation).
Exponential link-time
-d:release should use LTO whenever it's available in the underlying compiler, if
you ask me :) Any downside to making this the default behavior?
I ran it under perf ("perf record $cmd && perf report"). It looks like it's
mostly down to the maths library and Nim isn't really coming into it.
# Total Lost Samples: 3
#
# Samples: 130 of event 'cycles:uppp'
# Event count (approx.): 136002472
#
# Overhead
Dear All,
It turns out there was something funny going on with my laptop - clear when I
looked at my C++ code - even though (apparently) the power manager was set to
performance. Here is my current Nim implementation, which uses a random number
generator that is faster for small n*p (see gist
I can cut Nim code to 0.040 by using OpenMP:
proc simulate(): float64 =
  let parms: array[5, float64] = [2.62110617498984, 0.5384615384615384,
                                  0.5, 403.0, 0.1]
  var seed: uint = 123
  var r = initXorshift128Plus(seed)
  let tf = 540
  let nsims = 1000
@mratsim As far as I know, you can do the same for Julia so it sounds like
cheating.
Same for me on macOS with an i5-5227U:
Hint: operation successful (19122 lines compiled; 0.414 sec total;
22.422MiB peakmem; Release Build) [SuccessX]
Hint: ./bin/sir [Exec]
0.060975
21.12
$ julia bin/sir.jl
0.063728 seconds (2.09 k allocations: 232.998 KiB)
I also don't see much of a difference (macOS Sierra 10.12.6 with clang-gcc):
$ nim cc --verbosity:0 --hints:off -d:release -r sir.nim
0.066046999
21.12
$ julia sir.jl
0.059344 seconds (2.09 k allocations: 233.342 KiB)
21.019
As I said before, in my benchmarks Nim is faster, both on an Intel Celeron
J1900 and a Core i7-6700k on Linux. Compiled with `nim -d:release c sir.nim`
using Nim 0.17.2 from
[https://nim-lang.org/install_unix.html](https://nim-lang.org/install_unix.html)
(tried GCC 7.2.0 and GCC 6.3.0), as
One more explanation for a factor of 3-4 in performance can of course be SIMD
instructions. Maybe the latest Julia is very good at using SIMD? Maybe you can
try clang instead of gcc to see if clang can better apply SIMD and related
parallel instructions to your code. (Or Julia may pre-compute
I compiled using the following:
nim c -d:release --passC:"-flto" sir
but it made no difference in the runtime. I also made a mistake in the above
code (now corrected), but it didn't make a difference either.
Hi @stefan_salewski
My box is 64 bit (Intel Core i7-5500U CPU @ 2.40GHz x 4), with gcc 5.4.0. AFAIK
default optimisation is O3; how does one check via Nim?
Note I added some advice to my previous post!
You have to tell us if your box is 32 or 64 bit. The size of the data (4 or 8
bytes) can make a difference.
And maybe tell us the gcc version and gcc's optimization level. Is it the
default -O3?
And you may check whether your random() proc is inlined. Recently we had a case
where a plain proc from the std lib was not inlined.
Hello, I played a bit with it (config: i5-2675QM, 64-bit Linux, gcc 5.4.0,
Nim 0.17.2, Julia 0.6.1), and I also found that, with the `-d:release` flag,
the Nim version is 3-4x slower than the Julia one.
However, with different flags to the C compiler, the results varied
Hi all,
A few things:
1. Yes, @def @miran, I compiled with -d:release.
2. I'm using Julia 0.6.0 vs Nim 0.17.2 on Linux, and I'm getting way better
times on Julia.
3. Apologies for the cast! I didn't like it, but didn't know that typing a
single member works (BTW, this works)
And can you please try to avoid the ugly cast:
# var u: array[4, int64] = cast[array[4, int64]]([60, 1, 342, 0])
var u: array[4, int64] = [60.int64, 1, 342, 0]  # should work -- if not tell
Araq
Additionally: 32bit ints might be faster than 64, Julia might opt to use them
by default while Nim uses 64bit ints on x86-64 by default.
But locally, compiling Nim with `-d:release`, I see Nim being slightly faster
than Julia: 0.22 s instead of 0.28 s.
A wild guess: have you used the -d:release flag when compiling?
Dear All,
I'm trying to compare a simple discrete-time simulation in Nim and Julia; I've
put a gist
[here](https://gist.github.com/sdwfrost/7c660322c6c33961297a826df4cbc30d). I
wrote the code to be as similar as possible between the two languages, but Nim
is about 3-4x slower than my Julia version.