Re: Finding large difference b/w execution time of c++ and D codes for same problem
Thanks a lot for your reply.
Re: Finding large difference b/w execution time of c++ and D codes for same problem
When you are comparing LDC and GDC, you should either use -mcpu=generic for ldc or -march=native for GDC, because their default targets are different. By default GDC produces code that works on most x86_64 CPUs (if you are on an x86_64 system), while LDC targets the host CPU.

But this does not explain the difference in timings you are seeing here. One reason why the code generated by GDC is slower is that squarePlusMag isn't inlined. It seems that the fact that its parameter is const is somehow preventing it from being inlined - I have no idea why. Removing const and adding -march=native to the gdc flags gives me:

gdc -O3 -finline-functions -frelease tmp.d -o tmp -march=native:
  using floats Total time: 8.283 [sec]
  using doubles Total time: 6.827 [sec]
  using reals Total time: 6.795 [sec]

ldc2 -O3 -release -singleobj tmp.d -oftmp:
  using floats Total time: 3.348 [sec]
  using doubles Total time: 3.08 [sec]
  using reals Total time: 4.174 [sec]

The difference is smaller, but still pretty large. I have noticed that there are needless conversions in this code that are slowing down both the GDC-generated and the LDC-generated code. This code is a bit faster:

module main;

import std.datetime;
import std.metastrings;
import std.stdio;
import std.typetuple;

enum DIM = 32 * 1024;

int juliaValue;

template Julia(TReal)
{
    struct ComplexStruct
    {
        TReal r;
        TReal i;

        TReal squarePlusMag(ComplexStruct another)
        {
            TReal r1 = r*r - i*i + another.r;
            TReal i1 = cast(TReal)2.0*i*r + another.i;
            r = r1;
            i = i1;
            return (r1*r1 + i1*i1);
        }
    }

    int juliaFunction( int x, int y )
    {
        auto c = ComplexStruct(0.8, 0.156);
        auto a = ComplexStruct(x, y);
        foreach (i; 0 .. 200)
            if (a.squarePlusMag(c) > cast(TReal) 1000)
                return 0;
        return 1;
    }

    void kernel()
    {
        foreach (x; 0 .. DIM) {
            foreach (y; 0 .. DIM) {
                juliaValue = juliaFunction( x, y );
            }
        }
    }
}

void main()
{
    writeln("D code serial with dimension " ~ toStringNow!DIM ~ " ...");
    StopWatch sw;
    foreach (Math; TypeTuple!(float, double, real))
    {
        sw.start();
        Julia!(Math).kernel();
        sw.stop();
        writefln(" using %ss Total time: %s [sec]", Math.stringof, (sw.peek().msecs * 0.001));
        sw.reset();
    }
}

This gives me:

gdc -O3 -finline-functions -frelease tmp.d -o tmp -march=native:
  using floats Total time: 6.746 [sec]
  using doubles Total time: 6.872 [sec]
  using reals Total time: 5.226 [sec]

ldc2 -O3 -release -singleobj tmp.d -oftmp:
  using floats Total time: 2.36 [sec]
  using doubles Total time: 2.535 [sec]
  using reals Total time: 4.106 [sec]

At least part of the difference is due to the fact that juliaFunction still isn't getting inlined (but squarePlusMag is). Making juliaFunction a static method of ComplexStruct causes it to get inlined (again, I have no idea why). Moving juliaFunction inside ComplexStruct does not affect the performance of LDC-generated code, but for GDC it gives me:

  using floats Total time: 4.262 [sec]
  using doubles Total time: 4.251 [sec]
  using reals Total time: 3.512 [sec]

There is still a large difference between LDC and GDC for floats and doubles, and I can't explain it. But at least it is much smaller than it was initially.

I ran all the benchmarks on 64-bit Linux, using a Core i5 2500K.
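The "needless conversions" point can be illustrated in C++, where the same promotion rule applies: a double literal such as 2.0 widens a float operand, so the product is computed in double and narrowed again on assignment. The following is only a sketch of that effect; the helper names imagMixed, imagTyped and checkImag are made up for illustration and appear nowhere in the thread.

```cpp
#include <cassert>
#include <type_traits>

// 2.0 is a double literal: multiplying it with a float yields a double,
// so a float-only computation picks up a widening and a narrowing conversion.
static_assert(std::is_same<decltype(2.0 * 1.0f), double>::value,
              "double literal widens the float operand");

// A float literal keeps the whole expression typed as float.
static_assert(std::is_same<decltype(2.0f * 1.0f), float>::value,
              "float literal keeps the expression in float");

// Hypothetical helper mirroring `2.0*i*r + another.i` from the thread.
template <typename T>
T imagMixed(T r, T i, T c) {
    return static_cast<T>(2.0 * i * r + c);  // at least double, then narrowed to T
}

// Hypothetical helper mirroring `cast(TReal)2.0*i*r + another.i`.
template <typename T>
T imagTyped(T r, T i, T c) {
    return static_cast<T>(2.0) * i * r + c;  // every operand already has type T
}

// Self-check with exactly representable values: 2*2*1 + 0.5 = 4.5.
inline void checkImag() {
    assert(imagMixed<float>(1.0f, 2.0f, 0.5f) == 4.5f);
    assert(imagTyped<float>(1.0f, 2.0f, 0.5f) == 4.5f);
}
```

Both helpers return the same value here; the difference is only in which conversions the compiler has to emit, which is what the cast in the D code above avoids.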
Re: Finding large difference b/w execution time of c++ and D codes for same problem
On Wed, 13 Feb 2013 18:10:47 +0100, Joseph Rushton Wakeling wrote:

> Just to update on times. I was running another large job at the same time as doing all these tests, so there was some slowdown. Current results are:
>
> -- with g++ -O3 and using double rather than float: about 4.3 s
>
> -- with clang++ -O3 and using double rather than float: about 3.1 s
>
> -- with gdmd -O -release -inline:
>
> D code serial with dimension 32768 ...
>   using floats Total time: 17.179 [sec], Julia value: 0
>   using doubles Total time: 10.298 [sec], Julia value: 0
>   using reals Total time: 17.126 [sec], Julia value: 0
>
> -- with ldmd2 -O -release -inline:
>
> D code serial with dimension 32768 ...
>   using floats Total time: 3.548 [sec], Julia value: 0
>   using doubles Total time: 2.708 [sec], Julia value: 0
>   using reals Total time: 4.371 [sec], Julia value: 0
>
> -- with dmd -O -release -inline:
>
> D code serial with dimension 32768 ...
>   using floats Total time: 15.696 [sec], Julia value: 0
>   using doubles Total time: 7.233 [sec], Julia value: 0
>   using reals Total time: 28.71 [sec], Julia value: 0
>
> You'll note that I added a writeout of the global juliaValue in order to check that certain calculations weren't being optimized away.
>
> It's striking that in this case GDC is slower not only than LDC but also DMD. Current GDC is based off 2.060 as far as I know, whereas current LDC has upgraded to 2.061, so are there some changes between D 2.060 and 2.061 that could explain this?

??? Anyways, I upgraded to LLVM 3.2 - no change. You have an i7, I have a Core2. It would be really interesting to know what LDC does there. GDC's output seems rather CPU-agnostic, whereas LDC's output is better in every case but shows system-specific differences more harshly than I would ever have imagined possible. Has Intel really changed their CPU design so radically?

> It's also interesting that clang++ produces a faster executable than g++, but it's not possible to make a direct LLVM vs GCC comparison here, as g++ is GCC 4.7.2 whereas GDC is based off a GCC snapshot.

I've compiled GDC based on the same source that the Gentoo package manager built G++ 4.7.2 from, and I get similar numbers.

> My guess would be that it's some combination of LLVM superiority in a particular case here, together with some 2.060 --> 2.061.
>
> Are these results comparable to what other people are getting?
>
> I can confirm that where code of mine is concerned, GDC still seems to have the edge in terms of executable speed ...

I've seen a tête à tête between LDC and GDC in some of my code.

--
Marco
Re: Finding large difference b/w execution time of c++ and D codes for same problem
On 02/13/2013 04:41 PM, Joseph Rushton Wakeling wrote:
> On 02/13/2013 04:17 PM, FG wrote:
> > Good point about choosing the right type of floating point numbers. Conclusion: when there's enough space, always pick double over float. Tested with GDC in win64. floats: 16.0s / doubles: 14.1s / reals: 11.2s. I thought to myself: cool, I almost beat the 13.4s I got with C++, until I changed the C++ code to also use doubles and... got a massive speedup: 7.1s!
>
> Yea, ditto for C++: 5.3 sec with double, 9.3 with float (using g++ -O3).

Just to update on times. I was running another large job at the same time as doing all these tests, so there was some slowdown. Current results are:

-- with g++ -O3 and using double rather than float: about 4.3 s

-- with clang++ -O3 and using double rather than float: about 3.1 s

-- with gdmd -O -release -inline:

D code serial with dimension 32768 ...
  using floats Total time: 17.179 [sec], Julia value: 0
  using doubles Total time: 10.298 [sec], Julia value: 0
  using reals Total time: 17.126 [sec], Julia value: 0

-- with ldmd2 -O -release -inline:

D code serial with dimension 32768 ...
  using floats Total time: 3.548 [sec], Julia value: 0
  using doubles Total time: 2.708 [sec], Julia value: 0
  using reals Total time: 4.371 [sec], Julia value: 0

-- with dmd -O -release -inline:

D code serial with dimension 32768 ...
  using floats Total time: 15.696 [sec], Julia value: 0
  using doubles Total time: 7.233 [sec], Julia value: 0
  using reals Total time: 28.71 [sec], Julia value: 0

You'll note that I added a writeout of the global juliaValue in order to check that certain calculations weren't being optimized away.

It's striking that in this case GDC is slower not only than LDC but also DMD. Current GDC is based off 2.060 as far as I know, whereas current LDC has upgraded to 2.061, so are there some changes between D 2.060 and 2.061 that could explain this?

It's also interesting that clang++ produces a faster executable than g++, but it's not possible to make a direct LLVM vs GCC comparison here, as g++ is GCC 4.7.2 whereas GDC is based off a GCC snapshot.

My guess would be that it's some combination of LLVM superiority in a particular case here, together with some 2.060 --> 2.061.

Are these results comparable to what other people are getting?

I can confirm that where code of mine is concerned, GDC still seems to have the edge in terms of executable speed ...
Re: Finding large difference b/w execution time of c++ and D codes for same problem
On Wed, 13 Feb 2013 16:17:12 +0100, FG wrote:

> Good point about choosing the right type of floating point numbers. Conclusion: when there's enough space, always pick double over float. Tested with GDC in win64. floats: 16.0s / doubles: 14.1s / reals: 11.2s. I thought to myself: cool, I almost beat the 13.4s I got with C++, until I changed the C++ code to also use doubles and... got a massive speedup: 7.1s!

Yeah, we are living in the 32-bit past ;)

Still, be aware that we only write to 2 memory locations in that program! We have neither exceeded the L1 cache size with that, nor have we put any strain on the prefetcher and memory bandwidth. With the modification below it is clearer why I said "use float for storage". The result with LDC2 for me is:

D code serial with dimension 8192 ...
  using floats Total time: 4.235 [sec]
  using doubles Total time: 5.58 [sec] // ~+32% over float
  using reals Total time: 6.432 [sec]

So all the in-CPU performance gain from using doubles is more than lost when you run out of bandwidth.

---8<---

module main;

import std.datetime;
import std.metastrings;
import std.stdio;
import std.typetuple;
import std.random;
import core.stdc.stdlib;

enum DIM = 8 * 1024;

int juliaValue;
size_t* randomAcc;

static this()
{
    randomAcc = cast(size_t*) malloc((DIM * DIM + 200) * size_t.sizeof);
    foreach (i; 0 .. DIM * DIM)
        randomAcc[i] = i;
    randomAcc[0 .. DIM * DIM].randomShuffle();
    randomAcc[DIM * DIM .. DIM * DIM + 200] = randomAcc[0 .. 200];
}

static ~this()
{
    free(randomAcc);
}

template Julia(TReal)
{
    TReal* squares;

    static this()
    {
        squares = cast(TReal*) malloc(DIM * DIM * TReal.sizeof);
    }

    static ~this()
    {
        free(squares);
    }

    struct ComplexStruct
    {
        TReal r;
        TReal i;

        TReal squarePlusMag(const ComplexStruct another)
        {
            TReal r1 = r*r - i*i + another.r;
            TReal i1 = 2.0*i*r + another.i;
            r = r1;
            i = i1;
            return (r1*r1 + i1*i1);
        }
    }

    int juliaFunction( int x, int y )
    {
        auto c = ComplexStruct(0.8, 0.156);
        auto a = ComplexStruct(x, y);
        foreach (i; 0 .. 200)
        {
            size_t idx = randomAcc[DIM * x + y + i];
            squares[idx] = a.squarePlusMag(c);
            if (squares[idx] > 1000)
                return 0;
        }
        return 1;
    }

    void kernel()
    {
        foreach (x; 0 .. DIM) {
            foreach (y; 0 .. DIM) {
                juliaValue = juliaFunction( x, y );
            }
        }
    }
}

void main()
{
    writeln("D code serial with dimension " ~ toStringNow!DIM ~ " ...");
    StopWatch sw;
    foreach (Math; TypeTuple!(float, double, real))
    {
        sw.start();
        Julia!(Math).kernel();
        sw.stop();
        writefln(" using %ss Total time: %s [sec]", Math.stringof, (sw.peek().msecs * 0.001));
        sw.reset();
    }
}

--
Marco
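To put rough numbers on the "use float for storage" argument, here is a small C++ sketch that computes the footprint of the squares array for each element type. The helper names and the 64-byte cache line are my assumptions, not something stated in the post; the sketch also assumes the usual 4-byte float and 8-byte double.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed to store a dim x dim grid of T, as the `squares` array does.
template <typename T>
constexpr std::size_t gridBytes(std::size_t dim) {
    return dim * dim * sizeof(T);
}

// Elements of T that fit in one cache line.
// The 64-byte default is an assumption; check your own CPU.
template <typename T>
constexpr std::size_t elementsPerLine(std::size_t lineBytes = 64) {
    return lineBytes / sizeof(T);
}

// Self-check, assuming sizeof(double) == 2 * sizeof(float).
inline void checkFootprint() {
    const std::size_t dim = 8 * 1024;  // DIM from the post
    assert(gridBytes<double>(dim) == 2 * gridBytes<float>(dim));
    assert(elementsPerLine<double>() * 2 == elementsPerLine<float>());
}
```

With DIM = 8192 the float grid takes 256 MiB and the double grid 512 MiB, both far beyond any cache, so the randomly indexed stores go out to main memory in either case and the wider type simply moves more data.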
Re: Finding large difference b/w execution time of c++ and D codes for same problem
On 02/13/2013 04:17 PM, FG wrote:
> Good point about choosing the right type of floating point numbers. Conclusion: when there's enough space, always pick double over float. Tested with GDC in win64. floats: 16.0s / doubles: 14.1s / reals: 11.2s. I thought to myself: cool, I almost beat the 13.4s I got with C++, until I changed the C++ code to also use doubles and... got a massive speedup: 7.1s!

Yea, ditto for C++: 5.3 sec with double, 9.3 with float (using g++ -O3).
Re: Finding large difference b/w execution time of c++ and D codes for same problem
On 2013-02-13 16:26, Marco Leise wrote:
> I'd still bet a dollar that with an array of values floats would outperform doubles, when cache misses happen. (E.g. more or less random memory access.)

I'll play it safe and only bet my opDollar. :)
Re: Finding large difference b/w execution time of c++ and D codes for same problem
On Wed, 13 Feb 2013 15:45:13 +0100, Joseph Rushton Wakeling wrote:

> On 02/13/2013 03:29 PM, Marco Leise wrote:
> > They are actual storage in memory, where every increase in size hurts.
>
> When I replaced with TReal, it sped things up for double.

Oh, this gets even better... I only added double as the last step to that code, so I didn't notice this effect. Looks like we've got:

- CPUs that are good at converting to double
- 64-bit, so the size of a double matches
- only 16 bytes of memory in total

With double struct fields the 'double' case gains 50% speed for me, making it the overall fastest now (on LDC). I'd still bet a dollar that with an array of values floats would outperform doubles, when cache misses happen. (E.g. more or less random memory access.)

--
Marco
Re: Finding large difference b/w execution time of c++ and D codes for same problem
Good point about choosing the right type of floating point numbers. Conclusion: when there's enough space, always pick double over float. Tested with GDC in win64. floats: 16.0s / doubles: 14.1s / reals: 11.2s. I thought to myself: cool, I almost beat the 13.4s I got with C++, until I changed the C++ code to also use doubles and... got a massive speedup: 7.1s!
Re: Finding large difference b/w execution time of c++ and D codes for same problem
On Wed, 13 Feb 2013 15:45:13 +0100, Joseph Rushton Wakeling wrote:

> On 02/13/2013 03:29 PM, Marco Leise wrote:
> > They are actual storage in memory, where every increase in size hurts.
>
> When I replaced with TReal, it sped things up for double.

Give me that stuff, your northbridge is on! But I still want to rule out the LLVM version, since GDC seems to produce code with similar runtime on both our systems, but LDC2 diverges so much.

--
Marco
Re: Finding large difference b/w execution time of c++ and D codes for same problem
On 02/13/2013 03:56 PM, Marco Leise wrote:
> Ok, I get pretty much the same numbers as before with:
> ldmd2 -O -inline -release
> It's even a bit faster than my long command line.

My experience has been that the higher -O values of ldc don't do much, but of course, that's going to vary depending on your code. I think above -O3 it's all link-time, no?

> Do these numbers tell us that there are such huge differences in the handling of floating point values between different AMD64 CPUs? I can't quite make sense of it yet.

AMD vs Intel might make a difference (my machine is an i7).

> What version of LLVM are you using? Mine is 3.1. 3.0 is minimum and 3.2 is recommended for LDC2.

LLVM 3.2.

> _THAT_ I can reproduce with GDC! :
>
> D code serial with dimension 32768 ...
>   using floats Total time: 24.415 [sec]
>   using doubles Total time: 23.268 [sec]
>   using reals Total time: 25.168 [sec]
>
> It's the exact same pattern.

I've never, EVER had LDC-compiled code run four times faster than GDC-compiled code. In fact, I don't think I've ever had LDC-compiled code run faster than GDC-compiled code at all, except where the choice of optimizations was different. That's what makes me concerned that there's some kind of bug in play here.
Re: Finding large difference b/w execution time of c++ and D codes for same problem
On Wed, 13 Feb 2013 15:00:21 +0100, Joseph Rushton Wakeling wrote:

> Compiling with ldmd2 -O -inline -release on 64-bit Ubuntu, latest from-GitHub LDC, LLVM 3.2:
>
> D code serial with dimension 32768 ...
>   using floats Total time: 4.751 [sec]
>   using doubles Total time: 4.362 [sec]
>   using reals Total time: 5.95 [sec]

Ok, I get pretty much the same numbers as before with:

ldmd2 -O -inline -release

It's even a bit faster than my long command line. Do these numbers tell us that there are such huge differences in the handling of floating point values between different AMD64 CPUs? I can't quite make sense of it yet. What version of LLVM are you using? Mine is 3.1. 3.0 is minimum and 3.2 is recommended for LDC2.

> Using double is indeed marginally faster than float, but real is slower than both.
>
> What's disturbing is that when compiled instead with gdmd -O -inline -release the code is dramatically slower:
>
> D code serial with dimension 32768 ...
>   using floats Total time: 22.108 [sec]
>   using doubles Total time: 21.203 [sec]
>   using reals Total time: 23.717 [sec]
>
> It's the first time I've encountered such a dramatic difference between GDC and LDC, and I'm wondering whether it's down to a bug or some change between D releases 2.060 and 2.061.

_THAT_ I can reproduce with GDC! :

D code serial with dimension 32768 ...
  using floats Total time: 24.415 [sec]
  using doubles Total time: 23.268 [sec]
  using reals Total time: 25.168 [sec]

It's the exact same pattern.

--
Marco
Re: Finding large difference b/w execution time of c++ and D codes for same problem
On 02/13/2013 03:29 PM, Marco Leise wrote:
> They are actual storage in memory, where every increase in size hurts.

When I replaced with TReal, it sped things up for double.
Re: Finding large difference b/w execution time of c++ and D codes for same problem
On Wed, 13 Feb 2013 14:48:21 +0100, Joseph Rushton Wakeling wrote:

> On 02/13/2013 02:26 PM, Marco Leise wrote:
> > You get both, 50% more speed and more precision! It is a win-win situation. Also take a look at Phobos' std.math that returns real everywhere.
>
> I have to say, it's not been my experience that using real improves speed. Exactly what optimizations are you using when compiling?

The target is Linux, AMD64 and the compiler arguments are:

ldc2 -O5 -check-printf-calls -fdata-sections -ffunction-sections -release -singleobj -strip-debug -wi -L=--gc-sections -L=-s

--
Marco
Re: Finding large difference b/w execution time of c++ and D codes for same problem
On Wed, 13 Feb 2013 14:44:36 +0100, FG wrote:

> On 2013-02-13 14:26, Marco Leise wrote:
> > template Julia(TReal)
> > {
> >     struct ComplexStruct
> >     {
> >         float r;
> >         float i;
> >     ...
>
> Why aren't r and i of type TReal?

They are actual storage in memory, where every increase in size hurts. And they cannot be optimized away like temporary reals, which can be kept on the FPU stack.

--
Marco
Re: Finding large difference b/w execution time of c++ and D codes for same problem
On 02/13/2013 02:26 PM, Marco Leise wrote:
> I compiled with LDC2 and these are the results:
>
> D code serial with dimension 32768 ...
>   using floats Total time: 13.399 [sec]
>   using doubles Total time: 9.429 [sec]
>   using reals Total time: 8.909 [sec] // <- !!!
>
> You get both, 50% more speed and more precision!

Compiling with ldmd2 -O -inline -release on 64-bit Ubuntu, latest from-GitHub LDC, LLVM 3.2:

D code serial with dimension 32768 ...
  using floats Total time: 4.751 [sec]
  using doubles Total time: 4.362 [sec]
  using reals Total time: 5.95 [sec]

Using double is indeed marginally faster than float, but real is slower than both.

What's disturbing is that when compiled instead with gdmd -O -inline -release the code is dramatically slower:

D code serial with dimension 32768 ...
  using floats Total time: 22.108 [sec]
  using doubles Total time: 21.203 [sec]
  using reals Total time: 23.717 [sec]

It's the first time I've encountered such a dramatic difference between GDC and LDC, and I'm wondering whether it's down to a bug or some change between D releases 2.060 and 2.061.
Re: Finding large difference b/w execution time of c++ and D codes for same problem
On 02/13/2013 02:26 PM, Marco Leise wrote:
> You get both, 50% more speed and more precision! It is a win-win situation. Also take a look at Phobos' std.math that returns real everywhere.

I have to say, it's not been my experience that using real improves speed. Exactly what optimizations are you using when compiling?
Re: Finding large difference b/w execution time of c++ and D codes for same problem
On 2013-02-13 14:26, Marco Leise wrote:
> template Julia(TReal)
> {
>     struct ComplexStruct
>     {
>         float r;
>         float i;
>     ...

Why aren't r and i of type TReal?
Re: Finding large difference b/w execution time of c++ and D codes for same problem
I like optimization challenges. This is an excellent test program to check the effect of different floating point types on intermediate values. Remember that when you store values in a float variable, the FPU actually has to round it down to that precision, store it in a 32-bit memory location, then load it back in and expand it - you _asked_ for that.

I compiled with LDC2 and these are the results:

D code serial with dimension 32768 ...
  using floats Total time: 13.399 [sec]
  using doubles Total time: 9.429 [sec]
  using reals Total time: 8.909 [sec] // <- !!!

You get both, 50% more speed and more precision! It is a win-win situation. Also take a look at Phobos' std.math that returns real everywhere.

Modified code:

---8<---

module main;

import std.datetime;
import std.metastrings;
import std.stdio;
import std.typetuple;

enum DIM = 32 * 1024;

int juliaValue;

template Julia(TReal)
{
    struct ComplexStruct
    {
        float r;
        float i;

        TReal squarePlusMag(const ComplexStruct another)
        {
            TReal r1 = r*r - i*i + another.r;
            TReal i1 = 2.0*i*r + another.i;
            r = r1;
            i = i1;
            return (r1*r1 + i1*i1);
        }
    }

    int juliaFunction( int x, int y )
    {
        auto c = ComplexStruct(0.8, 0.156);
        auto a = ComplexStruct(x, y);
        foreach (i; 0 .. 200)
            if (a.squarePlusMag(c) > 1000)
                return 0;
        return 1;
    }

    void kernel()
    {
        foreach (x; 0 .. DIM) {
            foreach (y; 0 .. DIM) {
                juliaValue = juliaFunction( x, y );
            }
        }
    }
}

void main()
{
    writeln("D code serial with dimension " ~ toStringNow!DIM ~ " ...");
    StopWatch sw;
    foreach (Math; TypeTuple!(float, double, real))
    {
        sw.start();
        Julia!(Math).kernel();
        sw.stop();
        writefln(" using %ss Total time: %s [sec]", Math.stringof, (sw.peek().msecs * 0.001));
        sw.reset();
    }
}

--->8---

--
Marco
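The rounding step described above (narrow to 32 bits on store, widen again on load) is easy to observe in isolation. This is a minimal C++ sketch; the helper names throughFloat and checkRounding are made up for illustration, and it only shows the precision effect, not the timing cost.

```cpp
#include <cassert>

// Store a double in a float and load it back, as happens whenever an
// intermediate result is assigned to a float variable.
inline double throughFloat(double x) {
    const float stored = static_cast<float>(x);  // rounds to float precision
    return static_cast<double>(stored);          // widening is exact
}

// Self-check: values that fit in a float survive, others come back changed.
inline void checkRounding() {
    assert(throughFloat(0.5) == 0.5);  // 0.5 is exactly representable
    assert(throughFloat(0.1) != 0.1);  // 0.1 is not; the float keeps fewer bits
}
```

Every such round trip is extra work for the CPU on top of the lost bits, which fits the observation above that keeping intermediates in a wider type was faster in this loop.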
Re: Finding large difference b/w execution time of c++ and D codes for same problem
On 02/12/2013 11:17 PM, FG wrote:
> Winblows and DMD 32-bit, the rest 64-bit, but still, dmd was quite fast. Interesting how gdc -O3 gave no extra boost vs. -O2.

... try adding -frelease to the gdc call?
Re: Finding large difference b/w execution time of c++ and D codes for same problem
Well technically it was that much faster because it did optimize away the useless calc

On Tuesday, 12 February 2013 at 23:31:17 UTC, FG wrote:
> On 2013-02-13 00:06, Sparsh Mittal wrote:
> > > I had a look, but first had to make juliaValue global, because g++ had optimized all the calculations away.
> >
> > Brilliant! Yes, that is why the time was coming out to be zero, regardless of what value of DIM I put. Thank you very very much.
>
> LOL. For a while you thought that C++ could be that much faster than D? :D

Well technically it's not that C++ is faster than D or vice versa; it's that the two compilers did different optimizations, and in this case one of the optimizations that g++ did (removing redundancies) had a large effect on the outcome. It's entirely possible that DMD can still beat g++ under different circumstances.

--rt
Re: Finding large difference b/w execution time of c++ and D codes for same problem
> LOL. For a while you thought that C++ could be that much faster than D? :D

I was stunned and shared it with others, who could not find the reason either. It was like a scientist discovering a phenomenon that goes against established laws. Good that I was wrong and the right person pointed it out.
Re: Finding large difference b/w execution time of c++ and D codes for same problem
On 2013-02-13 00:06, Sparsh Mittal wrote:
> > I had a look, but first had to make juliaValue global, because g++ had optimized all the calculations away.
>
> Brilliant! Yes, that is why the time was coming out to be zero, regardless of what value of DIM I put. Thank you very very much.

LOL. For a while you thought that C++ could be that much faster than D? :D
Re: Finding large difference b/w execution time of c++ and D codes for same problem
> I had a look, but first had to make juliaValue global, because g++ had optimized all the calculations away.

Brilliant! Yes, that is why the time was coming out to be zero, regardless of what value of DIM I put. Thank you very very much.
Re: Finding large difference b/w execution time of c++ and D codes for same problem
On 2013-02-12 21:39, Sparsh Mittal wrote:
> I am finding C++ code is much faster than D code.

I had a look, but first had to make juliaValue global, because g++ had optimized all the calculations away. :) Also changed DIM to 32 * 1024.

13.2s -- g++ -O3
16.0s -- g++ -O2
15.9s -- gdc -O3
15.9s -- gdc -O2
16.2s -- dmd -O -release -inline (v.2.060)

Winblows and DMD 32-bit, the rest 64-bit, but still, dmd was quite fast. Interesting how gdc -O3 gave no extra boost vs. -O2.
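The "optimized all the calculations away" problem comes from a benchmark whose results are never observed, so the compiler is free to delete the loop. In the thread, making juliaValue global was enough. The C++ sketch below uses a volatile sink instead, which is a stricter variant and my own assumption rather than what anyone in the thread did; juliaPoint, runKernel, sink and checkKernel are made-up names.

```cpp
#include <cassert>

// Iterate z = z*z + c up to maxIter times; return 0 as soon as |z|^2
// exceeds 1000, otherwise 1. Constants follow the C++ version in the thread.
inline int juliaPoint(double x, double y, int maxIter = 200) {
    const double cr = -0.8, ci = 0.156;
    double r = x, i = y;
    for (int k = 0; k < maxIter; ++k) {
        const double r1 = r * r - i * i + cr;
        const double i1 = 2.0 * i * r + ci;
        r = r1;
        i = i1;
        if (r1 * r1 + i1 * i1 > 1000.0) return 0;
    }
    return 1;
}

// Every write to a volatile object must actually be performed, so the
// per-point results cannot be discarded as unused.
volatile int sink = 0;

// Run the kernel over a dim x dim grid and return how many points stayed
// bounded; returning the count gives the caller something to print.
inline long runKernel(int dim) {
    long bounded = 0;
    for (int x = 0; x < dim; ++x) {
        for (int y = 0; y < dim; ++y) {
            const int v = juliaPoint(x, y);
            sink = v;
            bounded += v;
        }
    }
    return bounded;
}

// Self-check on points whose outcome is easy to verify by hand.
inline void checkKernel() {
    assert(juliaPoint(10.0, 0.0, 1) == 0);  // 99.2^2 is already above 1000
    assert(juliaPoint(0.0, 0.0, 2) == 1);   // still small after two steps
}
```

Printing the returned count after timing, much like the "Julia value" writeout added later in the thread, makes it visible whether the work was really done.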
Re: Finding large difference b/w execution time of c++ and D codes for same problem
Thanks for your insights. It was very helpful.
Re: Finding large difference b/w execution time of c++ and D codes for same problem
13-Feb-2013 01:09, Sparsh Mittal wrote:
> Pardon me, can you please point me to suitable reference or tell just command here. Searching on google, I could not find anything yet. Performance is my main concern.

GDC seems to be mostly a "build from source" kind of thing.
Moved to GitHub:
https://github.com/D-Programming-GDC
(See also newsgroup digitalmars.d.D.gnu)

GDC binaries for the Windows TDM-GCC toolchain are still available there:
https://bitbucket.org/goshawk/gdc/downloads
AFAIK it needs the 4.6.1 version of the TDM toolset.

LDC(2), recent release with binaries:
https://github.com/downloads/ldc-developers/ldc/ldc-0.10.0-src.tar.gz
https://github.com/downloads/ldc-developers/ldc/ldc2-0.10.0-linux-x86_64.tar.gz
https://github.com/downloads/ldc-developers/ldc/ldc2-0.10.0-linux-x86_64.tar.xz
https://github.com/downloads/ldc-developers/ldc/ldc2-0.10.0-linux-x86.tar.gz
https://github.com/downloads/ldc-developers/ldc/ldc2-0.10.0-linux-x86.tar.xz
https://github.com/downloads/ldc-developers/ldc/ldc2-0.10.0-osx-x86_64.tar.gz
https://github.com/downloads/ldc-developers/ldc/ldc2-0.10.0-osx-x86_64.tar.xz
(See also the announcement on the newsgroup digitalmars.d.D.ldc)

Both compilers ship a dmd-style compiler driver, called gdmd and ldmd2 respectively. Speed is mostly what you'd expect of GCC and LLVM respectively.

--
Dmitry Olshansky
Re: Finding large difference b/w execution time of c++ and D codes for same problem
OK. I found it.
Re: Finding large difference b/w execution time of c++ and D codes for same problem
On Wed, Feb 13, 2013 at 12:56:01AM +0400, Dmitry Olshansky wrote:
> 13-Feb-2013 00:39, Sparsh Mittal wrote:
> > I am finding C++ code is much faster than D code.
>
> Seems like DMD's floating point issue. The issue being that it always works with floats as full-width reals + rounding. Basically if nothing changed (and I doubt it changed) then DMD with floating point code is about two (or more) times slower than GDC/LDC.
>
> The cure is using GDC/LDC compiler as they are pretty stable and up to date on the front-end side these days.
[...]

I did a few benchmarks somewhat recently where I compared the performance of code produced by GDC with DMD. Code produced by GDC consistently outperforms code produced by DMD by about 20-30% or so. This is across the board: with floats, with reals, and in applications that don't do heavy arithmetic (just basic looping/recursion constructs).

I didn't investigate in detail the cause of this difference, but the last time I looked at the assembly code generated by both compilers, I noticed that GDC's optimizer is far more advanced than DMD's, esp. when it comes to loop-unrolling, strength reduction, inlining, etc. For non-trivial code, GDC pretty much consistently produces superior code in general (not just in floating-point operations).

So if performance is a concern, I'd say definitely look into GDC or LDC instead of DMD.

T

--
Two wrongs don't make a right; but three rights do make a left...
Re: Finding large difference b/w execution time of c++ and D codes for same problem
Pardon me, can you please point me to suitable reference or tell just command here. Searching on google, I could not find anything yet. Performance is my main concern.
Re: Finding large difference b/w execution time of c++ and D codes for same problem
13-Feb-2013 00:39, Sparsh Mittal wrote:
> I am finding C++ code is much faster than D code.

Seems like DMD's floating point issue. The issue being that it always works with floats as full-width reals + rounding. Basically, if nothing changed (and I doubt it changed), then DMD with floating point code is about two (or more) times slower than GDC/LDC.

The cure is using the GDC/LDC compilers, as they are pretty stable and up to date on the front-end side these days.

--
Dmitry Olshansky
Re: Finding large difference b/w execution time of c++ and D codes for same problem
On Tuesday, 12 February 2013 at 20:39:36 UTC, Sparsh Mittal wrote:
> I am finding C++ code is much faster than D code.

dmd (AFAIK) is known to be slower. Try LDC or GDC if speed is your major concern.
Re: Finding large difference b/w execution time of c++ and D codes for same problem
I am finding C++ code is much faster than D code.
Finding large difference b/w execution time of c++ and D codes for same problem
I am writing a Julia set program in C++ and D, in exactly the same way as far as possible. On executing them I find a large difference in their execution times. Can you tell me what I am doing wrong, or is this expected?

//===C++ code, compiled with -O3 ==
#include
#include
using namespace std;

const int DIM= 4194304;

struct complexClass {
  float r;
  float i;
  complexClass( float a, float b )
  {
    r = a;
    i = b;
  }

  float squarePlusMag(complexClass another)
  {
    float r1 = r*r - i*i + another.r;
    float i1 = 2.0*i*r + another.i;
    r = r1;
    i = i1;
    return (r1*r1+ i1*i1);
  }
};

int juliaFunction( int x, int y )
{
  complexClass a (x,y);
  complexClass c(-0.8, 0.156);
  int i = 0;
  for (i=0; i<200; i++) {
    if( a.squarePlusMag(c) > 1000)
      return 0;
  }
  return 1;
}

void kernel( ){
  for (int x=0; x
  [...]
  cout<<" C++ code with dimension " << DIM <<" Total time: "<< delta << "[sec]\n";
}

//=D code, compiled with -O -release -inline=
#!/usr/bin/env rdmd
import std.stdio;
import std.datetime;

immutable int DIM= 4194304;

struct complexClass {
  float r;
  float i;

  float squarePlusMag(complexClass another)
  {
    float r1 = r*r - i*i + another.r;
    float i1 = 2.0*i*r + another.i;
    r = r1;
    i = i1;
    return (r1*r1+ i1*i1);
  }
};

int juliaFunction( int x, int y )
{
  complexClass c = complexClass(0.8, 0.156);
  complexClass a= complexClass(x, y);
  for (int i=0; i<200; i++) {
    if( a.squarePlusMag(c) > 1000)
      return 0;
  }
  return 1;
}

void kernel( ){
  for (int x=0; x
  [...]
  writeln(" D code serial with dimension ", DIM ," Total time: ", (sw.peek().msecs/1000), "[sec]");
}

I would appreciate any help.
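One detail worth watching when comparing the two timings: the D version prints sw.peek().msecs/1000, which, if msecs is an integer as I believe it is, truncates to whole seconds. A minimal C++11 sketch of measuring and reporting fractional seconds follows; timeSeconds, msecsToSeconds and checkTiming are hypothetical names, and this is not the timing code from the original post.

```cpp
#include <cassert>
#include <chrono>

// Run `work` once and return the elapsed wall-clock time in seconds.
template <typename F>
double timeSeconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Convert milliseconds to seconds without integer truncation.
inline double msecsToSeconds(long long msecs) {
    return msecs / 1000.0;
}

// Self-check showing the difference between the two divisions.
inline void checkTiming() {
    assert(1500 / 1000 == 1);             // integer division drops the .5
    assert(msecsToSeconds(1500) == 1.5);  // floating-point division keeps it
}
```

A call such as timeSeconds([] { /* kernel goes here */ }) returns a double that can be printed directly; steady_clock is used because it is not affected by system clock adjustments.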