Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-14 Thread Sparsh Mittal

Thanks a lot for your reply.


Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-13 Thread jerro
When you are comparing LDC and GDC, you should either use 
-mcpu=generic for ldc or -march=native for GDC, because their 
default targets are different. By default, GDC produces code that 
works on most x86_64 CPUs (if you are on an x86_64 system), 
whereas LDC targets the host CPU. But this does not explain the 
difference in timings you are seeing here.


One reason why the code generated by GDC is slower is that 
squarePlusMag isn't inlined. It seems that the fact that its 
parameter is const is somehow preventing it from being inlined - 
I have no idea why. Removing const and adding -march=native to 
gdc flags gives me:


gdc -O3 -finline-functions -frelease tmp.d -o tmp -march=native:
  using floats Total time: 8.283 [sec]
  using doubles Total time: 6.827 [sec]
  using reals Total time: 6.795 [sec]

ldc2 -O3  -release -singleobj tmp.d -oftmp:
  using floats Total time: 3.348 [sec]
  using doubles Total time: 3.08 [sec]
  using reals Total time: 4.174 [sec]

The difference is smaller, but still pretty large.

I have noticed that there are needless conversions in this code 
that are slowing down both GDC generated and LDC generated code. 
This code is a bit faster:


module main;

import std.datetime;
import std.metastrings;
import std.stdio;
import std.typetuple;


enum DIM = 32 * 1024;

int juliaValue;

template Julia(TReal)
{
struct ComplexStruct
{
TReal r;
TReal i;

TReal squarePlusMag(ComplexStruct another)
{
TReal r1 = r*r - i*i + another.r;
TReal i1 = cast(TReal)2.0*i*r + another.i;

r = r1;
i = i1;

return (r1*r1 + i1*i1);
}
}

int juliaFunction( int x, int y )
{
auto c = ComplexStruct(0.8, 0.156);
auto a = ComplexStruct(x, y);

foreach (i; 0 .. 200)
if (a.squarePlusMag(c) > cast(TReal) 1000)
return 0;
return 1;
}

void kernel()
{
foreach (x; 0 .. DIM) {
foreach (y; 0 .. DIM) {
juliaValue = juliaFunction( x, y );
}
}
}
}

void main()
{
writeln("D code serial with dimension " ~ toStringNow!DIM ~ " ...");

StopWatch sw;
foreach (Math; TypeTuple!(float, double, real))
{
sw.start();
Julia!(Math).kernel();
sw.stop();
writefln("  using %ss Total time: %s [sec]",
 Math.stringof, (sw.peek().msecs * 0.001));
sw.reset();
}
}

This gives me:

gdc -O3 -finline-functions -frelease tmp.d -o tmp -march=native:
  using floats Total time: 6.746 [sec]
  using doubles Total time: 6.872 [sec]
  using reals Total time: 5.226 [sec]

ldc2 -O3  -release -singleobj tmp.d -oftmp:
  using floats Total time: 2.36 [sec]
  using doubles Total time: 2.535 [sec]
  using reals Total time: 4.106 [sec]

At least part of the difference is due to the fact that 
juliaFunction still isn't getting inlined (but squarePlusMag is). 
Making juliaFunction a static method of ComplexStruct causes it 
to get inlined (again, I have no idea why). Moving juliaFunction 
inside ComplexStruct does not affect the performance of LDC 
generated code, but for GDC it gives me:


  using floats Total time: 4.262 [sec]
  using doubles Total time: 4.251 [sec]
  using reals Total time: 3.512 [sec]

There is still a large difference between LDC and GDC for floats 
and doubles and I can't explain it. But at least it is much 
smaller than it was initially.


I ran all the benchmarks on 64-bit Linux, using a Core i5 2500K.
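The "needless conversions" removed in the code above can be illustrated with a minimal C++ sketch. It assumes the usual C++ arithmetic conversions: a bare `2.0` literal is a double, so multiplying it with float operands promotes the expression to double and forces a conversion back on assignment. The template below is only an analogue of the D `ComplexStruct`, not the code that was benchmarked in this thread.

```cpp
#include <cassert>
#include <type_traits>

// Illustrative C++ analogue of the D ComplexStruct; TReal is the
// floating-point type, as in the D template.
template <typename TReal>
struct ComplexStruct {
    TReal r;
    TReal i;

    // One iteration step; returns the squared magnitude of the new value.
    TReal squarePlusMag(ComplexStruct another) {
        TReal r1 = r * r - i * i + another.r;
        // TReal(2) keeps the whole product in TReal. A bare 2.0 would
        // promote float operands to double before converting back.
        TReal i1 = TReal(2) * i * r + another.i;
        r = r1;
        i = i1;
        return r1 * r1 + i1 * i1;
    }
};

// Compile-time checks of the promotion rule for float operands.
static_assert(std::is_same<decltype(2.0 * 1.0f), double>::value,
              "a double literal promotes a float operand to double");
static_assert(std::is_same<decltype(2.0f * 1.0f), float>::value,
              "a float literal keeps the product in float");
```

Whether the extra conversions cost anything measurable depends on the compiler and target; the timings in this thread suggest they did for this loop.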


Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-13 Thread Marco Leise
On Wed, 13 Feb 2013 18:10:47 +0100,
Joseph Rushton Wakeling wrote:

> Just to update on times.  I was running another large job at the same time as 
> doing all these tests, so there was some slowdown.  Current results are:
> 
> -- with g++ -O3 and using double rather than float: about 4.3 s
> 
> -- with clang++ -O3 and using double rather than float: about 3.1 s
> 
> -- with gdmd -O -release -inline:
> 
> D code serial with dimension 32768 ...
>   using floats Total time: 17.179 [sec], Julia value: 0
>   using doubles Total time: 10.298 [sec], Julia value: 0
>   using reals Total time: 17.126 [sec], Julia value: 0
> 
> -- with ldmd2 -O -release -inline:
> 
> D code serial with dimension 32768 ...
>   using floats Total time: 3.548 [sec], Julia value: 0
>   using doubles Total time: 2.708 [sec], Julia value: 0
>   using reals Total time: 4.371 [sec], Julia value: 0
> 
> -- with dmd -O -release -inline:
> 
> D code serial with dimension 32768 ...
>   using floats Total time: 15.696 [sec], Julia value: 0
>   using doubles Total time: 7.233 [sec], Julia value: 0
>   using reals Total time: 28.71 [sec], Julia value: 0
> 
> You'll note that I added a writeout of the global juliaValue in order to 
> check 
> that certain calculations weren't being optimized away.
> 
> It's striking that in this case GDC is slower not only than LDC but also DMD. 
> Current GDC is based off 2.060 as far as I know, whereas current LDC has 
> upgraded to 2.061, so are there some changes between D 2.060 and 2.061 that 
> could explain this?

???
Anyways I upgraded to LLVM 3.2 - no change. You have an i7, I
have a Core2. It would be really interesting to know what LDC
does there, since GDC's output seems rather CPU agnostic, while
LDC's output is better in every case but also depends on the
system to a degree I would never have imagined possible.
Should Intel have changed their CPU design so radically?

> It's also interesting that clang++ produces a faster executable than g++, but 
> it's not possible to make a direct LLVM vs GCC comparison here, as g++ is GCC 
> 4.7.2 whereas GDC is based off a GCC snapshot.

I've compiled GDC based on the same source that the Gentoo
package manager built G++ 4.7.2 from and, I get similar
numbers.

> My guess would be that it's some combination of LLVM superiority in a 
> particular 
> case here, together with some 2.060 --> 2.061.
> 
> Are these results comparable to what other people are getting?
> 
> I can confirm that where code of mine is concerned, GDC still seems to have 
> the 
> edge in terms of executable speed ...

I've seen a tête à tête between LDC and GDC in some of my
code.

-- 
Marco



Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-13 Thread Joseph Rushton Wakeling

On 02/13/2013 04:41 PM, Joseph Rushton Wakeling wrote:

On 02/13/2013 04:17 PM, FG wrote:

Good point about choosing the right type of floating point numbers.
Conclusion: when there's enough space, always pick double over float.
Tested with GDC in win64. floats: 16.0s / doubles: 14.1s / reals: 11.2s.
I thought to myself: cool, I almost beat the 13.4s I got with C++, until I
changed the C++ code to also use doubles and... got a massive speedup: 7.1s!


Yea, ditto for C++: 5.3 sec with double, 9.3 with float (using g++ -O3).


Just to update on times.  I was running another large job at the same time as 
doing all these tests, so there was some slowdown.  Current results are:


-- with g++ -O3 and using double rather than float: about 4.3 s

-- with clang++ -O3 and using double rather than float: about 3.1 s

-- with gdmd -O -release -inline:

D code serial with dimension 32768 ...
  using floats Total time: 17.179 [sec], Julia value: 0
  using doubles Total time: 10.298 [sec], Julia value: 0
  using reals Total time: 17.126 [sec], Julia value: 0

-- with ldmd2 -O -release -inline:

D code serial with dimension 32768 ...
  using floats Total time: 3.548 [sec], Julia value: 0
  using doubles Total time: 2.708 [sec], Julia value: 0
  using reals Total time: 4.371 [sec], Julia value: 0

-- with dmd -O -release -inline:

D code serial with dimension 32768 ...
  using floats Total time: 15.696 [sec], Julia value: 0
  using doubles Total time: 7.233 [sec], Julia value: 0
  using reals Total time: 28.71 [sec], Julia value: 0

You'll note that I added a writeout of the global juliaValue in order to check 
that certain calculations weren't being optimized away.


It's striking that in this case GDC is slower not only than LDC but also DMD. 
Current GDC is based off 2.060 as far as I know, whereas current LDC has 
upgraded to 2.061, so are there some changes between D 2.060 and 2.061 that 
could explain this?


It's also interesting that clang++ produces a faster executable than g++, but 
it's not possible to make a direct LLVM vs GCC comparison here, as g++ is GCC 
4.7.2 whereas GDC is based off a GCC snapshot.


My guess would be that it's some combination of LLVM superiority in a particular 
case here, together with some 2.060 --> 2.061.


Are these results comparable to what other people are getting?

I can confirm that where code of mine is concerned, GDC still seems to have the 
edge in terms of executable speed ...


Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-13 Thread Marco Leise
On Wed, 13 Feb 2013 16:17:12 +0100,
FG wrote:

> Good point about choosing the right type of floating point numbers.
> Conclusion: when there's enough space, always pick double over float.
> Tested with GDC in win64. floats: 16.0s / doubles: 14.1s / reals: 11.2s.
> I thought to myself: cool, I almost beat the 13.4s I got with C++, until I 
> changed the C++ code to also use doubles and... got a massive speedup: 7.1s!

Yeah we are living in the 32-bit past ;)

Still, be aware that we only write to 2 memory locations in
that program!
We have neither exceeded the L1 cache size with that nor have
we put any strain on the prefetcher and memory bandwidth.
With the modification below it is more clear why I said "use
float for storage". The result with LDC2 for me is:

D code serial with dimension 8192 ...
  using floats Total time: 4.235 [sec]
  using doubles Total time: 5.58 [sec] // ~+32% over float
  using reals Total time: 6.432 [sec]

So all the in-CPU performance gain from using doubles is more
than lost, when you run out of bandwidth.

---8<---

module main;

import std.datetime;
import std.metastrings;
import std.stdio;
import std.typetuple;
import std.random;
import core.stdc.stdlib;


enum DIM = 8 * 1024;

int juliaValue;

size_t* randomAcc;

static this()
{
randomAcc = cast(size_t*) malloc((DIM * DIM + 200) * size_t.sizeof);
foreach (i; 0 .. DIM * DIM)
randomAcc[i] = i;
randomAcc[0 .. DIM * DIM].randomShuffle();
randomAcc[DIM * DIM .. DIM * DIM + 200] = randomAcc[0 .. 200];
}

static ~this() { free(randomAcc); }

template Julia(TReal)
{
TReal* squares;

static this() { squares = cast(TReal*) malloc(DIM * DIM * 
TReal.sizeof); }

static ~this() { free(squares); }

struct ComplexStruct
{
TReal r;
TReal i;

TReal squarePlusMag(const ComplexStruct another)
{
TReal r1 = r*r - i*i + another.r;
TReal i1 = 2.0*i*r + another.i;

r = r1;
i = i1;

return (r1*r1 + i1*i1);
}
}

int juliaFunction( int x, int y )
{
auto c = ComplexStruct(0.8, 0.156);
auto a = ComplexStruct(x, y);

foreach (i; 0 .. 200) {
size_t idx = randomAcc[DIM * x + y + i];
squares[idx] = a.squarePlusMag(c);
if (squares[idx] > 1000)
return 0;
}
return 1;
}

void kernel()
{
foreach (x; 0 .. DIM) {
foreach (y; 0 .. DIM) {
juliaValue = juliaFunction( x, y );
}
}
}
}

void main()
{
writeln("D code serial with dimension " ~ toStringNow!DIM ~ " ...");
StopWatch sw;
foreach (Math; TypeTuple!(float, double, real))
{
sw.start();
Julia!(Math).kernel();
sw.stop();
writefln("  using %ss Total time: %s [sec]",
 Math.stringof, (sw.peek().msecs * 0.001));
sw.reset();
}
}
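
The bandwidth argument can be sketched in C++ as well. This is a minimal illustration under stated assumptions, not a benchmark: `footprintBytes`, `shuffledIndices` and `gatherSum` are hypothetical helper names, and the sketch reads through a shuffled index instead of writing as the D code above does. It assumes the common case where a double is twice the size of a float, so the same element count needs twice the memory traffic once the data no longer fits in cache.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Bytes occupied by n elements of type T.
template <typename T>
std::size_t footprintBytes(std::size_t n) {
    return n * sizeof(T);
}

// Builds a shuffled permutation of 0..n-1 (fixed seed for repeatability).
std::vector<std::size_t> shuffledIndices(std::size_t n, unsigned seed = 42) {
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::mt19937 gen(seed);
    std::shuffle(idx.begin(), idx.end(), gen);
    return idx;
}

// Sums data in the order given by idx, i.e. with a scattered access
// pattern that defeats the prefetcher once data exceeds the cache.
template <typename T>
T gatherSum(const std::vector<T>& data, const std::vector<std::size_t>& idx) {
    assert(data.size() == idx.size());
    T sum = T(0);
    for (std::size_t k : idx) {
        sum += data[k];
    }
    return sum;
}
```

Timing `gatherSum` for `std::vector<float>` and `std::vector<double>` at a size well beyond the last-level cache would be one way to check the claim on a given machine.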
-- 
Marco



Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-13 Thread Joseph Rushton Wakeling

On 02/13/2013 04:17 PM, FG wrote:

Good point about choosing the right type of floating point numbers.
Conclusion: when there's enough space, always pick double over float.
Tested with GDC in win64. floats: 16.0s / doubles: 14.1s / reals: 11.2s.
I thought to myself: cool, I almost beat the 13.4s I got with C++, until I
changed the C++ code to also use doubles and... got a massive speedup: 7.1s!


Yea, ditto for C++: 5.3 sec with double, 9.3 with float (using g++ -O3).


Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-13 Thread FG

On 2013-02-13 16:26, Marco Leise wrote:

I'd still bet a dollar that with an array of values floats would
outperform doubles, when cache misses happen. (E.g. more or
less random memory access.)


I'll play it safe and only bet my opDollar. :)



Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-13 Thread Marco Leise
On Wed, 13 Feb 2013 15:45:13 +0100,
Joseph Rushton Wakeling wrote:

> On 02/13/2013 03:29 PM, Marco Leise wrote:
> > They are actual storage in memory, where every increase in
> > size hurts.
> 
> When I replaced with TReal, it sped things up for double.

Oh this gets even better... I only added double as last step
to that code, so I didn't notice this effect. Looks like we've
got:

- CPUs that are good at converting to double
- 64-bit, so the size of a double matches
- only 16 bytes of memory in total

With double struct fields the 'double' case gains 50% speed
for me, making it the overall fastest now (on LDC). I'd still
bet a dollar that with an array of values floats would
outperform doubles, when cache misses happen. (E.g. more or
less random memory access.)

-- 
Marco



Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-13 Thread FG

Good point about choosing the right type of floating point numbers.
Conclusion: when there's enough space, always pick double over float.
Tested with GDC in win64. floats: 16.0s / doubles: 14.1s / reals: 11.2s.
I thought to myself: cool, I almost beat the 13.4s I got with C++, until I 
changed the C++ code to also use doubles and... got a massive speedup: 7.1s!


Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-13 Thread Marco Leise
On Wed, 13 Feb 2013 15:45:13 +0100,
Joseph Rushton Wakeling wrote:

> On 02/13/2013 03:29 PM, Marco Leise wrote:
> > They are actual storage in memory, where every increase in
> > size hurts.
> 
> When I replaced with TReal, it sped things up for double.

Give me that stuff, your northbridge is on!
But I still want to rule out the LLVM version, since GDC seems
to produce code with similar runtime on both our systems, but
LDC2 diverges so much.

-- 
Marco



Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-13 Thread Joseph Rushton Wakeling

On 02/13/2013 03:56 PM, Marco Leise wrote:

Ok, I get pretty much the same numbers as before with:
   ldmd2 -O -inline -release
It's even a bit faster than my long command line.


My experience has been that the higher -O values of ldc don't do much, but of 
course, that's going to vary depending on your code.  I think above -O3 it's all 
link-time, no?



Do these numbers tell us, that there are such huge differences
in the handling of floating point value between different
AMD64 CPUs? I can't quite make a rhyme of it yet.


AMD vs Intel might make a difference (my machine is an i7).


What version of LLVM are you using, mine is 3.1. 3.0 is
minimum and 3.2 is recommended for LDC2.


LLVM 3.2.


_THAT_ I can reproduce with GDC! :

D code serial with dimension 32768 ...
   using floats Total time: 24.415 [sec]
   using doubles Total time: 23.268 [sec]
   using reals Total time: 25.168 [sec]

It's the exact same pattern.


I've never, EVER had ldc-compiled code run four times faster than GDC-compiled 
code.  In fact, I don't think I've ever had LDC-compiled code run faster than 
GDC-compiled code at all, except where the choice of optimizations was 
different.  That's what makes me concerned that there's some kind of bug in play 
here 




Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-13 Thread Marco Leise
On Wed, 13 Feb 2013 15:00:21 +0100,
Joseph Rushton Wakeling wrote:

> Compiling with ldmd2 -O -inline -release on 64-bit Ubuntu, latest from-GitHub 
> LDC, LLVM 3.2:
> 
> D code serial with dimension 32768 ...
>   using floats Total time: 4.751 [sec]
>   using doubles Total time: 4.362 [sec]
>   using reals Total time: 5.95 [sec]

Ok, I get pretty much the same numbers as before with:
  ldmd2 -O -inline -release
It's even a bit faster than my long command line.
Do these numbers tell us, that there are such huge differences
in the handling of floating point value between different
AMD64 CPUs? I can't quite make a rhyme of it yet.
What version of LLVM are you using, mine is 3.1. 3.0 is
minimum and 3.2 is recommended for LDC2.

> Using double is indeed marginally faster than float, but real is slower than 
> both.
> 
> What's disturbing is that when compiled instead with gdmd -O -inline -release 
> the code is dramatically slower:
> 
> D code serial with dimension 32768 ...
>   using floats Total time: 22.108 [sec]
>   using doubles Total time: 21.203 [sec]
>   using reals Total time: 23.717 [sec]
> 
> It's the first time I've encountered such a dramatic difference between GDC 
> and 
> LDC, and I'm wondering whether it's down to a bug or some change between D 
> releases 2.060 and 2.061.

_THAT_ I can reproduce with GDC! :

D code serial with dimension 32768 ...
  using floats Total time: 24.415 [sec]
  using doubles Total time: 23.268 [sec]
  using reals Total time: 25.168 [sec]

It's the exact same pattern.

-- 
Marco



Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-13 Thread Joseph Rushton Wakeling

On 02/13/2013 03:29 PM, Marco Leise wrote:

They are actual storage in memory, where every increase in
size hurts.


When I replaced with TReal, it sped things up for double.



Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-13 Thread Marco Leise
On Wed, 13 Feb 2013 14:48:21 +0100,
Joseph Rushton Wakeling wrote:

> On 02/13/2013 02:26 PM, Marco Leise wrote:
> > You get both, 50% more speed and more precision! It is a
> > win-win situation. Also take a look at Phobos' std.math that
> > returns real everywhere.
> 
> I have to say, it's not been my experience that using real improves speed. 
> Exactly what optimizations are you using when compiling?

The target is Linux, AMD64 and the compiler arguments are:

ldc2 -O5 -check-printf-calls -fdata-sections -ffunction-sections -release 
-singleobj -strip-debug -wi -L=--gc-sections -L=-s

-- 
Marco



Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-13 Thread Marco Leise
On Wed, 13 Feb 2013 14:44:36 +0100,
FG wrote:

> On 2013-02-13 14:26, Marco Leise wrote:
> > template Julia(TReal)
> > {
> > struct ComplexStruct
> > {
> > float r;
> > float i;
> > ... 
> 
> Why aren't r and i of type TReal?

They are actual storage in memory, where every increase in
size hurts. And they cannot be optimized away,
like temporary reals, which can be kept on the FPU stack.

-- 
Marco



Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-13 Thread Joseph Rushton Wakeling

On 02/13/2013 02:26 PM, Marco Leise wrote:

I compiled with LDC2 and these are the results:

D code serial with dimension 32768 ...
   using floats  Total time: 13.399 [sec]
   using doubles Total time:  9.429 [sec]
   using reals   Total time:  8.909 [sec] // <- !!!

You get both, 50% more speed and more precision!


Compiling with ldmd2 -O -inline -release on 64-bit Ubuntu, latest from-GitHub 
LDC, LLVM 3.2:


D code serial with dimension 32768 ...
  using floats Total time: 4.751 [sec]
  using doubles Total time: 4.362 [sec]
  using reals Total time: 5.95 [sec]

Using double is indeed marginally faster than float, but real is slower than 
both.

What's disturbing is that when compiled instead with gdmd -O -inline -release 
the code is dramatically slower:


D code serial with dimension 32768 ...
  using floats Total time: 22.108 [sec]
  using doubles Total time: 21.203 [sec]
  using reals Total time: 23.717 [sec]

It's the first time I've encountered such a dramatic difference between GDC and 
LDC, and I'm wondering whether it's down to a bug or some change between D 
releases 2.060 and 2.061.


Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-13 Thread Joseph Rushton Wakeling

On 02/13/2013 02:26 PM, Marco Leise wrote:

You get both, 50% more speed and more precision! It is a
win-win situation. Also take a look at Phobos' std.math that
returns real everywhere.


I have to say, it's not been my experience that using real improves speed. 
Exactly what optimizations are you using when compiling?




Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-13 Thread FG

On 2013-02-13 14:26, Marco Leise wrote:

template Julia(TReal)
{
struct ComplexStruct
{
float r;
float i;
... 


Why aren't r and i of type TReal?


Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-13 Thread Marco Leise
I like optimization challenges. This is an excellent test
program to check the effect of different floating point types
on intermediate values. Remember that when you store values in
a float variable, the FPU actually has to round it down to
that precision, store it in a 32-bit memory location, then
load it back in and expand it - you _asked_ for that.
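
The round-store-reload step described above can be made visible with a small C++ sketch. It assumes IEEE 754 single and double precision with the default round-to-nearest mode; `roundTripThroughFloat` is a hypothetical helper name used only for illustration.

```cpp
#include <cassert>

// Narrows a double to float and widens it back. The narrowing step
// rounds to float's 24-bit significand; the widening step is exact,
// but the bits dropped by the rounding are not recovered.
double roundTripThroughFloat(double x) {
    float stored = static_cast<float>(x);
    return static_cast<double>(stored);
}
```

For example, 16777217.0 (2^24 + 1) is not representable as a float, so the round trip yields 16777216.0, while a value such as 0.5 survives unchanged. Each such narrowing in the inner loop is work the programmer asked for by choosing float storage.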

I compiled with LDC2 and these are the results:

D code serial with dimension 32768 ...
  using floats  Total time: 13.399 [sec]
  using doubles Total time:  9.429 [sec]
  using reals   Total time:  8.909 [sec] // <- !!!

You get both, 50% more speed and more precision! It is a
win-win situation. Also take a look at Phobos' std.math that
returns real everywhere.

Modified code:
---8<---

module main;

import std.datetime;
import std.metastrings;
import std.stdio;
import std.typetuple;


enum DIM = 32 * 1024;

int juliaValue;

template Julia(TReal)
{
struct ComplexStruct
{
float r;
float i;

TReal squarePlusMag(const ComplexStruct another)
{
TReal r1 = r*r - i*i + another.r;
TReal i1 = 2.0*i*r + another.i;

r = r1;
i = i1;

return (r1*r1 + i1*i1);
}
}

int juliaFunction( int x, int y )
{
auto c = ComplexStruct(0.8, 0.156);
auto a = ComplexStruct(x, y);

foreach (i; 0 .. 200)
if (a.squarePlusMag(c) > 1000)
return 0;
return 1;
}

void kernel()
{
foreach (x; 0 .. DIM) {
foreach (y; 0 .. DIM) {
juliaValue = juliaFunction( x, y );
}
}
}
}

void main()
{
writeln("D code serial with dimension " ~ toStringNow!DIM ~ " ...");
StopWatch sw;
foreach (Math; TypeTuple!(float, double, real))
{
sw.start();
Julia!(Math).kernel();
sw.stop();
writefln("  using %ss Total time: %s [sec]",
 Math.stringof, (sw.peek().msecs * 0.001));
sw.reset();
}
}

--->8---

-- 
Marco



Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-13 Thread Joseph Rushton Wakeling

On 02/12/2013 11:17 PM, FG wrote:

Winblows and DMD 32-bit, the rest 64-bit, but still, dmd was quite fast.
Interesting how gdc -O3 gave no extra boost vs. -O2.


... try adding -frelease to the gdc call?



Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-12 Thread Rob T
Well technically it was that much faster because it did optimize 
away the useless calc

On Tuesday, 12 February 2013 at 23:31:17 UTC, FG wrote:

On 2013-02-13 00:06, Sparsh Mittal wrote:


I had a look, but first had to make juliaValue global, 
because g++ had

optimized all the calculations away.


Brilliant! Yes, that is why the time was coming out to be 
zero, regardless of

what value of DIM I put. Thank you very very much.


LOL. For a while you thought that C++ could be that much faster 
than D?  :D


Well technically it's not that C++ is faster than D or 
vice versa, it's that the two compilers did different 
optimizations, and in this case one of the optimizations that g++ 
did (removing redundancies) had a large effect on the outcome. 
It's entirely possible that DMD can still beat g++ under 
different circumstances.


--rt


Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-12 Thread Sparsh Mittal


LOL. For a while you thought that C++ could be that much faster 
than D?  :D
I was stunned and shared it with others, who could not find the 
cause. It was like a scientist discovering a phenomenon which goes 
against established laws. Good that I was wrong and the right 
person pointed it out.






Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-12 Thread FG

On 2013-02-13 00:06, Sparsh Mittal wrote:



I had a look, but first had to make juliaValue global, because g++ had
optimized all the calculations away.


Brilliant! Yes, that is why the time was coming out to be zero, regardless of
what value of DIM I put. Thank you very very much.


LOL. For a while you thought that C++ could be that much faster than D?  :D


Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-12 Thread Sparsh Mittal


I had a look, but first had to make juliaValue global, because 
g++ had optimized all the calculations away.


Brilliant! Yes, that is why the time was coming out to be zero, 
regardless of what value of DIM I put. Thank you very very much.


Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-12 Thread FG

On 2013-02-12 21:39, Sparsh Mittal wrote:

I am finding C++ code is much faster than D code.


I had a look, but first had to make juliaValue global, because g++ had optimized 
all the calculations away. :)  Also changed DIM to 32 * 1024.


13.2s -- g++ -O3
16.0s -- g++ -O2
15.9s -- gdc -O3
15.9s -- gdc -O2
16.2s -- dmd -O -release -inline (v.2.060)

Winblows and DMD 32-bit, the rest 64-bit, but still, dmd was quite fast.
Interesting how gdc -O3 gave no extra boost vs. -O2.
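
The fix described above (making juliaValue global) can be sketched in C++ as follows. This is an illustration under assumptions, not the exact benchmark source: the kernel takes the dimension as a parameter so it can be run at small sizes, and the constants follow the C++ listing in the original post. Storing into a global makes the result observable outside the function, which stopped g++ from deleting the loop here, but it is not a general guarantee; printing the value after the loop, as done elsewhere in this thread, is a stronger check.

```cpp
#include <cassert>

// Global sink: the result of the computation stays observable, so the
// compiler cannot treat the whole kernel as dead code.
int juliaValue;

// Returns 0 if the point escapes within 200 iterations, 1 otherwise.
int juliaFunction(int x, int y) {
    float r = static_cast<float>(x);
    float i = static_cast<float>(y);
    const float cr = -0.8f;
    const float ci = 0.156f;
    for (int k = 0; k < 200; ++k) {
        float r1 = r * r - i * i + cr;
        float i1 = 2.0f * i * r + ci;
        r = r1;
        i = i1;
        if (r1 * r1 + i1 * i1 > 1000.0f) {
            return 0;
        }
    }
    return 1;
}

// Runs the benchmark loop over a dim x dim grid, keeping the last result.
void kernel(int dim) {
    assert(dim > 0);
    for (int x = 0; x < dim; ++x) {
        for (int y = 0; y < dim; ++y) {
            juliaValue = juliaFunction(x, y);
        }
    }
}
```

With a local variable in place of the global, an optimizer is free to conclude that nothing depends on the loop and remove it, which matches the near-zero C++ timings reported at the start of the thread.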



Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-12 Thread Sparsh Mittal

Thanks for your insights. It was very helpful.




Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-12 Thread Dmitry Olshansky

13-Feb-2013 01:09, Sparsh Mittal wrote:

Pardon me, can you please point me to a suitable reference or just tell me
the command here? Searching on Google, I could not find anything yet.
Performance is my main concern.




GDC: seems like it's mostly a "build from source" kind of thing.
Moved to GitHub:
https://github.com/D-Programming-GDC
(See also newsgroup digitalmars.d.D.gnu)

GDC binaries for Windows TDM-GCC toolchain are still available there:
https://bitbucket.org/goshawk/gdc/downloads

AFAIK it needs 4.6.1 version of TDM toolset.


LDC(2), recent release with binaries.

https://github.com/downloads/ldc-developers/ldc/ldc-0.10.0-src.tar.gz
https://github.com/downloads/ldc-developers/ldc/ldc2-0.10.0-linux-x86_64.tar.gz
https://github.com/downloads/ldc-developers/ldc/ldc2-0.10.0-linux-x86_64.tar.xz
https://github.com/downloads/ldc-developers/ldc/ldc2-0.10.0-linux-x86.tar.gz
https://github.com/downloads/ldc-developers/ldc/ldc2-0.10.0-linux-x86.tar.xz
https://github.com/downloads/ldc-developers/ldc/ldc2-0.10.0-osx-x86_64.tar.gz
https://github.com/downloads/ldc-developers/ldc/ldc2-0.10.0-osx-x86_64.tar.xz 



(See also announce on the newsgroup digitalmars.d.D.ldc)

Both compilers ship a dmd-style compiler driver called gdmd or ldmd2.
Speed is mostly what you'd expect of GCC and LLVM respectively.

--
Dmitry Olshansky


Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-12 Thread Sparsh Mittal

OK. I found it.



Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-12 Thread H. S. Teoh
On Wed, Feb 13, 2013 at 12:56:01AM +0400, Dmitry Olshansky wrote:
> 13-Feb-2013 00:39, Sparsh Mittal wrote:
> >I am finding C++ code is much faster than D code.
> 
> Seems like DMD's floating point issue. The issue being that it
> always works with floats as full-width reals + rounding. Basically
> if nothing changed (and I doubt it changed) then  DMD with floating
> point code is about two (or more) times slower than GDC/LDC.
> 
> The cure is using GDC/LDC compiler as they are pretty stable and up
> to date on the front-end side these days.
[...]

I did a few benchmarks somewhat recently where I compared the
performance of code produced by GDC with DMD. Code produced by GDC
consistently outperforms code produced by DMD by about 20-30% or so.
This is across the board, with both floats, reals, and applications that
don't do heavy arithmetic (just basic looping/recursion constructs).

I didn't investigate in detail the cause of this difference, but the
last time I looked at the assembly code generated by both compilers, I
noticed that GDC's optimizer is far more advanced than DMD's, esp. when
it comes to loop-unrolling, strength reduction, inlining, etc.. For
non-trivial code, GDC pretty much consistently produces superior code in
general (not just in floating-point operations).

So if performance is a concern, I'd say definitely look into GDC or LDC
instead of DMD.


T

-- 
Two wrongs don't make a right; but three rights do make a left...


Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-12 Thread Sparsh Mittal
Pardon me, can you please point me to a suitable reference or just 
tell me the command here? Searching on Google, I could not find 
anything yet. Performance is my main concern.






Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-12 Thread Dmitry Olshansky

13-Feb-2013 00:39, Sparsh Mittal wrote:

I am finding C++ code is much faster than D code.


Seems like DMD's floating point issue. The issue being that it always 
works with floats as full-width reals + rounding. Basically if nothing 
changed (and I doubt it changed) then  DMD with floating point code is 
about two (or more) times slower than GDC/LDC.


The cure is using GDC/LDC compiler as they are pretty stable and up to 
date on the front-end side these days.


--
Dmitry Olshansky


Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-12 Thread monarch_dodra

On Tuesday, 12 February 2013 at 20:39:36 UTC, Sparsh Mittal wrote:

I am finding C++ code is much faster than D code.


dmd (AFAIK) is known to be slower. Try LDC or GDC if speed is 
your major concern.


Re: Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-12 Thread Sparsh Mittal

I am finding C++ code is much faster than D code.


Finding large difference b/w execution time of c++ and D codes for same problem

2013-02-12 Thread Sparsh Mittal
I am writing a Julia set program in C++ and in D, keeping the two 
versions as similar as possible. On executing them I find a large 
difference in their execution times. Can you comment on what I am 
doing wrong, or is this expected?



//===C++ code, compiled with -O3 ==
#include 
#include 
using namespace std;
const  int DIM= 4194304;

struct complexClass {
  float r;
  float i;
  complexClass( float a, float b )
  {
r = a;
i = b;
  }


  float squarePlusMag(complexClass another)
  {
float r1 = r*r - i*i + another.r;
float i1 = 2.0*i*r + another.i;

r = r1;
i = i1;

return (r1*r1+ i1*i1);
  }
};


int juliaFunction( int x, int y )
{

  complexClass a (x,y);

   complexClass c(-0.8, 0.156);

  int i = 0;

  for (i=0; i<200; i++) {
   if( a.squarePlusMag(c) > 1000)
  return 0;
  }

  return 1;
}


void kernel(  ){
  for (int x=0; x  cout<<" C++ code with dimension " << DIM <<" Total time: "<< 
delta << "[sec]\n";

}






//=D code, compiled with -O -release -inline=


#!/usr/bin/env rdmd
import std.stdio;
import std.datetime;
immutable int DIM= 4194304;


struct complexClass {
  float r;
  float i;

  float squarePlusMag(complexClass another)
  {
float r1 = r*r - i*i + another.r;
float i1 = 2.0*i*r + another.i;

r = r1;
i = i1;

return (r1*r1+ i1*i1);
  }
};


int juliaFunction( int x, int y )
{

  complexClass c = complexClass(0.8, 0.156);
  complexClass a= complexClass(x, y);


  for (int i=0; i<200; i++) {

if( a.squarePlusMag(c) > 1000)
  return 0;
  }
  return 1;
}


void kernel(  ){
  for (int x=0; x  writeln(" D code serial with dimension ", DIM ," Total time: ", 
(sw.peek().msecs/1000), "[sec]");

}

//
I will appreciate any help.
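
A note on the timing printout: the D listing above prints `sw.peek().msecs/1000`, which is an integer division and so reports whole seconds only; the later versions in this thread multiply by 0.001 instead. A minimal C++ sketch of the same distinction, assuming `std::chrono` is used for timing (the helper names `wholeSeconds` and `fractionalSeconds` are hypothetical):

```cpp
#include <cassert>
#include <chrono>

// Integer division: the fractional part of the elapsed time is discarded.
long long wholeSeconds(std::chrono::milliseconds elapsed) {
    return elapsed.count() / 1000;
}

// Converting to a floating-point duration keeps the fraction.
double fractionalSeconds(std::chrono::milliseconds elapsed) {
    return std::chrono::duration<double>(elapsed).count();
}
```

For an elapsed time of 1500 ms the first helper gives 1 and the second gives 1.5, so a run shorter than one second would print as 0 with the integer form even if the work was not optimized away.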