Re: D code optimization

2016-09-22 Thread Guillaume Piolat via Digitalmars-d-learn

Hi,

Interesting question. I took your examples and made them do 
the same thing with regard to allocation (using malloc instead 
of new in both languages).

I removed the stopwatch and used "time" instead.
Now the programs should do exactly the same thing. Will they be 
equally fast?



D code:

 bench.d

import std.stdio, std.math;
import core.stdc.stdlib;
import core.stdc.stdio;

int main() {

    double C=0.0;

    for (int k=0;k<1;++k) { // iterate 1000x

        double S0 = 100.0;
        double r = 0.03;
        double alpha = 0.07;
        double sigma = 0.2;
        double T = 1.0;
        double strike = 100.0;
        double S = 0.0;


        const int n = 252;

        double dt = T / n;
        double R = exp(r*dt);

        double u = exp(alpha*dt + sigma*sqrt(dt));
        double d = exp(alpha*dt - sigma*sqrt(dt));

        double qU = (R - d) / (R*(u - d));
        double qD = (1 - R*qU) / R;

        double* call = cast(double*)malloc(double.sizeof * (n+1));

        for (int i = 0; i <= n; ++i)
            call[i] = fmax(S0*pow(u, n-i)*pow(d, i)-strike, 0.0);


        for (int i = n-1; i >= 0 ; --i) {
            for (int j = 0; j <= i; ++j) {
                call[j] = qU * call[j] + qD * call[j+1];
            }
        }

        C = call[0];
    }
    printf("%f\n", C);

    return 0;
}




C++ code:


 bench.cpp

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main() {

    double C=0.0;

    for (int k=0;k<1;++k) { // iterate 1000x

        double S0 = 100.0;
        double r = 0.03;
        double alpha = 0.07;
        double sigma = 0.2;
        double T = 1.0;
        double strike = 100.0;
        double S = 0.0;


        const int n = 252;

        double dt = T / n;
        double R = exp(r*dt);

        double u = exp(alpha*dt + sigma*sqrt(dt));
        double d = exp(alpha*dt - sigma*sqrt(dt));

        double qU = (R - d) / (R*(u - d));
        double qD = (1 - R*qU) / R;

        double* call = (double*)malloc(sizeof(double) * (n+1));

        for (int i = 0; i <= n; ++i)
            call[i] = fmax(S0*pow(u, n-i)*pow(d, i)-strike, 0.0);


        for (int i = n-1; i >= 0 ; --i) {
            for (int j = 0; j <= i; ++j) {
                call[j] = qU * call[j] + qD * call[j+1];
            }
        }

        C = call[0];
    }
    printf("%f\n", C);

    return 0;
}




Here is the bench script:


 bench.sh

#!/bin/sh
ldc2 -O2 bench.d
clang++ -O2 bench.cpp -o bench-cpp;
time ./bench
time ./bench-cpp
time ./bench
time ./bench-cpp
time ./bench
time ./bench-cpp
time ./bench
time ./bench-cpp





Note that I use clang-703.0.31, which comes with Xcode 7.3 and is 
based on LLVM 3.8.0 from what I can gather.
I'm using ldc 1.0.0-b2, which is at LLVM 3.8.0 too! So the backend 
may well be out of the equation.



The results at -O2 (minimum of 4 samples):

// C++
real    0m0.484s
user    0m0.466s
sys     0m0.011s

// D
real    0m0.390s
user    0m0.373s
sys     0m0.012s


Why is the D code about 1.24x as fast as the C++ code (0.390 s vs 
0.484 s) if they do the same thing?

Well, I don't know; I haven't analyzed it further.
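The figures above are the minimum of 4 runs of the shell's "time", which also counts process startup and teardown. If you want the same min-of-N measurement from inside the program, here is a minimal C++ sketch using `std::chrono::steady_clock`. The helper name `min_time_ms` is hypothetical and does not come from the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Call `fn` `samples` times and return the fastest wall-clock duration in
// milliseconds. Taking the minimum filters out runs disturbed by other
// activity on the machine; measuring in-process excludes process startup.
template <class F>
double min_time_ms(F&& fn, int samples) {
    assert(samples > 0);
    double best = std::numeric_limits<double>::infinity();
    for (int s = 0; s < samples; ++s) {
        const auto t0 = std::chrono::steady_clock::now();  // monotonic clock
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(t1 - t0).count();
        best = std::min(best, ms);
    }
    return best;
}
```

For example, you could pass it a lambda containing the body of the `k` loop. Make sure the result ends up somewhere observable (printed, or stored to a `volatile`), or the optimizer may drop the work entirely.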








Re: D code optimization

2016-09-22 Thread Jonathan Marler via Digitalmars-d-learn

On Thursday, 22 September 2016 at 16:09:49 UTC, Sandu wrote:

> It is often claimed that D is at least as fast as C++.
> Now, I am fairly new to D, but here is an example where I want 
> to see how this can be made possible.
>
> So far my C++ code compiles in ~850 ms.
> While my D code runs in about 2.1 seconds.


Can you include the C++ source code, the C++ compiler command 
line, and the D compiler command line?





Re: D code optimization

2016-09-22 Thread thedeemon via Digitalmars-d-learn

On Thursday, 22 September 2016 at 16:09:49 UTC, Sandu wrote:

> const int n = 252;
> double[] call = new double[n+1];
> ...
> //delete call; // since D has a garbage collector, explicit deallocation of arrays is not necessary.


If you care about speed, you'd better uncomment that `delete`. 
Without it, allocating this array on every iteration of the outer 
loop triggers the GC multiple times for no good reason. With 
`delete`, the same memory is reused, no collection is triggered, 
and the run time should be much better.
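The C++ counterpart of this advice is to allocate the buffer once, outside the hot loop, and reuse it on every iteration. A minimal sketch under that assumption, using `std::vector` with the thread's parameters hard-coded; `binomial_call` and `run` are hypothetical names, not from the thread.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Price the option from the thread's code, using caller-owned storage.
// `call` is grown on the first use only; later calls find it already at the
// right size, so resize() does not reallocate and the same memory is reused.
double binomial_call(std::vector<double>& call) {
    const int n = 252;
    const double S0 = 100.0, r = 0.03, alpha = 0.07, sigma = 0.2;
    const double T = 1.0, strike = 100.0;

    const double dt = T / n;
    const double R = std::exp(r * dt);
    const double u = std::exp(alpha * dt + sigma * std::sqrt(dt));
    const double d = std::exp(alpha * dt - sigma * std::sqrt(dt));
    const double qU = (R - d) / (R * (u - d));
    const double qD = (1 - R * qU) / R;

    call.resize(n + 1);  // no-op once the vector already holds n + 1 elements
    for (int i = 0; i <= n; ++i)
        call[i] = std::fmax(S0 * std::pow(u, n - i) * std::pow(d, i) - strike, 0.0);
    for (int i = n - 1; i >= 0; --i)
        for (int j = 0; j <= i; ++j)
            call[j] = qU * call[j] + qD * call[j + 1];
    return call[0];
}

// Hypothetical driver: the buffer is allocated once (on the first call)
// and reused by every later iteration.
double run(int iterations) {
    assert(iterations > 0 && "need at least one iteration to produce a price");
    std::vector<double> call;
    double C = 0.0;
    for (int k = 0; k < iterations; ++k) C = binomial_call(call);
    return C;
}
```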


Re: D code optimization

2016-09-22 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Sep 22, 2016 at 04:09:49PM +0000, Sandu via Digitalmars-d-learn wrote:
> It is often claimed that D is at least as fast as C++.
> Now, I am fairly new to D, but here is an example where I want to see
> how this can be made possible.
> 
> So far my C++ code compiles in ~850 ms.
> While my D code runs in about 2.1 seconds.
[...]

Which compiler are you using?

If you're looking for performance, you should use gdc or ldc, as they
have better optimizers. While dmd is the most up-to-date in terms of
language implementation, I've found that the code it generates
consistently performs about 20-30% slower than code generated by gdc
(sometimes even more, depending on what the program does).


T

-- 
Live a century, learn a century; and still you'll die a fool.


Re: D code optimization

2016-09-22 Thread Brad Anderson via Digitalmars-d-learn

On Thursday, 22 September 2016 at 16:09:49 UTC, Sandu wrote:

> It is often claimed that D is at least as fast as C++.
> Now, I am fairly new to D, but here is an example where I want 
> to see how this can be made possible.
>
> So far my C++ code compiles in ~850 ms.
> While my D code runs in about 2.1 seconds.

[snip]


Just a small tip that applies to both D and C++ in that code: you 
can use a static array rather than a dynamically allocated array 
in the loop (in D, "enum n = 252;" and then "double[n+1] call;"). 
You can also write "double[n+1] call = void;" to mimic C++'s 
behavior of leaving the memory uninitialized.
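In C++, the equivalent of this tip is a fixed-size local array whose size is a compile-time constant. A minimal sketch, assuming the tree size is known at compile time; `binomial_call_fixed` is a hypothetical name, and `std::array` stands in for the D static array.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Fixed-size variant of the thread's kernel. Because N is a compile-time
// constant, the buffer is a local std::array on the stack: no heap
// allocation, no GC, nothing to free.
template <int N>
double binomial_call_fixed() {
    static_assert(N > 0, "the tree needs at least one step");
    const double S0 = 100.0, r = 0.03, alpha = 0.07, sigma = 0.2;
    const double T = 1.0, strike = 100.0;

    const double dt = T / N;
    const double R = std::exp(r * dt);
    const double u = std::exp(alpha * dt + sigma * std::sqrt(dt));
    const double d = std::exp(alpha * dt - sigma * std::sqrt(dt));
    const double qU = (R - d) / (R * (u - d));
    const double qD = (1 - R * qU) / R;

    // Default-initialized, so the elements start out indeterminate, much like
    // D's `double[n+1] call = void;`. Every element is written before it is read.
    std::array<double, N + 1> call;
    for (int i = 0; i <= N; ++i)
        call[i] = std::fmax(S0 * std::pow(u, N - i) * std::pow(d, i) - strike, 0.0);
    for (int i = N - 1; i >= 0; --i)
        for (int j = 0; j <= i; ++j)
            call[j] = qU * call[j] + qD * call[j + 1];
    assert(call[0] >= 0.0);  // a call option's value can never be negative
    return call[0];
}
```

Usage: `binomial_call_fixed<252>()` prices the same tree as the posted code.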


Use GDC or LDC for performance-related work, as they typically 
generate faster code. I'd be surprised if the generated asm for 
the C++ and D code weren't nearly identical for a big chunk of 
this code when using GCC/GDC or Clang/LDC.


Re: D code optimization

2016-09-22 Thread Lodovico Giaretta via Digitalmars-d-learn

On Thursday, 22 September 2016 at 16:09:49 UTC, Sandu wrote:

> It is often claimed that D is at least as fast as C++.
> Now, I am fairly new to D, but here is an example where I want 
> to see how this can be made possible.
>
> So far my C++ code compiles in ~850 ms.


I assume you meant that it runs in that time.


> While my D code runs in about 2.1 seconds.


Benchmarking C++ vs D is less trivial than it looks, for various 
reasons:

- compiler optimizations:
  - which compilers (both C++ and D) are you using? Are you aware 
    of the differences in code optimization between DMD, GDC and LDC?
  - which flags are you passing to your C++ and D compilers?
  - your code is actually testing the compilers' ability at loop 
    unrolling, constant folding and hoisting operations out of loops
- code semantics: when C++ and D code look similar, they usually 
  produce the same results, but they often behave very differently 
  internally:
  - in the posted code you allocate a lot of GC-managed memory, 
    putting a big burden on the garbage collector; in C++ you don't, 
    because you talk directly to the C runtime

So it's difficult to extract useful data from this kind of 
benchmark.
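One common way to keep a micro-benchmark like this from measuring constant folding or loop hoisting instead of the arithmetic is to read the inputs through `volatile` objects and write the result to a `volatile` sink. This is a general technique, not something proposed in the thread; the names below (`bench`, `g_sink`, and so on) are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

namespace {
// Inputs and output live in volatile globals. Every read of a volatile object
// must actually happen at run time, so the compiler can neither evaluate the
// price at compile time nor assume that iteration k computes the same thing
// as iteration k-1 (which would let it hoist the work out of the loop).
volatile double g_S0 = 100.0, g_r = 0.03, g_alpha = 0.07;
volatile double g_sigma = 0.2, g_T = 1.0, g_strike = 100.0;
volatile double g_sink = 0.0;  // stores to it cannot be removed as dead code
}  // namespace

double binomial_call(double S0, double r, double alpha, double sigma,
                     double T, double strike, int n) {
    assert(n > 0);
    const double dt = T / n;
    const double R = std::exp(r * dt);
    const double u = std::exp(alpha * dt + sigma * std::sqrt(dt));
    const double d = std::exp(alpha * dt - sigma * std::sqrt(dt));
    const double qU = (R - d) / (R * (u - d));
    const double qD = (1 - R * qU) / R;

    std::vector<double> call(n + 1);
    for (int i = 0; i <= n; ++i)
        call[i] = std::fmax(S0 * std::pow(u, n - i) * std::pow(d, i) - strike, 0.0);
    for (int i = n - 1; i >= 0; --i)
        for (int j = 0; j <= i; ++j)
            call[j] = qU * call[j] + qD * call[j + 1];
    return call[0];
}

// Repeat the pricing `iterations` times, re-reading the inputs on every pass.
double bench(int iterations, int n) {
    for (int k = 0; k < iterations; ++k)
        g_sink = binomial_call(g_S0, g_r, g_alpha, g_sigma, g_T, g_strike, n);
    return g_sink;
}
```

`volatile` only forces the loads and stores at the boundary; the optimizer is still free to unroll or vectorize the inner loops, which is the code generation you actually want to compare between compilers.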


D code optimization

2016-09-22 Thread Sandu via Digitalmars-d-learn

It is often claimed that D is at least as fast as C++.
Now, I am fairly new to D, but here is an example where I want 
to see how this can be made possible.


So far my C++ code compiles in ~850 ms.
While my D code runs in about 2.1 seconds.

The code translated to D looks as follows (I can't see any attach 
button here):


import std.stdio, std.math;
import std.datetime;


int main() {

    StopWatch sw;
    sw.start();

    double C=0.0;

    for (int k=0;k<1;++k) { // iterate 1000x

        double S0 = 100.0;
        double r = 0.03;
        double alpha = 0.07;
        double sigma = 0.2;
        double T = 1.0;
        double strike = 100.0;
        double S = 0.0;


        const int n = 252;

        double dt = T / n;
        double R = exp(r*dt);

        double u = exp(alpha*dt + sigma*sqrt(dt));
        double d = exp(alpha*dt - sigma*sqrt(dt));

        double qU = (R - d) / (R*(u - d));
        double qD = (1 - R*qU) / R;


        //double* call = new double [n + 1];
        double[] call = new double[n+1];

        for (int i = 0; i <= n; ++i)
            call[i] = fmax(S0*pow(u, n-i)*pow(d, i)-strike, 0.0);


        for (int i = n-1; i >= 0 ; --i) {
            for (int j = 0; j <= i; ++j) {
                call[j] = qU * call[j] + qD * call[j+1];
            }
        }

        C = call[0];

        //delete call; // since D has a garbage collector, explicit deallocation of arrays is not necessary.

        // nevertheless we do this
    }

    long exec_ms = sw.peek().msecs;

    writeln("Option value: ",  C, " / execution time: ", exec_ms,
            " ms\n" );


    return 0;
}
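For reference, here is what the loops compute, as read from the code: a backward induction on a recombining binomial tree. K is `strike` and the other symbols are the code's variables. Since p = R q_U = (R - d)/(u - d) is the risk-neutral up-move probability, q_U and q_D are the discounted up and down probabilities, which is why they sum to 1/R.

```latex
\Delta t = \frac{T}{n},\quad R = e^{r\,\Delta t},\quad
u = e^{\alpha\,\Delta t + \sigma\sqrt{\Delta t}},\quad
d = e^{\alpha\,\Delta t - \sigma\sqrt{\Delta t}}

q_U = \frac{R - d}{R\,(u - d)},\qquad
q_D = \frac{1 - R\,q_U}{R}
\quad\Longrightarrow\quad q_U + q_D = \frac{1}{R}

C^{(n)}_i = \max\!\left(S_0\,u^{\,n-i}\,d^{\,i} - K,\; 0\right),
\qquad i = 0,\dots,n

C^{(i)}_j = q_U\,C^{(i+1)}_j + q_D\,C^{(i+1)}_{j+1},
\qquad j = 0,\dots,i,\quad i = n-1,\dots,0

\text{option value} = C^{(0)}_0
```

The in-place update `call[j] = qU * call[j] + qD * call[j+1]` is safe because, with j increasing, `call[j+1]` still holds the level-(i+1) value when `call[j]` is overwritten.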