Re: drastic slowdown for copies

2015-05-29 Thread Momo via Digitalmars-d-learn

On Friday, 29 May 2015 at 07:51:31 UTC, thedeemon wrote:

On Thursday, 28 May 2015 at 21:23:11 UTC, Momo wrote:

Ah, actually it's more complicated, as it depends on inlining a 
lot.

Yes. And real functions are more complex and inlining is not a 
reliable option.

Indeed, without -O and -inline I was able to get by_ref to be 
slightly slower than by_copy for a struct of 4 ints. But when 
inlining is turned on, the numbers change in different directions. 
And for 5 ints the influence of inlining is quite different:


              4 ints   5 ints
-release
  by ref:         53       53
  by copy:        57      137
  by move:        54      137

-release -O
  by ref:         38       34
  by copy:        54      137
  by move:        49      137

-release -O -inline
  by ref:         15       20
  by copy:        72       91
  by move:        72       91


So as you can see, it is 2-3 times slower. Is there an 
alternative?


Re: drastic slowdown for copies

2015-05-29 Thread thedeemon via Digitalmars-d-learn

On Friday, 29 May 2015 at 07:51:31 UTC, thedeemon wrote:

The numbers above were measured on a Core 2 Quad; here are the 
results for a Core i3:

              4 ints   5 ints
-release
  by ref:         67       66
  by copy:        44      142
  by move:        45      137

-release -O
  by ref:         29       29
  by copy:        41      141
  by move:        40      142

-release -O -inline
  by ref:         16       20
  by copy:        83      104
  by move:        83      104


Re: drastic slowdown for copies

2015-05-29 Thread thedeemon via Digitalmars-d-learn

On Thursday, 28 May 2015 at 21:23:11 UTC, Momo wrote:

Ah, actually it's more complicated, as it depends on inlining a 
lot.
Indeed, without -O and -inline I was able to get by_ref to be 
slightly slower than by_copy for a struct of 4 ints. But when 
inlining is turned on, the numbers change in different directions. 
And for 5 ints the influence of inlining is quite different:


              4 ints   5 ints
-release
  by ref:         53       53
  by copy:        57      137
  by move:        54      137

-release -O
  by ref:         38       34
  by copy:        54      137
  by move:        49      137

-release -O -inline
  by ref:         15       20
  by copy:        72       91
  by move:        72       91



Re: drastic slowdown for copies

2015-05-29 Thread thedeemon via Digitalmars-d-learn

On Thursday, 28 May 2015 at 21:23:11 UTC, Momo wrote:
I'm currently investigating the difference in speed between 
references and copies. And it seems that copies suffer an immense 
slowdown once they reach a size of >= 20 bytes.


This is processor-specific; on different models of CPUs you might 
get different results. Here's what I see running your program 
with 4 and 5 ints in the struct:


C:\prog\D>dmd copyref.d -ofcopyref.exe -release -O -inline
16u

C:\prog\D>copyref.exe
by ref: 18
by copy: 85
by move: 84

C:\prog\D>copyref.exe
by ref: 18
by copy: 72
by move: 72

C:\prog\D>copyref.exe
by ref: 16
by copy: 72
by move: 72

C:\prog\D>dmd copyref.d -ofcopyref.exe -release -O -inline
20u

C:\prog\D>copyref.exe
by ref: 23
by copy: 98
by move: 91

C:\prog\D>copyref.exe
by ref: 20
by copy: 91
by move: 102

C:\prog\D>copyref.exe
by ref: 23
by copy: 91
by move: 91

I see these numbers on an old Core 2 Quad and very similar ones on 
a Core i3. So your findings are not reproducible.
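
The copyref.d program itself is not included in this thread, so the following is only a rough sketch of what such a benchmark might look like. The struct S, the functions byRef, byCopy and makeS, and the iteration count are all hypothetical names and values chosen for illustration. The "by move" case is assumed here to mean passing an rvalue to the by-value function. The pragma(msg) line prints the struct size at compile time; the exact suffix of that output depends on the target.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writeln;

// Hypothetical payload: 4 ints = 16 bytes. Add a fifth int for the 20-byte case.
struct S
{
    int a, b, c, d;
}

// Printed by the compiler during the build, not at run time.
pragma(msg, S.sizeof);

// Hypothetical workloads; the functions used in the thread are not shown there.
int byRef(ref const S s) { return s.a + s.b + s.c + s.d; }
int byCopy(const S s)    { return s.a + s.b + s.c + s.d; }

// Produces an rvalue, which stands in for the "by move" case.
S makeS(int i) { return S(i, i, i, i); }

void main()
{
    enum iterations = 10_000_000;
    auto s = S(1, 2, 3, 4);
    long sink; // accumulated and printed so the loops are not dead code

    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. iterations)
        sink += byRef(s);
    writeln("by ref:  ", sw.peek.total!"msecs");

    sw.reset();
    foreach (i; 0 .. iterations)
        sink += byCopy(s);
    writeln("by copy: ", sw.peek.total!"msecs");

    sw.reset();
    foreach (i; 0 .. iterations)
        sink += byCopy(makeS(i));
    writeln("by move: ", sw.peek.total!"msecs");

    writeln("checksum: ", sink);
}
```

As the tables in this thread show, the results depend heavily on the compiler flags and on the CPU, so any such sketch should be built with each flag combination and run several times.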


Re: drastic slowdown for copies

2015-05-29 Thread Momo via Digitalmars-d-learn

Perhaps you can give me another detailed answer.
I get a slowdown for all parts (ref, copy and move) if I use 
uninitialized floats. I got these results from the following code:


by ref:  2369
by copy: 2335
by move: 2341

Code:

struct vec2f {
    float x;
    float y;
}

But if I assign 0 to them I get these results:

by ref:  49
by copy: 22
by move: 25

Why?


Re: drastic slowdown for copies

2015-05-29 Thread Ali Çehreli via Digitalmars-d-learn

On 05/29/2015 06:55 AM, Momo wrote:

Perhaps you can give me another detailed answer.
I get a slowdown for all parts (ref, copy and move) if I use
uninitialized floats.


Floating point variables are initialized to .nan of their types (e.g. 
float.nan). Apparently, the CPU is slow when using those special values:



http://stackoverflow.com/questions/3606054/how-slow-is-nan-arithmetic-in-the-intel-x64-fpu

Ali
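
The default initialization described above can be checked with a small sketch. The struct names Vec2fDefault and Vec2fZero are hypothetical; the first mirrors the vec2f struct from the question, the second assumes that "assign 0" means giving the fields explicit initializers. The sketch only shows which values the fields start with; it makes no claim about timing.

```d
import std.math : isNaN;
import std.stdio : writeln;

// Same layout as vec2f in the question: no explicit initializers.
struct Vec2fDefault
{
    float x;
    float y;
}

// Hypothetical variant with fields explicitly set to zero.
struct Vec2fZero
{
    float x = 0;
    float y = 0;
}

void main()
{
    Vec2fDefault a; // fields take float.init, which is NaN
    Vec2fZero b;    // fields take the explicit initializer 0

    assert(isNaN(float.init));
    assert(isNaN(a.x) && isNaN(a.y));
    assert(b.x == 0 && b.y == 0);

    // Arithmetic on NaN yields NaN, so every operation in a
    // benchmark that starts from 'a' keeps working on NaN values.
    assert(isNaN(a.x + 1.0f));

    writeln("default: ", a.x, ", zeroed: ", b.x);
}
```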



Re: drastic slowdown for copies

2015-05-28 Thread Adam D. Ruppe via Digitalmars-d-learn
16 bytes is 64 bit - the same size as a reference. So copying it 
is overall a bit less work - sending a 64 bit struct is as small 
as a 64 bit reference and you don't go through the pointer.


So up to that point, it is a bit faster.


Add another byte and now the copy is too big to fit in a 
register, so it needs to spill over into somewhere else which 
means a bunch more work for the cpu.


Re: drastic slowdown for copies

2015-05-28 Thread Timon Gehr via Digitalmars-d-learn

On 05/28/2015 11:27 PM, Adam D. Ruppe wrote:

16 bytes is 64 bit


It's actually 128 bits.
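
The sizes under discussion can be confirmed at compile time. In this minimal sketch the struct names S4 and S5 are hypothetical stand-ins for the 4-int and 5-int structs benchmarked elsewhere in the thread.

```d
import std.stdio : writeln;

// Hypothetical stand-ins for the benchmarked structs.
struct S4 { int a, b, c, d; }
struct S5 { int a, b, c, d, e; }

// int is always 32 bits in D, so these hold on every target.
static assert(S4.sizeof == 16);      // 16 bytes
static assert(S4.sizeof * 8 == 128); // = 128 bits, not 64
static assert(S5.sizeof == 20);      // 20 bytes = 160 bits

void main()
{
    writeln("S4: ", S4.sizeof, " bytes, ", S4.sizeof * 8, " bits");
    writeln("S5: ", S5.sizeof, " bytes, ", S5.sizeof * 8, " bits");
    // A pointer is 8 bytes on a 64-bit target and 4 on a 32-bit one.
    writeln("pointer: ", (void*).sizeof, " bytes");
}
```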


Re: drastic slowdown for copies

2015-05-28 Thread Momo via Digitalmars-d-learn

On Thursday, 28 May 2015 at 21:27:42 UTC, Adam D. Ruppe wrote:
16 bytes is 64 bit - the same size as a reference. So copying 
it is overall a bit less work - sending a 64 bit struct is as 
small as a 64 bit reference and you don't go through the 
pointer.


So up to that point, it is a bit faster.


Add another byte and now the copy is too big to fit in a 
register, so it needs to spill over into somewhere else which 
means a bunch more work for the cpu.


But even in release mode (and with optimizations turned on) it is 
3 times slower. Can I somehow enforce references, like in C++? 
I already tried in ref, const ref and immutable ref; nothing 
works.
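
For reference, a ref parameter in D does bind to the caller's object rather than to a copy, which a short sketch can verify by comparing addresses. The struct S and the functions seesCaller and seesCopy are hypothetical names. This only demonstrates the semantics of ref; it says nothing about which variant is faster, which the measurements in this thread show to depend on flags and CPU.

```d
// Hypothetical 20-byte struct, matching the 5-int case in the thread.
struct S { int a, b, c, d, e; }

// With 'ref', the parameter is an alias for the caller's object,
// so its address equals the address passed in 'p'.
bool seesCaller(ref const S s, const(S)* p) { return &s is p; }

// Without 'ref', the parameter is a separate copy with its own address.
bool seesCopy(const S s, const(S)* p) { return &s is p; }

void main()
{
    auto s = S(1, 2, 3, 4, 5);
    assert(seesCaller(s, &s)); // same object
    assert(!seesCopy(s, &s));  // distinct copy
}
```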