Re: Increasing D Compiler Speed by Over 75%

2013-08-02 Thread Rainer Schuetze



On 02.08.2013 00:36, Walter Bright wrote:

I've now upgraded dmc so dmd builds can take advantage of improved code
generation.

http://www.digitalmars.com/download/freecompiler.html


Although my laptop got quite a bit faster overnight (I guess it was 
throttled for some reason yesterday), relative results don't change:


std.algorithm -main -unittest

dmc85?: 12.5 sec
dmc857: 12.5 sec
msc: 7 sec

BTW: I usually use VS2008, but now also tried VS2010 - no difference.


Re: Increasing D Compiler Speed by Over 75%

2013-08-02 Thread Walter Bright

On 8/2/2013 12:57 AM, Rainer Schuetze wrote:

http://www.digitalmars.com/download/freecompiler.html


Although my laptop got quite a bit faster overnight (I guess it was throttled
for some reason yesterday), relative results don't change:

std.algorithm -main -unittest

dmc85?: 12.5 sec
dmc857: 12.5 sec
msc: 7 sec

BTW: I usually use VS2008, but now also tried VS2010 - no difference.


The two dmc times shouldn't be the same. I see a definite improvement. 
Disassemble aav.obj, and look at the function aaGetRvalue. It should look like this:


?_aaGetRvalue@@YAPAXPAUAA@@PAX@Z:
push EBX
mov EBX,0Ch[ESP]
push ESI
cmp dword ptr 0Ch[ESP],0
je  L184
mov EAX,0Ch[ESP]
mov ECX,4[EAX]
cmp ECX,4
jne L139
mov ESI,EBX
and ESI,3
jmp short   L166
L139:   cmp ECX,01Fh
jne L15E
== note this section does not have a div instruction in it ==
mov EAX,EBX
mov EDX,08421085h
mov ECX,EBX
mul EDX
mov EAX,ECX
sub EAX,EDX
shr EAX,1
lea EDX,[EAX][EDX]
shr EDX,4
imul EAX,EDX,01Fh
sub ECX,EAX
mov ESI,ECX
==
jmp short   L166
L15E:   mov EAX,EBX
xor EDX,EDX
div ECX
mov ESI,EDX
L166:   mov ECX,0Ch[ESP]
mov ECX,[ECX]
mov EDX,[ESI*4][ECX]
test EDX,EDX
je  L184
L173:   cmp 4[EDX],EBX
jne L17E
mov EAX,8[EDX]
pop ESI
pop EBX
ret
L17E:   mov EDX,[EDX]
test EDX,EDX
jne L173
L184:   pop ESI
xor EAX,EAX
pop EBX
ret
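
For readers who don't want to decode the listing, here is a rough C equivalent of the marked section. It is a sketch reconstructed from the instructions, not code taken from dmd or dmc; the names mod31_no_div and bucket_index are made up for illustration, and the constant is the 08421085h from the listing.

```c
#include <stdint.h>

/* x % 31 using multiply/shift instead of div; mirrors the marked asm section. */
uint32_t mod31_no_div(uint32_t x)
{
    /* mul EDX: keep only the high 32 bits of x * 08421085h */
    uint32_t t = (uint32_t)(((uint64_t)x * 0x08421085u) >> 32);
    /* sub, shr 1, lea, shr 4: this yields the quotient x / 31 */
    uint32_t q = (((x - t) >> 1) + t) >> 4;
    /* imul by 01Fh, then sub: the remainder */
    return x - q * 31u;
}

/* Hypothetical wrapper mirroring the three branches of the listing.
   nbuckets must be nonzero. */
uint32_t bucket_index(uint32_t hash, uint32_t nbuckets)
{
    if (nbuckets == 4)
        return hash & 3u;           /* and ESI,3 */
    if (nbuckets == 31)
        return mod31_no_div(hash);  /* the div-free section */
    return hash % nbuckets;         /* generic path, uses div */
}
```

For example, mod31_no_div(100) gives 7, the same as 100 % 31.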


Re: Increasing D Compiler Speed by Over 75%

2013-08-02 Thread Rainer Schuetze



On 02.08.2013 10:24, Walter Bright wrote:

On 8/2/2013 12:57 AM, Rainer Schuetze wrote:

http://www.digitalmars.com/download/freecompiler.html


Although my laptop got quite a bit faster overnight (I guess it was throttled
for some reason yesterday), relative results don't change:

std.algorithm -main -unittest

dmc85?: 12.5 sec
dmc857: 12.5 sec
msc: 7 sec

BTW: I usually use VS2008, but now also tried VS2010 - no difference.


The two dmc times shouldn't be the same. I see a definite improvement.
Disassemble aav.obj, and look at the function aaGetRvalue. It should
look like this:


My disassembly looks exactly the same. I don't think that a single div
operation in a rather long function has a lot of impact on modern
processors. I'm running an i7; according to the instruction tables by
Agner Fog, div has a latency of 17-28 cycles and a reciprocal
throughput of 7-17 cycles. If I estimate the latency of the asm snippet,
I also get 16 cycles, and that doesn't take the additional tests and
jumps into consideration.


== note this section does not have a div instruction in it ==

mov EAX,EBX
mov EDX,08421085h    ; latency 3
mov ECX,EBX
mul EDX              ; latency 5
mov EAX,ECX
sub EAX,EDX          ; latency 1
shr EAX,1            ; latency 1
lea EDX,[EAX][EDX]   ; latency 1
shr EDX,4            ; latency 1
imul EAX,EDX,01Fh    ; latency 3
sub ECX,EAX          ; latency 1
mov ESI,ECX
==
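
The 16-cycle figure is just the sum of the annotated latencies. A tiny sketch of that tally, assuming (as the estimate does) that every annotated instruction waits on the previous one; the array and function names are made up.

```c
#include <stddef.h>

/* Latencies in cycles, copied from the annotations in the listing above. */
static const int annotated_latency[] = { 3, 5, 1, 1, 1, 1, 3, 1 };

/* Sum the annotated latencies as one serial dependency chain. */
int serial_latency_estimate(void)
{
    int total = 0;
    for (size_t i = 0; i < sizeof annotated_latency / sizeof annotated_latency[0]; i++)
        total += annotated_latency[i];
    return total;
}
```

serial_latency_estimate() returns 16, which is just below the 17-28 cycle latency range quoted for div.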



Re: Increasing D Compiler Speed by Over 75%

2013-08-02 Thread Richard Webb

On 01/08/2013 00:32, Walter Bright wrote:

Thanks for doing this, this is good information.

On 7/31/2013 2:24 PM, Rainer Schuetze wrote:

I have just tried yesterday's dmd to build Visual D (it builds some libraries and
contains a few short non-compiling tasks in between):

Debug build dmd_dmc: 23 sec, std new 43 sec
Debug build dmd_msc: 19 sec, std new 20 sec


That makes it clear that the dmc malloc() was the dominator, not code gen.




It still appears that the DMC malloc is a big reason for the difference 
between DMC and MSVC builds when compiling the algorithm unit tests. (a 
very quick test suggests that changing the global new in rmem.c to call 
HeapAlloc instead of malloc gives a large speedup).




Re: Increasing D Compiler Speed by Over 75%

2013-08-02 Thread Daniel Murphy
Rainer Schuetze r.sagita...@gmx.de wrote in message 
news:ktbvam$dvf$1...@digitalmars.com...
large-address-aware).

 This shows that removing most of the allocations was a good optimization
 for the dmc runtime, but it has a smaller, though still notable, impact
 with a faster heap implementation (the VS runtime usually maps directly to
 the Windows API for non-Debug builds). I suspect the backend and the
 optimizer do not use new a lot, but plain malloc calls, so they still
 suffer from the slow runtime.

On a related note, I just tried replacing the two ::malloc calls in rmem's 
operator new with VirtualAlloc and I get a reduction from 13 seconds to 9 
seconds (compiling dmd std\range -unittest -main) with a release build of 
dmd. 




Re: Increasing D Compiler Speed by Over 75%

2013-08-02 Thread Walter Bright

On 8/2/2013 2:47 AM, Rainer Schuetze wrote:

My disassembly looks exactly the same. I don't think that a single div operation
in a rather long function has a lot of impact on modern processors. I'm running
an i7; according to the instruction tables by Agner Fog, div has a latency of
17-28 cycles and a reciprocal throughput of 7-17 cycles. If I estimate the
latency of the asm snippet, I also get 16 cycles, and that doesn't take the
additional tests and jumps into consideration.



I'm using an AMD FX-6100.



Re: Increasing D Compiler Speed by Over 75%

2013-08-02 Thread Walter Bright

On 8/2/2013 8:18 AM, Daniel Murphy wrote:

On a related note, I just tried replacing the two ::malloc calls in rmem's
operator new with VirtualAlloc and I get a reduction from 13 seconds to 9
seconds (compiling dmd std\range -unittest -main) with a release build of
dmd.


Hmm, very interesting!



Re: Increasing D Compiler Speed by Over 75%

2013-08-02 Thread Rainer Schuetze



On 02.08.2013 18:37, Walter Bright wrote:

On 8/2/2013 2:47 AM, Rainer Schuetze wrote:

My disassembly looks exactly the same. I don't think that a single div operation
in a rather long function has a lot of impact on modern processors. I'm running
an i7; according to the instruction tables by Agner Fog, div has a latency of
17-28 cycles and a reciprocal throughput of 7-17 cycles. If I estimate the
latency of the asm snippet, I also get 16 cycles, and that doesn't take the
additional tests and jumps into consideration.



I'm using an AMD FX-6100.



This processor seems to do a little better with the mov reg,imm 
operation but otherwise is similar. The DIV operation has larger 
worst-case latency, though (16-48 cycles).


Better to just use a power of 2 for the array sizes anyway...
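
To illustrate the power-of-2 point: with such a size the bucket index is a single mask, which is what the `and ESI,3` branch of the listing does for a size of 4. A minimal sketch with made-up function names:

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if n is a power of two (1, 2, 4, 8, ...). */
int is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Bucket index for a power-of-two bucket count: no div, no multiply.
   The caller must guarantee is_power_of_two(nbuckets). */
uint32_t bucket_mask(uint32_t hash, uint32_t nbuckets)
{
    return hash & (nbuckets - 1u);
}
```

For a power-of-two nbuckets, bucket_mask(hash, nbuckets) gives the same result as hash % nbuckets.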


Re: Increasing D Compiler Speed by Over 75%

2013-08-02 Thread Walter Bright

On 8/2/2013 4:18 AM, Richard Webb wrote:

It still appears that the DMC malloc is a big reason for the difference between
DMC and MSVC builds when compiling the algorithm unit tests. (a very quick test
suggests that changing the global new in rmem.c to call HeapAlloc instead of
malloc gives a large speedup).



Yes, I agree, the DMC malloc is clearly a large performance problem. I had not 
realized this.




Re: Increasing D Compiler Speed by Over 75%

2013-08-02 Thread Dmitry Olshansky

02-Aug-2013 20:40, Walter Bright wrote:

On 8/2/2013 8:18 AM, Daniel Murphy wrote:

On a related note, I just tried replacing the two ::malloc calls in rmem's
operator new with VirtualAlloc and I get a reduction from 13 seconds to 9
seconds (compiling dmd std\range -unittest -main) with a release build of
dmd.


Hmm, very interesting!



Made a pull request that provides an implementation of rmem.c on top of the Win32 Heap API.
https://github.com/D-Programming-Language/dmd/pull/2445

Also, since the global new/delete are already not reentrant, I added the
HEAP_NO_SERIALIZE flag to save on locking/unlocking of the heap.


For me this brings the time down from 13 to 8 seconds.

--
Dmitry Olshansky
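
A minimal sketch of what routing allocations through the Win32 Heap API could look like. This is not the code from the pull request: it assumes a private heap created with HEAP_NO_SERIALIZE, the names mem_alloc, mem_free and g_heap are made up, and the non-Windows branch exists only so the sketch compiles elsewhere.

```c
#include <stddef.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical private heap; created lazily on first allocation. */
static HANDLE g_heap;

void *mem_alloc(size_t size)
{
    if (!g_heap) {
        /* HEAP_NO_SERIALIZE: skip the heap lock; only safe for single-threaded use.
           Initial size 0 and maximum size 0 give a growable heap. */
        g_heap = HeapCreate(HEAP_NO_SERIALIZE, 0, 0);
    }
    return g_heap ? HeapAlloc(g_heap, 0, size) : NULL;
}

void mem_free(void *p)
{
    if (p)
        HeapFree(g_heap, 0, p);
}
#else
/* Fallback for other platforms: plain malloc/free. */
void *mem_alloc(size_t size) { return malloc(size); }
void mem_free(void *p) { free(p); }
#endif
```

A global operator new in C++ could then forward to mem_alloc. Note that dropping serialization is only valid as long as the heap is never touched from more than one thread.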


Re: Increasing D Compiler Speed by Over 75%

2013-08-01 Thread Rainer Schuetze



On 01.08.2013 07:33, dennis luehring wrote:

On 31.07.2013 23:24, Rainer Schuetze wrote:



On 31.07.2013 09:00, Walter Bright wrote:

On 7/30/2013 11:40 PM, dennis luehring wrote:

currently the vc-built dmd is about 2 times faster at compiling


That's an old number now. Someone want to try it with the current HEAD?



I have just tried yesterday's dmd to build Visual D (it builds some
libraries and contains a few short non-compiling tasks in between):


can you also give us timings for

(dmd_dmc|dmd_msc) std\algorithm -unittest -main




std.algorithm -unittest -main:

dmd_dmc 20 sec, std new 61 sec
dmd_msc 11 sec, std new 13 sec

std.algorithm -unittest -main -O:

dmd_dmc 27 sec, std new 68 sec
dmd_msc 16 sec, std new 18 sec


Re: Increasing D Compiler Speed by Over 75%

2013-08-01 Thread dennis luehring

On 01.08.2013 08:16, Rainer Schuetze wrote:



On 01.08.2013 07:33, dennis luehring wrote:

On 31.07.2013 23:24, Rainer Schuetze wrote:



On 31.07.2013 09:00, Walter Bright wrote:

On 7/30/2013 11:40 PM, dennis luehring wrote:

currently the vc-built dmd is about 2 times faster at compiling


That's an old number now. Someone want to try it with the current HEAD?



I have just tried yesterday's dmd to build Visual D (it builds some
libraries and contains a few short non-compiling tasks in between):


can you also give us timings for

(dmd_dmc|dmd_msc) std\algorithm -unittest -main




std.algorithm -unittest -main:

dmd_dmc 20 sec, std new 61 sec
dmd_msc 11 sec, std new 13 sec

std.algorithm -unittest -main -O:

dmd_dmc 27 sec, std new 68 sec
dmd_msc 16 sec, std new 18 sec



results from mingw, vs2012(13) and llvm-clang builds would also be very
interesting, but i don't know if dmd can be built with mingw or clang
out of the box under windows


Re: Increasing D Compiler Speed by Over 75%

2013-07-31 Thread Walter Bright

On 7/30/2013 11:40 PM, dennis luehring wrote:

currently the vc-built dmd is about 2 times faster at compiling


That's an old number now. Someone want to try it with the current HEAD?



Re: Increasing D Compiler Speed by Over 75%

2013-07-31 Thread dennis luehring

On 31.07.2013 09:00, Walter Bright wrote:

On 7/30/2013 11:40 PM, dennis luehring wrote:

currently the vc-built dmd is about 2 times faster at compiling


That's an old number now. Someone want to try it with the current HEAD?



tried to but failed

downloaded dmd-master.zip (from github)
downloaded dmd.2.063.2.zip

built dmd-master with vs2010
copied the produced dmd_msc.exe to dmd.2.063.2\dmd2\windows\bin

dmd.2.063.2\dmd2\src\phobos>..\..\windows\bin\dmd.exe std\algorithm -unittest -main


gives

Error: cannot read file ûmain.d (what is this û in front of main.d?)

dmd.2.063.2\dmd2\src\phobos>..\..\windows\bin\dmd_msc.exe std\algorithm -unittest -main


gives

std\datetime.d(31979): Error: pure function 
'std.datetime.enforceValid!hours.enforceValid' cannot call impure 
function 'core.time.TimeException.this'
std\datetime.d(13556): Error: template instance 
std.datetime.enforceValid!hours error instantiating
std\datetime.d(31984): Error: pure function 
'std.datetime.enforceValid!minutes.enforceValid' cannot call impure 
function 'core.time.TimeException.this'
std\datetime.d(13557): Error: template instance 
std.datetime.enforceValid!minutes error instantiating
std\datetime.d(31989): Error: pure function 
'std.datetime.enforceValid!seconds.enforceValid' cannot call impure 
function 'core.time.TimeException.this'
std\datetime.d(13558): Error: template instance 
std.datetime.enforceValid!seconds error instantiating

std\datetime.d(33284):called from here: (TimeOfDay __ctmp1990;
 , __ctmp1990).this(0, 0, 0)
std\datetime.d(33293): Error: CTFE failed because of previous errors in this
std\datetime.d(31974): Error: pure function 
'std.datetime.enforceValid!months.enforceValid' cannot call impure 
function 'core.time.TimeException.this'
std\datetime.d(8994): Error: template instance 
std.datetime.enforceValid!months error instantiating
std\datetime.d(32012): Error: pure function 
'std.datetime.enforceValid!days.enforceValid' cannot call impure 
function 'core.time.TimeException.this'
std\datetime.d(8995): Error: template instance 
std.datetime.enforceValid!days error instantiating

std\datetime.d(33389):called from here: (Date __ctmp1999;
 , __ctmp1999).this(-3760, 9, 7)
std\datetime.d(33458): Error: CTFE failed because of previous errors in this
Error: undefined identifier '_xopCmp'

and a compiler crash


my former benchmarks were done the same way and worked without any
problems - this master seems to have problems







Re: Increasing D Compiler Speed by Over 75%

2013-07-31 Thread Rainer Schuetze



On 31.07.2013 09:00, Walter Bright wrote:

On 7/30/2013 11:40 PM, dennis luehring wrote:

currently the vc-built dmd is about 2 times faster at compiling


That's an old number now. Someone want to try it with the current HEAD?



I have just tried yesterday's dmd to build Visual D (it builds some
libraries and contains a few short non-compiling tasks in between):


Debug build dmd_dmc: 23 sec, std new 43 sec
Debug build dmd_msc: 19 sec, std new 20 sec

std new is the version without the block allocator.

Release build dmd_dmc: 3 min 30, std new 5 min 25
Release build dmd_msc: 1 min 32, std new 1 min 40

The release builds use -release -O -inline and need a bit more than 1 
GB memory for two of the libraries (I still had to patch dmd_dmc to be 
large-address-aware).


This shows that removing most of the allocations was a good optimization
for the dmc runtime, but it has a smaller, though still notable, impact
with a faster heap implementation (the VS runtime usually maps directly to
the Windows API for non-Debug builds). I suspect the backend and the
optimizer do not use new a lot, but plain malloc calls, so they
still suffer from the slow runtime.
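
The thread only says "block allocator" without showing it. Below is a minimal sketch of the general idea, assuming it means a bump-style allocator that carves small requests out of large chunks and never frees them individually. CHUNK_SIZE, ALIGNMENT and block_alloc are hypothetical names and values, not dmd's.

```c
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_SIZE ((size_t)1024 * 1024)  /* hypothetical chunk size */
#define ALIGNMENT  ((size_t)8)            /* assumes malloc is at least 8-byte aligned */

static char  *chunk_ptr;   /* next free byte in the current chunk */
static size_t chunk_left;  /* bytes remaining in the current chunk */

/* Hand out memory by bumping a pointer; nothing is ever freed individually. */
void *block_alloc(size_t size)
{
    if (size == 0)
        size = 1;
    if (size > (size_t)-1 - (ALIGNMENT - 1))
        return NULL;                               /* rounding would overflow */
    size = (size + ALIGNMENT - 1) & ~(ALIGNMENT - 1);

    if (size > chunk_left) {
        if (size >= CHUNK_SIZE)
            return malloc(size);                   /* oversized: dedicated block */
        chunk_ptr = (char *)malloc(CHUNK_SIZE);    /* rest of the old chunk is abandoned */
        if (!chunk_ptr) {
            chunk_left = 0;
            return NULL;
        }
        chunk_left = CHUNK_SIZE;
    }

    void *p = chunk_ptr;
    chunk_ptr += size;
    chunk_left -= size;
    return p;
}
```

The trade-off is that most requests never reach malloc, so a slow runtime heap matters less, at the price of never returning memory before the process exits.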


Re: Increasing D Compiler Speed by Over 75%

2013-07-31 Thread Walter Bright

Thanks for doing this, this is good information.

On 7/31/2013 2:24 PM, Rainer Schuetze wrote:

I have just tried yesterday's dmd to build Visual D (it builds some libraries and
contains a few short non-compiling tasks in between):

Debug build dmd_dmc: 23 sec, std new 43 sec
Debug build dmd_msc: 19 sec, std new 20 sec


That makes it clear that the dmc malloc() was the dominator, not code gen.


std new is the version without the block allocator.

Release build dmd_dmc: 3 min 30, std new 5 min 25
Release build dmd_msc: 1 min 32, std new 1 min 40

The release builds use -release -O -inline and need a bit more than 1 GB
memory for two of the libraries (I still had to patch dmd_dmc to be
large-address-aware).

This shows that removing most of the allocations was a good optimization for the
dmc runtime, but it has a smaller, though still notable, impact with a faster
heap implementation (the VS runtime usually maps directly to the Windows API for
non-Debug builds). I suspect the backend and the optimizer do not use new a
lot, but plain malloc calls, so they still suffer from the slow runtime.


Actually, dmc still should give a better showing. All the optimizations I've put 
into dmd also went into dmc, and do result in significantly better code speed. 
For example, the hash modulus optimization has a significant impact, but I 
haven't released that dmc yet.


Optimized builds have an entirely different profile than debug builds, and I 
haven't investigated that.




Re: Increasing D Compiler Speed by Over 75%

2013-07-31 Thread dennis luehring

On 31.07.2013 23:24, Rainer Schuetze wrote:



On 31.07.2013 09:00, Walter Bright wrote:

On 7/30/2013 11:40 PM, dennis luehring wrote:

currently the vc-built dmd is about 2 times faster at compiling


That's an old number now. Someone want to try it with the current HEAD?



I have just tried yesterday's dmd to build Visual D (it builds some
libraries and contains a few short non-compiling tasks in between):


can you also give us timings for

(dmd_dmc|dmd_msc) std\algorithm -unittest -main




Re: Increasing D Compiler Speed by Over 75%

2013-07-30 Thread Temtaime

DMC is an ugly compiler.
It would be much nicer if you used mingw for that purpose on
Windows. GCC usually generates faster code than VC does.

http://sourceforge.net/projects/mingwbuilds/



Re: Increasing D Compiler Speed by Over 75%

2013-07-30 Thread Brad Anderson

On Tuesday, 30 July 2013 at 09:04:10 UTC, Temtaime wrote:

DMC is an ugly compiler.
It would be much nicer if you used mingw for that purpose on
Windows. GCC usually generates faster code than VC does.

http://sourceforge.net/projects/mingwbuilds/


I'm willing to bet Walter would accept pull requests to add 
support for mingw like he did with VC.  Be sure to document the 
build process when you make the changes.


Sidenote: Insulting Walter's work isn't a great way to get him to
do you a favor.


Re: Increasing D Compiler Speed by Over 75%

2013-07-30 Thread Walter Bright

On 7/30/2013 11:16 AM, Brad Anderson wrote:

Sidenote: Insulting Walter's work isn't a great way to get him to do you a
favor.


I'm sad that I never got the opportunity to be insulted by Jobs.


Re: Increasing D Compiler Speed by Over 75%

2013-07-26 Thread dennis luehring

On 25.07.2013 20:03, Walter Bright wrote:

http://www.reddit.com/r/programming/comments/1j1i30/increasing_the_d_compiler_speed_by_over_75/



did you compare dmc-based and visualc-based dmd builds?
the vc dmd build always seems to be two times faster - how does that
look with your optimization?


Re: Increasing D Compiler Speed by Over 75%

2013-07-26 Thread Don

On Thursday, 25 July 2013 at 18:03:22 UTC, Walter Bright wrote:

http://www.reddit.com/r/programming/comments/1j1i30/increasing_the_d_compiler_speed_by_over_75/



I just reported this compile speed killer:
http://d.puremagic.com/issues/show_bug.cgi?id=10716

It has a big impact on some of the tests in the DMD test suite. 
It might also be responsible for a significant part of the 
compilation time of Phobos, since array literals tend to be 
widely used inside unittest functions.


Re: Increasing D Compiler Speed by Over 75%

2013-07-26 Thread Walter Bright

On 7/26/2013 1:25 AM, dennis luehring wrote:

did you compare dmc-based and visualc-based dmd builds?
the vc dmd build always seems to be two times faster - how does that look with
your optimization?


It would be most interesting to see just what it was that made the vc build 
faster.

But that won't help on Linux/FreeBSD/OSX.


Re: Increasing D Compiler Speed by Over 75%

2013-07-25 Thread Nick Sabalausky
On Thu, 25 Jul 2013 20:04:10 +0200
Brad Anderson e...@gnuk.net wrote:

 On Thursday, 25 July 2013 at 18:03:22 UTC, Walter Bright wrote:
  http://www.reddit.com/r/programming/comments/1j1i30/increasing_the_d_compiler_speed_by_over_75/
 
 I propose we always refer to compiling as "doing the nasty" from
 this moment forward.

Yea, that's just absolutely classic :)