Re: Increasing D Compiler Speed by Over 75%
On 02.08.2013 00:36, Walter Bright wrote: I've now upgraded dmc so dmd builds can take advantage of improved code generation. http://www.digitalmars.com/download/freecompiler.html
Although my laptop got quite a bit faster overnight (I guess it was throttled for some reason yesterday), relative results don't change:
std.algorithm -main -unittest
dmc85?: 12.5 sec
dmc857: 12.5 sec
msc: 7 sec
BTW: I usually use VS2008, but now also tried VS2010 - no difference.
Re: Increasing D Compiler Speed by Over 75%
On 8/2/2013 12:57 AM, Rainer Schuetze wrote: http://www.digitalmars.com/download/freecompiler.html Although my laptop got quite a bit faster overnight (I guess it was throttled for some reason yesterday), relative results don't change: std.algorithm -main -unittest dmc85?: 12.5 sec dmc857: 12.5 sec msc: 7 sec BTW: I usually use VS2008, but now also tried VS2010 - no difference.
The two dmc times shouldn't be the same. I see a definite improvement. Disassemble aav.obj, and look at the function aaGetRvalue. It should look like this:
?_aaGetRvalue@@YAPAXPAUAA@@PAX@Z:
        push    EBX
        mov     EBX,0Ch[ESP]
        push    ESI
        cmp     dword ptr 0Ch[ESP],0
        je      L184
        mov     EAX,0Ch[ESP]
        mov     ECX,4[EAX]
        cmp     ECX,4
        jne     L139
        mov     ESI,EBX
        and     ESI,3
        jmp     short L166
L139:   cmp     ECX,01Fh
        jne     L15E
note this section does not have a div instruction in it
==
        mov     EAX,EBX
        mov     EDX,08421085h
        mov     ECX,EBX
        mul     EDX
        mov     EAX,ECX
        sub     EAX,EDX
        shr     EAX,1
        lea     EDX,[EAX][EDX]
        shr     EDX,4
        imul    EAX,EDX,01Fh
        sub     ECX,EAX
        mov     ESI,ECX
==
        jmp     short L166
L15E:   mov     EAX,EBX
        xor     EDX,EDX
        div     ECX
        mov     ESI,EDX
L166:   mov     ECX,0Ch[ESP]
        mov     ECX,[ECX]
        mov     EDX,[ESI*4][ECX]
        test    EDX,EDX
        je      L184
L173:   cmp     4[EDX],EBX
        jne     L17E
        mov     EAX,8[EDX]
        pop     ESI
        pop     EBX
        ret
L17E:   mov     EDX,[EDX]
        test    EDX,EDX
        jne     L173
L184:   pop     ESI
        xor     EAX,EAX
        pop     EBX
        ret
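For anyone who would rather not trace the registers: the div-free section appears to compute the key modulo 31 (01Fh) by multiplying with a fixed-point reciprocal instead of dividing. Below is a minimal C++ sketch of the same arithmetic, assuming 32-bit unsigned keys. The function name mod31 is made up for illustration and does not come from the dmd or dmc sources.

```cpp
#include <cstdint>

// Hypothetical helper (name not from dmd): x % 31 without a div instruction,
// mirroring the mul / sub / shr / lea / shr / imul / sub sequence in the listing.
std::uint32_t mod31(std::uint32_t x)
{
    // mul EDX: keep the high 32 bits of x * 08421085h
    std::uint32_t hi = static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(x) * 0x08421085u) >> 32);
    // sub, shr 1, lea, shr 4: quotient q = x / 31
    std::uint32_t q = (((x - hi) >> 1) + hi) >> 4;
    // imul, sub: remainder = x - q * 31
    return x - q * 31u;
}
```

The constant is the low 32 bits of ceil(2^37 / 31); the sub/shr/add step restores the implicit 33rd bit without overflowing a 32-bit register.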
Re: Increasing D Compiler Speed by Over 75%
On 02.08.2013 10:24, Walter Bright wrote: On 8/2/2013 12:57 AM, Rainer Schuetze wrote: http://www.digitalmars.com/download/freecompiler.html Although my laptop got quite a bit faster overnight (I guess it was throttled for some reason yesterday), relative results don't change: std.algorithm -main -unittest dmc85?: 12.5 sec dmc857: 12.5 sec msc: 7 sec BTW: I usually use VS2008, but now also tried VS2010 - no difference. The two dmc times shouldn't be the same. I see a definite improvement. Disassemble aav.obj, and look at the function aaGetRvalue. It should look like this:
My disassembly looks exactly the same. I don't think that a single div operation in a rather long function has a lot of impact on modern processors. I'm running an i7; according to the instruction tables by Agner Fog, the div has a latency of 17-28 cycles and a reciprocal throughput of 7-17 cycles. If I estimate the latency of the asm snippet, I also get 16 cycles. And that doesn't take the additional tests and jumps into consideration.
note this section does not have a div instruction in it
==
        mov     EAX,EBX
        mov     EDX,08421085h   ; latency 3
        mov     ECX,EBX
        mul     EDX             ; latency 5
        mov     EAX,ECX
        sub     EAX,EDX         ; latency 1
        shr     EAX,1           ; latency 1
        lea     EDX,[EAX][EDX]  ; latency 1
        shr     EDX,4           ; latency 1
        imul    EAX,EDX,01Fh    ; latency 3
        sub     ECX,EAX         ; latency 1
        mov     ESI,ECX
==
Re: Increasing D Compiler Speed by Over 75%
On 01/08/2013 00:32, Walter Bright wrote: Thanks for doing this, this is good information. On 7/31/2013 2:24 PM, Rainer Schuetze wrote: I have just tried yesterday's dmd to build Visual D (it builds some libraries and contains a few short non-compiling tasks in between):
Debug build dmd_dmc: 23 sec, std new 43 sec
Debug build dmd_msc: 19 sec, std new 20 sec
That makes it clear that the dmc malloc() was the dominator, not code gen.
It still appears that the DMC malloc is a big reason for the difference between DMC and MSVC builds when compiling the algorithm unit tests (a very quick test suggests that changing the global new in rmem.c to call HeapAlloc instead of malloc gives a large speedup).
Re: Increasing D Compiler Speed by Over 75%
Rainer Schuetze r.sagita...@gmx.de wrote in message news:ktbvam$dvf$1...@digitalmars.com... large-address-aware). This shows that removing most of the allocations was a good optimization for the dmc runtime, but has a smaller, though still notable, impact with a faster heap implementation (the VS runtime usually maps directly to the Windows API for non-Debug builds). I suspect the backend and the optimizer do not use new a lot, but plain malloc calls, so they still suffer from the slow runtime.
On a related note, I just tried replacing the two ::malloc calls in rmem's operator new with VirtualAlloc and I get a reduction from 13 seconds to 9 seconds (compiling dmd std\range -unittest -main) with a release build of dmd.
Re: Increasing D Compiler Speed by Over 75%
On 8/2/2013 2:47 AM, Rainer Schuetze wrote: My disassembly looks exactly the same. I don't think that a single div operation in a rather long function has a lot of impact on modern processors. I'm running an i7, according to the instruction tables by Agner Fog, the div has latency of 17-28 cycles and a reciprocal throughput of 7-17 cycles. If I estimate the latency of the asm snippet, I also get 16 cycles. And that doesn't take the additional tests and jumps into consideration. I'm using an AMD FX-6100.
Re: Increasing D Compiler Speed by Over 75%
On 8/2/2013 8:18 AM, Daniel Murphy wrote: On a related note, I just tried replacing the two ::malloc calls in rmem's operator new with VirtualAlloc and I get a reduction from 13 seconds to 9 seconds (compiling dmd std\range -unittest -main) with a release build of dmd. Hmm, very interesting!
Re: Increasing D Compiler Speed by Over 75%
On 02.08.2013 18:37, Walter Bright wrote: On 8/2/2013 2:47 AM, Rainer Schuetze wrote: My disassembly looks exactly the same. I don't think that a single div operation in a rather long function has a lot of impact on modern processors. I'm running an i7, according to the instruction tables by Agner Fog, the div has latency of 17-28 cycles and a reciprocal throughput of 7-17 cycles. If I estimate the latency of the asm snippet, I also get 16 cycles. And that doesn't take the additional tests and jumps into consideration. I'm using an AMD FX-6100. This processor seems to do a little better with the mov reg,imm operation but otherwise is similar. The DIV operation has larger worst-case latency, though (16-48 cycles). Better to just use a power of 2 for the array sizes anyway...
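To illustrate the power-of-2 suggestion: when the table size is a power of two, the modulus reduces to a single AND, which is what the size-4 case in the listing earlier in the thread already does (and ESI,3). A small C++ sketch, with a hypothetical function name and the assumption that the caller guarantees the size is a power of two:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bucket index for a table whose size is a power of two.
// For such sizes, key % size equals key & (size - 1), so no div is needed.
// The caller must guarantee size_pow2 is a nonzero power of two.
std::size_t bucket_index(std::uint32_t key, std::size_t size_pow2)
{
    return key & (size_pow2 - 1);
}
```

The trade-off is that a power-of-2 size only uses the low bits of the hash, so it relies on those bits being well distributed.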
Re: Increasing D Compiler Speed by Over 75%
On 8/2/2013 4:18 AM, Richard Webb wrote: It still appears that the DMC malloc is a big reason for the difference between DMC and MSVC builds when compiling the algorithm unit tests. (a very quick test suggests that changing the global new in rmem.c to call HeapAlloc instead of malloc gives a large speedup). Yes, I agree, the DMC malloc is clearly a large performance problem. I had not realized this.
Re: Increasing D Compiler Speed by Over 75%
On 02-Aug-2013 20:40, Walter Bright wrote: On 8/2/2013 8:18 AM, Daniel Murphy wrote: On a related note, I just tried replacing the two ::malloc calls in rmem's operator new with VirtualAlloc and I get a reduction from 13 seconds to 9 seconds (compiling dmd std\range -unittest -main) with a release build of dmd. Hmm, very interesting!
Made a pull request to provide an implementation of rmem.c on top of the Win32 Heap API. https://github.com/D-Programming-Language/dmd/pull/2445
Also, since global new/delete are already not reentrant, I added the NO_SERIALIZE flag to save on locking/unlocking of the heap. For me this goes from 13 to 8 seconds.
-- Dmitry Olshansky
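For readers unfamiliar with the Win32 Heap API, here is a rough C++ sketch of the general idea: a private heap created with HEAP_NO_SERIALIZE, so allocations skip the heap lock. This is not the code from the pull request. The names mem_alloc and mem_free are hypothetical, the sketch assumes a single-threaded caller, and the non-Windows branch is only a fallback so the snippet compiles elsewhere.

```cpp
#include <cstddef>
#include <cstdlib>

#ifdef _WIN32
#include <windows.h>

// Private heap without serialization: only safe if a single thread uses it.
static HANDLE compiler_heap()
{
    static HANDLE heap = HeapCreate(HEAP_NO_SERIALIZE, 0, 0);
    return heap;
}

// Hypothetical replacement for the malloc call inside operator new.
void* mem_alloc(std::size_t n)
{
    HANDLE heap = compiler_heap();
    if (!heap) return nullptr;              // heap creation failed
    return HeapAlloc(heap, 0, n ? n : 1);
}

void mem_free(void* p)
{
    if (p) HeapFree(compiler_heap(), 0, p);
}
#else
// Fallback for non-Windows builds (assumption: plain malloc/free).
void* mem_alloc(std::size_t n) { return std::malloc(n ? n : 1); }
void mem_free(void* p) { std::free(p); }
#endif
```

Whether this beats a given C runtime's malloc depends on that runtime; the timings in this thread apply to the dmc runtime specifically.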
Re: Increasing D Compiler Speed by Over 75%
On 01.08.2013 07:33, dennis luehring wrote: On 31.07.2013 23:24, Rainer Schuetze wrote: On 31.07.2013 09:00, Walter Bright wrote: On 7/30/2013 11:40 PM, dennis luehring wrote: currently the vc-built dmd is about 2 times faster in compiling That's an old number now. Someone want to try it with the current HEAD? I have just tried yesterday's dmd to build Visual D (it builds some libraries and contains a few short non-compiling tasks in between): can you also give us timings for (dmd_dmc|dmd_msc) std\algorithm -unittest -main
std.algorithm -unittest -main:
dmd_dmc 20 sec, std new 61 sec
dmd_msc 11 sec, std new 13 sec
std.algorithm -unittest -main -O:
dmd_dmc 27 sec, std new 68 sec
dmd_msc 16 sec, std new 18 sec
Re: Increasing D Compiler Speed by Over 75%
On 01.08.2013 08:16, Rainer Schuetze wrote: On 01.08.2013 07:33, dennis luehring wrote: On 31.07.2013 23:24, Rainer Schuetze wrote: On 31.07.2013 09:00, Walter Bright wrote: On 7/30/2013 11:40 PM, dennis luehring wrote: currently the vc-built dmd is about 2 times faster in compiling That's an old number now. Someone want to try it with the current HEAD? I have just tried yesterday's dmd to build Visual D (it builds some libraries and contains a few short non-compiling tasks in between): can you also give us timings for (dmd_dmc|dmd_msc) std\algorithm -unittest -main
std.algorithm -unittest -main:
dmd_dmc 20 sec, std new 61 sec
dmd_msc 11 sec, std new 13 sec
std.algorithm -unittest -main -O:
dmd_dmc 27 sec, std new 68 sec
dmd_msc 16 sec, std new 18 sec
results from mingw, vs2012(13) and llvm-clang builds would also be very interesting, but i don't know if dmd can be built with mingw or clang out of the box under windows
Re: Increasing D Compiler Speed by Over 75%
On 7/30/2013 11:40 PM, dennis luehring wrote: currently the vc-built dmd is about 2 times faster in compiling
That's an old number now. Someone want to try it with the current HEAD?
Re: Increasing D Compiler Speed by Over 75%
On 31.07.2013 09:00, Walter Bright wrote: On 7/30/2013 11:40 PM, dennis luehring wrote: currently the vc-built dmd is about 2 times faster in compiling That's an old number now. Someone want to try it with the current HEAD?
tried to but failed
downloaded dmd-master.zip (from github)
downloaded dmd.2.063.2.zip
built dmd-master with vs2010
copied the produced dmd_msc.exe to dmd.2.063.2\dmd2\windows\bin
dmd.2.063.2\dmd2\src\phobos..\..\windows\bin\dmd.exe std\algorithm -unittest -main
gives
Error: cannot read file ûmain.d (what is this û in front of main.d?)
dmd.2.063.2\dmd2\src\phobos..\..\windows\bin\dmd_msc.exe std\algorithm -unittest -main
gives
std\datetime.d(31979): Error: pure function 'std.datetime.enforceValid!hours.enforceValid' cannot call impure function 'core.time.TimeException.this'
std\datetime.d(13556): Error: template instance std.datetime.enforceValid!hours error instantiating
std\datetime.d(31984): Error: pure function 'std.datetime.enforceValid!minutes.enforceValid' cannot call impure function 'core.time.TimeException.this'
std\datetime.d(13557): Error: template instance std.datetime.enforceValid!minutes error instantiating
std\datetime.d(31989): Error: pure function 'std.datetime.enforceValid!seconds.enforceValid' cannot call impure function 'core.time.TimeException.this'
std\datetime.d(13558): Error: template instance std.datetime.enforceValid!seconds error instantiating
std\datetime.d(33284): called from here: (TimeOfDay __ctmp1990; , __ctmp1990).this(0, 0, 0)
std\datetime.d(33293): Error: CTFE failed because of previous errors in this
std\datetime.d(31974): Error: pure function 'std.datetime.enforceValid!months.enforceValid' cannot call impure function 'core.time.TimeException.this'
std\datetime.d(8994): Error: template instance std.datetime.enforceValid!months error instantiating
std\datetime.d(32012): Error: pure function 'std.datetime.enforceValid!days.enforceValid' cannot call impure function 'core.time.TimeException.this'
std\datetime.d(8995): Error: template instance std.datetime.enforceValid!days error instantiating
std\datetime.d(33389): called from here: (Date __ctmp1999; , __ctmp1999).this(-3760, 9, 7)
std\datetime.d(33458): Error: CTFE failed because of previous errors in this
Error: undefined identifier '_xopCmp'
and a compiler crash
my former benchmarks were done the same way and worked without any problems - this master seems to have problems
Re: Increasing D Compiler Speed by Over 75%
On 31.07.2013 09:00, Walter Bright wrote: On 7/30/2013 11:40 PM, dennis luehring wrote: currently the vc-built dmd is about 2 times faster in compiling That's an old number now. Someone want to try it with the current HEAD?
I have just tried yesterday's dmd to build Visual D (it builds some libraries and contains a few short non-compiling tasks in between):
Debug build dmd_dmc: 23 sec, std new 43 sec
Debug build dmd_msc: 19 sec, std new 20 sec
std new is the version without the block allocator.
Release build dmd_dmc: 3 min 30, std new 5 min 25
Release build dmd_msc: 1 min 32, std new 1 min 40
The release builds use -release -O -inline and need a bit more than 1 GB memory for two of the libraries (I still had to patch dmd_dmc to be large-address-aware).
This shows that removing most of the allocations was a good optimization for the dmc runtime, but has a smaller, though still notable, impact with a faster heap implementation (the VS runtime usually maps directly to the Windows API for non-Debug builds). I suspect the backend and the optimizer do not use new a lot, but plain malloc calls, so they still suffer from the slow runtime.
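As background on the "block allocator" mentioned above: the thread does not show dmd's implementation, so the following is only a generic C++ sketch of the technique, under the assumption that memory is carved sequentially out of large chunks and never freed individually. The names block_alloc, kChunkSize, g_next and g_left and the 1 MB chunk size are all made up for illustration.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical names and chunk size, for illustration only (not dmd's code).
static const std::size_t kChunkSize = 1 << 20;  // carve from 1 MB chunks
static char*       g_next = nullptr;            // next free byte in the chunk
static std::size_t g_left = 0;                  // bytes left in the chunk

// Bump allocation: memory is handed out sequentially and never freed
// individually, so most requests never reach malloc().
void* block_alloc(std::size_t n)
{
    const std::size_t align = alignof(std::max_align_t);
    if (n == 0) n = 1;
    if (n > kChunkSize)                      // oversized: go straight to malloc
        return std::malloc(n);
    n = (n + align - 1) & ~(align - 1);      // round up to keep alignment
    if (n > g_left) {                        // chunk exhausted: grab a new one
        char* chunk = static_cast<char*>(std::malloc(kChunkSize));
        if (!chunk) return nullptr;
        g_next = chunk;
        g_left = kChunkSize;
    }
    void* p = g_next;
    g_next += n;
    g_left -= n;
    return p;
}
```

With this scheme the cost of the underlying malloc is paid once per chunk rather than once per object, which would be consistent with the gap between the "std new" and block-allocator timings being much larger for the dmc runtime than for the VS runtime.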
Re: Increasing D Compiler Speed by Over 75%
Thanks for doing this, this is good information.
On 7/31/2013 2:24 PM, Rainer Schuetze wrote: I have just tried yesterday's dmd to build Visual D (it builds some libraries and contains a few short non-compiling tasks in between):
Debug build dmd_dmc: 23 sec, std new 43 sec
Debug build dmd_msc: 19 sec, std new 20 sec
That makes it clear that the dmc malloc() was the dominator, not code gen.
std new is the version without the block allocator.
Release build dmd_dmc: 3 min 30, std new 5 min 25
Release build dmd_msc: 1 min 32, std new 1 min 40
The release builds use -release -O -inline and need a bit more than 1 GB memory for two of the libraries (I still had to patch dmd_dmc to be large-address-aware). This shows that removing most of the allocations was a good optimization for the dmc runtime, but has a smaller, though still notable, impact with a faster heap implementation (the VS runtime usually maps directly to the Windows API for non-Debug builds). I suspect the backend and the optimizer do not use new a lot, but plain malloc calls, so they still suffer from the slow runtime.
Actually, dmc still should give a better showing. All the optimizations I've put into dmd also went into dmc, and do result in significantly better code speed. For example, the hash modulus optimization has a significant impact, but I haven't released that dmc yet. Optimized builds have an entirely different profile than debug builds, and I haven't investigated that.
Re: Increasing D Compiler Speed by Over 75%
On 31.07.2013 23:24, Rainer Schuetze wrote: On 31.07.2013 09:00, Walter Bright wrote: On 7/30/2013 11:40 PM, dennis luehring wrote: currently the vc-built dmd is about 2 times faster in compiling That's an old number now. Someone want to try it with the current HEAD? I have just tried yesterday's dmd to build Visual D (it builds some libraries and contains a few short non-compiling tasks in between):
can you also give us timings for (dmd_dmc|dmd_msc) std\algorithm -unittest -main
Re: Increasing D Compiler Speed by Over 75%
DMC is an ugly compiler. It would be much nicer if you used mingw for that purpose on Windows. GCC usually generates faster code than VC does. http://sourceforge.net/projects/mingwbuilds/
Re: Increasing D Compiler Speed by Over 75%
On Tuesday, 30 July 2013 at 09:04:10 UTC, Temtaime wrote: DMC is an ugly compiler. It would be much nicer if you used mingw for that purpose on Windows. GCC usually generates faster code than VC does. http://sourceforge.net/projects/mingwbuilds/
I'm willing to bet Walter would accept pull requests to add support for mingw like he did with VC. Be sure to document the build process when you make the changes.
Sidenote: Insulting Walter's work isn't a great way to get him to do you a favor.
Re: Increasing D Compiler Speed by Over 75%
On 7/30/2013 11:16 AM, Brad Anderson wrote: Sidenote: Insulting Walter's work isn't a great way to get him to do you a favor.
I'm sad that I never got the opportunity to be insulted by Jobs.
Re: Increasing D Compiler Speed by Over 75%
On 25.07.2013 20:03, Walter Bright wrote: http://www.reddit.com/r/programming/comments/1j1i30/increasing_the_d_compiler_speed_by_over_75/
do you compare dmc based and visualc based dmd builds? the vc dmd build seems to be always two times faster - how does that look with your optimization?
Re: Increasing D Compiler Speed by Over 75%
On Thursday, 25 July 2013 at 18:03:22 UTC, Walter Bright wrote: http://www.reddit.com/r/programming/comments/1j1i30/increasing_the_d_compiler_speed_by_over_75/ I just reported this compile speed killer: http://d.puremagic.com/issues/show_bug.cgi?id=10716 It has a big impact on some of the tests in the DMD test suite. It might also be responsible for a significant part of the compilation time of Phobos, since array literals tend to be widely used inside unittest functions.
Re: Increasing D Compiler Speed by Over 75%
On 7/26/2013 1:25 AM, dennis luehring wrote: do you compare dmc based and visualc based dmd builds? the vc dmd build seems to be always two times faster - how does that look with your optimization? It would be most interesting to see just what it was that made the vc build faster. But that won't help on Linux/FreeBSD/OSX.
Re: Increasing D Compiler Speed by Over 75%
On Thu, 25 Jul 2013 20:04:10 +0200 Brad Anderson e...@gnuk.net wrote: On Thursday, 25 July 2013 at 18:03:22 UTC, Walter Bright wrote: http://www.reddit.com/r/programming/comments/1j1i30/increasing_the_d_compiler_speed_by_over_75/ I propose we always refer to compiling as doing the nasty from this moment forward. Yea, that's just absolutely classic :)