Re: Increasing D Compiler Speed by Over 75%

2013-08-02 Thread Rainer Schuetze



On 02.08.2013 00:36, Walter Bright wrote:

I've now upgraded dmc so dmd builds can take advantage of improved code
generation.

http://www.digitalmars.com/download/freecompiler.html


Although my laptop got quite a bit faster overnight (I guess it was 
throttled for some reason yesterday), relative results don't change:


std.algorithm -main -unittest

dmc85?: 12.5 sec
dmc857: 12.5 sec
msc: 7 sec

BTW: I usually use VS2008, but now also tried VS2010 - no difference.


Re: Increasing D Compiler Speed by Over 75%

2013-08-02 Thread Walter Bright

On 8/2/2013 12:57 AM, Rainer Schuetze wrote:

http://www.digitalmars.com/download/freecompiler.html


Although my laptop got quite a bit faster overnight (I guess it was throttled
for some reason yesterday), relative results don't change:

std.algorithm -main -unittest

dmc85?: 12.5 sec
dmc857: 12.5 sec
msc: 7 sec

BTW: I usually use VS2008, but now also tried VS2010 - no difference.


The two dmc times shouldn't be the same. I see a definite improvement. 
Disassemble aav.obj, and look at the function aaGetRvalue. It should look like this:


?_aaGetRvalue@@YAPAXPAUAA@@PAX@Z:
        push    EBX
        mov     EBX,0Ch[ESP]
        push    ESI
        cmp     dword ptr 0Ch[ESP],0
        je      L184
        mov     EAX,0Ch[ESP]
        mov     ECX,4[EAX]
        cmp     ECX,4
        jne     L139
        mov     ESI,EBX
        and     ESI,3
        jmp short       L166
L139:   cmp     ECX,01Fh
        jne     L15E
==== note this section does not have a div instruction in it ====
        mov     EAX,EBX
        mov     EDX,08421085h
        mov     ECX,EBX
        mul     EDX
        mov     EAX,ECX
        sub     EAX,EDX
        shr     EAX,1
        lea     EDX,[EAX][EDX]
        shr     EDX,4
        imul    EAX,EDX,01Fh
        sub     ECX,EAX
        mov     ESI,ECX
====
        jmp short       L166
L15E:   mov     EAX,EBX
        xor     EDX,EDX
        div     ECX
        mov     ESI,EDX
L166:   mov     ECX,0Ch[ESP]
        mov     ECX,[ECX]
        mov     EDX,[ESI*4][ECX]
        test    EDX,EDX
        je      L184
L173:   cmp     4[EDX],EBX
        jne     L17E
        mov     EAX,8[EDX]
        pop     ESI
        pop     EBX
        ret
L17E:   mov     EDX,[EDX]
        test    EDX,EDX
        jne     L173
L184:   pop     ESI
        xor     EAX,EAX
        pop     EBX
        ret
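
For readers who want to check what the div-free section of the listing computes, here is a minimal Python sketch. It is not code from dmd or dmc: the helper name mod31_no_div is hypothetical, and 32-bit unsigned registers are emulated with a mask. Only the constant 08421085h and the instruction order are taken from the listing.

```python
# Sketch only: mirrors the div-free section of the listing with Python integers.
# mod31_no_div is a hypothetical name; 0x08421085 comes from the listing.
MAGIC = 0x08421085        # low 32 bits of ceil(2**37 / 31)
MASK32 = 0xFFFFFFFF       # emulate 32-bit registers


def mod31_no_div(x):
    """Return x % 31 for a 32-bit unsigned x without using a division."""
    hi = (x * MAGIC) >> 32            # mul EDX: high half of the 64-bit product
    t = ((x - hi) & MASK32) >> 1      # sub EAX,EDX / shr EAX,1
    q = ((t + hi) & MASK32) >> 4      # lea EDX,[EAX][EDX] / shr EDX,4 gives x // 31
    return (x - q * 31) & MASK32      # imul EAX,EDX,01Fh / sub ECX,EAX


if __name__ == "__main__":
    for x in (0, 30, 31, 12345, 0xFFFFFFFF):
        assert mod31_no_div(x) == x % 31
    print("all samples match x % 31")
```

The multiply, subtract, shift and add steps together form the quotient x / 31, and the final imul and sub turn that quotient into the remainder used as the bucket index.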


Re: Increasing D Compiler Speed by Over 75%

2013-08-02 Thread Rainer Schuetze



On 02.08.2013 10:24, Walter Bright wrote:

On 8/2/2013 12:57 AM, Rainer Schuetze wrote:

http://www.digitalmars.com/download/freecompiler.html


Although my laptop got quite a bit faster overnight (I guess it was
throttled for some reason yesterday), relative results don't change:

std.algorithm -main -unittest

dmc85?: 12.5 sec
dmc857: 12.5 sec
msc: 7 sec

BTW: I usually use VS2008, but now also tried VS2010 - no difference.


The two dmc times shouldn't be the same. I see a definite improvement.
Disassemble aav.obj, and look at the function aaGetRvalue. It should
look like this:


My disassembly looks exactly the same. I don't think that a single div 
operation in a rather long function has a lot of impact on modern 
processors. I'm running an i7, according to the instruction tables by 
Agner Fog, the div has latency of 17-28 cycles and a reciprocal 
throughput of 7-17 cycles. If I estimate the latency of the asm snippet, 
I also get 16 cycles. And that doesn't take the additional tests and 
jumps into consideration.


==== note this section does not have a div instruction in it ====

        mov     EAX,EBX
        mov     EDX,08421085h   ; latency 3
        mov     ECX,EBX
        mul     EDX             ; latency 5
        mov     EAX,ECX
        sub     EAX,EDX         ; latency 1
        shr     EAX,1           ; latency 1
        lea     EDX,[EAX][EDX]  ; latency 1
        shr     EDX,4           ; latency 1
        imul    EAX,EDX,01Fh    ; latency 3
        sub     ECX,EAX         ; latency 1
        mov     ESI,ECX
====
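
The 16-cycle estimate can be reproduced by adding up the annotated latencies. The following Python sketch assumes the annotated instructions form one serial dependency chain, which is the same simplification the estimate makes; the names ANNOTATED_LATENCIES and chain_latency are hypothetical.

```python
# Sketch only: sums the per-instruction latencies annotated in the snippet,
# assuming every annotated instruction waits for the previous one.
ANNOTATED_LATENCIES = [
    ("mov EDX,08421085h", 3),
    ("mul EDX", 5),
    ("sub EAX,EDX", 1),
    ("shr EAX,1", 1),
    ("lea EDX,[EAX][EDX]", 1),
    ("shr EDX,4", 1),
    ("imul EAX,EDX,01Fh", 3),
    ("sub ECX,EAX", 1),
]

# div latency range quoted in the message for the i7.
DIV_LATENCY_RANGE = (17, 28)


def chain_latency(steps):
    """Total latency of a fully serial chain of (instruction, cycles) pairs."""
    return sum(cycles for _, cycles in steps)


if __name__ == "__main__":
    total = chain_latency(ANNOTATED_LATENCIES)
    print(total, "cycles vs div at", DIV_LATENCY_RANGE, "cycles")
```

Under that assumption the replacement sequence sits just below the lower bound quoted for div, which is why the two dmc builds can time the same.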



Re: Increasing D Compiler Speed by Over 75%

2013-08-02 Thread Richard Webb

On 01/08/2013 00:32, Walter Bright wrote:

Thanks for doing this, this is good information.

On 7/31/2013 2:24 PM, Rainer Schuetze wrote:

I have just tried yesterday's dmd to build Visual D (it builds some
libraries and contains a few short non-compiling tasks in between):

Debug build dmd_dmc: 23 sec, std new 43 sec
Debug build dmd_msc: 19 sec, std new 20 sec


That makes it clear that the dmc malloc() was the dominator, not code gen.




It still appears that the DMC malloc is a big reason for the difference 
between DMC and MSVC builds when compiling the algorithm unit tests. (a 
very quick test suggests that changing the global new in rmem.c to call 
HeapAlloc instead of malloc gives a large speedup).




Re: dmc 8.57 now available for download

2013-08-02 Thread bearophile

Walter Bright:


Yes, unless I screwed it up again.


It works now, thank you.

Bye,
bearophile


Re: Article: Increasing the D Compiler Speed by Over 75%

2013-08-02 Thread Dmitry Olshansky

31-Jul-2013 22:20, Walter Bright wrote:

On 7/31/2013 8:26 AM, Dmitry Olshansky wrote:

Ouch... to boot it's always aligned by word size, so
key % sizeof(size_t) == 0
...
rendering the lower 2-3 bits useless; that would make the straightforward
approach of slicing off the lower bits rather weak :)


Yeah, I realized that, too. Gotta shift it right 3 or 4 bits.


And that helped a bit... Anyhow, after applying a somewhat more thorough 
integer hash, power-of-2 tables live up to their promise.
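
To illustrate why the shift matters for power-of-2 tables, here is a small Python sketch. It is not the code from the pull request: the function names, the table size of 16, and the 16-byte-aligned sample addresses are all assumptions made for the illustration.

```python
# Sketch only: compares masking the raw key with masking a shifted key.
# Names, table size and sample addresses are hypothetical.

def bucket_low_bits(key, table_size):
    """Index by slicing the low bits directly (table_size is a power of 2)."""
    return key & (table_size - 1)


def bucket_shifted(key, table_size, shift=4):
    """Drop the always-zero alignment bits first, then slice the low bits."""
    return (key >> shift) & (table_size - 1)


# Pretend pointer keys: 64 addresses, each aligned to 16 bytes.
KEYS = [0x1000 + 16 * i for i in range(64)]

if __name__ == "__main__":
    used_low = {bucket_low_bits(k, 16) for k in KEYS}
    used_shifted = {bucket_shifted(k, 16) for k in KEYS}
    print("buckets used without shift:", len(used_low))
    print("buckets used with shift:   ", len(used_shifted))
```

With aligned keys the unshifted version piles every key into a single bucket, while the shifted version spreads the same keys over the whole table.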


The pull that reaps the minor speed benefit over the original (~2% speed 
gain!):

https://github.com/D-Programming-Language/dmd/pull/2436

Not bad given that _aaGetRvalue itself takes only a fraction of the time.

I failed to see much of any improvement on Win32 though, allocations are 
dominating the picture.


And sharing the joy of having a nice sampling profiler, here is what AMD 
CodeAnalyst has to say (top X functions by CPU clocks not halted).


Original DMD:

Function                    CPU clocks  DC accesses  DC misses
RTLHeap::Alloc                   49410          520       3624
Obj::ledata                      10300         1308       3166
Obj::fltused                      6464         3218          6
cgcs_term                         4018         1328        626
TemplateInstance::semantic        3362         2396         26
Obj::byte                         3212          506        692
vsprintf                          3030         3060          2
ScopeDsymbol::search              2780         1592        244
_pformat                          2506         2772         16
_aaGetRvalue                      2134          806        304
memmove                           1904         1084         28
strlen                            1804          486         36
malloc                            1282          786         40
Parameter::foreach                1240          778         34
StringTable::search                952          220         42
MD5Final                           918          318

Variation of DMD with pow-2 tables:

Function                    CPU clocks  DC accesses  DC misses
RTLHeap::Alloc                   51638          552       3538
Obj::ledata                       9936         1346       3290
Obj::fltused                      7392         2948          6
cgcs_term                         3892         1292        638
TemplateInstance::semantic        3724         2346         20
Obj::byte                         3280          548        676
vsprintf                          3056         3006          4
ScopeDsymbol::search              2648         1706        220
_pformat                          2560         2718         26
memcpy                            2014         1122         46
strlen                            1694          494         32
_aaGetRvalue                      1588          658        278
Parameter::foreach                1266          658         38
malloc                            1198          758         44
StringTable::search                970          214         24
MD5Final                           866          274          2


This underlines the point that the DMC RTL allocator is the biggest speed 
detractor. It is followed by ledata (could it be due to a linear search 
inside?) and, surprisingly, the tiny Obj::fltused is draining lots of 
cycles (is it called that often?).
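
To put numbers on those observations, the following Python sketch works only with the CPU-clock column of the two listings. The shares are relative to the functions listed, not to the whole run, since the profile shows only the top entries; the helper names are hypothetical.

```python
# Sketch only: CPU-clock samples copied from the two profiler listings.
ORIGINAL = {
    "RTLHeap::Alloc": 49410, "Obj::ledata": 10300, "Obj::fltused": 6464,
    "cgcs_term": 4018, "TemplateInstance::semantic": 3362, "Obj::byte": 3212,
    "vsprintf": 3030, "ScopeDsymbol::search": 2780, "_pformat": 2506,
    "_aaGetRvalue": 2134, "memmove": 1904, "strlen": 1804, "malloc": 1282,
    "Parameter::foreach": 1240, "StringTable::search": 952, "MD5Final": 918,
}
POW2 = {
    "RTLHeap::Alloc": 51638, "Obj::ledata": 9936, "Obj::fltused": 7392,
    "cgcs_term": 3892, "TemplateInstance::semantic": 3724, "Obj::byte": 3280,
    "vsprintf": 3056, "ScopeDsymbol::search": 2648, "_pformat": 2560,
    "memcpy": 2014, "strlen": 1694, "_aaGetRvalue": 1588,
    "Parameter::foreach": 1266, "malloc": 1198, "StringTable::search": 970,
    "MD5Final": 866,
}


def share(profile, name):
    """Fraction of the listed samples attributed to one function."""
    return profile[name] / sum(profile.values())


def relative_change(before, after):
    """Signed relative change, e.g. -0.25 means 25% fewer samples."""
    return (after - before) / before


if __name__ == "__main__":
    print("allocator share, original:", round(share(ORIGINAL, "RTLHeap::Alloc"), 3))
    print("allocator share, pow-2:   ", round(share(POW2, "RTLHeap::Alloc"), 3))
    print("_aaGetRvalue change:      ",
          round(relative_change(ORIGINAL["_aaGetRvalue"], POW2["_aaGetRvalue"]), 3))
```

Among the listed functions the allocator accounts for a little over half of the samples in both runs, while _aaGetRvalue drops by roughly a quarter, which fits a small overall gain.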


--
Dmitry Olshansky


Re: DScanner is ready for use

2013-08-02 Thread Tofu Ninja

On Saturday, 27 July 2013 at 22:27:35 UTC, Brian Schott wrote:
DScanner is a tool for analyzing D source code. It has the 
following features:


* Prints out a complete AST of a source file in XML format.
* Syntax checks code and prints warning/error messages
* Prints a listing of modules imported by a source file
* Syntax highlights code in HTML format
* Provides more meaningful line of code count than wc
* Counts tokens in a source file

The lexer/parser/AST are located in the std/d directory in 
the repository. These files should prove useful to anyone else 
working on D tooling.


https://github.com/Hackerpilot/Dscanner

Aside: the D grammar that I reverse-engineered can be located 
here: 
https://rawgithub.com/Hackerpilot/DGrammar/master/grammar.html


Any idea on when we might see JSON output (I am not a fan of XML)? 
Other than that, this is a fantastic project! I am planning some 
projects in the future and this will be of great help. Keep up 
the good work!


Re: Increasing D Compiler Speed by Over 75%

2013-08-02 Thread Daniel Murphy
Rainer Schuetze r.sagita...@gmx.de wrote in message 
news:ktbvam$dvf$1...@digitalmars.com...
large-address-aware).

 This shows that removing most of the allocations was a good optimization 
 for the dmc runtime, but it has a smaller, though still notable, impact 
 with a faster heap implementation (the VS runtime usually maps directly to 
 the Windows API for non-Debug builds). I suspect the backend and the 
 optimizer do not use new a lot, but plain malloc calls, so they still 
 suffer from the slow runtime.

On a related note, I just tried replacing the two ::malloc calls in rmem's 
operator new with VirtualAlloc and I get a reduction from 13 seconds to 9 
seconds (compiling dmd std\range -unittest -main) with a release build of 
dmd. 




Re: Article: Increasing the D Compiler Speed by Over 75%

2013-08-02 Thread Leandro Lucarella
Walter Bright, on 30 July at 11:13 you wrote to me:
 On 7/30/2013 2:59 AM, Leandro Lucarella wrote:
 I just want to point out that so many people getting this wrong
 (and even fighting to convince other people that the wrong
 interpretation is right) might be an indication that the message you
 wanted to give in that blog is not extremely clear :)
 
 It never occurred to me that anyone would have any difficulty
 understanding the notion of speed. After all, we deal with it
 every day when driving.

That's a completely different context, and I don't think anyone thinks in
terms of percentages of speed in daily life (you just say "my car is
twice as fast" or something like that, but I think people hardly ever say
"my car is 10% faster" in informal contexts).

For me, the confusion comes from the fact that in informal contexts one
tends to think in multipliers of speed, not percentages (or at least I do),
so the percentage is somewhat counterintuitive. I understood what
you meant, but I had to think about it; my first reaction was to think
you were saying the compiler took 1/4 of the original time. Then I did
the math and verified that what you said was correct. But I had to do the
math.

I'm not saying it is right or wrong for people to have this reflex of
thinking in multipliers; I'm just saying that if you care about transmitting
the message as clearly as you can, it is better to use numbers everybody can
think about intuitively.

And this is in reply to Andrei too. I understand your POV, but if your
main goal is communication (instead of education about side topics),
I think it is better to stick with numbers and language that minimize
confusion and misinterpretation.

Just a humble opinion of yours truly.

-- 
Leandro Lucarella (AKA luca) http://llucax.com.ar/
--
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
--
You can try the best you can
If you try the best you can
The best you can is good enough


Re: Increasing D Compiler Speed by Over 75%

2013-08-02 Thread Walter Bright

On 8/2/2013 2:47 AM, Rainer Schuetze wrote:

My disassembly looks exactly the same. I don't think that a single div operation
in a rather long function has a lot of impact on modern processors. I'm running
an i7, according to the instruction tables by Agner Fog, the div has latency of
17-28 cycles and a reciprocal throughput of 7-17 cycles. If I estimate the
latency of the asm snippet, I also get 16 cycles. And that doesn't take the
additional tests and jumps into consideration.



I'm using an AMD FX-6100.



Re: Increasing D Compiler Speed by Over 75%

2013-08-02 Thread Walter Bright

On 8/2/2013 8:18 AM, Daniel Murphy wrote:

On a related note, I just tried replacing the two ::malloc calls in rmem's
operator new with VirtualAlloc and I get a reduction from 13 seconds to 9
seconds (compiling dmd std\range -unittest -main) with a release build of
dmd.


Hmm, very interesting!



Re: Article: Increasing the D Compiler Speed by Over 75%

2013-08-02 Thread Walter Bright

On 8/2/2013 6:16 AM, Dmitry Olshansky wrote:

31-Jul-2013 22:20, Walter Bright wrote:

On 7/31/2013 8:26 AM, Dmitry Olshansky wrote:

Ouch... to boot it's always aligned by word size, so
key % sizeof(size_t) == 0
...
rendering the lower 2-3 bits useless; that would make the straightforward
approach of slicing off the lower bits rather weak :)


Yeah, I realized that, too. Gotta shift it right 3 or 4 bits.


And that helped a bit... Anyhow, after applying a somewhat more thorough
integer hash, power-of-2 tables live up to their promise.

The pull that reaps the minor speed benefit over the original (~2% speed gain!):
https://github.com/D-Programming-Language/dmd/pull/2436


2% is worth taking.



Not bad given that _aaGetRvalue itself takes only a fraction of the time.

I failed to see much of any improvement on Win32 though, allocations are
dominating the picture.

And sharing the joy of having a nice sampling profiler, here is what AMD
CodeAnalyst has to say (top X functions by CPU clocks not halted).

Original DMD:

Function                    CPU clocks  DC accesses  DC misses
RTLHeap::Alloc                   49410          520       3624
Obj::ledata                      10300         1308       3166
Obj::fltused                      6464         3218          6
cgcs_term                         4018         1328        626
TemplateInstance::semantic        3362         2396         26
Obj::byte                         3212          506        692
vsprintf                          3030         3060          2
ScopeDsymbol::search              2780         1592        244
_pformat                          2506         2772         16
_aaGetRvalue                      2134          806        304
memmove                           1904         1084         28
strlen                            1804          486         36
malloc                            1282          786         40
Parameter::foreach                1240          778         34
StringTable::search                952          220         42
MD5Final                           918          318

Variation of DMD with pow-2 tables:

Function                    CPU clocks  DC accesses  DC misses
RTLHeap::Alloc                   51638          552       3538
Obj::ledata                       9936         1346       3290
Obj::fltused                      7392         2948          6
cgcs_term                         3892         1292        638
TemplateInstance::semantic        3724         2346         20
Obj::byte                         3280          548        676
vsprintf                          3056         3006          4
ScopeDsymbol::search              2648         1706        220
_pformat                          2560         2718         26
memcpy                            2014         1122         46
strlen                            1694          494         32
_aaGetRvalue                      1588          658        278
Parameter::foreach                1266          658         38
malloc                            1198          758         44
StringTable::search                970          214         24
MD5Final                           866          274          2


This underlines the point that the DMC RTL allocator is the biggest speed
detractor. It is followed by ledata (could it be due to a linear search
inside?) and, surprisingly, the tiny Obj::fltused is draining lots of
cycles (is it called that often?).


It's not fltused() that is taking up time, it is the static function following 
it. The sampling profiler you're using is unaware of non-global function names.




Re: Article: Increasing the D Compiler Speed by Over 75%

2013-08-02 Thread Andrei Alexandrescu

On 2013-08-02 15:44:13 +, Leandro Lucarella said:

I'm not saying it is right or wrong for people to have this reflex of
thinking in multipliers; I'm just saying that if you care about transmitting
the message as clearly as you can, it is better to use numbers everybody can
think about intuitively.

And this is in reply to Andrei too. I understand your POV, but if your
main goal is communication (instead of education about side topics),
I think it is better to stick with numbers and language that minimize
confusion and misinterpretation.

Just a humble opinion of yours truly.



Fair enough. So what would have been a better way to convey the 
quantitative improvement?


Thanks,

Andrei



Re: Article: Increasing the D Compiler Speed by Over 75%

2013-08-02 Thread Peter Alexander
On Friday, 2 August 2013 at 17:16:30 UTC, Andrei Alexandrescu 
wrote:

On 2013-08-02 15:44:13 +, Leandro Lucarella said:
I'm not saying it is right or wrong for people to have this reflex 
of thinking in multipliers; I'm just saying that if you care about 
transmitting the message as clearly as you can, it is better to use 
numbers everybody can think about intuitively.

And this is in reply to Andrei too. I understand your POV, but 
if your main goal is communication (instead of education about side 
topics), I think it is better to stick with numbers and language that 
minimize confusion and misinterpretation.

Just a humble opinion of yours truly.



Fair enough. So what would have been a better way to convey the 
quantitative improvement?


Not to speak on Leandro's behalf, but I think the obvious answer 
is "Reduced compile times by 43%".


It's much more useful to express it that way because it's easier 
to apply. Say I have a program that takes 100 seconds to compile. 
Knowing that the compilation time is reduced by 43% makes it easy 
to see that my program will now take 57 seconds. Knowing that 
compilation is 75% faster doesn't help much at all - I have to 
get out a calculator and divide by 1.75.


It's always better to use a measure that is linear with what you 
care about. Here, most people care about how long their programs 
take to compile, not how many programs they can compile per 
second.
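
The conversion between the two ways of stating the result is short enough to write down. This Python sketch assumes "75% faster" means throughput multiplied by 1.75, which is the reading used in the message above; the function names are hypothetical.

```python
# Sketch only: converts between "N% faster" (throughput) and
# "compile time reduced by M%" (duration).

def time_reduction(speed_gain):
    """Fractional cut in compile time for a fractional speed gain (0.75 = 75%)."""
    return 1.0 - 1.0 / (1.0 + speed_gain)


def new_time(old_time, speed_gain):
    """Compile time after the speed-up, in the same unit as old_time."""
    return old_time / (1.0 + speed_gain)


def speed_gain_from_reduction(reduction):
    """Inverse direction: speed gain implied by a fractional time reduction."""
    return 1.0 / (1.0 - reduction) - 1.0


if __name__ == "__main__":
    print(round(100 * time_reduction(0.75)), "% less compile time")
    print(round(new_time(100, 0.75)), "seconds instead of 100")
    print(100 * speed_gain_from_reduction(0.5), "% faster when time is halved")
```

The last call shows why the two measures diverge: halving the compile time corresponds to a 100% speed gain, not 50%.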


Re: DScanner is ready for use

2013-08-02 Thread Brian Schott

On Friday, 2 August 2013 at 13:52:14 UTC, Tofu Ninja wrote:
Any idea on when we might see JSON output (I am not a fan of 
XML)?


Roughly the same time somebody submits a pull request.

I'm currently focusing my spare time on DCD, so the JSON output 
will happen after I'm able to get auto-completion working.


Re: DScanner is ready for use

2013-08-02 Thread Tofu Ninja

On Friday, 2 August 2013 at 18:01:01 UTC, Brian Schott wrote:

On Friday, 2 August 2013 at 13:52:14 UTC, Tofu Ninja wrote:
Any idea on when we might see JSON output (I am not a fan of 
XML)?


Roughly the same time somebody submits a pull request.

I'm currently focusing my spare time on DCD, so the JSON output 
will happen after I'm able to get auto-completion working.


I will look into adding it myself if I get some time, but I 
don't think I will need to use this for a while. For what I want 
it for, there is a lot of legwork to be done before I get around 
to needing this.


Also roughly how difficult would it be to re-create source code 
from the xml? And does the xml preserve comments and if so does 
it do anything with ddoc?


Re: DScanner is ready for use

2013-08-02 Thread Brian Schott

On Friday, 2 August 2013 at 18:12:15 UTC, Tofu Ninja wrote:
I will look into adding it myself if I get some time, but I 
don't think I will need to use this for a while. For what I 
want it for, there is a lot of legwork to be done before I get 
around to needing this.


The XML output is handled by this class:
https://github.com/Hackerpilot/Dscanner/blob/master/astprinter.d
It shouldn't be much more difficult than changing the print 
statements.


Also roughly how difficult would it be to re-create source code 
from the xml? And does the xml preserve comments and if so does 
it do anything with ddoc?


Aside from the comments, it should be possible to recreate the 
source from the AST. If it's not, there's a bug in the AST 
output. Comments are skipped when syntax checking or generating 
the AST.


Re: dmc 8.57 now available for download

2013-08-02 Thread Michael

On Thursday, 1 August 2013 at 22:32:09 UTC, Walter Bright wrote:

http://www.digitalmars.com/download/freecompiler.html

Using it to compile dmd for win32 will result in a faster dmd.


Change log?


Re: Increasing D Compiler Speed by Over 75%

2013-08-02 Thread Rainer Schuetze



On 02.08.2013 18:37, Walter Bright wrote:

On 8/2/2013 2:47 AM, Rainer Schuetze wrote:

My disassembly looks exactly the same. I don't think that a single div
operation in a rather long function has a lot of impact on modern
processors. I'm running an i7, according to the instruction tables by
Agner Fog, the div has latency of 17-28 cycles and a reciprocal
throughput of 7-17 cycles. If I estimate the latency of the asm snippet,
I also get 16 cycles. And that doesn't take the additional tests and
jumps into consideration.



I'm using an AMD FX-6100.



This processor seems to do a little better with the mov reg,imm 
operation but otherwise is similar. The DIV operation has larger 
worst-case latency, though (16-48 cycles).


Better to just use a power of 2 for the array sizes anyway...


Re: Article: Increasing the D Compiler Speed by Over 75%

2013-08-02 Thread Andrei Alexandrescu

On 8/2/13 10:44 AM, Peter Alexander wrote:

Not to speak on Leandro's behalf, but I think the obvious answer is
"Reduced compile times by 43%".

It's much more useful to express it that way because it's easier to
apply. Say I have a program that takes 100 seconds to compile. Knowing
that the compilation time is reduced by 43% makes it easy to see that my
program will now take 57 seconds. Knowing that compilation is 75% faster
doesn't help much at all - I have to get out a calculator and divide by
1.75.

It's always better to use a measure that is linear with what you care
about. Here, most people care about how long their programs take to
compile, not how many programs they can compile per second.


That's cool, thanks!

Andrei


Re: Article: Increasing the D Compiler Speed by Over 75%

2013-08-02 Thread Bill Baxter
Well put, you two.  Exactly the same point I was trying to make, only to
get accused of spouting baloney.

---bb


On Fri, Aug 2, 2013 at 10:44 AM, Peter Alexander 
peter.alexander...@gmail.com wrote:

 On Friday, 2 August 2013 at 17:16:30 UTC, Andrei Alexandrescu wrote:

 On 2013-08-02 15:44:13 +, Leandro Lucarella said:

 I'm not saying it is right or wrong for people to have this reflex of
 thinking in multipliers; I'm just saying that if you care about transmitting
 the message as clearly as you can, it is better to use numbers everybody can
 think about intuitively.

 And this is in reply to Andrei too. I understand your POV, but if your
 main goal is communication (instead of education about side topics),
 I think it is better to stick with numbers and language that minimize
 confusion and misinterpretation.

 Just a humble opinion of yours truly.



 Fair enough. So what would have been a better way to convey the
 quantitative improvement?


 Not to speak on Leandro's behalf, but I think the obvious answer is
 "Reduced compile times by 43%".

 It's much more useful to express it that way because it's easier to apply.
 Say I have a program that takes 100 seconds to compile. Knowing that the
 compilation time is reduced by 43% makes it easy to see that my program
 will now take 57 seconds. Knowing that compilation is 75% faster doesn't
 help much at all - I have to get out a calculator and divide by 1.75.

 It's always better to use a measure that is linear with what you care
 about. Here, most people care about how long their programs take to
 compile, not how many programs they can compile per second.



Re: Article: Increasing the D Compiler Speed by Over 75%

2013-08-02 Thread user
Ha ha, I am a design/controls engineer who deals with speeds and 
accelerations on a daily basis and yet I was also confused by 
Walter's statement.


I guess the confusion arises from what one expects (as opposed to 
understands) by the word speed in the given context.


In the context of compiling my SW programs, I only see a dark 
console with a blocked cursor which I cannot use, and every second 
waited is felt directly. I don't see any action or hint of 
speed. This makes me think that a faster compiler is supposed to 
make me wait less. This creates a kind of mental link between the 
word "speed" and the feeling of waiting. Hence the expectation: 
a 50% faster compiler should make me wait 50% less.


If, instead of a dark console with a blocked cursor, I saw lots of 
lines being compiled scrolling at very high speed on the 
screen (like when installing some programs), then I would relate 
speed to the number of lines scrolling. And my expectation 
would probably change to: a 50% faster compiler would compile 50% 
more lines per second.


What I am saying is that even though technically we understand 
what speed is, it's the intuitive, subjective feeling based on the 
context which causes the experience that something doesn't add up.


I will stop blabbering now.



Re: Article: Increasing the D Compiler Speed by Over 75%

2013-08-02 Thread user
I am OK with the existing definition of speed, but would like to 
see the definition mentioned somewhere at the top: speed = 
lines_compiled/sec. Even though it's obvious to some people, it is 
not to me!


I guess that's why all the technical docs I write have an explicit 
definitions section at the top in the template. I should start 
using it more often.




Re: Article: Increasing the D Compiler Speed by Over 75%

2013-08-02 Thread Leandro Lucarella
Andrei Alexandrescu, on 2 August at 10:16 you wrote to me:
 On 2013-08-02 15:44:13 +, Leandro Lucarella said:
 I'm not saying it is right or wrong for people to have this reflex of
 thinking in multipliers; I'm just saying that if you care about transmitting
 the message as clearly as you can, it is better to use numbers everybody can
 think about intuitively.
 
 And this is in reply to Andrei too. I understand your POV, but if your
 main goal is communication (instead of education about side topics),
 I think it is better to stick with numbers and language that minimize
 confusion and misinterpretation.
 
 Just a humble opinion of yours truly.
 
 
 Fair enough. So what would have been a better way to convey the
 quantitative improvement?

Reduced execution time by half?

-- 
Leandro Lucarella (AKA luca) http://llucax.com.ar/
--
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
--
My childhood was in a loft, right beside the river
We hunted roadrunners and roasted them over the fire
And then? Then I grew up and the city dazzled me
Playing tejo on Lavalle, I made friends with the bongo


Re: Increasing D Compiler Speed by Over 75%

2013-08-02 Thread Walter Bright

On 8/2/2013 4:18 AM, Richard Webb wrote:

It still appears that the DMC malloc is a big reason for the difference between
DMC and MSVC builds when compiling the algorithm unit tests. (a very quick test
suggests that changing the global new in rmem.c to call HeapAlloc instead of
malloc gives a large speedup).



Yes, I agree, the DMC malloc is clearly a large performance problem. I had not 
realized this.




Re: Article: Increasing the D Compiler Speed by Over 75%

2013-08-02 Thread Walter Bright

On 8/2/2013 1:45 PM, user wrote:

I am OK with the existing definition of speed, but would like to see the
definition mentioned somewhere at the top: speed = lines_compiled/sec. Even
though it's obvious to some people, it is not to me!


Sigh. It's not even lines per second, it's dimensionless when using percentages. 
Note that I never needed to count the number of lines to get a correct percentage.




I guess that's why all the technical docs I write have an explicit definitions
section at the top in the template. I should start using it more often.


I wouldn't even read an article that had a sidebar at the top explaining what 
speed was.


Re: Increasing D Compiler Speed by Over 75%

2013-08-02 Thread Dmitry Olshansky

02-Aug-2013 20:40, Walter Bright wrote:

On 8/2/2013 8:18 AM, Daniel Murphy wrote:

On a related note, I just tried replacing the two ::malloc calls in rmem's
operator new with VirtualAlloc and I get a reduction from 13 seconds to 9
seconds (compiling dmd std\range -unittest -main) with a release build of
dmd.


Hmm, very interesting!



Made a pull to provide an implementation of rmem.c on top of the Win32 Heap API.
https://github.com/D-Programming-Language/dmd/pull/2445

Also, noting that global new/delete are already not reentrant, I added the 
HEAP_NO_SERIALIZE flag to save on locking/unlocking of the heap.


For me this brings the time down from 13 to 8 seconds.

--
Dmitry Olshansky


Re: Article: Increasing the D Compiler Speed by Over 75%

2013-08-02 Thread Dmitry Olshansky

02-Aug-2013 20:47, Walter Bright wrote:

On 8/2/2013 6:16 AM, Dmitry Olshansky wrote:

I failed to see much of any improvement on Win32 though, allocations are
dominating the picture.

And sharing the joy of having a nice sampling profiler, here is what AMD
CodeAnalyst have to say (top X functions by CPU clocks not halted).


[snip]



This underlines the point that the DMC RTL allocator is the biggest speed
detractor. It is followed by ledata (could it be due to a linear search
inside?) and, surprisingly, the tiny Obj::fltused is draining lots of
cycles (is it called that often?).


It's not fltused() that is taking up time, it is the static function
following it. The sampling profiler you're using is unaware of
non-global function names.



Thanks, that must be it! And popping that function above another one 
gets Obj::far16thunk to be blamed :) Need to watch out for this sort of 
problem next time. Could it be due to how it works with old CV debug 
info format?


--
Dmitry Olshansky


Re: Article: Increasing the D Compiler Speed by Over 75%

2013-08-02 Thread Walter Bright

On 8/2/2013 3:53 PM, Dmitry Olshansky wrote:

Thanks, that must be it! And popping that function above another one gets
Obj::far16thunk to be blamed :) Need to watch out for this sort of problem next
time. Could it be due to how it works with old CV debug info format?


Try compiling with -g.

Otherwise, you only get global symbols.



Re: dmc 8.57 now available for download

2013-08-02 Thread Chang Long

On Friday, 2 August 2013 at 01:25:08 UTC, Walter Bright wrote:

On 8/1/2013 6:22 PM, bearophile wrote:

Walter Bright:


Fixed.


Do you mean that if I download the dmc zip again it will work?



Yes, unless I screwed it up again.


The package dm857c.zip shows the dmc file as created in 2004, and when I
run it, it reports version 8.42n. Is that correct?