Re: Passing Templated Function Arguments Solely by Reference
On 07/08/2014 05:13 PM, Nordlöw wrote:

> If I want randInPlace to take value arguments (such as structs) by reference and reference types (classes) as normal is this

I don't understand what it means to fill a struct or a class object with random content.

> /** Generate Random Contents in $(D x). */
> auto ref randInPlace(T)(auto ref T x) @safe /* nothrow */
>     if (isIterable!T)

hasAssignableElements is more correct.

> {
>     foreach (ref elt; x)
>     {
>         import std.range: ElementType;
>         static if (isInputRange!(ElementType!T))

The documentation of hasAssignableElements mentions that it implies isForwardRange, and it makes sense: you don't want the range to be consumed as an InputRange would be.

>             elt[].randInPlace;
>         else
>             elt.randInPlace;
>     }
>     return x;
> }

> And how does this compare to using x[].randInPlace() when x is a static array?

Range algorithms don't work with static arrays because they can't popFront(). The solution is to use a slice to the entire array, as you've already done with x[]. ;)

> Does x[] create unnecessary GC-heap activity in this case?

No. The static array will remain in memory and x[] will be a local slice. A slice consists of two members, the equivalent of the following:

```d
struct __SliceImpl
{
    size_t length;
    void* pointer_to_first_element;
}
```

> I'm wondering because (auto ref T x) is just used in two places in std.algorithm and std.range in Phobos. Is this a relatively new enhancement?

Phobos algorithms use ranges.
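The slice layout described above can be checked directly; a small sketch (no GC allocation is involved in taking the slice):

```d
void demo()
{
    int[4] x = [1, 2, 3, 4];
    int[] s = x[];           // local slice over the static array
    assert(s.length == 4);
    assert(s.ptr == &x[0]);  // points straight into x; nothing was copied
}

void main()
{
    demo();
}
```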
The following is what I've come up with very quickly:

```d
import std.stdio;
import std.range;
import std.traits;
import std.random;

void randInPlace(R)(R range)
    if (hasAssignableElements!R)
{
    foreach (ref e; range) {
        e.randInPlace();
    }
}

void randInPlace(E)(ref E element)
    if (isNumeric!E)
{
    // BUG: Never assigns the value E.max
    element = uniform(E.min, E.max);
}

void randInPlace(E)(ref E element)
    if (isBoolean!E)
{
    element = cast(bool)uniform(0, 2);
}

void main()
{
    auto arr = [ [ 0, 1, 2 ], [ 3, 4, 5 ] ];
    arr.randInPlace();
    writefln("%s", arr);

    auto barr = [ [ false, true ], [ false, true ] ];
    barr.randInPlace();
    writefln("%s", barr);
}
```

Ali
__VERSION__ and the different compilers
Is it safe to assume that __VERSION__ is the same among DMD, LDC and GDC when using the equivalent front-end? I want to implement @nogc in Derelict in a backward compatible way. The simple thing to do is (at the suggestion of w0rp):

```d
static if( __VERSION__ >= 2066 )  // __VERSION__ is an integer, e.g. 2066 for 2.066
    enum nogc = 1;
```

I just want to verify that this is sufficient and that I don't need to test for version(DMD) and friends as well.
Re: __VERSION__ and the different compilers
Mike Parker: Is it safe to assume that __VERSION__ is the same among DMD, LDC and GDC when using the equivalent front-end? Right. An alternative solution is to use __traits(compiles) and use @nogc inside it. Bye, bearophile
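A hedged sketch of the __traits(compiles) approach bearophile mentions: probe whether the front-end accepts @nogc at all, instead of comparing version numbers. The string mixin keeps the probe a gagged error on front-ends that cannot even parse @nogc (the name supportsNoGC is illustrative):

```d
import std.stdio;

// true iff this front-end accepts the @nogc attribute on a function literal
enum supportsNoGC = __traits(compiles, mixin("() @nogc {}"));

void main()
{
    writeln("@nogc supported: ", supportsNoGC);
}
```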
Re: __VERSION__ and the different compilers
On 7/9/14, Mike Parker via Digitalmars-d-learn digitalmars-d-learn@puremagic.com wrote: Is it safe to assume that __VERSION__ is the same among DMD, LDC and GDC when using the equivalent front-end? Yes, but not all future compilers might implement this (although I hope they will). I think there's also __VENDOR__ IIRC.
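Either way, it is easy to check what a given compiler reports (the vendor string differs per compiler; the front-end version should match for equivalent front-ends):

```d
import std.stdio;

void main()
{
    // e.g. "Digital Mars D" and 2066 for DMD 2.066; LDC and GDC report the
    // version of the DMD front-end they are based on.
    writeln(__VENDOR__, ", front-end ", __VERSION__);
}
```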
Small part of a program : d and c versions performances diff.
Hello, I extracted a part of my code written in C. It is deliberately useless here, but I would like to understand the different techniques to optimize this kind of code with the gdc compiler. It currently runs under a microsecond. Constraint: the way the code is expressed cannot be changed much; we need that double loop because there are other operations involved in the first loop's scope.

main.c:

```c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include "jol.h"

int main(void)
{
    struct timeval s, e;
    gettimeofday(&s, NULL);

    int pol = 5;
    tes(&pol);

    int arr[] = {9,16,458,2,68,5452,98,32,4,565,78,985,3215};
    int len = 13 - 1;
    int g = 0;

    for (int x = 36; x >= 0; --x) {
        // some code here erased for the test
        for (int y = len; y >= 0; --y) {
            // some other code here
            ++g;
            arr[y] += 1;
        }
    }

    gettimeofday(&e, NULL);
    printf("so ? %d %lu %d %d %d", g, e.tv_usec - s.tv_usec, arr[4], arr[9], pol);
    return 0;
}
```

jol.c:

```c
void tes(int * restrict a)
{
    *a = 9;
}
```

and jol.h:

```c
#ifndef JOL_H
#define JOL_H
void tes(int * restrict a);
#endif // JOL_H
```

Now, the D counterpart:

```d
module main;

import std.stdio;
import std.datetime;
import jol;

int main(string[] args)
{
    auto currentTime = Clock.currTime();

    int pol = 5;
    tes(pol);
    pol = 8;

    int arr[] = [9,16,458,2,68,5452,98,32,4,565,78,985,3215];
    int len = 13 - 1;
    int g = 0;

    for (int x = 31; x >= 0; --x) {
        for (int y = len; y >= 0; --y) {
            ++g;
            arr[y] += 1;
        }
    }

    auto currentTime2 = Clock.currTime();
    writefln("Hello World %d %s %d %d\n", g, (currentTime2 - currentTime), arr[4], arr[9]);
    return 0;
}
```

and:

```d
module jol;

final void tes(ref int a)
{
    a = 9;
}
```

OK, the compilation options:

gdc hello.d jol.d -O3 -frelease -ftree-loop-optimize
gcc -march=native -std=c11 -O2 main.c jol.c

Now the performance: D : 12 µs, C : 1 µs. Where does the diff come from? Is there a way to optimize the D version? Again, I am absolutely new to D and these are my very first lines of code with it. Thanks
Re: Small part of a program : d and c versions performances diff.
On Wednesday, 9 July 2014 at 10:57:33 UTC, Larry wrote:

> [...] Now the performance: D : 12 µs, C : 1 µs. Where does the diff come from? Is there a way to optimize the D version? Again, I am absolutely new to D and these are my very first lines of code with it. Thanks

Clock isn't an accurate benchmark instrument. Try std.datetime.benchmark:

```d
module main;

import std.stdio;
import std.datetime;

void tes(ref int a)
{
    a = 9;
}

int[] arr = [9,16,458,2,68,5452,98,32,4,565,78,985,3215];

void foo()
{
    int pol = 5;
    tes(pol);
    pol = 8;

    int g = 0;
    foreach_reverse (x; 0..31)
    {
        foreach_reverse (ref a; arr)
        {
            ++g;
            a += 1;
        }
    }
}

void main()
{
    auto res = benchmark!foo(1000); // take mean of 1000 launches
    writeln(res[0].msecs, " ", arr[4], " ", arr[9]);
}
```

Dmd time: 1 us
Gcc time: <= 1 us
Re: Small part of a program : d and c versions performances diff.
Larry:
> Now the performance: D : 12 µs, C : 1 µs. Where does the diff come from? Is there a way to optimize the D version? Again, I am absolutely new to D and these are my very first lines of code with it.

Your C code is not equivalent to the D code; there are small differences, even the output is different. So I've cleaned up your C and D code:

```c
// C code.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include "jol.h"

int main()
{
    struct timeval s, e;
    gettimeofday(&s, NULL);

    int pol = 5;
    tes(&pol);

    int arr[] = {9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 985, 3215};
    int len = 13 - 1;
    int g = 0;

    for (int x = 36; x >= 0; --x) {
        for (int y = len; y >= 0; --y) {
            ++g;
            arr[y]++;
        }
    }

    gettimeofday(&e, NULL);
    printf("C: %d %lu %d %d %d\n", g, e.tv_usec - s.tv_usec, arr[4], arr[9], pol);
    return 0;
}
```

D code (final functions have not much meaning, but the D compiler is very sloppy and doesn't complain):

```d
module jol;

void tes(ref int a)
{
    a = 9;
}
```

```d
module maind;

void main()
{
    import std.stdio;
    import std.datetime;
    import jol;

    StopWatch sw;
    sw.start;

    int pol = 5;
    tes(pol);

    int[] arr = [9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 985, 3215];
    int len = 13 - 1;
    int g = 0;

    for (int x = 36; x >= 0; --x) {
        // Some code here erased for the test.
        for (int y = len; y >= 0; --y) {
            // Some other code here.
            ++g;
            arr[y]++;
        }
    }

    sw.stop;
    writefln("D: %d %d %d %d %d", g, sw.peek.nsecs, arr[4], arr[9], pol);
}
```

That D code is not fully idiomatic; this is closer to idiomatic D code:

```d
module jol2;

void test(ref int x) pure nothrow @safe
{
    x = 9;
}
```

```d
module maind;

void main()
{
    import std.stdio, std.datetime;
    import jol2;

    StopWatch sw;
    sw.start;

    int pol = 5;
    test(pol);

    int[13] arr = [9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 985, 3215];
    uint count = 0;

    foreach_reverse (immutable _; 0 .. 37) {
        foreach_reverse (ref ai; arr) {
            count++;
            ai++;
        }
    }

    sw.stop;
    writefln("D: %d %d %d %d %d", count, sw.peek.nsecs, arr[4], arr[9], pol);
}
```

In my benchmarks I haven't used the more idiomatic D code, I have used the C-like code. But the run time is essentially the same. I compile the C and D code with (on a 32-bit Windows):

gcc -march=native -std=c11 -O2 main.c jol.c -o main
ldmd2 -wi -O -release -inline -noboundscheck maind.d jol.d
strip maind.exe

For the D code I've used the latest ldc2 compiler (V. 0.13.0, based on DMD v2.064 and LLVM 3.4.2); GCC is V.4.8.0 (rubenvb-4.8.0).

The C code gives as output: C: 481 0 105 602 9
The D code gives as output: D: 481 6076 105 602 9

If I slow down the CPU to half speed the C code runs in about 0.05 seconds, the D code runs in about 0.07 seconds. Such run times are too small to perform a sufficiently meaningful comparison; you need a run time of about 2 seconds to get meaningful timings. The difference between 0.05 and 0.07 is caused by initializing the D runtime (like the D GC); it takes about 0.015 seconds on my system at full-speed CPU to initialize the D runtime, and it's a constant time.

Bye,
bearophile
Re: Visual D: Settings to Improve compil and link process
On Monday, 7 July 2014 at 22:00:51 UTC, Rainer Schuetze wrote: On 07.07.2014 12:46, ParticlePeter wrote: On Sunday, 6 July 2014 at 19:27:38 UTC, Rainer Schuetze wrote: These object files are in the library ;-) That means manual selection, though, as incremental builds to multiple object files don't work with dmd, and single file compilation is painfully slow. Not sure if I am getting this right, so when one object file has to be recompiled, all other object files, even if up to date, would be recompiled? That's how it is currently done if you don't use single file compilation. Compiling only modified and dependent modules in one step could work incrementally, but especially template instantiations make this hard to do correctly. The modules from MyProject do import the MyLib modules properly; I do not get compiler errors. However, the compiler should create object files from MyLib modules, and the linker should link them. But it does not. On the other hand, when I add MyLib modules to MyProject ( Rightclick MyProject - add - existing item... MyLib source files ) then linking works. I do not understand why the latter step is necessary. dmd does not compile imported modules, but rdmd does. Um ... not seeing the connection here either, why is this significant? dmd just compiles the files given on the command line. rdmd makes two passes, one to collect imported files, and another to compile all the collected files. So rdmd works the way you want dmd to work (if I understand you correctly). I feel that I could not explain my problem properly, so one example: importing phobos modules. I do not have to define any import path or lib file in the project settings, I just need to import std.something. That's because the import paths for phobos modules are stored in the dmd sc.ini file. 
When I want to import my modules which are somewhere on my hard drive and not added to my project, I need to tell the compiler where these modules can be found, using the additional import path project setting. That's fine, doing this. But the result is: std.something works, while my modules in a path known by the compiler don't work, giving me linker errors. Why? ( I do not create a lib, I just want to import the module. ) phobos is precompiled to a library and is automatically included in the link. If you want your custom modules to work the same way, you have to compile them to a library. Thanks for clarifying all the above. If the std. library were rebuilt whenever a template is used, then my assumption that MyLib compiles slowly due to usage of ( very simple ) templates must be wrong. I will dive deeper into profiling. Thanks a lot. Cheers, ParticlePeter
Re: Small part of a program : d and c versions performances diff.
On Wednesday, 9 July 2014 at 12:25:40 UTC, bearophile wrote:

> Your C code is not equivalent to the D code, there are small differences, even the output is different. So I've cleaned up your C and D code: [...] Such run times are too small to perform a sufficiently meaningful comparison; you need a run time of about 2 seconds to get meaningful timings. The difference between 0.05 and 0.07 is caused by initializing the D runtime (like the D GC); it takes about 0.015 seconds on my system at full-speed CPU to initialize the D runtime, and it's a constant time.

You are definitely right, I did mess up while translating! I ran the corrected codes (the ones I was meant to provide :S) and on a slow macbook I end up with: C : 2, D : 15994. Of course, when run on very high end machines this diff is almost non-existent, but we want to run on very low powered hardware. OK, even with longer code there will always be a launch penalty for D. So I cannot use it for very high performance loops. Shame for us.. :) Thanks and bye
Re: Small part of a program : d and c versions performances diff.
On Wednesday, 9 July 2014 at 13:18:00 UTC, Larry wrote:

> You are definitely right, I did mess up while translating! I ran the corrected codes (the ones I was meant to provide :S) and on a slow macbook I end up with: C : 2, D : 15994. [...] Ok, even with a longer code, there will always be a launch penalty for d. So I cannot use it for very high performance loops. Shame for us.. :) Thanks and bye

Could you provide the exact code you are using for that benchmark? Once the program has started up you should be able to obtain performance parity between C and D. Situations where this isn't true are problems we would like to know about. 
For the amount of work you are doing in the test program (almost nothing), the total runtime is probably dominated by the program load time etc. even when using C.
Re: Small part of a program : d and c versions performances diff.
On Wednesday, 9 July 2014 at 13:46:59 UTC, Larry wrote: Yes, you are perfectly right, but our need is to run the fastest code on the lowest powered machines. Not servers but embedded systems. That is why I just test the overall structures. The rest of the code is numerical, so it will not change much the fact that D cannot get back the huge launching time. At the microsecond level (even nano) it counts, because of electrical consumption, size of hardware, heat and so on. It is definitely not something most care about, and I cannot disclose the full code for license reasons (yeah, I know I suck and generate some fuss for nothing, but.. I just execute.) But D may be of use to us for non-critical code, to replace some Python here and there. It is definitely a good piece of engineering. And it will help save money. @John Colvin: hem, you meant the sample code or the real code? If the former, it is the one corrected by Bearophile. My excuses
Re: Small part of a program : d and c versions performances diff.
@Bearophile: just tried. No dramatic change.

```d
import core.memory;

void main()
{
    GC.disable;
    // ...
}
```
Re: Small part of a program : d and c versions performances diff.
Larry:
> @Bearophile: just tried. No dramatic change.

That just means disabling the GC, so the start time is the same. What you want is to not start the GC/runtime at all, stubbing it out... (assuming you don't need the GC in your program). I think you can stub out the runtime functions by defining a few empty extern(C) functions, but I've never had to do it (saving 0.015 seconds is not important for my needs), so if you don't know how to do it, you'll have to ask others. Bye, bearophile
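For completeness, one common sketch of the "no runtime startup" route is to supply a C entry point so druntime is never initialized. This is a hypothetical, toolchain-dependent sketch (it requires linking without druntime, e.g. `dmd -defaultlib=` on some setups, and it rules out the GC, module constructors, and most of Phobos):

```d
// Hypothetical sketch: a C main means no D runtime initialization at all.
extern(C) int printf(const(char)* fmt, ...);

extern(C) int main()
{
    printf("no runtime startup\n");
    return 0;
}
```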
Re: Small part of a program : d and c versions performances diff.
On Wednesday, 9 July 2014 at 13:46:59 UTC, Larry wrote: The rest of the code is numerical so it will not change by much the fact that d cannot get back the huge launching time. At the microsecond level(even nano) it counts because of electrical consumption, size of hardware, heat and so on. You say you are worried about microseconds and power consumption, but you are suggesting launching a new process - a lot of overhead - to do a small amount of numerical work. Surely no matter what programming language you use you would not want to work like this?
Re: Small part of a program : d and c versions performances diff.
On Wednesday, 9 July 2014 at 14:30:41 UTC, John Colvin wrote: You say you are worried about microseconds and power consumption, but you are suggesting launching a new process - a lot of overhead - to do a small amount of numerical work. Not much overhead if you don't use a MMU and use static linking.
Re: Opinions: The Best and Worst of D (for a lecture/talk I intend to give)
On Wed, Jul 09, 2014 at 07:51:24AM +0200, Philippe Sigaud via Digitalmars-d-learn wrote:

> On Tue, Jul 8, 2014 at 7:50 AM, H. S. Teoh via Digitalmars-d-learn digitalmars-d-learn@puremagic.com wrote quite a wall of text
> Wow, what to add to that? Maybe you scared others from participating ;-)

I hope not. :)

[...]

> * I'd add static introspection to the mix: using static if, __traits(...) and is(...), clunky as the syntax is (there, one 'ugly' thing for you), is easy and very powerful:

[...]

Oh yeah, I forgot about that one. The syntax of is-expressions is very counterintuitive (not to mention noisy), and has too many special-cased meanings that are completely non-obvious for the uninitiated, for example:

```d
// Assume T = some type
is(T)           // is T a valid type?
is(T U)         // is T a valid type? If so, alias it to U
is(T : U)       // is T implicitly convertible to U?
is(T U : V)     // is T implicitly convertible to V? If so, alias it to U
is(T U : V, W)  // does T match the type pattern V, for some template
                // arguments W? If so, alias to U
is(T == U)      // is T the same type as U?

// You thought the above is (somewhat) consistent? Well, look at this one:
is(T U : __parameters)  // is T the type of a function? If so, alias U to
                        // the parameter tuple of its arguments.
```

That last one is remarkably pathological: it breaks away from the general interpretation of the other cases, where T is matched against the right side of the expression; here, __parameters is a magic keyword that makes the whole thing mean something else completely. Not to mention, what is returned in U is something extremely strange; it looks like a type tuple, but it's actually something more than that. Unlike usual type tuples, in addition to encoding the list of types of the function's parameters, it also includes the parameter names and attributes... except that you can only get at the parameter names using __traits(name,...). 
But indexing it like a regular type tuple will reduce its elements into mere types, on which __traits(name,...) will fail; you need to take 1-element slices of it in order to preserve the additional information. This strange, inconsistent behaviour only barely begins to make sense once you understand how it's implemented in the compiler. It's the epitome of leaky abstraction. T -- Do not reason with the unreasonable; you lose by definition.
Re: Opinions: The Best and Worst of D (for a lecture/talk I intend to give)
On Monday, 7 July 2014 at 23:47:26 UTC, Aerolite wrote: Hey all, I've not posted here in a while, but I've been keeping up to speed with D's progress over the last couple of years and remain consistently impressed with the language. I'm part of a new computing society in the University of Newcastle, Australia, and am essentially known throughout our Computer Science department as 'the D guy'. At the insistence of my peers, I have decided to give an introductory lecture on the D Programming Language, in order to expose more students to the increasingly amazing aspects of D. I expect to cover the good, the bad, the awesome, and the ugly, in a complement-criticism-complement styled talk, and while I have my own opinions regarding each of these things, I'd like a broader view from the community regarding these aspects, so that I may provide as accurate and as useful information as possible. So, if you would be so kind, give me a bullet list of the aspects of D you believe to be good, awesome, bad, and/or ugly. If you have the time, some code examples wouldn't go amiss either! Try not to go in-depth to weird edge cases - remain general, yet informative. E.g. I consider D's string mixins to be in the 'awesome' category, but its reliance on the GC for large segments of the standard library to be in the 'ugly' category. Thanks so much for your time! opDispatch is a mostly untapped goldmine of potential. Just take a look at this thread, where an (almost, depends on the compiler) no-cost safe dereference wrapper was implemented using it: http://forum.dlang.org/post/mailman.2584.1403213951.2907.digitalmar...@puremagic.com opDispatch also allows for vector swizzling, which is really nice for any kind of vector work. One of the uglier things in D is also a long-standing problem with C and C++, in that comparison of signed and unsigned values is allowed.
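Swizzling via opDispatch can be sketched in a few lines (Vec3 and its behavior here are illustrative, not code from the linked thread; in this toy version, letters other than x/y fall through to z):

```d
struct Vec3
{
    float x, y, z;

    // v.zx, v.xyz, ... are synthesized from the member name at compile time
    auto opDispatch(string s)() const
    {
        float[s.length] r;  // length known at compile time from the name
        foreach (i, c; s)
            r[i] = (c == 'x') ? x : (c == 'y') ? y : z;
        return r;
    }
}
```

Usage: for `auto v = Vec3(1, 2, 3);`, `v.zx` yields `[3.0f, 1.0f]` without any Vec3 member of that name existing.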
How to interact with fortran code
I apologize many times for this question; maybe this has already been answered somewhere, but considering that today the last of my nerves is broken, I cannot really find the solution. So I have a D code which acts as a central manager of all my codes: it reads user input, reads files, etc., and based on the file readouts I would like to pass some variables from the D code to a Fortran code, in binary format perhaps, if such a thing exists, instead of encoding to text/ASCII first. I would also like to read some (not all) variables back from the Fortran code. The Fortran code resides in a subdirectory of the path/to/d/code. How to do this? Is there a preferred way, easier than a system call, to interface D and Fortran code? This must be Fortran code - these are the standard atmospheric chemistry codes. I apologize again if the question is stupid; trust me, today all my nerves are broken.
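For what it's worth, Fortran compilers expose a C-compatible ABI, so D can call Fortran directly and exchange variables in binary, with no text round-trip. A hedged sketch, assuming gfortran's conventions (lowercased symbol name plus a trailing underscore, all arguments passed by reference) and a hypothetical subroutine name:

```d
// Fortran side (chem.f90), for reference:
//   subroutine chemstep(conc, n)
//     integer :: n
//     double precision :: conc(n)
//     ...
//   end subroutine chemstep

// D side: declare the mangled symbol and pass everything by address.
extern(C) void chemstep_(double* conc, int* n);

void advance(double[] conc)
{
    int n = cast(int) conc.length;
    chemstep_(conc.ptr, &n);  // variables cross in binary form, no ASCII step
}
```

Compile the Fortran to an object file (e.g. with gfortran -c) and pass it to the D linker command line. Modern Fortran's iso_c_binding module with `bind(C)` removes the underscore guesswork entirely.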
Re: Small part of a program : d and c versions performances diff.
On Wednesday, 9 July 2014 at 14:30:41 UTC, John Colvin wrote:

> You say you are worried about microseconds and power consumption, but you are suggesting launching a new process - a lot of overhead - to do a small amount of numerical work. Surely no matter what programming language you use you would not want to work like this?

@John: A new process? Where? Or maybe I got you wrong on this one, John. I am writing libraries, and before going further I wondered if there were alternatives that I could get a grab on. The idea is to have homogeneous software, so we were ready to switch to D for the whole set of tasks/assets. No new process involved. I was seeking maybe a Python-like programming language that offers C-like perfs, without so much writing as in C. Exit Cython: debugging it is a real pain, and executable size is.. well.. I am becoming lazy and seek the Holy Grail. Java not welcome. D seemed like a very good choice, and maybe it is, or more certainly will be.
Re: Small part of a program : d and c versions performances diff.
On Wednesday, 9 July 2014 at 15:09:09 UTC, Larry wrote:

> [...] D seemed like a very good choice and maybe it is, or more certainly will.

I wouldn't give up on D (as you've already signalled). It's getting better with each iteration. BTW, have you measured the power consumption yet? Does it make a big difference if you use D or C?
Re: Small part of a program : d and c versions performances diff.
On Wednesday, 9 July 2014 at 15:09:09 UTC, Larry wrote: On Wednesday, 9 July 2014 at 14:30:41 UTC, John Colvin wrote: On Wednesday, 9 July 2014 at 13:46:59 UTC, Larry wrote: The rest of the code is numerical so it will not change by much the fact that D cannot get back the huge launching time. At the microsecond level (even nano) it counts because of electrical consumption, size of hardware, heat and so on. You say you are worried about microseconds and power consumption, but you are suggesting launching a new process - a lot of overhead - to do a small amount of numerical work. Surely no matter what programming language you use you would not want to work like this? @John : A new process? Where? Or maybe I got you wrong on this one, John. process == program in this case. Launching a new process == running the program. The startup cost of the D runtime is only paid when you start the program. If the amount of work done per execution of the program is more than a trivial amount, then the startup cost will only be a small part of the total running time, power consumption, etc. I am writing libraries, and before going further I wondered if there were alternatives that I could get a grab on. The idea is to have homogeneous software, so we were ready to switch to D for the whole set of tasks/assets. No new process involved. I was seeking a Python-like programming language that offers C-like performance without so much writing as in C. Exit Cython: debugging it is a real pain, and the executable size is.. well.. I am becoming lazy and seek the Holy Grail. Java not welcome. D seemed like a very good choice, and maybe it is, or more certainly will be. I think D could be a good choice for you.
Re: Small part of a program : d and c versions performances diff.
I may definitely help on the D project. I noticed that gdc doesn't have profile-guided optimization either. So yeah, I cannot use D right now, I mean for this project. OK, I will do my best to have some spare time for Dlang. I haven't really looked at the code yet, and I have coded for years in C, which is my first-class coding language. Hope it will not be any kind of barrier (C++ is my.. third best coding buddy anyway (after Python, excellent for managing systems)). Many thanks to all the community. I will stick with you and see what I can bring (or cannot). :) Bye
Re: Opinions: The Best and Worst of D (for a lecture/talk I intend to give)
On Wednesday, 9 July 2014 at 14:51:41 UTC, Meta wrote: One of the uglier things in D is also a long-standing problem with C and C++, in that comparison of signed and unsigned values is allowed. I would like that, if it would be implemented along this line:

/// Returns -1 if a < b, 0 if they are equal or 1 if a > b.
/// this will always yield a correct result, no matter which numeric types are compared.
/// It uses one extra comparison operation if and only if
/// one type is signed and the other unsigned but the signed value is >= 0
/// (that is what you need to pay for stupid choice of type).
int opCmp(T, U)(const(T) a, const(U) b) @primitive
if(isNumeric!T && isNumeric!U)
{
    static if(Unqual!T == Unqual!U)
    {
        // use the standard D implementation
    }
    else static if(isFloatingPoint!T || isFloatingPoint!U)
    {
        alias CommonType!(T, U) C;
        return opCmp!(cast(C)a, cast(C)b);
    }
    else static if(isSigned!T && isUnsigned!U)
    {
        alias CommonType!(Unsigned!T, U) C;
        return (a < 0) ? -1 : opCmp!(cast(C)a, cast(C)b);
    }
    else static if(isUnsigned!T && isSigned!U)
    {
        alias CommonType!(T, Unsigned!U) C;
        return (b < 0) ? 1 : opCmp!(cast(C)a, cast(C)b);
    }
    else // both signed or both unsigned
    {
        alias CommonType!(T, U) C;
        return opCmp!(cast(C)a, cast(C)b);
    }
}
Re: Opinions: The Best and Worst of D (for a lecture/talk I intend to give)
Of course without the ! after opCmp in the several cases.
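To make the idea concrete, here is a minimal compilable sketch of the proposal as a plain library function. The name safeCmp is hypothetical (the actual proposal is a compiler primitive), and the static if chain follows the structure of the code above, with the is(...) fix applied:

```d
import std.traits : isNumeric, isFloatingPoint, isSigned, isUnsigned,
    CommonType, Unsigned;

/// Hypothetical library version of the proposed primitive: returns a
/// negative value if a < b, 0 if equal, positive if a > b, and stays
/// correct even when mixing signed and unsigned operands.
int safeCmp(T, U)(const(T) a, const(U) b)
if (isNumeric!T && isNumeric!U)
{
    static if (isFloatingPoint!T || isFloatingPoint!U)
    {
        alias C = CommonType!(T, U);
        return cast(C)a < cast(C)b ? -1 : cast(C)a > cast(C)b ? 1 : 0;
    }
    else static if (isSigned!T && isUnsigned!U)
    {
        // A negative left operand is smaller than any unsigned value.
        return (a < 0) ? -1 : safeCmp(cast(Unsigned!T)a, b);
    }
    else static if (isUnsigned!T && isSigned!U)
    {
        // A negative right operand is smaller than any unsigned value.
        return (b < 0) ? 1 : safeCmp(a, cast(Unsigned!U)b);
    }
    else // both signed or both unsigned: ordinary comparison is safe
    {
        alias C = CommonType!(T, U);
        return cast(C)a < cast(C)b ? -1 : cast(C)a > cast(C)b ? 1 : 0;
    }
}

void main()
{
    assert(safeCmp(uint.max, -1) > 0); // the built-in comparison gets this wrong
    assert(safeCmp(-1, 0u) < 0);
    assert(safeCmp(5, 5u) == 0);
    assert(safeCmp(1.5, 1u) > 0);
}
```

The extra comparison against 0 is only ever paid on the mixed signed/unsigned paths, matching the cost claim in the doc comment.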
Re: Small part of a program : d and c versions performances diff.
@Chris : Actually yes. If we consider the device to run 20h a day, by shaving a few microseconds here and there on billions of operations a day over a whole machine park, you enable yourself to shut down some of the machines for maintenance more easily, or pause some of them, letting their batteries last a bit longer, and the economies have proven to be in the order of thousands of $$ thanks to a redefined coding strategy. Not even mentioning hardware usage, which is related to heat, and the savings you can expect over a long run. By replacing some hardware a few months after its theoretical obsolescence, you can save a bit further. And the accountant is very happy because he can optimize the finances further (staggered repayment). It enabled us to hire more engineers/hardware. Of course, the saving is not only on this loop but on the whole chain. And it definitely adds up $$$. And there are a lot more things involved that benefit from it (latency and so on). Yep. :)
Re: Opinions: The Best and Worst of D (for a lecture/talk I intend to give)
On Wed, Jul 09, 2014 at 04:24:38PM +, Dominikus Dittes Scherkl via Digitalmars-d-learn wrote: On Wednesday, 9 July 2014 at 14:51:41 UTC, Meta wrote: One of the uglier things in D is also a long-standing problem with C and C++, in that comparison of signed and unsigned values is allowed. I would like that, if it would be implemented along this line: /// Returns -1 if a < b, 0 if they are equal or 1 if a > b. /// this will always yield a correct result, no matter which numeric types are compared. /// It uses one extra comparison operation if and only if /// one type is signed and the other unsigned but the signed value is >= 0 /// (that is what you need to pay for stupid choice of type). [...] Yeah, I don't see what's the problem with comparing signed and unsigned values, as long as the result is as expected. Currently, however, this code asserts, which is wrong:

uint x = uint.max;
int y = -1;
assert(x > y);

{ static if(Unqual!T == Unqual!U)

Nitpick: should be:

static if(is(Unqual!T == Unqual!U))

[...] else static if(isSigned!T && isUnsigned!U) { alias CommonType!(Unsigned!T, U) C; return (a < 0) ? -1 : opCmp!(cast(C)a, cast(C)b); } else static if(isUnsigned!T && isSigned!U) { alias CommonType!(T, Unsigned!U) C; return (b < 0) ? 1 : opCmp!(cast(C)a, cast(C)b); } [...] Hmm. I wonder if there's a more efficient way to do this. For the comparison s < u, where s is a signed value and u is an unsigned value, whenever s is negative, the return value of opCmp must be negative. Assuming 2's-complement representation of integers, this means we simply copy the MSB of s (i.e., the sign bit) to the result. So we can implement s < u as:

enum signbitMask = 1u << (s.sizeof*8 - 1); // this is a compile-time constant
return (s - u) | (s & signbitMask); // look ma, no branches!
which would translate (roughly) to the assembly code:

mov eax, [address of s]
mov ebx, [address of u]
mov ecx, eax ; save the value of s for signbit extraction
sub eax, ebx ; s - u
and ecx, #$8000_0000 ; s & signbitMask
or eax, ecx ; (s - u) | (s & signbitMask)
(ret ; this is deleted if opCmp is inlined)

which avoids a branch hazard in the CPU pipeline. Similarly, for the comparison u < s, whenever s is negative, then opCmp must always be positive. So this means we copy over the *negation* of the sign bit of s to the result. So we get this for u < s:

enum signbitMask = 1u << (s.sizeof*8 - 1); // as before
return (u - s) & ~(s & signbitMask); // look ma, no branches!

which translates roughly to:

mov eax, [address of u]
mov ebx, [address of s]
sub eax, ebx ; u - s
and ebx, #$8000_0000 ; s & signbitMask
not ebx ; ~(s & signbitMask)
and eax, ebx ; (u - s) & ~(s & signbitMask)
(ret ; this is deleted if opCmp is inlined)

Again, this avoids a branch hazard in the CPU pipeline. In both cases, the first 2 instructions are unnecessary if the values to be compared are already in CPU registers. The naïve implementation of opCmp is just a single sub instruction (this is why opCmp is defined the way it is, BTW), whereas the smart signed/unsigned comparison is 4 instructions long. The branched version would look something like this:

mov eax, [address of u]
mov ebx, [address of s]
cmp ebx, $#0
jge label1 ; first branch
mov eax, $#
jmp label2 ; 2nd branch
label1: sub eax, ebx
label2: (ret)

The 2nd branch can be replaced with ret if opCmp is not inlined, but requiring a function call to compare integers seems excessive, so let's assume it's inlined, in which case the 2nd branch is necessary. So as you can see, the branched version is 5 instructions long, and always causes a CPU pipeline hazard. So I submit that the unbranched version is better. ;-) (So much for premature optimization... now lemme go and actually benchmark this stuff and see how well it actually performs in practice.
Often, such kinds of hacks perform more poorly than expected due to unforeseen complications with today's complex CPUs. So for all I know, I could've just been spouting nonsense above. :P) T -- Debian GNU/Linux: Cray on your desktop.
Re: Opinions: The Best and Worst of D (for a lecture/talk I intend to give)
On Wednesday, 9 July 2014 at 17:13:21 UTC, H. S. Teoh via Digitalmars-d-learn wrote: On Wed, Jul 09, 2014 at 04:24:38PM +, Dominikus Dittes Scherkl via Digitalmars-d-learn wrote: /// Returns -1 if a < b, 0 if they are equal or 1 if a > b. /// this will always yield a correct result, no matter which numeric types are compared. /// It uses one extra comparison operation if and only if /// one type is signed and the other unsigned but the signed value is >= 0 /// (that is what you need to pay for stupid choice of type). [...] Yeah, I don't see what's the problem with comparing signed and unsigned values, as long as the result is as expected. Currently, however, this code asserts, which is wrong: uint x = uint.max; int y = -1; assert(x > y); Yes, this is really bad. But last time I got the response that this is so to be compatible with C. That is what I really thought was the reason why D threw away ballast from C, to fix bugs. static if(Unqual!T == Unqual!U) Nitpick: should be: static if(is(Unqual!T == Unqual!U)) Of course. [...] else static if(isSigned!T && isUnsigned!U) { alias CommonType!(Unsigned!T, U) C; return (a < 0) ? -1 : opCmp!(cast(C)a, cast(C)b); } else static if(isUnsigned!T && isSigned!U) { alias CommonType!(T, Unsigned!U) C; return (b < 0) ? 1 : opCmp!(cast(C)a, cast(C)b); } [...] Hmm. I wonder if there's a more efficient way to do this. I'm sure. But I think it should be done at the compiler, not in a library. [...] opCmp is just a single sub instruction (this is why opCmp is defined the way it is, BTW), whereas the smart signed/unsigned comparison is 4 instructions long. [...] you can see, the branched version is 5 instructions long, and always causes a CPU pipeline hazard. So I submit that the unbranched version is better. ;-) I don't think so, because the branch will only be taken if the signed type is >= 0 (in fact unsigned). So if the signed/unsigned comparison is by accident, you pay the extra runtime.
But if it is intentional the signed value is likely to be negative, so you get a correct result with no extra cost. Even better for constants, where the compiler can not only evaluate expressions like (uint.max > -1) correctly, but it should optimize them completely away! (So much for premature optimization... now lemme go and actually benchmark this stuff and see how well it actually performs in practice. Yes, we should do this. Often, such kinds of hacks perform more poorly than expected due to unforeseen complications with today's complex CPUs. So for all I know, I could've just been spouting nonsense above. :P) I don't see such a compiler change as a hack. It is a strong improvement IMHO.
Re: Opinions: The Best and Worst of D (for a lecture/talk I intend to give)
On Wednesday, 9 July 2014 at 17:13:21 UTC, H. S. Teoh via Digitalmars-d-learn wrote: For the comparison s < u, where s is a signed value and u is an unsigned value, whenever s is negative, the return value of opCmp must be negative. Assuming 2's-complement representation of integers, this means we simply copy the MSB of s (i.e., the sign bit) to the result. So we can implement s < u as: enum signbitMask = 1u << (s.sizeof*8 - 1); // this is a compile-time constant return (s - u) | (s & signbitMask); // look ma, no branches! This is a problem, isn't it:

void main()
{
    assert(cmp(0, uint.max) < 0); /* fails */
}

int cmp(int s, uint u)
{
    enum signbitMask = 1u << (s.sizeof*8 - 1); // this is a compile-time constant
    return (s - u) | (s & signbitMask); // look ma, no branches!
}
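The failure comes from unsigned wrap-around: with s = 0 and u = uint.max, the subtraction s - u is performed in 32-bit unsigned arithmetic and wraps to 1, and since s is non-negative the sign-bit term contributes nothing, so the branchless formula yields a positive value where a negative one is required. A small illustrative sketch:

```d
void main()
{
    int s = 0;
    uint u = uint.max;

    // s is promoted to uint, so the subtraction wraps modulo 2^32.
    uint diff = s - u;            // 0 - 0xFFFFFFFF wraps around to 1
    assert(diff == 1);

    // The sign-bit term is zero because s is non-negative...
    enum signbitMask = 1u << (s.sizeof * 8 - 1);
    assert((s & signbitMask) == 0);

    // ...so the "branchless" result is 1 (positive), even though
    // 0 < uint.max means a correct comparison must return a negative value.
    assert(cast(int)((s - u) | (s & signbitMask)) > 0);
}
```

This is exactly the case the later compare(int, uint) revision handles by OR-ing in the sign bit of (x | y).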
Re: Opinions: The Best and Worst of D (for a lecture/talk I intend to give)
On Wednesday, 9 July 2014 at 17:13:21 UTC, H. S. Teoh via Digitalmars-d-learn wrote: The branched version would look something like this:

mov eax, [address of u]
mov ebx, [address of s]
cmp ebx, $#0
jge label1 ; first branch
mov eax, $#
jmp label2 ; 2nd branch
label1: sub eax, ebx
label2: (ret)

Why? I would say:

mov eax, [address of s] ; mov directly compares to zero
jl label ; less - jump to return
sub eax, [address of u]
neg eax ; because we subtracted in the wrong order
label: ret
Re: Small part of a program : d and c versions performances diff.
On 07/09/2014 03:57 AM, Larry wrote:

struct timeval s,e;
[...]
gettimeofday(&e,NULL);
printf("so ? %d %lu %d %d %d",g,e.tv_usec - s.tv_usec, arr[4],arr[9],pol);

Changing the topic a little, the calculation above ignores the tv_sec members of s and e. Ali
Re: Opinions: The Best and Worst of D (for a lecture/talk I intend to give)
On Wed, Jul 09, 2014 at 05:43:15PM +, Dominikus Dittes Scherkl via Digitalmars-d-learn wrote: On Wednesday, 9 July 2014 at 17:13:21 UTC, H. S. Teoh via Digitalmars-d-learn wrote: [...] Yeah, I don't see what's the problem with comparing signed and unsigned values, as long as the result is as expected. Currently, however, this code asserts, which is wrong: uint x = uint.max; int y = -1; assert(x > y); Yes, this is really bad. But last time I got the response that this is so to be compatible with C. That is what I really thought was the reason why D threw away ballast from C, to fix bugs. I think the slogan was that if something in D looks like C, then it should either have C semantics or not compile. According to this logic, the only recourse here is to prohibit comparison of signed with unsigned, which I don't think is workable because there are many valid use cases for it (plus, it will break a ton of code and people will be very unhappy). I don't like the current behaviour, though. It just reeks of wrong in so many different ways. If you *really* want semantics like the above, you really should be casting y to unsigned so that it's clear what exactly you're trying to achieve. [...] Hmm. I wonder if there's a more efficient way to do this. I'm sure. But I think it should be done at the compiler, not in a library. Obviously, yes. But I wasn't thinking about implementing opCmp in the library -- that would be strange since ints, of all things, need to have native compiler support. I was thinking more about how the compiler would implement safe signed/unsigned comparisons. [...] So I submit that the unbranched version is better. ;-) I don't think so, because the branch will only be taken if the signed type is >= 0 (in fact unsigned). So if the signed/unsigned comparison is by accident, you pay the extra runtime. But if it is intentional the signed value is likely to be negative, so you get a correct result with no extra cost. Good point.
Moreover, I have discovered multiple bugs in my proposed implementation; the correct implementation should be as follows:

int compare(int x, uint y)
{
    enum signbitMask = 1u << (int.sizeof*8 - 1);
    static assert(signbitMask == 0x8000_0000);
    // The (x|y) & signbitMask basically means that if either x is negative
    // or y > int.max, then the result will always be negative (sign bit
    // set).
    return (cast(uint)x - y) | ((x | y) & signbitMask);
}
unittest
{
    // Basic cases
    assert(compare(5, 10u) < 0);
    assert(compare(5, 5u) == 0);
    assert(compare(10, 5u) > 0);
    // Large cases
    assert(compare(0, uint.max) < 0);
    assert(compare(50, uint.max) < 0);
    // Sign-dependent cases
    assert(compare(-1, 0u) < 0);
    assert(compare(-1, 10u) < 0);
    assert(compare(-1, uint.max) < 0);
}

int compare(uint x, int y)
{
    enum signbitMask = 1u << (int.sizeof*8 - 1);
    static assert(signbitMask == 0x8000_0000);
    return ((x - y) & ~(x & signbitMask)) | ((cast(uint)y & signbitMask) >> 1);
}
unittest
{
    // Basic cases
    assert(compare(0u, 10) < 0);
    assert(compare(10u, 10) == 0);
    assert(compare(10u, 5) > 0);
    // Large cases
    assert(compare(uint.max, 10) > 0);
    assert(compare(uint.max, -10) > 0);
    // Sign-dependent cases
    assert(compare(0u, -1) > 0);
    assert(compare(10u, -1) > 0);
    assert(compare(uint.max, -1) > 0);
}

Using gdc -O3, I managed to get a very good result for compare(int,uint), only 5 instructions long. However, for compare(uint,int), there is the annoying special case of compare(uint.max, -1), which can only be fixed by the hack ... | ((y & signbitMask) >> 1). Unfortunately, this makes it 11 instructions long, which is unacceptable. So it looks like a simple compare and branch would be far better in the compare(uint,int) case -- it's far more costly to avoid the branch than to live with it. Even better for constants, where the compiler can not only evaluate expressions like (uint.max > -1) correctly, but it should optimize them completely away!
Actually, with gdc -O3, I found that the body of the above unittests got completely optimized away at compile-time, so that the unittest body is empty in the executable! So even with a library implementation the compiler was able to maximize performance. DMD left the assert calls in, but then it's not exactly known for generating optimal code anyway. (So much for premature optimization... now lemme
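The compile-time-evaluation point can be checked directly, without relying on a particular optimizer: forcing the comparison through CTFE with enum or static assert leaves no runtime cost at all. A minimal sketch using the branchless compare(int, uint) formula quoted above:

```d
int compare(int x, uint y)
{
    // Same branchless formula as above.
    enum signbitMask = 1u << (int.sizeof * 8 - 1);
    return cast(int)((cast(uint)x - y) | ((x | y) & signbitMask));
}

// Evaluated entirely at compile time; nothing remains in the binary.
static assert(compare(-1, 0u) < 0);
static assert(compare(5, 10u) < 0);

enum r = compare(50, uint.max); // CTFE: r is a compile-time constant
static assert(r < 0);

void main() {}
```

Anything a static assert accepts here is, by construction, free at run time; the gdc -O3 observation above is the optimizer reaching the same result for the runtime unittests.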
Re: Small part of a program : d and c versions performances diff.
On Wednesday, 9 July 2014 at 18:18:43 UTC, Ali Çehreli wrote: On 07/09/2014 03:57 AM, Larry wrote:

struct timeval s,e;
[...]
gettimeofday(&e,NULL);
printf("so ? %d %lu %d %d %d",g,e.tv_usec - s.tv_usec, arr[4],arr[9],pol);

Changing the topic a little, the calculation above ignores the tv_sec members of s and e. Ali Absolutely Ali, because I know it is under the sec range. I made some tests before submitting it :) But you are absolutely right Ali, the mileage will vary in a completely different scenario.
Re: Opinions: The Best and Worst of D (for a lecture/talk I intend to give)
On Wed, Jul 09, 2014 at 11:29:06AM -0700, H. S. Teoh via Digitalmars-d-learn wrote: On Wed, Jul 09, 2014 at 05:43:15PM +, Dominikus Dittes Scherkl via Digitalmars-d-learn wrote: On Wednesday, 9 July 2014 at 17:13:21 UTC, H. S. Teoh via Digitalmars-d-learn wrote: [...] Often, such kinds of hacks perform more poorly than expected due to unforeseen complications with today's complex CPUs. So for all I know, I could've just been spouting nonsense above. :P) I don't see such a compiler change as a hack. It is a strong improvement IMHO. I was talking about using | and & to get rid of the branch in signed/unsigned comparison. As it turns out, the compare(uint,int) case seems far more costly than a simple compare-and-branch as you had it at the beginning. So at least that part of what I wrote is probably nonsense. :P But I can't say for sure until I actually run some benchmarks on it. [...] Hmph. I'm having trouble coming up with a fair benchmark, because I realized that D doesn't actually have a way of expressing opCmp for unsigned ints in a minimal way! The problem is that the function needs to return int, but given two uints, their difference may be greater than int.max, so simply subtracting them will not work. So the best I can come up with is:

int compare2(int x, uint y)
{
    return (x < 0) ? -1 : (y > int.max) ? -1 : (x - y);
}

which requires 2 comparisons. Similarly, for the uint-int case:

int compare2(uint x, int y)
{
    return (y < 0) ? 1 : (x > int.max) ? 1 : (x - y);
}

If you have a better implementation in mind, I'm all ears.
In any case, I went ahead and benchmarked the above two functions along with my branchless implementations, and here are the results:

(with dmd -O -unittest:)
non-branching compare(signed,unsigned): 5513 msecs
branching compare(signed,unsigned): 5442 msecs
non-branching compare(unsigned,signed): 5441 msecs
branching compare(unsigned,signed): 5744 msecs
Optimizer-thwarting value: 0

(with gdc -O3 -funittest:)
non-branching compare(signed,unsigned): 516 msecs
branching compare(signed,unsigned): 1209 msecs
non-branching compare(unsigned,signed): 453 msecs
branching compare(unsigned,signed): 756 msecs
Optimizer-thwarting value: 0

(Ignore the last lines of each output; that's just a way to prevent gdc -O3 from being over-eager and optimizing out the entire test so that everything returns 0 msecs.) Interestingly, with dmd, the branching compare for the signed-unsigned case is faster than my non-branching one, but the order is reversed for the unsigned-signed case. They're pretty close, though, and on some runs the order of the latter case is reversed. With gdc, however, it seems the non-branching versions are clearly better, even in the unsigned-signed case, which I thought would be inferior. So clearly, these results are very optimizer-dependent. Keep in mind, though, that this may not necessarily reflect actual performance when the compiler generates the equivalent code for the built-in integer comparison operators, because in codegen the compiler can take advantage of the CPU's carry and overflow bits, and can elide the actual return values of opCmp. This may skew the results enough to reverse the order of some of these cases. Anyway, here's the code (for independent verification):

int compare(int x, uint y)
{
    enum signbitMask = 1u << (int.sizeof*8 - 1);
    static assert(signbitMask == 0x8000_0000);
    // The (x|y) & signbitMask basically means that if either x is negative
    // or y > int.max, then the result will always be negative (sign bit
    // set).
    return (cast(uint)x - y) | ((x | y) & signbitMask);
}
unittest
{
    // Basic cases
    assert(compare(5, 10u) < 0);
    assert(compare(5, 5u) == 0);
    assert(compare(10, 5u) > 0);
    // Large cases
    assert(compare(0, uint.max) < 0);
    assert(compare(50, uint.max) < 0);
    // Sign-dependent cases
    assert(compare(-1, 0u) < 0);
    assert(compare(-1, 10u) < 0);
    assert(compare(-1, uint.max) < 0);
}

int compare2(int x, uint y)
{
    return (x < 0) ? -1 : (y > int.max) ? -1 : x - y;
}
unittest
{
    // Basic cases
    assert(compare2(5, 10u) < 0);
    assert(compare2(5, 5u) == 0);
    assert(compare2(10, 5u) > 0);
    // Large cases
Introspecting a Module with Traits, allMembers
Hello, I'm looking to introspect a module, list all the members, iterate over them and filter them by kind inside of a static constructor. This is in the hope of shortening some hand-written code that is quite repetitive (adding many struct instances to an associative array in a static constructor). The code I'm trying to improve upon can be seen here: https://github.com/maximecb/Higgs/blob/master/source/ir/iir.d#L56 I've done some googling, and it seems I should be able to use the allMembers trait (http://wiki.dlang.org/Finding_all_Functions_in_a_Module), but unfortunately, the module name seems to be unrecognized, no matter which way I spell it:

auto members = [__traits(allMembers, "ir.ir")];
pragma(msg, members);

Produces: ir/iir.d(85): Error: argument has no members Other people seem to have run into this problem. Am I doing it wrong or is this a bug in DMD?
Re: Introspecting a Module with Traits, allMembers
On Wed, 09 Jul 2014 20:07:56 +, NCrashed wrote: On Wednesday, 9 July 2014 at 20:04:47 UTC, Maxime Chevalier-Boisvert wrote: auto members = [__traits(allMembers, "ir.ir")]; pragma(msg, members); Have you tried without quotes? pragma(msg, __traits(allMembers, ir.ir)); Also, looks like it should be ir.iir
Re: Introspecting a Module with Traits, allMembers
On Wednesday, 9 July 2014 at 20:07:57 UTC, NCrashed wrote: Produces: ir/iir.d(85): Error: argument has no members If module name is ir.iir: pragma(msg, __traits(allMembers, ir.iir));
Re: Introspecting a Module with Traits, allMembers
On Wednesday, 9 July 2014 at 20:07:57 UTC, NCrashed wrote: On Wednesday, 9 July 2014 at 20:04:47 UTC, Maxime Chevalier-Boisvert wrote: auto members = [__traits(allMembers, "ir.ir")]; pragma(msg, members); Have you tried without quotes? pragma(msg, __traits(allMembers, ir.ir)); Did need to write it without the quotes, and to add enum to force compile-time evaluation. It's actually ir.ops that I wanted to list the members of. Got the following snippet to work:

static this()
{
    enum members = [__traits(allMembers, ir.ops)];
    pragma(msg, members);
}

Prints: ["object", "ir", "jit", "OpArg", "OpInfo", "Opcode", "GET_ARG", "SET_STR", "MAKE_VALUE", "GET_WORD", "GET_TYPE", "IS_I32", ...]
Re: Introspecting a Module with Traits, allMembers
I got the following code to do what I want:

static this()
{
    void addOp(ref Opcode op)
    {
        assert (
            op.mnem !in iir,
            "duplicate op name " ~ op.mnem
        );
        iir[op.mnem] = op;
    }

    foreach (memberName; __traits(allMembers, ir.ops))
    {
        static if (__traits(compiles, addOp(__traits(getMember, ir.ops, memberName))))
        {
            writeln(memberName);
            addOp(__traits(getMember, ir.ops, memberName));
        }
    }
}

It's a bit of a hack, but it works. Is there any way to create some sort of alias for __traits(getMember, ir.ops, memberName) so that I don't have to write it out in full twice? Made some attempts but only got the compiler to complain.
Re: Introspecting a Module with Traits, allMembers
On Wednesday, 9 July 2014 at 20:52:29 UTC, Maxime Chevalier-Boisvert wrote: It's a bit of a hack, but it works. Is there any way to create some sort of alias for __traits(getMember, ir.ops, memberName) so that I don't have to write it out in full twice? Made some attempts but only got the compiler to complain.

alias Alias(alias Sym) = Sym;
alias member = Alias!(__traits(getMember, ir.ops, memberName));

It does not work with normal alias because of grammar limitation afaik.
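Putting the allMembers loop and the Alias trick together, here is a self-contained sketch of the whole pattern. Opcode, GET_ARG, SET_STR, and iir are hypothetical stand-ins for the symbols in ir.ops, and mixin(__MODULE__) is used so the example can introspect itself instead of a separate module:

```d
import std.stdio : writeln;

// Hypothetical stand-in for ir.ops' Opcode struct.
struct Opcode
{
    string mnem;
}

// Module-level instances to discover, as in ir.ops.
immutable Opcode GET_ARG = Opcode("get_arg");
immutable Opcode SET_STR = Opcode("set_str");

alias Alias(alias Sym) = Sym; // the eponymous-template trick from above

Opcode[string] iir; // mnemonic -> opcode table, filled at startup

static this()
{
    // mixin(__MODULE__) resolves to the current module's own symbol,
    // standing in for ir.ops here.
    foreach (memberName; __traits(allMembers, mixin(__MODULE__)))
    {
        // Keep only members that are immutable Opcode instances;
        // everything else (imports, templates, functions) is skipped.
        static if (is(typeof(__traits(getMember, mixin(__MODULE__), memberName)) == immutable(Opcode)))
        {
            alias op = Alias!(__traits(getMember, mixin(__MODULE__), memberName));
            assert(op.mnem !in iir, "duplicate op name " ~ op.mnem);
            iir[op.mnem] = op;
        }
    }
}

void main()
{
    assert("get_arg" in iir && "set_str" in iir);
    writeln(iir.keys);
}
```

Filtering by the member's type replaces the __traits(compiles, addOp(...)) test, so each member is only spelled out once.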
Re: Small part of a program : d and c versions performances diff.
On 07/09/2014 12:47 PM, Larry wrote: On Wednesday, 9 July 2014 at 18:18:43 UTC, Ali Çehreli wrote: On 07/09/2014 03:57 AM, Larry wrote:

struct timeval s,e;
[...]
gettimeofday(&e,NULL);
printf("so ? %d %lu %d %d %d",g,e.tv_usec - s.tv_usec, arr[4],arr[9],pol);

Changing the topic a little, the calculation above ignores the tv_sec members of s and e. Ali Absolutely Ali because I know it is under the sec range. I made some tests before submitting it :) I know it did work and will work every time you test it. :) However, even if the difference is just one millisecond, if s and e happen to be on different sides of a second boundary, you will get a huge result. Ali
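The fix is mechanical: fold tv_sec into the difference so a crossed second boundary cannot corrupt the result. A minimal D sketch (using the POSIX timeval to mirror the C snippet being quoted; POSIX-only):

```d
import core.sys.posix.sys.time : timeval; // POSIX-only, matching the C code

// Elapsed microseconds between two gettimeofday() samples; including
// tv_sec in the difference makes second boundaries harmless.
long elapsedUsecs(timeval s, timeval e)
{
    return (e.tv_sec - s.tv_sec) * 1_000_000L + (e.tv_usec - s.tv_usec);
}

void main()
{
    // Two samples 1000 microseconds apart, straddling a second boundary.
    timeval s = { tv_sec: 9, tv_usec: 999_500 };
    timeval e = { tv_sec: 10, tv_usec: 500 };

    // The usec-only difference is nonsense across the boundary...
    assert(e.tv_usec - s.tv_usec == -999_000);
    // ...while the full difference is the correct 1000 microseconds.
    assert(elapsedUsecs(s, e) == 1_000);
}
```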
Re: Quicksort Variants
On Tuesday, 8 July 2014 at 20:50:01 UTC, Nordlöw wrote: I recall that Python's default sorting algorithm is related to this, right? https://en.wikipedia.org/wiki/Timsort
Re: Quicksort Variants
On Tuesday, 8 July 2014 at 20:50:01 UTC, Nordlöw wrote: Also related: http://forum.dlang.org/thread/eaxcfzlvsakeucwpx...@forum.dlang.org#post-mailman.2809.1355844427.5162.digitalmars-d:40puremagic.com
Re: Small part of a program : d and c versions performances diff.
Right
Re: Small part of a program : d and c versions performances diff.
Measure a larger number of loops. I understand you're concerned about microseconds, but your benchmark shows nothing because your timer is simply not accurate enough for this. The benchmark that bearophile showed where C took ~2 nanoseconds vs the ~7000 D took heavily implies to me that the C implementation is simply being optimized out and nothing is actually running. All inputs are known at compile-time, the output is known at compile-time, the compiler is perfectly free to simply remove all your code and replace it with the result. I'm somewhat surprised that the D version doesn't do this actually, perhaps because of the dynamic memory allocation. I realize that you can't post your actual code, but this benchmark honestly just has too many flaws to determine anything from. As for startup cost, D will indeed have a higher startup cost than C because of static constructors. Once it's running, it should be very close. If you're looking to start a process that will run for only a few milliseconds, you'd probably want to not use D (or avoid most static constructors, including those in the runtime / standard library).
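One common way to stop the compiler from deleting the benchmarked work, sketched below with an arbitrary stand-in workload (not Larry's actual code): feed the loop an input that is unknown at compile time, and consume its output afterwards so the result is observable.

```d
import std.conv : to;
import std.stdio : writeln;

ulong work(ulong n)
{
    // Arbitrary stand-in workload: an LCG-style bit-mixing loop.
    ulong acc = n;
    foreach (i; 0 .. 1_000_000)
        acc = acc * 6364136223846793005UL + 1442695040888963407UL;
    return acc;
}

void main(string[] args)
{
    // Seed from the command line so the input is unknown at compile time...
    ulong seed = args.length > 1 ? args[1].to!ulong : 42;
    ulong result = work(seed);
    // ...and print the result so the loop has an observable side effect
    // and cannot be folded away as dead code.
    writeln(result);
}
```

Without both halves of this pattern, a C compiler with known inputs and an unused output is entirely free to replace the loop with its constant result, which would explain a "~2 ns" measurement.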
Re: Small part of a program : d and c versions performances diff.
On Wednesday, 9 July 2014 at 13:18:00 UTC, Larry wrote: You are definitely right, I did mess up while translating ! I run the corrected codes (the ones I was meant to provide :S) and on a slow macbook I end up with : C : 2 D : 15994 Of course when run on very high end machines, this diff is almost non existent but we want to run on very low powered hardware. Ok, even with a longer code, there will always be a launch penalty for d. So I cannot use it for very high performance loops. Shame for us.. :) Thanks and bye This to me pretty much confirms that almost the entirety of your C code is being optimized out and thus not actually executing.
Re: Small part of a program : d and c versions performances diff.
The actual code is not that much slower compared to the numerous other operations we do. And certainly faster than the D version doing almost nothing. Well, it is about massive bitshifts, array accesses and calculations. With all the optimizations we are on par with Fortran numerical code (thanks -std=c11). There may be an optimization hidden somewhere, or it is just gdc having to mature. Dunno. But don't get me wrong, D is a fantastic language.