Re: math.log() benchmark of first 1 billion int using std.parallelism
On Tuesday, 23 December 2014 at 07:26:27 UTC, Daniel Kozak wrote:
>> That's very different to my results. I see no important difference between ldc and dmd when using std.math, but when using core.stdc.math ldc halves its time, where dmd only manages to get to ~80%.
>
> What CPU do you have? On my Intel Core i3 I have a similar experience to Iov Gherman's, but on my AMD FX4200 I have the same results as you. Seems std.math.log is not good for my AMD CPU :)

Intel Core i5-4278U
Re: std.file.readText() extra Line Feed character
On 12/19/2014 02:22 AM, Colin wrote:
> On Thursday, 18 December 2014 at 22:29:30 UTC, Ali Çehreli wrote:
>> happy with Emacs :p
>
> Does emacs do this as well? :)

Emacs can and does do everything: :)

http://www.gnu.org/software/emacs/manual/html_node/emacs/Customize-Save.html

Ali
Re: math.log() benchmark of first 1 billion int using std.parallelism
> That's very different to my results. I see no important difference between ldc and dmd when using std.math, but when using core.stdc.math ldc halves its time, where dmd only manages to get to ~80%.

I checked again today and the results are interesting: on my PC I don't see any difference between std.math and core.stdc.math with ldc. Here are the results with all compilers.

- with std.math:
  dmd: 4 secs, 878 ms
  ldc: 5 secs, 650 ms
  gdc: 9 secs, 161 ms
- with core.stdc.math:
  dmd: 5 secs, 991 ms
  ldc: 5 secs, 572 ms
  gdc: 7 secs, 957 ms
Re: math.log() benchmark of first 1 billion int using std.parallelism
On Tuesday, 23 December 2014 at 10:20:04 UTC, Iov Gherman wrote:
> I checked again today and the results are interesting: on my PC I don't see any difference between std.math and core.stdc.math with ldc. Here are the results with all compilers.
> - with std.math: dmd: 4 secs, 878 ms / ldc: 5 secs, 650 ms / gdc: 9 secs, 161 ms
> - with core.stdc.math: dmd: 5 secs, 991 ms / ldc: 5 secs, 572 ms / gdc: 7 secs, 957 ms

These multi-threaded benchmarks can be very sensitive to their environment; you should try running them with nice -20 and do multiple passes to get a vague idea of the variability in the result. It is also important to minimise the number of other running processes.
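For what it's worth, a minimal D sketch of that multi-pass approach; the pass count and the scaled-down stand-in workload here are placeholders, not taken from the benchmark repo:

import std.algorithm : reduce, min, max;
import std.datetime;
import std.math : log;
import std.stdio;

void main()
{
    enum passes = 5;
    Duration[] times;
    foreach (pass; 0 .. passes)
    {
        auto t1 = Clock.currTime();
        double sum = 0;                      // small stand-in workload
        foreach (i; 1 .. 10_000_000)
            sum += log(cast(double) i);
        times ~= Clock.currTime() - t1;
        writefln("pass %s: %s (checksum %s)", pass, times[$ - 1], sum);
    }
    // the min/max spread gives a rough idea of run-to-run variability
    writeln("min: ", reduce!min(times), "  max: ", reduce!max(times));
}

Running the resulting binary under nice (as root, for a negative niceness) then reduces scheduler interference on top of that.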
Re: math.log() benchmark of first 1 billion int using std.parallelism
> These multi-threaded benchmarks can be very sensitive to their environment; you should try running them with nice -20 and do multiple passes to get a vague idea of the variability in the result.

I did not use the nice parameter, but I always ran them multiple times and chose the average time. My system has very few running processes (a minimalist ArchLinux with Xfce4), so I don't think the running processes are affecting my tests in any way.
Re: math.log() benchmark of first 1 billion int using std.parallelism
On Tuesday, 23 December 2014 at 10:39:13 UTC, Iov Gherman wrote:
> I did not use the nice parameter, but I always ran them multiple times and chose the average time. My system has very few running processes (a minimalist ArchLinux with Xfce4), so I don't think the running processes are affecting my tests in any way.

And what about the single-threaded version?

Btw. one reason why DMD is faster is that it uses the x87 fyl2x instruction, which computes y * log2(x); with y = LN2 that gives LN2 * log2(x) = ln(x). Here is a version for the other compilers:

import std.math, std.stdio, std.datetime;

enum SIZE = 100_000_000;

version(GNU)
{
    real mylog(double x) pure nothrow
    {
        real result;
        double y = LN2;
        asm
        {
            "fldl %2\n fldl %1\n fyl2x"
            : "=t" (result)
            : "m" (x), "m" (y);
        }
        return result;
    }
}
else
{
    real mylog(double x) pure nothrow
    {
        return yl2x(x, LN2);
    }
}

void main()
{
    auto t1 = Clock.currTime();
    auto logs = new double[SIZE];
    foreach (i; 0 .. SIZE)
    {
        logs[i] = mylog(i + 1.0);
    }
    auto t2 = Clock.currTime();
    writeln("time: ", t2 - t1);
}

But it is faster only on Intel CPUs; on one of my AMD machines it is slower than core.stdc.math's log.
Re: ini library in OSX
As you probably know, ini files don't support arrays of sections. If you know all the items at compile time, you could create structs for all of them, but that is probably not what you're looking for.
Re: math.log() benchmark of first 1 billion int using std.parallelism
On Tuesday, 23 December 2014 at 10:20:04 UTC, Iov Gherman wrote:
> I checked again today and the results are interesting: on my PC I don't see any difference between std.math and core.stdc.math with ldc. [...]

Btw. I just noticed a small issue with D vs. Java: you start measuring in D before the allocation, but in the Java case after the allocation.
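Concretely, a scaled-down sketch of the ordering being pointed out; SIZE and the loop body here are stand-ins for the benchmark's:

import std.datetime, std.math, std.stdio;

void main()
{
    enum SIZE = 1_000_000;         // scaled down for illustration
    auto t1 = Clock.currTime();    // the D version starts its clock here...
    auto logs = new double[SIZE];  // ...so this allocation is inside the timed region
    foreach (i; 0 .. SIZE)
        logs[i] = log(i + 1.0);
    auto t2 = Clock.currTime();
    writeln("time: ", t2 - t1);    // the Java version started timing only after allocating
}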
Re: math.log() benchmark of first 1 billion int using std.parallelism
On Monday, 22 December 2014 at 17:16:49 UTC, Iov Gherman wrote:
> On Monday, 22 December 2014 at 17:16:05 UTC, bachmeier wrote:
>> On Monday, 22 December 2014 at 17:05:19 UTC, Iov Gherman wrote:
>>> Hi Guys, First of all, thank you all for responding so quickly; it is so nice to see D having such an active community. As I said in my first post, I used no other parameters to dmd when compiling because I don't know too much about dmd compilation flags. I can't wait to try the flags Daniel suggested with dmd (-O -release -inline -noboundscheck) and the other two compilers (ldc2 and gdc). Thank you guys for your suggestions. Meanwhile, I created a git repository on github and put all my code there. If you find any errors please let me know. Because I am keeping the results in a big array, the programs take approximately 8 GB of RAM. If you don't have enough RAM, feel free to decrease the size of the array. For the Java code you will also need to change 'compile-run.bsh' and use the right memory parameters. Thank you all for helping, Iov
>>
>> Link to your repo?
>
> Sorry, forgot about it: https://github.com/ghermaniov/benchmarks

For posix-style threads, a per-thread workload of 200 calls to log seems rather small. It would be interesting to see a graph of execution time as a function of workgroup size. Traditionally one would use a workgroup size of (nElements / nCores) or similar, in order to get all the cores working but also to minimise pressure on the scheduler, inter-thread communication and so on.
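A rough sketch of that suggestion with std.parallelism; the work-unit expression is the point, the rest mirrors the benchmark loop (array size reduced here to be RAM-friendly):

import std.math, std.parallelism, std.stdio;

void main()
{
    auto logs = new double[100_000_000];
    // one contiguous chunk per core, instead of many tiny 200-element batches
    immutable workUnit = logs.length / totalCPUs;
    foreach (i, ref elem; taskPool.parallel(logs, workUnit))
        elem = log(i + 1.0);
    writeln(logs[$ - 1]);
}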
Re: math.log() benchmark of first 1 billion int using std.parallelism
> And what about the single-threaded version?

Just ran the single-threaded examples after I moved the time start before the array allocation; thanks for that, good catch. Still better results in Java:

- java: 21 secs, 612 ms
- with std.math:
  dmd: 23 secs, 994 ms
  ldc: 31 secs, 668 ms
  gdc: 52 secs, 576 ms
- with core.stdc.math:
  dmd: 30 secs, 724 ms
  ldc: 30 secs, 988 ms
  gdc: 25 secs, 970 ms
Re: math.log() benchmark of first 1 billion int using std.parallelism
> Btw. I just noticed a small issue with D vs. Java: you start measuring in D before the allocation, but in the Java case after the allocation.

Here is the Java result for parallel processing after moving the start time to the first line of main. Still the best result: 4 secs, 50 ms average.
Re: math.log() benchmark of first 1 billion int using std.parallelism
Forgot to mention that I pushed my changes to github.
Re: math.log() benchmark of first 1 billion int using std.parallelism
On Tuesday, 23 December 2014 at 12:26:28 UTC, Iov Gherman wrote:
> Just ran the single-threaded examples after I moved the time start before the array allocation; thanks for that, good catch. Still better results in Java: [...]

Note that log is done in software on x86, with different levels of precision and different abilities to handle corner cases. It is therefore a very bad benchmarking tool.
Re: math.log() benchmark of first 1 billion int using std.parallelism
On Tuesday, 23 December 2014 at 12:31:47 UTC, Iov Gherman wrote:
> Here is the Java result for parallel processing after moving the start time to the first line of main. Still the best result: 4 secs, 50 ms average.

Java: 6 secs, 421 ms
LDC (-O3 -release -mcpu=native -singleobj -inline -boundscheck=off): 5 secs, 321 ms, 877 μs, and 2 hnsecs
GDC (-O3 -frelease -march=native -finline -fno-bounds-check): 5 secs, 237 ms, 453 μs, and 7 hnsecs
DMD (-O -release -inline -noboundscheck): 5 secs, 107 ms, 931 μs, and 3 hnsecs

So all D compilers beat Java in my case, but I have made some changes to the D version:

import std.parallelism, std.math, std.stdio, std.datetime;
import core.memory;

enum XMS = 3UL * 1024 * 1024 * 1024; // 3 GB

version(GNU)
{
    real mylog(double x) pure nothrow
    {
        double result;
        double y = LN2;
        asm
        {
            "fldl %2\n fldl %1\n fyl2x\n"
            : "=t" (result)
            : "m" (x), "m" (y);
        }
        return result;
    }
}
else
{
    real mylog(double x) pure nothrow
    {
        return yl2x(x, LN2);
    }
}

void main()
{
    GC.reserve(XMS); // pre-reserve the pool so the big allocation doesn't trigger collections mid-run
    auto t1 = Clock.currTime();
    auto logs = new double[1_000_000_000];
    foreach (i, ref elem; taskPool.parallel(logs, 200))
    {
        elem = mylog(i + 1.0);
    }
    auto t2 = Clock.currTime();
    writeln("time: ", t2 - t1);
}
Is D's GC.calloc and C's memset played the same role?
Today I ran into a problem: getting all process names.

-- C++ CODE --

#include "stdafx.h"
#include <windows.h>
#include <stdio.h>    // C standard I/O
#include <tlhelp32.h>

int _tmain(int argc, _TCHAR* argv[])
{
    HANDLE hProcessSnap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    if (hProcessSnap == INVALID_HANDLE_VALUE)
    {
        _tprintf(_T("CreateToolhelp32Snapshot error!\n"));
        return -1;
    }
    PROCESSENTRY32 pe32;
    pe32.dwSize = sizeof(PROCESSENTRY32);
    BOOL bMore = Process32First(hProcessSnap, &pe32);
    int i = 0;
    _tprintf(_T("PID\t thread nums \t name \n"));
    while (bMore)
    {
        bMore = Process32Next(hProcessSnap, &pe32);
        _tprintf(_T("%u\t"), pe32.th32ProcessID);
        _tprintf(_T("%u\t"), pe32.cntThreads);
        _tprintf(_T("%s\n"), pe32.szExeFile);
        i++;
    }
    CloseHandle(hProcessSnap);
    _tprintf(_T("Count:%d\n"), i);
    return 0;
}

-- D code --

import std.stdio;
import std.string;
import core.sys.windows.windows;
import core.memory;
import win32.tlhelp32;

void main()
{
    HANDLE hProcessSnap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    if (hProcessSnap is null)
    {
        writeln("CreateToolhelp32Snapshot error!\n");
        return;
    }
    PROCESSENTRY32* pe32 = cast(PROCESSENTRY32*)GC.calloc(PROCESSENTRY32.sizeof);
    pe32.dwSize = PROCESSENTRY32.sizeof;
    bool bMore = cast(bool)Process32First(hProcessSnap, pe32);
    int i = 0;
    writeln("PID\t thread nums\t name \n");
    while (bMore)
    {
        bMore = cast(bool)Process32Next(hProcessSnap, pe32);
        string s = cast(string)pe32.szExeFile;
        auto a = s.indexOf('\0');
        if (a >= 0)
            writeln("\t", pe32.th32ProcessID, "\t", pe32.cntThreads, "\t", s[0 .. a]);
        i++;
    }
    CloseHandle(hProcessSnap);
    writeln(format("count:%d", i));
}

-- end --

You will find the difference:

D:   PROCESSENTRY32* pe32 = cast(PROCESSENTRY32*)GC.calloc(PROCESSENTRY32.sizeof);
C++: PROCESSENTRY32 pe32;

GC.calloc means: memset?!
Re: Is D's GC.calloc and C's memset played the same role?
FrankLike via Digitalmars-d-learn wrote on Tue, 23 Dec 2014 at 15:37 +0000:
> [... same C++ and D code as above ...]
>
> You will find the difference:
> D:   PROCESSENTRY32* pe32 = cast(PROCESSENTRY32*)GC.calloc(PROCESSENTRY32.sizeof);
> C++: PROCESSENTRY32 pe32;
> GC.calloc means: memset?!

calloc means "allocate cleared memory": the same as malloc, but with all bits cleared to zero.
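For illustration, a minimal sketch of that equivalence (the block size n here is arbitrary):

import core.memory;
import core.stdc.string : memset;

void main()
{
    enum n = 64;                // an arbitrary block size for illustration
    void* a = GC.calloc(n);     // allocated and zero-filled in one step
    void* b = GC.malloc(n);     // allocated, contents not guaranteed...
    memset(b, 0, n);            // ...so zero it manually; a and b are now equivalent
}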
Re: Is D's GC.calloc and C's memset played the same role?
On Tue, 23 Dec 2014 15:37:12 +0000, FrankLike via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote:
> You will find the difference:
> D:   PROCESSENTRY32* pe32 = cast(PROCESSENTRY32*)GC.calloc(PROCESSENTRY32.sizeof);
> C++: PROCESSENTRY32 pe32;
> GC.calloc means: memset?!

do you see that shining star there? here it is, right at the end: `PROCESSENTRY32*`. and do you see that same star in the C sample?

jokes aside, it's dead simple: the C code is using a stack-allocated struct (`PROCESSENTRY32`, without an indirection) and the D code is using a heap-allocated struct (`PROCESSENTRY32*`, with an indirection). hence the C code uses `memset()`, yet the D code uses `GC.calloc()`.
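For illustration, a minimal sketch of the stack-allocated equivalent in D, assuming the same win32.tlhelp32 bindings as the original post:

import core.sys.windows.windows;
import win32.tlhelp32;   // same third-party bindings as in the original post

void main()
{
    HANDLE hProcessSnap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    PROCESSENTRY32 pe32;                  // lives on the stack, like the C++ version
    pe32.dwSize = PROCESSENTRY32.sizeof;  // no GC.calloc needed; D default-initializes locals anyway
    BOOL bMore = Process32First(hProcessSnap, &pe32);  // pass its address explicitly
    CloseHandle(hProcessSnap);
}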
Re: math.log() benchmark of first 1 billion int using std.parallelism
I'm getting faster execution on Java than dmd; gdc beats it though. ...although, what this topic really provides is a reason for me to get more RAM for my next laptop. How much do you people run with? I had to scale the Java version down to 300 million to avoid dying with 4 GB of memory.
Storing arrays as Variant types.
I've run into a problem while trying to coerce array values from a Variant; specifically:

char[] a = aVariant.coerce!(char[]); // This works just fine.
byte[] b = bVariant.coerce!(byte[]); // This causes a static assertion to fail.

I'm not really sure why byte[] would be an unsupported type, since memory-wise the reference should take up as much space as for the char[] (as I understand it). Perhaps I'm missing something, but I'm lost as to why this is the case. Thanks.
Re: Storing arrays as Variant types.
Minimal code for the convenience of others:

import std.variant;

void main()
{
    Variant aVariant;
    Variant bVariant;

    char[] a = aVariant.coerce!(char[]);
    byte[] b = bVariant.coerce!(byte[]);
}

On 12/23/2014 02:57 PM, Winter M. wrote:
> char[] a = aVariant.coerce!(char[]); // This works just fine.
> byte[] b = bVariant.coerce!(byte[]); // This causes a static assertion to fail.
>
> I'm not really sure why byte[] would be an unsupported type [...]

The difference is that char[] passes the isSomeString test (as it is a string type) but byte[] does not:

https://github.com/D-Programming-Language/phobos/blob/master/std/variant.d#L877

Ali
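For a quick check of that trait difference, a tiny sketch using std.traits:

import std.traits;

static assert( isSomeString!(char[]));  // char[] is treated as a string type
static assert(!isSomeString!(byte[]));  // byte[] is not, hence the coerce failure

void main() {} // compiles iff the asserts hold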
Re: Storing arrays as Variant types.
On Tue, 23 Dec 2014 22:57:07 +0000, Winter M. via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote:
> char[] a = aVariant.coerce!(char[]); // This works just fine.
> byte[] b = bVariant.coerce!(byte[]); // This causes a static assertion to fail.
>
> I'm not really sure why byte[] would be an unsupported type [...]

heh. this is due to how `.coerce!` is written. it doesn't really check for arrays; what it checks for is:

1. static if (isNumeric!T || isBoolean!T)
2. static if (is(T : Object))
3. static if (isSomeString!(T))

see the gotcha? ;-) both of the types you requested are not numeric, not boolean and not objects. but `char[]` satisfies `isSomeString!`, and `byte[]` doesn't.

i'm not sure that coercing is designed to work this way; it seems that `isSomeString!` is just a hack for coercing to strings. i.e. with `char[]` the variant tries to build some textual representation of its value, and with `byte[]` the variant simply doesn't know what to do. maybe we should allow coercing to `byte[]` and `ubyte[]` with a defined meaning: get the raw binary representation of the variant's contents.
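For what it's worth, a minimal sketch of a workaround, assuming the Variant actually holds a byte[]: `get` does an exact-type extraction, so it sidesteps coerce's static assert entirely:

import std.variant;

void main()
{
    byte[] data = [1, 2, 3];
    Variant bVariant = data;

    // coerce!(byte[]) would trip the static assert, but get!(byte[])
    // extracts the exact stored type and works fine.
    byte[] b = bVariant.get!(byte[]);
    assert(b == data);
}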
Re: Is D's GC.calloc and C's memset played the same role?
On Tuesday, 23 December 2014 at 20:22:12 UTC, ketmar via Digitalmars-d-learn wrote:
> do you see that shining star there? here it is, right at the end: `PROCESSENTRY32*`. and do you see that same star in the C sample?

Yes; if you don't do it like that, it will not work.

> jokes aside, it's dead simple: the C code is using a stack-allocated struct (`PROCESSENTRY32`, without an indirection) and the D code is using a heap-allocated struct (`PROCESSENTRY32*`, with an indirection). hence the C code uses `memset()`, yet the D code uses `GC.calloc()`.

Not a joke; it works fine, you can run it. And it's not C, it's C++.
Re: Is D's GC.calloc and C's memset played the same role?
On Wed, 24 Dec 2014 00:24:44 +0000, FrankLike via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote:
> Not a joke; it works fine, you can run it. And it's not C, it's C++.

you did quote the relevant part; let me repeat it: the C code is using a stack-allocated struct (`PROCESSENTRY32`, without an indirection) and the D code is using a heap-allocated struct (`PROCESSENTRY32*`, with an indirection). hence the C code uses `memset()`, yet the D code uses `GC.calloc()`. i.e. the D code is using a *pointer* *to* *struct*, so you must allocate the struct manually.
Getting DAllegro 5 to work in Windows
I can't get implib.exe (http://ftp.digitalmars.com/bup.zip) to produce .lib files from dlls (https://www.allegro.cc/files/). I think it works for other people. Thanks for any help.