[Issue 18723] std.exception.ErrnoException@std/stdio.d(1012): Enforcement failed (Bad file descriptor) when running the simplified benchmark
https://issues.dlang.org/show_bug.cgi?id=18723

Iain Buclaw changed:

           What    |Removed |Added
           Priority|P1      |P2

--
Re: how to benchmark pure functions?
On Thursday, 27 October 2022 at 18:41:36 UTC, Dennis wrote:
On Thursday, 27 October 2022 at 17:17:01 UTC, ab wrote:
How can I prevent the compiler from removing the code I want to measure?

With many C compilers, you can use volatile assembly blocks for that. With LDC -O3, a regular assembly block also does the trick currently:

```D
void main()
{
    import std.datetime.stopwatch;
    import std.stdio : write, writeln, writef, writefln;
    import std.conv : to;

    void f0() {}

    void f1()
    {
        foreach (i; 0 .. 4_000_000)
        {
            // nothing, loop gets optimized out
        }
    }

    void f2()
    {
        foreach (i; 0 .. 4_000_000)
        {
            // defeat optimizations
            asm @safe pure nothrow @nogc {}
        }
    }

    auto r = benchmark!(f0, f1, f2)(1);
    writeln(r[0]); // 4 μs
    writeln(r[1]); // 4 μs
    writeln(r[2]); // 1 ms
}
```

FYI, I recommend a volatile data dependency rather than injecting volatile asm into the code, i.e. don't modify the pure function, but rather make sure its result is actually used in the eyes of the compiler.
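A minimal sketch of that volatile data dependency, using `core.volatile` from druntime. The `work` function, its loop bound, and the `sink` variable are invented for illustration; the point is that sinking the result through `volatileStore` makes it "used" in the eyes of the compiler, so the computation cannot be deleted even at -O3:

```D
import core.volatile : volatileStore;
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;

// hypothetical pure function whose runtime we want to measure
uint work() pure
{
    uint sum;
    foreach (i; 0 .. 4_000_000)
        sum += i;
    return sum;
}

void main()
{
    uint sink;
    auto r = benchmark!({
        // the store is opaque to the optimizer, so the call must stay
        volatileStore(&sink, work());
    })(1);
    writeln(r[0]);
}
```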
Re: how to benchmark pure functions?
On Friday, 28 October 2022 at 09:48:14 UTC, ab wrote:
Thanks to H.S. Teoh and Dennis for the suggestions, they both work. I like the empty asm block a bit more because it is less invasive, but it only works with ldc.

I used the volatileLoad/volatileStore functions to ensure that the compiler doesn't find a way to optimize out the code (for example, move repetitive calculations out of the loop or even do them at compile time), and the RDTSC/RDTSCP instructions via inline assembly for measurements: https://gist.github.com/ssvb/5c926ed9bc755900fdaac3b71a0f7cfd

The goal was to have a very fast way to check (with no measurable overhead) whether reasonable optimization options had been supplied to the compiler.
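For reference, a stripped-down sketch of the timestamp-counter read such a measurement relies on, in DMD-style inline assembly. This is not the code from the linked gist: the gist also serializes the pipeline (CPUID/RDTSCP), which is omitted here, and the sketch assumes x86-64:

```D
// Read the CPU's timestamp counter. Real measurement code also needs
// serialization (CPUID or RDTSCP) around the read; omitted for brevity.
ulong rdtsc() @system nothrow @nogc
{
    version (D_InlineAsm_X86_64)
    {
        asm nothrow @nogc
        {
            naked;
            rdtsc;         // counter into EDX:EAX
            shl RDX, 32;
            or RAX, RDX;   // combine halves into RAX, the return register
            ret;
        }
    }
    else
        static assert(0, "this sketch needs x86-64 DMD-style inline asm");
}
```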
Re: how to benchmark pure functions?
On Friday, 28 October 2022 at 09:48:14 UTC, ab wrote:
> On Thursday, 27 October 2022 at 17:17:01 UTC, ab wrote:
> [...]
>
> Thanks to H.S. Teoh and Dennis for the suggestions, they both work. I like the empty asm block a bit more because it is less invasive, but it only works with ldc.
>
> @Imperatorn see Dennis' code for an example. std.datetime.benchmark works, but at high optimization levels (-O2, -O3) the loop can be removed and the time brought down to 0 hnsecs. E.g. try "ldc2 -O3 -run dennis.d".
>
> AB

Yeah, I didn't read carefully enough, sorry.
Re: how to benchmark pure functions?
On Thursday, 27 October 2022 at 17:17:01 UTC, ab wrote:
> Hi, when trying to compare different implementations of the optimized builds of a pure function using benchmark from std.datetime.stopwatch, I get times equal to zero, I suppose because the functions are not executed as they do not have side effects. The same happens with the example from the documentation: https://dlang.org/library/std/datetime/stopwatch/benchmark.html How can I prevent the compiler from removing the code I want to measure? Is there some utility in the standard library or pragma that I should use? Thanks AB

Thanks to H.S. Teoh and Dennis for the suggestions, they both work. I like the empty asm block a bit more because it is less invasive, but it only works with ldc.

@Imperatorn see Dennis' code for an example. std.datetime.benchmark works, but at high optimization levels (-O2, -O3) the loop can be removed and the time brought down to 0 hnsecs. E.g. try "ldc2 -O3 -run dennis.d".

AB
Re: how to benchmark pure functions?
On Thursday, 27 October 2022 at 17:17:01 UTC, ab wrote:
How can I prevent the compiler from removing the code I want to measure?

With many C compilers, you can use volatile assembly blocks for that. With LDC -O3, a regular assembly block also does the trick currently:

```D
void main()
{
    import std.datetime.stopwatch;
    import std.stdio : write, writeln, writef, writefln;
    import std.conv : to;

    void f0() {}

    void f1()
    {
        foreach (i; 0 .. 4_000_000)
        {
            // nothing, loop gets optimized out
        }
    }

    void f2()
    {
        foreach (i; 0 .. 4_000_000)
        {
            // defeat optimizations
            asm @safe pure nothrow @nogc {}
        }
    }

    auto r = benchmark!(f0, f1, f2)(1);
    writeln(r[0]); // 4 μs
    writeln(r[1]); // 4 μs
    writeln(r[2]); // 1 ms
}
```
Re: how to benchmark pure functions?
On Thu, Oct 27, 2022 at 06:20:10PM +0000, Imperatorn via Digitalmars-d-learn wrote:
> On Thursday, 27 October 2022 at 17:17:01 UTC, ab wrote:
> > Hi,
> >
> > when trying to compare different implementations of the optimized
> > builds of a pure function using benchmark from
> > std.datetime.stopwatch, I get times equal to zero, I suppose because
> > the functions are not executed as they do not have side effects.
> >
> > The same happens with the example from the documentation:
> > https://dlang.org/library/std/datetime/stopwatch/benchmark.html
> >
> > How can I prevent the compiler from removing the code I want to
> > measure? Is there some utility in the standard library or pragma
> > that I should use?
[...]

To prevent the optimizer from eliding the function completely, you need to do something with the return value. Usually, this means you combine the return value into some accumulating variable, e.g., if it's an int function, have a running int accumulator that you add to:

	int funcToBeMeasured(...) pure { ... }

	int accum;
	auto results = benchmark!({
		// Don't just call funcToBeMeasured and ignore the value
		// here, otherwise the optimizer may delete the call
		// completely.
		accum += funcToBeMeasured(...);
	});

Then at the end of the benchmark, do something with the accumulated value, like print out its value to stdout, so that the optimizer doesn't notice that the value is unused, and decide to kill all previous assignments to it. Something like `writeln(accum);` at the end should do the trick.

T

--
Indifference will certainly be the downfall of mankind, but who cares? -- Miquel van Smoorenburg
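Fleshed out into something runnable, the pattern looks like this; `square` and the iteration counts are placeholders, not anything from the thread:

```D
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;

// stand-in for the pure function under test
int square(int x) pure { return x * x; }

void main()
{
    int accum;
    auto results = benchmark!({
        foreach (i; 0 .. 1_000_000)
            accum += square(i);
    })(10);
    writeln(results[0]);
    writeln(accum); // use the value, so the calls above can't be elided
}
```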
Re: how to benchmark pure functions?
On Thursday, 27 October 2022 at 17:17:01 UTC, ab wrote: Hi, when trying to compare different implementations of the optimized builds of a pure function using benchmark from std.datetime.stopwatch, I get times equal to zero, I suppose because the functions are not executed as they do not have side effects. The same happens with the example from the documentation: https://dlang.org/library/std/datetime/stopwatch/benchmark.html How can I prevent the compiler from removing the code I want to measure? Is there some utility in the standard library or pragma that I should use? Thanks AB Sorry, I don't understand what you're saying. The examples work for me. Can you provide an exact code example which does not work as expected for you?
how to benchmark pure functions?
Hi, when trying to compare different implementations of the optimized builds of a pure function using benchmark from std.datetime.stopwatch, I get times equal to zero, I suppose because the functions are not executed as they do not have side effects. The same happens with the example from the documentation: https://dlang.org/library/std/datetime/stopwatch/benchmark.html How can I prevent the compiler from removing the code I want to measure? Is there some utility in the standard library or pragma that I should use? Thanks AB
Re: HTTP frameworks benchmark focused on D libraries
On Monday, 30 May 2022 at 20:57:02 UTC, tchaloupka wrote:
> On Sunday, 29 May 2022 at 06:22:43 UTC, Andrea Fontana wrote:
> > On Thursday, 26 May 2022 at 07:49:23 UTC, tchaloupka wrote:
> > I see there is a test where numbers are identical to the arsd ones, is it a typo or a coincidence?
> >
> > Andrea
>
> Hi Andrea, it was just a coincidence, straight out copy of the tool results. But as I've found some bugs calculating percentiles from `hey` properly, I've updated the results after the fix.
>
> I've also added results for `geario` (thanks #zoujiaqing).
>
> For `serverino`, I've added a variant that uses 16 worker subprocesses in the pool, which should lead to less blocking and worse per-request times in the test environment.
>
> Tom

Thanks again! Benchmarks are always welcome :)
Re: HTTP frameworks benchmark focused on D libraries
On Sunday, 29 May 2022 at 06:22:43 UTC, Andrea Fontana wrote:
> On Thursday, 26 May 2022 at 07:49:23 UTC, tchaloupka wrote:
> I see there is a test where numbers are identical to the arsd ones, is it a typo or a coincidence?
>
> Andrea

Hi Andrea, it was just a coincidence, straight out copy of the tool results. But as I've found some bugs calculating percentiles from `hey` properly, I've updated the results after the fix.

I've also added results for `geario` (thanks #zoujiaqing).

For `serverino`, I've added a variant that uses 16 worker subprocesses in the pool, which should lead to less blocking and worse per-request times in the test environment.

Tom
Re: HTTP frameworks benchmark focused on D libraries
On Thursday, 26 May 2022 at 07:49:23 UTC, tchaloupka wrote:
> Hi, as there are two more HTTP server implementations:
> * [Serverino](https://forum.dlang.org/thread/bqsatbwjtoobpbzxd...@forum.dlang.org)

Thank you! Since it's just a young library, those results sound promising. I'm just working on the next version, focusing on performance enhancements and Windows support :)

I see there is a test where numbers are identical to the arsd ones, is it a typo or a coincidence?

Andrea
Re: HTTP frameworks benchmark focused on D libraries
On Saturday, 28 May 2022 at 05:44:11 UTC, tchaloupka wrote:
> On Friday, 27 May 2022 at 20:51:14 UTC, zoujiaqing wrote:
> It means two separate syscalls for the header and the body. This alone has a huge impact on performance, and if it can be avoided, it would be much better.

sendv/writev can also help to save syscalls when you have to send data from non-contiguous buffers.
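A minimal sketch of that idea in D (POSIX only; `sendResponse` is a made-up name and error handling is omitted): the header and body stay in separate buffers but leave the process in one syscall.

```D
import core.sys.posix.sys.uio : iovec, writev;

auto sendResponse(int fd, const(char)[] header, const(char)[] payload)
{
    iovec[2] iov;
    iov[0] = iovec(cast(void*) header.ptr, header.length);
    iov[1] = iovec(cast(void*) payload.ptr, payload.length);
    // one writev() syscall instead of two separate write() calls
    return writev(fd, iov.ptr, 2);
}
```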
Re: HTTP frameworks benchmark focused on D libraries
On Friday, 27 May 2022 at 20:51:14 UTC, zoujiaqing wrote:
> On Thursday, 26 May 2022 at 07:49:23 UTC, tchaloupka wrote:
> I fixed the performance bug the first time. (The default HTTP 1.1 connection is keep-alive.)
>
> Archttp version 1.0.2 has been released, and retesting has yielded significant performance improvements.
>
> -- zoujiaqing

Hi, thanks for the PR. I've rerun the tests for `archttp` and it is indeed much better. Now on par with `vibe-d`.

Some more notes for better performance (it's the same with `vibe-d` too). See what syscalls are called during the request processing:

```
[pid 1453] read(10, "GET / HTTP/1.1\r\nHost: 192.168.12"..., 1024) = 117
[pid 1453] write(10, "HTTP/1.1 200 OK\r\nDate: Sat, 28 M"..., 173) = 173
[pid 1453] write(10, "Hello, World!", 13) = 13
```

It means two separate syscalls for the header and the body. This alone has a huge impact on performance, and if it can be avoided, it would be much better.

Also, read/write on a socket are a bit slower than recv/send.
Re: HTTP frameworks benchmark focused on D libraries
On Thursday, 26 May 2022 at 07:49:23 UTC, tchaloupka wrote:
> Some notes:
> * `Archttp` has some problem with only 2 CPUs, so it's included only in the first test (it was ok with 4 CPUs and was cca 2x faster than `hunt-web`)

Hi tchaloupka: First, thank you for the benchmark project!

I fixed the performance bug the first time. (The default HTTP 1.1 connection is keep-alive.)

Archttp version 1.0.2 has been released, and retesting has yielded significant performance improvements.

-- zoujiaqing
Re: HTTP frameworks benchmark focused on D libraries
Hi, as there are two more HTTP server implementations:

* [Serverino](https://forum.dlang.org/thread/bqsatbwjtoobpbzxd...@forum.dlang.org)
* [Archttp](https://forum.dlang.org/thread/jckjrgnmgsulewnre...@forum.dlang.org)

it was time to update some numbers! The latest results can be seen [here](https://github.com/tchaloupka/httpbench#multi-core-results) - it's a lot of numbers..

Some notes:

* I've updated all frameworks and compilers to the latest versions
* tests have been run on the same host but separated using VMs (for the workload generator and the servers) with pinned CPUs (so they don't interfere with each other)
* as I have "only" 16 available threads to be used, and in the 12 vs 4 CPUs scenario wrk saturated all 12 CPUs, I had to switch it to 14/2 CPUs to give wrk some space
* virtio bridged network between VMs
* `Archttp` has some problem with only 2 CPUs, so it's included only in the first test (it was ok with 4 CPUs and was cca 2x faster than `hunt-web`)
* `Serverino` is set to use the same number of processes as there are CPUs (leaving it at the default was slower, so I kept it set like that)

One may notice some strange `adio-http` in the results. Well, it's a WIP framework (`adio` as in "async dlang I/O") that is not public (yet). It has some design goals (due to its targeted usage) that some will prefer and some won't like at all:

* `betterC` - so no GC, no delegates, no classes (D ones), no exceptions, etc.
* it should be possible later to work with full D too, but it's easier to go from `betterC` to full D than the other way around, and it is not in focus now
* linux as the only target atm.
* `epoll` and `io_uring` async I/O API backends (can be extended with `IOCP` or `Kqueue`, but linux is the main target now)
* performance, simplicity, safety - in this order (and yes, with `betterC` there are many pointers, function callbacks, manual memory management, etc. - thanks for `asan`, ldc team ;-))
* middleware support - one can set up a router with e.g. request logger, gzip, auth middlewares easily (a REST API middleware would be one of them)
* can be used with just callbacks or combined with fibers (the http server itself is fibers-only as it would be a callback hell otherwise)
* each async operation can be set with a timeout to simplify usage

It doesn't use any "hacks" in the benchmark. Just a real HTTP parser, a simple path router, real headers writing, a real `Date` header, etc. But it has tuned parameters (no timeouts set - which the others don't use either). It'll be released when the API settles a bit and real usage with sockets, websockets, http clients, REST APIs, etc. would be possible.
Re: HTTP frameworks benchmark focused on D libraries
On Monday, 28 September 2020 at 09:44:14 UTC, Daniel Kozak wrote:
> I do not see TCP_NODELAY anywhere in your code for raw tests, so maybe you should try that

I've added new results with these changes:

* added NGINX test
* edge and level triggered variants for epoll tests (level should be faster in this particular test)
* new hybrid server variant of ARSD (thanks Adam)
* added TCP_NODELAY to the listen socket in some tests (client sockets should derive this)
* make response sizes even for all tests
* errors and response size columns are printed out only when there's some difference, as they are pretty meaningless otherwise
* switch to the wrk load generator as a default (hey is still supported) - as it is less resource demanding and gives better control over client connections and its worker threads

Some test insights:

* arsd - I'm more inclined to switch it to the multiCore category, at least the hybrid variant now (as it's not too fair against the others that should run just one worker with one thread event loop) - see https://github.com/tchaloupka/httpbench/pull/5 for discussion
  * ideal would be to add the current variant to multiCore tests and a limited variant to singleCore
* photon - I've assumed that it's working in a single thread, but it doesn't seem to (see https://github.com/tchaloupka/httpbench/issues/7)
Re: HTTP frameworks benchmark focused on D libraries
On Sun, Sep 27, 2020 at 12:10 PM tchaloupka via Digitalmars-d-announce <digitalmars-d-announce@puremagic.com> wrote:
> ...
> Some results are a bit surprising, ie that even with 10 runs
> there are tests that are faster than C/dlang raw tests - as they
> should be at the top because they really don't do anything with
> HTTP handling.. And eventcore/fibers to beat raw C epoll loop
> with fibers overhead? It just seems odd..
> ...

I do not see TCP_NODELAY anywhere in your code for raw tests, so maybe you should try that
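With std.socket, the suggestion amounts to one call on the socket; a minimal sketch, assuming a `Socket` is already at hand (`disableNagle` is an invented helper name):

```D
import std.socket;

// Disable Nagle's algorithm so small responses go out immediately
// instead of waiting to be coalesced with later writes.
void disableNagle(Socket sock)
{
    sock.setOption(SocketOptionLevel.TCP, SocketOption.TCP_NODELAY, 1);
}
```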
Re: HTTP frameworks benchmark focused on D libraries
On 9/27/20 6:08 AM, tchaloupka wrote: Hi all, I've just pushed the updated results. Thanks for continuing to work on this! vibe-core performs quite well -- scaling up with additional workers from 8 through 256, whereas vibe-d platform tops out around ~35,000-45,000 RPS irrespective of simultaneous workers (plateauing between 8-64 workers). Given the outstanding performance of vibe-core it looks like there is room to continue to improve the vibe-d platform. Cheers again for your work.
Re: HTTP frameworks benchmark focused on D libraries
On Sunday, 27 September 2020 at 10:08:24 UTC, tchaloupka wrote: * new RAW tests in C to utilize epoll and io_uring (using liburing) event loops - so we have some ground base we can compare against I fixed some buffering issues in cgi.d and, if you have the right concurrency level that happens to align with the number of worker processes... I'm getting incredible results. 65k rps. It *might* just beat the raw there. The kernel does a really good job. Of course, it still will make other connections wait forever... but my new event loop in threads mode is now also giving me a pretty solid 26k rps on random concurrency levels with the buffering fix. I just need to finish testing this to get some confidence before I push live but here it is on a github branch if you're curious to look: https://github.com/adamdruppe/arsd/blob/cgi_preview/cgi.d Compile with `-version=embedded_httpd_threads -version=cgi_use_fiber` to opt into the new event loop. But the buffering improvements should register in all usage modes.
Re: HTTP frameworks benchmark focused on D libraries
I fixed my event loop last night so I'll prolly release that at some point after a lil more testing, it fixes my keep-alive numbers... but harms the others so I wanna see if I can maintain those too.
Re: HTTP frameworks benchmark focused on D libraries
On Sunday, 27 September 2020 at 10:08:24 UTC, tchaloupka wrote:
> Hi all, I've just pushed the updated results.
>
> * new RAW tests in C to utilize epoll and io_uring (using liburing) event loops - so we have some ground base we can
>
> I'll probably add wrk[3] load generator too to see a difference with longer running tests.
>
> [1] https://github.com/tchaloupka/during
> [2] https://github.com/rakyll/hey
> [3] https://github.com/wg/wrk

Thanks for this work. It may be worth adding nginx as a baseline for a real C-based server. I'll add my framework as soon as it is ready.
Re: HTTP frameworks benchmark focused on D libraries
Hi all, I've just pushed the updated results.

Test suite modifications:

* added runner command to list available tests
* possibility to switch off keepalive connections - causes `hey` to make a new connection for each request
* added parameter to run each test multiple times and choose the best result out of the runs

Tests additions:

* new RAW tests in C to utilize epoll and io_uring (using liburing) event loops - so we have some ground base we can compare against
* same RAW tests but in Dlang too - both in betterC; epoll is basically the same, io_uring differs in that it uses my during[1] library - so we can see if there are some performance problems (as it should perform basically the same as the C variant)

Some insights:

I've found the test results from hey[2] pretty inconsistent (run locally or over the network). That's the reason I've added the `bestof` switch to the runner. And the current test results are the best of 10 runs for each of them.

Some results are a bit surprising, ie that even with 10 runs there are tests that are faster than the C/dlang raw tests - as they should be at the top because they really don't do anything with HTTP handling.. And eventcore/fibers to beat a raw C epoll loop with fibers overhead? It just seems odd..

I'll probably add the wrk[3] load generator too to see a difference with longer running tests.

[1] https://github.com/tchaloupka/during
[2] https://github.com/rakyll/hey
[3] https://github.com/wg/wrk
Re: HTTP frameworks benchmark focused on D libraries
On 9/20/20 4:03 PM, tchaloupka wrote:
> Hi, as it pops up now and then (last one in https://forum.dlang.org/thread/qttjlgxjmrzzuflrj...@forum.dlang.org) I wanted to see the various D libraries performance against each other too and ended up with https://github.com/tchaloupka/httpbench
>
> It's just a simple plaintext response testing (nothing fancy as in Techempower) but this interests me the most as it gives the idea about the potential of the library. More details in the README.
>
> Hope it helps to test some ideas or improve the current solutions.
>
> Tom

Thank you for doing this!

One of the most fascinating things, I think, is how photon really shines when concurrency gets dialed up. With 8 workers, it performs about as well as, but below, the rest of the micros, including below the Rust and Go /platforms/. However, at 64 concurrent workers, photon rises to the top of the stack, performing about as well as eventcore and hunt. When going all the way up to 256, it was the only one that demonstrated **consistent performance** -- about the same as with 64, whereas ALL others dropped off, performing WORSE with 256 workers.
Re: HTTP frameworks benchmark focused on D libraries
On Sunday, 20 September 2020 at 20:03:27 UTC, tchaloupka wrote: Hi, as it pops up now and then (last one in https://forum.dlang.org/thread/qttjlgxjmrzzuflrj...@forum.dlang.org) I wanted to see the various D libraries performance against each other too and ended up with https://github.com/tchaloupka/httpbench It's just a simple plaintext response testing (nothing fancy as in Techempower) but this interests me the most as it gives the idea about the potential of the library. More details in the README. Hope it helps to test some ideas or improve the current solutions. Tom thanks! Very good news.
Re: HTTP frameworks benchmark focused on D libraries
On Monday, 21 September 2020 at 05:48:54 UTC, Imperatorn wrote:
> On Sunday, 20 September 2020 at 20:03:27 UTC, tchaloupka wrote:
> [...]
>
> Cool! Nice to see such good results for D. Did you try netcore 3.1 btw?

There's really no reason for D to be any slower than the others. It's just about the whole library package and how efficiently it's written. Eventcore is probably closest to the system, and everything above it just adds more overhead.

I've tried to run .Net Core out of docker (I'm using podman actually) and it seems to be more performant than .Net Core 5. But it was out of the container, so maybe it's just that.

I've added switches to the CLI to set some load generator parameters, so we can test scaling more easily.

Thanks to Adam I've also pushed tests for the arsd:cgi package. It's in its own category as the others are using async I/O loops. But everything has its pros and cons.
Re: HTTP frameworks benchmark focused on D libraries
On Sunday, 20 September 2020 at 20:03:27 UTC, tchaloupka wrote:
> Hi, as it pops up now and then (last one in https://forum.dlang.org/thread/qttjlgxjmrzzuflrj...@forum.dlang.org) I wanted to see the various D libraries performance against each other too and ended up with https://github.com/tchaloupka/httpbench
>
> It's just a simple plaintext response testing (nothing fancy as in Techempower) but this interests me the most as it gives the idea about the potential of the library. More details in the README.
>
> Hope it helps to test some ideas or improve the current solutions.
>
> Tom

Cool! Nice to see such good results for D. Did you try netcore 3.1 btw?
Re: HTTP frameworks benchmark focused on D libraries
With my lib, the -version=embedded_httpd_threads build should give more consistent results in tests like this.

The process pool it uses by default in a dub build is more crash resilient, but does have a habit of dropping excessive concurrent connections. This forces them to retry, which slaughters benchmarks like this. It will have like 5 ms 99th percentile (2x faster than the same test with the threads version btw), but then that final 1% of responses can take several seconds to complete (indeed, with 256 concurrent on my box it takes a whopping 30 seconds!). Even with only like 40 concurrent, there's a final 1% spike there, but it is more like 10ms so it isn't so noticeable, but with hundreds it grows fast.

That's probably what you're seeing here. The thread build accepts more smoothly and thus evens it out, giving a nicer benchmark number... but it actually performs worse on average in real world deployments in my experience, and is not as resilient to buggy code segfaulting (with processes, the individual handler respawns and resets that individual connection with no other requests affected; with threads, the whole server must respawn, which also often slips by unnoticed but is more likely to disrupt unrelated users).

There is a potential "fix" for the process handler to complete these benchmarks more smoothly too, but it comes at a cost: even in the long retry cases, at least the client has some feedback. It knows its connection is not accepted and can respond appropriately. At a minimum, they won't be shoveling data at you yet. The "fix" though breaks this - you accept ALL the connections, even if you are too busy to actually process them. This leads to more inbound data, potentially worsening the existing congestion and leaving users more likely to just hang. At least the unaccepted connection is specified (by TCP) to retry later automatically, but if it is accepted, acknowledged, yet unprocessed, it is unclear what to do. Odds are the user will just be left hanging until the browser decides to timeout and display its error, which can actually take longer than the TCP retry window.

My threads version does it this way anyway though. So it'd probably look better on the benchmark.

But BTW, stuff like this is why I don't put too much stock in benchmarks. Even if you aren't "cheating" like checking length instead of path and other tricks like that (which btw I think are totally legitimate in some cases; I said recently I see it as a *strength* when you can do that), it still leaves some nuance on the ground. Is it crash resilient? Debuggable when it crashes? Is it compatible with third-party libraries, or does it force you to choose from ones that share your particular event loop at risk of blocking the whole server when you disobey? Does it *actually* provide the scalability it claims to under real world conditions, or did it optimize to the controlled conditions of benchmarks at the expense of dynamic adaptation to reality? Harder to measure those.
HTTP frameworks benchmark focused on D libraries
Hi, as it pops up now and then (last one in https://forum.dlang.org/thread/qttjlgxjmrzzuflrj...@forum.dlang.org) I wanted to see the various D libraries performance against each other too and ended up with https://github.com/tchaloupka/httpbench It's just a simple plaintext response testing (nothing fancy as in Techempower) but this interests me the most as it gives the idea about the potential of the library. More details in the README. Hope it helps to test some ideas or improve the current solutions. Tom
Re: Updated compiler-benchmark
On Thursday, 16 July 2020 at 23:36:58 UTC, IGotD- wrote:
> On Thursday, 16 July 2020 at 15:56:45 UTC, Per Nordlöw wrote:
> > D's compiler `dmd` is still far ahead of all its competition especially when it comes to default build (standard compilation) performance.
>
> I don't think this comparison is fair as dmd is far behind when it comes to code generation compared to the competitors. What should be included are benchmarks done with LDC as well. Since you already have the D code, adding LDC should be pretty easy.

Both dmd and ldc have a superior check stage (lexical, syntactic and semantic analysis) because of a language designed in conjunction with the needs and limitations of a compiler. One key property of such a design is that D is designed to be a so-called single-pass language.

The compiler dmd is superior because of a super-fast but less qualitative code generation, giving outstanding productivity in incremental development. At that stage in the development cycle, fast builds are much more important than optimized machine code. The machine code generated by dmd in this stage is sufficiently fast for the needs of the developer trying to be productive in this stage. That is by design, not by accident. I suggest you ask Walter Bright if you want more details around his design.

The compiler ldc is about 10x slower than dmd for the debug stage because of the larger overhead of the LLVM backend, and is often preferred when doing release builds.
Re: Updated compiler-benchmark
On Thursday, 16 July 2020 at 15:56:45 UTC, Per Nordlöw wrote: D's compiler `dmd` is still far ahead of all its competition especially when it comes to default build (standard compilation) performance. I don't think this comparison is fair as dmd is far behind when it comes to code generation compared to the competitors. What should be included are benchmarks done with LDC as well. Since you already have the D code, adding LDC should be pretty easy.
Re: Updated compiler-benchmark
On Thursday, 16 July 2020 at 19:08:59 UTC, Per Nordlöw wrote: On Thursday, 16 July 2020 at 18:27:54 UTC, jmh530 wrote: How are the functions generated? I see something about function-depth, but it might be good to have an example in the readme. This is, of course, a very contrived benchmark but I had to pick something to get me started and I'll happily receive suggestions on how to improve the benchmarking-relevance of the generated code. Thanks!
Re: Updated compiler-benchmark
On Thursday, 16 July 2020 at 18:27:54 UTC, jmh530 wrote: How are the functions generated? I see something about function-depth, but it might be good to have an example in the readme. This is, of course, a very contrived benchmark but I had to pick something to get me started and I'll happily receive suggestions on how to improve the benchmarking-relevance of the generated code.
Re: Updated compiler-benchmark
On Thursday, 16 July 2020 at 18:27:54 UTC, jmh530 wrote: How are the functions generated? I see something about function-depth, but it might be good to have an example in the readme. Added here https://github.com/nordlow/compiler-benchmark#sample-generated-code
Re: Updated compiler-benchmark
On Thursday, 16 July 2020 at 18:27:54 UTC, jmh530 wrote: How are the functions generated? I see something about function-depth, but it might be good to have an example in the readme. Ah, I'll add that. Thanks
Re: Updated compiler-benchmark
On Thursday, 16 July 2020 at 15:56:45 UTC, Per Nordlöw wrote: I've updated https://github.com/nordlow/compiler-benchmark with - source variants with templated function variants for languages having generics - stdout-printing in Markdown (used in README.md) - benchmarks for the languages Zig and V [snip] How are the functions generated? I see something about function-depth, but it might be good to have an example in the readme.
Updated compiler-benchmark
I've updated https://github.com/nordlow/compiler-benchmark with

- source variants with templated function variants for languages having generics
- stdout-printing in Markdown (used in README.md)
- benchmarks for the languages Zig and V

## Conclusions (from sample run shown below)

D's compiler `dmd` is still far ahead of all its competition, especially when it comes to default build (standard compilation) performance.

The performance of both GCC and Clang gets significantly worse with each new release (currently 8, 9, 10 in the table below).

The generic C++ and D versions compile about 1.5 to 2 times slower, whereas the generic Rust version interestingly is processed 2-3 times faster than the non-generic version.
Re: Updated compiler-benchmark
On Thursday, 16 July 2020 at 15:56:45 UTC, Per Nordlöw wrote:
> The generic C++ and D versions compile about 1.5 to 2 times slower

With DMD, that is.
Re: Is there a way to benchmark/profile portably?
On Thursday, 7 May 2020 at 10:51:27 UTC, Simen Kjærås wrote:
> If I understand correctly, you want to measure how many cycles pass, rather than clock time?

Something like that. Well, I would also like to eliminate differences based on different memory caches between machines. In addition, if the profiler could eliminate the randomness of the benchmarks that results from memory alignment, context switches and dunno, that would obviously be a big plus for automatic testing.

> If so, it seems perf can do that: https://perf.wiki.kernel.org/index.php/Tutorial

Sounds worth a look. Thanks!
Re: Is there a way to benchmark/profile portably?
On Thursday, 7 May 2020 at 11:06:17 UTC, Dennis wrote: You can make a reference program that you use to get a measure for how fast the computer is that you run the benchmark on. Then you can use that to scale your actual benchmark results. When testing regressions there's a fairly obvious choice for this reference program: the old version. You can compare those results with the new version and report the relative difference. I have to keep that in mind, but I'd prefer something that does not require keeping old sources working, or floating their binaries around.
Re: Is there a way to benchmark/profile portably?
On Thursday, 7 May 2020 at 10:21:07 UTC, Dukc wrote: Is there some way to measure the performance of a function so that the results will be same in different computers (all x86, but otherwise different processors)? I'm thinking of making a test suite that could find performance regressions automatically. You can make a reference program that you use to get a measure for how fast the computer is that you run the benchmark on. Then you can use that to scale your actual benchmark results. When testing regressions there's a fairly obvious choice for this reference program: the old version. You can compare those results with the new version and report the relative difference.
Re: Is there a way to benchmark/profile portably?
On Thursday, 7 May 2020 at 10:21:07 UTC, Dukc wrote: Is there some way to measure the performance of a function so that the results will be same in different computers (all x86, but otherwise different processors)? I'm thinking of making a test suite that could find performance regressions automatically. I figured out Bochs[1] could be used for that, but it seems an overkill for me to install a whole virtual operating system just to benchmark functions. Does anybody know more lightweight way? [1] http://bochs.sourceforge.net/ If I understand correctly, you want to measure how many cycles pass, rather than clock time? If so, it seems perf can do that: https://perf.wiki.kernel.org/index.php/Tutorial -- Simen
Is there a way to benchmark/profile portably?
Is there some way to measure the performance of a function so that the results will be same in different computers (all x86, but otherwise different processors)? I'm thinking of making a test suite that could find performance regressions automatically. I figured out Bochs[1] could be used for that, but it seems an overkill for me to install a whole virtual operating system just to benchmark functions. Does anybody know more lightweight way? [1] http://bochs.sourceforge.net/
Re: Beginner's Comparison Benchmark
On Tuesday, 5 May 2020 at 20:07:54 UTC, RegeleIONESCU wrote: [...] Python should be ruled out, this is not its war :) I have done benchmarks against NumPy if you are interested: https://github.com/tastyminerals/mir_benchmarks
Re: Beginner's Comparison Benchmark
On Wed, May 06, 2020 at 09:59:48AM +0000, welkam via Digitalmars-d-learn wrote:
> On Tuesday, 5 May 2020 at 20:29:13 UTC, Steven Schveighoffer wrote:
> > the optimizer recognizes what you are doing and changes your code
> > to:
> >
> > writeln(1_000_000_001);
>
> Oh yes a classic constant folding. The other thing to worry about is
> dead code elimination. Walter has a nice story where he sent his
> compiler for benchmarking and the compiler figured out that the
> result of the calculation in the benchmark is not used so it deleted
> the whole benchmark.

I remember one time I was doing some benchmarks between different compilers, and LDC consistently beat them all -- which is not surprising, but what was surprising was that running times were suspiciously short. Curious to learn what magic code transformation LDC applied to make it run so incredibly fast, I took a look at the generated assembly. Turns out, because I was calling the function being benchmarked with constant arguments, LDC decided to execute the entire danged thing at compile-time and substitute the entire function call with a single instruction that loaded its return value(!).

Another classic guffaw was when the function return value was simply discarded: LDC figured out that the function had no side-effects and its return value was not being used, so it deleted the function call, leaving the benchmark with the equivalent of:

	void main() {}

which, needless to say, beat all other benchmarks hands down. :-D

Lessons learned:

(1) Always use external input to your benchmark (e.g., load from a file, so that an overly aggressive optimizer won't decide to execute the entire program at compile-time);

(2) Always make use of the return value somehow, even if it's just to print 0 to stdout, or pipe the whole thing to /dev/null, so that the overly aggressive optimizer won't decide that since your program has no effect on the outside world, it should just consist of a single ret instruction. :-D

T

--
This is not a sentence.
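Both lessons in one sketch (the `workload` function and the default size are invented): take the problem size from outside the program, and print the result.

```D
import std.conv : to;
import std.stdio : writeln;

// stand-in for the code being benchmarked
long workload(long n) pure
{
    long sum;
    foreach (i; 0 .. n)
        sum += i;
    return sum;
}

void main(string[] args)
{
    // lesson 1: external input, so nothing can be computed at compile time
    immutable n = args.length > 1 ? args[1].to!long : 1_000_000;
    // lesson 2: the result escapes to stdout, so the call can't be deleted
    writeln(workload(n));
}
```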
Re: Beginner's Comparison Benchmark
On Tuesday, 5 May 2020 at 20:29:13 UTC, Steven Schveighoffer wrote:
> the optimizer recognizes what you are doing and changes your code to:
> writeln(1_000_000_001);

Oh yes, classic constant folding. The other thing to worry about is dead code elimination. Walter has a nice story where he sent his compiler for benchmarking, and the compiler figured out that the result of the calculation in the benchmark is not used, so it deleted the whole benchmark.
Re: Beginner's Comparison Benchmark
On 5/5/20 4:07 PM, RegeleIONESCU wrote:
> Hello! I made a little test (counting to 1 billion by adding 1) to compare execution speed of a small counting for loop in C, D, Julia and Python.
[...]
> Although I see some times are better than others, I do not really know the difference between user and sys, I do not know which one is the time the app run. I am just a beginner, I am not a specialist. I made it just out of curiosity. If there is any error in my method please let me know.

1: you are interested in "real" time, that's how much time the whole thing took.

2: if you want to run benchmarks, you want to run multiple tests, and throw out the outliers, or use an average.

3: with simple things like this, the compiler is smarter than you ;) It doesn't really take 0.002s to do what you wrote; what happens is the optimizer recognizes what you are doing and changes your code to:

writeln(1_000_000_001);

(yes, you can use underscores to make literals more readable in D)

Doing benchmarks like this is really tricky. Julia probably recognizes the thing too, but has to optimize at runtime? Not sure.

-Steve
Beginner's Comparison Benchmark
Hello! I made a little test (counting to 1 billion by adding 1) to compare execution speed of a small counting for loop in C, D, Julia and Python.
=====================================
The C version:

```
#include <stdio.h>
int a=0;
int main(){
    int i;
    for(i=0; i<=bil; i++){
        a=a+1;
    }
    printf("%d", a);
}
```

The D version:

```
import std.stdio;
int main(){
    int a = 0;
    for(int i=0; i<=bil; i++){
        a=a+1;
    }
    write(a);
    return 0;
}
```

The Julia version:

```
function counter()
    z = 0
    for i=1:bil
        z=z+1
    end
    print(z)
end
counter()
```

The Python version:

```
def counter():
    z = 0
    for i in range(1, bil):
        z=z+1
    print(z)
counter()
```

=====================================
Test Results without optimization:

      C        |     DLANG      |     JULIA      |     Python
real 0m2,981s  | real 0m3,051s  | real 0m0,413s  | real 2m19,501s
user 0m2,973s  | user 0m2,975s  | user 0m0,270s  | user 2m18,095s
sys  0m0,001s  | sys  0m0,006s  | sys  0m0,181s  | sys  0m0,033s

=====================================
Test Results with optimization:

  C - GCC -O3  | DLANG LDC2 --O3 | JULIA --optimize=3 | Python -O
real 0m0,002s  | real 0m0,006s   | real 0m0,408s      | real 2m21,801s
user 0m0,001s  | user 0m0,003s   | user 0m0,269s      | user 2m19,964s
sys  0m0,001s  | sys  0m0,003s   | sys  0m0,177s      | sys  0m0,050s

=====================================
bil is the shortcut for 10^9

gcc 9.3.0, ldc2 1.21.0, python 3.8.2, julia 1.4.1, all on Ubuntu 20.04 - 64bit
Host CPU: k8-sse3

Unoptimized C and D are slow compared with Julia. Optimization increases the execution speed very much for C and D but has almost no effect on Julia. Python, the slowest of all, when optimized, runs even slower :)))

Although I see some times are better than others, I do not really know the difference between user and sys, I do not know which one is the time the app run. I am just a beginner, I am not a specialist. I made it just out of curiosity. If there is any error in my method please let me know.
Re: There is a computer languages benchmark compare site, but no Dlang benchmark. I think they should support D.
On Saturday, 22 June 2019 at 01:27:31 UTC, lili wrote: A nick site, has a lot of languages, unfortunately no dlang in there. https://benchmarksgame-team.pages.debian.net/benchmarksgame/ See https://github.com/kostya/benchmarks
Re: There is a computer languages benchmark compare site, but no Dlang benchmark. I think they should support D.
On Saturday, 22 June 2019 at 01:27:31 UTC, lili wrote: A nick site, has a lot of languages, unfortunately no dlang in there. https://benchmarksgame-team.pages.debian.net/benchmarksgame/ This page frequently pops up in this forum, please refer to existing posts: https://forum.dlang.org/search?q=benchmarksgame=forum In particular the last thread: https://forum.dlang.org/post/rcrhjgnskiuzzhrnz...@forum.dlang.org TL;DR: not happening.
There is a computer languages benchmark compare site, but no Dlang benchmark. I think they should support D.
A nice site, has a lot of languages, unfortunately no dlang in there. https://benchmarksgame-team.pages.debian.net/benchmarksgame/
Re: ldexp and frexp benchmark between Mir, C and Phobos
On Wednesday, 2 January 2019 at 09:35:39 UTC, kinke wrote:
> On Tuesday, 1 January 2019 at 23:36:55 UTC, Guillaume Piolat wrote:
> > llvm_exp (defers to C runtime) gives considerable speed improvement over `std.math.exp`.
>
> My tests back then on Linux also showed the new `exp(float)` being about half as fast as C, while the double-version was somehow 4x faster.

Interesting. At least the VS runtime seems to have different code for `exp(float)` and `exp(double)`. This could be an explanation.

> Then look at the implementation of exp() and you'll see that it uses ldexp() once. So by porting Ilya's version (or the Cephes one) to Phobos, I'm sure we can match the C speed for single-precision too.

Good idea.
Re: ldexp and frexp benchmark between Mir, C and Phobos
On Tuesday, 1 January 2019 at 23:36:55 UTC, Guillaume Piolat wrote:
> llvm_exp (defers to C runtime) gives considerable speed improvement over `std.math.exp`.

My tests back then on Linux also showed the new `exp(float)` being about half as fast as C, while the double-version was somehow 4x faster.

> I've tested `expf` from the VS runtime exhaustively for 32-bit `float` and it showed the relative accuracy was within < 0.0002% of std.math.exp. It's not concerning at all; what is more concerning is the variability of the C runtime vs a D function. Looking for speed AND control :)

Then look at the implementation of exp() and you'll see that it uses ldexp() once. So by porting Ilya's version (or the Cephes one) to Phobos, I'm sure we can match the C speed for single-precision too.
Re: ldexp and frexp benchmark between Mir, C and Phobos
On Monday, 31 December 2018 at 13:24:29 UTC, kinke wrote:
> On Sunday, 30 December 2018 at 13:39:44 UTC, Guillaume Piolat wrote:
> > Been waiting for an exp() rewrite. And Boost-licensed! I'm using expf() from whatever libc is shipped and the variability of results and lack of control is annoying.
>
> exp != {ld,fr}exp. Phobos includes a proper single/double precision exp implementation since v2.082 and is Boost licensed...

llvm_exp (defers to C runtime) gives considerable speed improvement over `std.math.exp`.

I've tested `expf` from the VS runtime exhaustively for 32-bit `float` and it showed the relative accuracy was within < 0.0002% of std.math.exp. It's not concerning at all; what is more concerning is the variability of the C runtime vs a D function. Looking for speed AND control :)
Re: ldexp and frexp benchmark between Mir, C and Phobos
On Sunday, 30 December 2018 at 13:39:44 UTC, Guillaume Piolat wrote: Been waiting for an exp() rewrite. And Boost-licensed! I'm using expf() from whatever libc is shipped and the variability of results and lack of control is annoying. exp != {ld,fr}exp. Phobos includes a proper single/double precision exp implementation since v2.082 and is Boost licensed...
Re: ldexp and frexp benchmark between Mir, C and Phobos
On Friday, 28 December 2018 at 19:48:28 UTC, 9il wrote: ldexp and frexp are base building blocks for a lot of math functions. Here is a small benchmark that compares Mir, C and Phobos implementations: Wow, thanks! Been waiting for an exp() rewrite. And Boost-licensed! I'm using expf() from whatever libc is shipped and the variability of results and lack of control is annoying.
Re: ldexp and frexp benchmark between Mir, C and Phobos
On Saturday, 29 December 2018 at 15:15:48 UTC, Iain Buclaw wrote: On Fri, 28 Dec 2018 at 20:50, 9il via Digitalmars-d-announce wrote: ldexp and frexp are base building blocks for a lot of math functions. Here is a small benchmark that compares Mir, C and Phobos implementations: https://github.com/libmir/mir-core/blob/master/bench_ldexp_frexp.d Mir ldexp is 2.5 (5.5 - dmd) times faster for double and float. You could double the speed of ldexp if you actually used the checkedint compiler intrinsics rather than implementing it yourself. Using libm's ldexp() is also likely going to be 2-5x slower than using the implementation you've written for mir.ldexp(). For one, your version will be inlined! Mir has support for LLVM checkedint intrinsics. GDC checkedint intrinsics are not yet integrated in Mir. https://github.com/libmir/mir-core/blob/master/source/mir/checkedint.d
Re: ldexp and frexp benchmark between Mir, C and Phobos
On Fri, 28 Dec 2018 at 20:50, 9il via Digitalmars-d-announce wrote:
>
> ldexp and frexp are base building blocks for a lot of math
> functions.
>
> Here is a small benchmark that compares Mir, C and Phobos
> implementations:
>
> https://github.com/libmir/mir-core/blob/master/bench_ldexp_frexp.d
>
> Mir ldexp is 2.5 (5.5 - dmd) times faster for double and float.
>

You could double the speed of ldexp if you actually used the checkedint compiler intrinsics rather than implementing it yourself.

Using libm's ldexp() is also likely going to be 2-5x slower than using the implementation you've written for mir.ldexp(). For one, your version will be inlined!

--
Iain
Re: ldexp and frexp benchmark between Mir, C and Phobos
On Saturday, 29 December 2018 at 12:35:03 UTC, kinke wrote: On Friday, 28 December 2018 at 19:48:28 UTC, 9il wrote: Any chance the multi-precision ldexp can be upstreamed to Phobos (which currently uses real precision for the float/double overloads, which explains the suboptimal performance)? It'd make a *lot* more sense there, instead of having it in a separate library. It's well-known that there's a lot of remaining std.math functions which need proper single/double precision implementations, and ldexp is one of them. Yes, Mir is Boost licensed, but I don't work on Phobos anymore. Mir libraries are going to be independent of Phobos.
Re: ldexp and frexp benchmark between Mir, C and Phobos
On Friday, 28 December 2018 at 19:48:28 UTC, 9il wrote:
> LDC, macos x64:
> ---
> float
> ldexp (Phobos time / Mir time) = 2.55584
> ldexp ( stdc time / Mir time) = 0.773019
> frexp (Phobos time / Mir time) = 1.04093
> frexp ( stdc time / Mir time) = 1.748
> ---
> double
> ldexp (Phobos time / Mir time) = 2.49162
> ldexp ( stdc time / Mir time) = 1.31868
> frexp (Phobos time / Mir time) = 0.937906
> frexp ( stdc time / Mir time) = 1.82241
> ---
> real
> ldexp (Phobos time / Mir time) = 0.999327 (LDC Phobos uses C func for real)
> ldexp ( stdc time / Mir time) = 0.969467 (LDC Mir uses C func for real)
> frexp (Phobos time / Mir time) = 1.02512
> frexp ( stdc time / Mir time) = 1.77901

Any chance the multi-precision ldexp can be upstreamed to Phobos (which currently uses real precision for the float/double overloads, which explains the suboptimal performance)? It'd make a *lot* more sense there, instead of having it in a separate library. It's well-known that there's a lot of remaining std.math functions which need proper single/double precision implementations, and ldexp is one of them.
ldexp and frexp benchmark between Mir, C and Phobos
ldexp and frexp are base building blocks for a lot of math functions.

Here is a small benchmark that compares Mir, C and Phobos implementations:

https://github.com/libmir/mir-core/blob/master/bench_ldexp_frexp.d

Mir ldexp is 2.5 (5.5 - dmd) times faster for double and float.

=============================
LDC, macos x64:
---
float
ldexp (Phobos time / Mir time) = 2.55584
ldexp ( stdc time / Mir time) = 0.773019
frexp (Phobos time / Mir time) = 1.04093
frexp ( stdc time / Mir time) = 1.748
---
double
ldexp (Phobos time / Mir time) = 2.49162
ldexp ( stdc time / Mir time) = 1.31868
frexp (Phobos time / Mir time) = 0.937906
frexp ( stdc time / Mir time) = 1.82241
---
real
ldexp (Phobos time / Mir time) = 0.999327 (LDC Phobos uses C func for real)
ldexp ( stdc time / Mir time) = 0.969467 (LDC Mir uses C func for real)
frexp (Phobos time / Mir time) = 1.02512
frexp ( stdc time / Mir time) = 1.77901

=============================
DMD, macos x64:
---
float
ldexp (Phobos time / Mir time) = 5.53172
ldexp ( stdc time / Mir time) = 0.535711
frexp (Phobos time / Mir time) = 2.06024
frexp ( stdc time / Mir time) = 0.739571
---
double
ldexp (Phobos time / Mir time) = 5.32189
ldexp ( stdc time / Mir time) = 0.772949
frexp (Phobos time / Mir time) = 2.02758
frexp ( stdc time / Mir time) = 0.637328
---
real
ldexp (Phobos time / Mir time) = 2.61905
ldexp ( stdc time / Mir time) = 0.803806
frexp (Phobos time / Mir time) = 1.22398
frexp ( stdc time / Mir time) = 1.08659

Best,
Ilya

This work has been sponsored by Symmetry Investments and Kaleidic Associates.
http://symmetryinvestments.com/
https://github.com/kaleidicassociates/
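For readers who haven't used the two primitives: frexp splits a value into a fraction in [0.5, 1) and a power-of-two exponent, and ldexp puts them back together. A small usage sketch with the Phobos versions (not the Mir code being benchmarked):

```D
import std.math : frexp, ldexp;
import std.stdio : writeln;

void main()
{
    int exponent;
    double x = 6.0;
    double frac = frexp(x, exponent); // frac = 0.75, exponent = 3
    writeln(frac, " * 2^", exponent); // 6.0 == 0.75 * 2^3
    writeln(ldexp(frac, exponent));   // reassembles 6.0
}
```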
Re: D hash table comparison benchmark
On Tuesday, 26 June 2018 at 03:45:27 UTC, Seb wrote:
> Did you by chance also benchmark it with other languages like C++, Go or Rust?

= Not reusing hashtables, optimizations enabled =
 79 msecs Rust std::collections::HashMap
 90 msecs Go built-in map
177 msecs C++ std::unordered_map (whichever implementation comes with Xcode)

= Reusing hashtables, optimizations enabled =
24 msecs C++ std::unordered_map (whichever implementation comes with Xcode)
26 msecs Go built-in map
36 msecs Rust std::collections::HashMap
Re: D hash table comparison benchmark
On Tuesday, 26 June 2018 at 14:33:25 UTC, Eugene Wissner wrote:
> Tanya hashes any value, also integral types; other hashtables probably not.

Your intuition is correct here. Most of the tables use `typeid(key).getHash()`, which for `int` just returns `key`.

= Built-in AA =
General case: `typeid.getHash` with additional scrambling.
Customizable: no.

= memutils.hashmap =
General case: `typeid.getHash`.
Customizable: no.
Other notes:
* Special case for `toHash` member (bypasses `typeid`).
* Interesting special cases for ref-counted data types, with further special cases for ref-counted strings and arrays.

= vibe.utils.hashmap =
General case: `typeid.getHash`.
Customizable: yes, through optional `Traits` template parameter.
Other notes:
* Special case for `toHash` member (bypasses `typeid`).
* Tries to implement a special case for objects with the default `Object.toHash` function, but it seems like it can never work.

= jive.map =
General case: `typeid.getHash`.
Customizable: no.

= containers.hashmap =
General case: `typeid.getHash`.
Customizable: yes, through optional `hashFunction` template parameter.
Other notes:
* Special case for strings on 64-bit builds. Uses FNV-1a instead of default hash function.
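To illustrate the `toHash` special case mentioned above, a hedged sketch (the `IntKey` struct is invented, not from any of the benchmarked libraries): wrapping a key in a struct with a `toHash` member lets the tables that check for it skip the `typeid` indirection.

```D
struct IntKey
{
    int value;

    size_t toHash() const @safe pure nothrow
    {
        // identity hash, mirroring what typeid(int).getHash does
        return value;
    }

    bool opEquals(const IntKey rhs) const @safe pure nothrow
    {
        return value == rhs.value;
    }
}
```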
Re: D hash table comparison benchmark
On Tuesday, 26 June 2018 at 09:03:10 UTC, Eugene Wissner wrote:
> It seems it doesn't work with a branch in dub.sdl. I just replaced the files in ~/.dub/packages.

And to make tanya perform better than built-in AAs in the first test, define a hash function:

size_t hasher(int e)
{
    return e;
}

and then:

mixin(benchmarkCode!("tanya.container.hashtable", "Tanya_HashTable!(int, int, hasher)"));

or

mixin(benchmarkCodeReuse!("tanya.container.hashtable", "Tanya_HashTable!(int, int, hasher)"));

Tanya hashes any value, also integral types; other hashtables probably not. It should theoretically provide better key distribution.
Re: D hash table comparison benchmark
BTW the output is formatted so you can get a sorted list of times across all trials by piping the output through `sort -n`. That's also why the tests reusing maps start with ( instead of [, so they will be grouped separately.
Re: D hash table comparison benchmark
It seems it doesn't work with a branch in dub.sdl. I just replaced the files in ~/.dub/packages.
Re: D hash table comparison benchmark
On Tuesday, 26 June 2018 at 04:17:44 UTC, Nathan S. wrote: On Tuesday, 26 June 2018 at 03:45:27 UTC, Seb wrote: Did you by chance also benchmark it with other languages like C++, Go or Rust? I didn't, since I was evaluating hashtable implementations for use in a D application. BTW I'm not sure what your plans are, but are you aware of this recent article? https://probablydance.com/2018/05/28/a-new-fast-hash-table-in-response-to-googles-new-fast-hash-table I wasn't, thanks.

It isn't fair to post the benchmarks without giving me some time to fix the bug :). Just kidding. Thanks a lot for taking the time to report the bug. tanya's HashTable implementation is very young. I fixed the bug in the "bugfix/hashtable-endless" branch (I still need to fix another rehash() before merging into master). Just put this in dub.sdl:

dependency "tanya" version="~bugfix/hashtable-endless"

Here are the results from my machine with tanya included. I'm posting only the optimized versions for dmd and ldc. I'll just say that tanya's HashTable sucks in debug mode: I'm using an underlying structure to combine the common functionality of the hash table and the hash set, and with optimizations a lot of those function calls can be inlined.

So, dmd:

Hashtable benchmark N (size) = 100 (repetitions) = 1

=Results (new hashtables)=
*Trial #1*
[checksum 1526449824] 139 msecs built-in AA
[checksum 1526449824] 368 msecs containers.hashmap w/Mallocator
[checksum 1526449824] 422 msecs containers.hashmap w/GCAllocator
[checksum 1526449824] 97 msecs memutils.hashmap
[checksum 1526449824] 101 msecs vibe.utils.hashmap
[checksum 1526449824] 181 msecs jive.map
[checksum 1526449824] 242 msecs tanya.container.hashtable
*Trial #2*
[checksum 1526449824] 128 msecs built-in AA
[checksum 1526449824] 361 msecs containers.hashmap w/Mallocator
[checksum 1526449824] 416 msecs containers.hashmap w/GCAllocator
[checksum 1526449824] 95 msecs memutils.hashmap
[checksum 1526449824] 109 msecs vibe.utils.hashmap
[checksum 1526449824] 179 msecs jive.map
[checksum 1526449824] 239 msecs tanya.container.hashtable
*Trial #3*
[checksum 1526449824] 131 msecs built-in AA
[checksum 1526449824] 360 msecs containers.hashmap w/Mallocator
[checksum 1526449824] 421 msecs containers.hashmap w/GCAllocator
[checksum 1526449824] 89 msecs memutils.hashmap
[checksum 1526449824] 105 msecs vibe.utils.hashmap
[checksum 1526449824] 180 msecs jive.map
[checksum 1526449824] 237 msecs tanya.container.hashtable

=Results (reusing hashtables)=
*Trial #1*
(checksum 1526449824) 57.66 msecs built-in AA
(checksum 1526449824) 52.76 msecs containers.hashmap w/Mallocator
(checksum 1526449824) 48.49 msecs containers.hashmap w/GCAllocator
(checksum 1526449824) 31.16 msecs memutils.hashmap
(checksum 1526449824) 45.19 msecs vibe.utils.hashmap
(checksum 1526449824) 47.52 msecs jive.map
(checksum 1526449824) 114.41 msecs tanya.container.hashtable
*Trial #2*
(checksum 1526449824) 54.42 msecs built-in AA
(checksum 1526449824) 52.37 msecs containers.hashmap w/Mallocator
(checksum 1526449824) 53.10 msecs containers.hashmap w/GCAllocator
(checksum 1526449824) 32.39 msecs memutils.hashmap
(checksum 1526449824) 46.94 msecs vibe.utils.hashmap
(checksum 1526449824) 48.90 msecs jive.map
(checksum 1526449824) 113.73 msecs tanya.container.hashtable
*Trial #3*
(checksum 1526449824) 58.06 msecs built-in AA
(checksum 1526449824) 53.29 msecs containers.hashmap w/Mallocator
(checksum 1526449824) 55.08 msecs containers.hashmap w/GCAllocator
(checksum 1526449824) 30.94 msecs memutils.hashmap
(checksum 1526449824) 44.89 msecs vibe.utils.hashmap
(checksum 1526449824) 47.69 msecs jive.map
(checksum 1526449824) 112.62 msecs tanya.container.hashtable

LDC:

Hashtable benchmark N (size) = 100 (repetitions) = 1

=Results (new hashtables)=
*Trial #1*
[checksum 1526449824] 103 msecs built-in AA
[checksum 1526449824] 261 msecs containers.hashmap w/Mallocator
[checksum 1526449824] 274 msecs containers.hashmap w/GCAllocator
[checksum 1526449824] 64 msecs memutils.hashmap
[checksum 1526449824] 52 msecs vibe.utils.hashmap
[checksum 1526449824] 131 msecs jive.map
[checksum 1526449824] 102 msecs tanya.container.hashtable
*Trial #2*
[checksum 1526449824] 97 msecs built-in AA
[checksum 1526449824] 257 msecs containers.hashmap w/Mallocator
[checksum 1526449824] 274 msecs containers.hashmap w/GCAllocator
[checksum 1526449824] 59 msecs memutils.hashmap
[checksum 1526449824] 50 msecs vibe.utils.hashmap
[checksum 1526449824] 131 msecs jive.map
[checksum 1526449824] 102 msecs tanya.container.hashtable
*Trial #3*
[checksum 1526449824] 96 msecs built-in AA
[checksum 1526449824] 258 msecs containers.hashmap w/Mallocator
[checksum 1526449824] 271 msecs containers.hashmap w/GCAllocator
[checksum 1526449824] 60 msecs memutils.hashmap
[checksum 1526449824] 50 msecs vibe.utils.hashmap
[checksum 1526449824] 131 msecs jive.map
[checksum 1526449824] 102 msecs tanya.container.hashtable

=Results (reusing hashtables)= *
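A side note on the debug-mode slowness mentioned above: when a container forwards every operation through a shared core structure, each access pays for a real function call until the optimizer inlines it away. A minimal illustrative sketch of that forwarding pattern (hypothetical code, not tanya's actual implementation):

```d
// Hypothetical sketch of a container forwarding to a shared core:
// nearly free with inlining (-O), but a call per operation in debug builds.
struct HashCore(K, V)
{
    V[K] data; // stand-in storage, just for the sketch

    ref V get(K key) { return data[key]; }
    void put(K key, V value) { data[key] = value; }
}

struct HashTable(K, V)
{
    HashCore!(K, V) core;

    // Thin forwarders: free once inlined, real calls otherwise.
    ref V opIndex(K key) { return core.get(key); }
    void opIndexAssign(V value, K key) { core.put(key, value); }
}

unittest
{
    HashTable!(int, int) t;
    t[42] = 1;
    assert(t[42] == 1);
}
```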
Re: D hash table comparison benchmark
On Tuesday, 26 June 2018 at 03:45:27 UTC, Seb wrote: Did you by chance also benchmark it with other languages like C++, Go or Rust? I didn't, since I was evaluating hashtable implementations for use in a D application. BTW I'm not sure what your plans are, but are you aware of this recent article? https://probablydance.com/2018/05/28/a-new-fast-hash-table-in-response-to-googles-new-fast-hash-table I wasn't, thanks.
Re: D hash table comparison benchmark
On Tuesday, 26 June 2018 at 02:53:22 UTC, Nathan S. wrote: With LDC2 the times for vibe.utils.hashmap and memutils.hashmap are suspiciously low, leading me to suspect that the optimizer might be omitting most of the work. Here are the figures without optimizations enabled.

== Speed Ranking using DMD (no optimizations) ==
95 msecs built-in AA
168 msecs vibe.utils.hashmap
182 msecs jive.map
224 msecs memutils.hashmap
663 msecs containers.hashmap w/GCAllocator
686 msecs containers.hashmap w/Mallocator

== Speed Ranking using LDC2 (no optimizations) ==
68 msecs built-in AA
143 msecs vibe.utils.hashmap
155 msecs jive.map
164 msecs memutils.hashmap
515 msecs containers.hashmap w/GCAllocator
537 msecs containers.hashmap w/Mallocator

Did you by chance also benchmark it with other languages like C++, Go or Rust? BTW I'm not sure what your plans are, but are you aware of this recent article? https://probablydance.com/2018/05/28/a-new-fast-hash-table-in-response-to-googles-new-fast-hash-table There were also plans to lower the AA implementation entirely into D runtime/user space, so that specialization could be done more easily, but sadly those plans have stagnated so far: https://github.com/dlang/druntime/pull/1282 https://github.com/dlang/druntime/pull/1985
Re: D hash table comparison benchmark
With LDC2 the times for vibe.utils.hashmap and memutils.hashmap are suspiciously low, leading me to suspect that the optimizer might be omitting most of the work. Here are the figures without optimizations enabled.

== Speed Ranking using DMD (no optimizations) ==
95 msecs built-in AA
168 msecs vibe.utils.hashmap
182 msecs jive.map
224 msecs memutils.hashmap
663 msecs containers.hashmap w/GCAllocator
686 msecs containers.hashmap w/Mallocator

== Speed Ranking using LDC2 (no optimizations) ==
68 msecs built-in AA
143 msecs vibe.utils.hashmap
155 msecs jive.map
164 msecs memutils.hashmap
515 msecs containers.hashmap w/GCAllocator
537 msecs containers.hashmap w/Mallocator
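Worth noting for anyone reproducing these numbers: printing a checksum built from every lookup (as this benchmark does) creates a data dependency between the measured work and observable output, so plain dead-code elimination cannot drop the loop, though aggressive constant folding can still shortcut it, which may be what the suspiciously low optimized figures hint at. A minimal sketch of the pattern, with hypothetical names:

```d
// Sketch: accumulate results into a checksum and print it, so the
// benchmarked work stays observable and cannot be removed as dead code.
import core.stdc.stdio : printf;
import std.datetime.stopwatch : AutoStart, StopWatch;

uint work(uint n)
{
    uint sum;
    foreach (uint i; 0 .. n)
        sum += i * 2654435761u; // the computation being measured
    return sum;
}

void main()
{
    auto sw = StopWatch(AutoStart.yes);
    uint checksum;
    foreach (_; 0 .. 1_000)
        checksum += work(100_000); // every iteration feeds the checksum
    sw.stop();
    // The checksum is printed, so the work above has an observable effect.
    printf("[checksum %u] %d msecs\n", checksum,
           cast(int) sw.peek.total!"msecs");
}
```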
D hash table comparison benchmark
The below benchmarks come from writing 100 int-to-int mappings to a new hashtable then reading them back, repeated 10_000 times. The built-in AA doesn't deallocate memory when it falls out of scope but the other maps do. Benchmark code in next post.

== Speed Ranking using LDC2 (optimized) ==
21 msecs vibe.utils.hashmap
37 msecs memutils.hashmap
57 msecs built-in AA
102 msecs jive.map
185 msecs containers.hashmap w/GCAllocator
240 msecs containers.hashmap w/Mallocator

== Speed Ranking using DMD (optimized) ==
55 msecs memutils.hashmap
64 msecs vibe.utils.hashmap
80 msecs built-in AA
131 msecs jive.map
315 msecs containers.hashmap w/GCAllocator
361 msecs containers.hashmap w/Mallocator

** What if the array size is smaller or larger? **
The ordering didn't change so I won't post the results.

** What if we reuse the hashtable? **

== Speed Ranking using LDC2 (optimized) ==
10.45 msecs vibe.utils.hashmap
11.85 msecs memutils.hashmap
12.61 msecs containers.hashmap w/GCAllocator
12.91 msecs containers.hashmap w/Mallocator
14.30 msecs built-in AA
19.21 msecs jive.map

== Speed Ranking using DMD (optimized) ==
18.05 msecs memutils.hashmap
21.03 msecs jive.map
24.99 msecs built-in AA
25.22 msecs containers.hashmap w/Mallocator
25.75 msecs containers.hashmap w/GCAllocator
29.93 msecs vibe.utils.hashmap

== Not benchmarked ==
stdx.collections.hashtable (dlang-stdx/collections): compilation error
kontainer.orderedAssocArray (alphaKAI/kontainer): doesn't accept int keys
tanya.container.hashtable (caraus-ecms/tanya): either has a bug or is very slow
Re: D hash table comparison benchmark
Benchmark code:

dub.sdl
```
name "hashbench"
description "D hashtable comparison."
dependency "emsi_containers" version="~>0.7.0"
dependency "memutils" version="~>0.4.11"
dependency "vibe-d:utils" version="~>0.8.4"
dependency "jive" version="~>0.2.0"
//dependency "collections" version="~>0.1.0"
//dependency "tanya" version="~>0.10.0"
//dependency "kontainer" version="~>0.0.2"
```

app.d
```d
int nthKey(in uint n) @nogc nothrow pure @safe
{
    // Can be any invertible function.
    // The goal is to map [0 .. N] to a sequence not in ascending order.
    int h = cast(int) (n + 1);
    h = (h ^ (h >>> 16)) * 0x85ebca6b;
    h = (h ^ (n >>> 13)) * 0xc2b2ae35;
    return h ^ (h >>> 16);
}

pragma(inline, false)
uint hashBench(HashTable, Args...)(in uint N, in uint seed, Args initArgs)
{
    static if (initArgs.length)
        HashTable hashtable = HashTable(initArgs);
    else // Separate branch needed for builtin AA.
        HashTable hashtable;
    foreach (uint n; 0 .. N)
        hashtable[nthKey(n)] = n + seed;
    uint sum;
    foreach_reverse (uint n; 0 .. N/2)
        sum += hashtable[nthKey(n)];
    foreach_reverse (uint n; N/2 .. N)
        sum += hashtable[nthKey(n)];
    return sum;
}

pragma(inline, false)
uint hashBenchReuse(HashTable)(in uint N, in uint seed, ref HashTable hashtable)
{
    foreach (uint n; 0 .. N)
        hashtable[nthKey(n)] = n + seed;
    uint sum;
    foreach_reverse (uint n; 0 .. N/2)
        sum += hashtable[nthKey(n)];
    foreach_reverse (uint n; N/2 .. N)
        sum += hashtable[nthKey(n)];
    return sum;
}

enum benchmarkCode(string name, string signature = name) = `
{
    sw.reset();
    result = 0;
    sw.start();
    foreach (_; 0 .. M)
    {
        result += hashBench!(`~signature~`)(N, result);
    }
    sw.stop();
    string s = "`~name~`";
    // s points at a mixin-generated string literal, which is zero-terminated.
    printf("[checksum %d] %3d msecs %s\n", result,
           cast(int) sw.peek.total!"msecs", &s[0]);
}
`;

enum benchmarkCodeReuse(string name, string signature = name) = `
{
    sw.reset();
    result = 0;
    sw.start();
    `~signature~` hashtable;
    foreach (_; 0 .. M)
    {
        result += hashBenchReuse!(`~signature~`)(N, result, hashtable);
    }
    sw.stop();
    string s = "`~name~`";
    printf("(checksum %d) %3.2f msecs %s\n", result,
           sw.peek.total!"usecs" / 1000.0, &s[0]);
}
`;

void main(string[] args)
{
    import std.datetime.stopwatch : AutoStart, StopWatch;
    import core.stdc.stdio : printf, puts;
    import std.experimental.allocator.gc_allocator : GCAllocator;
    import std.experimental.allocator.mallocator : Mallocator;
    alias BuiltinAA(K,V) = V[K];
    import containers.hashmap : EMSI_HashMap = HashMap;
    import memutils.hashmap : Memutils_HashMap = HashMap;
    import vibe.utils.hashmap : Vibe_HashMap = HashMap;
    import jive.map : Jive_Map = Map;
    //import stdx.collections.hashtable : Stdx_Hashtable = Hashtable;
    //import tanya.container.hashtable : Tanya_HashTable = HashTable;
    //import kontainer.orderedAssocArray.orderedAssocArray : Kontainer_OrderedAssocArray = OrderedAssocArray;

    immutable uint N = args.length < 2 ? 100 : () {
        import std.conv : to;
        auto result = to!uint(args[1]);
        return (result == 0 ? 100 : result);
    }();
    immutable M = N <= 500_000 ? (1_000_000 / N) : 2;
    enum topLevelRepetitions = 3;
    printf("Hashtable benchmark N (size) = %d (repetitions) = %d\n", N, M);

    StopWatch sw = StopWatch(AutoStart.no);
    uint result;

    version(all)
    {
        puts("\n=Results (new hashtables)=");
        foreach (_repetition; 0 .. topLevelRepetitions)
        {
            printf("*Trial #%d*\n", _repetition+1);
            mixin(benchmarkCode!("built-in AA", "BuiltinAA!(int, int)"));
            mixin(benchmarkCode!("containers.hashmap w/Mallocator", "EMSI_HashMap!(int, int, Mallocator)"));
            mixin(benchmarkCode!("containers.hashmap w/GCAllocator", "EMSI_HashMap!(int, int, GCAllocator)"));
            mixin(benchmarkCode!("memutils.hashmap", "Memutils_HashMap!(int,int)"));
            mixin(benchmarkCode!("vibe.utils.hashmap", "Vibe_HashMap!(int,int)"));
            mixin(benchmarkCode!("jive.map", "Jive_Map!(int,int)"));
            //mixin(benchmarkCode!("stdx.collections.hashtable", "Stdx_Hashtable!(int,in
```
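The post is cut off above. Judging from the benchmarkCodeReuse template defined earlier and the "(checksum ...)" rows in the posted results, the rest of main presumably mirrors the first loop; a sketch of the likely continuation (reconstructed, not the original text):

```d
// Presumed continuation inside main() -- the original post is truncated.
// Same trial loop as above, but each mixin reuses a single table instance.
puts("\n=Results (reusing hashtables)=");
foreach (_repetition; 0 .. topLevelRepetitions)
{
    printf("*Trial #%d*\n", _repetition + 1);
    mixin(benchmarkCodeReuse!("built-in AA", "BuiltinAA!(int, int)"));
    mixin(benchmarkCodeReuse!("containers.hashmap w/Mallocator",
            "EMSI_HashMap!(int, int, Mallocator)"));
    mixin(benchmarkCodeReuse!("containers.hashmap w/GCAllocator",
            "EMSI_HashMap!(int, int, GCAllocator)"));
    mixin(benchmarkCodeReuse!("memutils.hashmap", "Memutils_HashMap!(int,int)"));
    mixin(benchmarkCodeReuse!("vibe.utils.hashmap", "Vibe_HashMap!(int,int)"));
    mixin(benchmarkCodeReuse!("jive.map", "Jive_Map!(int,int)"));
}
```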
Re: Found on proggit: simple treap language benchmark, includes D
On 22/05/2018 3:31 AM, ixid wrote: On Saturday, 19 May 2018 at 15:09:38 UTC, Joakim wrote: D does well, comes in second on Mac/Win/linux: https://github.com/frol/completely-unscientific-benchmarks https://www.reddit.com/r/programming/comments/8jbfa7/naive_benchmark_treap_implementation_of_c_rust/ Can any experts improve this to come first? That's how you win hearts and minds. I want to see the assembly of the D and C++ tuned versions. Something has got to be different.
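For anyone who wants to follow up on that: LDC can emit the generated assembly (or LLVM IR, which is often easier to diff) directly. A sketch of the invocations, assuming the source lives in a file named treap.d:

```
# Emit optimized assembly to treap.s:
ldc2 -O3 -release -output-s treap.d
# Or LLVM IR to treap.ll:
ldc2 -O3 -release -output-ll treap.d
```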
Re: Found on proggit: simple treap language benchmark, includes D
On Saturday, 19 May 2018 at 15:09:38 UTC, Joakim wrote: D does well, comes in second on Mac/Win/linux: https://github.com/frol/completely-unscientific-benchmarks https://www.reddit.com/r/programming/comments/8jbfa7/naive_benchmark_treap_implementation_of_c_rust/ Can any experts improve this to come first? That's how you win hearts and minds.
Re: Found on proggit: simple treap language benchmark, includes D
On 21.05.2018 17:11, Nerve wrote: Sorry for double-posting, but I've included a GC-enabled solution based on their Java solution, and have a pull request up that's a bit more idiomatic, pulling unnecessary static methods out as functions. It scores VERY HIGH across the board on their "naive" benchmark. High expressivity, high maintainability, extremely fast, moderately low memory usage. This is the sort of thing that will convince people of D. Benchmarks involving GC and lots of allocations that still have it way ahead of the competition, with clean code! For any other language, this is front-page, first-glance material. Thank you for your efforts!
Re: Found on proggit: simple treap language benchmark, includes D
On Sunday, 20 May 2018 at 15:30:37 UTC, Nerve wrote: I'll see if I can get it included so they can test it on their specific setup. Sorry for double-posting, but I've included a GC-enabled solution based on their Java solution, and have a pull request up that's a bit more idiomatic, pulling unnecessary static methods out as functions. It scores VERY HIGH across the board on their "naive" benchmark. High expressivity, high maintainability, extremely fast, moderately low memory usage. This is the sort of thing that will convince people of D. Benchmarks involving GC and lots of allocations that still have it way ahead of the competition, with clean code! For any other language, this is front-page, first-glance material.
Re: Found on proggit: simple treap language benchmark, includes D
On Saturday, 19 May 2018 at 15:09:38 UTC, Joakim wrote: D does well, comes in second on Mac/Win/linux: https://github.com/frol/completely-unscientific-benchmarks https://www.reddit.com/r/programming/comments/8jbfa7/naive_benchmark_treap_implementation_of_c_rust/ The results in these tests are blazing fast, but they all forego the GC for manual allocation. In the Issues section of the repo, I included some simple, vanilla D translated from their Java implementation and made for use with the GC and runtime. I also included some raw sample times that are competitive with desktop i7 times of Rust and ref-counted C++ on...get this...a much slower laptop i5. This sort of thing needs to be shouted from the rooftops by the Foundation. I'll see if I can get it included so they can test it on their specific setup.
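For readers skimming the repo, the contrast described above comes down to how the treap nodes are allocated. A minimal sketch of the two styles (illustrative only, not the benchmark's actual sources):

```d
// Illustrative contrast: GC-managed class nodes vs. malloc'd struct nodes.
import core.stdc.stdlib : free, malloc;
import std.random : uniform;

// GC style: allocate with `new`, never free -- the collector reclaims nodes.
final class GcNode
{
    GcNode left, right;
    int key;
    int priority;
    this(int key) { this.key = key; priority = uniform(int.min, int.max); }
}

// Manual style: malloc'd structs that must be freed explicitly.
struct RawNode
{
    RawNode* left, right;
    int key;
    int priority;
}

RawNode* makeRaw(int key)
{
    auto n = cast(RawNode*) malloc(RawNode.sizeof);
    *n = RawNode(null, null, key, uniform(int.min, int.max));
    return n;
}

void main()
{
    auto a = new GcNode(1); // reclaimed by the GC eventually
    auto b = makeRaw(2);    // must be freed by hand
    scope(exit) free(b);
    assert(a.key == 1 && b.key == 2);
}
```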
Found on proggit: simple treap language benchmark, includes D
D does well, comes in second on Mac/Win/linux: https://github.com/frol/completely-unscientific-benchmarks https://www.reddit.com/r/programming/comments/8jbfa7/naive_benchmark_treap_implementation_of_c_rust/
Re: Benchmark Game
On Saturday, 19 May 2018 at 01:15:10 UTC, RhyS wrote: More than worth the effort, because it's used a lot when discussions are ongoing regarding languages like Go, C, ... It's one of the best forms of free advertisement. D used to be there, but at some point it was removed at the maintainer's whim for no specific reason. Isaac has acknowledged the arbitrariness of (t)his decision multiple times, and is usually quick to add, "but you can host your own copy". Of course this is only true in a very literal sense: our own copy wouldn't get nearly the same exposure as his "official" one. Still, it would be useful to have a page comparing optimized D results with C/C++, as the Benchmarks Game still enjoys quite a bit of popularity despite the maintainer's antics. -- David
Re: Benchmark Game
On Thursday, 17 May 2018 at 15:39:08 UTC, Andrei Alexandrescu wrote: D is not there. Anyone interested, if it's worth it? It would be well worth the effort. More than worth the effort, because it's used a lot when discussions are ongoing regarding languages like Go, C, ... It's one of the best forms of free advertisement.
Re: Benchmark Game
On Thursday, 17 May 2018 at 09:26:34 UTC, rikki cattermole wrote: On 17/05/2018 8:52 PM, ixid wrote: On Thursday, 17 May 2018 at 08:51:39 UTC, rikki cattermole wrote: On 17/05/2018 8:50 PM, Chris wrote: For what it's worth, I came across this website: https://benchmarksgame-team.pages.debian.net/benchmarksgame/ D is not there. Anyone interested, if it's worth it? It isn't happening /thread. Isn't it being hosted by a different person now? So it could happen. I haven't heard that, so can we get this verified? The original site went down and is being rehosted somewhere else. As far as I know, though, igouy is still involved, so I doubt the situation has improved.
Re: Benchmark Game
On 05/17/2018 04:50 AM, Chris wrote: For what it's worth, I came across this website: https://benchmarksgame-team.pages.debian.net/benchmarksgame/ D is not there. Anyone interested, if it's worth it? It would be well worth the effort.
Re: Benchmark Game
On Thu, 2018-05-17 at 12:51 +, Chris M. via Digitalmars-d wrote: […] > He'll probably pop up and confirm for us since he somehow always > knows when we bring this up He has (had?) robot listeners trawling the various email collectors which would then let him know the Benchmarks Game was being discussed. -- Russel.
Re: Benchmark Game
On Thursday, 17 May 2018 at 09:26:34 UTC, rikki cattermole wrote: On 17/05/2018 8:52 PM, ixid wrote: On Thursday, 17 May 2018 at 08:51:39 UTC, rikki cattermole wrote: On 17/05/2018 8:50 PM, Chris wrote: For what it's worth, I came across this website: https://benchmarksgame-team.pages.debian.net/benchmarksgame/ D is not there. Anyone interested, if it's worth it? It isn't happening /thread. Isn't it being hosted by a different person now? So it could happen. I haven't heard that, so can we get this verified? He'll probably pop up and confirm for us since he somehow always knows when we bring this up
Re: Benchmark Game
On 17/05/2018 8:52 PM, ixid wrote: On Thursday, 17 May 2018 at 08:51:39 UTC, rikki cattermole wrote: On 17/05/2018 8:50 PM, Chris wrote: For what it's worth, I came across this website: https://benchmarksgame-team.pages.debian.net/benchmarksgame/ D is not there. Anyone interested, if it's worth it? It isn't happening /thread. Isn't it being hosted by a different person now? So it could happen. I haven't heard that, so can we get this verified?
Re: Benchmark Game
On Thursday, 17 May 2018 at 08:51:39 UTC, rikki cattermole wrote: On 17/05/2018 8:50 PM, Chris wrote: For what it's worth, I came across this website: https://benchmarksgame-team.pages.debian.net/benchmarksgame/ D is not there. Anyone interested, if it's worth it? It isn't happening /thread. Isn't it being hosted by a different person now? So it could happen.
Benchmark Game
For what it's worth, I came across this website: https://benchmarksgame-team.pages.debian.net/benchmarksgame/ D is not there. Anyone interested, if it's worth it?
Re: Benchmark Game
On 17/05/2018 8:50 PM, Chris wrote: For what it's worth, I came across this website: https://benchmarksgame-team.pages.debian.net/benchmarksgame/ D is not there. Anyone interested, if it's worth it? It isn't happening /thread.
Re: Simple web server benchmark - vibe.d is slower than node.js and Go?
On Friday, 11 May 2018 at 23:24:32 UTC, Arun Chandrasekaran wrote: On Friday, 11 May 2018 at 07:56:04 UTC, Daniel Kozak wrote: [...] siege makes a difference. Earlier I had two Chrome windows open and I just tried a simple GET from each, with a considerable delay, and I saw the same thread ID in both responses. Thanks for your help! But I couldn't understand how two Chrome windows could consistently get the same thread ID. netstat shows one ESTABLISHED socket. Maybe Chrome multiplexes the connection? That doesn't sound right to me. Very likely: https://www.igvita.com/posa/high-performance-networking-in-google-chrome/#tcp-pre-connect
Re: Simple web server benchmark - vibe.d is slower than node.js and Go?
On Friday, 11 May 2018 at 07:56:04 UTC, Daniel Kozak wrote: On Wednesday, 9 May 2018 at 22:37:22 UTC, Arun Chandrasekaran wrote: [...] I have changed my example a little:

case "/": res.writeBody("Hello World " ~ to!string(thisThreadID), "text/plain");

And I get this (siege -p -c15 0b -t 10s http://127.0.0.1:3000 | grep World | sort | uniq):

Hello World 140064214951680
Hello World 140064223344384
Hello World 140064231737088
Hello World 140064240129792
Hello World 140064248522496
Hello World 140064256915200

So I get six different thread ids, which is OK because I have 6 cores.

siege makes a difference. Earlier I had two Chrome windows open and I just tried a simple GET from each, with a considerable delay, and I saw the same thread ID in both responses. Thanks for your help! But I couldn't understand how two Chrome windows could consistently get the same thread ID. netstat shows one ESTABLISHED socket. Maybe Chrome multiplexes the connection? That doesn't sound right to me.
Re: Simple web server benchmark - vibe.d is slower than node.js and Go?
On Friday, 11 May 2018 at 07:56:04 UTC, Daniel Kozak wrote: On Wednesday, 9 May 2018 at 22:37:22 UTC, Arun Chandrasekaran wrote: [...] I have changed my example a little:

case "/": res.writeBody("Hello World " ~ to!string(thisThreadID), "text/plain");

And I get this (siege -p -c15 0b -t 10s http://127.0.0.1:3000 | grep World | sort | uniq):

Hello World 140064214951680
Hello World 140064223344384
Hello World 140064231737088
Hello World 140064240129792
Hello World 140064248522496
Hello World 140064256915200

So I get six different thread ids, which is OK because I have 6 cores.

my dub.json:

{
    "name": "sbench",
    "authors": [ "Daniel Kozak" ],
    "dependencies": {
        "vibe-d": "~>0.8.4-beta.1",
        "vibe-d:tls": "0.8.4-beta.1",
        "vibe-core": "1.4.1-beta.2"
    },
    "subConfigurations": {
        "vibe-d:core": "vibe-core",
        "vibe-d:tls": "notls"
    },
    "description": "A simple vibe.d server application.",
    "copyright": "Copyright © 2018, Daniel Kozak",
    "license": "proprietary"
}
Re: Simple web server benchmark - vibe.d is slower than node.js and Go?
On Wednesday, 9 May 2018 at 22:37:22 UTC, Arun Chandrasekaran wrote: That could be the reason for the slowness. Ubuntu 17.10 64 bit, DMD v2.079.1, E7-4860, 8 cores, 32 GB RAM. With a slight modification to capture the timestamp of the request on the server:

import std.datetime.systime : Clock;
auto tm = Clock.currTime().toISOExtString();
writeln(tm, " My Thread Id: ", to!string(thisThreadID));
// simulate long running task
Thread.sleep(dur!("seconds")(3));
if (req.path == "/")
    res.writeBody(tm ~ " Hello, World! from " ~ to!string(thisThreadID), "text/plain");

Launch two parallel curls.. and here is the server log..

Master 13284 is running
[vibe-6(5fQI) INF] Listening for requests on http://0.0.0.0:8080/
[vibe-7(xljY) INF] Listening for requests on http://0.0.0.0:8080/
[vibe-2(FVCk) INF] Listening for requests on http://0.0.0.0:8080/
[vibe-3(peZP) INF] Listening for requests on http://0.0.0.0:8080/
[vibe-8(c5pQ) INF] Listening for requests on http://0.0.0.0:8080/
[vibe-4(T/oM) INF] Listening for requests on http://0.0.0.0:8080/
[vibe-5(zc5i) INF] Listening for requests on http://0.0.0.0:8080/
[vibe-1(Rdux) INF] Listening for requests on http://0.0.0.0:8080/
[vibe-0(PNMK) INF] Listening for requests on http://0.0.0.0:8080/
2018-05-09T15:32:41.5424275 My Thread Id: 140129463940864
2018-05-09T15:32:44.5450092 My Thread Id: 140129463940864
2018-05-09T15:32:56.3998322 My Thread Id: 140129463940864
2018-05-09T15:32:59.4022579 My Thread Id: 140129463940864
2018-05-09T15:33:12.4973215 My Thread Id: 140129463940864
2018-05-09T15:33:15.4996923 My Thread Id: 140129463940864

PS: Your top posting makes reading your replies difficult.

I have changed my example a little:

case "/": res.writeBody("Hello World " ~ to!string(thisThreadID), "text/plain");

And I get this (siege -p -c15 0b -t 10s http://127.0.0.1:3000 | grep World | sort | uniq):

Hello World 140064214951680
Hello World 140064223344384
Hello World 140064231737088
Hello World 140064240129792
Hello World 140064248522496
Hello World 140064256915200

So I get six different thread ids, which is OK because I have 6 cores.
Re: Simple web server benchmark - vibe.d is slower than node.js and Go?
On Wednesday, 9 May 2018 at 21:55:15 UTC, Daniel Kozak wrote: On which system? AFAIK HTTPServerOption.reusePort works on Linux but maybe not on other OSes. The other question is which event driver is used (libasync, libevent, vibe-core). On Wed, May 9, 2018 at 9:12 PM, Arun Chandrasekaran via Digitalmars-d <digitalmars-d@puremagic.com> wrote: On Monday, 30 October 2017 at 17:23:02 UTC, Daniel Kozak wrote: Maybe this one:

import vibe.d;
import std.regex;
import std.array : appender;

static reg = ctRegex!"^/greeting/([a-z]+)$";

void main()
{
    setupWorkerThreads(logicalProcessorCount);
    runWorkerTaskDist(&runServer);
    runApplication();
}

void runServer()
{
    auto settings = new HTTPServerSettings;
    settings.options |= HTTPServerOption.reusePort;
    settings.port = 3000;
    settings.serverString = null;
    listenHTTP(settings, &handleRequest);
}

void handleRequest(HTTPServerRequest req, HTTPServerResponse res)
{
    switch(req.path)
    {
        case "/": res.writeBody("Hello World", "text/plain");
            break;
        default:
            auto m = matchFirst(req.path, reg);
            string message = "Hello, ";
            auto app = appender(message);
            app.reserve(32);
            app ~= m[1];
            res.writeBody(app.data, "text/plain");
    }
}

On Mon, Oct 30, 2017 at 5:41 PM, ade90036 via Digitalmars-d <digitalmars-d@puremagic.com> wrote: On Thursday, 21 September 2017 at 13:09:33 UTC, Daniel Kozak wrote: wrong version, this is my latest version: https://paste.ofcode.org/qWsQikdhKiAywgBpKwANFR On Thu, Sep 21, 2017 at 3:01 PM, Daniel Kozak wrote: my version: https://paste.ofcode.org/RLX7GM6SHh3DjBBHd7wshj On Thu, Sep 21, 2017 at 2:50 PM, Sönke Ludwig via Digitalmars-d <digitalmars-d@puremagic.com> wrote: On 21.09.2017 at 14:41, Vadim Lopatin wrote: [...] Oh, sorry, I forgot the reusePort option, so that multiple sockets can listen on the same port:

auto settings = new HTTPServerSettings("0.0.0.0:3000");
settings.options |= HTTPServerOption.reusePort;
listenHTTP(settings, &handleRequest);

Hi, would it be possible to re-share the example of vibe.d with multithreaded support? The pastebin link has expired and the pull request doesn't have the latest version. Thanks, Ade

With vibe.d 0.8.2, even when multiple worker threads are set up, only one thread handles the requests:

```
import core.thread;
import vibe.d;
import std.experimental.all;

auto reg = ctRegex!"^/greeting/([a-z]+)$";

void main()
{
    writefln("Master %d is running", getpid());
    setupWorkerThreads(logicalProcessorCount + 1);
    runWorkerTaskDist(&runServer);
    runApplication();
}

void runServer()
{
    auto settings = new HTTPServerSettings;
    settings.options |= HTTPServerOption.reusePort;
    settings.port = 8080;
    settings.bindAddresses = ["127.0.0.1"];
    listenHTTP(settings, &handleRequest);
}

void handleRequest(HTTPServerRequest req, HTTPServerResponse res)
{
    writeln("My Thread Id: ", to!string(thisThreadID));
    // simulate long running task
    Thread.sleep(dur!("seconds")(3));

    if (req.path == "/")
        res.writeBody("Hello, World! from " ~ to!string(thisThreadID), "text/plain");
    else if (auto m = matchFirst(req.path, reg))
        res.writeBody("Hello, " ~ m[1] ~ " from " ~ to!string(thisThreadID), "text/plain");
}
```

That could be the reason for the slowness.

Ubuntu 17.10 64 bit, DMD v2.079.1, E7-4860, 8 cores, 32 GB RAM. With a slight modification to capture the timestamp of the request on the server:

import std.datetime.systime : Clock;
auto tm = Clock.currTime().toISOExtString();
writeln(tm, " My Thread Id: ", to!string(thisThreadID));
// simulate long running task
Thread.sleep(dur!("seconds")(3));
if (req.path == "/")
    res.writeBody(tm ~ " Hello, World! from " ~ to!string(thisThreadID), "text/plain");

Launch two parallel curls.. and here is the server log..

Master 13284 is running
[vibe-6(5fQI) INF] Listening for requests on http://0.0.0.0:8080/
[vibe-7(xljY) INF] Listening for requests on http://0.0.0.0:8080/
[vibe-2(FVCk) INF] Listening for requests on http://0.0.0.0:8080/
[vibe-3(peZP) INF] Listening for requests on http://0.0.0.0:8080/
[vibe-8(c5pQ) INF] Listening for requests on http://0.0.0.0:8080/
[vibe-4(T/oM) INF] Listening for requests on http://0.0.0.0:8080/
[vibe-5(zc5i) INF] Listening for requests on http://0.0.0.0:8080/
[vibe-1(Rdux) INF] Listening for requests on http://0.0.0.0:8080/
[vibe-0(PNMK) INF] Listening for requests on http://0.0.0.0:8080/
2018-05-09T15:32:41.5424275 My Thread Id: 140129463940864
2018-05-09T15:32:44.5450092 My Thread Id: 140129463940864
2018-05-09T15:32:56.3998322 My Thread Id: 140129463940864
2018-05-09T15:32:59.4022579 My Thread Id: 140129463940864
2018-05-09T15:33:12.4973215 My Thread Id: 140129463940864
2018-05-09T15:33:15.4996923 My Thread Id: 140129463940864

PS: Your top posting makes reading your replies difficult.
Re: Simple web server benchmark - vibe.d is slower than node.js and Go?
On which system? AFAIK HTTPServerOption.reusePort works on Linux but maybe not on other OSes. The other question is which event driver is used (libasync, libevent, vibe-core). On Wed, May 9, 2018 at 9:12 PM, Arun Chandrasekaran via Digitalmars-d <digitalmars-d@puremagic.com> wrote: On Monday, 30 October 2017 at 17:23:02 UTC, Daniel Kozak wrote: Maybe this one:

import vibe.d;
import std.regex;
import std.array : appender;

static reg = ctRegex!"^/greeting/([a-z]+)$";

void main()
{
    setupWorkerThreads(logicalProcessorCount);
    runWorkerTaskDist(&runServer);
    runApplication();
}

void runServer()
{
    auto settings = new HTTPServerSettings;
    settings.options |= HTTPServerOption.reusePort;
    settings.port = 3000;
    settings.serverString = null;
    listenHTTP(settings, &handleRequest);
}

void handleRequest(HTTPServerRequest req, HTTPServerResponse res)
{
    switch(req.path)
    {
        case "/": res.writeBody("Hello World", "text/plain");
            break;
        default:
            auto m = matchFirst(req.path, reg);
            string message = "Hello, ";
            auto app = appender(message);
            app.reserve(32);
            app ~= m[1];
            res.writeBody(app.data, "text/plain");
    }
}

On Mon, Oct 30, 2017 at 5:41 PM, ade90036 via Digitalmars-d <digitalmars-d@puremagic.com> wrote: On Thursday, 21 September 2017 at 13:09:33 UTC, Daniel Kozak wrote: wrong version, this is my latest version: https://paste.ofcode.org/qWsQikdhKiAywgBpKwANFR On Thu, Sep 21, 2017 at 3:01 PM, Daniel Kozak wrote: my version: https://paste.ofcode.org/RLX7GM6SHh3DjBBHd7wshj On Thu, Sep 21, 2017 at 2:50 PM, Sönke Ludwig via Digitalmars-d <digitalmars-d@puremagic.com> wrote: On 21.09.2017 at 14:41, Vadim Lopatin wrote: [...] Oh, sorry, I forgot the reusePort option, so that multiple sockets can listen on the same port:

auto settings = new HTTPServerSettings("0.0.0.0:3000");
settings.options |= HTTPServerOption.reusePort;
listenHTTP(settings, &handleRequest);

Hi, would it be possible to re-share the example of vibe.d with multithreaded support? The pastebin link has expired and the pull request doesn't have the latest version. Thanks, Ade

With vibe.d 0.8.2, even when multiple worker threads are set up, only one thread handles the requests:

```
import core.thread;
import vibe.d;
import std.experimental.all;

auto reg = ctRegex!"^/greeting/([a-z]+)$";

void main()
{
    writefln("Master %d is running", getpid());
    setupWorkerThreads(logicalProcessorCount + 1);
    runWorkerTaskDist(&runServer);
    runApplication();
}

void runServer()
{
    auto settings = new HTTPServerSettings;
    settings.options |= HTTPServerOption.reusePort;
    settings.port = 8080;
    settings.bindAddresses = ["127.0.0.1"];
    listenHTTP(settings, &handleRequest);
}

void handleRequest(HTTPServerRequest req, HTTPServerResponse res)
{
    writeln("My Thread Id: ", to!string(thisThreadID));
    // simulate long running task
    Thread.sleep(dur!("seconds")(3));

    if (req.path == "/")
        res.writeBody("Hello, World! from " ~ to!string(thisThreadID), "text/plain");
    else if (auto m = matchFirst(req.path, reg))
        res.writeBody("Hello, " ~ m[1] ~ " from " ~ to!string(thisThreadID), "text/plain");
}
```

That could be the reason for the slowness.
Re: Simple web server benchmark - vibe.d is slower than node.js and Go?
On Monday, 30 October 2017 at 17:23:02 UTC, Daniel Kozak wrote: Maybe this one:

import vibe.d;
import std.regex;
import std.array : appender;

static reg = ctRegex!"^/greeting/([a-z]+)$";

void main()
{
    setupWorkerThreads(logicalProcessorCount);
    runWorkerTaskDist(&runServer);
    runApplication();
}

void runServer()
{
    auto settings = new HTTPServerSettings;
    settings.options |= HTTPServerOption.reusePort;
    settings.port = 3000;
    settings.serverString = null;
    listenHTTP(settings, &handleRequest);
}

void handleRequest(HTTPServerRequest req, HTTPServerResponse res)
{
    switch(req.path)
    {
        case "/": res.writeBody("Hello World", "text/plain");
            break;
        default:
            auto m = matchFirst(req.path, reg);
            string message = "Hello, ";
            auto app = appender(message);
            app.reserve(32);
            app ~= m[1];
            res.writeBody(app.data, "text/plain");
    }
}

On Mon, Oct 30, 2017 at 5:41 PM, ade90036 via Digitalmars-d <digitalmars-d@puremagic.com> wrote: On Thursday, 21 September 2017 at 13:09:33 UTC, Daniel Kozak wrote: wrong version, this is my latest version: https://paste.ofcode.org/qWsQikdhKiAywgBpKwANFR On Thu, Sep 21, 2017 at 3:01 PM, Daniel Kozak wrote: my version: https://paste.ofcode.org/RLX7GM6SHh3DjBBHd7wshj On Thu, Sep 21, 2017 at 2:50 PM, Sönke Ludwig via Digitalmars-d <digitalmars-d@puremagic.com> wrote: On 21.09.2017 at 14:41, Vadim Lopatin wrote: [...] Oh, sorry, I forgot the reusePort option, so that multiple sockets can listen on the same port:

auto settings = new HTTPServerSettings("0.0.0.0:3000");
settings.options |= HTTPServerOption.reusePort;
listenHTTP(settings, &handleRequest);

Hi, would it be possible to re-share the example of vibe.d with multithreaded support? The pastebin link has expired and the pull request doesn't have the latest version. Thanks, Ade

With vibe.d 0.8.2, even when multiple worker threads are set up, only one thread handles the requests:

```
import core.thread;
import vibe.d;
import std.experimental.all;

auto reg = ctRegex!"^/greeting/([a-z]+)$";

void main()
{
    writefln("Master %d is running", getpid());
    setupWorkerThreads(logicalProcessorCount + 1);
    runWorkerTaskDist(&runServer);
    runApplication();
}

void runServer()
{
    auto settings = new HTTPServerSettings;
    settings.options |= HTTPServerOption.reusePort;
    settings.port = 8080;
    settings.bindAddresses = ["127.0.0.1"];
    listenHTTP(settings, &handleRequest);
}

void handleRequest(HTTPServerRequest req, HTTPServerResponse res)
{
    writeln("My Thread Id: ", to!string(thisThreadID));
    // simulate long running task
    Thread.sleep(dur!("seconds")(3));

    if (req.path == "/")
        res.writeBody("Hello, World! from " ~ to!string(thisThreadID), "text/plain");
    else if (auto m = matchFirst(req.path, reg))
        res.writeBody("Hello, " ~ m[1] ~ " from " ~ to!string(thisThreadID), "text/plain");
}
```

That could be the reason for the slowness.
[Issue 18723] New: std.exception.ErrnoException@std/stdio.d(1012): Enforcement failed (Bad file descriptor) when running the simplified benchmark
https://issues.dlang.org/show_bug.cgi?id=18723

Issue ID: 18723
Summary: std.exception.ErrnoException@std/stdio.d(1012): Enforcement failed (Bad file descriptor) when running the simplified benchmark
Product: D
Version: D2
Hardware: All
OS: FreeBSD
Status: NEW
Severity: major
Priority: P1
Component: druntime
Assignee: nob...@puremagic.com
Reporter: greensunn...@gmail.com

Comes from running the simplified benchmark. Maybe this is related to / fixed by: https://github.com/dlang/phobos/pull/6382

--

std.exception.ErrnoException@std/stdio.d(1012): Enforcement failed (Bad file descriptor)

gmake -C test/typeinfo MODEL=64 OS=freebsd DMD=/usr/home/braddr/sandbox/at-client/pull-3102365-FreeBSD_64_64/dmd/generated/freebsd/release/64/dmd BUILD=debug \
DRUNTIME=/usr/home/braddr/sandbox/at-client/pull-3102365-FreeBSD_64_64/druntime/generated/freebsd/debug/64/libdruntime.a DRUNTIMESO=/usr/home/braddr/sandbox/at-client/pull-3102365-FreeBSD_64_64/druntime/generated/freebsd/debug/64/libdruntime.so LINKDL= \
QUIET= TIMELIMIT='timelimit -t 10 ' PIC=-fPIC

??:? @safe void std.exception.bailOut!(std.exception.ErrnoException).bailOut(immutable(char)[], ulong, const(char[])) [0x457555]
??:? @safe bool std.exception.enforce!(std.exception.ErrnoException).enforce!(bool).enforce(bool, lazy const(char)[], immutable(char)[], ulong) [0xcd6625]
??:? @safe ubyte[] std.stdio.File.rawRead!(ubyte).rawRead(ubyte[]) [0xd28dd3]
??:? void std.stdio.File.ByChunkImpl.prime() [0xd26875]
??:? ref std.stdio.File.ByChunkImpl std.stdio.File.ByChunkImpl.__ctor(std.stdio.File, ubyte[]) [0xd269ca]
??:? ref std.stdio.File.ByChunkImpl std.stdio.File.ByChunkImpl.__ctor(std.stdio.File, ulong) [0xd268f5]
??:? std.stdio.File.ByChunkImpl std.stdio.File.byChunk(ulong) [0xd26d1b]
??:? std.typecons.Tuple!(int, "status", immutable(char)[], "output").Tuple std.process.executeImpl!(std.process.pipeShell(const(char[]), std.process.Redirect, const(immutable(char)[][immutable(char)[]]), std.process.Config, const(char[]), immutable(char)[]), const(char)[], immutable(char)[]).executeImpl(const(char)[], const(immutable(char)[][immutable(char)[]]), std.process.Config, ulong, const(char[]), immutable(char)[]) [0xd1ea69]
??:? @trusted std.typecons.Tuple!(int, "status", immutable(char)[], "output").Tuple std.process.executeShell(const(char[]), const(immutable(char)[][immutable(char)[]]), std.process.Config, ulong, const(char[]), immutable(char)[]) [0xd1b00a]
??:? std.regex.internal.ir.MatcherFactory!(char).MatcherFactory std.regex.internal.ir.defaultFactory!(char).defaultFactory(ref const(std.regex.internal.ir.Regex!(char).Regex)).thompsonFactory [0x4555a5]
??:? int runbench.runTests(runbench.Config).__foreachbody4(ref immutable(char)[]) [0x45656c]
??:? void std.parallelism.ParallelForeach!(immutable(char)[][]).ParallelForeach.opApply(scope int delegate(ref immutable(char)[])).doIt() [0x489403]
??:? void std.parallelism.run!(void delegate()).run(void delegate()) [0xd159af]
??:? void std.parallelism.Task!(std.parallelism.run, void delegate()).Task.impl(void*) [0xd154eb]
??:? void std.parallelism.AbstractTask.job() [0xd13e66]
??:? void std.parallelism.TaskPool.doJob(std.parallelism.AbstractTask*) [0xd1403f]
??:? void std.parallelism.TaskPool.executeWorkLoop() [0xd14199]
??:? void std.parallelism.TaskPool.startWorkLoop() [0xd14140]
??:? void core.thread.Thread.run() [0xe134c3]
??:? thread_entryPoint [0xe1264c]
??:? pthread_create [0x11f1bc4]
??:? ??? [0x]
---
--
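Reading the trace bottom-up: a std.parallelism worker runs runbench's per-test loop, which shells out through std.process.executeShell, and the enforcement fails while File.byChunk reads the child's output pipe. The shape of the failing pattern, inferred from the trace (a sketch, not the actual runbench source):

```d
// Sketch of the pattern implied by the trace: shell commands executed from
// parallel workers, with their piped output read back via File.byChunk.
// The ErrnoException surfaces from rawRead when the descriptor goes bad.
import std.parallelism : parallel;
import std.process : executeShell;
import std.stdio : writeln;

void main()
{
    string[] tests = ["echo one", "echo two", "echo three"];
    foreach (cmd; parallel(tests))
    {
        auto r = executeShell(cmd); // reads the child's stdout/stderr pipe
        writeln(r.status, ": ", r.output);
    }
}
```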
[Issue 18067] Benchmark example is broken on the frontpage
https://issues.dlang.org/show_bug.cgi?id=18067 --- Comment #3 from github-bugzi...@puremagic.com --- Commits pushed to stable at https://github.com/dlang/dlang.org https://github.com/dlang/dlang.org/commit/93b97d25036f51d45d0160b72f8fed477aa70395 Fix Issue 18067 - Benchmark example is broken on the frontpage https://github.com/dlang/dlang.org/commit/8a22f440176180d74ea193fb2d9e459c321b44be Merge pull request #1954 from wilzbach/fix-benchmark-frontpage --
[Issue 18067] Benchmark example is broken on the frontpage
https://issues.dlang.org/show_bug.cgi?id=18067 github-bugzi...@puremagic.com changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --
[Issue 18067] Benchmark example is broken on the frontpage
https://issues.dlang.org/show_bug.cgi?id=18067 --- Comment #2 from github-bugzi...@puremagic.com --- Commits pushed to master at https://github.com/dlang/dlang.org https://github.com/dlang/dlang.org/commit/93b97d25036f51d45d0160b72f8fed477aa70395 Fix Issue 18067 - Benchmark example is broken on the frontpage https://github.com/dlang/dlang.org/commit/8a22f440176180d74ea193fb2d9e459c321b44be Merge pull request #1954 from wilzbach/fix-benchmark-frontpage Fix Issue 18067 - Benchmark example is broken on the frontpage merged-on-behalf-of: Mike Franklin <jins...@users.noreply.github.com> --
[Issue 18067] Benchmark example is broken on the frontpage
https://issues.dlang.org/show_bug.cgi?id=18067 Seb changed: What|Removed |Added Keywords||pull --- Comment #1 from Seb --- PR https://github.com/dlang/dlang.org/pull/1954 --