[Issue 5650] A RedBlackTree performance problem
https://issues.dlang.org/show_bug.cgi?id=5650 berni44 changed: What|Removed |Added Status|ASSIGNED|RESOLVED CC||bugzi...@d-ecke.de Resolution|--- |WORKSFORME --- Comment #29 from berni44 --- D is meanwhile on my computer almost twice as fast as java (0.21 to 0.36); IMHO this isn't an issue anymore. --
[Issue 5650] A RedBlackTree performance problem
https://issues.dlang.org/show_bug.cgi?id=5650 Andrei Alexandrescuchanged: What|Removed |Added Keywords||bootcamp CC||and...@erdani.com --
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #28 from safety0ff.bugz safety0ff.b...@gmail.com 2013-10-08 19:20:03 PDT --- (In reply to comment #27) There are other reports for more specific RedBlack tree performance issues. What ones? Issue #9513 -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 safety0ff.b...@gmail.com changed: What|Removed |Added CC||safety0ff.b...@gmail.com --- Comment #26 from safety0ff.b...@gmail.com 2013-10-03 06:29:10 PDT --- (In reply to comment #16) Now on my PC the D version (using redBlackTree) is just 3.7 times slower than the Java version that uses boxed Integers. This is not good enough still, but now redBlackTree is usable (DMD 2.060alpha, 32 bit, Windows): Timings, seconds: D version: 1.65 D versions + GC.disable() at the top: 0.78 Java version: 0.45 The current situation on my PC is the following: [1] Java version: 0.26 [2] D version: 0.28 [2] D version + GC.disable() at the top: 0.28 [3] D version: 0.45 [3] D version + GC.disable() at the top: 0.21 [4] C++ version: 0.11 [5] C++ version: 0.12 There are other reports for more specific RedBlack tree performance issues. The performance seems to be in the ballpark you were looking for (on my PC.) Can this bug be closed? [1]Java version 1.7.0_40 [2]DMD64 D Compiler v2.063.2 Options: -O -release -noboundscheck [3]the LLVM D compiler (0.11.0): based on DMD v2.062 and LLVM 3.3 Options: -O -release -noboundscheck [4]g++ 4.7.3 Options: -O -DNDEBUG [5]clang version 3.3 Options: -O -DNDEBUG -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #22 from Steven Schveighoffer schvei...@yahoo.com 2012-05-25 11:29:52 PDT --- (In reply to comment #20) BTW, I noticed changing the C++ implementation to cache the begin node did *not* change performance. So it may be they are caching the begin element internally since it is likely one of the most requested elements. Without that, because the begin element is very likely to be at the bottom of the tree, saving those lookups might be how they have improved performance. We can certainly add such an improvement to RedBlackTree. See if this helps: https://github.com/D-Programming-Language/phobos/pull/605 -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #23 from github-bugzi...@puremagic.com 2012-05-25 12:19:50 PDT --- Commits pushed to master at https://github.com/D-Programming-Language/phobos https://github.com/D-Programming-Language/phobos/commit/a83d5f13881ce79935ee73747948ab3b15ea633d issue 5650 - A RedBlackTree performance problem Caching first node (as _begin) to improve lookup performance of first node. https://github.com/D-Programming-Language/phobos/commit/965df0300ae635342082b6537f719da5c926e5ff Merge pull request #605 from schveiguy/rbtree_begin issue 5650 - A RedBlackTree performance problem -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #24 from bearophile_h...@eml.cc 2012-05-25 12:59:19 PDT --- (In reply to comment #22) See if this helps: https://github.com/D-Programming-Language/phobos/pull/605 I have tried this latest change. Unfortunately both this benchmark, and another larger program of mine that uses RedBlackTree a lot are now a bit (2-3%) slower than before; both with GC enabled and with GC disabled. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #25 from Steven Schveighoffer schvei...@yahoo.com 2012-05-25 13:05:10 PDT --- That doesn't make any sense. It resulted in a 100ms speedup on my end (about 20%). -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #21 from bearophile_h...@eml.cc 2012-05-24 15:54:51 PDT --- I have asked to people on IRC #D to time the D and C++ versions. Two persons have seen a 4.8 X and 4.6 X speed differences (optimized builds, with G++ using -Ofast -s -flto, MSVC 2010, using standard release build options /OPT:REF /OPT:ICF, and D code compiled with DMD -O -release -inline). -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #16 from bearophile_h...@eml.cc 2012-05-22 14:56:09 PDT --- Now on my PC the D version (using redBlackTree) is just 3.7 times slower than the Java version that uses boxed Integers. This is not good enough still, but now redBlackTree is usable (DMD 2.060alpha, 32 bit, Windows): Timings, seconds: D version: 1.65 D versions + GC.disable() at the top: 0.78 Java version: 0.45 I think a good D version has to run this benchmark in 0.2 - 0.3 seconds. And indeed, this C++ program (that uses a STL ordered set, that is usually implemented with a Red-Black Tree) runs in 0.27 seconds on my PC (GCC 4.7.0, -Ofast -flto -s), it's about 6.1 times faster than the current D code (that uses a different and less efficient back-end): #include set #include stdio.h using namespace std; int main() { const int range = 100; const int n = 100; setint t; t.insert(0); for (int i = 0; i n; i++) { if (i range) t.erase(t.begin()); t.insert(i); } /* printf(%u\n, t.size()); // prints 101 for (setint::iterator it = t.begin(); it != t.end(); it++) printf(%d , *it); printf(\n); */ } -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #17 from Steven Schveighoffer schvei...@yahoo.com 2012-05-22 16:40:50 PDT --- If the issue is that my redblacktree implementation isn't fast enough, you are welcome to try and improve it. I implemented it based on the algorithm I found inside my CLR Algorithms book. I have never pretended really to be an expert in RBT implementation. I have no idea how optimized it is, or how much more optimized it could be. I made some refactoring adjustments from the standard algorithm, and the red black algorithm in my code is well documented. I also have not looked at any of the g++ STL code, so I have no idea how it compares to that. Obviously I haven't looked at any Java implementations. On my system, the D version with dcollections, which uses the same RBT implementation as std.algorithm, takes about 380 milliseconds (-release -O -inline), while the equivalent C++ version takes 250 ms. I don't think I have the JDK installed, so I cannot test the java version. I don't think the GC is really hindering performance any more, it's just pure algorithmic + compiler optimization comparison. BTW, MAKE SURE you compile with -release for testing, because without this, the runtime calls Object.invariant for every method call. Here is the optimized-for-dcollections code (which keeps track of the begin element), you can't do this in std.container, since the concept of cursors does not exist: import std.stdio; import dcollections.TreeSet; void main() { enum int range = 100; enum int n = 1_000_000; auto t = new TreeSet!int(0); auto nextToRemove = t.begin; for (int i = 0; i n; i++) { if (i range) nextToRemove = t.remove(nextToRemove); t.add(i); } //writeln(walkLength(t[])); //writeln(t[]); } Same code in C++: #include set #include stdio.h using namespace std; int main() { const int range = 100; const int n = 100; setint t; t.insert(0); setint::iterator nextToRemove = t.begin(); for (int i = 0; i n; i++) { if (i range) { setint::iterator tmp = nextToRemove; tmp++; t.erase(nextToRemove); nextToRemove = tmp; } t.insert(i); } /* printf(%u\n, t.size()); // prints 101 for (setint::iterator it = t.begin(); it != t.end(); it++) printf(%d , *it); printf(\n); */ } -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #18 from Steven Schveighoffer schvei...@yahoo.com 2012-05-22 16:42:17 PDT --- (In reply to comment #17) On my system, the D version with dcollections, which uses the same RBT implementation as std.algorithm... should be std.container, not std.algorithm! -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #19 from bearophile_h...@eml.cc 2012-05-22 18:00:40 PDT --- (In reply to comment #17) Thank you for implementing the Red-Black Tree. On my system, the D version with dcollections, which uses the same RBT implementation as std.algorithm, takes about 380 milliseconds (-release -O -inline), while the equivalent C++ version takes 250 ms. There is a large difference between your D timings and mine. BTW, MAKE SURE you compile with -release for testing, because without this, the runtime calls Object.invariant for every method call. I have used -release too. I will wait for other people to time the D and C++ versions. If they confirm such small timing difference (like 380 ms Vs of 250 ms), we can close this bug report. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #20 from Steven Schveighoffer schvei...@yahoo.com 2012-05-22 18:06:17 PDT --- (In reply to comment #19) I will wait for other people to time the D and C++ versions. If they confirm such small timing difference (like 380 ms Vs of 250 ms), we can close this bug report. Be aware that my comparisons are between dcollections and the C++ code, not std.container. The two differences that made an impact I think are: 1. Using a custom allocator that recycles RBNode instances without going through the GC 2. Caching the first element that was to be removed (i.e. the begin element) instead of asking the container to look it up each time. Neither of those are supported via the current std.container.RedBlackTree. With that implementation, my timing is about 580ms. BTW, I noticed changing the C++ implementation to cache the begin node did *not* change performance. So it may be they are caching the begin element internally since it is likely one of the most requested elements. Without that, because the begin element is very likely to be at the bottom of the tree, saving those lookups might be how they have improved performance. We can certainly add such an improvement to RedBlackTree. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 d...@dawgfoto.de changed: What|Removed |Added CC||d...@dawgfoto.de --- Comment #15 from d...@dawgfoto.de 2012-04-29 07:04:14 PDT --- Gr... I hate how dmd makes poor decisions about inlining and then won't let you force the issue... This is an issue with how the exception handling alters the control flow. Perhaps annotating the properties with nothrow would have a similar effect. Anyhow scope (success) means that you'll have to temporarily store the return value, so a small penalty will remain in most cases. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #13 from Steven Schveighoffer schvei...@yahoo.com 2012-04-27 03:29:07 PDT --- (In reply to comment #12) With scope(success) ++_length in RBTree._add the property calls are not inlined. Gr... I hate how dmd makes poor decisions about inlining and then won't let you force the issue... Your optimization ignores the multiple option, it will have to be fixed. Then new RBTree!Elem allocates 33 bytes spending lots of time and space for the unused 31 bytes. This will be fixed when allocators are added. I think we can hold off on this optimization. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #14 from SomeDude lovelyd...@mailmetrash.com 2012-04-27 08:40:08 PDT --- (In reply to comment #11) I noticed that just after compilation, the first run is always slightly longer than the following ones by about 50 to 80 ms. The other observation, - and that's crazy -, is if I uncomment the line writeln(walkLength(t[])); I am consistently *faster* by 10 to 25 ms than when it's commented !! My dmd measurements are done via Powershell command: Measure-Command{./bug.exe} OK, forget it, it turns out using Measure-Command{...} was a rather poor way of measuring runtimes because of startup/shutdown. Using StopWatch, I now get consistent results. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #12 from d...@dawgfoto.de 2012-04-26 19:54:08 PDT --- Created an attachment (id=1099) optimzations dmd -O -release -inline rbtree.d time ./rbtree 0.566 total With scope(success) ++_length in RBTree._add the property calls are not inlined. time ./rbtree 0.362 total Then new RBTree!Elem allocates 33 bytes spending lots of time and space for the unused 31 bytes. Using GC.malloc directly: time ./rbtree 0.228 total -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #9 from SomeDude lovelyd...@mailmetrash.com 2012-04-25 00:56:28 PDT --- Ah ok, my measurements (Core2Duo 2,14GHz, 2Gb RAM WinXP SP3): range = 100_000_000 n = 10_000_000 Eclipse, JVM1.6 with option -Xmx512M : 9781 ms 537Mb RAM (note that I don't have a server JVM) dmd -O -release -inline (2.059): 25367 ms , 321 Mb RAM -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #10 from SomeDude lovelyd...@mailmetrash.com 2012-04-25 01:12:19 PDT --- For range = 100, n = 10_000_000 Eclipse, JVM1.6 with option -Xmx512M : 381 ms dmd -O -release -inline (2.059): around 1800 ms -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #11 from SomeDude lovelyd...@mailmetrash.com 2012-04-25 01:24:40 PDT --- I noticed that just after compilation, the first run is always slightly longer than the following ones by about 50 to 80 ms. The other observation, - and that's crazy -, is if I uncomment the line writeln(walkLength(t[])); I am consistently *faster* by 10 to 25 ms than when it's commented !! My dmd measurements are done via Powershell command: Measure-Command{./bug.exe} -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #8 from bearophile_h...@eml.cc 2012-04-24 18:58:41 PDT --- (In reply to comment #7) This test no longer compiles in 2.059: PS E:\DigitalMars\dmd2\samples rdmd bug bug.d(7): Error: no property 'opCall' for type 'std.container.RedBlackTree!(int).RedBlackTree' Now RedBlackTree is a class. So replace this line: auto t = RedBlackTree!int(0); With: auto t = redBlackTree!int(0); -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 SomeDude lovelyd...@mailmetrash.com changed: What|Removed |Added CC||lovelyd...@mailmetrash.com --- Comment #7 from SomeDude lovelyd...@mailmetrash.com 2012-04-22 16:36:01 PDT --- This test no longer compiles in 2.059: PS E:\DigitalMars\dmd2\samples rdmd bug bug.d(7): Error: no property 'opCall' for type 'std.container.RedBlackTree!(int).RedBlackTree' -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 Steven Schveighoffer schvei...@yahoo.com changed: What|Removed |Added Status|NEW |ASSIGNED AssignedTo|nob...@puremagic.com|schvei...@yahoo.com --- Comment #1 from Steven Schveighoffer schvei...@yahoo.com 2011-02-24 14:08:18 PST --- This is an odd thing. dcollections way outperforms std.container.RedBlackTree, but I'm positive this is because of the custom allocator. On my PC, the phobos version takes anywhere from 26 to 38 seconds. The dcollections version takes .6 seconds. However, every few runs, the dcollections version goes to 25 seconds. I have no idea why. The result seems to be the same, so I'm not sure what is causing that big pause. I'll have to investigate it more. But in the short run, I'd recommend using dcollections until std.container figures out how it will do custom allocation. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #2 from Steven Schveighoffer schvei...@yahoo.com 2011-02-24 14:24:25 PST --- :O Cannot, I mean *really* cannot believe this, but this was still in my d2 code: http://www.dsource.org/projects/dcollections/browser/branches/d2/dcollections/DefaultAllocator.d#L325 Which means the chunk allocator was not even being used (BTW, it has a bug in it, for some reason it seg faults, but that's not a dmd or phobos problem). I'd say the problem is definitely the GC running too often. If I print out each loop, and in a debugger hit ctrl-C when it pauses, it's invariably in GC.fullCollect. This looks like a GC problem, I wonder if David's recent changes help... I'm going to leave this bug open. We'll see how the thing works on the next version with David's GC improvements. In the meantime, I'll get dcollections fixed, and maybe that can be an interim solution. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #3 from bearophile_h...@eml.cc 2011-02-24 14:27:34 PST --- The D version runs in about 20.6 seconds, the Java -server version in about 0.49 seconds, and this C++ (GCC 4.5.1 -O3) version in about 0.3 seconds (this is not a fully apples-to-apples comparison because this doesn't use a GC, but it's good as reference point): #include stdio.h #include set using namespace std; int main() { const int range = 100; const int n = 100; setint t; for (int i = 0; i n; i++) { if (i range) t.erase(t.begin()); t.insert(i); } printf(%d\n, t.size()); //for (setint::iterator it=t.begin(); it!=t.end(); ++it) //printf(%d , *it); //printf(\n); return 0; } -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #4 from Steven Schveighoffer schvei...@yahoo.com 2011-02-24 14:36:47 PST --- Fixed the dcollections custom allocator, time is now consistently .5 seconds. I noticed the phobos version also occasionally dips down to .5 or .6 seconds. I'm unsure why sometimes the collect cycles take so long. Perhaps there are false pointer problems, the RBNode elements contain pointers and also the ints. This is definitely a GC issue. Not much RedBlackTree can do about it. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #5 from David Simcha dsim...@yahoo.com 2011-02-24 14:40:27 PST --- (In reply to comment #2) This looks like a GC problem, I wonder if David's recent changes help... I'm going to leave this bug open. We'll see how the thing works on the next version with David's GC improvements. I'd doubt that the changes I've posted to Bugzilla so far would help this benchmark, because those changes focus entirely on large (2KB) allocations. However, in rebuilding my mental model of the GC, I've come up with several smaller (i.e. a few percent each) ideas for optimizing the GC. These are currently half-implemented. I'm going to fork druntime on GitHub and commit them all there, and show the progressive performance improvements on a few different benchmarks with each one on the GitHub wiki page for my fork. This, I guess, will be added to my list of GC benchmarks for small allocations, though I'm not sure if it will tell us anything that Bearophile's other recent tree-building GC benchmark didn't. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5650] A RedBlackTree performance problem
http://d.puremagic.com/issues/show_bug.cgi?id=5650 --- Comment #6 from Steven Schveighoffer schvei...@yahoo.com 2011-02-24 14:48:54 PST --- I have a really good suspicion that this has more to do with false pointer problems than GC performance. The only reason I feel this way is because between you can run this program 5 times, and 1 time, it will take 38 seconds, and the other 4 it takes .5 seconds. Everything in this little test program should be deterministic *except* what address segment the OS gives the GC... -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---