Re: 200-600x slower Dlang performance with nested foreach loop
On Wed, Jan 27, 2021 at 01:28:33AM +, Paul Backus via Digitalmars-d-learn wrote:
> On Tuesday, 26 January 2021 at 23:57:43 UTC, methonash wrote:
> > > Using AA's may not necessarily improve performance. It depends on what your code does with it. Because AA's require random access to memory, it's not friendly to the CPU cache hierarchy, whereas traversing linear arrays is more cache-friendly and in some cases will out-perform AA's.
> >
> > I figured a built-in AA might be an efficient path to performing unique string de-duplication. If there's a more performant method available, I'll certainly try it.
>
> You could try sorting the array first, and then using `uniq` [1] to discard duplicate elements. There's an example in the docs that shows how to do this in-place (without allocating additional memory).
>
> [1] http://phobos.dpldocs.info/std.algorithm.iteration.uniq.html

Yes, definitely try this. This will completely eliminate the overhead of using an AA, which has to allocate memory (at least) once per entry added. Especially since the data has to be sorted eventually anyway, you might as well sort first, then use the sortedness as a convenient property for fast de-duplication. Since .uniq traverses the range linearly, this will be cache-friendly and, along with eliminating GC load, should give you a speed boost.

T

--
Nearly all men can stand adversity, but if you want to test a man's character, give him power. -- Abraham Lincoln
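The sort-then-uniq approach suggested above can be sketched as follows. This is a minimal example with made-up data; for a one-off de-duplication the single `.array` allocation at the end is usually acceptable, and the `uniq` docs show a `copy`-based variant that avoids even that.

```d
import std.algorithm : sort, uniq;
import std.array : array;

void main()
{
    string[] words = ["banana", "apple", "banana", "cherry", "apple"];

    // Sorting brings duplicates next to each other; uniq then drops
    // adjacent duplicates in a single cache-friendly linear pass.
    auto deduped = words.sort.uniq.array;

    assert(deduped == ["apple", "banana", "cherry"]);
}
```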
Re: 200-600x slower Dlang performance with nested foreach loop
On Tuesday, 26 January 2021 at 23:57:43 UTC, methonash wrote:
> > Using AA's may not necessarily improve performance. It depends on what your code does with it. Because AA's require random access to memory, it's not friendly to the CPU cache hierarchy, whereas traversing linear arrays is more cache-friendly and in some cases will out-perform AA's.
>
> I figured a built-in AA might be an efficient path to performing unique string de-duplication. If there's a more performant method available, I'll certainly try it.

You could try sorting the array first, and then using `uniq` [1] to discard duplicate elements. There's an example in the docs that shows how to do this in-place (without allocating additional memory).

[1] http://phobos.dpldocs.info/std.algorithm.iteration.uniq.html
Re: 200-600x slower Dlang performance with nested foreach loop
On Tuesday, 26 January 2021 at 18:17:31 UTC, H. S. Teoh wrote:
> Do not do this. Every time you call .array it allocates a new array and copies all its contents over. If this code runs frequently, it will cause a big performance hit, not to mention high GC load. The function you're looking for is .release, not .array.

Many thanks for the tip! I look forward to trying this soon. For reference, the .array call is only performed once.

> That nested loop is an O(n^2) algorithm. Meaning it will slow down *very* quickly as the size of the array n increases. You might want to think about how to improve this algorithm.

Nice observation, and yes, this would typically be an O(n^2) approach. However, because the input dataset is subset to unique strings and then sorted in descending length, the inner foreach loop does not iterate over all of n, only from index i+1 through the end of the array. Thus, I believe this becomes approximately O(n^2/2); more precisely, the loop performs (n^2 - n)/2 comparisons.

Further: the original dataset has 64k strings. Squaring that yields 4.1 billion string comparisons. Once de-duplicated, the dataset is reduced to ~46k strings. At roughly n^2/2 comparisons, this yields 1.06 billion string comparisons. So performing steps 1 through 3 improves the brute-force string comparison problem four-fold on my test development dataset.

> Using AA's may not necessarily improve performance. It depends on what your code does with it. Because AA's require random access to memory, it's not friendly to the CPU cache hierarchy, whereas traversing linear arrays is more cache-friendly and in some cases will out-perform AA's.

I figured a built-in AA might be an efficient path to performing unique string de-duplication. If there's a more performant method available, I'll certainly try it.

> First of all, you need to use a profiler to identify where the hotspots are. Otherwise you could well be pouring tons of effort into "optimizing" code that doesn't actually need optimizing, while completely missing the real source of the problem. Whenever you run into performance problems, do not assume you know where the problem is: profile, profile, profile!

Message received. Given that D is the first compiled language I've semi-seriously dabbled with, I have no real experience with profiler usage.

> Second, you only posted a small fragment of your code, so it's hard to say where the problem really is. I can only guess based on what you described. If you could post the entire program, or at least a complete, compilable and runnable excerpt thereof that displays the same (or similar) performance problems, then we could better help you pinpoint where the problem is.

Yes, I'll be looking to present a complete, compilable, and executable demo of code for this issue if/when subsequent efforts continue to fail. For professional reasons (I no longer work in academia), I cannot share the original source code for the issue presented here, but I can attempt to reproduce it in a minimally complete form against a public dataset.
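The halved iteration count claimed above for the `i + 1` slice can be checked with a tiny sketch (hypothetical array, small n; the exact pair count is n(n-1)/2, which is still O(n^2)):

```d
void main()
{
    enum n = 64;              // small stand-in for the real dataset size
    auto arr = new int[](n);

    size_t comparisons;
    foreach (i, a; arr)
        foreach (j, b; arr[i + 1 .. $])  // inner loop starts past i
            ++comparisons;

    // Triangular pair count: n*(n-1)/2, roughly half of n^2.
    assert(comparisons == n * (n - 1) / 2);
}
```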
Re: is core Mutex lock "fast"?
On 1/26/21 4:40 PM, ludo wrote:
>> However, I think this is all moot, druntime is the same as Tango.
>
> Moot you mean debatable? Or irrelevant :)

I *think* it's irrelevant. The comment makes it sound like it's slightly different than Tango, but for sure reentrant locks are possible with D2 phobos. I am *100% sure* that druntime is from Tango, because I was there when it was a controversy and this was the solution.

> Thanks to your explanations, I understand now that the dev tried to imitate a Tango feature with very old D1 code. This is 2005/2009 code. And as pointed out by IGotD-, better not to mess around with synch / reinvent the wheel. I will trash this class. Still learning a lot, thank you guys.

Yes, I agree. I'm no expert on thread atomics, but I know enough to know I shouldn't mess with the tried-and-true primitives that are generally used. Consider that very, very smart people have tried to write "lock free" stuff and commonly fail (and some dumb people, including me). And when you fail here, it's not a failure you see immediately, because it may happen once in a blue moon. Performance is irrelevant if it's not correct.

-Steve
Re: 200-600x slower Dlang performance with nested foreach loop
On Tuesday, 26 January 2021 at 21:55:47 UTC, mw wrote:
> On Tuesday, 26 January 2021 at 17:40:36 UTC, methonash wrote:
>> foreach( i, ref pStr; sortedArr )
>> {
>>     foreach( j, ref cStr; sortedArr[ i + 1 .. $ ] )
>>     {
>>         if( indexOf( pStr, cStr ) > -1 )
>>         {
>>             // ... yourInnerOp
>>         }
>>     }
>> }
>>
>> Before adding the code excerpt above, the Dlang program was taking ~1 second on an input file containing approx. 64,000 strings.
>
> What's the typical length of your strings?

Actually, I think it's reasonable given your algo:

Your algo (double-loop) is O(n^2), with n = 64,000, so the loop will run n^2 = 4,096,000,000 times, i.e. ~4G iterations.

Suppose your CPU is 2GHz, and suppose each loop operation takes just 1 machine cycle (very unlikely); this algo will take 2 seconds. However, string searching, i.e. `indexOf` or `yourInnerOp`, can easily take hundreds of cycles. Let's suppose it's 100 machine cycles (still a very low estimate); then the algo will take ~200 seconds = ~3.3 minutes.

If you want, you can try to rewrite your algo in Java or Python, and compare the run time with the Dlang version.
Re: 200-600x slower Dlang performance with nested foreach loop
On Tuesday, 26 January 2021 at 17:40:36 UTC, methonash wrote:
> foreach( i, ref pStr; sortedArr )
> {
>     foreach( j, ref cStr; sortedArr[ i + 1 .. $ ] )
>     {
>         if( indexOf( pStr, cStr ) > -1 )
>         {
>             // ...
>         }
>     }
> }
>
> Before adding the code excerpt above, the Dlang program was taking ~1 second on an input file containing approx. 64,000 strings.

What's the typical length of your strings?
Re: is core Mutex lock "fast"?
On Tuesday, 26 January 2021 at 21:09:34 UTC, Steven Schveighoffer wrote:
> The only item that is read without being locked is owner. If you change that to an atomic read and write, it should be fine (and is likely fine on x86* without atomics anyway). All the other data is protected by the actual mutex, and so should be synchronous. However, I think this is all moot, druntime is the same as Tango.
>
> -Steve

Yes, I didn't see that the lock would block subsequent threads. Both pthread_mutex_lock and EnterCriticalSection do exactly the same as FastLock, and the only difference is that the check is "closer" to the running code. The performance increase should be low. As I was wrong about the thread safety, I will not write here any further.
Re: is core Mutex lock "fast"?
> However, I think this is all moot, druntime is the same as Tango.

Moot you mean debatable? Or irrelevant :)

Thanks to your explanations, I understand now that the dev tried to imitate a Tango feature with very old D1 code. This is 2005/2009 code. And as pointed out by IGotD-, better not to mess around with synch / reinvent the wheel. I will trash this class.

Still learning a lot, thank you guys.
Re: dustmite on dub project
On 1/26/21 3:47 PM, Andre Pany wrote:
> On Tuesday, 26 January 2021 at 20:36:58 UTC, Steven Schveighoffer wrote:
>> On 1/26/21 3:17 PM, Andre Pany wrote:
>>> On Tuesday, 26 January 2021 at 20:09:27 UTC, Steven Schveighoffer wrote:
>>>> Hold on, where do you see this? mysql-native has dub.sdl, and it doesn't have these in there.
>>>
>>> I executed `dub init sample` and added in the interactive console the dependency `mysql-native`. It added `mysql-native ~> 3.0.0`. In the local dub package folder I opened dub.json of package mysql-native. I assume dub converts dub.sdl to dub.json while fetching packages. Here I found the content of this https://github.com/mysql-d/mysql-native/blob/master/dub.sdl just as JSON formatted.
>>
>> Oh wow. Weird. No, the dub.sdl does NOT contain the exclusion of the package.d file (see in the file you actually linked). So dub-registry is doing this? or is it dub? Now I need to load a previous version of dmd and see if this works.
>
> I think dub is doing this. I remember there was a recent PR doing some things with excludedSourceFiles: https://github.com/dlang/dub/pull/2039/files

Hah, I found it too. Thanks so much, this likely saved me another 2 days of dustmiting.

-Steve
Re: is core Mutex lock "fast"?
On 1/26/21 3:56 PM, IGotD- wrote:
> That code isn't thread safe at all (assuming FastLock is used from several threads). lockCount isn't atomic, which means the code will not work with several threads. Also the assignment of the variable owner isn't thread safe. As soon as you start to include more than one supposedly atomic assignment in synchronization primitives, things quickly get out of hand.

The only item that is read without being locked is owner. If you change that to an atomic read and write, it should be fine (and is likely fine on x86* without atomics anyway). All the other data is protected by the actual mutex, and so should be synchronous.

However, I think this is all moot, druntime is the same as Tango.

-Steve
Re: is core Mutex lock "fast"?
On Tuesday, 26 January 2021 at 18:07:06 UTC, ludo wrote:
> Hi guys, still working on old D1 code, to be updated to D2. At some point the previous dev wrote a FastLock class. The top comment is from the dev himself, not me. My question is after the code.
>
> ---
> class FastLock
> {
>     protected Mutex mutex;
>     protected int lockCount;
>     protected Thread owner;
>
>     ///
>     this()
>     {
>         mutex = new Mutex();
>     }
>
>     /**
>      * This works the same as Tango's Mutex's lock()/unlock() except provides extra
>      * performance in the special case where a thread calls lock()/unlock() multiple
>      * times while it already has ownership from a previous call to lock().
>      * This is a common case in Yage.
>      *
>      * For convenience, lock() and unlock() calls may be nested. Subsequent lock()
>      * calls will still maintain the lock, but unlocking will only occur after
>      * unlock() has been called an equal number of times.
>      *
>      * On Windows, Tango's lock() is always faster than D's synchronized statement.
>      */
>     void lock()
>     {
>         auto self = Thread.getThis();
>         if (self !is owner)
>         {
>             mutex.lock();
>             owner = self;
>         }
>         lockCount++;
>     }
>
>     void unlock() /// ditto
>     {
>         assert(Thread.getThis() is owner);
>         lockCount--;
>         if (!lockCount)
>         {
>             owner = null;
>             mutex.unlock();
>         }
>     }
> }
> ---
>
> Now if I look at the doc, in particular Class core.sync.mutex.Mutex, I see:
>
> ---
> lock()
> If this lock is not already held by the caller, the lock is acquired, then the internal counter is incremented by one.
> ---
>
> Which looks exactly like the behavior of "fastLock". Is it so that the old Tango's mutex lock was not keeping count and would lock the same object several times? Do we agree that the FastLock class is obsolete considering current D core?
>
> cheers

That code isn't thread safe at all (assuming FastLock is used from several threads). lockCount isn't atomic, which means the code will not work with several threads. Also the assignment of the variable owner isn't thread safe. As soon as you start to include more than one supposedly atomic assignment in synchronization primitives, things quickly get out of hand.

Normal D Mutex uses pthread_mutex on Linux and the usual CriticalSection stuff on Windows. Neither is particularly fast. Futex on Linux isn't exactly super fast either. Synchronization primitives aren't fast on any system; that's just how it is. There are ways to make things faster, but add stuff like timeouts and the complexity goes exponential. There are so many pitfalls with synchronization primitives that it is hardly worth making your own.
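As the thread concludes, druntime's `core.sync.mutex.Mutex` is already recursive, so the re-entrancy FastLock reimplements comes for free. A minimal single-threaded sketch of the nesting behavior:

```d
import core.sync.mutex : Mutex;

void main()
{
    auto m = new Mutex();

    // D's Mutex keeps an internal ownership count: the same thread may
    // lock it again without deadlocking, and the lock is released only
    // after a matching number of unlock() calls.
    m.lock();
    m.lock();   // re-entrant: would deadlock on a non-recursive mutex
    m.unlock();
    m.unlock();
}
```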
Re: dustmite on dub project
On Tuesday, 26 January 2021 at 20:36:58 UTC, Steven Schveighoffer wrote:
> On 1/26/21 3:17 PM, Andre Pany wrote:
>> On Tuesday, 26 January 2021 at 20:09:27 UTC, Steven Schveighoffer wrote:
>>> Hold on, where do you see this? mysql-native has dub.sdl, and it doesn't have these in there.
>>
>> I executed `dub init sample` and added in the interactive console the dependency `mysql-native`. It added `mysql-native ~> 3.0.0`. In the local dub package folder I opened dub.json of package mysql-native. I assume dub converts dub.sdl to dub.json while fetching packages. Here I found the content of this https://github.com/mysql-d/mysql-native/blob/master/dub.sdl just as JSON formatted.
>
> Oh wow. Weird. No, the dub.sdl does NOT contain the exclusion of the package.d file (see in the file you actually linked). So dub-registry is doing this? or is it dub? Now I need to load a previous version of dmd and see if this works.
>
> -Steve

I think dub is doing this. I remember there was a recent PR doing some things with excludedSourceFiles: https://github.com/dlang/dub/pull/2039/files

Kind regards
André
Re: dustmite on dub project
On 1/26/21 3:36 PM, Steven Schveighoffer wrote:
> On 1/26/21 3:17 PM, Andre Pany wrote:
>> On Tuesday, 26 January 2021 at 20:09:27 UTC, Steven Schveighoffer wrote:
>>> Hold on, where do you see this? mysql-native has dub.sdl, and it doesn't have these in there.
>>
>> I executed `dub init sample` and added in the interactive console the dependency `mysql-native`. It added `mysql-native ~> 3.0.0`. In the local dub package folder I opened dub.json of package mysql-native. I assume dub converts dub.sdl to dub.json while fetching packages. Here I found the content of this https://github.com/mysql-d/mysql-native/blob/master/dub.sdl just as JSON formatted.
>
> Oh wow. Weird. No, the dub.sdl does NOT contain the exclusion of the package.d file (see in the file you actually linked). So dub-registry is doing this? or is it dub? Now I need to load a previous version of dmd and see if this works.

It's dub. https://github.com/dlang/dub/pull/2039

Not sure if I agree with this change.

-Steve
Re: dustmite on dub project
On 1/26/21 3:17 PM, Andre Pany wrote:
> On Tuesday, 26 January 2021 at 20:09:27 UTC, Steven Schveighoffer wrote:
>> Hold on, where do you see this? mysql-native has dub.sdl, and it doesn't have these in there.
>
> I executed `dub init sample` and added in the interactive console the dependency `mysql-native`. It added `mysql-native ~> 3.0.0`. In the local dub package folder I opened dub.json of package mysql-native. I assume dub converts dub.sdl to dub.json while fetching packages. Here I found the content of this https://github.com/mysql-d/mysql-native/blob/master/dub.sdl just as JSON formatted.

Oh wow. Weird. No, the dub.sdl does NOT contain the exclusion of the package.d file (see in the file you actually linked). So dub-registry is doing this? or is it dub? Now I need to load a previous version of dmd and see if this works.

-Steve
Re: dustmite on dub project
On Tuesday, 26 January 2021 at 20:09:27 UTC, Steven Schveighoffer wrote:
> On 1/26/21 2:59 PM, Steven Schveighoffer wrote:
>> On 1/26/21 2:41 PM, Andre Pany wrote:
>>> For your specific problem, this issue is related to your dub.json:
>>>
>>> "configurations": [
>>>     {
>>>         "excludedSourceFiles": [ "source/mysql/package.d" ],
>>>         "name": "application",
>>>         "targetType": "executable",
>>>         "versions": [ "VibeCustomMain" ]
>>>     },
>>>     {
>>>         "excludedSourceFiles": [ "source/app.d", "source/mysql/package.d" ],
>>>         "name": "library",
>>>         "targetType": "library"
>>>     }
>>> ],
>>>
>>> If you remove the excludedSourceFiles from config "library", it is working. But at the moment I am puzzled, what is going on here.
>>
>> Wait, this makes no sense. I'm going to have to figure out why those are added. And THANK YOU for seeing that. That is definitely the issue (ModuleInfoZ is the module info for a module)
>
> Hold on, where do you see this? mysql-native has dub.sdl, and it doesn't have these in there.
>
> -Steve

I executed `dub init sample` and added in the interactive console the dependency `mysql-native`. It added `mysql-native ~> 3.0.0`. In the local dub package folder I opened dub.json of package mysql-native. I assume dub converts dub.sdl to dub.json while fetching packages. Here I found the content of this https://github.com/mysql-d/mysql-native/blob/master/dub.sdl just as JSON formatted.

For this package I can reproduce the linker error.

Kind regards
André
Re: is core Mutex lock "fast"?
On 1/26/21 1:07 PM, ludo wrote:
> Hi guys, still working on old D1 code, to be updated to D2. At some point the previous dev wrote a FastLock class. The top comment is from the dev himself, not me. My question is after the code.
>
> [snip]
>
> Is it so that the old Tango's mutex lock was not keeping count and would lock the same object several times? Do we agree that the FastLock class is obsolete considering current D core?

Just want to point out that druntime was based off of Tango's runtime. So I would expect D2's mutex to have at least as much performance as Tango's mutex. If this is D1 code, it was before this happened (D1 phobos did not share a runtime with Tango).

-Steve
Re: dustmite on dub project
On 1/26/21 2:59 PM, Steven Schveighoffer wrote:
> On 1/26/21 2:41 PM, Andre Pany wrote:
>> For your specific problem, this issue is related to your dub.json:
>>
>> "configurations": [
>>     {
>>         "excludedSourceFiles": [ "source/mysql/package.d" ],
>>         "name": "application",
>>         "targetType": "executable",
>>         "versions": [ "VibeCustomMain" ]
>>     },
>>     {
>>         "excludedSourceFiles": [ "source/app.d", "source/mysql/package.d" ],
>>         "name": "library",
>>         "targetType": "library"
>>     }
>> ],
>>
>> If you remove the excludedSourceFiles from config "library", it is working. But at the moment I am puzzled, what is going on here.
>
> Wait, this makes no sense. I'm going to have to figure out why those are added. And THANK YOU for seeing that. That is definitely the issue (ModuleInfoZ is the module info for a module)

Hold on, where do you see this? mysql-native has dub.sdl, and it doesn't have these in there.

-Steve
Re: dustmite on dub project
On 1/26/21 2:41 PM, Andre Pany wrote:
> For your specific problem, this issue is related to your dub.json:
>
> "configurations": [
>     {
>         "excludedSourceFiles": [ "source/mysql/package.d" ],
>         "name": "application",
>         "targetType": "executable",
>         "versions": [ "VibeCustomMain" ]
>     },
>     {
>         "excludedSourceFiles": [ "source/app.d", "source/mysql/package.d" ],
>         "name": "library",
>         "targetType": "library"
>     }
> ],
>
> If you remove the excludedSourceFiles from config "library", it is working. But at the moment I am puzzled, what is going on here.

Wait, this makes no sense. I'm going to have to figure out why those are added. And THANK YOU for seeing that. That is definitely the issue (ModuleInfoZ is the module info for a module).

What really bugs me is that this only seemed to be happening with DMD 2.095. The simple app worked with a different version of the compiler (2.094 I think, but I have to reinstall to figure it out). I feel like this was added by the previous author to fix some quirky issue with either dub or the compiler. It could be related to documentation too.

-Steve
Re: 200-600x slower Dlang performance with nested foreach loop
On 1/26/21 12:40 PM, methonash wrote:
> My first attempt to solve this problem space used a small Perl program to perform steps 1 through 3, which would then pipe intermediate output to a small Dlang program handling only step #4 using dynamic arrays (no use of AAs) of ubyte[][] with use of countUntil().
>
> The Dlang code for the nested foreach block above is essentially near-identical between my two Dlang implementations. Yet, the second implementation--where I'm trying to solve the entire problem space in D--has absolutely failed in terms of performance.
>
> Perl+D, ubyte[][], countUntil() :: under 2 seconds
> only D, string[], indexOf() :: ~6 minutes
> only D, ubyte[][], countUntil() :: >20 minutes

Maybe try a different approach: replace the Perl code with D, and still have it output to the same small D program that processes the results. It seems from your description that everything is "identical", so if your conclusions are correct, it should be at least as fast as the Perl+D version. But I think you are missing something else.

-Steve
Re: dustmite on dub project
On Tuesday, 26 January 2021 at 16:04:29 UTC, Steven Schveighoffer wrote:
> I have a bug report in mysql-native that if you try to create the following file, and add mysql-native as a dependency, it fails to link on Windows 10:
>
> import std.stdio;
> import mysql;
>
> void main()
> {
>     writeln("Edit source/app.d to start your project.");
> }
>
> You might recognize that as the default dub file, with an extra import. The link error is:
>
> testmysql.obj : error LNK2001: unresolved external symbol _D5mysql12__ModuleInfoZ
>
> So I figured I'd try dustmite, and used:
>
> dub dustmite ..\dusted --linker-status=1
>
> The result after almost 2 days: a bunch of directories with mostly no d files, and no source code in any of the d files.
>
> What did I do wrong? Is this even worth trying again? 2 days is a long time to tie up my windows vm.
>
> -Steve

For your specific problem, this issue is related to your dub.json:

"configurations": [
    {
        "excludedSourceFiles": [ "source/mysql/package.d" ],
        "name": "application",
        "targetType": "executable",
        "versions": [ "VibeCustomMain" ]
    },
    {
        "excludedSourceFiles": [ "source/app.d", "source/mysql/package.d" ],
        "name": "library",
        "targetType": "library"
    }
],

If you remove the excludedSourceFiles from config "library", it is working. But at the moment I am puzzled, what is going on here.

Kind regards
André
Re: dustmite on dub project
On 1/26/21 1:33 PM, Andre Pany wrote:
> On Tuesday, 26 January 2021 at 16:04:29 UTC, Steven Schveighoffer wrote:
>> I have a bug report in mysql-native that if you try to create the following file, and add mysql-native as a dependency, it fails to link on Windows 10:
>>
>> import std.stdio;
>> import mysql;
>>
>> void main()
>> {
>>     writeln("Edit source/app.d to start your project.");
>> }
>>
>> You might recognize that as the default dub file, with an extra import. The link error is:
>>
>> testmysql.obj : error LNK2001: unresolved external symbol _D5mysql12__ModuleInfoZ
>>
>> So I figured I'd try dustmite, and used:
>>
>> dub dustmite ..\dusted --linker-status=1
>>
>> The result after almost 2 days: a bunch of directories with mostly no d files, and no source code in any of the d files.
>>
>> What did I do wrong? Is this even worth trying again? 2 days is a long time to tie up my windows vm.
>
> I think the behavior can be explained. Your search criteria (linker status) is not precise. It can be triggered by the real problem, but also by a few empty d files. Therefore Dustmite did a good job: it reduced your code base to a minimum which still triggers a linker error. You might search for the linker error text instead.

Yes, thanks. I think probably this would work. I would suggest, however, that dub not provide an option that is almost certain to produce completely useless results. I had assumed that using it would keep the linker error the same. Maybe if --linker-status or --compiler-status is provided, do nothing unless there are other options (i.e. --compiler-regex or --linker-regex), which is what I should have added as well.

In the meantime, I'm going to try manually reducing the dependencies, to cut down the dustmite search (mysql-native depends on a lot of vibe stuff, which is probably not necessary to reproduce this). I didn't expect 2 days of running.

-Steve
Re: dustmite on dub project
On Tuesday, 26 January 2021 at 16:04:29 UTC, Steven Schveighoffer wrote:
> I have a bug report in mysql-native that if you try to create the following file, and add mysql-native as a dependency, it fails to link on Windows 10:
>
> import std.stdio;
> import mysql;
>
> void main()
> {
>     writeln("Edit source/app.d to start your project.");
> }
>
> You might recognize that as the default dub file, with an extra import. The link error is:
>
> testmysql.obj : error LNK2001: unresolved external symbol _D5mysql12__ModuleInfoZ
>
> So I figured I'd try dustmite, and used:
>
> dub dustmite ..\dusted --linker-status=1
>
> The result after almost 2 days: a bunch of directories with mostly no d files, and no source code in any of the d files.
>
> What did I do wrong? Is this even worth trying again? 2 days is a long time to tie up my windows vm.
>
> -Steve

I think the behavior can be explained. Your search criteria (linker status) is not precise. It can be triggered by the real problem, but also by a few empty d files. Therefore Dustmite did a good job: it reduced your code base to a minimum which still triggers a linker error. You might search for the linker error text instead.

Kind regards
Andre
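Matching on the linker error text, as suggested, might look roughly like this. Note this is a sketch: `--linker-regex` is the option name mentioned later in the thread, so check `dub dustmite --help` for the exact spelling in your dub version.

```shell
# Reduce against the specific linker message instead of any nonzero
# linker exit status, so dustmite cannot "succeed" by producing some
# unrelated link failure (e.g. a pile of empty .d files).
dub dustmite ../dusted \
    --linker-regex="unresolved external symbol _D5mysql12__ModuleInfoZ"
```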
Re: 200-600x slower Dlang performance with nested foreach loop
On Tue, Jan 26, 2021 at 06:13:54PM +, methonash via Digitalmars-d-learn wrote:
[...]
> I cannot post the full source code.

Then we are limited in how much we can help you.

> Regarding a reduced version reproducing the issue: well, that's exactly what the nested foreach loop does. Without it, the program reaches that point quickly.

By "reduced version" we mean a code excerpt that's (1) compilable, (2) runnable, and (3) exhibits the same problem that you're seeing in your full code. Posting an isolated loop cut out of some unknown function in some unknown module somewhere is not helping us identify where your problem is.

> With the nested foreach block, it slows to a crawl. More specifically, commenting-out the indexOf() or countUntil() sub-blocks preserves fast performance, but I'm not sure if that may be related to compiler optimizations realizing that there's nothing but "dead/nonexistent code" inside the loops and generating a binary that just never goes there.

When in doubt, use a disassembler to see exactly what the generated code is.

> If this may help: I've composed the second Dlang implementation as one extended block of code within main() and am thinking of soon refactoring the code into different functions. I remain pessimistic of whether this may help.

As I said in my other reply: don't guess, profile! Randomly changing your code in the hopes that it will help, in general, won't.

> Is there any possibility this could be GC-related?

Without more information and a complete, compilable example, it's anybody's guess.

T

--
Debian GNU/Linux: Cray on your desktop.
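For readers who, like the OP, have not profiled D code before, a minimal workflow with dmd's built-in instrumenting profiler looks roughly like this (file and argument names are hypothetical):

```shell
# Compile with profiling instrumentation.
dmd -profile -of=app app.d

# Run normally; on exit the program writes trace.log in the working dir.
./app input.txt

# trace.log lists per-function call counts and timings; the entries
# with the largest tree time are the hotspots. On Linux, running an
# optimized build under `perf record` / `perf report` is a good
# cross-check, since -profile adds its own overhead.
```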
Re: 200-600x slower Dlang performance with nested foreach loop
On Tue, Jan 26, 2021 at 05:40:36PM +, methonash via Digitalmars-d-learn wrote: [...] > 1) Read a list of strings from a file > 2) De-duplicate all strings into the subset of unique strings > 3) Sort the subset of unique strings by descending length and then by > ascending lexicographic identity > 4) Iterate through the sorted subset of unique strings, identifying smaller > sequences with perfect identity to their largest possible parent string [...] > Things went sideways at step #4: because multiSort() returns a SortedRange, > I used .array to convert the returned SortedRange into an array of type > string[]. Do not do this. Every time you call .array it allocates a new array and copies all its contents over. If this code runs frequently, it will cause a big performance hit, not to mention high GC load. The function you're looking for is .release, not .array. [...] > With the formally returned array, I then attempted to construct a > double foreach loop to iterate through the sorted array of unique > strings and find substring matches. > > foreach( i, ref pStr; sortedArr ) > { > foreach( j, ref cStr; sortedArr[ i + 1 .. $ ] ) > { > if( indexOf( pStr, cStr ) > -1 ) > { > // ... > } > } > } > > Before adding the code excerpt above, the Dlang program was taking ~1 > second on an input file containing approx. 64,000 strings. > > By adding the code above, the program now takes 6 minutes to complete. That nested loop is an O(n^2) algorithm. Meaning it will slow down *very* quickly as the size of the array n increases. You might want to think about how to improve this algorithm. > An attempt was made to more efficiently perform ASCII-only substring > searching by converting the sorted string[] into ubyte[][] and then > using countUntil() instead of indexOf(), but this had an effect that > was completely opposite to what I had previously experienced: the > program then took over 20 minutes to complete! How are you doing the conversion? 
If you're using std.conv.to or something like that, it will definitely cause a big performance hit because of the needless allocations and copying. You probably want a direct cast instead. I.e., you want to reinterpret the array reference, not transcribe a copy of it into a ubyte[][]. Probably what you're looking for is to use .representation and .countUntil, or maybe just .canFind to bypass the decoding overhead. (If indeed that is the bottleneck; it may not be. Have you used a profiler to identify where the hotspot is?) > Thus, I am entirely baffled. > > My first attempt to solve this problem space used a small Perl program > to perform steps 1 through 3, which would then pipe intermediate > output to a small Dlang program handling only step #4 using dynamic > arrays (no use of AAs) of ubyte[][] with use of countUntil(). Using AA's may not necessarily improve performance. It depends on what your code does with it. Because AA's require random access to memory, it's not friendly to the CPU cache hierarchy, whereas traversing linear arrays is more cache-friendly and in some cases will out-perform AA's. > The Dlang code for the nested foreach block above is essentially > near-identical between my two Dlang implementations. Yet, the second > implementation--where I'm trying to solve the entire problem space in > D--has absolutely failed in terms of performance. > > Perl+D, ubyte[][], countUntil() :: under 2 seconds > only D, string[], indexOf() :: ~6 minutes > only D, ubyte[][], countUntil() :: >20 minutes > > Please advise. This nightmarish experience is shaking my confidence in > using D. First of all, you need to use a profiler to identify where the hotspots are. Otherwise you could well be pouring tons of effort into "optimizing" code that doesn't actually need optimizing, while completely missing the real source of the problem. Whenever you run into performance problems, do not assume you know where the problem is, profile, profile, profile! 
Second, you only posted a small fragment of your code, so it's hard to say where the problem really is; I can only guess based on what you described. If you could post the entire program, or at least a complete, compilable and runnable excerpt thereof that displays the same (or similar) performance problems, then we could better help you pinpoint where the problem is.

In general, though, things to look for are:

(1) Unnecessary memory allocations, e.g., calling .array on a SortedRange when you really should be using .release, or calling a conversion function to transcribe an array instead of just casting to reinterpret it;

(2) Algorithms with poor big-O characteristics, e.g., the O(n^2) nested loop you have above;

(3) Expensive operations inside inner loops, because the loop nesting magnifies any slowness in the code.

But above all, before randomly changing your code in the hopes that you will hit upon a solution, use a profiler. Don't shoot in the dark.
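To make the .release and .representation suggestions above concrete, here is a minimal sketch (the data is made up, not the OP's 64,000 strings; the sort predicate mirrors the "descending length, then ascending lexicographic" order described in the thread):

```d
import std.algorithm : sort, uniq;
import std.array : array;
import std.string : representation;

void main()
{
    string[] words = ["beta", "alpha", "beta", "gamma", "alpha"];

    // sort() works in place and returns a SortedRange; .release hands
    // back the underlying string[] without the copy that .array would make.
    auto arr = words
        .sort!((a, b) => a.length > b.length
                      || (a.length == b.length && a < b))
        .release;

    // On sorted input, uniq discards duplicates in one linear pass,
    // with no AA and no per-entry allocation.
    string[] unique = arr.uniq.array;
    assert(unique == ["alpha", "gamma", "beta"]);

    // Reinterpret a string's bytes without copying or UTF decoding:
    immutable(ubyte)[] bytes = unique[0].representation;
    assert(bytes.length == unique[0].length);
}
```

Note that .representation is a reinterpretation of the existing string data, so it costs nothing, whereas a per-element conversion of a string[] into a fresh ubyte[][] allocates and copies every entry.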
Re: 200-600x slower Dlang performance with nested foreach loop
On Tuesday, 26 January 2021 at 17:40:36 UTC, methonash wrote: Greetings Dlang wizards, I seek knowledge/understanding of a very frustrating phenomenon I've experienced over the past several days. [...] Source please
Re: 200-600x slower Dlang performance with nested foreach loop
On Tuesday, 26 January 2021 at 17:56:22 UTC, Paul Backus wrote: It would be much easier for us to help you with this if you could post the full program, or at the very least a reduced version that reproduces the same issue. [1] Since your attempts so far have failed to fix the problem, it is quite likely that some part of the code you do not suspect is actually to blame.

I cannot post the full source code. Regarding a reduced version reproducing the issue: well, that's exactly what the nested foreach loop does. Without it, the program reaches that point quickly; with the nested foreach block, it slows to a crawl. More specifically, commenting out the indexOf() or countUntil() sub-blocks preserves fast performance, but I'm not sure whether that may be related to compiler optimizations realizing that there's nothing but "dead/nonexistent code" inside the loops and generating a binary that just never goes there.

If this may help: I've composed the second Dlang implementation as one extended block of code within main() and am thinking of soon refactoring the code into different functions, though I remain pessimistic about whether this will help. Is there any possibility this could be GC-related?
is core Mutex lock "fast"?
Hi guys, still working on old D1 code, to be updated to D2. At some point the previous dev wrote a FastLock class. The top comment is from the dev himself, not me. My question is after the code.

---
class FastLock
{
    protected Mutex mutex;
    protected int lockCount;
    protected Thread owner;

    ///
    this()
    {
        mutex = new Mutex();
    }

    /**
     * This works the same as Tango's Mutex's lock()/unlock() except it provides
     * extra performance in the special case where a thread calls lock()/unlock()
     * multiple times while it already has ownership from a previous call to
     * lock(). This is a common case in Yage.
     *
     * For convenience, lock() and unlock() calls may be nested. Subsequent lock()
     * calls will still maintain the lock, but unlocking will only occur after
     * unlock() has been called an equal number of times.
     *
     * On Windows, Tango's lock() is always faster than D's synchronized statement.
     */
    void lock()
    {
        auto self = Thread.getThis();
        if (self !is owner)
        {
            mutex.lock();
            owner = self;
        }
        lockCount++;
    }

    void unlock() /// ditto
    {
        assert(Thread.getThis() is owner);
        lockCount--;
        if (!lockCount)
        {
            owner = null;
            mutex.unlock();
        }
    }
}
---

Now if I look at the docs, in particular class core.sync.mutex.Mutex, I see:

---
lock()
If this lock is not already held by the caller, the lock is acquired, then the internal counter is incremented by one.
---

Which looks exactly like the behavior of "FastLock". Is it that the old Tango mutex lock kept no count and would try to lock the same object several times? Do we agree that the FastLock class is obsolete considering the current D core?

cheers
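For what it's worth, the documented reentrant behavior of the current core.sync.mutex.Mutex is easy to confirm with a tiny program (a sketch; a non-recursive mutex would deadlock on the second lock()):

```d
import core.sync.mutex : Mutex;

void main()
{
    auto m = new Mutex();
    int n;

    // D2's Mutex keeps an internal counter per owning thread, so the
    // same thread may lock it again without blocking itself...
    m.lock();
    m.lock();
    n = 1;
    // ...as long as every lock() is balanced by an unlock().
    m.unlock();
    m.unlock();

    assert(n == 1); // we got here, so the nested lock() did not deadlock
}
```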
Re: 200-600x slower Dlang performance with nested foreach loop
On Tuesday, 26 January 2021 at 17:40:36 UTC, methonash wrote:

foreach( i, ref pStr; sortedArr )
{
    foreach( j, ref cStr; sortedArr[ i + 1 .. $ ] )
    {
        if( indexOf( pStr, cStr ) > -1 )
        {
            // ...
        }
    }
}

Before adding the code excerpt above, the Dlang program was taking ~1 second on an input file containing approx. 64,000 strings. By adding the code above, the program now takes 6 minutes to complete.

It would be much easier for us to help you with this if you could post the full program, or at the very least a reduced version that reproduces the same issue. [1] Since your attempts so far have failed to fix the problem, it is quite likely that some part of the code you do not suspect is actually to blame.

[1] https://idownvotedbecau.se/nomcve/
200-600x slower Dlang performance with nested foreach loop
Greetings Dlang wizards,

I seek knowledge/understanding of a very frustrating phenomenon I've experienced over the past several days.

The problem space:

1) Read a list of strings from a file
2) De-duplicate all strings into the subset of unique strings
3) Sort the subset of unique strings by descending length and then by ascending lexicographic identity
4) Iterate through the sorted subset of unique strings, identifying smaller sequences with perfect identity to their largest possible parent string

I have written a Dlang program that performantly progresses through step #3 above. I used a built-in AA (associative array) to uniquely de-duplicate the initial set of strings and then used multiSort(). Performance was good up till this point, especially with use of the LDC compiler.

Things went sideways at step #4: because multiSort() returns a SortedRange, I used .array to convert the returned SortedRange into an array of type string[]. This appeared to work, and neither DMD nor LDC threw any warnings/errors for doing this.

With the formally returned array, I then attempted to construct a double foreach loop to iterate through the sorted array of unique strings and find substring matches.

foreach( i, ref pStr; sortedArr )
{
    foreach( j, ref cStr; sortedArr[ i + 1 .. $ ] )
    {
        if( indexOf( pStr, cStr ) > -1 )
        {
            // ...
        }
    }
}

Before adding the code excerpt above, the Dlang program was taking ~1 second on an input file containing approx. 64,000 strings. By adding the code above, the program now takes 6 minutes to complete.

An attempt was made to more efficiently perform ASCII-only substring searching by converting the sorted string[] into ubyte[][] and then using countUntil() instead of indexOf(), but this had an effect that was completely opposite to what I had previously experienced: the program then took over 20 minutes to complete!

Thus, I am entirely baffled.
My first attempt to solve this problem space used a small Perl program to perform steps 1 through 3, which would then pipe intermediate output to a small Dlang program handling only step #4 using dynamic arrays (no use of AAs) of ubyte[][] with use of countUntil().

The Dlang code for the nested foreach block above is essentially near-identical between my two Dlang implementations. Yet the second implementation, where I'm trying to solve the entire problem space in D, has absolutely failed in terms of performance:

Perl+D, ubyte[][], countUntil() :: under 2 seconds
only D,  string[],  indexOf()   :: ~6 minutes
only D,  ubyte[][], countUntil() :: >20 minutes

Please advise. This nightmarish experience is shaking my confidence in using D.
Re: Why filling AA in shared library freezes execution?
On Tue, Jan 26, 2021 at 02:12:17PM +, Adam D. Ruppe via Digitalmars-d-learn wrote: > On Monday, 25 January 2021 at 21:48:10 UTC, Vitalii wrote: > > Q: Why filling assoc.array in shared library freeze execution? > > D exes loading D dlls are very broken on Windows. You can kinda make > it work but there's a lot of bad design and showstopper bugs. [...] Just out of curiosity, what are some of the showstoppers? I'd have expected D exe's loading D dll's should be the first priority in making Windows dll's work in D. I'm surprised there are still obvious problems. T -- Frank disagreement binds closer than feigned agreement.
Re: Why filling AA in shared library freezes execution?
Vitalii, I tested your program and it runs without any problem, consuming about 1 GB of RAM at the end. But I have a slightly different environment: Win10 1909 x64, DMD32 D Compiler v2.092.1-dirty.
Re: Compile time check for GC?
On Tuesday, 26 January 2021 at 16:08:15 UTC, Steven Schveighoffer wrote: std.traits.hasAliasing?

I guess, but it will limit me too much. I should accept Element types that manage their own memory and have pointers to sub-objects that don't point to GC memory. But I guess that would require a language extension, or I would have to establish my own protocol (basically some enum I can test for, I guess).

"...even this is not good enough. One can compile a betterC library using your code, which is then used by a GC-allocating app, which means your library is told betterC is in use while compiling the template, but the GC is still involved."

So there doesn't seem to be a way to be sure that a pointer is not a GC pointer. :-/
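As an illustration of testing the element type rather than the global GC state, here is a sketch using std.traits.hasIndirections, which reports whether a type contains anything a tracing GC would need to scan (the Container name and its branches are made up for the example):

```d
import std.traits : hasIndirections;

struct Container(T)
{
    void put(T value)
    {
        static if (hasIndirections!T)
        {
            // T holds pointers, slices, or class references: the stored
            // copy must stay visible to a conservative collector.
        }
        else
        {
            // Plain value type: free to store it in memory the GC
            // never scans (or in a packed, non-pointer encoding).
        }
    }
}

void main()
{
    static assert(!hasIndirections!int);
    static assert(hasIndirections!(int[]));  // a slice carries a pointer
    static assert(hasIndirections!(int*));

    Container!int c;
    c.put(42); // compiles down the non-scannable branch
}
```

hasAliasing is subtly different: it ignores immutable indirections (e.g. string), so for "does the GC need to trace this?" hasIndirections is usually the closer match.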
Re: Compile time check for GC?
On 1/26/21 11:00 AM, Ola Fosheim Grøstad wrote:
On Tuesday, 26 January 2021 at 15:30:09 UTC, Steven Schveighoffer wrote: The only way to ensure the GC isn't used is with betterC. Even with @nogc tag on main, the GC could be used in static ctors, and casting function pointers/etc.

Yes, @nogc is not strong enough... It is for a container library, so maybe I could test for the properties of the elements instead? I guess I would want to test whether the Element type contains a pointer that should be traced by the GC. But how would I go about it? std.traits.hasAliasing?

And this will only apply to templates, not to compiled code, since compiled code is already done (and one can obviously use betterC compiled code in normal D code).

Yes, templates are OK, but I think Better_C is too restrictive in the long term. So I hope there is some way for my library to figure out if it has to care about GC for the template parameters the user provides.

I thought about it a bit more, and even this is not good enough. One can compile a betterC library using your code, which is then used by a GC-allocating app, which means your library is told betterC is in use while compiling the template, but the GC is still involved. So there doesn't seem to be a way to be sure that a pointer is not a GC pointer.

-Steve
dustmite on dub project
I have a bug report in mysql-native that if you try to create the following file, and add mysql-native as a dependency, it fails to link on Windows 10:

import std.stdio;
import mysql;

void main()
{
    writeln("Edit source/app.d to start your project.");
}

You might recognize that as the default dub file, with an extra import. The link error is:

testmysql.obj : error LNK2001: unresolved external symbol _D5mysql12__ModuleInfoZ

So I figured I'd try dustmite, and used:

dub dustmite ..\dusted --linker-status=1

The result after almost 2 days: a bunch of directories with mostly no d files, and no source code in any of the d files. What did I do wrong? Is this even worth trying again? 2 days is a long time to tie up my Windows VM.

-Steve
Re: Compile time check for GC?
On Tuesday, 26 January 2021 at 15:30:09 UTC, Steven Schveighoffer wrote: The only way to ensure the GC isn't used is with betterC. Even with @nogc tag on main, the GC could be used in static ctors, and casting function pointers/etc.

Yes, @nogc is not strong enough... It is for a container library, so maybe I could test for the properties of the elements instead? I guess I would want to test whether the Element type contains a pointer that should be traced by the GC. But how would I go about it?

And this will only apply to templates, not to compiled code, since compiled code is already done (and one can obviously use betterC compiled code in normal D code).

Yes, templates are OK, but I think Better_C is too restrictive in the long term. So I hope there is some way for my library to figure out if it has to care about GC for the template parameters the user provides.

I guess I will also find the equivalents of C++'s std::is_trivially_copyable and std::is_trivially_destructible in Phobos, to figure out if memcpy is safe.
Re: Can I set the base class like this?
On Tuesday, 26 January 2021 at 14:15:25 UTC, frame wrote:
On Tuesday, 26 January 2021 at 04:39:07 UTC, Jack wrote: note the body is the same, what changes is the base class. I'd like to avoid repeating myself when the body is the same and only the base class changes.

You would have to call it with the correct instantiation, like:

alias Foo = C!(A!bool);

Of course T!MyType would not work, but I don't think you want that anyway. It very much depends on the use-case, but just use a mixin to which you can pass any type you want from the template if you don't want to repeat yourself:

class MyType { }
class A { }
class B { }

template base(T)
{
    static if (is(T : A))
    {
        bool doSomething() { return true; }
    }
    else static if (is(T : B))
    {
        bool doSomething() { return false; }
    }
    else
    {
        void doSomethingElse() { }
    }
}

class C(T1, T2)
{
    mixin base!T2;

    T1 whatever() { return new T1; }
}

alias Foo = C!(MyType, A);
alias Baa = C!(MyType, B);

Thank you! I find this approach rather elegant. The ability to pick the methods that become part of the class body, without macros, is really great.
Re: Can I set the base class like this?
On Tuesday, 26 January 2021 at 14:12:21 UTC, vitamin wrote: On Tuesday, 26 January 2021 at 04:39:07 UTC, Jack wrote: Can I pass the base class type thought template parameter? something like this: [...] You have it almost right: class C(alias T) // Template is not type but symbol. Thanks!
Re: Compile time check for GC?
On 1/26/21 8:10 AM, Ola Fosheim Grøstad wrote: Is there some way for library authors to test whether a GC is present at compile time? @nogc just means that my code does not depend on a GC, but it doesn't tell the compiler that my code is incompatible with GC. I want to compute pointers in a way that is not scannable by a conservative collector for performance reasons. So I need to make sure that the faster code isn't chosen when people use GC using "static if"s. The only way to ensure the GC isn't used is with betterC. Even with @nogc tag on main, the GC could be used in static ctors, and casting function pointers/etc. In that case, you can do version(D_BetterC) And this will only apply to templates, not to compiled code, since compiled code already is done (and one can obviously use betterC compiled code in normal D code). -Steve
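A sketch of gating on that version identifier (gcPresent is an illustrative name, not a druntime symbol; D_BetterC is predefined by the compiler when building with -betterC):

```d
// Compiling with -betterC drops druntime, and with it the GC, so a
// library can select memory layouts at compile time.
version (D_BetterC)
    enum gcPresent = false;  // no druntime, hence no collector to scan memory
else
    enum gcPresent = true;   // druntime linked in; a conservative GC may scan

void main()
{
    static if (gcPresent)
    {
        // keep pointers in plain, scannable form
    }
    else
    {
        // free to pack or encode pointers the GC would never recognize
    }

    assert(gcPresent); // this build is compiled normally, without -betterC
}
```

As Steve points out above, this only helps for templates instantiated in the final build; it cannot tell a betterC-compiled library that it is later linked into a GC-using app.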
Re: How to covert dchar and wchar to string?
On 1/25/21 1:45 PM, Rempas wrote: Actually what the title says. For example I have dchar c = '\u03B3'; and I want to make it into string. I don't want to use "to!string(c);". Any help?

That's EXACTLY what you want to use, if what you want is a string. If you just want a conversion to a char array, use encode [1]:

import std.utf;

char[4] buf;
size_t nbytes = encode(buf, c);
// now buf[0 .. nbytes] contains the char data representing c

-Steve

[1] https://dlang.org/phobos/std_utf.html#encode
Re: Why filling AA in shared library freezes execution?
On Tuesday, 26 January 2021 at 14:12:17 UTC, Adam D. Ruppe wrote: On Monday, 25 January 2021 at 21:48:10 UTC, Vitalii wrote: Q: Why filling assoc.array in shared library freeze execution? D exes loading D dlls are very broken on Windows. You can kinda make it work but there's a lot of bad design and showstopper bugs. That's the sad reality of it. I'd suggest finding a different approach. Maybe IPC or maybe making either the exe or dll not use druntime (like redesigning for -betterC, though even there it is tricky since like global variables aren't imported from the dll by the compiler, you have to do extra indirection yourself) Thank you, Adam!
Using mir to work with matrices
It is not easy to understand which mir library one should use to work with matrices. mir-glas turns out to be unsupported now, so I am trying mir-blas. I need to reimplement my Kalman filter to use higher-dimensional matrices than 4x4, plus a Kronecker product. Is mir-blas recommended for working with matrices?
Re: F*cked by memory corruption after assiging value to associative array
On Monday, 25 January 2021 at 17:11:37 UTC, frame wrote: Wrong way?

Please, someone correct me if I'm getting this wrong:

Structure:

EXE/Main Thread:
- GC: manual
- requests DLL 1 object A
- GC knows about object A

DLL/Thread 1:
- GC: conservative
- allocates new object A -> addRoot(object A), returns to EXE (out param)
- requests DLL 2 object B
- GC knows about object A and object B
- requests sub-objects of object B later

DLL/Thread 2:
- GC: manual
- allocates new object B -> addRoot(object B), returns to DLL 1 (out param)
- GC knows about object B
- allocates sub-objects on object B when DLL 1 requests it, returns to DLL 1 (out param)
- sub-objects are stored in object B
- object B's sub-object memory gets corrupted after DLL 1 becomes the active thread again

In this scenario only DLL 1 can cause the corruption, as it does not occur if all GCs are set to manual. At this point I am confused about how memory allocation is ensured. Each thread should have its own memory area assigned, and each GC adopts the root via the returned object and knows about that area too. But when DLL 1 becomes active, it writes into DLL 2's sub-object memory. It can only do that because it has adopted the root of object B; but why does DLL 1 then not see that the sub-objects of B are still alive?
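For reference, the addRoot/removeRoot protocol described above looks like this within a single runtime (a sketch; the multi-DLL scenario adds one GC instance per runtime, which is where the adoption gets tricky):

```d
import core.memory : GC;

void main()
{
    auto buf = new ubyte[](64);

    // Pin the allocation: this GC now treats it as reachable even if no
    // scanned pointer refers to it, e.g. because the only reference lives
    // on the other side of a DLL boundary.
    GC.addRoot(buf.ptr);

    // ... the pointer could now be handed to foreign code safely ...
    buf[0] = 42;

    // Must be balanced, or the block is pinned (leaked) forever.
    GC.removeRoot(buf.ptr);

    assert(buf[0] == 42);
}
```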
Re: Can I set the base class like this?
On Tuesday, 26 January 2021 at 04:39:07 UTC, Jack wrote: note the body is the same, what changes is the base class. I'd like to avoid repeating myself when the body is the same and only the base class changes.

You would have to call it with the correct instantiation, like:

alias Foo = C!(A!bool);

Of course T!MyType would not work, but I don't think you want that anyway. It very much depends on the use-case, but just use a mixin to which you can pass any type you want from the template if you don't want to repeat yourself:

class MyType { }
class A { }
class B { }

template base(T)
{
    static if (is(T : A))
    {
        bool doSomething() { return true; }
    }
    else static if (is(T : B))
    {
        bool doSomething() { return false; }
    }
    else
    {
        void doSomethingElse() { }
    }
}

class C(T1, T2)
{
    mixin base!T2;

    T1 whatever() { return new T1; }
}

alias Foo = C!(MyType, A);
alias Baa = C!(MyType, B);
Re: Why filling AA in shared library freezes execution?
On Monday, 25 January 2021 at 21:48:10 UTC, Vitalii wrote: Q: Why filling assoc.array in shared library freeze execution? D exes loading D dlls are very broken on Windows. You can kinda make it work but there's a lot of bad design and showstopper bugs. That's the sad reality of it. I'd suggest finding a different approach. Maybe IPC or maybe making either the exe or dll not use druntime (like redesigning for -betterC, though even there it is tricky since like global variables aren't imported from the dll by the compiler, you have to do extra indirection yourself)
Re: How do I overload += operator?
On Monday, 25 January 2021 at 17:09:22 UTC, Jack wrote: I'd like to make this work s += 10 where s is a struct. How can I do that? You have your answer, but someone else might come upon this in the future, so here's a link to the clearest explanation of operator overloading for someone new to the language: http://ddili.org/ders/d.en/operator_overloading.html
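For future readers, the minimal shape of that overload (a sketch; the field name is made up):

```d
struct S
{
    int value;

    // opOpAssign handles the compound assignment operators; the template
    // parameter `op` is the operator as a string ("+" here, so s += rhs
    // is rewritten by the compiler into s.opOpAssign!"+"(rhs)).
    ref S opOpAssign(string op)(int rhs) if (op == "+")
    {
        value += rhs;
        return this;
    }
}

void main()
{
    S s = S(5);
    s += 10;
    assert(s.value == 15);
}
```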
Re: Can I set the base class like this?
On Tuesday, 26 January 2021 at 04:39:07 UTC, Jack wrote: Can I pass the base class type thought template parameter? something like this: [...] You have it almost right: class C(alias T) // Template is not type but symbol.
Compile time check for GC?
Is there some way for library authors to test whether a GC is present at compile time? @nogc just means that my code does not depend on a GC, but it doesn't tell the compiler that my code is incompatible with GC. I want to compute pointers in a way that is not scannable by a conservative collector for performance reasons. So I need to make sure that the faster code isn't chosen when people use GC using "static if"s.
Re: Why filling AA in shared library freezes execution?
On Tuesday, 26 January 2021 at 11:17:11 UTC, Vitalii wrote: I'll be waiting for bugfix release.

There could also be other reasons, e.g. if your system is "compromised" by a hijack DLL that's automatically injected by an anti-virus scanner when your app starts, or some bug in an updated or outdated part of the C++ runtime. I would try it on a clean system again.
Re: Why filling AA in shared library freezes execution?
On Tuesday, 26 January 2021 at 08:14:10 UTC, frame wrote:
On Tuesday, 26 January 2021 at 06:53:22 UTC, Vitalii wrote: It's quite unexpected for me that nobody gives me any help about the usage of AAs in a shared library. Does nobody use shared libraries? Does nobody use AAs? A post with trivial questions about OpAssign gets many answers. Even a post about changing the logo color from red to blue gets almost 50 replies. With all rules of decorum I posted reproducible source code and asked for any help. Where is the language community? Vitalii

You've got this wrong. There's nothing wrong with your code. It's a problem with your OS or compiler support, or even your CPU has some bug. If there's nothing special in dll.d that we don't see, it should run without problems. You can try out VisualD for Visual Studio, which may give you a hint of the error you get before your app starts freezing.

Thank you, frame! Just the simple mention that the code doesn't contain obvious mistakes helps me a lot. dll.d is the code from https://wiki.dlang.org/Win32_DLLs_in_D; I replaced it with mixin SimpleDllMain; but the result was the same. Also, I wrote the same code in C++, compiled it with the Intel C++ 2013 compiler, and got no errors at all. You gave me a hint to try it on another OS: I tried on Windows Server 2012 R2 with an Intel Xeon Gold 6134, and it works! So it's really an OS-dependent thing. I'll be waiting for a bugfix release.

Vitalii
Re: Why filling AA in shared library freezes execution?
On Tuesday, 26 January 2021 at 06:53:22 UTC, Vitalii wrote: It's quite unexpected for me that nobody gives me any help about the usage of AAs in a shared library. Does nobody use shared libraries? Does nobody use AAs? A post with trivial questions about OpAssign gets many answers. Even a post about changing the logo color from red to blue gets almost 50 replies. With all rules of decorum I posted reproducible source code and asked for any help. Where is the language community? Vitalii

You've got this wrong. There's nothing wrong with your code. It's a problem with your OS or compiler support, or even your CPU has some bug. If there's nothing special in dll.d that we don't see, it should run without problems. You can try out VisualD for Visual Studio, which may give you a hint of the error you get before your app starts freezing.