Re: Initializing D runtime and executing module and TLS ctors for D libraries
On Saturday, 30 January 2021 at 05:44:37 UTC, Ali Çehreli wrote:
On 1/24/21 2:28 AM, IGotD- wrote: > [...] course. Any > [...] not do D > [...] [...]

Hmm, interesting, or what you should call it.

With this knowledge we have now, what changes could and/or should be made to make this process easier?

(Btw, I just "forced" my boss to buy your and Adam's book for me. I'm trying to sneak in D @thecompany)
Re: Why filling AA in shared library freezes execution?
On Friday, 29 January 2021 at 10:10:56 UTC, frame wrote:
On Friday, 29 January 2021 at 01:23:20 UTC, Siemargl wrote:
On Friday, 29 January 2021 at 00:45:12 UTC, Siemargl wrote:
Then I modified the program, just removing the DLL and copying TestFun() into the main module, and it runs. Same compiler, -m64 target.

Oops. Sorry, I just forgot to copy test_dll.dll inside the VM :-) So, the program runs in Win7, but hangs after printing i:64511. I downgraded DMD to 2.090.1 + MSVC2013 libs and the problem disappears. But 2.092 + MSVC2013 libs also hangs. Not every time, but

You should really try to use a debugger to see what error is thrown in first chance. It also helps to identify a possible hidden problem that is not reproducible on other machines.

Sorry, there are many problems debugging D x64 on Windows. All I can get is the call stack from the crash dump:

ntdll!ZwWaitForSingleObject+0xa
ntdll!RtlDeNormalizeProcessParams+0x5a8
ntdll!RtlDeNormalizeProcessParams+0x4a4
ntdll!RtlInitializeCriticalSectionEx+0x3b9
KERNELBASE!HeapDestroy+0x3a
KERNELBASE!GetModuleHandleExW+0x39
test_dll!TestFun+0x576b6
test_dll!TestFun+0x55bf3
test_dll!TestFun+0x4e315
test_dll!TestFun+0x4d86f
test_dll!TestFun+0x4bdb5
test_dll!TestFun+0x507e1
test_dll!TestFun+0x4756b
test_dll!TestFun+0x22d1d
test_dll!TestFun+0x23d9a
test_dll!TestFun+0x1a1b9
test_dll!TestFun+0x93
test_dll_exe!D main+0xe5
test_dll_exe!D2rt6dmain212_d_run_main2UAAamPUQgZiZ6runAllMFZ9__lambda1MFZv+0x33
test_dll_exe!D2rt6dmain212_d_run_main2UAAamPUQgZiZ7tryExecMFMDFZvZv+0x3c
Re: Why filling AA in shared library freezes execution?
On Friday, 29 January 2021 at 15:34:49 UTC, H. S. Teoh wrote:
On Fri, Jan 29, 2021 at 12:45:02PM +0000, Imperatorn via Digitalmars-d-learn wrote:
On Wednesday, 27 January 2021 at 15:25:17 UTC, H. S. Teoh wrote:
> On Wed, Jan 27, 2021 at 02:39:08PM +0000, Adam D. Ruppe via
> Digitalmars-d-learn wrote:
> > [...]
>
> I'm surprised this stuff hasn't been fixed yet, considering
> Walter (mostly?) works on Windows. Has he never run into
> these issues before?
[...]
Anyone knows what it would take to fix it?

Somebody who (1) knows enough of compiler internals to be able to fix this, (2) is intimately familiar with how Windows dlls work, (3) is desperate enough to do the work himself instead of waiting for someone else to do it, and (4) is stubborn enough to push it through in spite of any resistance.

T

Who would that special someone be?
Bit rotation question/challenge
I have a static array of `ubyte`s of arbitrary size:

```d
ubyte[4] x = [ // in reality, ubyte[64]
    0b00001000,
    0b00000001,
    0b00010101,
    0b11110010,
];
```

Now I want to bit-rotate the array as if it is one big integer. So:

```d
ubyte[n] rotateRight(size_t n)(ref const ubyte[n] array, uint rotation)
{
    // ?
}
// same for rotateLeft

ubyte[4] y = [
    0b11111001,
    0b00000100,
    0b00000000,
    0b10001010,
];
assert(x.rotateRight(9) == y);
assert(y.rotateLeft(9) == x);
```

Any ideas how this could be achieved? I.e. what should go at the "?" for rotateRight and rotateLeft?
Re: Initializing D runtime and executing module and TLS ctors for D libraries
On Saturday, 30 January 2021 at 12:28:16 UTC, Ali Çehreli wrote:
I wonder whether doing something in the runtime is possible. For example, it may be more resilient and not crash when suspending a thread fails because the thread may be dead already. However, studying the runtime code around thread_detachThis three years ago, I had realized that like many things in computing, the whole stop-the-world is wishful thinking because there is no guarantee that your "please suspend this thread" request to the OS has succeeded. You get a success return code back but it means your request succeeded not that the thread was or will be suspended. (I may be misremembering this point but I know that the runtime requests things where OS does not give full guarantee for.)

OT. A thread that suspends itself will always be suspended (not taking fall-through cases into account); if not, throw the OS away. If a thread suspends another thread, then you don't really know when that thread will be suspended. I would discourage threads from suspending other threads because that opens up a new world of race conditions. Some systems don't even allow it, and its benefits are very limited.

Back to topic. I think that the generic solution, even if it doesn't help you with your current implementation, is to ban TLS altogether. I think there have already been requests to remove TLS from druntime/phobos entirely, and I think this should definitely be done sooner rather than later. Also, if you write a shared library in D, simply don't use TLS at all. This way it will not matter whether a thread is registered with druntime or not. TLS is in my opinion a wart in computer science.
Re: Bit rotation question/challenge
On Saturday, 30 January 2021 at 13:30:49 UTC, burt wrote:
I have a static array of `ubyte`s of arbitrary size:

```d
ubyte[4] x = [ // in reality, ubyte[64]
    0b00001000,
    0b00000001,
    0b00010101,
    0b11110010,
];
```

Now I want to bit-rotate the array as if it is one big integer.

You may find `std.bitmanip.BitArray` useful for this: http://phobos.dpldocs.info/std.bitmanip.BitArray.html
Re: Bit rotation question/challenge
On Saturday, 30 January 2021 at 14:56:14 UTC, burt wrote: On Saturday, 30 January 2021 at 14:41:59 UTC, Afgdr wrote: On Saturday, 30 January 2021 at 14:40:49 UTC, Afgdr wrote: On Saturday, 30 January 2021 at 13:30:49 UTC, burt wrote: [...] cast as uint and shift. cast the result as ubyte[4]. obviously, that works for n=4 with uint and n=8 for ulong, only. Yes I used to do this, but then I needed it for n > 8. As suggested in the other answer, BitArray may be the best generic solution.
Re: Initializing D runtime and executing module and TLS ctors for D libraries
On 1/30/21 1:34 AM, Imperatorn wrote:

> With this knowledge we have now, what changes could and/or should be
> made to make this process easier?

I wonder whether doing something in the runtime is possible. For example, it may be more resilient and not crash when suspending a thread fails because the thread may be dead already.

However, studying the runtime code around thread_detachThis three years ago, I had realized that like many things in computing, the whole stop-the-world is wishful thinking because there is no guarantee that your "please suspend this thread" request to the OS has succeeded. You get a success return code back but it means your request succeeded not that the thread was or will be suspended. (I may be misremembering this point but I know that the runtime requests things where OS does not give full guarantee for.)

(Going off-topic, even clicking on a user interface is wishful thinking because a few times a year I attempt to click on something but another window element pops under my mouse pointer and I unintentionally click something else, commonly on web pages as they are being rendered by a browser: links move around on the page. This used to bother me but not anymore. Life is not perfect and I appreciate it. :) )

> (Btw, I just "forced" my boss to buy your and Adam's book for me

Cool! :) It makes me a little sad that my online version is ahead of the paper version by a couple of years now. I want to update the paper as well but I want to work on work stuff like the topic of this discussion. :)

(Related note: the ebook versions on the web page are more up-to-date than ones that you can buy especially because the versions on my web site include a table of contents section. Consider updating your ebook here: http://ddili.org/ders/d.en/index.html )

> I'm trying to sneak in D @thecompany)

I still think D is a great tool but some use cases can be tough and sometimes embarrassing. :/

Ali
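For readers following the thread title, here is a minimal sketch of how a D library might expose runtime setup and thread registration to a foreign host. The `mylib_*` names are hypothetical placeholders; only `Runtime.initialize`, `Runtime.terminate`, `thread_attachThis` and `thread_detachThis` come from druntime. It does not address the suspension race described above, and Windows DLLs need additional setup that is not shown.

```d
import core.runtime : Runtime;
import core.thread : thread_attachThis, thread_detachThis;

// Hypothetical C-callable entry points; the mylib_* names are placeholders.

/// Call once from the host before any other D code runs.
/// Runs module constructors; returns 0 on success.
extern (C) int mylib_init()
{
    try
    {
        return Runtime.initialize() ? 0 : -1;
    }
    catch (Exception)
    {
        return -1;
    }
}

/// Call on a host-created thread before it executes D code,
/// so that the GC knows about the thread.
extern (C) void mylib_thread_enter()
{
    thread_attachThis();
}

/// Call on that same thread right before it exits.
extern (C) void mylib_thread_leave()
{
    thread_detachThis();
}

/// Call once when the host is done with the library; returns 0 on success.
extern (C) int mylib_shutdown()
{
    try
    {
        return Runtime.terminate() ? 0 : -1;
    }
    catch (Exception)
    {
        return -1;
    }
}
```

The enter/leave pair is meant to bracket the lifetime of each foreign thread; a thread that exits without detaching is exactly the "dead already" case mentioned above.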
Minimize GC memory footprint
Is there a way to force the GC to re-use memory in already existing pools? I set maxPoolSize:1 to gain pools that can be released more quickly after they are no longer in use. This already reduces memory usage to 1:3. Sadly the application creates multiple pools that are not necessary in my POV - just fragmented temporary slice data like from format(). What can I do to optimize?
Re: Bit rotation question/challenge
On Saturday, 30 January 2021 at 14:17:06 UTC, Paul Backus wrote:
On Saturday, 30 January 2021 at 13:30:49 UTC, burt wrote:
[...]
Now I want to bit-rotate the array as if it is one big integer.

You may find `std.bitmanip.BitArray` useful for this: http://phobos.dpldocs.info/std.bitmanip.BitArray.html

Thank you, this is indeed what I am looking for! For future reference, this is how I implemented it:

```d
ubyte[n] rotateRight(size_t n)(ubyte[n] x, uint rotation)
{
    import std.bitmanip : BitArray;

    ubyte[n] x2;
    foreach (i, value; x) // have to swap because of endianness
        x2[n - 1 - i] = value;

    auto copy = x2;

    auto bitArray1 = BitArray(cast(void[]) x2[], n * 8);
    auto bitArray2 = BitArray(cast(void[]) copy[], n * 8);
    bitArray1 >>= rotation;
    bitArray2 <<= n * 8 - rotation;
    bitArray1 |= bitArray2;

    foreach (i, value; x2) // swap back
        x[n - 1 - i] = value;
    return x;
}

ubyte[4] x = [
    0b00011000,
    0b0011,
    0b00010101,
    0b0010,
];
writefln!"%(%8b,\n%)"(x.rotateRight(4));
```
Re: Bit rotation question/challenge
On Saturday, 30 January 2021 at 14:41:59 UTC, Afgdr wrote: On Saturday, 30 January 2021 at 14:40:49 UTC, Afgdr wrote: On Saturday, 30 January 2021 at 13:30:49 UTC, burt wrote: [...] cast as uint and shift. cast the result as ubyte[4]. obviously, that works for n=4 with uint and n=8 for ulong, only. Yes I used to do this, but then I needed it for n > 8.
Re: Initializing D runtime and executing module and TLS ctors for D libraries
On Saturday, 30 January 2021 at 12:28:16 UTC, Ali Çehreli wrote: On 1/30/21 1:34 AM, Imperatorn wrote: > [...] should be > [...] I wonder whether doing something in the runtime is possible. For example, it may be more resilient and not crash when suspending a thread fails because the thread may be dead already. [...] Will take a look at the e-book also. Didn't know there was a difference
Re: Bit rotation question/challenge
On Saturday, 30 January 2021 at 14:40:49 UTC, Afgdr wrote:
On Saturday, 30 January 2021 at 13:30:49 UTC, burt wrote:
I have a static array of `ubyte`s of arbitrary size:

```d
ubyte[4] x = [ // in reality, ubyte[64]
    0b00001000,
    0b00000001,
    0b00010101,
    0b11110010,
];
```

Now I want to bit-rotate the array as if it is one big integer. So:

```d
ubyte[n] rotateRight(size_t n)(ref const ubyte[n] array, uint rotation)
{
    // ?
}
// same for rotateLeft

ubyte[4] y = [
    0b11111001,
    0b00000100,
    0b00000000,
    0b10001010,
];
assert(x.rotateRight(9) == y);
assert(y.rotateLeft(9) == x);
```

Any ideas how this could be achieved? I.e. what should go at the "?" for rotateRight and rotateLeft?

cast as uint and shift. cast the result as ubyte[4].

obviously, that works for n=4 with uint and n=8 for ulong, only.
Re: Bit rotation question/challenge
On Saturday, 30 January 2021 at 13:30:49 UTC, burt wrote:
I have a static array of `ubyte`s of arbitrary size:

```d
ubyte[4] x = [ // in reality, ubyte[64]
    0b00001000,
    0b00000001,
    0b00010101,
    0b11110010,
];
```

Now I want to bit-rotate the array as if it is one big integer. So:

```d
ubyte[n] rotateRight(size_t n)(ref const ubyte[n] array, uint rotation)
{
    // ?
}
// same for rotateLeft

ubyte[4] y = [
    0b11111001,
    0b00000100,
    0b00000000,
    0b10001010,
];
assert(x.rotateRight(9) == y);
assert(y.rotateLeft(9) == x);
```

Any ideas how this could be achieved? I.e. what should go at the "?" for rotateRight and rotateLeft?

cast as uint and shift. cast the result as ubyte[4].
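The "cast as uint" suggestion could look roughly like the sketch below for the n=4 case. It assumes byte 0 is the most significant byte of the big integer, and the function name `rotateRight32` is made up for this illustration. Instead of a raw pointer cast it goes through `std.bitmanip`, so the result does not depend on the host's endianness; the rotation itself uses `core.bitop.ror`.

```d
import core.bitop : ror;
import std.bitmanip : bigEndianToNative, nativeToBigEndian;

/// Rotates a 4-byte array right by `rotation` bits, treating
/// bytes[0] as the most significant byte (an assumption of this sketch).
ubyte[4] rotateRight32(ubyte[4] bytes, uint rotation)
{
    // Reassemble the bytes into one 32-bit value, independent of host endianness.
    immutable value = bigEndianToNative!uint(bytes);
    // ror requires a count below 32, hence the modulo.
    return nativeToBigEndian(ror(value, rotation % 32));
}

unittest
{
    ubyte[4] x = [0b00001000, 0b00000001, 0b00010101, 0b11110010];
    ubyte[4] y = [0b11111001, 0b00000100, 0b00000000, 0b10001010];
    assert(rotateRight32(x, 9) == y);
}
```

The same shape works for n=8 with `ulong`; beyond that there is no built-in integer type to cast to.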
Re: emplace doesn't forward arguments
On Thursday, 28 January 2021 at 23:18:21 UTC, kinke wrote: On Thursday, 28 January 2021 at 21:15:49 UTC, vitamin wrote: Is there reason why std.conv.emplace doesn't forward arguments to __ctor? Yeah, a bug in the emplace() version for classes, some missing `forward!args` in there (it works when emplacing a struct with identical ctor). E.g. https://github.com/dlang/druntime/blob/e2e304e1709b0b30ab65471a98023131f0e7620c/src/core/lifetime.d#L124-L128 if you want to fix it (std.conv.emplace is now an alias for core.lifetime.emplace in Phobos master). thanks;
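To make "forwarding" concrete, here is a small sketch of what `forward!args` does in a factory function. This is not the druntime code linked above; `Counter` and `make` are hypothetical names used only for illustration. A constructor that takes its argument by `ref` only works through the factory if the arguments are forwarded with their ref-ness intact.

```d
import core.lifetime : forward;

/// Hypothetical class whose constructor needs a real lvalue reference.
class Counter
{
    this(ref int hits)
    {
        ++hits; // visible to the caller only if the reference was preserved
    }
}

/// Hypothetical factory: `auto ref` keeps lvalues as references,
/// and forward!args passes them on unchanged (rvalues are moved).
T make(T, Args...)(auto ref Args args)
{
    return new T(forward!args);
}

unittest
{
    int hits;
    auto c = make!Counter(hits);
    assert(c !is null);
    assert(hits == 1);
}
```

Without `forward`, a factory that takes plain by-value `Args args` would hand the constructor a copy, and `hits` in the caller would stay at 0.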
Re: Bit rotation question/challenge
On Saturday, 30 January 2021 at 14:56:14 UTC, burt wrote:
On Saturday, 30 January 2021 at 14:41:59 UTC, Afgdr wrote:
On Saturday, 30 January 2021 at 14:40:49 UTC, Afgdr wrote:
On Saturday, 30 January 2021 at 13:30:49 UTC, burt wrote:
[...]
cast as uint and shift. cast the result as ubyte[4].

obviously, that works for n=4 with uint and n=8 for ulong, only.

Yes I used to do this, but then I needed it for n > 8.

You can try something like this: https://run.dlang.io/is/POQgnb

    import std.range : cycle, take, drop;
    import std.algorithm : copy;
    import std.stdio;

    version (LittleEndian)
    ubyte[n] rotateRight(size_t n)(ref const ubyte[n] array, uint rotation){
        typeof(return) result;

        array[]
            .cycle()
            .drop(n - (rotation / 8) % n)
            .take(n)
            .copy(result[]);

        const ubyte bit_rotation = rotation % 8;
        enum ubyte full = 0b1111_1111;

        if(bit_rotation == 0)
            return result;

        ubyte next_prefix(const ubyte elm){
            const ubyte suffix = (elm & ~(full << bit_rotation));
            const ubyte prefix = cast(ubyte)(suffix << (8 - bit_rotation));
            return prefix;
        }

        ubyte prefix = next_prefix(result[$-1]);

        foreach(ref ubyte elm; result[]){
            const new_prefix = next_prefix(elm);
            elm = (elm >> bit_rotation) | prefix;
            prefix = new_prefix;
        }

        return result;
    }

    void main(){
        ubyte[4] x = [
            0b00011000,
            0b0011,
            0b00010101,
            0b0010,
        ];
        writefln!"%(%8b,\n%)"(x.rotateRight(4));
    }
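For comparison, a more compact byte-wise variant that also covers rotateLeft is sketched below. It is not the code from the post above; it assumes that `array[0]` is the most significant byte of the big integer, and it adds an `n > 0` constraint that the original question does not mention. Each output byte is assembled from two neighbouring source bytes, so no intermediate integer type is needed and any n works.

```d
/// Rotates the whole array right by `rotation` bits, treating array[0]
/// as the most significant byte (an assumption of this sketch).
ubyte[n] rotateRight(size_t n)(ref const ubyte[n] array, uint rotation)
if (n > 0)
{
    enum totalBits = n * 8;
    immutable r = rotation % totalBits;
    immutable byteShift = r / 8; // whole bytes to move
    immutable bitShift = r % 8;  // remaining bits to move

    ubyte[n] result;
    foreach (i; 0 .. n)
    {
        // Source byte that lands on position i, and the byte before it,
        // whose low bits spill into the top of position i.
        immutable cur = array[(i + n - byteShift) % n];
        immutable prev = array[(i + 2 * n - byteShift - 1) % n];
        // For bitShift == 0 the `prev << 8` part is cut off by the cast.
        result[i] = cast(ubyte)((cur >> bitShift) | (prev << (8 - bitShift)));
    }
    return result;
}

/// A left rotation by k bits equals a right rotation by (totalBits - k) bits.
ubyte[n] rotateLeft(size_t n)(ref const ubyte[n] array, uint rotation)
if (n > 0)
{
    enum totalBits = n * 8;
    return rotateRight(array, cast(uint)(totalBits - rotation % totalBits));
}

unittest
{
    ubyte[4] x = [0b00001000, 0b00000001, 0b00010101, 0b11110010];
    ubyte[4] y = [0b11111001, 0b00000100, 0b00000000, 0b10001010];
    assert(x.rotateRight(9) == y);
    assert(y.rotateLeft(9) == x);
}
```

The unittest reuses the values from the original question, so it can be checked with `dmd -unittest -main -run`.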
Re: 200-600x slower Dlang performance with nested foreach loop
Greetings all,

Many thanks for sharing your collective perspective and advice thus far! It has been very helpful and instructive. I return bearing live data and a minimally complete, compilable, and executable program to experiment with and potentially optimize.

The dataset can be pulled from here: https://filebin.net/qf2km1ea9qgd5skp/seqs.fasta.gz?t=97kgpebg

Running "cksum" on this file:

    1477520542 2199192 seqs.fasta.gz

Naturally, you'll need to gunzip this file. The decompressed file contains strings on every even-numbered line that have already been reduced to the unique de-duplicated subset, and they have already been sorted by descending length and alphabetical identity.

From my initial post, the focus is now entirely on step #4: finding/removing strings that can be entirely absorbed (substringed) by their largest possible parent.

And now for the code:

    import std.stdio : writefln, File, stdin;
    import std.conv : to;
    import std.string : indexOf;

    void main()
    {
        string[] seqs;

        foreach( line; stdin.byLine() )
        {
            if( line[ 0 ] == '>' ) continue;
            else seqs ~= to!string( line );
        }

        foreach( i; 0 .. seqs.length )
        {
            if( seqs[ i ].length == 0 ) continue;

            foreach( j; i + 1 .. seqs.length )
            {
                if( seqs[ j ].length == 0 ||
                    seqs[ i ].length == seqs[ j ].length ) continue;

                if( indexOf( seqs[ i ], seqs[ j ] ) > -1 )
                {
                    seqs[ j ] = "";
                    writefln( "%s contains %s", i, j );
                }
            }
        }
    }

Compile the source and then run the executable via redirecting stdin:

    ./substr < seqs.fasta

See any straightforward optimization paths here?

For curiosity, I experimented with use of string[] and ubyte[][] and several functions (indexOf, canFind, countUntil) to assess for the best potential performer. My off-the-cuff results:

    string[] with indexOf()     :: 26.5-27 minutes
    string[] with canFind()     :: >28 minutes
    ubyte[][] with canFind()    :: 27.5 minutes
    ubyte[][] with countUntil() :: 27.5 minutes

Resultantly, the code above uses string[] with indexOf(). Tests were performed with an Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz.

I have additional questions/concerns/confusion surrounding the foreach() syntax I have had to apply above, but performance remains my chief immediate concern.
Re: Minimize GC memory footprint
On Saturday, 30 January 2021 at 22:57:41 UTC, Imperatorn wrote: On Saturday, 30 January 2021 at 16:42:35 UTC, frame wrote: Is there a way to force the GC to re-use memory in already existing pools? I set maxPoolSize:1 to gain pools that can be released more quickly after they are no longer in use. This already reduces memory usage to 1:3. Sadly the application creates multiple pools that are not necessary in my POV - just fragmented temporary slice data like from format(). What can I do to optimize? Do you want to optimize for reduced memory usage? Yes, speed is secondary (long running daemon)
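One possible direction, offered only as a sketch and not as an answer given in this thread: since the fragmentation is attributed to temporaries from format(), those temporaries could be avoided by formatting into a reusable buffer, and free pools could be handed back to the OS explicitly. `formatInto` and `trimHeap` are hypothetical helper names; `Appender`, `formattedWrite`, `GC.collect` and `GC.minimize` are the real Phobos/druntime calls. Whether this helps depends on the application's allocation pattern.

```d
import core.memory : GC;
import std.array : Appender;
import std.format : formattedWrite;

/// Formats into `buf`, reusing its storage instead of allocating a new
/// string per call. The returned slice is only valid until the next call.
const(char)[] formatInto(Args...)(ref Appender!(char[]) buf,
                                  const(char)[] spec, Args args)
{
    buf.clear();                      // keep the capacity, drop the old contents
    formattedWrite(buf, spec, args);
    return buf.data;
}

/// Runs a collection and asks the GC to return unused pools to the OS.
void trimHeap()
{
    GC.collect();
    GC.minimize();
}

unittest
{
    Appender!(char[]) buf;
    assert(formatInto(buf, "id=%s", 42) == "id=42");
    assert(formatInto(buf, "%s-%s", "a", "b") == "a-b");
    trimHeap();
}
```

The trade-off is that callers must copy the result (for example with `.idup`) if they need to keep it beyond the next call, which reintroduces an allocation in exactly those places.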
Re: How to profile compile times of a source code?
On Saturday, 30 January 2021 at 23:34:50 UTC, Stefan Koch wrote: This special version of dmd will generate a trace file which can be read with the included printTraceHeader tool. You will want to take a look at the PhaseHist command, which shows you the compiler phase that took the most time. Alternatively, I recommend using callgrind to profile where dmd spends most of its time. For that to be useful you need a debug build of dmd though.
Re: How to profile compile times of a source code?
On Saturday, 30 January 2021 at 22:47:39 UTC, Ahmet Sait wrote: I'm looking for ways to figure out what parts of the code slow down the compiler other than brute force trial. Can I use the -vtemplates switch for this? Would the -v (verbose) switch be helpful in some way? How would I know if my bottleneck is ctfe or templates? How do the compiler devs approach this issue? I'm interested in all kinds of tricks to the point of debugging the compiler itself although anything less complicated would be appreciated. I have a way of getting the profile data you are after. Get the dmd_tracing_20942 branch from https://github.com/UplinkCoder/dmd and compile that version of dmd. This special version of dmd will generate a trace file which can be read with the included printTraceHeader tool.
Re: 200-600x slower Dlang performance with nested foreach loop
On 1/30/21 6:13 PM, methonash wrote:
Greetings all,

Many thanks for sharing your collective perspective and advice thus far! It has been very helpful and instructive. I return bearing live data and a minimally complete, compilable, and executable program to experiment with and potentially optimize.

The dataset can be pulled from here: https://filebin.net/qf2km1ea9qgd5skp/seqs.fasta.gz?t=97kgpebg

Running "cksum" on this file:

    1477520542 2199192 seqs.fasta.gz

Naturally, you'll need to gunzip this file. The decompressed file contains strings on every even-numbered line that have already been reduced to the unique de-duplicated subset, and they have already been sorted by descending length and alphabetical identity.

From my initial post, the focus is now entirely on step #4: finding/removing strings that can be entirely absorbed (substringed) by their largest possible parent.

And now for the code:

    import std.stdio : writefln, File, stdin;
    import std.conv : to;
    import std.string : indexOf;

    void main()
    {
        string[] seqs;

        foreach( line; stdin.byLine() )
        {
            if( line[ 0 ] == '>' ) continue;
            else seqs ~= to!string( line );
        }

        foreach( i; 0 .. seqs.length )
        {
            if( seqs[ i ].length == 0 ) continue;

            foreach( j; i + 1 .. seqs.length )
            {
                if( seqs[ j ].length == 0 ||
                    seqs[ i ].length == seqs[ j ].length ) continue;

                if( indexOf( seqs[ i ], seqs[ j ] ) > -1 )
                {
                    seqs[ j ] = "";
                    writefln( "%s contains %s", i, j );
                }
            }
        }
    }

Compile the source and then run the executable via redirecting stdin:

    ./substr < seqs.fasta

See any straightforward optimization paths here?

For curiosity, I experimented with use of string[] and ubyte[][] and several functions (indexOf, canFind, countUntil) to assess for the best potential performer. My off-the-cuff results:

    string[] with indexOf()     :: 26.5-27 minutes
    string[] with canFind()     :: >28 minutes
    ubyte[][] with canFind()    :: 27.5 minutes
    ubyte[][] with countUntil() :: 27.5 minutes

Resultantly, the code above uses string[] with indexOf(). Tests were performed with an Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz.

I have additional questions/concerns/confusion surrounding the foreach() syntax I have had to apply above, but performance remains my chief immediate concern.

The code looks pretty minimal.

I'd suggest trying it in reverse. If you have the sequence "cba", "ba", "a", then determining "a" is in "ba" is probably cheaper than determining "a" is in "cba".

Are you still convinced that it's possible to do it in under 2 seconds? That would seem a huge discrepancy. If not, what specifically are you looking for in terms of performance?

-Steve
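One way to read the "in reverse" suggestion is sketched below; this interpretation is an assumption, not something spelled out in the reply. For each string, candidate parents are scanned from the nearest (shortest) one up to the longest, stopping at the first hit. `findParents` is a hypothetical name. The set of absorbed strings matches the original loop, because containment is transitive, but the reported parent is the shortest container rather than the largest one. It is still quadratic in the worst case.

```d
import std.algorithm.searching : canFind;

/// For each index j, returns the index of a longer string that contains
/// seqs[j], or -1 if none does. Expects seqs sorted by descending length,
/// as in the original post.
ptrdiff_t[] findParents(string[] seqs)
{
    auto parent = new ptrdiff_t[seqs.length];
    parent[] = -1;

    foreach (j; 1 .. seqs.length)
    {
        // Walk the candidate parents from the shortest to the longest.
        foreach_reverse (i; 0 .. j)
        {
            if (seqs[i].length == seqs[j].length)
                continue; // equal length and already de-duplicated: no match
            if (seqs[i].canFind(seqs[j]))
            {
                parent[j] = i;
                break; // first hit is enough to mark j as absorbed
            }
        }
    }
    return parent;
}

unittest
{
    ptrdiff_t[] expected = [-1, 0, 1, -1];
    assert(findParents(["cba", "ba", "a", "xyz"]) == expected);
}
```

Whether this is faster on the real dataset would have to be measured; strings with no parent at all still get compared against every longer string.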
Re: 200-600x slower Dlang performance with nested foreach loop
On Tuesday, 26 January 2021 at 23:57:43 UTC, methonash wrote: clip That nested loop is an O(n^2) algorithm. Meaning it will slow down *very* quickly as the size of the array n increases. You might want to think about how to improve this algorithm. Nice observation, and yes, this would typically be an O(n^2) approach. However, due to subsetting the input dataset to unique strings and then sorting in descending length, one might notice that the inner foreach loop does not iterate over all of n, only on the iterator value i+1 through the end of the array. Thus, I believe this would then become approximately O(n^2/2). More precisely, it should be O( ( n^2 + n ) / 2 ). But that is still O(n^2), you've only changed the constant.
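The point about the constant factor can be checked by simply counting how often the inner loop body runs. The sketch below mirrors the loop bounds of the posted program (`i + 1 .. n` inside `0 .. n`); `innerIterations` is a made-up helper. The exact count for these bounds is n(n-1)/2, which differs slightly from the figure quoted above but is of the same order, n^2.

```d
/// Counts how many (i, j) pairs the nested loops visit for n items.
size_t innerIterations(size_t n)
{
    size_t count;
    foreach (i; 0 .. n)
        foreach (j; i + 1 .. n)
            ++count;
    return count;
}

unittest
{
    assert(innerIterations(10) == 45);                 // 10 * 9 / 2
    assert(innerIterations(1000) == 1000 * 999 / 2);   // 499_500
    // Doubling n roughly quadruples the work: the hallmark of O(n^2).
    assert(innerIterations(2000) == 1_999_000);
}
```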
How to profile compile times of a source code?
I'm looking for ways to figure out what parts of the code slow down the compiler other than brute force trial. Can I use the -vtemplates switch for this? Would the -v (verbose) switch be helpful in some way? How would I know if my bottleneck is ctfe or templates? How do the compiler devs approach this issue? I'm interested in all kinds of tricks to the point of debugging the compiler itself although anything less complicated would be appreciated.
Re: Why filling AA in shared library freezes execution?
On Friday, 29 January 2021 at 01:23:20 UTC, Siemargl wrote:
On Friday, 29 January 2021 at 00:45:12 UTC, Siemargl wrote:
Then I modified the program, just removing the DLL and copying TestFun() into the main module, and it runs. Same compiler, -m64 target.

Oops. Sorry, I just forgot to copy test_dll.dll inside the VM :-) So, the program runs in Win7, but hangs after printing i:64511. I downgraded DMD to 2.090.1 + MSVC2013 libs and the problem disappears. But 2.092 + MSVC2013 libs also hangs. Not every time, but .

Thank you, Siemargl! It's just the same behaviour that I got. The same number 64511. If you change double[int] to double[], the number would be around ~520.000, if int[] then ~1.000.000. I conclude that it has something to do with a memory limit of 4 MB.
Re: Why filling AA in shared library freezes execution?
On Saturday, 30 January 2021 at 19:52:09 UTC, Vitalii wrote:
On Friday, 29 January 2021 at 01:23:20 UTC, Siemargl wrote:
On Friday, 29 January 2021 at 00:45:12 UTC, Siemargl wrote:
Then I modified the program, just removing the DLL and copying TestFun() into the main module, and it runs. Same compiler, -m64 target.

Oops. Sorry, I just forgot to copy test_dll.dll inside the VM :-) So, the program runs in Win7, but hangs after printing i:64511. I downgraded DMD to 2.090.1 + MSVC2013 libs and the problem disappears. But 2.092 + MSVC2013 libs also hangs. Not every time, but .

Thank you, Siemargl! It's just the same behaviour that I got. The same number 64511. If you change double[int] to double[], the number would be around ~520.000, if int[] then ~1.000.000. I conclude that it has something to do with a memory limit of 4 MB.

No, this is a deadlock in the memory manager. To find the root of the problem, a debug version of druntime is needed, but I was unsuccessful in compiling it.
Re: Minimize GC memory footprint
On Saturday, 30 January 2021 at 16:42:35 UTC, frame wrote: Is there a way to force the GC to re-use memory in already existing pools? I set maxPoolSize:1 to gain pools that can be released more quickly after they are no longer in use. This already reduces memory usage to 1:3. Sadly the application creates multiple pools that are not necessary in my POV - just fragmented temporary slice data like from format(). What can I do to optimize? Do you want to optimize for reduced memory usage?