Re: Strange memory corruption / codegen bug?
On Sunday, 11 December 2016 at 11:58:39 UTC, ag0aep6g wrote: Try putting an `assert(childCrossPoint !is otherCrossPoint);` before the assignment. If it fails, the variables refer to the same node. That would explain how otherCrossPoint.left gets set. Furthermore, I think he is calling breed on a Tree with itself. i.e. assert(other !is this) would be a more reliable test since it won't be subject to randomness.
Re: Will the GC scan this pointer?
On Sunday, 24 April 2016 at 11:03:11 UTC, Lass Safin wrote: So the question is: Will the GC scan ptr? As you can see, it is a write-only pointer, so reading from it will cause undefined behavior (such as returning data which merely looks like a pointer), and can potentially be really slow. The GC will see that ptr doesn't point to memory managed by the GC and move on. Do I have to mark it with NO_SCAN each time I call glMapNamedBufferRange? No, calling setAttr on memory not managed by the GC will do nothing.
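One way to see this from code is GC.addrOf, which returns null for addresses the GC does not manage (a small sketch):

```d
import core.memory : GC;

void main()
{
    auto gcPtr = new int;  // allocated by the GC
    int stackVar;          // not GC memory

    assert(GC.addrOf(gcPtr) !is null);    // the GC manages this block
    assert(GC.addrOf(&stackVar) is null); // unknown to the GC: it moves on
}
```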
Re: Possible to write a classic fizzbuzz example using a UFCS chain?
Just for fun:

// map, join, text, iota, writeln, tuple
import std.algorithm, std.array, std.conv, std.range, std.stdio, std.typecons;

void main()
{
    iota(1, 100)
        .map!(a => tuple(a, a % 3 == 0 ? 0 : 4, a % 5 == 0 ? 8 : 4))
        .map!(a => a[1] == a[2] ? a[0].text : "fizzbuzz"[a[1] .. a[2]])
        .join(", ")
        .writeln;
}
Re: Parallelization of a large array
On Tuesday, 10 March 2015 at 20:41:14 UTC, Dennis Ritchie wrote: Hi. How to parallelize a large array to check for the presence of an element matching the value with the data? Here's a simple method (warning: has pitfalls):

import std.stdio;
import std.parallelism;

void main()
{
    int[] a = new int[100];
    foreach (i, ref elem; a)
        elem = cast(int)i;
    bool found;
    foreach (elem; a.parallel)
        if (elem == 895639)
            found = true;
    if (found)
        writeln("Yes");
    else
        writeln("No");
}
Re: Purity not enforced for default arguments?
On Tuesday, 10 March 2015 at 21:56:39 UTC, Xinok wrote: I'm inclined to believe this is a bug. https://issues.dlang.org/show_bug.cgi?id=11048
Re: Strange behavior of the function find() and remove()
On Sunday, 8 March 2015 at 21:34:25 UTC, Dennis Ritchie wrote: This is normal behavior? Yes, it is normal; there are two potential points of confusion:
- remove mutates the input range and returns a shortened slice of the range which excludes the removed element.
- remove takes an index as its second argument, not an element.
For more information see: https://issues.dlang.org/show_bug.cgi?id=10959
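A minimal sketch of both points, using std.algorithm.remove:

```d
import std.algorithm : remove;

void main()
{
    int[] a = [1, 2, 3];
    auto b = a.remove(1);   // removes the element at *index* 1 (the value 2)
    assert(b == [1, 3]);    // the returned slice excludes the removed element
    assert(a == [1, 3, 3]); // the input was mutated in place, not shortened
}
```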
Re: why GC not work?
On Sunday, 8 February 2015 at 16:23:44 UTC, FG wrote: 2. auto buf = new byte[](1024*1024*100); now the gc can't free this buf. can i free it by manual? Yes. import core.memory; GC.free(buf.ptr); // and don't use buf afterwards That won't work, see: http://forum.dlang.org/thread/uankmwjejsitmlmrb...@forum.dlang.org
Re: why GC not work?
On Sunday, 8 February 2015 at 18:43:18 UTC, FG wrote: On 2015-02-08 at 19:15, safety0ff wrote: That won't work, see: http://forum.dlang.org/thread/uankmwjejsitmlmrb...@forum.dlang.org Perhaps it was fixed in DMD 2.066.1, because this works for me just fine: Here's the link I couldn't find earlier: https://issues.dlang.org/show_bug.cgi?id=14134
Re: why GC not work?
False pointers, current GC is not precise.
Re: Allocating aligned memory blocks?
On Friday, 12 December 2014 at 06:17:56 UTC, H. S. Teoh via Digitalmars-d-learn wrote: Is there a way to allocate GC memory blocks in D that are guaranteed to fall on OS page boundaries? I don't know about guarantees, I think that in practice, if your OS page size is 4096, any GC allocation of 4096 or greater will be page aligned. should I just forget the GC and just use posix_memalign() manually? I think it may be possible to do what you want with mmap/munmap alone (selectively map parts of the file to memory.)
Re: naked popcnt function
On Saturday, 22 November 2014 at 18:30:06 UTC, Ad wrote: Hello, I would like to write a popcnt function. This works fine:

ulong popcnt(ulong x)
{
    asm
    {
        mov RAX, x;
        popcnt RAX, RAX;
    }
}

However, if I add the naked keyword (which should improve performance?) it doesn't work anymore and I can't figure out what change I am supposed to make (aside from x[RBP] instead of x). This function is going to be *heavily* used. Thanks for any help. Last time I used naked asm, I simply used the calling convention to figure out the location of the parameter (e.g. RCX on Win64, RDI on Linux x86-64, IIRC.) N.B. on LDC and GDC there is an intrinsic for popcnt.
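A hedged sketch of what the naked version might look like on Linux x86-64 (System V ABI: first integer argument in RDI, return value in RAX); on Win64 the argument would arrive in RCX instead. This is untested and assumes DMD-style inline asm with the popcnt mnemonic available:

```d
ulong popcnt(ulong x)
{
    asm
    {
        naked;           // no prologue/epilogue is generated for us
        popcnt RAX, RDI; // System V AMD64: first integer argument is in RDI
        ret;             // the result is returned in RAX
    }
}
```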
Re: new(malloc) locks everything in multithreading
On Friday, 24 October 2014 at 02:51:20 UTC, tcak wrote: I don't want to blame dmd directly because as far as I see from the search I did with __lll_lock_wait_private, some C++ programs are having same problem with malloc operation as well. But still, can this be because of compiler? Looks like bug #11981 [1], which should be fixed in the latest versions of the compiler. Which version are you using? [1] https://issues.dlang.org/show_bug.cgi?id=11981
Re: Global const variables
On Tuesday, 21 October 2014 at 08:25:07 UTC, bearophile wrote: Minas Mina: Aren't pure functions supposed to return the same result every time? If yes, it is correct to not accept it. But how can main() not be pure? Or, how can't the 'a' array be immutable? Bye, bearophile There can exist a mutable reference to a's underlying memory: const int[] a; int[] b; static this() { b = [1]; a = b; }
Re: How would you dive into a big codebase
On Wednesday, 22 October 2014 at 01:21:19 UTC, Freddy wrote: Is there any advice/tips for reading medium/big D codebases? Somewhat D specific: I would consider an IDE/editor like Eclipse with DDT that can give an outline of the data structures functions names in a source file to make the files easier to digest.
Re: A significant performance difference
The following D code runs over 2x faster than the C++ code (comparing dmd with no options to g++ with no options.) It's not a fair comparison because it changes the order of operations.

import core.stdc.stdio;

const uint H = 9, W = 12;
const uint[3][6] g = [[7, 0, H - 3],
                      [1 + (1 << H) + (1 << (2 * H)), 0, H - 1],
                      [3 + (1 << H), 0, H - 2],
                      [3 + (2 << H), 0, H - 2],
                      [1 + (1 << H) + (2 << H), 0, H - 2],
                      [1 + (1 << H) + (1 << (H - 1)), 1, H - 1]];

int main()
{
    ulong p, i, k;
    ulong[uint] x, y;
    uint l;
    x[0] = 1;
    for (i = 0; i < W; ++i)
    {
        y = null;
        while (x.length)
            foreach (j; x.keys)
            {
                p = x[j];
                x.remove(j);
                for (k = 0; k < H; ++k)
                    if ((j & (1 << k)) == 0)
                        break;
                if (k == H)
                    y[j >> H] += p;
                else
                    for (l = 0; l < 6; ++l)
                        if (k >= g[l][1] && k <= g[l][2])
                            if ((j & (g[l][0] << k)) == 0)
                                x[j + (g[l][0] << k)] += p;
            }
        x = y;
    }
    printf("%lld\n", y[0]);
    return 0;
}
Re: D daemon GC?
On Saturday, 30 August 2014 at 17:09:41 UTC, JD wrote: Hi all, I tried to write a Linux daemon in D 2.065 (by translating one in C we use at work). My basic skeleton works well. But as soon as I start allocating memory it crashed with several 'core.exception.InvalidMemoryOperationError's. It works for me with 2.066, I do not have 2.065 installed at the moment to see if it fails on 2.065.
Re: Appender is ... slow
IIRC it manages the capacity information manually instead of calling the runtime which reduces appending overhead.
Re: What hashing algorithm is used for the D implementation of associative arrays?
On Thursday, 14 August 2014 at 13:10:58 UTC, bearophile wrote: D AAs used to be not vulnerable to collision attacks because they resolved collisions building a red-black tree for each bucket. Later buckets became linked lists for speed, Slight corrections: It was effectively a randomized BST; it used the hash value + comparison function to place the elements in the tree. E.g. the AA's node comparison function might be:

if (hash == other.hash)
    return value.opCmp(other.value);
else if (hash < other.hash)
    return -1;
return 1;

The hash function has a significant influence on how balanced the BST will be. Insertion and removal order also influence performance, since rebalancing was only done when growing the AA. It had no performance guarantees. I believe it was removed to reduce memory consumption, see the Mar 19 2010 cluster of commits by Walter Bright to aaA.d. Since the GC rounds up allocations to powers of two for small objects, the additional pointer doubles the allocation size per node. A template library based AA implementation should be able to handily outperform built-in AAs and provide guarantees. Furthermore, improved memory management could be a significant win. Fun fact: The AA implementation within DMD still uses the randomized BST, though the hash functions are very rudimentary.
Re: A little of coordination for Rosettacode
On Tuesday, 12 February 2013 at 01:07:35 UTC, bearophile wrote: In practice at the moment I am maintaining all the D entries of Rosettacode. Here's a candidate for http://rosettacode.org/wiki/Extensible_prime_generator#D in case it is preferred to the existing entry: http://dpaste.dzfl.pl/43735da3f1d1
Re: Haskell calling D code through the FFI
On Tuesday, 5 August 2014 at 23:23:43 UTC, Jon wrote: So that does indeed solve some of the problems. However, using this method, when linking I get two errors, undefined reference rt_init() and rt_term() I had just put these methods in the header file. If I put wrappers around these functions and export I get the rt_init, rt_term is private. It works for me, here are the main parts of my Makefile:

DC = ~/bin/dmd

main: Main.hs FunctionsInD.a
	ghc -o main Main.hs FunctionsInD.a ~/lib/libphobos2.a -lpthread

FunctionsInD.a: FunctionsInD.d
	$(DC) -c -lib FunctionsInD.d

I passed in the phobos object directly because I don't know how to specify the ~/lib directory on the ghc command line.
Re: Haskell calling D code through the FFI
Don't forget to call rt_init: http://dlang.org/phobos/core_runtime.html#.rt_init
Re: Haskell calling D code through the FFI
On Monday, 4 August 2014 at 21:14:17 UTC, Jon wrote: On Monday, 4 August 2014 at 21:10:46 UTC, safety0ff wrote: Don't forget to call rt_init: http://dlang.org/phobos/core_runtime.html#.rt_init Where/when should I call this? Before calling any D functions, but usually it's simplest to call it early in main. It initializes the GC and notifies the D runtime of its existence. For simple D functions you might get away without calling it.
Re: Haskell calling D code through the FFI
On Monday, 4 August 2014 at 21:35:21 UTC, Jon wrote: I get Error: core.runtime.rt_init is private. And Error: core.runtime.init is not accessible. I would add them to the header and Haskell wrapper (FunctionsInD.h and ToD.hs.) The signatures are: int rt_init(); int rt_term(); When it is linked it will find the symbols in druntime.
Re: Threadpools, difference between DMD and LDC
On Sunday, 3 August 2014 at 19:52:42 UTC, Philippe Sigaud wrote: Can someone confirm the results and tell me what I'm doing wrong? LDC is likely optimizing the summation: int sum = 0; foreach(i; 0..task.goal) sum += i; To something like: int sum = cast(int)(cast(ulong)(task.goal-1)*task.goal/2);
Re: unittest affects next unittest
On Friday, 1 August 2014 at 23:09:39 UTC, sigod wrote: Code: http://dpaste.dzfl.pl/51bd62138854 (It was reduced by DustMite.) Have I missed something about structs? Or this simply a bug? Isn't this the same mistake as: http://forum.dlang.org/thread/muqgqidlrpoxedhyu...@forum.dlang.org#post-mpcwwjuaxpvwiumlyqls:40forum.dlang.org In other words: private Node * _root = new Node(); looks wrong.
Re: A little of coordination for Rosettacode
On Tuesday, 12 February 2013 at 01:07:35 UTC, bearophile wrote: In practice at the moment I am maintaining all the D entries of Rosettacode. I modified the Hamming numbers code in a personal exercise. It now uses considerably less memory but is slower. I've posted the code here in case it is of use: http://dpaste.dzfl.pl/3990023e5577 For a single n, n = 350_000_000: Alternative version 2: 13.4s and ~5480 MB of ram My code: 21s and ~74 MB of ram Regards.
Re: Segfault games with factorials
On Thursday, 24 July 2014 at 14:59:16 UTC, Darren wrote: It does seem that's the case. Which is odd, as I thought that DMD and LDC did TCO. Not in this case, obviously. DMD doesn't do it with the ?: operator: https://issues.dlang.org/show_bug.cgi?id=3713
Re: String to int exception
On Tuesday, 15 July 2014 at 12:24:48 UTC, Alexandre wrote: Thanks, but, when I convert I recive a 'c' in the front of my number... uint reverseBytes(uint val) { import core.bitop : bitswap; return bitswap(val); } You confused bswap with bitswap. The former reverses bytes, the latter reverses bits. If you look at bearophile's original message he says bswap.
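A small demonstration of the difference (both functions live in core.bitop):

```d
import core.bitop : bswap, bitswap;

void main()
{
    assert(bswap(0x12345678) == 0x78563412);   // reverses the 4 *bytes*
    assert(bitswap(0x00000001) == 0x80000000); // reverses all 32 *bits*
}
```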
Re: What am I doing Wrong (OpenGL SDL)
On Friday, 4 July 2014 at 09:39:49 UTC, Sean Campbell wrote: On Friday, 4 July 2014 at 08:02:59 UTC, Misu wrote: Can you try to add DerelictGL3.reload(); after SDL_GL_CreateContext ? yes this solved the problem. however why? is it a problem with the SDL binding? No. https://github.com/DerelictOrg/DerelictGL3/blob/master/README.md
CTFE bug or enhancement?
Everything compiles fine except for function qux2: http://dpaste.dzfl.pl/9d9187e0b450 Is this a bug or an enhancement for CTFE? It would be really nice to have this feature because core.simd has functions such as: void16 __simd(XMM opcode, void16 op1, void16 op2, ubyte imm8); Where all the arguments must be compile time constants. It would be nice to be able to push some parameters out from the type list and into the argument list in user code too.
Re: CTFE bug or enhancement?
Actually, this is an enhancement because adding: enum b = blah Makes them fail. :(
Re: CTFE bug or enhancement?
On Thursday, 3 July 2014 at 01:55:14 UTC, safety0ff wrote: Actually, this is an enhancement because adding: enum b = blah Makes them fail. :( The question is now: how can the delegate be evaluated for the return value but not for the enum?
Re: CTFE bug or enhancement?
On Thursday, 3 July 2014 at 02:02:19 UTC, safety0ff wrote: On Thursday, 3 July 2014 at 01:55:14 UTC, safety0ff wrote: Actually, this is an enhancement because adding: enum b = blah Makes them fail. :( The question is now: how can the delegate be evaluated for the return value but not for the enum? Looks like an ICE: https://github.com/D-Programming-Language/dmd/blob/master/src/interpret.c#L5169
Re: GC.calloc(), then what?
On Friday, 27 June 2014 at 23:26:55 UTC, Ali Çehreli wrote: I appreciated your answers, which were very helpful. What I meant was, I was partially enlightened but still had some questions. I am in much better shape now. :) Yea, I understood what you meant. :)
Re: GC.calloc(), then what?
On Friday, 27 June 2014 at 07:03:28 UTC, Ali Çehreli wrote: 1) After allocating memory by GC.calloc() to place objects on it, what else should one do? Use std.conv.emplace. In what situations does one need to call addRoot() or addRange()? addRoot creates an internal reference within the GC to the memory pointed to by the argument (void* p.) This pins the memory so that it won't be collected by the GC. E.g. you're going to pass a string to an extern(C) function, and the function will store a pointer to the string within its own data structures. Since the GC won't have access to those data structures, you must addRoot it to avoid creating a dangling pointer in the C data structure. addRange is usually for cases when you use stdc.stdlib.malloc/calloc and place pointers to GC-managed memory within that memory. This allows the GC to scan that memory for pointers during collection; otherwise it may reclaim memory which is pointed to by malloc'd memory. 2) Does the answer to the previous question differ for struct objects versus class objects? No. 3) Is there a difference between core.stdc.stdlib.calloc() and GC.calloc() in that regard? Which one to use in what situation? One is GC-managed, the other is not. calloc simply means the memory is pre-zero'd; it has nothing to do with allocation in the C language. 4) Are the random bit patterns in a malloc()'ed memory always a concern for false pointers? Does that become a concern after calling addRoot() or addRange()? If by malloc you're talking about stdc.stdlib.malloc then: It only becomes a concern after you call addRange, and the false-pointer potential is only present within the range you gave to addRange. So if you over-allocate using malloc and give the entire memory range to addRange, then any false pointers in the uninitialized portion become a concern. If you're talking about GC.malloc(): Currently the GC zeros the memory unless you allocate NO_SCAN memory, so it only differs in the NO_SCAN case.
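A minimal sketch of the two cases described above (the GC calls are the real druntime API; the surrounding scenario is illustrative):

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

void main()
{
    // Case 1: GC memory that will only be referenced from C data structures.
    char[] msg = "hello".dup;
    GC.addRoot(msg.ptr);    // pin it so the GC won't collect it
    // ... hand msg.ptr to a C library that stores it ...
    GC.removeRoot(msg.ptr); // once the C side no longer holds the pointer

    // Case 2: malloc'd memory that stores pointers into the GC heap.
    void** slot = cast(void**) malloc((void*).sizeof);
    GC.addRange(slot, (void*).sizeof); // let collections scan this block
    *slot = msg.ptr;                   // GC memory referenced from C memory
    GC.removeRange(slot);              // remove before freeing the block
    free(slot);
}
```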
If so, why would anyone ever malloc() instead of always calloc()'ing? To save on redundant zero'ing.
Re: GC.calloc(), then what?
I realize that my answer isn't completely clear in some cases, if you still have questions, ask away.
Re: GC.calloc(), then what?
On Friday, 27 June 2014 at 08:17:07 UTC, Ali Çehreli wrote: Thank you for your responses. I am partly enlightened. :p I know you're a knowledgeable person in the D community; I may have stated many things you already knew, but I tried to answer the questions as-is. On 06/27/2014 12:34 AM, safety0ff wrote: Add range is usually for cases when you use stdc.stdlib.malloc/calloc and place pointers to GC managed memory within that memory. This allows the GC to scan that memory for pointers during collection, otherwise it may reclaim memory which is pointed to by malloc'd memory. One part that I don't understand in the documentation is if p points into a GC-managed memory block, addRange does not mark this block as live. [SNIP] See, that's confusing: What does that mean? I still hold the memory block anyway; what does the GC achieve by scanning my memory if it's not going to follow references anyway? The GC _will_ follow references (i.e. scan deeply); that's the whole point of addRange. What that documentation is saying is that: If you pass a range R through addRange, and R lies in the GC heap, then once there are no pointers (roots) to R, the GC will collect it anyway, regardless of the fact that you called addRange on it. In other words, prefer using addRoot for GC memory and addRange for non-GC memory. 4) Are the random bit patterns in a malloc()'ed memory always a concern for false pointers? Does that become a concern after calling addRoot() or addRange()? If by malloc you're talking about stdc.stdlib.malloc then: It only becomes a concern after you call addRange, But addRange doesn't seem to make sense for stdlib.malloc'ed memory, right? The reason is, that memory is not managed by the GC so there is no danger of losing that memory due to a collection anyway. It will go away only when I call stdlib.free. addRange almost exclusively makes sense with stdlib.malloc'ed memory. As you've stated: If you pass it GC memory it does not mark the block as live.
I believe the answer above clears things up: the GC does scan the range, and scanning is always deep (i.e. when it finds pointers to unmarked GC memory, it marks them.) Conversely, addRoot exclusively makes sense with GC memory. If you're talking about GC.malloc(): Currently the GC zeros the memory unless you allocate NO_SCAN memory, so it only differs in the NO_SCAN case. So, the GC's default behavior is to scan the memory, necessitating clearing the contents? That seems to make GC.malloc() behave the same as GC.calloc() by default, doesn't it? I don't believe it's necessary to clear it, it's just a measure against false pointers (AFAIK.) So, is this guideline right? GC.malloc() makes sense only with NO_SCAN. I wouldn't make a guideline like that, just say that: if you want the memory to be guaranteed to be zero'd use GC.calloc. However, due to GC internals (for preventing false pointers,) GC.malloc'd memory will often be zero'd anyway. If so, why would anyone ever malloc() instead of always calloc()'ing? To save on redundant zero'ing. And again, redundant zero'ing is saved only when used with NO_SCAN. Yup. I think I finally understand the main difference between stdlib.malloc and GC.malloc: The latter gets collected by the GC. Yup. Another question: Do GC.malloc'ed and GC.calloc'ed memory scanned deep? Yes, only NO_SCAN memory doesn't get scanned, everything else does.
Re: GC.calloc(), then what?
On Friday, 27 June 2014 at 08:17:07 UTC, Ali Çehreli wrote: So, the GC's default behavior is to scan the memory, necessitating clearing the contents? That seems to make GC.malloc() behave the same as GC.calloc() by default, doesn't it? Yes. compare: https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L543 to: https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L419
Re: GC.calloc(), then what?
On Friday, 27 June 2014 at 09:20:53 UTC, safety0ff wrote: Yes. compare: https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L543 to: https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L419 Actually, I just realized that I was wrong in saying the memory will likely be cleared by malloc: it's only the over-allocation that gets cleared.
Re: modulo Strangeness?
On Wednesday, 11 June 2014 at 22:32:45 UTC, Taylor Hillegeist wrote: I have a simpleish bit of code here that always seems to give me an error, and i can't figure out quite why. Modulo takes the sign of the dividend: http://en.wikipedia.org/wiki/Modulo_operation#Common_pitfalls It works with length because you introduce a signed-to-unsigned conversion.
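A tiny illustration of the dividend-sign rule:

```d
void main()
{
    // Like C, D's % takes the sign of the dividend (the left operand):
    assert(-7 % 3 == -1);
    assert( 7 % -3 == 1);
}
```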
Re: Hiding types
On Friday, 30 May 2014 at 19:50:43 UTC, Philippe Sigaud wrote: Am I misunderstanding something or is that a bug? Try: auto foo() { return Hidden();}
Re: Hiding types
On Friday, 30 May 2014 at 19:54:00 UTC, safety0ff wrote: On Friday, 30 May 2014 at 19:50:43 UTC, Philippe Sigaud wrote: Am I misunderstanding something or is that a bug? Try: auto foo() { return Hidden();} This is incorrect, please ignore.
Is this a bug or illegal code?
//*** CODE **
mixin("version = foo;");
version(foo)
{
    void main(){}
}
//** END CODE ***

If it's illegal in D, what is the reason and where is it documented? The reason I was considering such a construct is the following: Some C libraries have an associated config.h header that gets generated when the library is compiled. I was thinking it may be possible to parse these config.h files at compile time (using a string import) and convert some of the #defines into version = foo; It wouldn't do any favors for compile times, but it would save having an extra step in the build process to convert config.h to config.d.
Re: Is this a bug or illegal code?
On Thursday, 29 May 2014 at 15:02:48 UTC, Steven Schveighoffer wrote: Even if that is valid code, you are much better off using enums and static if. enum includeSomeFeature = ... static if(includeSomeFeature) { ... } These work much more like #defines, and can be seen outside the module. -Steve Thanks, the following works:

mixin("enum hasFoo = true;");
static if (hasFoo)
{
    void main(){}
}
Re: @safe @nogc memory allocation
I think malloc isn't @safe and alloca doesn't work if your function can throw.
Re: @safe @nogc memory allocation
On Wednesday, 28 May 2014 at 23:57:40 UTC, Dicebot wrote: I believe within current language semantics even considering `new` pure is broken, there was a very recent thread discussing it in the digitalmars.D group. If you can be sure that your code won't break basic sanity requirements (never comparing allocated immutable pointer identity, only pointed values) it should work fine. But I have never done it in my code and am not aware of possible pitfalls. You also have to make sure your calls to malloc won't be considered strongly pure and memoized. E.g.:

int* a = cast(int*)malloc(4); // 4 should be considered immutable
int* b = cast(int*)malloc(4); // a == b if memoized
                              // a != b otherwise (unless out of memory)

Perhaps the wrapper function should take a pointer reference as a parameter (note: not immutable); this also means that it can use type inference.
Re: naming a variable at runtime
You should look into associative arrays ( http://dlang.org/hash-map ). Example:

import std.stdio;

void main()
{
    int[][string] mybobs;
    mybobs["bob_1"] = [-1, -1, 1, -1, -1];
    mybobs["bob_2"] = [-1, 1, 1, 1, -1];
    mybobs["bob_3"] = [-1, 1, 1, 1, -1];
    writeln(mybobs);
}
Re: Need help with movement from C to D
On Monday, 5 May 2014 at 03:57:54 UTC, Andrey wrote: A similar D code is, as far as I know, type.field.offsetof Is there an any way to make a corresponding D template? What you've written is the specific syntax for offsetof in D. If the intent is to create a template so that you can simply find/replace offsetof(type,field) with offsetoftemplate!(type,field), then I think it is easier to create a sed script - better yet, a D program - for replacing the C macro with D code. Example program:

import std.array;
import std.file;
import std.regex;
import std.string;

int main(string[] args)
{
    if (args.length < 2)
        return -1;
    auto regex = ctRegex!(`offsetof\(([^,]+),([^)]+)\)`);
    auto sink = appender!(char[])();
    foreach (filename; args[1 .. $])
    {
        auto text = readText(filename);
        sink.reserve(text.length);
        replaceAllInto!(cap => cap[1].strip ~ "." ~ cap[2].strip ~ ".offsetof")(sink, text, regex);
        write(filename, sink.data);
        sink.clear();
    }
    return 0;
}
Re: Just-run-the-unittests
On Sunday, 16 March 2014 at 07:59:33 UTC, Sergei Nosov wrote: Hi! Suppose I have a small .d script that has a main. Is there any simple way to just run the unit tests without running main at all? Here's the first thing that came to mind, if you never want to run both the unit tests and the regular main:

code begins
import std.stdio;

version(unittest)
    void main(){}
else
    void main()
    {
        writeln("Hello world!");
    }

unittest
{
    writeln("Hello unit testing world!");
}
code ends

If you sometimes want to have your normal main with unit testing, you can replace version(unittest) with version(nopmain) or some other custom version identifier and compile with -version=nopmain when you want the dummy main.
Re: Scalability in std.parallelism
On Saturday, 22 February 2014 at 16:21:21 UTC, Nordlöw wrote: In the following test code given below of std.parallelism I get some interesting results: Don't forget that n.iota.map is returning a lazily evaluated range. std.parallelism might have to convert the lazy range to a random-access range (i.e. an array) before it can schedule the work. If I add .array after the map call (e.g. auto nums = n.iota.map!piTerm.array;) I get numbers closer to the ideal for test2. Now we compare the differences between test1 and test2: test1 is reducing doubles and test2 is reducing ints. I believe that the reason for the difference in speed-up is that you have hyper-threads and not truly independent threads. Hyper-threads can contend for shared resources in the CPU (e.g. cache and FPU.) On my computer, forcing nums to be a range of doubles in test2 causes the speed-up to drop to approximately the same as test1. Regards.
Re: std.parallelism + alloc = deadlock
Could be related to #10351 or #5488 in bugzilla.
Re: Red-Black tree storing color without additional memory requirements
On Wednesday, 20 November 2013 at 08:48:33 UTC, simendsjo wrote: But I would think this trick would break the GC, as well as making code less portable. Since the GC supports interior pointers, I think you can justify using the least significant bits, as long as the size and alignment of the pointed-to object guarantee that the pointer + tag will always lie inside the memory block.
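A hedged sketch of the idea (Node is illustrative; it assumes the node's alignment keeps the low bit of a real pointer clear, so the tagged value still lands inside the node's allocation and a GC with interior-pointer support keeps it alive):

```d
struct Node
{
    Node* left, right;
    int value;
}

// Store the red/black color bit in the least significant bit of a Node*.
size_t tag(Node* p, bool red) { return cast(size_t)p | (red ? 1 : 0); }
Node*  untag(size_t v) { return cast(Node*)(v & ~cast(size_t)1); }
bool   isRed(size_t v) { return (v & 1) != 0; }

void main()
{
    auto n = new Node;
    auto t = tag(n, true);
    assert(untag(t) is n); // the original pointer is recoverable
    assert(isRed(t));      // and so is the color
}
```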
Re: new Type[count] takes too much?
On Thursday, 31 October 2013 at 09:15:53 UTC, Namespace wrote: I'm sure we had already this conversation but I don't find the thread. T[] buffer = new T[N]; assumes more space than stated (in average 2010 elements more. See: http://dpaste.dzfl.pl/af92ad22c). It behaves exactly like reserve and that is IMO wrong. If I reserve memory with buffer.reserve(N), I want to have at least N elements. That behaviour is correct. But if I use new T[N] I mostly want exactly N elements and no extra space. Thoughts? To me it looks like it is derived directly from the way the GC allocates chunks: the next power of two if less than 4096, otherwise some multiple of 4096. Unless you modify the GC, this behaviour is present whether you can see it or not (http://dpaste.dzfl.pl/5481ffc2 .)
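A small illustration of the rounding (exact numbers depend on the GC's bucket sizes, so none are claimed here):

```d
import std.stdio;

void main()
{
    int[] buf = new int[100]; // requests 400 bytes
    // The GC rounds the block up to its next chunk size, so the array
    // can usually grow in place beyond the requested length:
    assert(buf.capacity >= buf.length);
    writeln(buf.capacity);
}
```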
Is this a bug, if so, how would you summarize it?
See this code: http://dpaste.dzfl.pl/b3ae1667 On DMD it gives the error message if version=bug, but not if version=bug AND version=workaround1 (through 6). On LDC it segfaults at run time if version=bug, but not if version=bug AND version=workaround1 (through 6). Workaround1 - workaround4 are more or less boring. However it is surprising that workaround5 and workaround6 work but in the bug (when there's two aliases) you then get an error message for each alias.
Re: std.parallelism amap not scaling?
On Tuesday, 8 October 2013 at 10:54:08 UTC, JR wrote: On Monday, 7 October 2013 at 21:13:53 UTC, safety0ff wrote: I think I've found the culprit: memory management / the GC. Disabling the GC caused the program to eat up all my memory. I'll have to look into this later. From what I've gathered from http://forum.dlang.org/thread/dbeliopehpsncrckd...@forum.dlang.org, your use of enum makes it copy (and allocate) those variables on each access. Quoth Dmitry Olshansky in that thread (with its slightly different context); And the answer is - don't use ENUM with ctRegex. The problem is that ctRegex returns you a pack of data structures (=arrays). Using them with enum makes it behave as if you pasted them as array literals, and these do allocate each time. Merely replacing all occurrences of enum with immutable seems to make a world of difference. I benched your main.d a bit on this laptop (also i7, so 4 real cores + HT); http://dpaste.dzfl.pl/a4ecc84f4 Note that inlining slows it down. I didn't verify its output, but if those numbers are true then ldmd2 -O -release -noboundscheck is a beast. Thank you for responding! I went ahead and stubbed the GC as per: http://forum.dlang.org/thread/fbjeivugntvudgopy...@forum.dlang.org and ended up coming to the same thread/conclusion. Enums creating hidden allocations are evil. :(
std.parallelism amap not scaling?
Hello, I tried converting a C++/Go ray tracing benchmark [1] to D [2]. I tried using std.parallelism amap to implement parallelism, but it does not seem to scale in the manner I expect. By running the program with different numbers of threads in the thread pool (core i7 Sandy Bridge, 4 cores + HT), I got these results:

Threads             1       2       3       4       5       6       7       8
Real time (s)       34.14   26.894  21.293  20.184  19.998  25.977  34.15   39.404
User time (s)       62.84   65.182  65.895  70.851  78.521  111.012 157.448 173.074
System time (s)     0.27    0.562   1.276   1.596   2.178   4.008   6.588   8.652
System calls        155808  224084  291634  403496  404161  360065  360065  258661
System call errors  39643   80245   99000   147487  155605  142922  142922  108454

I got these measurements using the latest DMD/druntime/phobos, compiled with -O -release -inline -noboundscheck. I used time and strace -c to measure, e.g.:

time ./main -h=256 -w=256 -t=7 > /dev/null
strace -c ./main -h=256 -w=256 -t=7 > /dev/null

What I also noticed in the task manager is that no matter what I did, I could not get the utilization to go anywhere close to 99% (unlike the C++ program in [1].) My interpretation of these results is that std.parallelism.amap does significant communication between threads which causes issues with scaling.

[1] https://github.com/kid0m4n/rays
[2] https://github.com/Safety0ff/rays/tree/master/drays
Re: std.parallelism amap not scaling?
Ok, well I rewrote the amap parallelism into spawning/joining threads and the results are similar, except with notably fewer system calls (specifically, fewer futex calls.)
Re: std.parallelism amap not scaling?
I think I've found the culprit: memory management / the GC. Disabling the GC caused the program to eat up all my memory. I'll have to look into this later.