Re: GC question
On Saturday, 4 February 2017 at 15:23:53 UTC, Adam D. Ruppe wrote: On Saturday, 4 February 2017 at 12:56:55 UTC, osa1 wrote: - Automatic but conservative. Can leak at any time. All GCs are prone to leak, including precise ones. The point of garbage collection is not to prevent leaks, but rather to prevent use-after-free bugs. No, the main point of GC is to prevent leaks in the case where you have circular references. Precise GCs don't leak, by definition. If the object is reachable then it isn't a leak. Now, you might claim that objects that provably won't be touched again should be classified as dead and freed and that this is a bug that exhibit the same behaviour as a leak (running out of memory). But it's really nothing like the leaks you experience with manual memory management (e.g. circular references preventing memory from being released in a reference counting management scheme)
Re: GC question
On Sunday, 5 February 2017 at 04:22:30 UTC, rikki cattermole wrote: On 05/02/2017 5:02 PM, thedeemon wrote: snip It may look so from a distance. But in my experience it's not that bad. In most software I did in D it did not matter really (it's either 64-bit or short lived programs) and the control D gives to choose how to deal with everything makes it all quite manageable, I can decide what to take from both worlds and hence pick the best, not the worst. The best of both worlds can be done quite simply. Instead of a chain of input ranges like: int[] data = input.filter!"a != 7".map!"a * 2".array; Use: int[] data; data.length = input.length; size_t i; foreach(v; input.filter!"a != 7".map!"a * 2") { data[i] = v; i++; } data.length = i; Of course this is dirt simple example, but instead look at it for e.g. a csv parser with some complex data structure creation + manipulation. I have some real world code here[0] that uses it. Not only is there less allocations and uses the GC but also it ends up being significantly faster! [0] https://gist.github.com/rikkimax/42c3dfa6500155c5e441cbb1437142ea#file-reports-d-L124 Some data to weigh that in order to compare different memory management strategies on that simple case: #!/usr/bin/env rdmd import std.conv; import std.stdio; import std.array; import std.range; import std.algorithm; auto input = [1, 2, 7, 3, 7, 8, 8, 9, 7, 1, 0]; void naive() { int[] data = input.filter!(a => a!= 7).map!(a => a*2).array; assert(data == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string); } void maxReallocs() { int[] data; size_t i; foreach(v ; input.filter!(a => a!=7).map!(a => a*2)) { data ~= v; } assert(data == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string); } void betterOfTwoWorlds() { int[] data; data.length = input.length; size_t i; foreach(v ; input.filter!(a => a!=7).map!(a => a*2)) { data[i] = v; i++; } data.length = i; assert(data == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string); } void explicitNew() { int[] data = new int[input.length]; scope(exit) delete data; size_t i; foreach(v ; input.filter!(a => a!=7).map!(a => a*2)) { data[i] = v; i++; } data.length = i; assert(data == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string); } void cStyle() @nogc { import std.c.stdlib; int* data = cast(int*)malloc(input.length * int.sizeof); scope(exit) free(data); size_t i; foreach(v ; input.filter!(a => a!=7).map!(a => a*2)) { data[i++] = v; } debug assert(data[0..i] == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string); } void onTheStack() @nogc { int[100] data; size_t i; foreach(v ; input.filter!(a => a!=7).map!(a => a*2)) { data[i++] = v; } debug assert(data[0..i] == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string); } void main(string[] args) { import std.datetime; benchmark!( naive, maxReallocs, betterOfTwoWorlds, explicitNew, cStyle, onTheStack )(10).each!writeln; } /* Results: Compiled with dmd -profile=gc test.d TickDuration(385731143) // naive, TickDuration(575673615) // maxReallocs, TickDuration(255928562) // betterOfTwoWorlds, TickDuration(270497154) // explicitNew, TickDuration(97596363) // cStyle, TickDuration(96467459) // onTheStack GC usage: bytes allocated, allocations, type, function, file:line 1760 10 int[] test.explicitNew test.d:43 440 10 int[] test.betterOfTwoWorlds test.d:30 320 80 int[] test.maxReallocs test.d:22 320 10 int[] test.maxReallocs test.d:25 320 10 int[] test.explicitNew test.d:51 320 10 int[] test.explicitNew test.d:53 320 10 int[] test.betterOfTwoWorlds test.d:37 320 10 int[] test.betterOfTwoWorlds test.d:39 320 10 std.array.Appender!(int[]).Appender.Data std.array.Appender!(int[]).Appender.this /usr/include/dlang/dmd/std/array.d:2675 320 10 int[] test.naive test.d:14 Compiled with dmd -O -inline test.d === TickDuration(159383005) // naive, TickDuration(187192137) // maxReallocs, TickDuration(94094585) // betterOfTwoWorlds, TickDuration(102374657) // explicitNew, TickDuration(41801695) // cStyle, TickDuration(45613954) // onTheStack Compiled with dmd -O -inline -release -boundscheck=off test.d = TickDuration(152151439) // naive, TickDuration(140870515) // maxReallocs, TickDuration(46740440) // betterOfTwoWorlds, TickDuration(59089016) // explicitNew, TickDuration(26038060) // cStyle, TickDuration(25984371) // onTheStack */
Re: GC question
On 05/02/2017 5:02 PM, thedeemon wrote: snip It may look so from a distance. But in my experience it's not that bad. In most software I did in D it did not matter really (it's either 64-bit or short lived programs) and the control D gives to choose how to deal with everything makes it all quite manageable, I can decide what to take from both worlds and hence pick the best, not the worst. The best of both worlds can be done quite simply. Instead of a chain of input ranges like: int[] data = input.filter!"a != 7".map!"a * 2".array; Use: int[] data; data.length = input.length; size_t i; foreach(v; input.filter!"a != 7".map!"a * 2") { data[i] = v; i++; } data.length = i; Of course this is dirt simple example, but instead look at it for e.g. a csv parser with some complex data structure creation + manipulation. I have some real world code here[0] that uses it. Not only is there less allocations and uses the GC but also it ends up being significantly faster! [0] https://gist.github.com/rikkimax/42c3dfa6500155c5e441cbb1437142ea#file-reports-d-L124
Re: GC question
On Saturday, 4 February 2017 at 12:56:55 UTC, osa1 wrote: - Automatic but conservative. Can leak at any time. You have to implement manual management (managed heaps) to avoid leaks. Leaks are hard to find as any heap value may be causing it. By "managed heap" I just meant the GC heap, the one used by "new" operator. Besides it, there are already other allocators and container libraries available, don't need to implement this stuff manually. the worst of both worlds. It may look so from a distance. But in my experience it's not that bad. In most software I did in D it did not matter really (it's either 64-bit or short lived programs) and the control D gives to choose how to deal with everything makes it all quite manageable, I can decide what to take from both worlds and hence pick the best, not the worst.
Re: GC question
All GCs are prone to leak, including precise ones. The point of garbage collection is not to prevent leaks, but rather to prevent use-after-free bugs. Of course I can have leaks in a GC environment, but having non-deterministic leaks is another thing, and I'd rather make sure to delete my references to let GC do its thing than to pray and hope some random number on my stack won't be in the range of my heap. I don't agree that the point is just preventing use-after-free, which can be guaranteed statically even in a non-GC language (see e.g. Rust).
Re: GC question
On Saturday, 4 February 2017 at 12:56:55 UTC, osa1 wrote: - Automatic but conservative. Can leak at any time. All GCs are prone to leak, including precise ones. The point of garbage collection is not to prevent leaks, but rather to prevent use-after-free bugs. Granted, the D 32 bit GC is more prone to leak than most others (including D 64 bit), but this isn't as horrible as you're believing, it does it's *main* job pretty well just at the cost of higher memory consumption, which we can often afford. And if you can't, manual management of large arrays tends to be relatively simple anyway. For example, my png.d used to leak something nasty in 32 bit because it used GC-allocated large temporary buffers while decompressing images. But, since they were temporary buffers, it was really easy to just `scope(exit) free(buffer);` after allocating to let them be freed at the end of the function. Then the memory consumption cut in half.
Re: GC question
On Saturday, 4 February 2017 at 12:56:55 UTC, osa1 wrote: I'm surprised that D was able to come this far with this. It's used mostly for server software. Things are moving to 64 bit, so this will be less of an issue.
Re: GC question
On Saturday, 4 February 2017 at 11:09:21 UTC, thedeemon wrote: On Wednesday, 1 February 2017 at 06:58:43 UTC, osa1 wrote: I'm wondering what are the implications of the fact that current GC is a Boehm-style conservative GC rather than a precise one, I've never worked with a conservative GC before. Are there any disallowed memory operations? Can I break things by not following some unchecked rules etc. ? How often does it leak? Do I need to be careful with some operations to avoid leaks? Here's some practical perspective from someone who released a 32-bit video processing app in D with thousands of users. When developing with GC in D you need to keep in mind 3 key things: 1) The GC will treat some random stack data as possible pointers, and some of those false pointers will accidentally point to some places in the heap, so for any object in GC heap there is a probability that GC will think it's alive (used) even when it's not, and this probability is directly proportional to the size of your object. 2) Each GC iteration scans the whole GC heap, so the larger your managed heap, the slower it gets. Main consequence of 1 and 2: don't store large objects (images, big file chunks etc.) in the GC heap, use other allocators for them. Leave GC heap just for the small litter. This way you practically don't leak and keep GC pauses short. 3) GC will call destructors (aka finalizers) for the objects that have them, and during the GC phase no allocations are allowed. Also, since you don't know in which order objects are collected, accessing other objects from a destructor is a bad idea, those objects might be collected already. Main consequence of 3: don't do silly things in destructor (like throwing exceptions or doing other operations that might allocate), try avoiding using the destructors at all, if possible. They may be used to ensure you release your resources, but don't make it the primary and only way to release them, since some objects might leak and their destructors won't be called at all. If you follow these principles, your app will be fine, it's not hard really. Honestly this still sounds horrible. I'd be OK with any of these two: - Percise GC, no manual management, no RAII or destructors etc. - Manual GC, RAII and destructors, smart pointers. but this: - Automatic but conservative. Can leak at any time. You have to implement manual management (managed heaps) to avoid leaks. Leaks are hard to find as any heap value may be causing it. is the worst of both worlds. I'm surprised that D was able to come this far with this.
Re: GC question
On Wednesday, 1 February 2017 at 06:58:43 UTC, osa1 wrote: I'm wondering what are the implications of the fact that current GC is a Boehm-style conservative GC rather than a precise one, I've never worked with a conservative GC before. Are there any disallowed memory operations? Can I break things by not following some unchecked rules etc. ? How often does it leak? Do I need to be careful with some operations to avoid leaks? Here's some practical perspective from someone who released a 32-bit video processing app in D with thousands of users. When developing with GC in D you need to keep in mind 3 key things: 1) The GC will treat some random stack data as possible pointers, and some of those false pointers will accidentally point to some places in the heap, so for any object in GC heap there is a probability that GC will think it's alive (used) even when it's not, and this probability is directly proportional to the size of your object. 2) Each GC iteration scans the whole GC heap, so the larger your managed heap, the slower it gets. Main consequence of 1 and 2: don't store large objects (images, big file chunks etc.) in the GC heap, use other allocators for them. Leave GC heap just for the small litter. This way you practically don't leak and keep GC pauses short. 3) GC will call destructors (aka finalizers) for the objects that have them, and during the GC phase no allocations are allowed. Also, since you don't know in which order objects are collected, accessing other objects from a destructor is a bad idea, those objects might be collected already. Main consequence of 3: don't do silly things in destructor (like throwing exceptions or doing other operations that might allocate), try avoiding using the destructors at all, if possible. They may be used to ensure you release your resources, but don't make it the primary and only way to release them, since some objects might leak and their destructors won't be called at all. If you follow these principles, your app will be fine, it's not hard really.
Re: GC question
On Friday, 3 February 2017 at 11:36:26 UTC, osa1 wrote: On Friday, 3 February 2017 at 10:49:00 UTC, Kagamin wrote: Leaks are likely in 32-bit processes and unlikely in 64-bit processes. See e.g. https://issues.dlang.org/show_bug.cgi?id=15723 This looks pretty bad. I think I'll consider something else until D's memory management story gets better. This is sad because the language otherwise looks quite good, and I'd love to try assertions, contracts, scope guards, macros etc. you can use less auto GC. use the RC to replace the GC. https://github.com/huntlabs/SmartRef
Re: GC question
On Friday, 3 February 2017 at 10:49:00 UTC, Kagamin wrote: Leaks are likely in 32-bit processes and unlikely in 64-bit processes. See e.g. https://issues.dlang.org/show_bug.cgi?id=15723 This looks pretty bad. I think I'll consider something else until D's memory management story gets better. This is sad because the language otherwise looks quite good, and I'd love to try assertions, contracts, scope guards, macros etc.
Re: GC question
On Wednesday, 1 February 2017 at 06:58:43 UTC, osa1 wrote: Are there any disallowed memory operations? Currently can't touch GC from destructor during collection. Another concern is interoperability with C-allocated memory: GC knows nothing about C heap. How often does it leak? Leaks are likely in 32-bit processes and unlikely in 64-bit processes. See e.g. https://issues.dlang.org/show_bug.cgi?id=15723 Do I need to be careful with some operations to avoid leaks? Leaks happen only due to false pointers. But data allocated in GC with new operator and known to have no pointers (e.g. strings) is not scanned. Premature collection happen when GC doesn't see a pointer to the allocated data, happens when such pointer is put in a memory GC doesn't see, like C heap. Is a precise GC in the roadmap? There's an effort on it: https://forum.dlang.org/post/hdwwkzqswwtffjehe...@forum.dlang.org It's fine if I have to do manual memory management, but I don't want any leaks. If you manually deallocate memory, it gets deallocated for sure, shouldn't leak. Comparing to java, D GC trades GC performance for code execution performance, which can result in better overall performance when you don't allocate much and worse performance for allocation-heavy code that java GC is optimized for.
Re: GC question
On Wednesday, 1 February 2017 at 09:50:42 UTC, osa1 wrote: Thanks for the answer. Could you elaborate on the lacklustre part? It's fine if I have to do manual memory management, but I don't want any leaks. Ideally I'd have a precise GC + RAII style resource management when needed. Rust, Go, Java, Swift etc have a single memory management scheme which is used by libraries and mostly enforced by the compiler. In C++ you tend to go with unique ownership and occasional shared ownership with the ability to have weak pointers and swap out objects without updating pointer, and there is an easy transition form old ad-hoc ownership to shared (the reference counter is in a separate object). It is not enforced by the compiler, but C++ is moving towards having dedicated tools for checking correctness. In D the goal is to have safety enforced by the compiler, but it isn't quite there yet and what is on the map for leak free resource management seems a simple reference counting mechanism (simpler than swift?) with refcount embedded in objects (like intrusive_ptr in Boost except the compiler is intended to be better at optimizing unnecessary updates of the reference count).
Re: GC question
On Wednesday, 1 February 2017 at 09:40:17 UTC, Ola Fosheim Grøstad wrote: On Wednesday, 1 February 2017 at 06:58:43 UTC, osa1 wrote: I'm wondering what are the implications of the fact that current GC is a Boehm-style conservative GC rather than a precise one, I've never worked with a conservative GC before. The GC isn't competitive with the ones you find in GC languages (Java, Go etc). E.g. Go is now aiming for GC pauses in the microseconds range. Resource management in D is rather lacklustre, even C++ does better imho. D seems to move towards using thread local ref-counting and making GC optional. I guess that would be ok on cpus with few cores, but not really adequate on many core CPUs. Thanks for the answer. Could you elaborate on the lacklustre part? It's fine if I have to do manual memory management, but I don't want any leaks. Ideally I'd have a precise GC + RAII style resource management when needed.
Re: GC question
On Wednesday, 1 February 2017 at 06:58:43 UTC, osa1 wrote: I'm wondering what are the implications of the fact that current GC is a Boehm-style conservative GC rather than a precise one, I've never worked with a conservative GC before. The GC isn't competitive with the ones you find in GC languages (Java, Go etc). E.g. Go is now aiming for GC pauses in the microseconds range. Resource management in D is rather lacklustre, even C++ does better imho. D seems to move towards using thread local ref-counting and making GC optional. I guess that would be ok on cpus with few cores, but not really adequate on many core CPUs.
GC question
Hi all, I was looking at D as the next language to use in my hobby projects, but the "conservative GC" part in the language spec (http://dlang.org/spec/garbage.html) looks a bit concerning. I'm wondering what are the implications of the fact that current GC is a Boehm-style conservative GC rather than a precise one, I've never worked with a conservative GC before. Are there any disallowed memory operations? Can I break things by not following some unchecked rules etc. ? How often does it leak? Do I need to be careful with some operations to avoid leaks? Is a precise GC in the roadmap? any kind of comments on the GC would be really appreciated. Thanks