Re: Safe reference counting cannot be implemented as a library
On Tuesday, 27 October 2015 at 18:10:18 UTC, deadalnix wrote: I've made the claim that we should implement reference counting as a library many times, so I think I should make my position explicit. Indeed, RC requires some level of compiler support to be safe. That being said, the support does not need to be specific to RC. In fact, my position is that the language should provide some basic mechanism on top of which safe RC can be implemented, as a library. The problem at hand here is escape analysis. The compiler must be able to ensure that a reference doesn't escape the RC mechanism in an uncontrolled manner. I'd like to add such a mechanism to the language rather than bake in reference counting, as it can be used to solve other problems we are facing today (@nogc exceptions, for instance). Here's a link to the reference safety system I proposed some months ago: http://forum.dlang.org/post/offurllmuxjewizxe...@forum.dlang.org I'm very far from having the expertise needed to know whether it would be worth its weight in practice, but it was better to write it out than to keep it bottled up in my head. I hope it will be of some use.
Re: Mitigating the attribute proliferation - attribute inference for functions
Please scan this thread for any useful ideas: http://forum.dlang.org/post/vlzwhhymkjgckgyox...@forum.dlang.org I don't have the technical expertise to know if it's useful or could work. The basic suggestion is that D has a function attribute which expressly indicates that a function is separately compiled, thus eliminating all ambiguity and mystery about what can and can't be inferred.
Re: const as default for variables
On Wednesday, 18 March 2015 at 09:28:35 UTC, deadalnix wrote: On Wednesday, 18 March 2015 at 06:24:38 UTC, Zach the Mystic wrote: I'm starting to think that refcounting is precisely the opposite of ownership, useful only for when it's *impossible* to track ownership easily. Otherwise why would you need a refcount? It is not the language's problem. If the language defines ownership, then you can define all kinds of RC systems as library types by deferring the ownership of things to the RC library. The good thing about it is that it doesn't limit the library solution to be ref counting, but it can be anything else, or any refcounting strategy. Indeed, internally, the RC system has to play unsafe, but as long as it has to free, it has to play unsafe anyway. The important point is that it can provide a safe interface to the outside world. The inc/dec elision problem is simply a copy optimization problem. Framing it as a refcounting problem is the wrong way to think about it. You would like to elide copies as much as possible. The first element for this is borrowing. You can pass borrowed things around without needing to have copies. So let's go through the steps. Question: What parts of borrowing are internally kept track of by the compiler, and what parts are made manifest in code? For what is made manifest, how do they appear -- as type qualifiers, i.e. `borrowed`, `owned` -- or built-in properties, e.g. `fun(x.borrowed)`? For things kept hidden, we need to find potential sources of ambiguity, and derive reliable algorithms to resolve them. For me, a big issue is passing variables as arguments, because the compiler can't read into the function to see what it does, and the function can only tell the caller what the attribute system allows. What if the callee takes a wrapped type and you only have an unwrapped version, or a different wrapped version, to pass to it?
Should there be any way to pass it transparently (i.e. for the called type to automatically receive the passed type of the argument), or does it have to be created manually? (I was thinking about this when Andrei was trying to create smart pointers, and wondered what it would take to create a `Ref!` type to entirely replace the `ref` storage class.) This may or may not be related to a fully effective ownership system. The general problem of assignment comes up when something is borrowed several times and assigned. const is obviously a situation where we can elide when borrowing, but that is not the only one. In that situation, only borrowing the RC wrapper requires a copy (borrowing the wrapped does not). Note that borrowing the wrapped is most likely what you want in the first place in most situations (so the code manipulating the borrowed does not need to rely on a specific memory management scheme, which allows for versatile libraries) so copy elision is what you'll get in most situations as well. I guess this will most often be accomplished with `alias this` when passing to an argument? I guess what you're suggesting is that if a function may delete a reference, you can detect this because it accepts only a fully RC'd type rather than the unwrapped version. Solid core constructs are much better than attribute proliferation. I totally agree, but at this point, we must figure out precisely which constructs to ask for, and then convince everyone else of their worth. How did you become convinced of the value of built-in ownership? Is there a good article you could point me to? Secondly, what do you suggest it would look like in D? Type qualifiers, a storage class, function/parameter attributes? How much just takes place invisibly to the programmer?
Re: The next iteration of scope
On Wednesday, 18 March 2015 at 13:01:50 UTC, Oren Tirosh wrote: On Sunday, 15 March 2015 at 19:11:36 UTC, Marc Schütz wrote: On Sunday, 15 March 2015 at 17:31:17 UTC, Nick Treleaven wrote: On 15/03/2015 14:10, Marc Schütz wrote: Here's the new version of my scope proposal: http://wiki.dlang.org/User:Schuetzm/scope2 It's still missing real-life examples, a section on the implementation, and a more formal specification, as well as a discussion of backwards compatibility. But I thought I'd show what I have, so that it can be discussed early on. I hope it will be more digestible for Walter & Andrei. It's more or less an extended version of DIP25, and avoids the need for most explicit annotations. I too want a scope attribute e.g. for safe slicing of static arrays, etc. I'm not sure if it's too late for scope by default though, perhaps. If we get @safe by default, we automatically get scope by default, too. The scope storage class is a two way contract. The function promises not to escape the reference. The caller promises to ensure the storage that the reference is pointing to will remain valid for the duration of the function call. In some cases, the caller code may need to take active steps to ensure that, like keeping an otherwise temporary reference alive to prevent it from being deallocated. But what if the pointer is null? Can this be considered to fulfill the caller's part of the deal? Yes, the old @notnull debate again. For me, @safe by default and scope by default also suggests @notnull by default for scope references. Sorry if this opens up directions you don't want to think about at the moment... So far, null pointers haven't been a big part of the discussion. By the existing definition, a null pointer is memory safe, because it doesn't point to anything. But they are obviously a problem in their own right.
Re: Phobos Documentation - call to action
On Wednesday, 18 March 2015 at 03:45:07 UTC, Walter Bright wrote: The bad news: the Phobos documentation sux. The good news: we can make things a lot better by just filling in blanks. For example, picking a function largely at random: http://dlang.org/phobos/std_uni.html#sicmp There is no Params section, no Returns: section, and no See_Also section. Hence, I wrote a PR for it: https://github.com/D-Programming-Language/phobos/pull/3060 There's nothing clever about it, just filling in the blanks. If we all pitch in, we can substantially improve the documentation. Some guidelines: 1. The sections Params, Returns, and See_Also need to be there. (Unless there are no parameters, or a void return.) 2. One PR per function being fixed. 3. Resist the urge to do more, stay focused simply on filling in the blanks, one PR per function, making things easy to review. No responses yet -- not that I'm any less guilty than anyone else. But maybe this needs to be bumped up to a higher priority -- a hiatus on internal development for a couple weeks solely to bring documentation up to a minimum. Obviously clear guidelines like the ones you just posted are a plus.
Re: Phobos Documentation - call to action
On Wednesday, 18 March 2015 at 20:19:10 UTC, Walter Bright wrote: On 3/18/2015 12:42 PM, Zach the Mystic wrote: But why, therefore, is it so hard to get movement on it? I don't know why, so I'll ask. Why haven't you submitted a PR to fix one of them? :-) I have pathetically little experience with most of phobos. I most certainly hold the record for amount of passion associated with the D language versus number of lines actually coded in it. That said, it can't be that hard to figure out what the parameters are and what they return. If you give me a specific module, I'll start making pull requests for it.
Re: Phobos Documentation - call to action
On Wednesday, 18 March 2015 at 18:09:07 UTC, Walter Bright wrote: On 3/18/2015 10:55 AM, Zach the Mystic wrote: No responses yet -- not that I'm any less guilty than anyone else. But maybe this needs to be bumped up to a higher priority -- a hiatus on internal development for a couple weeks solely to bring documentation up to a minimum. Obviously clear guidelines like the ones you just posted are a plus. We have a great language, but represent it poorly in the documentation. Every library entry also needs a pithy example (or even any example at all), but I thought we could make progress first by simply documenting what the return value is supposed to be. We also need to stop pulling new library additions that have obviously inadequate documentation. I'm just thinking in terms of psychology. I haven't seen anyone disagree that the documentation is inadequate, so that's not even disputed. But why, therefore, is it so hard to get movement on it? I suspect that it's because it is perceived as a chore, like cleaning a barn. I don't want to go in that barn by myself. But if everyone's doing it, with the mutual understanding that it needs to get done - and no one is exempt - then it doesn't feel so bad. At some point, it must be possible for documentation to get so bad that *nothing* is more important. Otherwise, it may well continue to flounder in destitute obscurity, never receiving the attention it deserves.
Re: Phobos Documentation - call to action
On Wednesday, 18 March 2015 at 19:50:24 UTC, Andrei Alexandrescu wrote: On 3/18/15 12:42 PM, Zach the Mystic wrote: At some point, it must be possible for documentation to get so bad that *nothing* is more important. Strategically we're definitely there, and have been for a while (if we define improving D's rate of adoption as important). Yeah, it appears so. A body of examples of idiomatic uses of the language is missing. Unfortunately, I get the sense that's not the only thing that's missing. Full disclosure: I'm not an experienced team leader, so I can't promise my suggestion will work. That said, I suggest, for the purpose of turning motive into action, a ten-day Documentation Holiday, akin to Franklin D. Roosevelt's Bank Holiday of 1933: http://en.wikipedia.org/wiki/Emergency_Banking_Act Guidelines for enhancements must be drawn up and made clear in advance, and the community given sufficient notice to prepare for the holiday. It's just an idea... but as you say, D is already there, strategically. I didn't feel great about having to be the first to respond to this thread, since I'm not a major contributor (yet, anyway) - it looks like the sign of a real problem.
Re: const as default for variables
On Tuesday, 17 March 2015 at 22:53:20 UTC, deadalnix wrote: On Tuesday, 17 March 2015 at 22:25:30 UTC, Zach the Mystic wrote: The real devil against safe reference counting is in the assignment operators, when they do destructive moves. I think those have to be the focus of any effort here. I'm trying to imagine a parameter attribute `@destroy`, for example, indicating that its reference may get destroyed. Not sure if it will work, or even help, but it's a start. That is the wrong approach. This is a known problem and there is a known solution: ownership. If we are going to add something in the language to handle it, then it has to be ownership. Just so we're clear, there are two problems. One is making ref-counting safe. The other is making it fast, by eliding unnecessary operations. The issue I'm worried about is when you pass an RC'd type as an argument by value, for example, you make a copy. To be safe the compiler should wrap the original in an inc/dec cycle for the duration of the call. But this is a waste if there's no risk of reassignment, if you're just mutating the original data, or if the type isn't even an RC'd type but has some other kind of destructor. My guess was that `@destroy` could help the compiler elide unnecessary cycles this way. If you always pass by reference (e.g. `ref`), you're sending the original, rather than copying it. This needs no wrapping therefore, since any reassignment will affect the original. What good would ownership do in that case? Any normal copying will increase the refcount anyway. I'm starting to think that refcounting is precisely the opposite of ownership, useful only for when it's *impossible* to track ownership easily. Otherwise why would you need a refcount? What would be really interesting is a combined system, where the compiler detects the ownership properties of any given variable, and automatically decides whether it needs to be refcounted or not. There could be a built-in template in the runtime, e.g. 
a `_refCounted(T)`, which must be a perfect drop-in replacement for a regular `T` in all cases -- difficult, yes, but interesting to imagine at least -- which the compiler would swap in at its discretion. Obviously a huge flight of fancy, given that D is not in the habit of altering the basic type of a variable based on how it is used... but it would be very efficient if it worked. Do you agree that refcounting and ownership oppose each other, that refcounting only makes sense when ownership is impossible? That refcounting is a runtime mechanism for tracking precisely what a compile time ownership system can't? In other words, what problems does ownership solve, and how?
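To make the by-value versus by-`ref` point concrete, here is a minimal sketch using Phobos's existing `std.typecons.RefCounted`. The helper names `countByValue` and `countByRef` are made up for illustration. It only shows what the library does today when an RC'd value is copied into a parameter; it does not model the proposed `@destroy` attribute or any compiler-side elision.

```d
import std.typecons : RefCounted;

// Hypothetical helper: the parameter is a copy, so the count is bumped
// for the duration of the call.
size_t countByValue(RefCounted!int r)
{
    return r.refCountedStore.refCount;
}

// Hypothetical helper: the parameter is the caller's own variable,
// so no copy is made and the count is untouched.
size_t countByRef(ref RefCounted!int r)
{
    return r.refCountedStore.refCount;
}

void main()
{
    auto rc = RefCounted!int(42);
    assert(rc.refCountedStore.refCount == 1);

    assert(countByValue(rc) == 2); // copy made: inc on entry, dec on exit
    assert(countByRef(rc) == 1);   // no copy: nothing to elide
    assert(rc.refCountedStore.refCount == 1);
}
```

The inc/dec pair around `countByValue` is the kind of cycle the post would like the compiler to skip when the callee cannot reassign or destroy the original.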
Re: `return const` parameters make `inout` obsolete
On Tuesday, 17 March 2015 at 18:27:01 UTC, ixid wrote: To be fully viable, `return` would have to be secretly recorded as part of `x`'s type, so that the compiler could forgive returning it to a non-const. But the compiler should probably track that `x` is copied from `t` anyway, so that it can verify `return t` when it returns `x`, and the same information would be used to forgive `x`'s constness. But yeah, there might still be a use for `inout`. Why is this ability important? It feels like trying to distort non-templates into templates. Is this the alternative to using templates or repeating yourself or are there other important aspects to it? I don't know for sure. I think the main point of `inout` is to avoid returning a copy of a reference that's mutable and assigning it to an immutable. When there is no copy of a reference (i.e. it's unique), or if you know that all possible copies are immutable, there's no problem. Even an immutable reference with lifetime shorter than the `const` value it copies is okay. In other words, it seems like there are a lot of cases where you can assign something that returns a regular `T*` to an `immutable(T*)` safely.
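One such case already exists in the language, as far as I know: the result of a pure function whose parameters cannot alias the result is treated as unique, so it converts implicitly to `immutable`. A small sketch, with `make` as a made-up name:

```d
// Pure factory with no reference parameters: nothing else can hold a
// mutable copy of the returned pointer, so the result is unique.
int* make(int value) pure
{
    return new int(value);
}

void main()
{
    immutable(int)* frozen = make(5); // mutable-typed result accepted as immutable
    int* thawed = make(6);            // the same function still serves mutable callers

    *thawed += 1;
    assert(*frozen == 5);
    assert(*thawed == 7);
}
```

This covers only the "unique" case from the paragraph above; the lifetime-based cases are not something the current compiler checks.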
Re: const as default for variables
On Tuesday, 17 March 2015 at 19:53:14 UTC, deadalnix wrote: On Tuesday, 17 March 2015 at 13:55:36 UTC, Dejan Lekic wrote: I definitely think this is a good idea. And if someone wants mutable variable, we simply use proposed 'var' storage class. Brilliant! This is going to break pretty much all the code that uses auto. The benefit for the compiler is hypothetical. Walter is right when mentioning that it can remove some refcount boilerplate, but we have no idea how much, and how good the compiler would be at recognizing it. With existing language features, the compiler CANNOT leverage this change for optimization. The real devil against safe reference counting is in the assignment operators, when they do destructive moves. I think those have to be the focus of any effort here. I'm trying to imagine a parameter attribute `@destroy`, for example, indicating that its reference may get destroyed. Not sure if it will work, or even help, but it's a start.
Re: `return const` parameters make `inout` obsolete
On Tuesday, 17 March 2015 at 12:02:15 UTC, Nick Treleaven wrote: On 16/03/2015 14:17, Zach the Mystic wrote: `char* fun(return const char* x);` Compiler has enough information to adjust the return type of `fun()` to that of input `x`. This assumes return parameters have been generalized to all reference types. Destroy. inout can be used for local variables too. Yeah that might be a use for it. Current form: `inout(T*) fun(inout(T*) t) { inout(T*) x = t; return x; }` Proposed form: `T* gun(return const T* t) { const(T*) x = t; return x; }` To be fully viable, `return` would have to be secretly recorded as part of `x`'s type, so that the compiler could forgive returning it to a non-const. But the compiler should probably track that `x` is copied from `t` anyway, so that it can verify `return t` when it returns `x`, and the same information would be used to forgive `x`'s constness. But yeah, there might still be a use for `inout`.
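For comparison with the hypothetical `return const` signature, this is how the existing `inout` qualifier behaves, including an `inout` local as Nick mentions. The function name `first` is made up; the proposed `return const` form is not valid D and is not shown.

```d
// One body serves mutable, const and immutable callers; the qualifier of
// the argument is carried through to the return type.
inout(int)* first(inout(int)[] arr)
{
    inout(int)* x = arr.length ? &arr[0] : null; // inout local variable
    return x;
}

void main()
{
    int[] m = [1, 2, 3];
    immutable(int)[] i = [4, 5, 6];

    int* pm = first(m);            // mutable in, mutable out
    immutable(int)* pi = first(i); // immutable in, immutable out

    *pm = 10;
    assert(m[0] == 10);
    assert(*pi == 4);

    // int* bad = first(i); // rejected: immutable(int)* does not convert to int*
}
```

The rejected line at the end is the error the thread wants to preserve under any replacement syntax.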
Re: The next iteration of scope
On Monday, 16 March 2015 at 20:50:46 UTC, Marc Schütz wrote: On Monday, 16 March 2015 at 19:43:01 UTC, Zach the Mystic wrote: I always tend to think of member functions as if they weren't: struct S { T t; ref T fun() return { return t; } } In my head, I just translate fun() above to: ref T fun(return S* __this) { return __this.t; } Therefore whatever the scope of `__this`, that's the scope of the return, just like it would be for any other parameter. Then: S s; s.fun(); ... is really just `fun(s);` in disguise. That's why it's hard for me to grasp `scope` members, because they seem to me to be just as scope as their parent, whether global or local. It works just the same: struct S { private int* payload_; ref int* payload() return { return payload_; } } ref int* payload(scope ref S __this) return { return __this.payload_; // well, imagine it's not private } More accurately, // `return` is moved ref int* payload(return scope ref S __this) { return __this.payload_; } I think that if you need `return` to make it safe, there's much less need for `scope`. Both the S.payload() and the free-standing payload() do the same thing. From inside the functions, `return` tells us that we're allowed to return a reference to our payload. From the caller's point of view, it signifies that the return value is scoped to the first argument, or `this` respectively. To reiterate, `scope` members are just syntactical sugar for the kinds of accessor methods/functions in the example code. There's nothing special about them. That's fine, but then there's the argument that syntax sugar is different from real functionality. To add it would require a compelling use case. My fundamental issue with `scope` in general is that it should be the safe default, which means it doesn't really need to appear that often. If @safe is default, the compiler would force you to mark any parameter `return` when it detected such a return. How a member could be scope when the parent is global is hard for me to imagine. 
The following is clear, right? int* p; scope int* borrowed = p; That's clearly allowed, we're storing a reference to a global or GC object into a scope variable. Now let's use `S`, which contains an `int*` member: S s; scope S borrowed_s = s; That's also ok. Doesn't matter whether it's the pointer itself, or something containing the pointer. And now the final step: scope int* p2; p2 = s.payload; // OK p2 = borrowed_s.payload; // also OK static int* p3; p3 = s.payload; // NOT OK! However, if `payload` were not the accessor method/function, but instead a simple (non-scope) member of `S`, that last line would be allowed, because there is nothing restricting its use. See above. With `return` being forced on the implicit this parameter: ref int* payload(return /*scope*/ ref S __this) { ... } `return` covers the need for safety, unless I'm still missing something. For members that the struct owns and wants to manage itself, this is not good. Therefore, we make it private and allow access to it only through accessor methods/functions that are annotated with `return`. But we could accidentally forget an annotation, and the pointer could escape. Same argument. Forgetting `return` in safe code == compiler error. I think DIP25 already does this.
Re: const as default for variables
On Monday, 16 March 2015 at 19:52:00 UTC, deadalnix wrote: On Monday, 16 March 2015 at 14:40:51 UTC, Zach the Mystic wrote: On Sunday, 15 March 2015 at 20:09:56 UTC, deadalnix wrote: On Sunday, 15 March 2015 at 07:44:50 UTC, Walter Bright wrote: const ref can tell the optimizer that that path for a ref counted object cannot alter its ref count. That is not clear why. const ref is supposed to protect against escaping when ref does not ? There are two cases here. One is when the reference is copied to new variable, which would actually break const because the reference count of the original data would have to be incremented (which is a separate issue). I think we should provide library solution for this kind of things. Changing the reference count is a very low-level operation. I'm not sure how to go about breaking the type system in order to support `const` variations on it.
`return const` parameters make `inout` obsolete
char* fun(return const char* x); Compiler has enough information to adjust the return type of `fun()` to that of input `x`. This assumes return parameters have been generalized to all reference types. Destroy.
Re: const as default for variables
On Sunday, 15 March 2015 at 20:09:56 UTC, deadalnix wrote: On Sunday, 15 March 2015 at 07:44:50 UTC, Walter Bright wrote: const ref can tell the optimizer that that path for a ref counted object cannot alter its ref count. That is not clear why. const ref is supposed to protect against escaping when ref does not ? There are two cases here. One is when the reference is copied to new variable, which would actually break const because the reference count of the original data would have to be incremented (which is a separate issue). But the other case is where the original is reassigned, in which the counter for the data it used to point to gets decremented, possibly to zero. `const` would guarantee against this. But even this is a blunt force weapon, because it would also stop you from mutating the original data, even though that wouldn't change the reference count.
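The reassignment case can be sketched with a hand-rolled counter. `Rc`, `refs`, `get` and `peek` below are hypothetical names invented for this illustration, not a library API, and the type is deliberately minimal (no thread safety, no error handling).

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical minimal refcounted int, for illustration only.
struct Rc
{
    private int* count; // shared counter
    private int* data;  // shared payload

    this(int value)
    {
        count = cast(int*) malloc(int.sizeof);
        data = cast(int*) malloc(int.sizeof);
        *count = 1;
        *data = value;
    }

    this(this) { if (count) ++*count; } // a copy shares the payload

    ~this()
    {
        if (count && --*count == 0) { free(data); free(count); }
    }

    int refs() const { return count ? *count : 0; }
    ref int get() { return *data; } // mutable access to the payload
}

// Through `ref const`, the callee can neither reassign `r` (which would
// decrement the old payload's count) nor call `r.get()` to mutate the data.
int peek(ref const Rc r) { return r.refs; }

void main()
{
    auto a = Rc(1);
    auto b = a;          // copy: the shared count goes to 2
    assert(a.refs == 2);

    a = Rc(2);           // reassignment: the old payload's count drops to 1
    assert(b.refs == 1);

    a.get() = 5;         // mutation: the count is not involved at all
    assert(a.refs == 1);
    assert(peek(a) == 1);
}
```

The last two steps show the bluntness mentioned above: mutation through `get` never touches the count, yet `const` forbids it along with reassignment.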
Re: `return const` parameters make `inout` obsolete
On Monday, 16 March 2015 at 14:23:42 UTC, ketmar wrote: On Mon, 16 Mar 2015 14:17:57 +, Zach the Mystic wrote: char* fun(return const char* x); Compiler has enough information to adjust the return type of `fun()` to that of input `x`. This assumes return parameters have been generalized to all reference types. Destroy. but why compiler has to rewrite return type? i never told it to do that! It has to if you pass an immutable to x, which you're allowed to do. It only gives an error if you assign the result to a mutable variable. The point is that the signature still contains all the information it needs without `inout`. What old errors will fail to be reported and what new errors would it cause? I haven't been able to think of any.
Re: `return const` parameters make `inout` obsolete
On Monday, 16 March 2015 at 15:39:39 UTC, ketmar wrote: On Mon, 16 Mar 2015 15:33:40 +, Zach the Mystic wrote: On Monday, 16 March 2015 at 14:23:42 UTC, ketmar wrote: On Mon, 16 Mar 2015 14:17:57 +, Zach the Mystic wrote: char* fun(return const char* x); Compiler has enough information to adjust the return type of `fun()` to that of input `x`. This assumes return parameters have been generalized to all reference types. Destroy. but why compiler has to rewrite return type? i never told it to do that! It has to if you pass an immutable to x, which you're allowed to do. It only gives an error if you assign the result to a mutable variable. The point is that the signature still contains all the information it needs without `inout`. What old errors will fail to be reported and what new errors would it cause? I haven't been able to think of any. this is the question of consistency. if i wrote `char* fun()`, i want fun to return `char*`, and i'm not expecting it to change in a slightest. i don't like when compiler starts to change things on it's own. I think it's just less cluttered than `inout`. The simple fact is that if you try to assign an immutable variable to a mutable reference, you will still get an error. I doubt it would take long for programmers to adjust to the new way of reading the signatures. They just see `return` sitting there in front of `const` and know how to handle the situation.
Re: The next iteration of scope
On Monday, 16 March 2015 at 13:55:43 UTC, Marc Schütz wrote: Also, what exactly does the `scope` on T payload get you? Is it just a more specific version of `return` on the this parameter, i.e. `return this.payload`? Why would you need that specificity? What is the dangerous operation it is intended to prevent? Nick already answered that. I'll expand on his explanation: Let's take the RC struct as an example. Instances of RC can appear with and without scope. Because struct members inherit the scope-ness from the struct, `payload` could therefore be an unscoped pointer. It could therefore be escaped unintentionally. By adding `scope` to its declaration, we force it to be scoped to the struct's lifetime, no matter how it's accessed. If an RC'd struct is heap-allocated, but one of its members points to the stack, how is it ever safe to escape it? Why shouldn't the heap variable be treated as scoped too, inheriting the most restricted scope of any of its members? To me, the question is not how you can know that a member is scoped, so much as how you could know that it *isn't* scoped, i.e. that a sub-pointer was global while the parent was local. I think it would require a very complicated type system: struct S { T* p; } // note the elaborate new return signature T* fun(return!(S.p) S x) { return x.p; } T* run() { S s; s.p = new T; // s local, s.p global return fun(s); } The above is technically safe, but the question is whether it's too complicated for the benefit. In the absence of such a complicated system, the safe default is to assume a struct is always as scoped as its most scoped member (i.e. transitive scoping). Your idea of `scope` members would only be valid in the absence of this safe default. But even then it would be of limited usefulness, because it would prevent all uses of global references in those members, even if the parent was global. 
For me, it comes down to that you can't know if anything is global or local until you define an instance of it, which you can't do in the struct definition.
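Stripped of the hypothetical `return!(S.p)` annotation, the example above can be written in today's D roughly as follows. This is only a sketch: `T` is given a made-up field `v` so the result can be checked, and nothing in the signatures records that `s.p` outlives `s`.

```d
struct T { int v; }   // `v` is a made-up field for the demonstration
struct S { T* p; }

// The signature says nothing about where x.p points.
T* fun(S x) { return x.p; }

T* run()
{
    S s;            // s itself is a local
    s.p = new T(7); // but s.p points at GC-allocated memory
    return fun(s);  // fine here only because of what s.p happens to point to
}

void main()
{
    T* t = run();
    assert(t.v == 7);
}
```

Under transitive scoping as described above, `fun(s)` would presumably be treated as returning something tied to `s`, and `run` would be rejected even though this particular pointer is harmless.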
Re: `return const` parameters make `inout` obsolete
On Monday, 16 March 2015 at 16:49:36 UTC, ketmar wrote: having argument modifier that changes function return type is very surprising regardless of how much people used to it. really, why should i parse *arguments* to know the (explicitly specified!) *return* *type*? it's ok with `auto`, it's ok with `inout`, but when i wrote `char *`, i want `char *`. and then compiler decides that it knows better. ok, compiler, you win, can you write the rest of the code for me? no? stupid compiler! I feel like you're reacting more to Change than to my actual point. `inout` wasn't invented because it looks good, but because it solves the DRY problem for different input types. `return` parameters also solve that problem, plus a few more, and with less repetition even than `inout`. I don't think either type of signature is that hard to read. It's just a matter of getting used to them.
Re: `return const` parameters make `inout` obsolete
On Monday, 16 March 2015 at 16:22:46 UTC, Marc Schütz wrote: On Monday, 16 March 2015 at 14:17:58 UTC, Zach the Mystic wrote: char* fun(return const char* x); Compiler has enough information to adjust the return type of `fun()` to that of input `x`. This assumes return parameters have been generalized to all reference types. Destroy. That's a very interesting observation. I never liked the name `inout`, as it doesn't describe what it does. The only downside I see is that it's more magic, because nothing on the return type says its mutability is going to depend on an argument. I think Kenji also had additional plans for `inout`, related to uniqueness. There was a PR. Better ask him whether it's going to be compatible. `return` would work just as well for uniqueness. inout(T*) fun(inout(T*) x); - T* fun(return const T* x); I don't think any information is being lost. My attitude is that unless you are losing information, the underlying logic won't be any harder to implement. I think that `return` parameters are a building block of `inout`, but more useful because they can be used separately from it. Perhaps in the early days of D, it just seemed too weird to have `return` parameters, but now with ref safety, they are better justified. But if they'd been there back then, `inout` probably wouldn't exist, since you can just build it in its current form from `const` and `return`.
Re: The next iteration of scope
On Monday, 16 March 2015 at 17:00:12 UTC, Marc Schütz wrote: But there is indeed still some confusion on my side. It's about the question whether `this` should implicitly be passed as `scope` or not. Because if it is, scope members are probably useless, because they are already implied. I think I should remove this suggestion, because it would break too much code (in @system). I always tend to think of member functions as if they weren't: struct S { T t; ref T fun() return { return t; } } In my head, I just translate fun() above to: ref T fun(return S* __this) { return __this.t; } Therefore whatever the scope of `__this`, that's the scope of the return, just like it would be for any other parameter. Then: S s; s.fun(); ... is really just `fun(s);` in disguise. That's why it's hard for me to grasp `scope` members, because they seem to me to be just as scope as their parent, whether global or local. How a member could be scope when the parent is global is hard for me to imagine.
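The mental translation above can be written out as compilable D. This sketch swaps the `S*` pointer for a `return ref` parameter and uses the made-up names `funFree` and `self`, since identifiers starting with `__` are reserved; `T` is replaced by `int` to keep it short.

```d
struct S
{
    int t;

    // `return` here applies to the implicit `this` parameter.
    ref int fun() return { return t; }
}

// Hand-written equivalent with the struct as an explicit parameter:
// the returned reference lives exactly as long as `self` does.
ref int funFree(return ref S self) { return self.t; }

void main()
{
    S s;
    s.fun() = 3;     // writes through the member function's reference
    assert(s.t == 3);
    funFree(s) = 4;  // same effect through the free function
    assert(s.t == 4);
}
```

Both forms hand back a reference into `s`, which is why the post treats the member's lifetime as identical to its parent's.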
Re: A few notes on choosing between Go and D for a quick project
On Monday, 16 March 2015 at 00:27:56 UTC, Walter Bright wrote: I like the analogy of D being a fully equipped machine shop, as opposed to a collection of basic hand tools. When I was younger it was hard working on my car, because I could not afford the right tools. So I made do with whatever was available. The results were lots of scrapes and bruises, much time invested, and rather crappy repairs. Now I can buy the right tools, and boy what a difference that makes! I can get professional quality results with little effort. I agree with this. However, it actually implies a huge amount about what I would call D's brand. The fully equipped machine shop metaphor has some very serious tradeoffs when applied to computer programming languages, the steep learning curve required to use the machines correctly, for instance. But I see advantage in this, because I can see a brand -- that is, an identity which distinguishes something from its rivals, not by flat-out superiority, but by its commitment to particulars -- for D here. I think D can market itself to a certain type of programmer, and win the language war by empowering this type of programmer, thereby inciting the envy of other types of programmers, who over time grudgingly concede the inferiority of their own styles and follow the herd. Brands are all about types of people, rather than of products. I would love to see D consciously embrace its own kind of person, and not just because it feels good, but because of its value as a marketing strategy. I see D attracting *really* good programmers, programmers from, let's say the 90-95th percentile in skill and talent in their field on average. 
By marketing to these programmers specifically -- that is, telling everyone that while D is for everyone, it is especially designed to give talented and experienced programmers the tools they need to get their work done -- even if you repel several programmers from, say, the 45th percentile or below in exchange for the brand loyalty of one from 92nd percentile or above, it's probably a winning strategy, because that one good programmer will get more done than all the rest combined.
Re: Smart references
On Saturday, 14 March 2015 at 15:55:51 UTC, Marc Schütz wrote: Are the suggested changes also related to the possibility of making `ref` a type? Are there plans to do this? I remember Walter suggested `ref` for non-parameters, i.e. local variables, but as a storage class, not a type modifier. I don't think there are plans per se, but if struct semantics are made powerful and flexible enough, I can imagine it being possible to simply recreate 'ref' parameters as 'Ref!' struct templates. For me, the question is what new additions would have to be added to structs to enable this. It seems like a good thought exercise, regardless of the final decision.
Re: The next iteration of scope
On Sunday, 15 March 2015 at 14:10:02 UTC, Marc Schütz wrote:
> Here's the new version of my scope proposal: http://wiki.dlang.org/User:Schuetzm/scope2
>
> It's still missing real-life examples, a section on the implementation, and a more formal specification, as well as a discussion of backwards compatibility. But I thought I'd show what I have, so that it can be discussed early on. I hope it will be more digestible for Walter & Andrei. It's more or less an extended version of DIP25, and avoids the need for most explicit annotations.

It's great to see your design evolving like this. BIG plus for `scope` by default in @safe code -- this makes the proposal much more attractive than the alternative.

> Functions and methods can be overloaded on scope. This allows efficient passing of RC wrappers for instance...

How does the compiler figure out which of the variables it's passing to the parameters are `scope` or not? Does the caller try the scoped overloads first by default, and only if there's an error tries the non-scoped overloads? If so, what causes the error?

> To specify that the value is returned through another parameter, the return!ident syntax can be used...
>
>     struct RC(T) if(is(T == class)) {
>         scope T payload;
>         T borrow() return { // `return` applies to `this`
>             return payload;
>         }
>     }

The example contains no use of `return!ident`. Also, what exactly does the `scope` on T payload get you? Is it just a more specific version of `return` on the this parameter, i.e. `return this.payload`? Why would you need that specificity? What is the dangerous operation it is intended to prevent?
Re: Smart references
On Wednesday, 11 March 2015 at 20:33:07 UTC, Andrei Alexandrescu wrote: I'm investigating D's ability to define and use smart references. Per the skeleton at http://dpaste.dzfl.pl/9d752b1e9b4e, lines: #6: You can't default-initialize a ref. #7: You can't copy a ref - copying should mean copying the object itself. #9: Per this example I'm hooking a reference with an Owner. The reference hooks calls to opAddRef and opRelease in the owner. #23: Assigning the reference really assigns the referred. #28: A reference is a subtype of ref T. Most operations against the reference will be automatically forwarded to the underlying object, by reference (ref is important here). As unittests show, things work quite nicely. There are a few things that don't: #70: Attempting to copy a reference fails on account of the disabled postblit. There should be a way to tell the compiler to automatically invoke alias this and create a copy of that guy. #81: Moving from a reference works by moving the Ref object. There should be a way to tell the compiler that moving should really move the payload around. There are a couple other issues not represented in the unittest, for example related to template deduction. In a perfect world, Ref would masquerade (aside from having a different layout, ctor, and dtor) as an lvalue of type T. But in fact I think solving the matters above would go a long way toward making smart references nicely usable. Although my example is centered on reference counting an owner, there are other uses of smart references. Are all these worth changing the language? Are the suggested changes also related to the possibility of making `ref` a type? I have no opposition in principle to expanding struct semantics to be as transparent as possible. But then I ask, what prevented them from being expanded until now? From reading these forums, I've learned that C++ reference types have a lot of problems. 
Does expanding the semantics of structs run the risk of encountering the same sorts of problems C++ references have? The ideal is to find a way to add semantics without adding ambiguity (i.e. to make sure both the compiler and the programmer always choose the right interpretation of a given construct). So, for example, if you pass a `Ref!X` type to a type `X` parameter, or you pass an `X` type to a `Ref!X` parameter, the result is easy for both the compiler and the human to figure out. That's all I've got.
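For readers who cannot open the dpaste skeleton, the following is a rough sketch of what a `Ref!T` struct can already do with existing struct semantics. It is not Andrei's code: the field name `target`, the accessor `get`, and the helper functions `twice` and `bump` are all made up for illustration, and the owner hooks (opAddRef/opRelease) are left out entirely.

```d
struct Ref(T)
{
    private T* target;

    @disable this();        // no default initialization
    @disable this(this);    // copying the reference itself is not allowed

    this(ref T t) { target = &t; }

    // Assigning to the reference assigns to the referred object.
    void opAssign(T value) { *target = value; }

    // Everything else is forwarded to the underlying object, by reference.
    ref T get() { return *target; }
    alias get this;
}

int twice(int x) { return 2 * x; }
void bump(ref int x) { ++x; }

void main()
{
    int n = 20;
    auto r = Ref!int(n);
    r = 21;                   // writes through to n
    assert(n == 21);
    assert(twice(r) == 42);   // Ref!int is accepted where an int is expected
    bump(r);                  // and binds to `ref int`, since get() returns by ref
    assert(n == 22);
}
```

Passing a `Ref!X` where an `X` is expected therefore has one obvious reading here. The open cases are the ones listed in the quoted post: copying and moving, where the struct and the payload would want different treatment.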
Re: Two suggestions for safe refcounting
On Friday, 6 March 2015 at 14:40:31 UTC, Volodymyr wrote:
> On Friday, 6 March 2015 at 07:46:13 UTC, Zach the Mystic wrote:
>> ... Note how the last member, opIndex, doesn't return a raw E*, but only an E* which is paired with a pointer to the same RCData instance as the RCArray is:
>>
>>     struct RCElement(E) {
>>         E* element;
>>         private RCData* data;
>>         this(this) { data.addRef(); }
>>         ~this() { data.decRef(); }
>>     }
>>
>> This is the best I could do.
>
> That requires changing the type of this from RCArray to tuple!(RCArray, RCData). For me it is better to use Array and change typeof(this) to RefCounter!Array:
>
>     assert(typeid(typeof(this)) == typeid(RefCounter!Array));
>
> So how to deal with it:
>
>     struct RefCounter(T) // this is struct!
>     {
>         void opAddRef();
>         void opRelease();
>         alias __data this;
>         void[] allocate(size_t);
>         // Handler for sharing owned resources
>         auto opShareRes(MemberType)(ref MemberType field)
>         {
>             return makeRefCounter(field, __count);
>         }
>     private:
>         size_t __count;
>         T __data;
>     }
>
>     @resource_owner(RefCounter)
>     class Array
>     {
>         ref int opIndex(size_t i) return { return _data[i]; }
>         // opIndex will be replaced with this function
>         //RefCounter!int opIndex(size_t i) // @return?
>         //{
>         //assert(typeid(this) == typeid(RefCounter!Array));
>         //return this.opShareRes(_data[i]);
>         //// after inlining: return makeRefCounter(_data[i], __count);
>         //}
>         private int[] _data;
>     }
>
> The opShareRes method is for moving resources away (sharing them with another owner), and an @return method will change its return type to the opShareRes return type. opShareRes also wraps access to public fields (and may change the type of the result). Now Array is actually an alias to RefCounter!Array. Array creation is a special case: new Array has to use RefCounter!Array.allocate. So the owner manages sharing of array parts, allocation and removal.
>
> Options for @resource_owner:
> - @resource_owner(this) - the class provides opAddRef/opRelease/opShareRes by itself, as in DIP74.
> - @resource_owner(this, MyRCMixin) - MyRCMixin provides opAddRef/opRelease/opShareRes and will be mixed into the class. (What DIP74 has in mind.)
> - @resource_owner(Owner) - Owner is a template. Whenever you use an owned type T it will be replaced with Owner!T (even the type of this). This case prohibits changing the owning strategy.

You've packed a lot of ideas into one post. Your solution might work, but it's hard for me to tell.

> Resource owning is close to memory management. Maybe the resource owner has to set the memory allocation strategy instead of providing an allocate method. This is an open question.

I'm still wrestling with understanding all the interlocking systems. The only reason I keep exploring them is that sometimes it seems like nobody else understands them either. ^_^
Re: Two suggestions for safe refcounting
On Friday, 6 March 2015 at 14:59:46 UTC, monarch_dodra wrote:
>>     struct RCArray(E) {
>>         E[] array;
>>         int* count;
>>         ...
>>     }
>>     auto x = RCArray([E()]);
>>     E* t = &x[0];
>
> But taking that address is unsafe to begin with. So arguably, this isn't that big of a problem.

Taking the address is only really unsafe (in a non-RC'd type) if you don't have a lifetime tracking system. As long as the lifetime of the address taker is shorter than the lifetime of the takee, it's not inherently unsafe. Whether D will end up with such a system is a different question.

But I still think there's value in having a separate RCData type, because you can save one pointer per instance of RCArray. Right now, if you take a slice of an RCArray, your working array might not start at the same place as the reserved memory array. Therefore you need to keep a pointer to the reserved memory in addition to your active working array. If the counter and the pointer to the original memory are in the same place, one pointer will get you both. I think the idea is worth exploring.

> Your first dual reference issue seems much more problematic, as there are always cases the compiler can't catch.

How so? If all we're talking about is RC'd types, the compiler can catch everything. I think the greater concern is that the workarounds will take a toll in runtime performance. I'll try to illustrate:

    void fun(ref RCStruct a, ref RCStruct b);
    auto x = new RCStruct;
    fun(x, x);

This wouldn't be safe. If fun() contained a line a = new RCStruct;, b will point to deleted memory for the rest of the function. The normal way to protect against this is to make sure there's another reference:

    auto y = x;
    fun(x,x);

This is actually safe, because y bumps the reference counter to 2 when initialized, which is enough to cover all possible reassignments of x. The compiler could do this automatically. It could detect that the parameter x aliases itself and create a temporary copy of x. But it would mean the runtime performance cost of the copy and postblit and destructor call. So D probably can't invest in that strategy, since the programmer should have a choice about it.

So it's not about it being impossible to deal with the safety problems here, just that the runtime cost is too high. But there are some ways out. If the given type has no postblit, for example (or opAddRef for classes), there's no reason to mark the operation unsafe, since you know it's not reference counted. Also, const parameters are safe and won't be affected.
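The aliasing hazard and the extra-copy workaround can be tried out with a toy refcounted struct. This is a sketch under assumptions, not anyone's proposed implementation: `Payload`, its `freed` flag (which stands in for real deallocation so the demo never touches dangling memory), the `create` factory and the two `...Call` helpers are all invented for the example.

```d
struct Payload
{
    int count;
    bool freed;   // stand-in for actually freeing the memory
}

struct RCStruct
{
    Payload* p;

    static RCStruct create()
    {
        RCStruct r;
        r.p = new Payload(1, false);
        return r;
    }

    this(this) { if (p) ++p.count; }   // copying bumps the count
    ~this() { release(); }

    void opAssign(RCStruct rhs)
    {
        // `rhs` is already our own copy: take its payload and hand it ours,
        // which its destructor releases when this function returns.
        auto old = p;
        p = rhs.p;
        rhs.p = old;
    }

    private void release()
    {
        if (p !is null && --p.count == 0)
            p.freed = true;            // a real type would free here
        p = null;
    }
}

// Returns true if the payload seen through `b` was released during the call.
bool fun(ref RCStruct a, ref RCStruct b)
{
    Payload* seenThroughB = b.p;
    a = RCStruct.create();             // drops a's reference to the old payload
    return seenThroughB.freed;
}

bool unguardedCall()
{
    auto x = RCStruct.create();
    return fun(x, x);                  // x aliases itself
}

bool guardedCall()
{
    auto x = RCStruct.create();
    auto y = x;                        // extra reference, count is now 2
    return fun(x, x);
}

void main()
{
    assert(unguardedCall());           // payload released while `b` still referred to it
    assert(!guardedCall());            // the copy `y` kept it alive for the whole call
}
```

The guarded version pays for one postblit and one destructor call, which is the runtime cost the post is worried about.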
Re: RCArray is unsafe
On Thursday, 5 March 2015 at 18:41:31 UTC, deadalnix wrote: Kind of OT, but your train of thought is very difficult to follow the way you are communicating (ie by updating on previous post by answering to yourself). Could you post some more global overview at some point, so one does not need to gather various information for various posts please ? Okay. I seem to be mixing my more well-thought out ideas with ideas I get on the spur of the moment. Then they come out in a jumble. I have to confess that a lot of my ideas just pop into my head. Did you want me to talk about how I would do ownership with my reference safety system?
Re: RCArray is unsafe
On Wednesday, 4 March 2015 at 18:05:52 UTC, Zach the Mystic wrote:
> On Wednesday, 4 March 2015 at 17:22:15 UTC, Steven Schveighoffer wrote:
>> Again, I think this is an issue with the expectation of RCArray. You cannot *save* a ref to an array element, only a ref to the array itself, because you lose control over the reference count.
>
> What you need is a special RCSlave type, which is reference counted not to the type of its *own* data, but to its parent's. In this case, a RCArraySlave!(T) holds data of type T, but a pointer to an RCArray, which it decrements when it gets destroyed. This could get expensive, with an extra pointer per instance than a regular T, but it would probably be safe.

A way to do this is to have a core RCData type which has the count itself and the chunk of memory the count refers to in type ambiguous form:

    struct RCData {
        int count;
        // the point is that RCData can be type ambiguous
        void[] chunk;

        this(size_t size) {
            chunk = new void[size];
            count = 0;
        }
        void addRef() { ++count; }
        void decRef() { if (--count == 0) delete chunk; }
    }

Over top of that you create a basic element type which refcounts an RCData rather than itself:

    struct RCElement(E) {
        E element;
        RCData* data;

        this(this) { data.addRef(); }
        ~this() { data.decRef(); }
        [...etc...]
    }

Then you have an RCArray which returns RCElement elements when indexed rather than naked types:

    struct RCArray(E) {
        E[] array;
        private RCData* data;

        RCElement!E opIndex(size_t i) return {
            return RCElement!E(array[i], data);
        }
        this(E[] a) {
            data = new RCData(a.length * E.sizeof);
            array = cast(E[]) data.chunk;
        }
        this(this) { data.addRef(); }
        ~this() { data.decRef(); }
        //...
    }

This might work. The idea is to only leak references to types which also have pointers to the original data.
Two suggestions for safe refcounting
As per deadalnix's request, a summary of my thoughts regarding the thread RCArray is unsafe:

It's rather easy to guarantee memory safety from the safe confines of a garbage collected system. Let's take this as a given. It's much harder when you step outside that system and try to figure out when it is or isn't safe to delete memory. It shouldn't be too surprising, therefore, that there are lots of pitfalls.

Reference counting is a lonely outpost in the wilderness which is otherwise occupied by manual memory management. It's the only alternative to chaos. But the walls protecting this outpost are easily breached by any dangling reference which is not accounted for. We have seen two instances of how this can occur.

The first, when boiled down to its essence, is that there is no corresponding bump in the reference count for a parameter which can alias an existing reference:

    void fun(ref RCStruct a, ref RCStruct b);
    RCStruct c;
    fun(c,c); // c aliases itself

    void gun(ref RCStruct a);
    static RCStruct d;
    gun(d); // d aliases global d

Because the workarounds are easy:

    {
        RCStruct c;
        auto tmp = c;
        fun(c,tmp);

        auto tmp2 = d;
        gun(tmp2);
    }

...it seems okay to mark these rare violations @system.

The second, harder problem, is when you take a reference to a subcomponent of an RC'd type, e.g. an individual E of an RCArray of E:

    struct RCArray(E) {
        E[] array;
        int* count;
        ...
    }
    auto x = RCArray([E()]);
    E* t = &x[0];

Here's the problem. If x is assigned to a different RCArray, the one t points to will be deleted. On the other hand, if some special logic allows the definition of t to increment the reference count, then you have a memory leak, because t is not designed to keep track of x's original counter. I don't know if we can get out of this mess. My suggestion represents a best-effort attempt.

The only way I can see out of this problem is to redesign RCArray. The problem with RCArray is that it owns the data it references. If a type different from RCArray, i.e. an individual E* into the array of E[], tries to reference the data, it's stuck, because it's not an RCArray!E. Therefore, you need to separate out the core data from the different types that can point to it. The natural place would be right next to its reference counter, in a separate struct:

    struct RCData {
        int count = 0;
        void[] chunk;

        this(size_t size) { chunk = new void[size]; }
        void addRef() { ++count; }
        void decRef() { if (--count == 0) delete chunk; }
    }

Now RCArray can be redesigned to point to an RCData type. All new RC types will also contain a pointer to an RCData instance:

    struct RCArray(E) {
        E[] array;
        private RCData* data;

        this(E[] a) {
            data = new RCData(a.length * E.sizeof);
            data.chunk = cast(void[]) a;
            array = a;
        }
        this(this) { data.addRef(); }
        ~this() { data.decRef(); }

        RCElement!E opIndex(size_t i) {
            return RCElement!E(&array[i], data);
        }
        ...
    }

Note how the last member, opIndex, doesn't return a raw E*, but only an E* which is paired with a pointer to the same RCData instance as the RCArray is:

    struct RCElement(E) {
        E* element;
        private RCData* data;

        this(this) { data.addRef(); }
        ~this() { data.decRef(); }
    }

This is the best I could do.
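Here is one way the RCData / RCArray / RCElement split could be written so that it compiles and the counts balance. It is a sketch under several assumptions that are not in the summary above: the count starts at 1 for the creating array, RCElement gets an explicit constructor that bumps the count, the array is built from a length rather than from an existing `E[]`, and a `freed` flag replaces `delete` so the demo can observe the release safely. The helper functions are made up for the demo.

```d
struct RCData
{
    int count;
    void[] chunk;   // type-ambiguous storage, shared by every RC view of it
    bool freed;     // stand-in for actually releasing `chunk`

    void addRef() { ++count; }
    void decRef() { if (--count == 0) freed = true; }
}

struct RCElement(E)
{
    E* element;
    private RCData* data;

    this(E* element, RCData* data)
    {
        this.element = element;
        this.data = data;
        data.addRef();                  // the element counts against the parent chunk
    }
    this(this) { if (data) data.addRef(); }
    ~this() { if (data) data.decRef(); }
}

struct RCArray(E)
{
    E[] array;
    private RCData* data;

    this(size_t n)
    {
        data = new RCData(1, new void[](n * E.sizeof));
        array = cast(E[]) data.chunk;
    }
    this(this) { if (data) data.addRef(); }
    ~this() { if (data) data.decRef(); }

    // Never hands out a raw E*, only one paired with the shared RCData.
    RCElement!E opIndex(size_t i) { return RCElement!E(&array[i], data); }
}

// The array dies at the end of this function; the element outlives it.
RCElement!int firstOfTemporary()
{
    auto arr = RCArray!int(3);
    arr.array[0] = 7;
    return arr[0];
}

bool elementKeepsChunkAlive()
{
    RCData* tracker;
    bool aliveWhileHeld;
    {
        auto elem = firstOfTemporary();
        tracker = elem.data;            // `private` is module-wide in D
        aliveWhileHeld = !tracker.freed && *elem.element == 7;
    }                                   // last reference dropped here
    return aliveWhileHeld && tracker.freed;
}

void main()
{
    assert(elementKeepsChunkAlive());
}
```

The cost is visible in the layout: every RCElement is two pointers wide instead of one.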
Re: RCArray is unsafe
On Wednesday, 4 March 2015 at 17:13:13 UTC, Zach the Mystic wrote: (Also, `pure` functions will need no `static` parameter attributes, and functions both `pure` and `@nogc` will not need ) ...will not need `@noscope` either.
Re: RCArray is unsafe
On Wednesday, 4 March 2015 at 09:06:01 UTC, Walter Bright wrote: On 3/4/2015 12:13 AM, deadalnix wrote: The #1 argument for DIP25 compared to alternative proposal was its simplicity. I assume at this point that we have empirical evidence that this is NOT the case. The complexity of a free list doesn't remotely compare to that of adding an ownership system. My reference safety system has ownership built in, more-or-less for free: http://forum.dlang.org/post/offurllmuxjewizxe...@forum.dlang.org See also my reply to deadalnix: http://forum.dlang.org/post/oyaoibmwybzfkhhuf...@forum.dlang.org
Re: RCArray is unsafe
On Wednesday, 4 March 2015 at 07:50:50 UTC, Manu wrote: Well you can't get to a subcomponent if not through its owner. If the question is about passing RC objects members to functions, then the solution is the same as above, the stack needs a reference to the parent before it can pass a pointer to its member down the line for the same reasons. Yeah, or you could mimic such a reference by wrapping the call in an addRef/release cycle, as a performance optimization. The trouble then is what if that member pointer escapes? Well I'd imagine that it needs to be a scope pointer (I think we all agree RC relies on scope). So a raw pointer to some member of an RC object must be scope(*). I have a whole Reference Safety System which doesn't need explicit scope because it incorporates it implicitly: http://forum.dlang.org/post/offurllmuxjewizxe...@forum.dlang.org That it can't escape, combined with knowledge that the stack has a reference to its owner, guarantees that it won't disappear. I think you and I are on the same page.
Re: RCArray is unsafe
On Wednesday, 4 March 2015 at 08:13:33 UTC, deadalnix wrote: On Wednesday, 4 March 2015 at 03:46:36 UTC, Zach the Mystic wrote: That's fine. I like DIP25. It's a start towards stronger safety guarantees. While I'm pretty sure the runtime costs of my proposal are lower than yours, they do require compiler hacking, which means they can wait. I don't think that it is fine. At this point we need to : - Not free anything as long as something is alive. - Can't recycle memory. - Keep track of allocated chunk to be able to free them (ie implementing malloc on top of malloc). Well, I don't want to make any enemies. I thought that once the compiler was hacked people could just change their deferred-freeing code.
Re: RCArray is unsafe
On Wednesday, 4 March 2015 at 08:13:33 UTC, deadalnix wrote:
> On Wednesday, 4 March 2015 at 03:46:36 UTC, Zach the Mystic wrote:
>> That's fine. I like DIP25. It's a start towards stronger safety guarantees. While I'm pretty sure the runtime costs of my proposal are lower than yours, they do require compiler hacking, which means they can wait.
>
> I don't think that it is fine. At this point we need to:
> - Not free anything as long as something is alive.
> - Can't recycle memory.
> - Keep track of allocated chunk to be able to free them (ie implementing malloc on top of malloc).
>
> It means that RC is attached to an ever growing arena. Code that would manipulate RCArray and append to it on a regular manner must expect some impressive memory consumption. Even if we manage to do this in phobos (I'm sure we can) it is pretty much guaranteed at this point that no one else will, at least safely. The benefit is reduced because of the bookkeeping that need to be done for memory to be freed in addition to reference count themselves.
>
> The #1 argument for DIP25 compared to alternative proposal was its simplicity. I assume at this point that we have empirical evidence that this is NOT the case.

To me, DIP25 is just the first step towards an ownership system. The only language additions you need to it are out! parameters, to track escapes to other parameters, static parameters (previously called noscope), to say that the parameter won't be copied to a global, and one more function attribute (for which I can reuse noscope as @noscope) which says the return value will not be allocated on the heap. All of these will be rare, as they aim to target the exceptional cases rather than the norms (scope would be the norm. Hence @noscope to target the rare cases).

Examples:

    T* fun(return T* a, T* b, T**c);

This signature would indicate complete ownership transferred from `a` to the return value, since only `a` can be returned (see why below).

    T* gun(return out!b T* a, T** b);

`a` is declared to be copied both to the return value and to `b`. Therefore it is not owned. (If you're following my previous definition of `out!` in DIP71, you'll notice I moved `out!` to the source parameter rather than the target, but the point is the same.)

    T* hun(return T* a) @noscope {
        if(something) return a;
        else return new T;
    }

Again, no ownership. If you *might* return a heap or global, the function must be marked @noscope. (Again I've readapted the word to a new meaning from DIP71. I'm using `static` now for `noscope`'s original meaning.)

Another example:

    T* jun(return static T* a) {
        static T* t;
        t = a;
        return a;
    }

Again, no ownership, because of the `static` parameter attribute. In a previous post, you suggested that such an attribute was unnecessary, but an ownership system would require that a given parameter `a` which was returned, not also be copied to a global at the same time. So `static` tells the compiler this, and thus cancels ownership.

My point is that DIP25's `return` parameters are the beginning of an ownership system. An option to specify that the function *will* return a given `return` parameter as opposed to *might* return it is the only thing needed. Hence the additions named above. (Also, `pure` functions will need no `static` parameter attributes, and functions both `pure` and `@nogc` will not need )

With the exception of some minor cosmetic changes, all this is in, or at least hinted at, in my previously posted Reference Safety System: http://forum.dlang.org/post/offurllmuxjewizxe...@forum.dlang.org

The only thing which bears reiterating is that with better attribute inference, the whole system becomes invisible for most uses.
Re: RCArray is unsafe
On Wednesday, 4 March 2015 at 17:22:15 UTC, Steven Schveighoffer wrote: Again, I think this is an issue with the expectation of RCArray. You cannot *save* a ref to an array element, only a ref to the array itself, because you lose control over the reference count. What you need is a special RCSlave type, which is reference counted not to the type of its *own* data, but to its parent's. In this case, a RCArraySlave!(T) holds data of type T, but a pointer to an RCArray, which it decrements when it gets destroyed. This could get expensive, with an extra pointer per instance than a regular T, but it would probably be safe.
Re: RCArray is unsafe
On Wednesday, 4 March 2015 at 18:05:52 UTC, Zach the Mystic wrote:
> On Wednesday, 4 March 2015 at 17:22:15 UTC, Steven Schveighoffer wrote:
>> Again, I think this is an issue with the expectation of RCArray. You cannot *save* a ref to an array element, only a ref to the array itself, because you lose control over the reference count.
>
> What you need is a special RCSlave type, which is reference counted not to the type of its *own* data, but to its parent's. In this case, a RCArraySlave!(T) holds data of type T, but a pointer to an RCArray, which it decrements when it gets destroyed. This could get expensive, with an extra pointer per instance than a regular T, but it would probably be safe.

Another solution is to get compiler help. If you know the lifetime of a sub-reference `p.t` to be shorter than that of its Rc'd parent `p`, the compiler can wrap `p.t`'s lifetime in an addRef/release cycle for `p`. This works in calling a function:

    fun(p, p.t);

Let's say that you know that `p.t` won't escape (a different question). The compiler doesn't need to know about `p.t` to wrap the whole function like this:

    p.opAddRef(); // or equivalent
    fun(p, p.t);
    p.opRelease();

It just needs to know that `p.t`'s lifetime is shorter than `p`'s.
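Written out by hand, the wrapping looks like the sketch below. Everything in it is an assumption made for illustration: `Parent` is a toy type, `opAddRef`/`opRelease` are ordinary methods (no compiler treats them specially), `freed` stands in for deallocation, and `fun` is a deliberately hostile callee that drops the caller's reference before touching `t`.

```d
struct Parent
{
    int refs = 1;
    bool freed;     // stand-in for actually freeing the object
    int t;

    void opAddRef() { ++refs; }
    void opRelease() { if (--refs == 0) freed = true; }
}

// Drops the reference it was given, then keeps using the sub-reference.
// Returns true if `t` was written after its parent had been released.
bool fun(ref Parent* p, ref int t)
{
    Parent* watched = p;   // raw pointer kept only so the demo can inspect `freed`
    p.opRelease();
    p = null;
    t = 42;
    return watched.freed;
}

bool plainCall()
{
    Parent* p = new Parent;
    return fun(p, p.t);
}

bool wrappedCall()
{
    Parent* p = new Parent;
    Parent* held = p;      // the "stack reference" the wrapper holds
    held.opAddRef();
    bool freedDuringCall = fun(p, p.t);
    held.opRelease();
    return freedDuringCall;
}

void main()
{
    assert(plainCall());       // parent released while `t` was still in use
    assert(!wrappedCall());    // the add/release pair covered the whole call
}
```

The sketch releases through a separate `held` pointer because the callee may null out `p`; a compiler-inserted version would need some equivalent of that saved reference too.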
Re: RCArray is unsafe
On Wednesday, 4 March 2015 at 18:17:41 UTC, Andrei Alexandrescu wrote: Yah, this is a fork in the road: either we solve this with DIP25 + implementation, or we add stricter static checking disallowing two lent references to data in the same scope. The third solution is to keep track of lifetimes, recognize refcounted types for structs the same as suggested for classes in DIP74, and wrap the lifetime of the subreference `t.s` in an opAdd/Release cycle for `t`, as illustrated in my other reply. You could have the compiler recognize a refcounted struct by simply declaring void opAddRef(); and void opRelease();, with the compiler automatically aliasing them to this(this) and ~this.
Re: RCArray is unsafe
On Wednesday, 4 March 2015 at 17:13:13 UTC, Zach the Mystic wrote: Another example: T* jun(return static T* a) { static T* t; t = a; return a; } Again, no ownership, because of the `static` parameter attribute. In a previous post, you suggested that such an attribute was unnecessary, but an ownership system would require that a given parameter `a` which was returned, not also be copied to a global at the same time. So `static` tells the compiler this, and thus cancels ownership. Actually, I think you convinced me before that `static` (or `noscope`) parameters wouldn't carry their weight. Instead, copying a parameter reference to a global variable is unsafe by default. Wrap it in a `@trusted` lambda if you know what you're doing. (Trusted lambdas are assumed to copy no reference parameters.) In this way, you can assume ownership. Any unsafe global escapes are just ignored. ???
Re: RCArray is unsafe
On Wednesday, 4 March 2015 at 19:22:25 UTC, Zach the Mystic wrote:
> On Wednesday, 4 March 2015 at 18:17:41 UTC, Andrei Alexandrescu wrote:
>> Yah, this is a fork in the road: either we solve this with DIP25 + implementation, or we add stricter static checking disallowing two lent references to data in the same scope.
>
> The third solution is to keep track of lifetimes, recognize refcounted types for structs the same as suggested for classes in DIP74, and wrap the lifetime of the subreference `t.s` in an opAdd/Release cycle for `t`, as illustrated in my other reply. You could have the compiler recognize a refcounted struct by simply declaring void opAddRef(); and void opRelease();, with the compiler automatically aliasing them to this(this) and ~this.

I'm sorry, I just realized this proposal is too complicated, and it wouldn't even work. I think stricter static checking in @safe code is the way to go. When passing a global RC type to an impure function, or duplicating the same RC reference variable in a function call, it's unsafe. The workaround is to make copies and use them:

    static RcType s; // global
    RcType c;

    // Instead of:
    func(s);
    func(c, c);

    // ...do this:
    auto tmp = s; // get stack reference
    func(tmp);
    auto d = c; // copy Rc'd type
    func(c, d);

Expensive, perhaps, but safe.
Re: RCArray is unsafe
On Tuesday, 3 March 2015 at 05:12:15 UTC, Walter Bright wrote: On 3/2/2015 6:04 PM, weaselcat wrote: On Tuesday, 3 March 2015 at 01:56:09 UTC, Walter Bright wrote: On 3/2/2015 4:40 PM, deadalnix wrote: After moving resources, the previous owner can no longer be used. How does that work with the example presented by Marc? He couldn't pass s and a member of s because s is borrowed as mutable. He would have to pass both as immutable. A pointer to s could be obtained otherwise and passed. Under normal circumstances, if the pointer to s is an lvalue, the refcount will be bumped when it is taken. Isn't the only problem now aliasing something (i.e. a global) invisibly through a parameter? This is easily solved -- when passing a global reference, or duplicating a variable in the same call, wrap the call in an add/release cycle. This preserves the alias for the duration of the call. Or are we also talking about taking the address of a non-rc'd subcomponent of an rc'd struct?
Re: RCArray is unsafe
On Tuesday, 3 March 2015 at 08:04:25 UTC, Manu wrote: My immediate impression on this problem: s.array[0] is being passed to foo from main. s does not belong to main (is global), and main does not hold a reference to s.array. Shouldn't main just need to inc/dec array around the call to foo when passing un-owned references down the call tree? It seems to me that there always needs to be a reference _somewhere_ on the stack for anything being passed down the call tree (unless the function is pure). Seems simplest to capture a stack ref at the top level, then as it's received as arguments to each callee, it's effectively owned by those functions and they don't need to worry anymore. So, passing global x to some function; inc/dec x around the function call that it's passed to...? Then the stack has its own reference, and the global reference can go away safely. This is my position too. There is another problem being discussed now, however, having to do with references to non-rc'd subcomponents of an Rc'd type.
Re: RCArray is unsafe
On Monday, 2 March 2015 at 22:58:19 UTC, Walter Bright wrote: Pretty dazz idea, dontcha think? And DIP25 still stands unscathed :-) Unless, of course, we missed something obvious. I was dazzed, but I'm not anymore. I wrote my concern here: http://forum.dlang.org/post/ylpaqhnuiczfgfpqj...@forum.dlang.org
Re: RCArray is unsafe
On Tuesday, 3 March 2015 at 16:31:07 UTC, Andrei Alexandrescu wrote: I was dazzed, but I'm not anymore. I wrote my concern here: http://forum.dlang.org/post/ylpaqhnuiczfgfpqj...@forum.dlang.org There's a misunderstanding here. The object being assigned keeps a trailing list of past values and defers their deallocation to destruction. -- Andrei So you need an extra pointer per instance? Isn't that a big price to pay? Is the only problem we're still trying to solve aliasing which is not recognized as such and therefore doesn't bump the refcounter like it should? An extra pointer would be overkill for that. Isn't it better to just recognize the aliasing when it happens? As far as taking the address of an RcArray element, the type of which element is not itself Rc'ed, it's a different problem. The only thing I've been able to come up with is maybe to create a wrapper type within RcArray for the individual elements, and have that type do refcounting on the parent instead of itself, if that's possible.
Re: RCArray is unsafe
On Tuesday, 3 March 2015 at 17:40:59 UTC, Marc Schütz wrote: All instances need to carry a pointer to refcount anyway, so the freelist could just be stored next to the refcount. The idea of creating that list, however, is more worrying, because it again involves allocations. It can get arbitrarily long. If the last RcType is a global, will the list ever get freed at all? No, Andrei's proposed solution would take care of that. On assignment to RCArray, if the refcount goes to zero, the old array is put onto the cleanup list. But there can still be borrowed references to its elements. However, these can never outlive the RCArray, therefore it's safe to destroy all of the arrays in the cleanup list in the destructor. Wouldn't you need a lifetime system for this? A global, for example, couldn't borrow safely. I'm all in favor of an ownership/borrowing system, but that would be for a different DIP, right? It seems like taking the address of a sub-element of an RcType is inherently unsafe, since it separates the memory from the refcount.
Re: RCArray is unsafe
On Tuesday, 3 March 2015 at 18:48:36 UTC, Andrei Alexandrescu wrote:
> On 3/3/15 9:00 AM, Zach the Mystic wrote:
>> On Tuesday, 3 March 2015 at 16:31:07 UTC, Andrei Alexandrescu wrote:
>>>> I was dazzed, but I'm not anymore. I wrote my concern here: http://forum.dlang.org/post/ylpaqhnuiczfgfpqj...@forum.dlang.org
>>>
>>> There's a misunderstanding here. The object being assigned keeps a trailing list of past values and defers their deallocation to destruction. -- Andrei
>>
>> So you need an extra pointer per instance?
>
> Yah, or define your type to be single-assignment (probably an emerging idiom). You can massage the extra pointer with other data thus reducing its cost.
>
>> Isn't that a big price to pay? Is the only problem we're still trying to solve aliasing which is not recognized as such and therefore doesn't bump the refcounter like it should? An extra pointer would be overkill for that. Isn't it better to just recognize the aliasing when it happens?
>
> It's all tradeoffs. This has runtime overhead.

Isn't allocating and collecting a freelist also overhead?

> A static analysis would have the challenges of being permissive enough, cheap enough, not add notational overhead, etc. etc.

It's certainly permissive: you can do anything, and compiler wraps uncertain operations with add/release cycles automatically. These are: passing a global as a mutable reference to an impure function; aliasing the same variable in two parameters with itself. The unoptimized lowerings would be:

    {
        auto tmp = myGlobal; // bump count
        impureFun(myGlobal);
    } // tmp destroyed, --count

    {
        auto tmp2 = c; // bump count
        fun(c, c);
    } // --count

The only addition is an optimization where the compiler elides the assignments and calls the add/release cycles directly.
Re: RCArray is unsafe
On Wednesday, 4 March 2015 at 03:46:36 UTC, Zach the Mystic wrote: Just my own past posts. My suggestion is based on the compiler doing all the work. I don't know how it could be tested without hacking the compiler. I think that part of the fear of my idea is that I want structs to get some of the behavior suggested in DIP74 for classes, i.e. the compiler inserts calls to opAddRef/opRelease on its own at certain times. Since structs only have postblits and destructors, there's no canonical way to call them as separate functions. The behavior I'm suggesting would only be good if you had a refcounted type, which means it's superfluous, if not harmful, to insert it in other types of structs. If it turns out that some of the behavior desirable for refcounted classes is useful for structs too, it may be necessary to hint to the compiler that a struct is indeed of the refcounted type. For example, void opAddRef(); and void opRelease(); could be specially recognized, with no definitions even permitted (error on attempt), implying alias opAddRef this(this);, alias opRelease ~this;.
Re: RCArray is unsafe
On Tuesday, 3 March 2015 at 21:37:20 UTC, Andrei Alexandrescu wrote: On 3/3/15 12:35 PM, Zach the Mystic wrote: Isn't allocating and collecting a freelist also overhead? No. I don't have time now for a proof of concept and it seems everybody wants to hypothesize about code that doesn't exist instead of writing code and then discussing it. Okay. The unoptimized lowerings would be: { auto tmp = myGlobal; // bump count impureFun(myGlobal); } // tmp destroyed, --count { auto tmp2 = c; // bump count fun(c, c); } // --count The only addition is an optimization where the compiler elides the assignments and calls the add/release cycles directly. Do you have something reviewable, or just your own past posts? Just my own past posts. My suggestion is based on the compiler doing all the work. I don't know how it could be tested without hacking the compiler. For the time being I want to move forward with DIP25 and deferred freeing. That's fine. I like DIP25. It's a start towards stronger safety guarantees. While I'm pretty sure the runtime costs of my proposal are lower than yours, they do require compiler hacking, which means they can wait.
Re: My Reference Safety System (DIP???)
On Monday, 2 March 2015 at 22:00:56 UTC, deadalnix wrote: You don't put the ownership acquire at the same place, but that is the same idea. It is probably even better to do it your way (or is it ?). Yes. Unless the compiler detects that you duplicate a variable in two parameters in the same call, you literally have *no* added cycles, anywhere: fun(c, c.c); This is the only time you pay any penalty (except for passing globals, as we now realize, since all globals can alias themselves as parameters -- nasty).
Re: RCArray is unsafe
On Monday, 2 March 2015 at 20:54:20 UTC, Walter Bright wrote: On 3/2/2015 12:42 PM, Walter Bright wrote: For D structs, that means, if there's a postblit, a copy must be made. For D ref counted classes, a ref count increment must be done. I was hoping to avoid that, but apparently there's no way. There are cases where this can be avoided, like calling pure functions. Another win for pure functions! It seems like the most common use case for passing a global to a parameter is to process that global in a pure way. You already have access to it in an impure function, so you could just access it directly. The good news is that from within the passed-to function, no further calls will treat it as global.
Re: My Reference Safety System (DIP???)
On Monday, 2 March 2015 at 20:04:49 UTC, deadalnix wrote: I let the night go over that one. Here is what I think is the best road forward : - triggering postblit and/or ref count bump/decrease is prohibited on borrowed. - Acquiring and releasing ownership does. Now that we have this, let's get back to the example : class C { C c; // Make it refcounted somehow, doesn't matter. Andrei's proposal for instance. } void boom() { C c = new C(); c.c = new C(); foo(c, c.c); } void foo(ref C c1, ref C c2) { // Here is where things get different. c1 is borrowed, so you can't // do c1.c = null before acquiring c1.c beforehand. Right, I agree with this. That means the // compiler needs to get a local copy of c1.c, bump the refcount // to get ownership before executing c1.c = null and decrease // the refcount. Yeah, but should it do this inside foo() or in boom() right before it calls foo? I think in boom, and only for a parameter which might be aliased by another parameter (an extremely rare case). For any other case, the refcount has already been preserved: void boom() { C c = new C(); // refcount(c) == 1 c.c = new C(); // refcount(c.c) == 1 auto d = c.c; // refcount(c.c) == 2 now foo(c, d); // safe } The only problem is the rare case when the exact same identifier is getting sent to two different parameters. I'm sure there will be opportunities to elide a lot of refcount calls, but in this case, I don't see much left to elide.
Re: My Reference Safety System (DIP???)
On Tuesday, 3 March 2015 at 00:02:48 UTC, deadalnix wrote: What do you think? How many times do you normally pass a global? I fail to see how t being global vs t being a local that is doubly passed changes anything. Within the function, the global passed as a parameter creates an alias to the global. Fortunately, Andrei Fermat may have just solved the issue: http://forum.dlang.org/post/md2pub$nqn$1...@digitalmars.com
Re: RCArray is unsafe
On Monday, 2 March 2015 at 20:37:46 UTC, Walter Bright wrote: On 3/1/2015 12:51 PM, Michel Fortin wrote: That's actually not enough. You'll have to block access to global variables too: S s; void main() { s.array = RCArray!T([T()]); // s.array's refcount is now 1 foo(s.array[0]); // pass by ref } void foo(ref T t) { s.array = RCArray!T([]); // drop the old s.array t.doSomething(); // oops, t is gone } So with Andrei's solution, will s.array ever get freed, since s is a global? I guess it *should* never get freed, since s is a global and it will always exist as a reference. Which makes me think about a bigger problem... when you opAssign, don't you redirect the variable to a different instance? Won't the destructor then destroy *that* instance (or not destroy it, since it just got a +1 count) instead of the one most recently decremented? How does it hold onto the instance to be destroyed?
Re: My Reference Safety System (DIP???)
On Monday, 2 March 2015 at 22:51:29 UTC, deadalnix wrote: On Monday, 2 March 2015 at 22:21:11 UTC, Zach the Mystic wrote: On Monday, 2 March 2015 at 22:00:56 UTC, deadalnix wrote: You don't put the ownership acquire at the same place, but that is the same idea. It is probably even better to do it your way (or is it ?). Yes. Unless the compiler detects that you duplicate a variable in two parameters in the same call, you literally have *no* added cycles, anywhere: fun(c, c.c); This is the only time you pay any penalty (except for passing globals, as we now realize, since all globals can alias themselves as parameters -- nasty). Globals are simply parameters implicitly passed to all functions from a theoretical perspective. There is no reason to treat them differently. Except for this: static Rctype t; // fun(t); Now you have that implicit parameter which screws things up. It's like calling: fun(@globals, t); ...where @globals is a namespace which can alias t. So you have two parameters which can alias each other. I think the only saving grace is that you probably don't really need to pass a global that often, since you already have it if you want it. Only if you want the global to play the role of a parameter. What do you think? How many times do you normally pass a global?
Re: RCArray is unsafe
On Tuesday, 3 March 2015 at 01:23:24 UTC, Zach the Mystic wrote: Which makes me think about a bigger problem... when you opAssign, don't you redirect the variable to a different instance? Won't the destructor then destroy *that* instance (or not destroy it, since it just got a +1 count) instead of the one most recently decremented? How does it hold onto the instance to be destroyed? auto y = new RcStruct; y = null; y's old RcStruct gets decremented to zero, but who owns it now? Whose destructor ever gets run on it?
Re: RCArray is unsafe
On Monday, 2 March 2015 at 22:58:19 UTC, Walter Bright wrote: His insight was that the deletion of the payload occurred before the end of the lifetime of the RC object, and that this was the source of the problem. If the deletion of the payload occurs during the destructor call, rather than the postblit, RcArray, a struct, already does this. You wouldn't delete in a postblit anyway would you? Do you need opRelease and ~this to be separate for structs too?
Re: RCArray is unsafe
On Monday, 2 March 2015 at 22:58:19 UTC, Walter Bright wrote: I.e. the postblit manipulates the ref count, but does NOT do payload deletions. The destructor checks the ref count, if it is zero, THEN it does the payload deletion. Pretty dazz idea, dontcha think? And DIP25 still stands unscathed :-) Unless, of course, we missed something obvious. Add me to we. I'm dazzed! :-)
Re: RCArray is unsafe
On Monday, 2 March 2015 at 22:58:19 UTC, Walter Bright wrote: His insight was that the deletion of the payload occurred before the end of the lifetime of the RC object, and that this was the source of the problem. If the deletion of the payload occurs during the destructor call, rather than the postblit, then although the ref count of the payload goes to zero, it doesn't actually get deleted. I.e. the postblit manipulates the ref count, but does NOT do payload deletions. The destructor checks the ref count, if it is zero, THEN it does the payload deletion. I guess you also mean opAssigns -- they would manipulate refcounts too right? In fact, they would be the primary means of decrementing the refcount *apart* from the destructor, right?
Re: RCArray is unsafe
On Tuesday, 3 March 2015 at 00:05:50 UTC, Zach the Mystic wrote: I guess you also mean opAssigns -- they would manipulate refcounts too right? In fact, they would be the primary means of decrementing the refcount *apart* from the destructor, right? Nevermind. I was a minute too soon with my post!
Re: RCArray is unsafe
On Monday, 2 March 2015 at 05:57:35 UTC, Walter Bright wrote: On 3/1/2015 12:51 PM, Michel Fortin wrote: That's actually not enough. You'll have to block access to global variables too: Hmm. That's not so easy to solve. But consider this. It's only an impure function which might alias a global. And since you already have access to the global in the impure function, there might be less incentive in general to pass it through a function. Other than that, you're stuck with a theoretical @impure!varName function attribute, for example, which tells the caller which globals are accessed.
Re: My Reference Safety System (DIP???)
On Monday, 2 March 2015 at 08:59:11 UTC, deadalnix wrote: On Monday, 2 March 2015 at 00:37:05 UTC, Zach the Mystic wrote: I'm sure many inc/dec can still be removed. Do you agree or disagree with what I said? I can't tell. Yes, but I think this is overly conservative. I'm arguing a rather liberal position: that only in a very exceptional case do you need to protect a variable for the duration of a function. For the most part, it's not necessary. What am I conserving?
Re: RCArray is unsafe
On Sunday, 1 March 2015 at 20:51:35 UTC, Michel Fortin wrote: On 2015-03-01 19:21:57 +, Walter Bright said: The trouble seems to happen when there are two references to the same object passed to a function. I.e. there can be only one borrowed ref at a time. I'm thinking this could be statically disallowed in @safe code. That's actually not enough. You'll have to block access to global variables too: S s; void main() { s.array = RCArray!T([T()]); // s.array's refcount is now 1 foo(s.array[0]); // pass by ref } void foo(ref T t) { s.array = RCArray!T([]); // drop the old s.array t.doSomething(); // oops, t is gone } What's the difference between that and this: void fun() { T[] ta = [T()].dup; T* t = ta[0]; delete ta; // or however you do it *t = ...; } Why is this a parameter passing issue and not a you kept a sub-reference to a deleted chunk issue?
Re: RCArray is unsafe
On Monday, 2 March 2015 at 15:22:33 UTC, Zach the Mystic wrote: void fun() { T[] ta = [T()].dup; T* t = ta[0]; I meant: T* t = &ta[0];
Re: My Reference Safety System (DIP???)
On Sunday, 1 March 2015 at 14:40:54 UTC, Marc Schütz wrote: I don't think a callee-based solution can work: class T { void doSomething() scope; } struct S { RC!T t; } void main() { auto s = S(RC!T()); // `s.t`'s refcount is 1 T t = s.t; // borrowing from the RC wrapper foo(s); t.doSomething();// oops, `t` is gone } void foo(ref S s) { s.t = RC!T(); // drops the old `s.t` } I thought of this, and I disagree. The very fact of assigning to `T t` adds the reference count you need to keep `s.t` from disintegrating. As soon as you borrow, you increment the count.
Re: My Reference Safety System (DIP???)
On Monday, 2 March 2015 at 00:06:52 UTC, deadalnix wrote: On Sunday, 1 March 2015 at 23:56:02 UTC, Zach the Mystic wrote: On Sunday, 1 March 2015 at 14:40:54 UTC, Marc Schütz wrote: I don't think a callee-based solution can work: class T { void doSomething() scope; } struct S { RC!T t; } void main() { auto s = S(RC!T()); // `s.t`'s refcount is 1 T t = s.t; // borrowing from the RC wrapper foo(s); t.doSomething();// oops, `t` is gone } void foo(ref S s) { s.t = RC!T(); // drops the old `s.t` } I thought of this, and I disagree. The very fact of assigning to `T t` adds the reference count you need to keep `s.t` from disintegrating. As soon as you borrow, you increment the count. I'm sure many inc/dec can still be removed. Do you agree or disagree with what I said? I can't tell.
Re: Contradictory justification for status quo
On Saturday, 28 February 2015 at 23:03:23 UTC, Walter Bright wrote: On 2/28/2015 2:31 AM, bearophile wrote: Zach the Mystic: You can see exactly how D works by looking at how Kenji spends his time. For a while he's only been fixing ICEs and other little bugs which he knows for certain will be accepted. I agree that probably there are often better ways to use Kenji time for the development of D. Actually, Kenji fearlessly deals with some of the hardest bugs in the compiler that require a deep understanding of how the compiler works and how it is supposed to work. He rarely does trivia. I regard Kenji's contributions as invaluable to the community. I don't think anybody disagrees with this. Kenji's a miracle.
Re: My Reference Safety System (DIP???)
On Monday, 2 March 2015 at 00:37:05 UTC, Zach the Mystic wrote: On Monday, 2 March 2015 at 00:06:52 UTC, deadalnix wrote: I thought of this, and I disagree. The very fact of assigning to `T t` adds the reference count you need to keep `s.t` from disintegrating. As soon as you borrow, you increment the count. I'm sure many inc/dec can still be removed. Do you agree or disagree with what I said? I can't tell. I think I understand now. Yes, they can probably be optimized, but that's a different issue than whether you need to protect certain RC instances from the tyranny of a function call. My whole argument is that basically you don't. Only when you split pass directly in the call itself: fun(x,x), does this issue ever matter, and it's easy to deal with.
Re: Contradictory justification for status quo
On Sunday, 1 March 2015 at 11:30:52 UTC, bearophile wrote: Walter Bright: Actually, Kenji fearlessly deals with some of the hardest bugs in the compiler that require a deep understanding of how the compiler works and how it is supposed to work. He rarely does trivia. I regard Kenji's contributions as invaluable to the community. But my point was that probably there are even better things that Kenji can do in part of the time he works on D. I think this once again brings up the issue of what might be called The Experimental Space (for which std.experimental is the only official acknowledgment thus far). Simply put, there are things which it would be nice to try out, which can be conditionally pre-approved depending on how they work in real life. There are a lot of things which would be great to have, if only some field testing could verify that they aren't laden with show-stopping flaws. But these represent a whole middle ground between pre-approved, and rejected. The middle ground is fraught with tradeoffs -- most prominently that if the field testers find the code useful it becomes the de facto standard *even if* fatal flaws are discovered in the design. Yet if you tell people honestly, this may not be the final design, a lot fewer people will be willing to test it. The Experimental Space must have a whole different philosophy about what it is -- the promises you make, or more accurately don't make, and the courage you have to reject a bad design even when it is already being used in real-world code. Basically, the experimental space must claim tentatively approved for D, pending field testing -- and it must courageously stick to that claim. That might give Kenji the motivation to implement some interesting new approaches to old problems, knowing that even if in the final analysis they fail, they will at least get a chance to prove themselves first. 
(Maybe there aren't really that many candidates for this approach anyway, but I thought the idea should be articulated at least.)
Re: RCArray is unsafe
On Sunday, 1 March 2015 at 20:51:35 UTC, Michel Fortin wrote: On 2015-03-01 19:21:57 +, Walter Bright said: The trouble seems to happen when there are two references to the same object passed to a function. I.e. there can be only one borrowed ref at a time. I'm thinking this could be statically disallowed in @safe code. That's actually not enough. You'll have to block access to global variables too: S s; void main() { s.array = RCArray!T([T()]); // s.array's refcount is now 1 foo(s.array[0]); // pass by ref } void foo(ref T t) { s.array = RCArray!T([]); // drop the old s.array t.doSomething(); // oops, t is gone } Globals to impures, that is.
Re: RCArray is unsafe
On Sunday, 1 March 2015 at 15:44:49 UTC, Marc Schütz wrote: Walter posted an example implementation of a reference counted array [1], that utilizes the features introduced in DIP25 [2]. Then, in the threads about reference counted objects, several people posted examples [3, 4] that broke the suggested optimization of eliding `opAddRef()`/`opRelease()` calls in certain situations. A weakness of the same kind affects DIP25, too. The core of the problem is borrowing (ref return as in DIP25), combined with manual (albeit hidden) memory management. An example to illustrate: struct T { void doSomething(); } struct S { RCArray!T array; } void main() { auto s = S(RCArray!T([T()])); // s.array's refcount is now 1 foo(s, s.array[0]); // pass by ref } void foo(ref S s, ref T t) { s.array = RCArray!T([]); // drop the old s.array t.doSomething(); // oops, t is gone } Any suggestions how to deal with this? As far as I can see, there are the following options: See: http://forum.dlang.org/post/bghjqvvrdcfqmoiyy...@forum.dlang.org ...and: http://forum.dlang.org/post/cviwlkugnothraubc...@forum.dlang.org
Re: Making RCSlice and DIP74 work with const and immutable
On Sunday, 1 March 2015 at 01:40:40 UTC, Andrei Alexandrescu wrote: Tracing garbage collection can afford the luxury of e.g. mutating data that was immutable during its lifetime. Reference counting needs to make minute mutations to data while references to that data are created. In fact, it's not mutation of the useful data, the payload of a data structure; it's mutation of metadata, additional information about the data (i.e. a reference count integral). The RCOs described in DIP74 and also RCSlice discussed in this forum need to work properly with const and immutable. Therefore, they need a way to reliably define and access metadata for a data structure. One possible solution is to add a @mutable or @metadata attribute similar to C++'s keyword mutable. Walter and I both dislike that solution because it's hamfisted and leaves too much opportunity for abuse - people can essentially create unbounded amounts of mutable payload for an object claimed to be immutable. That makes it impossible (or unsafe) to optimize code based on algebraic assumptions. We have a few candidates for solutions, but wanted to open with a good discussion first. So, how do you envision a way to define and access mutable metadata for objects (including immutable ones)? I need to get educated on this issue. First suggestion: Just break the type system by encouraging the idiom of using casts in opAddRef and opRelease. It's too easy, but I don't know why.
Re: DIP74 updated with new protocol for function calls
On Sunday, 1 March 2015 at 07:04:09 UTC, Zach the Mystic wrote: class RcType {...} void fun(RcType c, RcType d); auto x = new RcType; fun(x, x); If the compiler were smart, it would realize that by splitting parameters this way, it's actually adding an additional reference to x. The function should get one x for free, and then force an opAddRef/opRelease, for every additional x (or x derivative) it detects in the same call. One more tidbit: class RcType { RcType r; ... } void fun(RcType x, RcType y); auto z = new RcType; z.r = new RcType; fun(z, z.r); From within fun(), z can alias z.r, but z.r can't possibly alias z. Thus, only z.r needs to be preserved. The algorithm should go: "For each parameter, add one ref/release cycle for every other parameter which could possibly generate an alias to it." We're approaching optimal here. This is feeling good to me.
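The rule at the end of the post above can be sketched as a small function. This is a rough model under loud assumptions, not the proposal itself: arguments are represented as access paths (`["z", "r"]` for `z.r`), "could generate an alias" is approximated by "the other argument's path is a prefix of this one", and for identical arguments only the later duplicate pays, so the function "gets one x for free". The names `isPrefix` and `cyclesNeeded` are hypothetical.

```d
import std.stdio;

// True when `path` starts with `prefix`, e.g. ["z"] is a prefix of ["z", "r"].
bool isPrefix(const string[] prefix, const string[] path)
{
    return prefix.length <= path.length && path[0 .. prefix.length] == prefix;
}

// For each argument, count the add/release cycles the caller would insert.
// Assumption: an argument needs one cycle per other argument through which
// it can be reached; among identical arguments, only later copies pay.
size_t[] cyclesNeeded(const string[][] args)
{
    auto cycles = new size_t[args.length];
    foreach (i, target; args)
        foreach (j, other; args)
        {
            if (i == j)
                continue;
            if (!isPrefix(other, target))
                continue;               // `other` cannot reach `target`
            if (target.length > other.length || j < i)
                ++cycles[i];
        }
    return cycles;
}

void main()
{
    writeln(cyclesNeeded([["z"], ["z", "r"]])); // [0, 1]
    writeln(cyclesNeeded([["x"], ["x"]]));      // [0, 1]
    writeln(cyclesNeeded([["a"], ["b"]]));      // [0, 0]
}
```

A real implementation would have to reason about types and reachable members rather than spelled-out paths, since two differently named variables can still refer to the same object.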
Re: Improving DIP74: functions borrow by default, retain only if needed
On Friday, 27 February 2015 at 21:21:08 UTC, Andrei Alexandrescu wrote: On 2/27/15 1:02 PM, Michel Fortin wrote: On 2015-02-27 20:34:08 +, Steven Schveighoffer said: void main() { C2 c2 = new C2; c2.c = new C; foo(c2.c, c2); } Still same question. The issue here is how do you know that the reference that you are sure is keeping the thing alive is not going to release it through some back door. There are surely other cases, but you get the idea. These three situations are probably the most common, especially the first one. For instance, inside a member function, 'this' is a local variable and you will never pass it to another function by ref, so it's safe to call 'this.otherFunction()' without retaining 'this' first. Thanks. So it seems we continue as we were with DIP74 and leave the rest to the implementation. Hey, I don't think so. I think I figured it out. Keep track in house of which parameters get opReleased, and have the compiler insert opAddRef and opRelease at entry and exit to the function itself. No performance penalty, no parameter attribute, no nothin'. Just an in-house tracking mechanism. Eh???
Re: My Reference Safety System (DIP???)
On Saturday, 28 February 2015 at 20:49:22 UTC, Marc Schütz wrote: Any other ideas and opinions? I'm a little busy. It'll take me some time. There's a lot going on in recent days with all these ideas.
Re: DIP74 updated with new protocol for function calls
On Saturday, 28 February 2015 at 21:12:54 UTC, Andrei Alexandrescu wrote: Defines a significantly better function call protocol: http://wiki.dlang.org/DIP74 Andrei This is obviously much better, Andrei. I think an alternative solution (I know -- another idea -- against my own first idea!) is to keep track of this from the caller's side. The compiler, in this case, when copying a ref-counted type (or derivative) into a parameter, would actually check to see if it's splitting the variable in two. Look at this: class RcType {...} void fun(RcType c, RcType d); auto x = new RcType; fun(x, x); If the compiler were smart, it would realize that by splitting parameters this way, it's actually adding an additional reference to x. The function should get one x for free, and then force an opAddRef/opRelease, for every additional x (or x derivative) it detects in the same call. This might be even better than the improved current proposal. The real key is realizing that duplicating an lvalue into the same function call is subtly adding a new reference to it. Eh??
Re: My Reference Safety System (DIP???)
On Saturday, 28 February 2015 at 20:49:22 UTC, Marc Schütz wrote: I encountered an ugly problem. Actually, I had already run into it in my first proposal, but Steven Schveighoffer just posted about it here, which made me aware again: http://forum.dlang.org/thread/mcqcor$aa$1...@digitalmars.com#post-mcqk4s:246qb:241:40digitalmars.com class T { void doSomething() scope; } struct S { RC!T t; } void main() { auto s = S(RC!T()); // `s.t`'s refcount is 1 foo(s, s.t);// borrowing, no refcount changes } void foo(ref S s, scope T t) { s.t = RC!T(); // drops the old `s.t` t.doSomething();// oops, `t` is gone } One quick thing. I suggest a solution here: http://forum.dlang.org/post/jycylhdhdewtgumba...@forum.dlang.org You do the checking and adding in the called function, not the caller. The algorithm: 1. Keep a compile-time refcount per function. Does the parameter get released, i.e. does the refcount ever go below 1? If not, stop. 2. Can the parameter contain (as a member) a reference to a refcounted struct of the types of any of the other parameters? If not, stop. 3. Okay, you need to preserve the reference. Add a call to opAddRef at the beginning and one to opRelease at the end of the function. Done.
Re: My Reference Safety System (DIP???)
On Friday, 27 February 2015 at 23:18:24 UTC, Marc Schütz wrote: I think I have an inference algorithm that works. It can infer the required scope levels for local variables given the constraints of function parameters, and it can even infer the annotations for the parameters (in template functions). It can also cope with local variables that are explicitly declared as `scope`, though these are mostly unnecessary. Interestingly, the rvalue/lvalue problem deadalnix found is only relevant during assignment checking, but not during inference. That's because we are free to widen the scope of variables that are to be inferred as needed. It's based on two principles: * We start with the minimum possible scope a variable may have, which is empty for local variables, and its own lifetime for parameters. * When a scoped value is stored somewhere, it is then reachable through the destination. Therefore, assuming the source's scope is fixed, the destination's scope must be widened to accommodate the source's scope. * From the opposite viewpoint, a value that is to be stored somewhere must have at least the destination's scope. Therefore, assuming the destination's scope is fixed, the source's scope needs to be widened accordingly. I haven't formalized it yet, but I posted a very detailed step-by-step demonstration on my wiki talk page (nicer to read because it has syntax highlighting): http://wiki.dlang.org/User_talk:Schuetzm/scope2 I need to sleep as well right now. But I still don't understand where the cycles come from. Taken from your example: *b = c; // assignment from `c`: // = SCOPE(c) |= SCOPE(*b) // = DEFER because SCOPE(*b) = SCOPE(b) is incomplete `c` is merely being copied, but you indicate here that it will now inherit b's (or some part of b's) scope. Why would c's scope inherit b's when it is merely being copied and not written to?
Re: DIP74: Reference Counted Class Objects
On Thursday, 26 February 2015 at 21:50:56 UTC, Andrei Alexandrescu wrote: http://wiki.dlang.org/DIP74 got to reviewable form. Please destroy and discuss. Thanks, Andrei It's kind of funny that you were looking for an edge to my safety system -- I'll admit I don't know whether it really has an edge or not (it might be too bloated, both function-signature-wise and compile-time-wise) -- but one key advantage to any sophisticated ownership system is that automated reference counting can elide calls which it knows are unnecessary. What struck me in particular about DIP74 is how the pass-by-value protocol will force many function calls to endure an opAddRef/opRelease cycle, even if they do nothing to the reference count. What really worries me is that if the caller is responsible for the opAddRef, while the callee is responsible for the opRelease, isn't the potential optimization of eliding them just being sacrificed?
Re: Contradictory justification for status quo
On Friday, 27 February 2015 at 14:02:58 UTC, Andrei Alexandrescu wrote: Safety is good to have, and the simple litmus test is if you slap @safe: at the top of all modules and you use no @trusted (or of course use it correctly), you should have memory safety, guaranteed. A feature that is safe except for certain constructs is undesirable. It seems like you're agreeing with my general idea of going the whole hog. Generally having a large number of corner cases that require special language constructs to address is a Bad Sign. But D inherits C's separate compilation model. All these cool function and parameter attributes (pure, @safe, return ref, etc.) could be kept hidden and just used and they would Just Work if D didn't have to accommodate separate compilation. From my perspective, the only Bad Sign is that D has to navigate the tradeoff between: * concise function signatures * accurate communication between functions * enabling separate compilation It's like you have to sacrifice one to get the other two. Naturally I'm not keen on this, so I rush to see how far attribute inference for all functions can be taken. Then Dicebot suggests automated .di file generation with statically verified matching binaries: http://forum.dlang.org/post/otejdbgnhmyvbyaxa...@forum.dlang.org The point is that I don't feel the ominous burden of a Bad Sign here, because of the inevitability of this conflict.
Re: Improving DIP74: functions borrow by default, retain only if needed
On Friday, 27 February 2015 at 18:24:27 UTC, Andrei Alexandrescu wrote: DIP74's function call protocol for RCOs has the caller insert opAddRef for each RCO passed by value. Then the callee has the responsibility to call opRelease (or defer that to another entity). This choice of protocol mimics the constructor/destructor protocol and probably shows our C++ bias. However, ARC does not do that. Instead, it implicitly assumes the callee is a borrower of the reference. Only if the callee wants to copy the parameter to a member or a global (i.e. save it beyond the duration of the call), a new call to retain() (= opAddRef) is inserted. That way, functions that only need to look at the object but not store it incur no reference call overhead. So I was thinking of changing DIP74 as follows: * Caller does NOT insert an opAddRef for byval RCOs * Callee does NOT insert an opRelease for its byval RCO parameters It seems everything will just work with this change (including all move scenarios), but it is simple enough to make me worry I'm missing something. Thoughts? I think it's fine. I couldn't even figure out the original motive for wanting to add those calls -- I thought it must have something to do with threads or exceptions or something, but even then I couldn't figure it out. Any reference argument will, by definition, outlive its function -- it can't possibly die within the function itself, since the caller still thinks it's a valid reference. Another thing is that local references in general need not participate in reference counting. They will retain and release the reference automatically when they go in and out of scope. I'm really no expert (except that I like to study and think and by thinking become somewhat expert it appears), but if all ARC could be confined to global/heap = global/heap copies, you'd get the most efficient code. 
And I'm not trying to advertise a reference tracking system :-), but the real hiccup is that global reference can go *through* the stack and land back at a global... and you would need to keep track of that.
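The borrow-by-default protocol discussed above might look like this when the inserted calls are written out by hand. It is a sketch under assumptions: `Widget`, `inspect`, `keep` and `saved` are hypothetical names, the counter layout is made up, and the `opAddRef`/`opRelease` methods here are plain member functions called explicitly where the proposal would have the compiler insert them.

```d
import std.stdio;

// Hypothetical counted class; the counter starts at 1 for the creating reference.
class Widget
{
    int refs = 1;
    void opAddRef()  { ++refs; }
    void opRelease() { --refs; }
}

Widget saved;   // a global that outlives any single call

// Only looks at w: under borrow-by-default no calls are inserted at all.
int inspect(Widget w)
{
    return w.refs;
}

// Stores w beyond the call: a retain is inserted at the point of the store,
// and the previously stored object (if any) is released.
void keep(Widget w)
{
    w.opAddRef();
    if (saved !is null)
        saved.opRelease();
    saved = w;
}

void main()
{
    auto w = new Widget;
    writeln(inspect(w)); // 1
    keep(w);
    writeln(w.refs);     // 2
}
```

Functions like `inspect` pay nothing, which is the point of the proposed change; only the global/heap store in `keep` touches the count.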
Re: Contradictory justification for status quo
On Friday, 27 February 2015 at 15:35:46 UTC, H. S. Teoh wrote: @safe has some pretty nasty holes right now... like: https://issues.dlang.org/show_bug.cgi?id=5270 https://issues.dlang.org/show_bug.cgi?id=8838 My new reference safety system: http://forum.dlang.org/post/offurllmuxjewizxe...@forum.dlang.org ...would solve the above two bugs. In fact, it's designed precisely for bugs like those. Here's your failing use case for bug 5270. I'll explain how my system would track and catch the bug: int delegate() globDg; void func(scope int delegate() dg) { globDg = dg; // should be rejected but isn't globDg(); } If func is marked @safe and no attribute inference is permitted, this would error, as it copies a reference parameter to a global. However, let's assume we have inference. The signature would now be inferred to: void func(noscope scope int delegate() dg); Yeah it's obviously weird having both `scope` and `noscope`, but that's pure coincidence, and moreover, I think the use of `scope` here would be made obsolete by my system anyway. (Note also that the `noscope` bikeshed has been suggested to be painted `static` instead -- it's not about the name, yet... ;-) void sub() { int x; func(() { return ++x; }); } Well I suppose this rvalue delegate is allocated on the stack, which will have local reference scope. This is where you'd get the safety error in the case of attribute inference, as you can't pass a local reference to a `noscope` parameter. The rest is just a foregone conclusion (added here for completion): void trashme() { import std.stdio; writeln(globDg()); // prints garbage } void main() { sub(); trashme(); } The next bug, 8838, is a very simple case, I think: int[] foo() @safe { int[5] a; return a[]; } `a`, being a static array, would have a reference scope depth of 1, and when you copy the reference to make a dynamic array in the return value, the reference scope inherits that of `a`. Any scope system would catch this one, I'm afraid. 
Mine seems like overkill in this case. :-/
Re: Improving DIP74: functions borrow by default, retain only if needed
On Friday, 27 February 2015 at 20:30:20 UTC, Steven Schveighoffer wrote:

OK, I found the offending issue. When you pass a parameter, the only reference holding onto it may be passed as well. Something like:

    void foo(C c, C2 c2)
    {
        c2.c = null; // this destroys 'c' unless you opAddRef it before passing
        c.someFunc(); // crash
    }

    void main()
    {
        C c = new C; // ref counted class
        C2 c2 = new C2; // another ref counted class
        c2.c = c;
        foo(c, c2);
    }

How does the compiler know in this case that it *does* have to opAddRef c before calling? Maybe your ARC expert can explain how that works.

Split-passing nested ref-counted classes with null loads! How insidious!
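The hazard described above can be simulated in a few lines of D. This is only a sketch under stated assumptions: the counting is done by hand (`retain`/`release` stand in for `opAddRef`/`opRelease`), and an `alive` flag stands in for "memory not yet freed", so nothing actually crashes. The names `clear`, `fooSeesLiveObject`, `callBorrowed` and `callRetained` are hypothetical.

```d
// Hypothetical manual ref counting; `alive` stands in for "not yet freed".
final class C
{
    uint refs = 1;
    bool alive = true;
    void retain()  { ++refs; }
    void release() { assert(refs > 0); if (--refs == 0) alive = false; }
}

final class C2
{
    C c; // owning field: holds one reference to c
    void clear() { if (c !is null) { c.release(); c = null; } }
}

// The callee drops the only owner of `c` through the back door,
// then goes on to use `c`.
bool fooSeesLiveObject(C c, C2 c2)
{
    c2.clear();
    return c.alive;
}

bool callBorrowed()
{
    auto c2 = new C2;
    c2.c = new C;              // c2.c is the only reference
    return fooSeesLiveObject(c2.c, c2);
}

bool callRetained()
{
    auto c2 = new C2;
    c2.c = new C;
    C c = c2.c;
    c.retain();                // caller pins the argument for the call
    scope (exit) c.release();
    return fooSeesLiveObject(c, c2);
}

unittest
{
    assert(!callBorrowed()); // the object was "freed" while still in use
    assert(callRetained());  // the extra retain kept it alive
}
```

With a pure borrow the count reaches zero inside the callee; the caller-side retain is what keeps the object alive for the duration of the call.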
Re: Improving DIP74: functions borrow by default, retain only if needed
On Friday, 27 February 2015 at 21:21:08 UTC, Andrei Alexandrescu wrote: On 2/27/15 1:02 PM, Michel Fortin wrote: On 2015-02-27 20:34:08 +, Steven Schveighoffer said: On 2/27/15 3:30 PM, Steven Schveighoffer wrote: void main() { C c = new C; // ref counted class C2 c2 = new C2; // another ref counted class c2.c = c; foo(c, c2); } Bleh, that was dumb. void main() { C2 c2 = new C2; c2.c = new C; foo(c2.c, c2); } Still same question. The issue here is how do you know that the reference that you are sure is keeping the thing alive is not going to release it through some back door. You have to retain 'c' for the duration of the call unless you can prove somehow that calling the function will not cause it to be released. You can prove it in certain situations: - you are passing a local variable as a parameter and nobody has taken a mutable reference (or pointer) to that variable, or to the stack frame (be wary of nested functions accessing the stack frame) - you are passing a global variable as a parameter to a pure function and aren't giving to that pure function a mutable reference to that variable. - you are passing a member variable as a parameter to a pure function and aren't giving to that pure function a mutable reference to that variable or its class. There are surely other cases, but you get the idea. These three situations are probably the most common, especially the first one. For instance, inside a member function, 'this' is a local variable and you will never pass it to another function by ref, so it's safe to call 'this.otherFunction()' without retaining 'this' first. Thanks. So it seems we continue as we were with DIP74 and leave the rest to the implementation. Andrei Still seems like a very significant performance penalty for such a strange case. It probably won't surprise you that I would suggest another parameter attribute to the rescue, e.g.`@rcRelease`! Inter-function communication for the win!
Re: My Reference Safety System (DIP???)
On Friday, 27 February 2015 at 22:10:11 UTC, Marc Schütz wrote: I put my own version into the Wiki, building on yours: http://wiki.dlang.org/User:Schuetzm/scope2 It's quite similar to what you propose (at least as far as I understand it), and there are a few further user-facing simplifications, and provisions for backward compatibility. I intentionally kept it as concise as possible; there are neither justifications for particular decisions, nor any implementation details, nor examples. These can be added later. I like this phrase: Because all relevant information about lifetimes is contained in the function signature... This keeps seeming more and more important to me. There's no other place functions can talk to each other -- and they *really* need to talk to each other for any of these advanced features to work well. I'm pretty sure it's really the function signature which needs designing -- what to add, what can be deduced (and therefore not added), and how to express them all elegantly and simply. And of course, my favorite Castle in the Sky: attribute inference! I won't really know how your proposal works until I see code examples. For me, it's important to keep the implementation details and algorithms separate from the basic workings. Otherwise it's hard for me to fully understand it in all aspects. Okay, but hopefully some examples are forthcoming, cause they help *me* think.
Re: Contradictory justification for status quo
On Friday, 27 February 2015 at 21:09:51 UTC, H. S. Teoh wrote: https://issues.dlang.org/show_bug.cgi?id=12822 https://issues.dlang.org/show_bug.cgi?id=13442 https://issues.dlang.org/show_bug.cgi?id=13534 https://issues.dlang.org/show_bug.cgi?id=13536 https://issues.dlang.org/show_bug.cgi?id=13537 https://issues.dlang.org/show_bug.cgi?id=14136 https://issues.dlang.org/show_bug.cgi?id=14138 There are probably other holes that we haven't discovered yet. I wanted to say that besides the first two bugs I tried to address, none of the rest in your list involves more than just telling the compiler to check for this or that, whatever the case may be, per bug. Maybe blanket use of `@trusted` to bypass an over-cautious compiler is the only real danger I personally am able to worry about. I simplified my thinking by dividing everything into in function and outside of function. So I ask, within a function, what do I need to know to ensure everything is safe? And then, from outside a function, what do I need to know to ensure everything is safe? The function has inputs and outputs, sources and destinations.
Re: My Reference Safety System (DIP???)
On Thursday, 26 February 2015 at 16:40:27 UTC, Zach the Mystic wrote:

    int r; // declaration scopedepth(0)

    void fun(int a /*scopedepth(0)*/)
    {
        int b; // depth(1)
        {
            int c; // depth(2)
            {
                int d; // (3)
            }
            {
                int e; // (3)
            }
        }
        int f; // (1)
    }

You have elements of differing lifetimes at scope depth 0 so far.

Sorry for the delay. I made a mistake. Parameter `a` will have a *declaration* scope of 1, just like int b above. Its *reference* scope will have depth 0, with the mystery bit for the first parameter set.

That is, `a` would have such a reference scope is it were a reference type... :-)
Re: My Reference Safety System (DIP???)
On Wednesday, 25 February 2015 at 18:08:55 UTC, deadalnix wrote: On Wednesday, 25 February 2015 at 01:12:15 UTC, Zach the Mystic wrote: int r; // declaration scopedepth(0) void fun(int a /*scopedepth(0)*/) { int b; // depth(1) { int c; // depth(2) { int d; // (3) } { int e; // (3) } } int f; // (1) } You have element of differing lifetime at scope depth 0 so far. Sorry for the delay. I made a mistake. Parameter `a` will have a *declaration* scope of 1, just like int b above. It's *reference* scope will have depth 0, with the mystery bit for the first parameter set. Principle 5: It's always un@safe to copy a declaration scope from a higher scopedepth to a reference variable stored at lower scopedepth. DIP69 tries to banish this type of thing only in `scope` variables, but I'm not afraid to banish it in all @safe code period: void gun() @safe { T* t; // t's declaration depth: 1 T u; { T* uu = u; // fine, this is normal T tt; t = tt; // t's reference depth: 2, error, un@safe } // now t is corrupted } Bingo. However, when you throw goto into the mix, weird thing happens. The general idea is good but need refining. I addressed this further down, in Principle 10. My proposed solution has the compiler detecting the presence of code which could both 1) be visited again (through a jump label or a loop) and 2) is in a branching condition. In these cases it pushes any statement which copies a reference onto a special stack. When the branching condition finishes, it revisits the stack, reheating the scopes in reverse order. If there is a way to defeat this technique, it must be very convoluted, since the scopes do nothing but accumulate possibilities. It may even be mathematically impossible. Principle 7: In this system, all scopes are *transitive*: any reference type with double indirections inherits the scope of the outermost reference. Think of it this way: It is more complex than that, and this is where most proposals fail short (including this one and DIP69). 
If you want to disallow the assignment of a reference to something with a short lifetime, you can't consider scope transitive when used as a lvalue. You can, however, consider it transitive when used as an rvalue. The more general rule is that you want to consider the largest possible lifetime of an lvalue, and the smallest possible one for an rvalue. When going through an indirection, that will differ, unless we choose to tag all indirections, which is undesirable. I'm unclear about what you're saying. Can you give an example in code? Principle 8: Any time a reference is copied, the reference scope inherits the *maximum* of the two scope depths: That makes control flow analysis easier, so I can buy this :) Principle 8: We don't need to know! For all intents and purposes, a reference parameter has infinite lifetime for the duration of the function it is compiled in. Whenever we copy any reference, we do a bitwise OR on *all* of the mystery scopes. The new reference accumulates every scope it has ever had access to, directly or indirectly. That would allow to copy a parameter reference to a global, which is dead unsafe. Actually, it's not unsafe, so long as you have the parameter attribute `noscope` (or possibly `static`) working for you: void fun(T* a) { static T* t; *t = a; // this might be safe } The truth is, this *might* be safe. It's only unsafe if the parameter `a` is located on the stack. From within the function, the compiler can't possibly know this. But if it forces you to mark `a` with `noscope` (or is allowed to infer the same), it tells the caller all it needs to know about `a`. Simply put, it's an error to pass a local to a `noscope` parameter. And it runs all the way down: any parameter which it itself passed to a `noscope` parameter must also be marked `noscope`. 
(Note: I'm actually preferring the name `static` at this point, but using `noscope` for consistency): T* fun(noscope T* a) { static T* t; *t = a; // this might be safe } void tun(T* b) { T c; fun(c); // error, local fun(b); // error, unless b also marked (or inferred) `noscope` } There is some goodness in there. Please address my comment and tell me if I'm wrong, but I think you didn't covered all bases. The only base I'm really worried about is the lvalue vs rvalue base. Hopefully we can fix that!
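The copy rule discussed above (keep the maximum scope depth, bitwise OR the mystery scopes) can be modelled as plain data. This is a toy model, not compiler code: the `RefScope` struct, `merge` and `isLocal` are hypothetical names, and the 16-bit mask simply follows the size mentioned in the proposal.

```d
import std.algorithm.comparison : max;

// Hypothetical model of a reference scope as described in the posts:
// a depth plus a bit mask of "mystery" parameter scopes.
struct RefScope
{
    uint depth;      // 0 = global or parameter, 1+ = nested local depth
    ushort mystery;  // bit i set: may refer to what parameter i refers to
}

// Copying a reference: keep the larger depth and OR the mystery bits.
RefScope merge(RefScope target, RefScope source)
{
    return RefScope(max(target.depth, source.depth),
                    cast(ushort)(target.mystery | source.mystery));
}

// A reference with depth >= 1 may point into the current stack frame.
bool isLocal(RefScope r) { return r.depth >= 1; }

unittest
{
    auto st     = RefScope(0, 0);     // reference to a static
    auto local  = RefScope(1, 0);     // reference to a stack variable
    auto param0 = RefScope(0, 0b01);  // first parameter
    auto param1 = RefScope(0, 0b10);  // second parameter

    // Loaded with a local, then with a static: the local depth sticks.
    auto tp = merge(merge(RefScope.init, local), st);
    assert(tp.depth == 1 && isLocal(tp));

    // A reference that saw both parameters remembers both of them.
    auto both = merge(param0, param1);
    assert(both.mystery == 0b11 && !isLocal(both));
}
```

In this model, storing `both` to a global would mean both parameters need the `noscope` marking, while `tp` could never be stored there or returned at all.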
Re: My Reference Safety System (DIP???)
On Thursday, 26 February 2015 at 16:42:30 UTC, Zach the Mystic wrote: That is, `a` would have such a reference scope is it were a reference type... :-) s/is/if/ I seem to be making one more mistake for every mistake I correct.
Re: My Reference Safety System (DIP???)
On Wednesday, 25 February 2015 at 21:26:33 UTC, Marc Schütz wrote: IIRC H.S. Teoh suggested a change to the compilation model. I think he wants to expand the minimal compilation unit to a library or executable. In that case, inference for all kinds of attributes will be available in many more circumstances; explicit annotation would only be necessary for exported symbols. You probably mean Dicebot: http://forum.dlang.org/post/otejdbgnhmyvbyaxa...@forum.dlang.org Anyway, it is a good idea to enable scope semantics implicitly for all references involved in @safe code. As far as I understand it, this is something you suggest, right? It will eliminate annotations except in cases where a parameter is returned, which - as you note - will probably be acceptable, because it's already been suggested in DIP25. Actually you could eliminate `return` parameters as well, I think. If the compiler has the body of a function, which it usually does, then there shouldn't be a need to mark *any* of the covariant function or parameter attributes. I think it's the kind of thing which should Just Work in all these cases. Principle 4: Scopes. My system has its own notion of scopes. They are compile time information, used by the compiler to ensure safety. Every declaration which holds data at runtime must have a scope, called its declaration scope. Every reference type (defined below in Principle 6) will have an additional scope called its reference scope. A scope consists of a very short bit array, with a minimum of approximately 16 bits and reasonable maximum of 32, let's say. For this proposal I'm using 16, in order to emphasize this system's memory efficiency. 32 bits would not change anything fundamental, only allow the compiler to be a little more precise about what's safe and what's not, which is not a big deal since it conservatively defaults to @system when it doesn't know. This bitmask seems to be mostly an implementation detail. 
I guess I'm trying to win over the people who might think the system will cost too much memory or compilation time.

AFAIU, further below you're introducing some things that make it visible to the user.

The only things I'm making visible to the user are things which *must* appear in the function signature for the sake of the separate compilation model. Everything else would be invisible, except the occasional false positive, where something actually safe is thought unsafe (the solution being to enclose the statement in a @trusted block or lambda).

I'm not convinced this is a good idea; it looks complicated for sure.

It's not that complicated. My main fear is that it's too simple! Some of the logic may seem complicated, but the goal is to make it possible to compile a function without having to visit any other function. Everything is figured out in house.

I also think it is too coarse. Even variables declared at the same lexical scope have different lifetimes, because they are destroyed in reverse order of declaration. This is relevant if they contain references and have destructors that access the references; we need to make sure that no reference to a destroyed variable can be kept in a variable whose destructor hasn't yet run.

It might be too coarse. We could reserve a few more bits for depth-constant declaration order. At the same time, it doesn't seem *that* urgent to me. But maybe I'm naive about this. Everything is being destroyed anyway, so what's the real danger?

Principle 5: It's always un@safe to copy a declaration scope from a higher scopedepth to a reference variable stored at lower scopedepth. DIP69 tries to banish this type of thing only in `scope` variables, but I'm not afraid to banish it in all @safe code period:

For backwards compatibility reasons, it might be better to restrict it to `scope` variables. But as all references in @safe code should be implicitly `scope`, this would mostly have the same effect.
I guess this is the Language versus Legacy issue. I think D's strength is in its language, not its huge legacy codebase. Therefore, I find myself going with the #pleasebreakourcode crowd, for the sake of extending D's lead where it shines. I'm not sure all references in safe code need to be `scope` - that would break a lot of code unto itself, right?

Principle 8: Any time a reference is copied, the reference scope inherits the *maximum* of the two scope depths:
^^^ Principle 7 ?

    T* gru()
    {
        static T st; // decl depth(0)
        T t; // decl depth(1)
        T* tp = t; // ref depth(1)
        tp = st; // ref depth STILL (1)
        return tp; // error!
    }

If you have ever loaded a reference with a local scope, it retains that scope level permanently, ensuring the safety of the reference.

Why is this rule necessary? Can you show an example what could go wrong without it? I assume it's just there to ease implementation (avoids the need for data flow analysis)?

You're right. It's only
Re: Memory safety depends entirely on GC ?
Here's my best so far: http://forum.dlang.org/post/offurllmuxjewizxe...@forum.dlang.org On Tuesday, 24 February 2015 at 20:53:24 UTC, Walter Bright wrote: My criticisms of it centered around: 1. confusion about whether it was a storage class or a type qualifier. My system has neither. Instead, it just bans unsafe reference copying in @safe code. 2. I agree with Andrei that any annotation system can be made to work - but this one (as are most annotation systems) also struck me as wordy, tedious, and aesthetically unappealing. I just can't see myself throwing it up on a slide and trying to sell it to the audience as cool. 3. In line with (2), I want a system that relies much more on inference. We've made good progress with the existing annotations being inferred. Well you know I'm on board with this. The one penalty my system requires is two more parameter attributes, which I'm hoping can be alleviated by inference as much as possible. 4. I didn't see how one could, for example, have an array of pointers: int*[] pointers; and then fill that array with pointers of varying ownership annotations. 5. The (4) homogeneity requirement would mean that templated types would get new instantiations every time they are used with a different ownership. This could lead to massive code bloat. I deliberately designed my system to avoid all associations with type. No code bloat. 6. The 'return ref' scheme, which you have expressed distaste for, was one that required the fewest instances of the user having to add an annotation. It turned out that upgrading Phobos to this required only a handful of annotations. 7. 'return ref' makes memory safe ref counted types possible, finally, in D, without needing to upend the language or legacy code. And as the example I posted showed, they are straightforward to write. Only time and experience will tell if this will be successful, but it looks promising and I hope you'll be willing to give it a chance. I do give it a chance! See my proposal!
Re: My Reference Safety System (DIP???)
On Thursday, 26 February 2015 at 20:46:07 UTC, deadalnix wrote:

Consider :

    void foo(T** a)
    {
        T** b = a; // OK
        T* = ...;
        *b = c; // Legal because of your transitive clause,
                // but not safe as a can have an
                // arbitrary large lifetime.
    }

This example's incomplete, but I can guess you meant something like this:

    void foo(T** a)
    {
        T** b = a; // OK
        T d;
        T* c = d;
        *b = c; // Legal because of your transitive clause,
                // but not safe as a can have an
                // arbitrary large lifetime.
    }

This shows that anything you reach through an indirection can have from the same lifetime as the indirection up to an infinite lifetime (and anything in between). When using it as an lvalue, you should consider the largest possible lifetime; when using it as an rvalue, you should consider the smallest (this is the only way to be safe).

I'm starting to see what you mean. I guess it's only applicable to variables with double (or more) indirections (e.g. T**, T***, etc.), since only they can lose information with transitive scopes. Looks like we need a new rule: variables assigning to one of their double indirections cannot acquire a scope-depth greater than (or lifetime less than) their current one. Does that fix the problem?
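The lvalue/rvalue rule quoted above can be written down as a small toy model: whatever is reached through an indirection has a lifetime somewhere between that of the indirection and "forever", the target of an assignment is judged by its longest possible lifetime, and the source by its shortest. The names `Reach`, `through`, `canAssign` and the numeric lifetimes are all assumptions made for this sketch; it is not how any D compiler represents lifetimes.

```d
// Hypothetical model: a lifetime is a number, larger = lives longer.
enum int forever = int.max;

struct Reach
{
    int shortest; // smallest lifetime the pointee might have
    int longest;  // largest lifetime the pointee might have
}

// What can be reached through an indirection whose own lifetime is `own`.
Reach through(int own) { return Reach(own, forever); }

// Storing `source` into `target` is accepted only if the shortest-lived
// source lives at least as long as the longest-lived target.
bool canAssign(Reach target, Reach source)
{
    return source.shortest >= target.longest;
}

unittest
{
    enum paramLifetime = 10; // `a` and `b`: outlive the function body
    enum localLifetime = 1;  // `d`: dies when the function returns

    auto starB = through(paramLifetime);           // what *b may refer to
    auto c = Reach(localLifetime, localLifetime);  // c points at local d

    assert(!canAssign(starB, c)); // *b = c is rejected
    assert(canAssign(c, starB));  // c = *b would be fine
}
```

Under these assumptions `*b = c` is rejected because `*b` might live forever while `c` points at a local, whereas reading `*b` into a local pointer is accepted.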
Re: My Reference Safety System (DIP???)
On Thursday, 26 February 2015 at 21:33:53 UTC, Marc Schütz wrote: On Thursday, 26 February 2015 at 17:56:14 UTC, Zach the Mystic wrote: On Wednesday, 25 February 2015 at 21:26:33 UTC, Marc Schütz wrote: struct A { B* b; ~this() { b.doSomething(); } } struct B { void doSomething(); } void foo() { A a; // declscope(1) B b; // declscope(1) a.b = b; // refscope(1) = declscope(1): OK // end of scope: // `b` is destroyed // `a`'s destructor is called // = your calling a method on a destroyed object } Basically, every variable needs to get its own declscope; all declscopes form a strict hierarchy (no partial overlaps). Well, technically you only need one per variable with a destructor. Fortunately, this doesn't seem hard to add. Just another few bits, allowing as many declarations with destructors as seem necessary (4 bits = 15 variables, 5 bits = 31 variables, etc.), with the last being treated conservatively as unsafe. (I think anyone declaring 31+ variables with destructors in a function, and taking the addresses of those variables has bigger problems than memory safety!) I guess this is the Language versus Legacy issue. I think D's strength is in it's language, not its huge legacy codebase. Therefore, I find myself going with the #pleasebreakourcode crowd, for the sake of extending D's lead where it shines. I'm too, actually, but it would be a really hard sell. But look, Walter and Andrei were fine with adding `return ref` parameters. There's hope yet! I'm not sure all references in safe code need to be `scope` - that would break a lot of code unto itself, right? Not sure how much would be affected. I actually suspect that most of it already behaves as if it were scope, with the exception of newly allocated memory. But those should ideally be owned instead. But your right, there still needs to be an opt-out possibility, most likely static. I don't even have a use for `scope` itself in my proposal. 
The only risk I'm running is a lot of false positives -- safe constructs which the detection mechanism conservatively treats as unsafe because it can't follow the program logic. Still, it's hard for me to imagine even these appearing very much. And they can be put into @trusted lambdas -- all @trusted functions are treated as if they copy no references, effectively canceling any parameter attributes to the contrary. T* fun(T* a, T** b) { T* c = new T; c = a; *b = c; return c; } Algorithm for inference of ref scopes (= parameter annotations): 1) Each variable, parameter, and the return value get a ref scope (or ref depth). A ref scope can either be another variable (including `return` and `this`) or `static`. 2) The initial ref scope of variables is themselves. Actually, no. The *declaration* scope is themselves. The initial ref scope is whatever the variable is initialized with, or just null if nothing. We could even have a bit for could be null. You might get some null-checking out of this for free. But then you'd need more attributes in the signature to indicate could be null! But crashing due to null is not considered a safety issue (I think!), so I haven't gone there yet. 3) Each time a variable (or something reachable through a variable) is assigned (returning is assignment to the return value), i.e. for each location in the function that an assignment happens, the new scope ref will be: 3a) the scope of the source, if it is larger or equal to the old scope If scope depth is = 1, you inherit the maximum of the source and the target. If it's 0, you do a bitwise OR on the mystery scopes (unless the compiler can easily prove it doesn't need to), so you can accumulate all possible origins of the assigned-to scope. 3b) otherwise (for disjunct scopes, or assignment from smaller to larger scope), it is an error (could potentially violate guarantees) I don't have disjunct scopes. There's just greater than and less than. 
The mystery scopes are for figuring out what the parameter attributes are, and in the absence of inference, causing errors in safe code for the parameters not being accurately marked. 4) If a source scope refers to a variable (apart from the destination itself), for which not all assignments have been processed yet, it is put into a queue, to be evaluated later. For code like `a = b; b = a;` there can be dependency cycles. Such code will be disallowed. No, my system is simpler. I want to make this proposal appealing from the implementation side as well as from the language side. You analyze the code in lexical order: T* dum(T* a) { T* b = a; // b accumulates a return b; // okay... lexical ordering, b has a only T c; b = c; // now b accumulates scopedepth(1); return b; // error here, but *only* here } The whole process relies on accumulating the scopes as the compiler encounters them. There are cases of branching conditional,
Re: My Reference Safety System (DIP???)
On Friday, 27 February 2015 at 00:44:21 UTC, deadalnix wrote:

On Thursday, 26 February 2015 at 22:45:19 UTC, Zach the Mystic wrote:

I'm starting to see what you mean. I guess it's only applicable to variables with double (or more) indirections (e.g. T**, T***, etc.), since only they can lose information with transitive scopes. Looks like we need a new rule: variables assigning to one of their double indirections cannot acquire a scope-depth greater than (or lifetime less than) their current one. Does that fix the problem?

Cool. I think that can work (I'm not 100% convinced, but at least something close to that should work). But that is probably too limiting. Hence the proposed differentiation of lvalue and rvalues.

Yeah, wasn't completely clear. I meant to say: Variables assigning to one of their double indirections cannot acquire a scope-depth greater than (or lifetime less than) their current longest-lived one. Also, bear in mind, a parameter could be an lvalue:

    void fun(T* a, T** b) { *b = a; }

I guess it's just better to use source and target than lvalue and rvalue.

Also bear in mind that in the worst case scenario, any code can be made to work by putting it into the newly approved-of idiom: The @trusted Lambda! We want a safety mechanism conservative enough to catch all failures, accurate enough to avoid too many false positives (thus minimizing @trusted lambdas), easy enough to implement, and which doesn't tax compile time too heavily. The magic Four! I still have a few doubts (recursive inference, for example, which can probably be improved), but not too many.
Re: Contradictory justification for status quo
On Friday, 27 February 2015 at 01:33:58 UTC, Jonathan M Davis wrote: Well, I suspect that each case would have to be examined individually to decide upon the best action, but I think that what it comes down to is the same problem that we have with getting anything done around here - someone has to do it. This isn't true at all. Things need to be approved first, then implemented. With language changes, it's often the same. Someone needs to come up with a reasonable solution and then create a PR for it. They then have a much stronger position to argue from, and it may get in and settle the issue. I sometimes feel so bad for Kenji, who has come up with several reasonable solutions for longstanding problems, *and* implemented them, only to have them be frozen for *years* by indecision at the top. I'll never believe your side until this changes. You can see exactly how D works by looking at how Kenji spends his time. For a while he's only been fixing ICEs and other little bugs which he knows for certain will be accepted. I'm not saying any of these top level decisions are easy, but I don't believe you for a second, at least when it comes to the language itself. Phobos may be different.
Re: Contradictory justification for status quo
On Friday, 27 February 2015 at 02:58:31 UTC, Andrei Alexandrescu wrote: I'm following with interest the discussion My Reference Safety System (DIP???). Right now it looks like a lot of work - a long opener, subsequent refinements, good discussion. It also seems just that - there's work but there's no edge to it yet; right now a DIP along those ideas is more likely to be rejected than approved. But I certainly hope something good will come out of it. What I hope will NOT happen is that people come to me with a mediocre proposal going, We've put a lot of Work into this. Well? Can I ask you a general question about safety: If you became convinced that really great safety would *require* more function attributes, what would be the threshold for including them? I'm trying to go the whole hog with safety, but I'm paying what seems to me the necessary price -- more parameter attributes. Some of these gains (out! parameters, e.g.) seem like they would only apply to very rare code, and yet they *must* be there, in order for functions to talk to each other accurately. Are you interested in accommodating the rare use cases for the sake of robust safety, or do you just want to stop at the very common use cases (ref returns, e.g.)? ref returns will probably cover more than half of all use cases for memory safety. Each smaller category will require additions to what a function signature can contain (starting with expanding `return` to all reference types, e.g.), while covering a smaller number of actual use cases... but on the other hand, it's precisely because they cover fewer use cases that they will appear so much less often.
Re: Memory safety depends entirely on GC ?
On Tuesday, 24 February 2015 at 12:44:54 UTC, Marc Schütz wrote:

On Monday, 23 February 2015 at 18:16:38 UTC, Andrei Alexandrescu wrote:

On 2/23/15 6:56 AM, "Marc Schütz" schue...@gmx.net wrote:

These two points have undesirable consequences: All consumers of such objects need to be aware of the exact type, which includes the management strategy (RC, Unique, GC). But this is a violation of the principle of separation of concerns: a consumer shouldn't need to have information about the management strategy, it should work equally with `RefCounted!C`, `Unique!C` and bare (GC) `C`, as long as it doesn't take ownership of the resource.

Well I don't know of another way.

Ok, I wrote my reply assuming that you are aware of the various proposals deadalnix, myself and several other people have made in the past, some of them quite specific. But now that I think of it, I don't remember that you were ever directly referring to it in any of your posts. Maybe you just missed it? As one example, here is what I originally suggested: http://wiki.dlang.org/User:Schuetzm/scope It's not completely up to date, during discussions I gained many useful new insights to simplify it and make things more consistent. It's also part of a bigger picture (deadalnix's ideas about ownership play an important role, too), which unfortunately isn't easy to recognize, because this page has become quite large and unwieldy. I should make a post explaining this.

I'm working on my own idea now. I make scope transitive, because it's both memory safe and simple to implement, but doing so may cause some things which are actually safe to be considered unsafe (but then you could just use @system blocks or @trusted lambdas to correct this). Also, I don't think `scope` needs to be part of the type. I'm about 90 percent sure, 10 percent unsure that my system will work. I'll have it soon enough.
It needs DIP25 to be expanded to all reference types (not just `ref`), requires my own DIP71 (http://wiki.dlang.org/DIP71) for total safety, and possibly one or two more additions for reliable ownership. The only real cost is added complexity to function signatures (a la DIP25), which can and should be inferred in most cases, assuming we aren't crippled by an ancient and subpar linking mechanism which requires all this manual marking of signatures all the time. Stay tuned, sir!
My Reference Safety System (DIP???)
So I've been thinking about how to do safety for a while, and this is how I would do it if I got to start from scratch. I think it can be harnessed to D, but I'm worried that people will be confused by it, or that there might be a show-stopping use case I haven't thought of, or that it is simply too cumbersome to be taken seriously, but I'll make a DIP when it overcomes these three obstacles. I'm feeding off the momentum built by the approval of DIP25, and off of other recent `scope` proposals:
http://wiki.dlang.org/DIP25
http://wiki.dlang.org/User:Schuetzm/scope
http://wiki.dlang.org/DIP69
This system goes farther than either DIP25 or DIP69 towards complete safety, but is simpler and easier to implement (I think) than Marc Schütz's and deadalnix's proposal. It is not an ownership or reference counting system, but can serve as the foundation to one. Which leads to...
Principle 1: Memory safety is indispensable to ownership, but not the other way around. Memory safety focuses on all the things which *might* happen, and casts a wide net, akin to an algebraic union, whereas ownership targets specific things, focuses on what *will* happen, and is akin to the algebraic intersection of things. I will therefore present the memory safety system first, and leave grafting an ownership system on top of it for later.
Principle 2: The Function is the key unit of memory safety. The compiler must never need to leave the function it is compiling to verify that it is safe. This means that no information important to safety can be excluded from the signatures of the functions that the function being compiled calls. This principle has already been conceded in part by Walter and Andrei's acceptance of `return ref` parameters in DIP25, which simply implements the most common use case where safety is needed. Here I am taking this principle to the extreme, in the interest of total safety.
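To make the DIP25 reference in Principle 2 concrete, here is a minimal sketch of a `return ref` parameter in present-day D. The function name `identity` is made up for illustration; only the `return ref` annotation itself comes from DIP25. The point is that the signature alone tells the caller the result may alias the argument, so the caller can be checked without looking inside the body.

```d
// `identity` is a hypothetical name used only for this illustration.
// `return ref` states in the signature that the returned reference may
// refer to `x`, so callers can be checked without reading the body.
ref int identity(return ref int x) @safe
{
    return x;
}

void main()
{
    int value = 1;
    identity(value) = 42;   // writes through the returned reference
    assert(value == 42);
}
```

Without the `return` annotation, a @safe function returning `x` by reference is expected to be rejected, which is the common case DIP25 was accepted to cover.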
But speaking of function signatures,
Principle 3: Extra function and parameter attributes are the tradeoff for great memory safety. There is no other way to support both encapsulation of control flow (Principle 2) and the separate-compilation model (indispensable to D). Function signatures pay the price for this with their expanding size. I try to create the new attributes for the rare case, as opposed to the common one, so that they don't appear very often.
Principle 4: Scopes. My system has its own notion of scopes. They are compile time information, used by the compiler to ensure safety. Every declaration which holds data at runtime must have a scope, called its declaration scope. Every reference type (defined below in Principle 6) will have an additional scope called its reference scope. A scope consists of a very short bit array, with a minimum of approximately 16 bits and a reasonable maximum of 32, let's say. For this proposal I'm using 16, in order to emphasize this system's memory efficiency. 32 bits would not change anything fundamental, only allow the compiler to be a little more precise about what's safe and what's not, which is not a big deal since it conservatively defaults to @system when it doesn't know. So what are these bits? Reserve 4 bits for an unsigned integer (range 0-15) I call scopedepth. Scopedepth is easier for me to think about than lifetime, of which it is simply the inverse, with scopedepth 0 being infinite lifetime, 1 having a lifetime at function scope, etc. Anyway, a declaration's scopedepth is determined according to logic similar to that found in DIP69 and Marc Schütz's proposal:

    int r; // declaration scopedepth(0)

    void fun(int a /*scopedepth(0)*/) {
      int b; // depth(1)
      {
        int c; // depth(2)
        {
          int d; // (3)
        }
        {
          int e; // (3)
        }
      }
      int f; // (1)
    }

Principle 5: It's always un@safe to copy a declaration scope from a higher scopedepth to a reference variable stored at lower scopedepth.
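A rough sketch of how Principles 4 and 5 might look as data, assuming the scopedepth lives in the low 4 bits of the 16-bit scope (the post does not say where the bits go). Every name here (`ScopeInfo`, `withDepth`, `depth`, `isSafeCopy`) is hypothetical and exists only for illustration; this is not how any compiler implements it.

```d
// All names are hypothetical; the bit placement is an assumption.
struct ScopeInfo
{
    ushort bits; // the 16-bit array; only the low 4 bits are used here

    // Build a scope with the given scopedepth (0-15).
    static ScopeInfo withDepth(uint d) pure nothrow @nogc @safe
    {
        assert(d <= 15, "scopedepth must fit in 4 bits");
        return ScopeInfo(cast(ushort) d);
    }

    // Extract scopedepth; 0 stands for infinite lifetime.
    uint depth() const pure nothrow @nogc @safe
    {
        return bits & 0xF;
    }
}

// Principle 5: copying the declaration scope of `source` into a reference
// declared at `target` is un@safe when `source` sits at a higher depth.
bool isSafeCopy(ScopeInfo target, ScopeInfo source) pure nothrow @nogc @safe
{
    return source.depth <= target.depth;
}

void main()
{
    auto b = ScopeInfo.withDepth(1); // like `int b` in fun() above
    auto c = ScopeInfo.withDepth(2); // like `int c` in the nested block

    assert(!isSafeCopy(b, c)); // a depth-1 reference must not point at c
    assert(isSafeCopy(c, b));  // a depth-2 reference may point at b
}
```

The comparison is the whole rule: a longer-lived reference (lower depth) never receives the address of a shorter-lived declaration (higher depth).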
DIP69 tries to banish this type of thing only in `scope` variables, but I'm not afraid to banish it in all @safe code period:

    void gun() @safe {
      T* t; // t's declaration depth: 1
      T u;
      {
        T* uu = &u; // fine, this is normal
        T tt;
        t = &tt; // t's reference depth: 2, error, un@safe
      }
      // now t is corrupted
    }

So you'd have to enclose `t = &tt;` above in a @trusted lambda or a @system block. The truth is, it is absurd to copy the address of something with shorter lifetime into something with longer lifetime... what use would you ever have for it in the longer-lived variable? I'm therefore simplifying the system by making all instances of this unsafe. Looking at Principle 5, I realize I forgot:
Principle 6: Reference variables: Any data which stores a reference is a reference variable. That includes any pointer, class instance, array/slice, `ref` parameter, or any struct containing any of those. For the sake of simplicity, I boil _all_ of these down to T* in this proposal. All reference types are
Re: Trusted Manifesto
On Tuesday, 10 February 2015 at 15:49:24 UTC, Zach the Mystic wrote: Wait a second... you're totally right. That is a cool solution. The only hiccup is that it might be hard to implement in the compiler because of flow tracking (i.e. the error comes before the @system block, forcing a recheck of all preceding functions). I mean all preceding statements.
Re: Trusted Manifesto
On Tuesday, 10 February 2015 at 16:04:05 UTC, Marc Schütz wrote: On Tuesday, 10 February 2015 at 15:57:28 UTC, Zach the Mystic wrote: On Tuesday, 10 February 2015 at 15:49:24 UTC, Zach the Mystic wrote: As already pointed out in the other thread, there is a non-breaking variant of (3): 3a. Keep named @trusted functions, allow @system blocks inside them, but only treat those with @system blocks with the new semantics. But they *have* no semantics without disallowing @system code in the rest of the @trusted function. Wait a second... you're totally right. That is a cool solution. The only hiccup is that it might be hard to implement in the compiler because of flow tracking (i.e. the error comes before the @system block, forcing a recheck of all preceding functions). I'm sorry I misread you at first -- this is actually really cool (notwithstanding the hiccup)! No problem! At first I thought it was only a nice deprecation path, but I realised that the intermediate stage could even be kept indefinitely. Eventually the error should be the default, I say, but even then, a compiler switch can be kept around indefinitely which turns the error off. It probably wouldn't be too complicated to implement, because semantic analysis already happens in several stages. I think @safe checks happen relatively late, which means that there has already been one complete traversal of the function's AST which can take a note whenever it sees an @system block. Well that's just jolly!
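For context, the `@system` blocks discussed in this thread are proposed syntax; as far as I know they are not something current D compilers accept. The closest existing idiom is the @trusted lambda mentioned elsewhere in these threads, sketched below. The function name `firstElement` is invented for the example. It shows the goal both approaches share: the unsafe operation is confined to one small, reviewable spot while the rest of the function stays mechanically checked.

```d
// `firstElement` is a hypothetical example function.
int firstElement(int[] arr) @safe
{
    assert(arr.length > 0, "array must not be empty");

    // Raw `.ptr` access is not allowed in @safe code, so the one unsafe
    // operation is wrapped in an immediately-called @trusted lambda.
    // Everything outside the lambda is still checked by the compiler.
    return () @trusted { return *arr.ptr; }();
}

void main()
{
    assert(firstElement([7, 8, 9]) == 7);
}
```

Under variant 3a above, the lambda would presumably be replaced by a `@system` block inside a named @trusted function, with the rest of that function checked as if it were @safe.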