Re: Why must a bidirectional range also be a forward range?
On Friday, September 20, 2019 7:08:03 AM MDT Joseph Rushton Wakeling via Digitalmars-d-learn wrote: > On Thursday, 19 September 2019 at 22:55:55 UTC, Jonathan M Davis > > wrote: > > For better or worse, ranges were more or less set up as a > > linear hierarchy, and it's unlikely that use cases for > > bidirectional ranges which aren't forward ranges are common. I > > expect that it's a bit like infinite, bidirectional ranges. In > > theory, they could be a thing, but the use cases for them are > > uncommon enough that we don't really support them. Also, I > > expect that most range-based algorithms which operate on > > bidirectional ranges would require save anyway. A lot of > > algorithms do to the point that basic input ranges can be > > incredibly frustrating to deal with. > > > > [ ... ] > > Thanks for the characteristically thorough description of both > the design considerations and the history involved. > > On reflection it occurs to me that the problem in my thinking may > be the idea that `save` should result in a full deep copy. If > instead we go by how `save` is implemented for dynamic arrays, > it's only ever a shallow copy: it's not possible to make valid > assumptions of reproducible behaviour if the original copy is > modified in any way. > > If instead we assume that `save` is only suitable for temporary > shallow-copies that are made under the hood of algorithms, then > my problems go away. save is supposed to result copies that can be independently iterated over. So, code such as foreach(r; range.save) {} auto arr = r.save.array(); assert(equal(arr, r)); should work. How that's implemented underneath the hood doesn't really matter. However, none of that really takes into account mutation of the elements. The range API pretty much assumes that you don't ever modify any of the elements in the range as you iterate over them. So, if you do something like auto orig = range.save; range.front = 42; whether orig.front is then 42 is implementation-dependent. So, if you're modifying elements as you go, then the behavior you get is going to be highly dependent on what you're doing with the ranges, and certainly, if you're only using a range within a very specific context, it can be implemented in a way that works in that context but doesn't work with range-based functions in general. You just run the risk of problems if you then later modify the code to use other range-based functions which don't necessarily work with whatever you've done with the range. As for temporary, shallow copies, IIRC, isForwardRange requires that save returns exactly the same type as the original. So, while you can certainly have a range referring to a data structure without owning any of the data (after all, that's what happens with dynamic arrays), you can't have a range of one type that owns the data and then have save return a range type which just refers to the data unless the range is written in a way that you can have both within the same type. One example of avoiding the need to deep-copy with save where one range is at least sort of the owner whereas the others aren't is how dxml's stax parser works. The ranges share a context that keeps track of how far into the range the farthest range is, and popFront only does the validation when the range that popFront is being called on is the farthest of any range. That way, the stuff related to validating end tags didn't need to be deep-copied, but save always returns exactly the same type, and you get exactly the same behavior regardless of which range gets iterated farther first (or even if one is iterated farther and then another is iterated beyond it). If popFront every throws, then that range becomes invalid, but the others are fine. The validation other than that for matching end tags currently all gets done every time, so all ranges would throw in the same place for errors other than end tags that don't match, but the same is also true for when the end tags don't match, because even though that validation is only done for the farthest range, if it fails, the shared context is left in exactly the same state, and any other ranges that reach that point would then throw like the first range did. Without realizing that the validation for the end tags didn't have to be done for every instance of the range but only the one which was farthest along, I would have been screwed with regards to save, because deep-copying would have been required. I'm not sure that that particular trick is widely applicable, but it is an example of how save can do something other than a deep copy even though having each range do exactly the same work would have required a deep copy. > > Assuming we were redesigning the range API (which may happen if > > we do indeed end up doing a Phobos v2), then maybe we could > > make it so that bidirectional ranges don't have to be forward > > ranges, but honestly _any_ ranges which aren't forward ranges > > are a bit of a problem. We do need to
Re: Why must a bidirectional range also be a forward range?
On Thursday, 19 September 2019 at 22:55:55 UTC, Jonathan M Davis wrote: For better or worse, ranges were more or less set up as a linear hierarchy, and it's unlikely that use cases for bidirectional ranges which aren't forward ranges are common. I expect that it's a bit like infinite, bidirectional ranges. In theory, they could be a thing, but the use cases for them are uncommon enough that we don't really support them. Also, I expect that most range-based algorithms which operate on bidirectional ranges would require save anyway. A lot of algorithms do to the point that basic input ranges can be incredibly frustrating to deal with. [ ... ] Thanks for the characteristically thorough description of both the design considerations and the history involved. On reflection it occurs to me that the problem in my thinking may be the idea that `save` should result in a full deep copy. If instead we go by how `save` is implemented for dynamic arrays, it's only ever a shallow copy: it's not possible to make valid assumptions of reproducible behaviour if the original copy is modified in any way. If instead we assume that `save` is only suitable for temporary shallow-copies that are made under the hood of algorithms, then my problems go away. Assuming we were redesigning the range API (which may happen if we do indeed end up doing a Phobos v2), then maybe we could make it so that bidirectional ranges don't have to be forward ranges, but honestly _any_ ranges which aren't forward ranges are a bit of a problem. We do need to support them on some level for exactly the kind of reasons that you're looking to avoid save with a bidirectional range, but the semantic differences between what makes sense for a basic input range and a forward range really aren't the same (in particular, it works far better for basic input ranges to be reference types, whereas it works better for forward ranges to be value types). It occurs to me that the distinction we're missing here might between "true" input ranges (i.e. which really come from IO of some kind), which indeed must be reference types, versus "pure" input ranges (which are deterministic, but which don't necessarily allow algorithms to rely on the ability to save and replay them). As it stands, I don't think that we can change isBidirectionalRange, because it's likely that most code using it relies on its check for isForwardRange. So, I think that we're stuck for the moment, but it is food for thought in a possible range API redesign. I'll add it to my notes on the topic. Some aspects of a range API redesign should look like are pretty clear at this point, whereas others are very much an open question. Oh, I wasn't asking for any changes to the existing definition (at least not without much thought from everyone!). I was just wanting to understand the reasons for the current situation. But thanks for putting it on the list of things to consider. I may have some follow-up to your other remarks but I think at least now I have a way forward with my code. Thanks!
Re: Why must a bidirectional range also be a forward range?
On Thursday, September 19, 2019 3:31:32 AM MDT Joseph Rushton Wakeling via Digitalmars-d-learn wrote: > Hello folks, > > A question that occurred to me while implementing a new data > structure recently, which I'm not sure I've ever seen a reason > for. > > Why must bidirectional ranges also be forward ranges (as opposed > to just input ranges)? > > It doesn't seem to me that the `save` property is inherently > required to iterate backwards over a range -- just the `back` and > `popBack` methods. > > It makes sense that, for bidirectionality, the range needs to be > deterministic, so that iterating backward gives the exact same > elements as iterating forward, just in reverse order. But it > seems strange to require the `save` property in order to > automatically assume deterministic behaviour. > > For context, the use-case I have is a data structure which stores > an internal buffer as an array. A robust `save` method would > therefore have to duplicate the array (or at least the active > subset of its contents). This means a fresh heap allocation per > `save`, which has some nasty implications for phobos algorithms > that eagerly `.save` when they can. > > So, I'd rather not implement `save` in this case. But there is > nothing that blocks implementing `back` and `popBack`; yet I > can't use these with any of the functionality that requires > bidirectionality, because the current `isBidirectionalRange` > check requires `save`. > > So what gives? Are there some reasons for the `save` requirement > on bidirectional ranges that I'm missing? And regardless, any > advice on how to handle my particular use-case? > > Thanks & best wishes, > >-- Joe For better or worse, ranges were more or less set up as a linear hierarchy, and it's unlikely that use cases for bidirectional ranges which aren't forward ranges are common. I expect that it's a bit like infinite, bidirectional ranges. In theory, they could be a thing, but the use cases for them are uncommon enough that we don't really support them. Also, I expect that most range-based algorithms which operate on bidirectional ranges would require save anyway. A lot of algorithms do to the point that basic input ranges can be incredibly frustrating to deal with. Assuming we were redesigning the range API (which may happen if we do indeed end up doing a Phobos v2), then maybe we could make it so that bidirectional ranges don't have to be forward ranges, but honestly _any_ ranges which aren't forward ranges are a bit of a problem. We do need to support them on some level for exactly the kind of reasons that you're looking to avoid save with a bidirectional range, but the semantic differences between what makes sense for a basic input range and a forward range really aren't the same (in particular, it works far better for basic input ranges to be reference types, whereas it works better for forward ranges to be value types). As it stands, I don't think that we can change isBidirectionalRange, because it's likely that most code using it relies on its check for isForwardRange. So, I think that we're stuck for the moment, but it is food for thought in a possible range API redesign. I'll add it to my notes on the topic. Some aspects of a range API redesign should look like are pretty clear at this point, whereas others are very much an open question. Ideally, I'd like to force basic input ranges to be reference types, and forward ranges to be value types, but I'm not sure that that's reasonable in practice. It would really clean up some of the semantics of ranges, but it would also likely require allocating a lot more stuff on the heap than would be desirable. Either way, having bidirectional ranges not need to have the equivalent of save would mean treating them as more of an add-on capability (like length) rather than having the kind of hierarchy that we have now. I don't know if that's ultimately a good or a bad thing. - Jonathan M Davis
Re: Why must a bidirectional range also be a forward range?
On Thursday, 19 September 2019 at 09:31:32 UTC, Joseph Rushton Wakeling wrote: For context, the use-case I have is a data structure which stores an internal buffer as an array. A robust `save` method would therefore have to duplicate the array (or at least the active subset of its contents). This means a fresh heap allocation per `save`, which has some nasty implications for phobos algorithms that eagerly `.save` when they can. In this case, it is probably better to separate the range from the data structure it refers to. For example: struct Container { int[] data; private static struct Range { int[] contents; bool empty(){ return contents.length == 0; } int front() { return contents[0]; } void popFront() { contents = contents[1 .. $]; } int back() { return contents[$ - 1]; } void popBack() { contents = contents[0 .. $ - 1]; } Range save(){ return this; } } Range getRange() { return Range(data[]); } }
Why must a bidirectional range also be a forward range?
Hello folks, A question that occurred to me while implementing a new data structure recently, which I'm not sure I've ever seen a reason for. Why must bidirectional ranges also be forward ranges (as opposed to just input ranges)? It doesn't seem to me that the `save` property is inherently required to iterate backwards over a range -- just the `back` and `popBack` methods. It makes sense that, for bidirectionality, the range needs to be deterministic, so that iterating backward gives the exact same elements as iterating forward, just in reverse order. But it seems strange to require the `save` property in order to automatically assume deterministic behaviour. For context, the use-case I have is a data structure which stores an internal buffer as an array. A robust `save` method would therefore have to duplicate the array (or at least the active subset of its contents). This means a fresh heap allocation per `save`, which has some nasty implications for phobos algorithms that eagerly `.save` when they can. So, I'd rather not implement `save` in this case. But there is nothing that blocks implementing `back` and `popBack`; yet I can't use these with any of the functionality that requires bidirectionality, because the current `isBidirectionalRange` check requires `save`. So what gives? Are there some reasons for the `save` requirement on bidirectional ranges that I'm missing? And regardless, any advice on how to handle my particular use-case? Thanks & best wishes, -- Joe