Re: Why must a bidirectional range also be a forward range?

2019-09-21 Thread Jonathan M Davis via Digitalmars-d-learn
On Friday, September 20, 2019 7:08:03 AM MDT Joseph Rushton Wakeling via 
Digitalmars-d-learn wrote:
> On Thursday, 19 September 2019 at 22:55:55 UTC, Jonathan M Davis
>
> wrote:
> > For better or worse, ranges were more or less set up as a
> > linear hierarchy, and it's unlikely that use cases for
> > bidirectional ranges which aren't forward ranges are common. I
> > expect that it's a bit like infinite, bidirectional ranges. In
> > theory, they could be a thing, but the use cases for them are
> > uncommon enough that we don't really support them. Also, I
> > expect that most range-based algorithms which operate on
> > bidirectional ranges would require save anyway. A lot of
> > algorithms do to the point that basic input ranges can be
> > incredibly frustrating to deal with.
> >
> > [ ... ]
>
> Thanks for the characteristically thorough description of both
> the design considerations and the history involved.
>
> On reflection it occurs to me that the problem in my thinking may
> be the idea that `save` should result in a full deep copy.  If
> instead we go by how `save` is implemented for dynamic arrays,
> it's only ever a shallow copy: it's not possible to make valid
> assumptions of reproducible behaviour if the original copy is
> modified in any way.
>
> If instead we assume that `save` is only suitable for temporary
> shallow-copies that are made under the hood of algorithms, then
> my problems go away.

save is supposed to result copies that can be independently iterated over.
So, code such as

foreach(r; range.save)
{}
auto arr = r.save.array();
assert(equal(arr, r));

should work. How that's implemented underneath the hood doesn't really
matter. However, none of that really takes into account mutation of the
elements. The range API pretty much assumes that you don't ever modify any
of the elements in the range as you iterate over them. So, if you do
something like

auto orig = range.save;
range.front = 42;

whether orig.front is then 42 is implementation-dependent. So, if you're
modifying elements as you go, then the behavior you get is going to be
highly dependent on what you're doing with the ranges, and certainly, if
you're only using a range within a very specific context, it can be
implemented in a way that works in that context but doesn't work with
range-based functions in general. You just run the risk of problems if you
then later modify the code to use other range-based functions which don't
necessarily work with whatever you've done with the range.

As for temporary, shallow copies, IIRC, isForwardRange requires that save
returns exactly the same type as the original. So, while you can certainly
have a range referring to a data structure without owning any of the data
(after all, that's what happens with dynamic arrays), you can't have a range
of one type that owns the data and then have save return a range type which
just refers to the data unless the range is written in a way that you can
have both within the same type.

One example of avoiding the need to deep-copy with save where one range is
at least sort of the owner whereas the others aren't is how dxml's stax
parser works. The ranges share a context that keeps track of how far into
the range the farthest range is, and popFront only does the validation when
the range that popFront is being called on is the farthest of any range.
That way, the stuff related to validating end tags didn't need to be
deep-copied, but save always returns exactly the same type, and you get
exactly the same behavior regardless of which range gets iterated farther
first (or even if one is iterated farther and then another is iterated
beyond it).

If popFront every throws, then that range becomes invalid, but the others
are fine. The validation other than that for matching end tags currently all
gets done every time, so all ranges would throw in the same place for errors
other than end tags that don't match, but the same is also true for when the
end tags don't match, because even though that validation is only done for
the farthest range, if it fails, the shared context is left in exactly the
same state, and any other ranges that reach that point would then throw like
the first range did.

Without realizing that the validation for the end tags didn't have to be
done for every instance of the range but only the one which was farthest
along, I would have been screwed with regards to save, because deep-copying
would have been required. I'm not sure that that particular trick is widely
applicable, but it is an example of how save can do something other than a
deep copy even though having each range do exactly the same work would have
required a deep copy.

> > Assuming we were redesigning the range API (which may happen if
> > we do indeed end up doing a Phobos v2), then maybe we could
> > make it so that bidirectional ranges don't have to be forward
> > ranges, but honestly _any_ ranges which aren't forward ranges
> > are a bit of a problem. We do need to 

Re: Why must a bidirectional range also be a forward range?

2019-09-20 Thread Joseph Rushton Wakeling via Digitalmars-d-learn
On Thursday, 19 September 2019 at 22:55:55 UTC, Jonathan M Davis 
wrote:
For better or worse, ranges were more or less set up as a 
linear hierarchy, and it's unlikely that use cases for 
bidirectional ranges which aren't forward ranges are common. I 
expect that it's a bit like infinite, bidirectional ranges. In 
theory, they could be a thing, but the use cases for them are 
uncommon enough that we don't really support them. Also, I 
expect that most range-based algorithms which operate on 
bidirectional ranges would require save anyway. A lot of 
algorithms do to the point that basic input ranges can be 
incredibly frustrating to deal with.


[ ... ]


Thanks for the characteristically thorough description of both 
the design considerations and the history involved.


On reflection it occurs to me that the problem in my thinking may 
be the idea that `save` should result in a full deep copy.  If 
instead we go by how `save` is implemented for dynamic arrays, 
it's only ever a shallow copy: it's not possible to make valid 
assumptions of reproducible behaviour if the original copy is 
modified in any way.


If instead we assume that `save` is only suitable for temporary 
shallow-copies that are made under the hood of algorithms, then 
my problems go away.


Assuming we were redesigning the range API (which may happen if 
we do indeed end up doing a Phobos v2), then maybe we could 
make it so that bidirectional ranges don't have to be forward 
ranges, but honestly _any_ ranges which aren't forward ranges 
are a bit of a problem. We do need to support them on some 
level for exactly the kind of reasons that you're looking to 
avoid save with a bidirectional range, but the semantic 
differences between what makes sense for a basic input range 
and a forward range really aren't the same (in particular, it 
works far better for basic input ranges to be reference types, 
whereas it works better for forward ranges to be value types).


It occurs to me that the distinction we're missing here might 
between "true" input ranges (i.e. which really come from IO of 
some kind), which indeed must be reference types, versus "pure" 
input ranges (which are deterministic, but which don't 
necessarily allow algorithms to rely on the ability to save and 
replay them).


As it stands, I don't think that we can change 
isBidirectionalRange, because it's likely that most code using 
it relies on its check for isForwardRange. So, I think that 
we're stuck for the moment, but it is food for thought in a 
possible range API redesign. I'll add it to my notes on the 
topic. Some aspects of a range API redesign should look like 
are pretty clear at this point, whereas others are very much an 
open question.


Oh, I wasn't asking for any changes to the existing definition 
(at least not without much thought from everyone!).  I was just 
wanting to understand the reasons for the current situation.  But 
thanks for putting it on the list of things to consider.


I may have some follow-up to your other remarks but I think at 
least now I have a way forward with my code.  Thanks!


Re: Why must a bidirectional range also be a forward range?

2019-09-19 Thread Jonathan M Davis via Digitalmars-d-learn
On Thursday, September 19, 2019 3:31:32 AM MDT Joseph Rushton Wakeling via 
Digitalmars-d-learn wrote:
> Hello folks,
>
> A question that occurred to me while implementing a new data
> structure recently, which I'm not sure I've ever seen a reason
> for.
>
> Why must bidirectional ranges also be forward ranges (as opposed
> to just input ranges)?
>
> It doesn't seem to me that the `save` property is inherently
> required to iterate backwards over a range -- just the `back` and
> `popBack` methods.
>
> It makes sense that, for bidirectionality, the range needs to be
> deterministic, so that iterating backward gives the exact same
> elements as iterating forward, just in reverse order.  But it
> seems strange to require the `save` property in order to
> automatically assume deterministic behaviour.
>
> For context, the use-case I have is a data structure which stores
> an internal buffer as an array.  A robust `save` method would
> therefore have to duplicate the array (or at least the active
> subset of its contents).  This means a fresh heap allocation per
> `save`, which has some nasty implications for phobos algorithms
> that eagerly `.save` when they can.
>
> So, I'd rather not implement `save` in this case.  But there is
> nothing that blocks implementing `back` and `popBack`; yet I
> can't use these with any of the functionality that requires
> bidirectionality, because the current `isBidirectionalRange`
> check requires `save`.
>
> So what gives?  Are there some reasons for the `save` requirement
> on bidirectional ranges that I'm missing?  And regardless, any
> advice on how to handle my particular use-case?
>
> Thanks & best wishes,
>
>-- Joe

For better or worse, ranges were more or less set up as a linear hierarchy,
and it's unlikely that use cases for bidirectional ranges which aren't
forward ranges are common. I expect that it's a bit like infinite,
bidirectional ranges. In theory, they could be a thing, but the use cases
for them are uncommon enough that we don't really support them. Also, I
expect that most range-based algorithms which operate on bidirectional
ranges would require save anyway. A lot of algorithms do to the point that
basic input ranges can be incredibly frustrating to deal with.

Assuming we were redesigning the range API (which may happen if we do indeed
end up doing a Phobos v2), then maybe we could make it so that bidirectional
ranges don't have to be forward ranges, but honestly _any_ ranges which
aren't forward ranges are a bit of a problem. We do need to support them on
some level for exactly the kind of reasons that you're looking to avoid save
with a bidirectional range, but the semantic differences between what makes
sense for a basic input range and a forward range really aren't the same (in
particular, it works far better for basic input ranges to be reference
types, whereas it works better for forward ranges to be value types).

As it stands, I don't think that we can change isBidirectionalRange, because
it's likely that most code using it relies on its check for isForwardRange.
So, I think that we're stuck for the moment, but it is food for thought in a
possible range API redesign. I'll add it to my notes on the topic. Some
aspects of a range API redesign should look like are pretty clear at this
point, whereas others are very much an open question.

Ideally, I'd like to force basic input ranges to be reference types, and
forward ranges to be value types, but I'm not sure that that's reasonable in
practice. It would really clean up some of the semantics of ranges, but it
would also likely require allocating a lot more stuff on the heap than would
be desirable. Either way, having bidirectional ranges not need to have the
equivalent of save would mean treating them as more of an add-on capability
(like length) rather than having the kind of hierarchy that we have now. I
don't know if that's ultimately a good or a bad thing.

- Jonathan M Davis





Re: Why must a bidirectional range also be a forward range?

2019-09-19 Thread Paul Backus via Digitalmars-d-learn
On Thursday, 19 September 2019 at 09:31:32 UTC, Joseph Rushton 
Wakeling wrote:
For context, the use-case I have is a data structure which 
stores an internal buffer as an array.  A robust `save` method 
would therefore have to duplicate the array (or at least the 
active subset of its contents).  This means a fresh heap 
allocation per `save`, which has some nasty implications for 
phobos algorithms that eagerly `.save` when they can.


In this case, it is probably better to separate the range from 
the data structure it refers to. For example:


struct Container {
int[] data;

private static struct Range {
int[] contents;
bool empty(){ return contents.length == 0; }
int front() { return contents[0]; }
void popFront() { contents = contents[1 .. $]; }
int back()  { return contents[$ - 1]; }
void popBack()  { contents = contents[0 .. $ - 1]; }
Range save(){ return this; }
}

Range getRange() {
return Range(data[]);
}
}


Why must a bidirectional range also be a forward range?

2019-09-19 Thread Joseph Rushton Wakeling via Digitalmars-d-learn

Hello folks,

A question that occurred to me while implementing a new data 
structure recently, which I'm not sure I've ever seen a reason 
for.


Why must bidirectional ranges also be forward ranges (as opposed 
to just input ranges)?


It doesn't seem to me that the `save` property is inherently 
required to iterate backwards over a range -- just the `back` and 
`popBack` methods.


It makes sense that, for bidirectionality, the range needs to be 
deterministic, so that iterating backward gives the exact same 
elements as iterating forward, just in reverse order.  But it 
seems strange to require the `save` property in order to 
automatically assume deterministic behaviour.


For context, the use-case I have is a data structure which stores 
an internal buffer as an array.  A robust `save` method would 
therefore have to duplicate the array (or at least the active 
subset of its contents).  This means a fresh heap allocation per 
`save`, which has some nasty implications for phobos algorithms 
that eagerly `.save` when they can.


So, I'd rather not implement `save` in this case.  But there is 
nothing that blocks implementing `back` and `popBack`; yet I 
can't use these with any of the functionality that requires 
bidirectionality, because the current `isBidirectionalRange` 
check requires `save`.


So what gives?  Are there some reasons for the `save` requirement 
on bidirectional ranges that I'm missing?  And regardless, any 
advice on how to handle my particular use-case?


Thanks & best wishes,

  -- Joe