Re: Range length property

2018-04-10 Thread Steven Schveighoffer via Digitalmars-d-learn

On 4/10/18 6:07 PM, Cym13 wrote:

On Tuesday, 10 April 2018 at 20:08:14 UTC, Jonathan M Davis wrote:
On Tuesday, April 10, 2018 19:47:10 Nordlöw via Digitalmars-d-learn 
wrote:

On Tuesday, 10 April 2018 at 14:34:40 UTC, Adam D. Ruppe wrote:
> On Tuesday, 10 April 2018 at 14:25:52 UTC, Nordlöw wrote:
>> Should ranges always provide a length property?
>
> No.
>
>> If so, in which cases is a length property an advantage or >> a 
requirement?

>
> Just provide it whenever it is cheap to do so. If you need > to do 
complex calculations or especially loop over contents > to figure out 
the length, do NOT provide it.

>
> But if it is as simple as returning some value, provide it > and 
algorithms can take advantage of it for optimizations > etc. as needed.


I'm thinking of my own container Hashmap having its range ByKeyValue 
requiring one extra word of memory to store the iteration count 
which, in turn, can be used to calculate the length of the remaining 
range. Is this motivated?


That would depend entirely on what you're trying to do, but in 
general, if a range has length, then some algorithms will be more 
efficient, and some algorithms do require length. So, if you can 
provide length, then the range will be more useful, just like a 
bidirectional range can be more useful than a forward range or a 
random-access range can be more useful than either. However, if you're 
not doing anything that ever benefits from it having length, then it 
doesn't buy you anything. So, it ultimately depends on what you're 
doing. In a general purpose library, I'd say that it should have 
length if it can do so in O(1), but if it's just for you, then it may 
or may not be worth it.


The other thing to consider is what happens when the container is 
mutated. I don't think that ranges necessarily behave all that well 
when an underlying container is mutated, but it is something that has 
to be considered when dealing with a range over a container. Even if 
mutating the underlying container doesn't necessarily invalidate a 
range, maintaining the length in the manner that you're suggesting 
probably makes it so that it would be invalidated in more cases, since 
if any elements are added or removed in the portion that was already 
popped off the range, then the iteration count couldn't be used to 
calculate the length in the same way anymore. Now, with a hash map, 
the range is probably fully invalidated when anything gets added or 
removed anyway, since that probably screws with the order of the 
elements in the range, but how the range is going to behave when the 
underlying container is mutated and how having the length property 
does or doesn't affect that is something that you'll need to consider.


- Jonathan M Davis


I find that discussion very interesting as I had never considered that 
because of design by introspection having a costly length method would 
lead to unexpected calls by generic algorithms making it a disadventage 
if present.


On the other hand I don't think the end user should have to scratch his 
head to find the length of a range, especially if it's not trivial to 
get (say, O(log n) kind of case). Therefore exposing a method in any 
case seems the best from an API perspective.


O(lg n) is fine for .length, it doesn't need to be O(1). It just can't 
be O(n). I think we established that "fast" operations are O(lg n) or 
better.


That being said, I don't know of a use case where you can get the length 
in O(lg n). It's usually O(1) or O(n).


But to avoid the performance issues mentionned earlier it means it 
should bear a different name (get/setLength comes to mind). I believe 
this is the same kind of issue that lead to having "in" for associative 
arrays but not regular ones. However this also leads to less coherent 
APIs in contradiction with the principle of least surprise.


It's definitely a tradeoff. It pushes some implementation details to the 
user, but it also makes the runtime complexity more predictable.


In retrospect since only "unexpected" calls to such methods cause the 
issue I wonder if it wouldn't be best to have an UDA saying "Hey, 
please, this method is costly, if you're a generic template performing 
introspection you should probably not call me". And writing that 
Andrei's work on complexity annotations comes to mind. Anyway, I don't 
think the user should use different names just to alleviate an issue on 
the library side but the alternative would be costly to put in place...


Potentially, but remember at the time length and walkLength were 
conceived, UDA's didn't exist!


Using UDAs would also have the unfortunate side effect of eliminating 
self-documentation. When you see walkLength right now, you know it's 
"slow", when you see length, you know "fast". If you have to look at 
UDAs to figure that out, then reading the code is that much harder.


-Steve


Re: Range length property

2018-04-10 Thread H. S. Teoh via Digitalmars-d-learn
On Tue, Apr 10, 2018 at 10:07:40PM +, Cym13 via Digitalmars-d-learn wrote:
[...]
> On the other hand I don't think the end user should have to scratch
> his head to find the length of a range, especially if it's not trivial
> to get (say, O(log n) kind of case). Therefore exposing a method in
> any case seems the best from an API perspective.
> 
> But to avoid the performance issues mentionned earlier it means it
> should bear a different name (get/setLength comes to mind). I believe
> this is the same kind of issue that lead to having "in" for
> associative arrays but not regular ones. However this also leads to
> less coherent APIs in contradiction with the principle of least
> surprise.
[...]

I've run into this in my own code, and I've been using `walkLength` as
the name of the method, just for consistency with Phobos' walkLength.
I'm not 100% sure this is a good idea, since overloading Phobos names
can sometimes lead to annoying symbol conflict situations.  But the one
good thing is that you won't forget what it's called because it's so
familiar.


T

-- 
An elephant: A mouse built to government specifications. -- Robert Heinlein


Re: Range length property

2018-04-10 Thread Jonathan M Davis via Digitalmars-d-learn
On Tuesday, April 10, 2018 22:07:40 Cym13 via Digitalmars-d-learn wrote:
> On Tuesday, 10 April 2018 at 20:08:14 UTC, Jonathan M Davis wrote:
> > On Tuesday, April 10, 2018 19:47:10 Nordlöw via
> >
> > Digitalmars-d-learn wrote:
> >> On Tuesday, 10 April 2018 at 14:34:40 UTC, Adam D. Ruppe wrote:
> >> > On Tuesday, 10 April 2018 at 14:25:52 UTC, Nordlöw wrote:
> >> >> Should ranges always provide a length property?
> >> >
> >> > No.
> >> >
> >> >> If so, in which cases is a length property an advantage or
> >> >> a requirement?
> >> >
> >> > Just provide it whenever it is cheap to do so. If you need
> >> > to do complex calculations or especially loop over contents
> >> > to figure out the length, do NOT provide it.
> >> >
> >> > But if it is as simple as returning some value, provide it
> >> > and algorithms can take advantage of it for optimizations
> >> > etc. as needed.
> >>
> >> I'm thinking of my own container Hashmap having its range
> >> ByKeyValue requiring one extra word of memory to store the
> >> iteration count which, in turn, can be used to calculate the
> >> length of the remaining range. Is this motivated?
> >
> > That would depend entirely on what you're trying to do, but in
> > general, if a range has length, then some algorithms will be
> > more efficient, and some algorithms do require length. So, if
> > you can provide length, then the range will be more useful,
> > just like a bidirectional range can be more useful than a
> > forward range or a random-access range can be more useful than
> > either. However, if you're not doing anything that ever
> > benefits from it having length, then it doesn't buy you
> > anything. So, it ultimately depends on what you're doing. In a
> > general purpose library, I'd say that it should have length if
> > it can do so in O(1), but if it's just for you, then it may or
> > may not be worth it.
> >
> > The other thing to consider is what happens when the container
> > is mutated. I don't think that ranges necessarily behave all
> > that well when an underlying container is mutated, but it is
> > something that has to be considered when dealing with a range
> > over a container. Even if mutating the underlying container
> > doesn't necessarily invalidate a range, maintaining the length
> > in the manner that you're suggesting probably makes it so that
> > it would be invalidated in more cases, since if any elements
> > are added or removed in the portion that was already popped off
> > the range, then the iteration count couldn't be used to
> > calculate the length in the same way anymore. Now, with a hash
> > map, the range is probably fully invalidated when anything gets
> > added or removed anyway, since that probably screws with the
> > order of the elements in the range, but how the range is going
> > to behave when the underlying container is mutated and how
> > having the length property does or doesn't affect that is
> > something that you'll need to consider.
> >
> > - Jonathan M Davis
>
> I find that discussion very interesting as I had never considered
> that because of design by introspection having a costly length
> method would lead to unexpected calls by generic algorithms
> making it a disadventage if present.
>
> On the other hand I don't think the end user should have to
> scratch his head to find the length of a range, especially if
> it's not trivial to get (say, O(log n) kind of case). Therefore
> exposing a method in any case seems the best from an API
> perspective.
>
> But to avoid the performance issues mentionned earlier it means
> it should bear a different name (get/setLength comes to mind). I
> believe this is the same kind of issue that lead to having "in"
> for associative arrays but not regular ones. However this also
> leads to less coherent APIs in contradiction with the principle
> of least surprise.
>
> In retrospect since only "unexpected" calls to such methods cause
> the issue I wonder if it wouldn't be best to have an UDA saying
> "Hey, please, this method is costly, if you're a generic template
> performing introspection you should probably not call me". And
> writing that Andrei's work on complexity annotations comes to
> mind. Anyway, I don't think the user should use different names
> just to alleviate an issue on the library side but the
> alternative would be costly to put in place...
>
> Any thoughts?

In general, if you care about efficient code, it becomes critical that
anything that's going to be used in generic code has well-known Big-O
complexity, which is why C++ cares about that sort of thing with its
containers and why Andrei specifically put the Big-O complexity of all of
the generic container operations in std.container. And that extends to
ranges. If ranges are implemented without any care about the algorithmic
complexity of the operations, then performance is ultimately going to tank.
And in the case of ranges, it's really not all that hard to understand the
algorithmic complexity, because 

Re: Range length property

2018-04-10 Thread Cym13 via Digitalmars-d-learn

On Tuesday, 10 April 2018 at 20:08:14 UTC, Jonathan M Davis wrote:
On Tuesday, April 10, 2018 19:47:10 Nordlöw via 
Digitalmars-d-learn wrote:

On Tuesday, 10 April 2018 at 14:34:40 UTC, Adam D. Ruppe wrote:
> On Tuesday, 10 April 2018 at 14:25:52 UTC, Nordlöw wrote:
>> Should ranges always provide a length property?
>
> No.
>
>> If so, in which cases is a length property an advantage or 
>> a requirement?

>
> Just provide it whenever it is cheap to do so. If you need 
> to do complex calculations or especially loop over contents 
> to figure out the length, do NOT provide it.

>
> But if it is as simple as returning some value, provide it 
> and algorithms can take advantage of it for optimizations 
> etc. as needed.


I'm thinking of my own container Hashmap having its range 
ByKeyValue requiring one extra word of memory to store the 
iteration count which, in turn, can be used to calculate the 
length of the remaining range. Is this motivated?


That would depend entirely on what you're trying to do, but in 
general, if a range has length, then some algorithms will be 
more efficient, and some algorithms do require length. So, if 
you can provide length, then the range will be more useful, 
just like a bidirectional range can be more useful than a 
forward range or a random-access range can be more useful than 
either. However, if you're not doing anything that ever 
benefits from it having length, then it doesn't buy you 
anything. So, it ultimately depends on what you're doing. In a 
general purpose library, I'd say that it should have length if 
it can do so in O(1), but if it's just for you, then it may or 
may not be worth it.


The other thing to consider is what happens when the container 
is mutated. I don't think that ranges necessarily behave all 
that well when an underlying container is mutated, but it is 
something that has to be considered when dealing with a range 
over a container. Even if mutating the underlying container 
doesn't necessarily invalidate a range, maintaining the length 
in the manner that you're suggesting probably makes it so that 
it would be invalidated in more cases, since if any elements 
are added or removed in the portion that was already popped off 
the range, then the iteration count couldn't be used to 
calculate the length in the same way anymore. Now, with a hash 
map, the range is probably fully invalidated when anything gets 
added or removed anyway, since that probably screws with the 
order of the elements in the range, but how the range is going 
to behave when the underlying container is mutated and how 
having the length property does or doesn't affect that is 
something that you'll need to consider.


- Jonathan M Davis


I find that discussion very interesting as I had never considered 
that because of design by introspection having a costly length 
method would lead to unexpected calls by generic algorithms 
making it a disadventage if present.


On the other hand I don't think the end user should have to 
scratch his head to find the length of a range, especially if 
it's not trivial to get (say, O(log n) kind of case). Therefore 
exposing a method in any case seems the best from an API 
perspective.


But to avoid the performance issues mentionned earlier it means 
it should bear a different name (get/setLength comes to mind). I 
believe this is the same kind of issue that lead to having "in" 
for associative arrays but not regular ones. However this also 
leads to less coherent APIs in contradiction with the principle 
of least surprise.


In retrospect since only "unexpected" calls to such methods cause 
the issue I wonder if it wouldn't be best to have an UDA saying 
"Hey, please, this method is costly, if you're a generic template 
performing introspection you should probably not call me". And 
writing that Andrei's work on complexity annotations comes to 
mind. Anyway, I don't think the user should use different names 
just to alleviate an issue on the library side but the 
alternative would be costly to put in place...


Any thoughts?


Re: Range length property

2018-04-10 Thread Per Nordlöw via Digitalmars-d-learn
On Tuesday, 10 April 2018 at 20:16:18 UTC, Steven Schveighoffer 
wrote:
e.g. std.array.array is going to pre-allocate an array of the 
correct length and fill it in, vs. appending each element as it 
gets them from the range.


Personally, I would store the length because typically a 
container range is short-lived. It also jives with the 
container itself which likely has O(1) length.


Thanks, that's what I thought too.


Re: Range length property

2018-04-10 Thread Steven Schveighoffer via Digitalmars-d-learn

On 4/10/18 4:08 PM, Jonathan M Davis wrote:

On Tuesday, April 10, 2018 19:47:10 Nordlöw via Digitalmars-d-learn wrote:

I'm thinking of my own container Hashmap having its range
ByKeyValue requiring one extra word of memory to store the
iteration count which, in turn, can be used to calculate the
length of the remaining range. Is this motivated?


That would depend entirely on what you're trying to do, but in general, if a
range has length, then some algorithms will be more efficient, and some
algorithms do require length.


e.g. std.array.array is going to pre-allocate an array of the correct 
length and fill it in, vs. appending each element as it gets them from 
the range.


Personally, I would store the length because typically a container range 
is short-lived. It also jives with the container itself which likely has 
O(1) length.


-Steve


Re: Range length property

2018-04-10 Thread Jonathan M Davis via Digitalmars-d-learn
On Tuesday, April 10, 2018 19:47:10 Nordlöw via Digitalmars-d-learn wrote:
> On Tuesday, 10 April 2018 at 14:34:40 UTC, Adam D. Ruppe wrote:
> > On Tuesday, 10 April 2018 at 14:25:52 UTC, Nordlöw wrote:
> >> Should ranges always provide a length property?
> >
> > No.
> >
> >> If so, in which cases is a length property an advantage or a
> >> requirement?
> >
> > Just provide it whenever it is cheap to do so. If you need to
> > do complex calculations or especially loop over contents to
> > figure out the length, do NOT provide it.
> >
> > But if it is as simple as returning some value, provide it and
> > algorithms can take advantage of it for optimizations etc. as
> > needed.
>
> I'm thinking of my own container Hashmap having its range
> ByKeyValue requiring one extra word of memory to store the
> iteration count which, in turn, can be used to calculate the
> length of the remaining range. Is this motivated?

That would depend entirely on what you're trying to do, but in general, if a
range has length, then some algorithms will be more efficient, and some
algorithms do require length. So, if you can provide length, then the range
will be more useful, just like a bidirectional range can be more useful than
a forward range or a random-access range can be more useful than either.
However, if you're not doing anything that ever benefits from it having
length, then it doesn't buy you anything. So, it ultimately depends on what
you're doing. In a general purpose library, I'd say that it should have
length if it can do so in O(1), but if it's just for you, then it may or may
not be worth it.

The other thing to consider is what happens when the container is mutated. I
don't think that ranges necessarily behave all that well when an underlying
container is mutated, but it is something that has to be considered when
dealing with a range over a container. Even if mutating the underlying
container doesn't necessarily invalidate a range, maintaining the length in
the manner that you're suggesting probably makes it so that it would be
invalidated in more cases, since if any elements are added or removed in the
portion that was already popped off the range, then the iteration count
couldn't be used to calculate the length in the same way anymore. Now, with
a hash map, the range is probably fully invalidated when anything gets added
or removed anyway, since that probably screws with the order of the elements
in the range, but how the range is going to behave when the underlying
container is mutated and how having the length property does or doesn't
affect that is something that you'll need to consider.

- Jonathan M Davis




Re: Range length property

2018-04-10 Thread Nordlöw via Digitalmars-d-learn

On Tuesday, 10 April 2018 at 14:34:40 UTC, Adam D. Ruppe wrote:

On Tuesday, 10 April 2018 at 14:25:52 UTC, Nordlöw wrote:

Should ranges always provide a length property?


No.

If so, in which cases is a length property an advantage or a 
requirement?


Just provide it whenever it is cheap to do so. If you need to 
do complex calculations or especially loop over contents to 
figure out the length, do NOT provide it.


But if it is as simple as returning some value, provide it and 
algorithms can take advantage of it for optimizations etc. as 
needed.


I'm thinking of my own container Hashmap having its range 
ByKeyValue requiring one extra word of memory to store the 
iteration count which, in turn, can be used to calculate the 
length of the remaining range. Is this motivated?


Re: Range length property

2018-04-10 Thread Jonathan M Davis via Digitalmars-d-learn
On Tuesday, April 10, 2018 14:25:52 Nordlöw via Digitalmars-d-learn wrote:
> Should ranges always provide a length property?
>
> If so, in which cases is a length property an advantage or a
> requirement?

Whether a range has a length property or not is primarily dependent on how
efficient it is to implement length.
https://dlang.org/phobos/std_container.html lists the Big-O complexity of
the various functions and properties that are expected for containers, and
in the case of length, that also applies to ranges. It lists O(log n) as the
Big-O complexity of length, so length should only be implemented if its
Big-O complexity is no worse than O(log n). In most cases, that means
implementing it only if it's O(1) (I think that it's O(log n) rather than
O(1) because of binary trees), and length should certainly never be
implemented if it's O(n). Basically, if you can just return the length
without calculating it, then it makes sense to implement it, but if you have
to calculate it, then in most cases, it shouldn't be there. Range-based
functions are going to expect length to be very cheap to call if it is
present, and any function that needs to ascertain the length of a range even
if it doesn't have length will call walkLength (which returns length if
present and iterates the entire range to count it if it's not). As iterating
the entire range to count it, is O(n), most algorithms won't do that are
more likely to require length if they need to know how many elements there
are in the range, but that depends on what they're doing.

A function that requires a length property or which has a path optimized for
ranges that have a length property checks for that with
std.range.primitives.hasLength. Random access ranges are required to either
be infinite or provide length, but for other range types, it's optional and
must be checked for if an algorithm is going to use it.

And actually, a large percentage of ranges are lazy, in which case, it's
pretty rare that they can have a length property, because you usually have
no way of knowing how long they're going to be without actually iterating
through the range. So, while it's not uncommon for a range to define length,
it's very common that they don't.

- Jonathan M Davis




Re: Range length property

2018-04-10 Thread Adam D. Ruppe via Digitalmars-d-learn

On Tuesday, 10 April 2018 at 14:25:52 UTC, Nordlöw wrote:

Should ranges always provide a length property?


No.

If so, in which cases is a length property an advantage or a 
requirement?


Just provide it whenever it is cheap to do so. If you need to do 
complex calculations or especially loop over contents to figure 
out the length, do NOT provide it.


But if it is as simple as returning some value, provide it and 
algorithms can take advantage of it for optimizations etc. as 
needed.