subject:"Transient ranges"

Re: Transient ranges

2016-07-06 Thread Andrei Alexandrescu via Digitalmars-d

Related: just made a few comments to 
https://github.com/dlang/phobos/pull/4511. More work on range 
formalization is welcome. Thanks! -- Andrei

Re: Transient ranges

2016-06-14 Thread Dicebot via Digitalmars-d

On Tuesday, 14 June 2016 at 00:19:54 UTC, Andrei Alexandrescu 
wrote:

On 06/12/2016 04:46 AM, Dicebot wrote:

That is matter of design philosophy. For me such basic library
primitives warrant C++ attitude of "don't pay for what you 
don't ask
for" - and everything else, including actual feature 
completeness, is of

negligible importance compared to that.


C++'s input iterator model also predicates multiple accesses to 
the current value by means of *it.


Yeah, I don't say C++ iterator model is any better for that kind 
of design - sorry if my wording implied that. I wonder how they 
would address it if pipeline approach would ever become popular 
in C++.


See my response to Steven - probably my main issue is wanting 
some very different concept, dedicated stream/pipeline 
abstraction that would still be usable with all the same 
std.algorithm


It seems to me, sometimes if I went by what this forum agrees 
upon I'd write one thousand lines of code one day and remove it 
the next.


Please don't take my rants in the NG as a serious call to action. 
I am simply sharing my concerns in a relevant topic - if there 
was anything pragmatical I could propose, I'd take it to the mail 
list instead or even right a DIP. To be honest, I'd even prefer 
if you ignored _any_ request/proposal which is not submitted as 
such. NG can't be taken as a base for reasoning.


Phobos indeed doesn't seem to make such priorities right now 
and that is

one of reasons I am growing increasingly unhappy with it.


What steps do you think we could take with Phobos to make it 
better without breaking backward compatibility?


Introducing `std2` package is technically not breaking backwards 
compatibility but I presume this is not what you have meant ;) I 
don't know - as I have already said, otherwise I'd actually taken 
some more constructive actions instead of throwing complaints.


If I'd have a say in designing brand new Phobos, it would be very 
different - async by default, with a strict separation of 
lightweight @nogc algorithm and API core (fitting even for 
embedded env) and all the feature rich additional modules built 
on top. This is not kind of stuff one simply "fixes" in existing 
mature library.


At the same time, I am not even sure if it is even important to 
fix. It is good enough for any kind of general purpose 
development, good for productivity. And bigger projects with hard 
performance concerns and maintenance costs tend to grow their own 
"standard" libraries anyway.

Re: Transient ranges

2016-06-14 Thread Dicebot via Digitalmars-d

On Monday, 13 June 2016 at 13:59:06 UTC, Steven Schveighoffer 
wrote:
So what it seems like you want is somerange.map(...).front to 
evaluate the lambda only once, without caching. This doesn't 
fit into the definition of ranges, which require the ability to 
access the same front more than once.


What you really want is a new kind of construct. It's not a 
range.


I think you are completely right here, though it is not very 
comforting :) Indeed, looking now through all uses of ranges in 
my code, it has always been same specific subset of their 
functionality - input range based pipeline. Chunk input, build up 
filters/transformations, write output. Never random access or 
even `.save`, and each original input chunk is supposed to be 
processed only once.


Every of my complaints is simply a natural outcome of this usage 
pattern - I'd really prefer to have something more stupid but 
also more fool-proof.

Re: Transient ranges

2016-06-13 Thread Andrei Alexandrescu via Digitalmars-d


On 06/12/2016 04:46 AM, Dicebot wrote:

That is matter of design philosophy. For me such basic library
primitives warrant C++ attitude of "don't pay for what you don't ask
for" - and everything else, including actual feature completeness, is of
negligible importance compared to that.


C++'s input iterator model also predicates multiple accesses to the 
current value by means of *it.



If keeping random access for map requires either caching or multiple
evaluations of predicate, I want it to be banned by default and
enabled explicitly (not sure what could be good API for that though).


In the initial implementation, map did cache the current element. That 
made some folks unhappy, so it was removed. In lieu, a function "cache" 
was introduced. Apparently this setup makes other folks unhappy.


It seems to me, sometimes if I went by what this forum agrees upon I'd 
write one thousand lines of code one day and remove it the next.



Phobos indeed doesn't seem to make such priorities right now and that is
one of reasons I am growing increasingly unhappy with it.


What steps do you think we could take with Phobos to make it better 
without breaking backward compatibility?



Andrei

Re: Transient ranges

2016-06-13 Thread John Colvin via Digitalmars-d

On Monday, 13 June 2016 at 13:59:06 UTC, Steven Schveighoffer 
wrote:
No, it's not. My advice is to understand the limitations and 
expectations of the range wrappers you are using (i.e. read the 
docs [1]). If you need caching for your purposes, then do that 
by plopping cache at the end of your use case. The user must 
have to understand that providing a mapping function that does 
random things (or even varying things on each call) is going to 
result in unexpected behavior. The compiler cannot foresee the 
purpose of your lambda code.


You just said only pay for what you ask for. I don't want to 
pay for caching for map if I don't need it. I'd be pissed if 
map cached for something like map!(x => x * 2). If you want 
caching, then ask for it.


There was a talk by Herb Sutter on atomic operations on various 
platforms. His general rule was, as long as you don't write 
race conditions, and use only provided atomics, your code will 
do what you think. We can apply a similar rule here: as long as 
you write a lambda which for a given input will provide the 
same result, map will do what you expect. When you try odd 
things like your random call, we don't guarantee things will 
work like you expect. You may utilize the knowledge of how map 
works, and how the things that are using the map work to write 
something clever that breaks the rules. But no guarantees.


-Steve

[1] I admit, the map docs aren't explicit about this, and 
should be.


I see three levels of function used with map:

pure: everything is as expected, consider caching if expensive.

idempotent side-effects: remember map is lazy and you'll be ok.

non-idempotent side-effects: probably not what you want.

Re: Transient ranges

2016-06-13 Thread Steven Schveighoffer via Digitalmars-d


On 6/12/16 4:46 AM, Dicebot wrote:

On Friday, 3 June 2016 at 12:03:26 UTC, Steven Schveighoffer wrote:



Yes, I can see good reason why you would want this. Hm... a set of
adapters which reduce a range to its lesser API would be useful here:

array.asInput.map

But an expectation for map should be that you want it to be exactly
the same as the original, but with a transformation applied to each
fetch of an element. I think it needs to provide random access if
random access is provided to it.


That is matter of design philosophy. For me such basic library
primitives warrant C++ attitude of "don't pay for what you don't ask
for" - and everything else, including actual feature completeness, is of
negligible importance compared to that. If keeping random access for map
requires either caching or multiple evaluations of predicate, I want it
to be banned by default and enabled explicitly (not sure what could be
good API for that though).


So what it seems like you want is somerange.map(...).front to evaluate 
the lambda only once, without caching. This doesn't fit into the 
definition of ranges, which require the ability to access the same front 
more than once.


What you really want is a new kind of construct. It's not a range.

Given that front must be accessible more than once, map has to make a 
choice, caching or reevaluating. Not doing one of those two things is 
not an option for ranges.


Note that this has nothing to do with random access.


Phobos indeed doesn't seem to make such priorities right now and that is
one of reasons I am growing increasingly unhappy with it.


It's not so much that Phobos attempts to give you kitchen sink support 
at all costs, it's that if the sink being passed in has all the faucets, 
it'll forward them for free if possible.



Then it's not a bug? It's going to work just fine how you specified
it. I just don't consider it a valid "range" for general purposes.

You can do this if you want caching:

only(0).map!(x => uniform(0, 10)).cache


Good advice. Don't want bugs with non-stable results and accidental
double I/O in your long idiomatic range pipeline? Just put "cache" calls
everywhere just to be safe, defensive programming for the win!


Strawman, not what I said.


But that is what your answer meant in context of my original question :)
My complaint was about existing semantics are being error-prone and hard
to spot - you proposed it by adding more _manual_ fixup, which kills the
whole point.


No, it's not. My advice is to understand the limitations and 
expectations of the range wrappers you are using (i.e. read the docs 
[1]). If you need caching for your purposes, then do that by plopping 
cache at the end of your use case. The user must have to understand that 
providing a mapping function that does random things (or even varying 
things on each call) is going to result in unexpected behavior. The 
compiler cannot foresee the purpose of your lambda code.


You just said only pay for what you ask for. I don't want to pay for 
caching for map if I don't need it. I'd be pissed if map cached for 
something like map!(x => x * 2). If you want caching, then ask for it.


There was a talk by Herb Sutter on atomic operations on various 
platforms. His general rule was, as long as you don't write race 
conditions, and use only provided atomics, your code will do what you 
think. We can apply a similar rule here: as long as you write a lambda 
which for a given input will provide the same result, map will do what 
you expect. When you try odd things like your random call, we don't 
guarantee things will work like you expect. You may utilize the 
knowledge of how map works, and how the things that are using the map 
work to write something clever that breaks the rules. But no guarantees.


-Steve

[1] I admit, the map docs aren't explicit about this, and should be.

Re: Transient ranges

2016-06-12 Thread Dicebot via Digitalmars-d

On Friday, 3 June 2016 at 12:03:26 UTC, Steven Schveighoffer 
wrote:
It only strengthens my opinion that Phobos is not a standard 
library I
want. Really, many of those issue would have been solved if 
basic input
range was defined as `empty` + `ElementType popFront()` 
instead.


This doesn't solve the problem. empty may not be knowable until 
you try to fetch the next element (for instance, i/o). A better 
interface would be bool getNext(ref ElementType), or 
Nullable!ElementType getNext(), or tuple!(bool, "eof", 
ElementType, "value") getNext().


Not necessarily, one can simply define that range processing 
requires checking `empty` both before and after getting next 
element (and that returned value is undefined if empty == true).


Though I do agree that returning `Optional!ElementType` would be 
even better.


Yes, I can see good reason why you would want this. Hm... a set 
of adapters which reduce a range to its lesser API would be 
useful here:


array.asInput.map

But an expectation for map should be that you want it to be 
exactly the same as the original, but with a transformation 
applied to each fetch of an element. I think it needs to 
provide random access if random access is provided to it.


That is matter of design philosophy. For me such basic library 
primitives warrant C++ attitude of "don't pay for what you don't 
ask for" - and everything else, including actual feature 
completeness, is of negligible importance compared to that. If 
keeping random access for map requires either caching or multiple 
evaluations of predicate, I want it to be banned by default and 
enabled explicitly (not sure what could be good API for that 
though).


Phobos indeed doesn't seem to make such priorities right now and 
that is one of reasons I am growing increasingly unhappy with it.


Then it's not a bug? It's going to work just fine how you 
specified
it. I just don't consider it a valid "range" for general 
purposes.


You can do this if you want caching:

only(0).map!(x => uniform(0, 10)).cache


Good advice. Don't want bugs with non-stable results and 
accidental
double I/O in your long idiomatic range pipeline? Just put 
"cache" calls

everywhere just to be safe, defensive programming for the win!


Strawman, not what I said.


But that is what your answer meant in context of my original 
question :) My complaint was about existing semantics are being 
error-prone and hard to spot - you proposed it by adding more 
_manual_ fixup, which kills the whole point.

Re: Transient ranges

2016-06-03 Thread Steven Schveighoffer via Digitalmars-d


On 6/3/16 4:37 AM, Dicebot wrote:

On Wednesday, 1 June 2016 at 01:31:53 UTC, Steven Schveighoffer wrote:

If you want to use such "ranges", the compiler will not stop you. Just
don't expect any help from Phobos.


It only strengthens my opinion that Phobos is not a standard library I
want. Really, many of those issue would have been solved if basic input
range was defined as `empty` + `ElementType popFront()` instead.


This doesn't solve the problem. empty may not be knowable until you try 
to fetch the next element (for instance, i/o). A better interface would 
be bool getNext(ref ElementType), or Nullable!ElementType getNext(), or 
tuple!(bool, "eof", ElementType, "value") getNext().


I've said many times, i/o is not conducive to ranges. You have to 
shoehorn it, which is fine if you need compatibility, but it shouldn't 
be a mechanism of low-level implementation. And you should expect some 
cruft here because of that.



Some
more - if algorithms didn't try to preserve original range kind unless
they can do it with no overhead (i.e. arrray.map should not be a random
access range out of the box).


Yes, I can see good reason why you would want this. Hm... a set of 
adapters which reduce a range to its lesser API would be useful here:


array.asInput.map

But an expectation for map should be that you want it to be exactly the 
same as the original, but with a transformation applied to each fetch of 
an element. I think it needs to provide random access if random access 
is provided to it.



Then it's not a bug? It's going to work just fine how you specified
it. I just don't consider it a valid "range" for general purposes.

You can do this if you want caching:

only(0).map!(x => uniform(0, 10)).cache


Good advice. Don't want bugs with non-stable results and accidental
double I/O in your long idiomatic range pipeline? Just put "cache" calls
everywhere just to be safe, defensive programming for the win!


Strawman, not what I said.


Existing situation is bug prone exactly because you have no idea if you
need to cache or not unless you try out specific combination of
algorithms / predicates / input and carefully check what it does.


Again, yes, it's still possible to write code with bugs. The compiler or 
range library can only hold your hand so much.


I don't know of a library that's impossible to use incorrectly. 
Especially ones that let you execute arbitrary code internally.


-Steve

Re: Transient ranges

2016-06-03 Thread Dicebot via Digitalmars-d

On 06/03/2016 01:49 PM, Andrei Alexandrescu wrote:
> On 6/3/16 4:37 AM, Dicebot wrote:
>> Really, many of those issue would have been solved if basic input range
>> was defined as `empty` + `ElementType popFront()` instead.
> 
> FYI that was on the table, along with many other possible designs.
> (popFront() should be allowed to return by reference, too.) The problem
> with this one is that it is inefficient with arrays when the same
> element is accessed multiple times - the user is forced to create a copy
> or to use other contortions. Arrays are a prime use case for the range
> interface. -- Andrei

Has it actually been measured as inefficient? Ranges are very
inline/optimization friendly because of the templated nature of
algorithms - I would expect at least GDC/LDC to take advantage of that
for simple case and for more complicated ones mutiple evaluations of
`front` predicates will dominate performance gain from direct array access.

I guess I need to benchmark it myself and see :)

Re: Transient ranges

2016-06-03 Thread Andrei Alexandrescu via Digitalmars-d


On 6/3/16 4:37 AM, Dicebot wrote:

Really, many of those issue would have been solved if basic input range
was defined as `empty` + `ElementType popFront()` instead.


FYI that was on the table, along with many other possible designs. 
(popFront() should be allowed to return by reference, too.) The problem 
with this one is that it is inefficient with arrays when the same 
element is accessed multiple times - the user is forced to create a copy 
or to use other contortions. Arrays are a prime use case for the range 
interface. -- Andrei

Re: Transient ranges

2016-06-03 Thread Dicebot via Digitalmars-d

On Wednesday, 1 June 2016 at 01:31:53 UTC, Steven Schveighoffer 
wrote:
If you want to use such "ranges", the compiler will not stop 
you. Just don't expect any help from Phobos.


It only strengthens my opinion that Phobos is not a standard 
library I want. Really, many of those issue would have been 
solved if basic input range was defined as `empty` + `ElementType 
popFront()` instead. Some more - if algorithms didn't try to 
preserve original range kind unless they can do it with no 
overhead (i.e. arrray.map should not be a random access range out 
of the box).



This is a totally valid code I want to actually work and not be
discarded as "bug".


Then it's not a bug? It's going to work just fine how you 
specified it. I just don't consider it a valid "range" for 
general purposes.


You can do this if you want caching:

only(0).map!(x => uniform(0, 10)).cache


Good advice. Don't want bugs with non-stable results and 
accidental double I/O in your long idiomatic range pipeline? Just 
put "cache" calls everywhere just to be safe, defensive 
programming for the win!


Existing situation is bug prone exactly because you have no idea 
if you need to cache or not unless you try out specific 
combination of algorithms / predicates / input and carefully 
check what it does.

Re: Transient ranges

2016-06-02 Thread Steven Schveighoffer via Digitalmars-d


On 6/1/16 8:37 PM, Steven Schveighoffer wrote:

On 6/1/16 8:49 AM, Joseph Rushton Wakeling wrote:

The `Generator` range is an eager violator of this requirement:
https://github.com/dlang/phobos/blob/ca292ff78cd825f642eb58d586e2723ba14ae448/std/range/package.d#L3075-L3080



 although I'd agree that's an implementation error.


Yeah, that seems like a bug.


I guess this controversy was hashed out when this function was added. So 
this is very much intentional:


https://github.com/dlang/phobos/pull/2606
https://github.com/dlang/phobos/pull/3002

I'll start a new thread to discuss it. I think it needs to be changed.

-Steve

Re: Transient ranges

2016-06-01 Thread Steven Schveighoffer via Digitalmars-d


On 6/1/16 10:05 AM, Patrick Schluter wrote:

On Tuesday, 31 May 2016 at 12:42:23 UTC, Steven Schveighoffer wrote:


There are 2 main issues with FILE *:

1) it does not provide buffer access, so you must rely on things like
getline if they exist. But these have their own problems (i.e. do not
support unicode, require C-malloc'd buffer)


You can cheat by using setvbuf() and imposing your own buffer to the
FILE* routine. What and how the underlying implementation put in that
buffer is of course not documented but not very difficult to guess (for
example fseek()/fread() will always fill the buffer from 0 to the end
(of buffer or file depending what come first).


But there is no mechanism to determine where the current file pointer is 
inside the buffer. One could use ftell vs. tell on the file descriptor, 
but that is going to perform quite poorly.


-Steve

Re: Transient ranges

2016-06-01 Thread Steven Schveighoffer via Digitalmars-d


On 6/1/16 8:49 AM, Joseph Rushton Wakeling wrote:

On Tuesday, 31 May 2016 at 18:31:05 UTC, Steven Schveighoffer wrote:

On 5/31/16 11:45 AM, Jonathan M Davis via Digitalmars-d wrote:

On Monday, May 30, 2016 09:57:29 H. S. Teoh via Digitalmars-d wrote:

I'd argue that range-based generic code that assumes non-transience is
inherently buggy, because generic code ought not to make any
assumptions beyond what the range API guarantees. Currently, the range
API does not guarantee non-transience, therefore code that assumes
so is
broken by definition.  Just because they happen to work most of the
time
does not change the fact that they're written wrongly.


Technically, the range API doesn't even require that front return the
same
value every time that it's called, because isInputRange can't
possibly test
for it.


The API doesn't require it mechanically, but the API does require it
semantically (what does popFront mean if front changes
automatically?). If front returns different things, I'd say that's a
bug in your range construction.


The `Generator` range is an eager violator of this requirement:
https://github.com/dlang/phobos/blob/ca292ff78cd825f642eb58d586e2723ba14ae448/std/range/package.d#L3075-L3080


 although I'd agree that's an implementation error.


Yeah, that seems like a bug.

-Steve

Re: Transient ranges

2016-06-01 Thread Patrick Schluter via Digitalmars-d

On Tuesday, 31 May 2016 at 12:42:23 UTC, Steven Schveighoffer 
wrote:



There are 2 main issues with FILE *:

1) it does not provide buffer access, so you must rely on 
things like getline if they exist. But these have their own 
problems (i.e. do not support unicode, require C-malloc'd 
buffer)


You can cheat by using setvbuf() and imposing your own buffer to 
the FILE* routine. What and how the underlying implementation put 
in that buffer is of course not documented but not very difficult 
to guess (for example fseek()/fread() will always fill the buffer 
from 0 to the end (of buffer or file depending what come first).

Re: Transient ranges

2016-06-01 Thread Joseph Rushton Wakeling via Digitalmars-d

On Tuesday, 31 May 2016 at 18:31:05 UTC, Steven Schveighoffer 
wrote:

On 5/31/16 11:45 AM, Jonathan M Davis via Digitalmars-d wrote:
On Monday, May 30, 2016 09:57:29 H. S. Teoh via Digitalmars-d 
wrote:
I'd argue that range-based generic code that assumes 
non-transience is

inherently buggy, because generic code ought not to make any
assumptions beyond what the range API guarantees. Currently, 
the range
API does not guarantee non-transience, therefore code that 
assumes so is
broken by definition.  Just because they happen to work most 
of the time

does not change the fact that they're written wrongly.


Technically, the range API doesn't even require that front 
return the same
value every time that it's called, because isInputRange can't 
possibly test

for it.


The API doesn't require it mechanically, but the API does 
require it semantically (what does popFront mean if front 
changes automatically?). If front returns different things, I'd 
say that's a bug in your range construction.


The `Generator` range is an eager violator of this requirement:
https://github.com/dlang/phobos/blob/ca292ff78cd825f642eb58d586e2723ba14ae448/std/range/package.d#L3075-L3080

... although I'd agree that's an implementation error.

Re: Transient ranges

2016-05-31 Thread Steven Schveighoffer via Digitalmars-d


On 5/31/16 4:59 PM, Dicebot wrote:

On Tuesday, 31 May 2016 at 18:11:34 UTC, Steven Schveighoffer wrote:

1) Current definition of input range (most importantly, the fact `front`
has to be @property-like) implies `front` to always return the same
result until `popFront` is called.


Regardless of property-like or not, this should be the case.
Otherwise, popFront makes no sense.


Except it isn't in many cases you call "bugs" :(


If you want to use such "ranges", the compiler will not stop you. Just 
don't expect any help from Phobos.



2) For ranges that call predicates on elements to evaluate next element
this can only be achieved by caching - predicates are never required to
be pure.


Or, by not returning different things from your predicate.


It is perfectly legal for predicate to be non-pure and that would be
hugely annoying if anyone decided to prohibit it. Also even pure
predicates may be simply very expensive to evaluate which can make
`front` a silent pessimization.


There's no requirement or need to prevent it. Just a) don't do it, or b) 
deal with the consequences.





This is like saying RedBlackTree is broken when I give it a predicate
of "a == b".


RBL at least makes certain demands about valid predicate can be. This is
not case for ranges in general.


RedBlackTree with "a == b" will compile and operate. It just won't do 
much red-black-tree-like things.



3) But caching is sub-optimal performance wise and thus bunch of Phobos
algorithms violate `front` consistency / cheapness expectation
evaluating predicates each time it is called (liked map).


I don't think anything defensively caches front in case the next call
to front is different, unless that's specifically the reason for the
range.


And that makes input ranges violate implication #1 (front stability)
casually to the point it can't be relied at all and one has to always
make sure it is only evaluated once (make stack local copy or something
like that).


That's a little much. If you expect such things, you can construct them. 
There's no way for the functions to ascertain what your lambda is going 
to do (halting problem) and determine to cache or not based on that.



I think we should be aware that the range API doesn't prevent bugs of
all kinds. There's only so much analysis the compiler can do.


This is a totally valid code I want to actually work and not be
discarded as "bug".


Then it's not a bug? It's going to work just fine how you specified it. 
I just don't consider it a valid "range" for general purposes.


You can do this if you want caching:

only(0).map!(x => uniform(0, 10)).cache

-Steve

Re: Transient ranges

2016-05-31 Thread Dicebot via Digitalmars-d


On Tuesday, 31 May 2016 at 21:25:12 UTC, Timon Gehr wrote:

On 31.05.2016 22:59, Dicebot wrote:




I think we should be aware that the range API doesn't prevent 
bugs of

all kinds. There's only so much analysis the compiler can do.


This is a totally valid code I want to actually work and not be
discarded as "bug".


map often allows random access. Do you suggest it should cache 
opIndex too?


Random access map must store all already evaluated items in 
memory in mu opinion.

Re: Transient ranges

2016-05-31 Thread Timon Gehr via Digitalmars-d


On 31.05.2016 22:59, Dicebot wrote:





I think we should be aware that the range API doesn't prevent bugs of
all kinds. There's only so much analysis the compiler can do.


This is a totally valid code I want to actually work and not be
discarded as "bug".


map often allows random access. Do you suggest it should cache opIndex too?

Re: Transient ranges

2016-05-31 Thread Dicebot via Digitalmars-d

On Tuesday, 31 May 2016 at 18:11:34 UTC, Steven Schveighoffer 
wrote:
1) Current definition of input range (most importantly, the 
fact `front`
has to be @property-like) implies `front` to always return the 
same

result until `popFront` is called.


Regardless of property-like or not, this should be the case. 
Otherwise, popFront makes no sense.


Except it isn't in many cases you call "bugs" :(

2) For ranges that call predicates on elements to evaluate 
next element
this can only be achieved by caching - predicates are never 
required to

be pure.


Or, by not returning different things from your predicate.


It is perfectly legal for predicate to be non-pure and that would 
be hugely annoying if anyone decided to prohibit it. Also even 
pure predicates may be simply very expensive to evaluate which 
can make `front` a silent pessimization.


This is like saying RedBlackTree is broken when I give it a 
predicate of "a == b".


RBL at least makes certain demands about valid predicate can be. 
This is not case for ranges in general.


3) But caching is sub-optimal performance wise and thus bunch 
of Phobos

algorithms violate `front` consistency / cheapness expectation
evaluating predicates each time it is called (liked map).


I don't think anything defensively caches front in case the 
next call to front is different, unless that's specifically the 
reason for the range.


And that makes input ranges violate implication #1 (front 
stability) casually to the point it can't be relied at all and 
one has to always make sure it is only evaluated once (make stack 
local copy or something like that).



I think we should be aware that the range API doesn't prevent 
bugs of all kinds. There's only so much analysis the compiler 
can do.


This is a totally valid code I want to actually work and not be 
discarded as "bug".

Re: Transient ranges

2016-05-31 Thread Steven Schveighoffer via Digitalmars-d


On 5/31/16 11:45 AM, Jonathan M Davis via Digitalmars-d wrote:

On Monday, May 30, 2016 09:57:29 H. S. Teoh via Digitalmars-d wrote:

I'd argue that range-based generic code that assumes non-transience is
inherently buggy, because generic code ought not to make any
assumptions beyond what the range API guarantees. Currently, the range
API does not guarantee non-transience, therefore code that assumes so is
broken by definition.  Just because they happen to work most of the time
does not change the fact that they're written wrongly.


Technically, the range API doesn't even require that front return the same
value every time that it's called, because isInputRange can't possibly test
for it.


The API doesn't require it mechanically, but the API does require it 
semantically (what does popFront mean if front changes automatically?). 
If front returns different things, I'd say that's a bug in your range 
construction.



That doesn't mean that we don't expect that ranges behave that way.
It's never been the case that we agreed that transient ranges were
acceptable.


I want to note here that transient ranges are different from what you 
identified above (front returning different values each time). I know 
that you know that, but the way you said this implies that's what 
transient ranges are.



In fact, it's pretty much the opposite. There was some
discussion that maybe we'd consider it legitimate for basic input ranges
given that we already had byLine doing it, and we didn't want to require
that it copy its buffer, but previous discussions on it made it quite clear
that most of us (Andrei included) thought that it should not be allowed for
forward ranges.


I can see why forward ranges make it easier to avoid transience, since 
by definition the data should be available even if another range has 
already popFront'd by. But I don't think this needs to be a requirement. 
What I'd say is that as long as you haven't called popFront on the 
range, front should be stable. What happens elsewhere is implementation 
defined.



It's been the case for ages now that ranges with a transient front are not
explicitly supported. They happen to work with some algorithms (and you made
changes to Phobos to make them work with more algorithms there), but they're
not checked for, and there are algorithms that they don't and can't work
with. The simple fact that std.array.array doesn't work with them is pretty
damning IMHO.


array can work with them, you just have to copy each element as you 
iterate it (or otherwise transform the data into something that's 
persistent when copied).


But there is no requirement I know of that says a range has to always 
provide the expected outcome with std.array.array.



We already have too many problems with unspecified behavior in ranges
without adding transient front into the mix. It should be killed with fire
IMHO.


It can't be killed, because it's ALREADY supported. Way too much code 
would break, and I think you would piss off a lot of D developers 
(myself included).


But I don't think transient ranges are that horrific. For instance, 
stdin.byLine.array probably doesn't do what you want. But it does 
actually work and do something that's defined and reproducible and will 
not corrupt memory. It's a bug needing to be fixed because you didn't 
understand what it was doing.


-Steve

Re: Transient ranges

2016-05-31 Thread Steven Schveighoffer via Digitalmars-d


On 5/31/16 10:46 AM, Dicebot wrote:

On Monday, 30 May 2016 at 12:53:07 UTC, Steven Schveighoffer wrote:

On 5/30/16 5:35 AM, Dicebot wrote:

On Sunday, 29 May 2016 at 17:25:47 UTC, Steven Schveighoffer wrote:

What problems are solvable only by not caching the front element? I
can't think of any.


As far as I know, currently it is done mostly for performance reasons -
if result is fitting in the register there is no need to allocate stack
space for the cache, or something like that. One of most annoying
examples is map which calls lambda on each `front` call :
https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L587-L590



Maybe my understanding of your post is incorrect. You said "It is
impossible to correctly define input range without caching front which
may not be always possible and may have negative performance impact."

I'm trying to figure out which cases caching makes the solution
impossible.


There are two separate points here:

1) Current definition of input range (most importantly, the fact `front`
has to be @property-like) implies `front` to always return the same
result until `popFront` is called.


Regardless of property-like or not, this should be the case. Otherwise, 
popFront makes no sense.



2) For ranges that call predicates on elements to evaluate next element
this can only be achieved by caching - predicates are never required to
be pure.


Or, by not returning different things from your predicate.

This is like saying RedBlackTree is broken when I give it a predicate of 
"a == b".



3) But caching is sub-optimal performance wise and thus bunch of Phobos
algorithms violate `front` consistency / cheapness expectation
evaluating predicates each time it is called (liked map).


I don't think anything defensively caches front in case the next call to 
front is different, unless that's specifically the reason for the range.



I don't really care about concept of transient ranges, it is the fact
there is no guarantee of front stability for plain input ranges which
worries me.


But this is inherent in languages which support mutable data. If you
want data that doesn't change, require copied/unique data, or
immutable data.


This has nothing to do with mutability, look at this totally broken
example:

import std.random;
import std.algorithm;
import std.stdio;
import std.range;


void main ( )
{
auto range = only(0).map!(x => uniform(0, 10));
writeln(range.front);
writeln(range.front);
writeln(range.front);
}



I think we should be aware that the range API doesn't prevent bugs of 
all kinds. There's only so much analysis the compiler can do.


-Steve

Re: Transient ranges

2016-05-31 Thread Jonathan M Davis via Digitalmars-d

On Monday, May 30, 2016 09:57:29 H. S. Teoh via Digitalmars-d wrote:
> I'd argue that range-based generic code that assumes non-transience is
> inherently buggy, because generic code ought not to make any
> assumptions beyond what the range API guarantees. Currently, the range
> API does not guarantee non-transience, therefore code that assumes so is
> broken by definition.  Just because they happen to work most of the time
> does not change the fact that they're written wrongly.

Technically, the range API doesn't even require that front return the same
value every time that it's called, because isInputRange can't possibly test
for it. That doesn't mean that we don't expect that ranges behave that way.
It's never been the case that we agreed that transient ranges were
acceptable. In fact, it's pretty much the opposite. There was some
discussion that maybe we'd consider it legitimate for basic input ranges
given that we already had byLine doing it, and we didn't want to require
that it copy its buffer, but previous discussions on it made it quite clear
that most of us (Andrei included) thought that it should not be allowed for
forward ranges.

It's been the case for ages now that ranges with a transient front are not
explicitly supported. They happen to work with some algorithms (and you made
changes to Phobos to make them work with more algorithms there), but they're
not checked for, and there are algorithms that they don't and can't work
with. The simple fact that std.array.array doesn't work with them is pretty
damning IMHO.

We already have too many problems with unspecified behavior in ranges
without adding transient front into the mix. It should be killed with fire
IMHO.

- Jonathan M Davis

Re: Transient ranges

2016-05-31 Thread Dicebot via Digitalmars-d

On Monday, 30 May 2016 at 12:53:07 UTC, Steven Schveighoffer 
wrote:

On 5/30/16 5:35 AM, Dicebot wrote:
On Sunday, 29 May 2016 at 17:25:47 UTC, Steven Schveighoffer 
wrote:
What problems are solvable only by not caching the front 
element? I

can't think of any.


As far as I know, currently it is done mostly for performance 
reasons -
if result is fitting in the register there is no need to 
allocate stack
space for the cache, or something like that. One of most 
annoying

examples is map which calls lambda on each `front` call :
https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L587-L590


Maybe my understanding of your post is incorrect. You said "It 
is impossible to correctly define input range without caching 
front which may not be always possible and may have negative 
performance impact."


I'm trying to figure out which cases caching makes the solution 
impossible.


There are two separate points here:

1) Current definition of input range (most importantly, the fact 
`front` has to be @property-like) implies `front` to always 
return the same result until `popFront` is called.


2) For ranges that call predicates on elements to evaluate next 
element this can only be achieved by caching - predicates are 
never required to be pure.


3) But caching is sub-optimal performance wise and thus bunch of 
Phobos algorithms violate `front` consistency / cheapness 
expectation evaluating predicates each time it is called (liked 
map).


I don't really care about concept of transient ranges, it is 
the fact
there is no guarantee of front stability for plain input 
ranges which

worries me.


But this is inherent in languages which support mutable data. 
If you want data that doesn't change, require copied/unique 
data, or immutable data.


This has nothing to do with mutability, look at this totally 
broken example:


import std.random;
import std.algorithm;
import std.stdio;
import std.range;


void main ( )
{
auto range = only(0).map!(x => uniform(0, 10));
writeln(range.front);
writeln(range.front);
writeln(range.front);
}

Re: Transient ranges

2016-05-31 Thread Steven Schveighoffer via Digitalmars-d


On 5/30/16 11:46 AM, Andrei Alexandrescu wrote:

On 05/30/2016 11:22 AM, Steven Schveighoffer wrote:

On Monday, 30 May 2016 at 15:12:41 UTC, Andrei Alexandrescu wrote:

On 05/30/2016 09:30 AM, Steven Schveighoffer wrote:

My iopipe library is 2x as fast.


Cool! What is the trick to the speedup? -- Andrei


Direct buffer access and not using FILE * as a base.


So what primitives do you use? The lower-level descriptor-based
primitives open() and friends? http://linux.die.net/man/2/open


Yes. That is not the focus of my library, though. I have a very crude 
and basic I/O type that just wraps the system calls into a nice API.



What do you mean by direct buffer access?


I mean instead of pulling data out one character at a time from another 
buffer into your buffer, you have direct access, using all the speedups 
that CPUs provide when accessing arrays.


I'm also allowing the compiler to inline as much as possible because all 
the code is available.



What is the relative contributions of these (and possibly other) design
decisions to the overall speed?


I don't know. I started with a vastly different design and it ended up 
being twice as fast. So it's difficult to say for certain what are the 
major factor that is slowing down std.stdio.


At DConf, Adrian Matoga presented his very similar flod library, and I 
was surprised when it was faster than mine at processing lines, even 
though he was still using std.stdio.File. So I found a few reasons why 
mine was slower and fixed them. The things that made it much faster were:


1. Switched to direct pointer access (i.e. doing while(*ptr++ != 
lineTerminator) {} ) when the buffer type was an array.
2. I was stupidly accessing the member of the struct which held the 
terminator instead of allocating a stack variable for it. After doing 
that (and making it immutable), the compiler was much better informed on 
how to optimize.


There may even be more opportunities for speedup, but I'm pretty 
satisfied with it at the moment.



How can we integrate some of these in std.stdio without breaking
backward compatibility,


There are 2 main issues with FILE *:

1) it does not provide buffer access, so you must rely on things like 
getline if they exist. But these have their own problems (i.e. do not 
support unicode, require C-malloc'd buffer)


2) All calls are opaque, so no inlining can occur. This is especially 
painful if you have to call fgetc in a tight loop. This goes back to the 
buffer access issue -- FILE * only lets you look ahead one character.


So I think the only real way to provide a decent performance improvement 
without breaking C compatibility is to write code that is aware of the 
implementation details of the opaque FILE * type. Just like getline.



or offer additional artifacts that integrate
seamlessly and don't need to be compatible?


I had a grand plan to replace stdio with something that "evolves" to 
faster D based i/o when you did D-like things. But it's very complex, 
and each addition of code to stdio makes things more complex.


In reality, I think we only need to be C compatible for stdout (possibly 
only via write[f][ln]), and possibly stdin. There's no reason we need to 
use C i/o for any other uses. What makes things difficult is that 
everything in Phobos uses std.stdio.File for all i/o and that uses C i/o.


-Steve

Re: Transient ranges

2016-05-31 Thread Steven Schveighoffer via Digitalmars-d


On 5/30/16 2:32 PM, Alex Parrill wrote:

On Monday, 30 May 2016 at 12:53:07 UTC, Steven Schveighoffer wrote:



I'm trying to figure out which cases caching makes the solution
impossible.


One case is wrapping a network stream: a TCP input range that yields
ubyte[] chunks of data as they are read from the socket, a joiner to
join the blocks together, a map that converts the bytes to characters,
and take to consume each message.

The issue is that, in a naive implementation, creating the TCP range
requires reading a chunk from the socket to populate front. Since I
usually set up my stream objects, before using them, the receiver range
will try to receive a chunk, but I haven't sent a request to the server
yet, so it would block indefinitely.

Same issue with popFront: I need to pop the bytes I've already used for
the previous request, but calling popFront would mean waiting for the
response for the next request which I haven't sent out yet.

The solution I used was to delay actually receiving the chunk until
front was called, which complicates the implementation a bit.


This is not the same issue. Note that populating on call to front 
instead of on popFront is still valid within the definition of a range 
(though I think the most straightforward way to implement a range is to 
populate only on popFront).


What makes this awkward is that you need to check to see if front has 
already been populated, which also kind of requires a separate flag 
(though you may be able to do this without an extra flag). However, you 
still need a place to put the data, and using a buffer for this is a 
good fit.


I would suggest that custom i/o patterns like this (where calling 
popFront is coupled to some other operation) do not fit ranges very 
well. In the iopipe library I wrote, I have two separate operations for 
"fetch more data" and "forget the oldest data", which are based on the 
requirements for i/o. These can be cajoled into a range interface, of 
course, but those operations suit i/o requirements much better than 
front/popFront/empty.


-Steve

Re: Transient ranges

2016-05-30 Thread Alex Parrill via Digitalmars-d

On Monday, 30 May 2016 at 12:53:07 UTC, Steven Schveighoffer 
wrote:

On 5/30/16 5:35 AM, Dicebot wrote:
On Sunday, 29 May 2016 at 17:25:47 UTC, Steven Schveighoffer 
wrote:
What problems are solvable only by not caching the front 
element? I

can't think of any.


As far as I know, currently it is done mostly for performance 
reasons -
if result is fitting in the register there is no need to 
allocate stack
space for the cache, or something like that. One of most 
annoying

examples is map which calls lambda on each `front` call :
https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L587-L590


Maybe my understanding of your post is incorrect. You said "It 
is impossible to correctly define input range without caching 
front which may not be always possible and may have negative 
performance impact."


I'm trying to figure out which cases caching makes the solution 
impossible.


One case is wrapping a network stream: a TCP input range that 
yields ubyte[] chunks of data as they are read from the socket, a 
joiner to join the blocks together, a map that converts the bytes 
to characters, and take to consume each message.


The issue is that, in a naive implementation, creating the TCP 
range requires reading a chunk from the socket to populate front. 
Since I usually set up my stream objects, before using them, the 
receiver range will try to receive a chunk, but I haven't sent a 
request to the server yet, so it would block indefinitely.


Same issue with popFront: I need to pop the bytes I've already 
used for the previous request, but calling popFront would mean 
waiting for the response for the next request which I haven't 
sent out yet.


The solution I used was to delay actually receiving the chunk 
until front was called, which complicates the implementation a 
bit.

Re: Transient ranges

2016-05-30 Thread H. S. Teoh via Digitalmars-d

On Mon, May 30, 2016 at 09:26:35AM -0400, Steven Schveighoffer via 
Digitalmars-d wrote:
> On 5/30/16 12:17 AM, Jonathan M Davis via Digitalmars-d wrote:
[...]
> > Having byLine not copy its buffer is fine. Having it be a range is
> > not.  Algorithms in general just do not play well with that
> > behavior, and I don't think that it's reasonable to expect them to.
> 
> I disagree. Most algorithms in std.algorithm are fine with transient
> ranges.

Yes, some years ago I submitted a series of PRs to fix poorly-written
code in Phobos that unnecessarily depended on .front not changing when
popFront is called.  I had found that in the majority of cases it was
actually unnecessary to require non-transience; most algorithms in
std.algorithm, properly-written, work just fine with transient ranges.

For the few algorithms that don't work with transient ranges, arguably
they ought to require forward ranges and use .save instead. The only
exception I can think of right now is array().

[...]
> Here is how I think about it: the front element is valid and stable
> until you call popFront. After that, anything goes for the old front.
> 
> This is entirely reasonable, and fits into many many algorithms. This
> isn't a functional-only language, mutation is valid in D.

I agree, most range algorithms work just fine with transient ranges.  I
consider it a reasonable consequence of defensive programming: don't
assume anything about the API beyond what it actually guarantees. The
current range API says that the current range element is accessible via
.front; from this one may conclude that the value returned by .front
should be valid immediately afterwards. However, the API says nothing
about this value lasting past the next call to .popFront, and in fact it
specifies that .front will return something different afterwards, so
defensively-written code should assume the worst and regard the previous
value of .front as invalidated.

Understandably, writing code this way is more constrained and less
convenient, but in a standard library I'd expect that code should be at
least of this calibre, if not better.  And as far as I have found, the
majority of range algorithms in Phobos *can* be written this way and
work perfectly fine with transient ranges.  As I've already said, most
of the remaining algorithms can be implemented by requiring forward
ranges and using .save instead of assuming that .front is cacheable --
.save is something guaranteed by the API, and therefore defensively
written generic code should use it rather than making unfounded
assumptions about .front.

[...]
> > If it's a range, then it can be passed around to other algorithms
> > with impunity, and almost nothing is written with the idea that a
> > range's front is transient.

And nothing about the range API guarantees that .front is *not*
transient either.  Generic code should not assume either way.  It should
be written with the minimal assumptions necessary to work.

[...]
> > There's no way to check for transience, and I don't think that it's
> > even vaguely worth adding yet another range primitive that has to be
> > checked for everywhere just for this case. Transience does _not_
> > play nicely with algorithms in general.

I disagree. Of the algorithms I surveyed in Phobos at the time, I found
that the majority of them can be written in such a way that they are
transience-agnostic.  Only a small set of algorithms actually require
non-transience.  That's hardly "not play nicely with algorithms in
general".

[...]
> > Pretty much no range-based code is written with the idea that front
> > is transient.
> 
> Provably false, see above.
[...]

I'd argue that range-based generic code that assumes non-transience is
inherently buggy, because generic code ought not to make any
assumptions beyond what the range API guarantees. Currently, the range
API does not guarantee non-transience, therefore code that assumes so is
broken by definition.  Just because they happen to work most of the time
does not change the fact that they're written wrongly.

T

-- 
Just because you can, doesn't mean you should.

Re: Transient ranges

2016-05-30 Thread Andrei Alexandrescu via Digitalmars-d


On 05/30/2016 12:22 AM, Jack Stouffer wrote:

On Sunday, 29 May 2016 at 17:36:24 UTC, Steven Schveighoffer wrote:

Wholly disagree. If we didn't cache the element, D would be a
laughingstock of performance-minded tests.


byLine already is a laughingstock performance wise:
https://issues.dlang.org/show_bug.cgi?id=11810


That has been fixed a long time ago by 
https://github.com/dlang/phobos/pull/3089, I just marked it as fixed.



It's way faster to read the entire file into a buffer and iterate by
line over that.


Sometimes yes, but that's a change of approach with its own tradeoffs 
(e.g. what if you want to stop early, the file is very large etc. etc).



Andrei

Re: Transient ranges

2016-05-30 Thread Andrei Alexandrescu via Digitalmars-d


On 05/30/2016 11:22 AM, Steven Schveighoffer wrote:

On Monday, 30 May 2016 at 15:12:41 UTC, Andrei Alexandrescu wrote:

On 05/30/2016 09:30 AM, Steven Schveighoffer wrote:

My iopipe library is 2x as fast.


Cool! What is the trick to the speedup? -- Andrei


Direct buffer access and not using FILE * as a base.


So what primitives do you use? The lower-level descriptor-based 
primitives open() and friends? http://linux.die.net/man/2/open


What do you mean by direct buffer access?

What is the relative contributions of these (and possibly other) design 
decisions to the overall speed?


How can we integrate some of these in std.stdio without breaking 
backward compatibility, or offer additional artifacts that integrate 
seamlessly and don't need to be compatible?



Andrei

Re: Transient ranges

2016-05-30 Thread Steven Schveighoffer via Digitalmars-d


On Monday, 30 May 2016 at 15:12:41 UTC, Andrei Alexandrescu wrote:

On 05/30/2016 09:30 AM, Steven Schveighoffer wrote:

My iopipe library is 2x as fast.


Cool! What is the trick to the speedup? -- Andrei


Direct buffer access and not using FILE * as a base.

-Steve

Re: Transient ranges

2016-05-30 Thread Andrei Alexandrescu via Digitalmars-d


On 05/30/2016 09:26 AM, Steven Schveighoffer wrote:

Here is how I think about it: the front element is valid and stable
until you call popFront. After that, anything goes for the old front.


Yah, we should have formal language in the library reference about this. 
-- Andrei

Re: Transient ranges

2016-05-30 Thread Andrei Alexandrescu via Digitalmars-d


On 05/30/2016 09:30 AM, Steven Schveighoffer wrote:

My iopipe library is 2x as fast.


Cool! What is the trick to the speedup? -- Andrei

Re: Transient ranges

2016-05-30 Thread Steven Schveighoffer via Digitalmars-d


On 5/29/16 11:46 PM, Alex Parrill wrote:

On Sunday, 29 May 2016 at 17:45:00 UTC, Steven Schveighoffer wrote:

On 5/27/16 7:42 PM, Seb wrote:


So what about the convention to explicitely declare a `.transient` enum
member on a range, if the front element value can change?


enum isTransient(R) = is(typeof(() {
   static assert(isInputRange!R);
   static assert(hasIndirections(ElementType!R));
   static assert(!allIndrectionsImmutable!(ElementType!R)); // need to
write this
}));



allIndrectionsImmutable could probably just be is(T : immutable) (ie
implicitly convertible to immutable). Value types without (or with
immutable only) indirections should be convertible to immutable, since
the value is being copied.



Yes, I think that is correct. Thanks!

-Steve

Re: Transient ranges

2016-05-30 Thread Steven Schveighoffer via Digitalmars-d


On 5/30/16 12:22 AM, Jack Stouffer wrote:

On Sunday, 29 May 2016 at 17:36:24 UTC, Steven Schveighoffer wrote:

Wholly disagree. If we didn't cache the element, D would be a
laughingstock of performance-minded tests.


byLine already is a laughingstock performance wise:
https://issues.dlang.org/show_bug.cgi?id=11810


But that's because we base our I/O on C :)

My iopipe library is 2x as fast.


It's way faster to read the entire file into a buffer and iterate by
line over that.


Depends on how big the entire file is.


I have to agree with Jonathan, I see a lot of proposals in this thread
but I have yet to see a cost/benefit analysis that's pro transient
support. The amount of changes needed to support them is not
commensurate to any possible benefits.


Transient ranges are already supported. Removing the ability to have 
transient ranges would be an unmitigated disaster. e.g. removing range 
support for byLine.


-Steve

Re: Transient ranges

2016-05-30 Thread Steven Schveighoffer via Digitalmars-d


On 5/30/16 12:17 AM, Jonathan M Davis via Digitalmars-d wrote:

On Sunday, May 29, 2016 13:36:24 Steven Schveighoffer via Digitalmars-d wrote:

On 5/27/16 9:48 PM, Jonathan M Davis via Digitalmars-d wrote:

On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:

So what about the convention to explicitely declare a
`.transient` enum member on a range, if the front element value
can change?


Honestly, I don't think that supporting transient ranges is worth it.
Every
single range-based function would have to either test that the "transient"
enum wasn't there or take transient ranges into account, and
realistically,
that isn't going to happen. For better or worse, we do have byLine in
std.stdio, which has a transient front, but aside from the performance
benefits, it's been a disaster.


Wholly disagree. If we didn't cache the element, D would be a
laughingstock of performance-minded tests.


Having byLine not copy its buffer is fine. Having it be a range is not.
Algorithms in general just do not play well with that behavior, and I don't
think that it's reasonable to expect them to.


I disagree. Most algorithms in std.algorithm are fine with transient ranges.




It's way too error-prone. We now have
byLineCopy to combat that, but of course, byLine is the more obvious
function and thus more likely to be used (plus it's been around longer),
so
a _lot_ of code is going to end up using it, and a good chunk of that code
really should be using byLineCopy.


There's nothing actually wrong with using byLine, and copying on demand.
Why such a negative connotation?


Because it does not play nicely with ranges, and aside from a few rare
ranges like byLine that have to deal directly with I/O, transience isn't
even useful. Having an efficient solution that plays nicely with I/O is
definitely important, but it doesn't need to be a range, especially when it
complicates ranges in general. byLine doesn't even work with
std.array.array, and if even that doesn't work, I don't see how a range
could be considered well-behaved.


Here is how I think about it: the front element is valid and stable 
until you call popFront. After that, anything goes for the old front.


This is entirely reasonable, and fits into many many algorithms. This 
isn't a functional-only language, mutation is valid in D.





I'm of the opinion that if you want a transient front, you should just use
opApply and skip ranges entirely.


So you want to make this code invalid? Why?

foreach(i; map!(a => a.to!int)(stdin.byLine))
{
// process each integer
...
}

You want to make me copy each line to a heap-allocated string so I can
parse it?!!


If it's a range, then it can be passed around to other algorithms with
impunity, and almost nothing is written with the idea that a range's front
is transient.


Algorithms which are fine with byLine from std.algorithm.searching (not 
going to go through all of the submodules):


all, any, balancedParens, count, countUntil, find, canFind, findAmong 
(with first range being transient), skipOver (second parameter 
transient), startsWith, until.


algorithms which would would compile with transient ranges, but not work 
correctly: minCount, maxCount


Now, if minCount and maxCount could introspect transience, it could .dup 
the elements to make sure they were returned properly.



There's no way to check for transience, and I don't think
that it's even vaguely worth adding yet another range primitive that has to
be checked for everywhere just for this case. Transience does _not_ play
nicely with algorithms in general.


I think your understanding of this is incorrect. A range is transient by 
default if the type of the element allows modification via reference. It 
doesn't have to be checked everywhere, because most algorithms are fine 
with or without transience.



Using opApply doesn't completely solve the problem (since the buffer could
still escape - we'd need some kind of scope attribute or wrapper to fix that
problem), but it makes it so that you can't pass such a a range around and
run into problems with all of the algorithms that don't play nicely with it.
So, instead, you end up with code that looks something like

foreach(line; stdin.byLine())
{
auto i = line.to!int();
...
}

And yes, it's slightly longer, but it prevents a whole class of bugs by not
having it be a range with a transient front.


Sure, as long as we're adding new newsgroups, let's at one that's titled 
"why can't I use byLine as a range", as this will be a popular topic.





Allowing for front to be transient -
whether you can check for it or not - simply is not worth the extra
complications. I'd love it if we deprecated byLine's range functions, and
made it use opApply instead and just declare transient ranges to be
completely unsupported. If you want to write your code to have a transient
front, you can obviously take that risk, but you're on your own.


There is no way to disallow fro

Re: Transient ranges

2016-05-30 Thread Steven Schveighoffer via Digitalmars-d


On 5/30/16 8:44 AM, Andrei Alexandrescu wrote:

On 05/30/2016 08:21 AM, Joseph Rushton Wakeling wrote:

On Sunday, 29 May 2016 at 17:29:35 UTC, Steven Schveighoffer wrote:

This doesn't help at all. I can still make a "transient" range with
all three range primitives.

There seems to be a misunderstanding about what a transient range is.

byLine is a transient range that requires the front element be
cacheable (I have to build the line somewhere, and reusing that buffer
provides performance). Shoehorning into a popFront-only style "range"
does not solve the problem. Not only that, but now I would have to
cache BOTH the front element and the next one.


Ack, yea, I think see the issue.  It's

 auto a = transientRange.front;
 transientRange.popFront() // <=== now a has changed

Even if you go with the only-2-primitives one, you'd have the same
problem:

 auto a = transientRange.front;
 auto b = transientRange.front;   // <=== now a has changed


That's right. The one approach that works here is to acknowledge the
range has no buffering at all and require the user to provide the buffer
as an argument. However, that would make for a different primitive that
does not fit within the range hierarchy. -- Andrei



Yes, this is how e.g. Linux getline works. But this leaks even more 
implementation details to the user, and still doesn't solve the problem, 
it just pushes it onto the user.


It also would make for some awkward pipelining.

Another solution is to accept a "buffer manager" that is responsible for 
determining how to handle the buffering.


-Steve

Re: Transient ranges

2016-05-30 Thread Steven Schveighoffer via Digitalmars-d


On 5/30/16 5:35 AM, Dicebot wrote:

On Sunday, 29 May 2016 at 17:25:47 UTC, Steven Schveighoffer wrote:

What problems are solvable only by not caching the front element? I
can't think of any.


As far as I know, currently it is done mostly for performance reasons -
if result is fitting in the register there is no need to allocate stack
space for the cache, or something like that. One of most annoying
examples is map which calls lambda on each `front` call :
https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L587-L590


Maybe my understanding of your post is incorrect. You said "It is 
impossible to correctly define input range without caching front which 
may not be always possible and may have negative performance impact."


I'm trying to figure out which cases caching makes the solution impossible.





And there is no way to define "transient" ranges in a way other than
explicitly declaring they are transient. There isn't anything inherent
or introspectable about such ranges.


I don't really care about concept of transient ranges, it is the fact
there is no guarantee of front stability for plain input ranges which
worries me.


But this is inherent in languages which support mutable data. If you 
want data that doesn't change, require copied/unique data, or immutable 
data.


-Steve

Re: Transient ranges

2016-05-30 Thread Andrei Alexandrescu via Digitalmars-d


On 05/30/2016 08:21 AM, Joseph Rushton Wakeling wrote:

On Sunday, 29 May 2016 at 17:29:35 UTC, Steven Schveighoffer wrote:

This doesn't help at all. I can still make a "transient" range with
all three range primitives.

There seems to be a misunderstanding about what a transient range is.

byLine is a transient range that requires the front element be
cacheable (I have to build the line somewhere, and reusing that buffer
provides performance). Shoehorning into a popFront-only style "range"
does not solve the problem. Not only that, but now I would have to
cache BOTH the front element and the next one.


Ack, yea, I think see the issue.  It's

 auto a = transientRange.front;
 transientRange.popFront() // <=== now a has changed

Even if you go with the only-2-primitives one, you'd have the same problem:

 auto a = transientRange.front;
 auto b = transientRange.front;   // <=== now a has changed


That's right. The one approach that works here is to acknowledge the 
range has no buffering at all and require the user to provide the buffer 
as an argument. However, that would make for a different primitive that 
does not fit within the range hierarchy. -- Andrei

Re: Transient ranges

2016-05-30 Thread Steven Schveighoffer via Digitalmars-d


On 5/29/16 2:27 PM, default0 wrote:

On Sunday, 29 May 2016 at 18:09:29 UTC, Steven Schveighoffer wrote:

On 5/29/16 1:45 PM, Steven Schveighoffer wrote:

On 5/27/16 7:42 PM, Seb wrote:


So what about the convention to explicitely declare a `.transient` enum
member on a range, if the front element value can change?


enum isTransient(R) = is(typeof(() {
   static assert(isInputRange!R);
   static assert(hasIndirections(ElementType!R));
   static assert(!allIndrectionsImmutable!(ElementType!R)); // need to
write this
}));


obviously, this is better as a simple && statement between the three
requirements :) When I started writing, I thought I'd have to write
some runtime code.



Would that make a range of polymorphic objects transient?


Of course! If it's mutable (or marked const), it can change from call to 
call.


-Steve

Re: Transient ranges

2016-05-30 Thread Joseph Rushton Wakeling via Digitalmars-d

On Sunday, 29 May 2016 at 17:29:35 UTC, Steven Schveighoffer 
wrote:
This doesn't help at all. I can still make a "transient" range 
with all three range primitives.


There seems to be a misunderstanding about what a transient 
range is.


byLine is a transient range that requires the front element be 
cacheable (I have to build the line somewhere, and reusing that 
buffer provides performance). Shoehorning into a popFront-only 
style "range" does not solve the problem. Not only that, but 
now I would have to cache BOTH the front element and the next 
one.


Ack, yea, I think see the issue.  It's

auto a = transientRange.front;
transientRange.popFront() // <=== now a has changed

Even if you go with the only-2-primitives one, you'd have the 
same problem:


auto a = transientRange.front;
auto b = transientRange.front;   // <=== now a has changed

My "empty + front only" idea was more inspired by the case e.g. 
where the range is wrapping some truly-random signal (like the 
current observed value of atmospheric noise, for example), it 
clearly doesn't solve this case.

Re: Transient ranges

2016-05-30 Thread Nordlöw via Digitalmars-d


On Friday, 27 May 2016 at 23:42:24 UTC, Seb wrote:
So what about the convention to explicitely declare a 
`.transient` enum member on a range, if the front element value 
can change?


An alternative solution is to extend data-flow/escape-analysis to 
forbit references to `scope`d-variables from leaking outside of 
the scope where it's declared. If the element variable in the 
`foreach`-statement is then qualifed with `scope` the developer 
can safely use the front-reference inside the foreach-scope 
without worrying about it leaking into the enclosing scopes.


However, this solution, of course, requires the developer to 
remember to use the `scope` keyword every time he iterates over a 
transient range, which might not be what want in terms of 
simplicity.


For a very technical plan on how to implement this in D see

http://wiki.dlang.org/User:Schuetzm/scope

Could this big undertaking be split up into smaller more 
managable parts?

Re: Transient ranges

2016-05-30 Thread Dicebot via Digitalmars-d

On Sunday, 29 May 2016 at 17:25:47 UTC, Steven Schveighoffer 
wrote:
What problems are solvable only by not caching the front 
element? I can't think of any.


As far as I know, currently it is done mostly for performance 
reasons - if result is fitting in the register there is no need 
to allocate stack space for the cache, or something like that. 
One of most annoying examples is map which calls lambda on each 
`front` call : 
https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L587-L590


And there is no way to define "transient" ranges in a way other 
than explicitly declaring they are transient. There isn't 
anything inherent or introspectable about such ranges.


I don't really care about concept of transient ranges, it is the 
fact there is no guarantee of front stability for plain input 
ranges which worries me.

Re: Transient ranges

2016-05-29 Thread Jonathan M Davis via Digitalmars-d

On Sunday, May 29, 2016 18:27:53 default0 via Digitalmars-d wrote:
> On Sunday, 29 May 2016 at 18:09:29 UTC, Steven Schveighoffer
>
> wrote:
> > On 5/29/16 1:45 PM, Steven Schveighoffer wrote:
> >> On 5/27/16 7:42 PM, Seb wrote:
> >>> So what about the convention to explicitely declare a
> >>> `.transient` enum
> >>> member on a range, if the front element value can change?
> >>
> >> enum isTransient(R) = is(typeof(() {
> >>
> >>static assert(isInputRange!R);
> >>static assert(hasIndirections(ElementType!R));
> >>static assert(!allIndrectionsImmutable!(ElementType!R)); //
> >>
> >> need to
> >> write this
> >> }));
> >
> > obviously, this is better as a simple && statement between the
> > three requirements :) When I started writing, I thought I'd
> > have to write some runtime code.
> >
> > -Steve
>
> Would that make a range of polymorphic objects transient?

It would make pretty much anything that isn't a value type - including a
type that's actually a value but uses postblit to do it - be treated as
transient, with the one exception being that if the reference types involved
are immutable (be they the element type or members in the elmenet type),
then it's not treated as transient. This means a very large number of ranges
will be treated as being transient, which is completely unacceptable IMHO.
Having a transient front is _not_ the norm, and code is usually written with
the assumption that front is not transient. In almost all cases, if a
range-based function happens to work with a transient front, it's by luck
and not because it was designed that way.

You can't statically check for transience, because it depends on runtime
behavior. At best, you can statically eliminate a fairly small portion of
the ranges as not being having transient fronts. If we want to actually
support transient fronts, it really needs to be explicit IMHO.

Regardless, I don't think that we want to need to be checking for transience
in range-based functions in general. It's too much extra complication for
too little benefit. A very small number of ranges actually have or benefit
from having a transient front, and I don't think that it's worth supporting
them as ranges given how much that affects everything else. Otherwise, you
end up with the 1% case causing problems for all range-based code.

- Jonathan M Davis

Re: Transient ranges

2016-05-29 Thread Jack Stouffer via Digitalmars-d

On Sunday, 29 May 2016 at 17:36:24 UTC, Steven Schveighoffer 
wrote:
Wholly disagree. If we didn't cache the element, D would be a 
laughingstock of performance-minded tests.


byLine already is a laughingstock performance wise: 
https://issues.dlang.org/show_bug.cgi?id=11810


It's way faster to read the entire file into a buffer and iterate 
by line over that.


I have to agree with Jonathan, I see a lot of proposals in this 
thread but I have yet to see a cost/benefit analysis that's pro 
transient support. The amount of changes needed to support them 
is not commensurate to any possible benefits.

Re: Transient ranges

2016-05-29 Thread Jonathan M Davis via Digitalmars-d

On Sunday, May 29, 2016 13:36:24 Steven Schveighoffer via Digitalmars-d wrote:
> On 5/27/16 9:48 PM, Jonathan M Davis via Digitalmars-d wrote:
> > On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:
> >> So what about the convention to explicitely declare a
> >> `.transient` enum member on a range, if the front element value
> >> can change?
> >
> > Honestly, I don't think that supporting transient ranges is worth it.
> > Every
> > single range-based function would have to either test that the "transient"
> > enum wasn't there or take transient ranges into account, and
> > realistically,
> > that isn't going to happen. For better or worse, we do have byLine in
> > std.stdio, which has a transient front, but aside from the performance
> > benefits, it's been a disaster.
>
> Wholly disagree. If we didn't cache the element, D would be a
> laughingstock of performance-minded tests.

Having byLine not copy its buffer is fine. Having it be a range is not.
Algorithms in general just do not play well with that behavior, and I don't
think that it's reasonable to expect them to.

> > It's way too error-prone. We now have
> > byLineCopy to combat that, but of course, byLine is the more obvious
> > function and thus more likely to be used (plus it's been around longer),
> > so
> > a _lot_ of code is going to end up using it, and a good chunk of that code
> > really should be using byLineCopy.
>
> There's nothing actually wrong with using byLine, and copying on demand.
> Why such a negative connotation?

Because it does not play nicely with ranges, and aside from a few rare
ranges like byLine that have to deal directly with I/O, transience isn't
even useful. Having an efficient solution that plays nicely with I/O is
definitely important, but it doesn't need to be a range, especially when it
complicates ranges in general. byLine doesn't even work with
std.array.array, and if even that doesn't work, I don't see how a range
could be considered well-behaved.

> > I'm of the opinion that if you want a transient front, you should just use
> > opApply and skip ranges entirely.
>
> So you want to make this code invalid? Why?
>
> foreach(i; map!(a => a.to!int)(stdin.byLine))
> {
> // process each integer
> ...
> }
>
> You want to make me copy each line to a heap-allocated string so I can
> parse it?!!

If it's a range, then it can be passed around to other algorithms with
impunity, and almost nothing is written with the idea that a range's front
is transient. There's no way to check for transience, and I don't think
that it's even vaguely worth adding yet another range primitive that has to
be checked for everywhere just for this case. Transience does _not_ play
nicely with algorithms in general.

Using opApply doesn't completely solve the problem (since the buffer could
still escape - we'd need some kind of scope attribute or wrapper to fix that
problem), but it makes it so that you can't pass such a a range around and
run into problems with all of the algorithms that don't play nicely with it.
So, instead, you end up with code that looks something like

foreach(line; stdin.byLine())
{
auto i = line.to!int();
...
}

And yes, it's slightly longer, but it prevents a whole class of bugs by not
having it be a range with a transient front.

> > Allowing for front to be transient -
> > whether you can check for it or not - simply is not worth the extra
> > complications. I'd love it if we deprecated byLine's range functions, and
> > made it use opApply instead and just declare transient ranges to be
> > completely unsupported. If you want to write your code to have a transient
> > front, you can obviously take that risk, but you're on your own.
>
> There is no way to disallow front from being transient. In fact, it
> should be assumed that it is the default unless it's wholly a value-type.

Pretty much no range-based code is written with the idea that front is
transient. It's pretty much the opposite. Unfortunately, we can't check for
all of the proper range semantics at compile time (be it having to do with
transience, the fact that front needs to be the same every time until
popFront is called, that save has to actually result in a range that will
have exactly the same elements, or whatever other runtime behavior that
ranges are supposed to adhere to), but just because something can't be
checked for doesn't mean that it should be considered reasonable or valid.
IMHO, a range with a transient front should be considered as valid as a
range that returns a different value every time that front is called without
popFront having been called. Neither can be tested for, but both cause
problems.

If we're going to support transience, then we _need_ to have some sort of

Re: Transient ranges

2016-05-29 Thread Alex Parrill via Digitalmars-d

On Sunday, 29 May 2016 at 17:45:00 UTC, Steven Schveighoffer 
wrote:

On 5/27/16 7:42 PM, Seb wrote:

So what about the convention to explicitely declare a 
`.transient` enum

member on a range, if the front element value can change?


enum isTransient(R) = is(typeof(() {
   static assert(isInputRange!R);
   static assert(hasIndirections(ElementType!R));
   static assert(!allIndrectionsImmutable!(ElementType!R)); // 
need to write this

}));

-Steve


allIndrectionsImmutable could probably just be is(T : immutable) 
(ie implicitly convertible to immutable). Value types without (or 
with immutable only) indirections should be convertible to 
immutable, since the value is being copied.

Re: Transient ranges

2016-05-29 Thread default0 via Digitalmars-d

On Sunday, 29 May 2016 at 18:09:29 UTC, Steven Schveighoffer 
wrote:

On 5/29/16 1:45 PM, Steven Schveighoffer wrote:

On 5/27/16 7:42 PM, Seb wrote:

So what about the convention to explicitely declare a 
`.transient` enum

member on a range, if the front element value can change?


enum isTransient(R) = is(typeof(() {
   static assert(isInputRange!R);
   static assert(hasIndirections(ElementType!R));
   static assert(!allIndrectionsImmutable!(ElementType!R)); // 
need to

write this
}));


obviously, this is better as a simple && statement between the 
three requirements :) When I started writing, I thought I'd 
have to write some runtime code.


-Steve


Would that make a range of polymorphic objects transient?

Re: Transient ranges

2016-05-29 Thread Steven Schveighoffer via Digitalmars-d


On 5/29/16 1:45 PM, Steven Schveighoffer wrote:

On 5/27/16 7:42 PM, Seb wrote:


So what about the convention to explicitely declare a `.transient` enum
member on a range, if the front element value can change?


enum isTransient(R) = is(typeof(() {
   static assert(isInputRange!R);
   static assert(hasIndirections(ElementType!R));
   static assert(!allIndrectionsImmutable!(ElementType!R)); // need to
write this
}));


obviously, this is better as a simple && statement between the three 
requirements :) When I started writing, I thought I'd have to write some 
runtime code.


-Steve

Re: Transient ranges

2016-05-29 Thread Steven Schveighoffer via Digitalmars-d


On 5/27/16 7:42 PM, Seb wrote:


So what about the convention to explicitely declare a `.transient` enum
member on a range, if the front element value can change?


enum isTransient(R) = is(typeof(() {
   static assert(isInputRange!R);
   static assert(hasIndirections(ElementType!R));
   static assert(!allIndrectionsImmutable!(ElementType!R)); // need to 
write this

}));

-Steve

Re: Transient ranges

2016-05-29 Thread Steven Schveighoffer via Digitalmars-d


On 5/27/16 9:48 PM, Jonathan M Davis via Digitalmars-d wrote:

On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:

So what about the convention to explicitely declare a
`.transient` enum member on a range, if the front element value
can change?


Honestly, I don't think that supporting transient ranges is worth it. Every
single range-based function would have to either test that the "transient"
enum wasn't there or take transient ranges into account, and realistically,
that isn't going to happen. For better or worse, we do have byLine in
std.stdio, which has a transient front, but aside from the performance
benefits, it's been a disaster.


Wholly disagree. If we didn't cache the element, D would be a 
laughingstock of performance-minded tests.



It's way too error-prone. We now have
byLineCopy to combat that, but of course, byLine is the more obvious
function and thus more likely to be used (plus it's been around longer), so
a _lot_ of code is going to end up using it, and a good chunk of that code
really should be using byLineCopy.


There's nothing actually wrong with using byLine, and copying on demand. 
Why such a negative connotation?



I'm of the opinion that if you want a transient front, you should just use
opApply and skip ranges entirely.


So you want to make this code invalid? Why?

foreach(i; map!(a => a.to!int)(stdin.byLine))
{
   // process each integer
   ...
}

You want to make me copy each line to a heap-allocated string so I can 
parse it?!!



Allowing for front to be transient -
whether you can check for it or not - simply is not worth the extra
complications. I'd love it if we deprecated byLine's range functions, and
made it use opApply instead and just declare transient ranges to be
completely unsupported. If you want to write your code to have a transient
front, you can obviously take that risk, but you're on your own.


There is no way to disallow front from being transient. In fact, it 
should be assumed that it is the default unless it's wholly a value-type.


-Steve

Re: Transient ranges

2016-05-29 Thread Steven Schveighoffer via Digitalmars-d


On 5/29/16 7:28 AM, ZombineDev wrote:

On Sunday, 29 May 2016 at 11:15:19 UTC, Dicebot wrote:

On 05/28/2016 08:27 PM, Joseph Rushton Wakeling wrote:

On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis wrote:

On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:

So what about the convention to explicitely declare a `.transient`
enum member on a range, if the front element value can change?


Honestly, I don't think that supporting transient ranges is worth it.


I have personally wondered if there was a case for a TransientRange
concept where the only primitives defined are `empty` and `front`.

`popFront()` is not defined because the whole point is that every
single call to `front` will produce a different value.


I would prefer such ranges to not have `front` and return new item
from `popFront` instead but yes, I would much prefer it to existing
form, transient or not. It is impossible to correctly define input
range without caching front which may not be always possible and may
have negative performance impact. Because of that, a lot of Phobos
ranges compromise `front` consistency in favor of speed up - which
only seems to work because most algorithms need to access `front` once.

I believe this is biggest issue in D ranges design right now, by large
margin.


+1
I think making popFront return a value for transient ranges is a sound
idea. It would allow to easily distinguish between InputRange and
TransientRange with very simple CT introspection. The biggest blocker is
to teach the compiler to recognize TransientRange types in foreach.


This doesn't help at all. I can still make a "transient" range with all 
three range primitives.


There seems to be a misunderstanding about what a transient range is.

byLine is a transient range that requires the front element be cacheable 
(I have to build the line somewhere, and reusing that buffer provides 
performance). Shoehorning into a popFront-only style "range" does not 
solve the problem. Not only that, but now I would have to cache BOTH the 
front element and the next one.


-Steve

Re: Transient ranges

2016-05-29 Thread Steven Schveighoffer via Digitalmars-d


On 5/29/16 7:15 AM, Dicebot wrote:

On 05/28/2016 08:27 PM, Joseph Rushton Wakeling wrote:

On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis wrote:

On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:

So what about the convention to explicitely declare a `.transient`
enum member on a range, if the front element value can change?


Honestly, I don't think that supporting transient ranges is worth it.


I have personally wondered if there was a case for a TransientRange
concept where the only primitives defined are `empty` and `front`.

`popFront()` is not defined because the whole point is that every single
call to `front` will produce a different value.


I would prefer such ranges to not have `front` and return new item from
`popFront` instead but yes, I would much prefer it to existing form,
transient or not. It is impossible to correctly define input range
without caching front which may not be always possible and may have
negative performance impact. Because of that, a lot of Phobos ranges
compromise `front` consistency in favor of speed up - which only seems
to work because most algorithms need to access `front` once.


What problems are solvable only by not caching the front element? I 
can't think of any.


And there is no way to define "transient" ranges in a way other than 
explicitly declaring they are transient. There isn't anything inherent 
or introspectable about such ranges.


-Steve

Re: Transient ranges

2016-05-29 Thread ZombineDev via Digitalmars-d

On Sunday, 29 May 2016 at 15:45:14 UTC, Joseph Rushton Wakeling 
wrote:

On Sunday, 29 May 2016 at 11:28:11 UTC, ZombineDev wrote:

On Sunday, 29 May 2016 at 11:15:19 UTC, Dicebot wrote:
I would prefer such ranges to not have `front` and return new 
item from `popFront` instead but yes, I would much prefer it 
to existing form, transient or not. It is impossible to 
correctly define input range without caching front which may 
not be always possible and may have negative performance 
impact. Because of that, a lot of Phobos ranges compromise 
`front` consistency in favor of speed up - which only seems 
to work because most algorithms need to access `front` once.


I believe this is biggest issue in D ranges design right now, 
by large margin.


+1
I think making popFront return a value for transient ranges is 
a sound idea. It would allow to easily distinguish between 
InputRange and TransientRange with very simple CT 
introspection. The biggest blocker is to teach the compiler to 
recognize TransientRange types in foreach.


I don't follow your reasoning here.  In the proposal I put 
forward, if a range doesn't define `popFront()`, it's not an 
InputRange, it's a TransientRange.


Conversely, if it _does_ define `popFront()`, it _is_ an 
InputRange.


What's the problem with introspecting that?


Yes it can be introspected in library, but it breaks the 
expectations of the unsuspecting user. For example:


auto r = getSomeRange();

assert (r.front == r.front);

So, under your suggestion, the above could fail just because an 
implementation detail was changed like modifying some input range 
to become a transient range and somehow the code still compiled.


Now I'm leaning more towards the buffer + popFront + empty trio, 
because it's harder to misuse.


An alternative would be tagging such ranges with an enum or UDA, 
but we would need to verify/change too many places in phobos for 
this to work.

Re: Transient ranges

2016-05-29 Thread Seb via Digitalmars-d

On Sunday, 29 May 2016 at 15:45:14 UTC, Joseph Rushton Wakeling 
wrote:

On Sunday, 29 May 2016 at 11:28:11 UTC, ZombineDev wrote:

On Sunday, 29 May 2016 at 11:15:19 UTC, Dicebot wrote:
I would prefer such ranges to not have `front` and return new 
item from `popFront` instead but yes, I would much prefer it 
to existing form, transient or not. It is impossible to 
correctly define input range without caching front which may 
not be always possible and may have negative performance 
impact. Because of that, a lot of Phobos ranges compromise 
`front` consistency in favor of speed up - which only seems 
to work because most algorithms need to access `front` once.


I believe this is biggest issue in D ranges design right now, 
by large margin.


+1
I think making popFront return a value for transient ranges is 
a sound idea. It would allow to easily distinguish between 
InputRange and TransientRange with very simple CT 
introspection. The biggest blocker is to teach the compiler to 
recognize TransientRange types in foreach.


I don't follow your reasoning here.  In the proposal I put 
forward, if a range doesn't define `popFront()`, it's not an 
InputRange, it's a TransientRange.


Conversely, if it _does_ define `popFront()`, it _is_ an 
InputRange.


What's the problem with introspecting that?


Nothing. Just that it could lead to a lot of surprising mistakes, 
currently the following yields an error:


struct A
{
   auto front(){ ...}
   enum empty = false;
}
A as;
foreach (a; as) {} // ERROR: no method popFront found

with the proposed change it would compile.

Re: Transient ranges

2016-05-29 Thread Stefan Koch via Digitalmars-d

On Sunday, 29 May 2016 at 15:45:14 UTC, Joseph Rushton Wakeling 
wrote:



What's the problem with introspecting that?


There is none :)
it could be implemented today.

Re: Transient ranges

2016-05-29 Thread Joseph Rushton Wakeling via Digitalmars-d


On Sunday, 29 May 2016 at 11:28:11 UTC, ZombineDev wrote:

On Sunday, 29 May 2016 at 11:15:19 UTC, Dicebot wrote:
I would prefer such ranges to not have `front` and return new 
item from `popFront` instead but yes, I would much prefer it 
to existing form, transient or not. It is impossible to 
correctly define input range without caching front which may 
not be always possible and may have negative performance 
impact. Because of that, a lot of Phobos ranges compromise 
`front` consistency in favor of speed up - which only seems to 
work because most algorithms need to access `front` once.


I believe this is biggest issue in D ranges design right now, 
by large margin.


+1
I think making popFront return a value for transient ranges is 
a sound idea. It would allow to easily distinguish between 
InputRange and TransientRange with very simple CT 
introspection. The biggest blocker is to teach the compiler to 
recognize TransientRange types in foreach.


I don't follow your reasoning here.  In the proposal I put 
forward, if a range doesn't define `popFront()`, it's not an 
InputRange, it's a TransientRange.


Conversely, if it _does_ define `popFront()`, it _is_ an 
InputRange.


What's the problem with introspecting that?

Re: Transient ranges

2016-05-29 Thread Seb via Digitalmars-d


On Sunday, 29 May 2016 at 11:28:11 UTC, ZombineDev wrote:

On Sunday, 29 May 2016 at 11:15:19 UTC, Dicebot wrote:

On 05/28/2016 08:27 PM, Joseph Rushton Wakeling wrote:
On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis 
wrote:

On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:
So what about the convention to explicitely declare a 
`.transient` enum member on a range, if the front element 
value can change?


Honestly, I don't think that supporting transient ranges is 
worth it.


I have personally wondered if there was a case for a 
TransientRange concept where the only primitives defined are 
`empty` and `front`.


`popFront()` is not defined because the whole point is that 
every single call to `front` will produce a different value.


I would prefer such ranges to not have `front` and return new 
item from `popFront` instead but yes, I would much prefer it 
to existing form, transient or not. It is impossible to 
correctly define input range without caching front which may 
not be always possible and may have negative performance 
impact. Because of that, a lot of Phobos ranges compromise 
`front` consistency in favor of speed up - which only seems to 
work because most algorithms need to access `front` once.


I believe this is biggest issue in D ranges design right now, 
by large margin.


+1
I think making popFront return a value for transient ranges is 
a sound idea. It would allow to easily distinguish between 
InputRange and TransientRange with very simple CT 
introspection. The biggest blocker is to teach the compiler to 
recognize TransientRange types in foreach.


I don't think that should be a huge problem, but after having 
looked at the compiler code [1]: we should name it neither front 
nor popFront, because how would the compiler know that it is 
supposed to be transient and not a normal InputRange without 
front or popFront for which it should throw an error?


Idea 1: New name that will make it easier to distinguish that 
transient ranges are something completly different to normal 
ranges.

How about next?
Problem 1: One can't use algorithms that work on transient ranges 
(map, reduce) anymore


Idea 2: Help the compiler with @Transient or `enum transient = 
true`
Problem 2: How would the "transientivity" be automatically 
forwarded to ranges that work on it.


Btw thinking longer about it - transient ranges aren't bad per 
se. They objey the InputRange contract and e.g. the following 
works just fine. It's just impossible to distinguish between a 
transient and a non-transient InputRange.


```
// input: 1\n2\n\3\n4\n...
void main()
{
import std.stdio, std.conv, std.algorithm;
stdin
.byLine
.map!((a) => a.to!int)
.sum
.writeln;
}
```

https://github.com/dlang/dmd/blob/master/src/statement.d#L2596

Re: Transient ranges

2016-05-29 Thread default0 via Digitalmars-d


On Sunday, 29 May 2016 at 11:36:37 UTC, ZombineDev wrote:

On Sunday, 29 May 2016 at 11:28:11 UTC, ZombineDev wrote:

On Sunday, 29 May 2016 at 11:15:19 UTC, Dicebot wrote:

On 05/28/2016 08:27 PM, Joseph Rushton Wakeling wrote:
On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis 
wrote:
On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d 
wrote:
So what about the convention to explicitely declare a 
`.transient` enum member on a range, if the front element 
value can change?


Honestly, I don't think that supporting transient ranges is 
worth it.


I have personally wondered if there was a case for a 
TransientRange concept where the only primitives defined are 
`empty` and `front`.


`popFront()` is not defined because the whole point is that 
every single call to `front` will produce a different value.


I would prefer such ranges to not have `front` and return new 
item from `popFront` instead but yes, I would much prefer it 
to existing form, transient or not. It is impossible to 
correctly define input range without caching front which may 
not be always possible and may have negative performance 
impact. Because of that, a lot of Phobos ranges compromise 
`front` consistency in favor of speed up - which only seems 
to work because most algorithms need to access `front` once.


I believe this is biggest issue in D ranges design right now, 
by large margin.


+1
I think making popFront return a value for transient ranges is 
a sound idea. It would allow to easily distinguish between 
InputRange and TransientRange with very simple CT 
introspection. The biggest blocker is to teach the compiler to 
recognize TransientRange types in foreach.


Scratch that:

Another option is to make popFront return a new range, ala 
slice[1..$] (like std.range.dropOne) which would have the 
benefit of allowing const/immutable ranges to work.


This won't work safely, because the compiler would need to 
disallow access to the previous instance of the range (sort of 
Rust moved-from objects), but it's currently no possible.


I proposed that idea because I have other uses for immutable 
ranges, unrelated to this discussion.


Isnt the idea to communicate "hey! You may not possibly cache 
whatever value you get in an iteration"? If yes, just removing 
.front seems odd as, well, I can still obviously just cache the 
result of .popFront if I'm inexperienced with ranges. Maybe 
instead simply rename .front to .buffer for transient ranges? 
That name would more accurately convey that this is just a buffer 
the range uses to dump the current front value in, but that if 
you want to cache it you will need to duplicate it (because, hey, 
this is the buffer of the range, not yours).


Not sure this is a great idea or a great name, but simply 
removing .front and returning the current iteration value from 
.popFront imho does not convey "you should not expect to be able 
to simply cache this by assignment" to me - arguably this is not 
a problem since how transient ranges work would be a documented 
thing, but thats just my 2 cents.

Re: Transient ranges

2016-05-29 Thread ZombineDev via Digitalmars-d


On Sunday, 29 May 2016 at 11:28:11 UTC, ZombineDev wrote:

On Sunday, 29 May 2016 at 11:15:19 UTC, Dicebot wrote:

On 05/28/2016 08:27 PM, Joseph Rushton Wakeling wrote:
On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis 
wrote:

On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:
So what about the convention to explicitely declare a 
`.transient` enum member on a range, if the front element 
value can change?


Honestly, I don't think that supporting transient ranges is 
worth it.


I have personally wondered if there was a case for a 
TransientRange concept where the only primitives defined are 
`empty` and `front`.


`popFront()` is not defined because the whole point is that 
every single call to `front` will produce a different value.


I would prefer such ranges to not have `front` and return new 
item from `popFront` instead but yes, I would much prefer it 
to existing form, transient or not. It is impossible to 
correctly define input range without caching front which may 
not be always possible and may have negative performance 
impact. Because of that, a lot of Phobos ranges compromise 
`front` consistency in favor of speed up - which only seems to 
work because most algorithms need to access `front` once.


I believe this is biggest issue in D ranges design right now, 
by large margin.


+1
I think making popFront return a value for transient ranges is 
a sound idea. It would allow to easily distinguish between 
InputRange and TransientRange with very simple CT 
introspection. The biggest blocker is to teach the compiler to 
recognize TransientRange types in foreach.


Scratch that:

Another option is to make popFront return a new range, ala 
slice[1..$] (like std.range.dropOne) which would have the 
benefit of allowing const/immutable ranges to work.


This won't work safely, because the compiler would need to 
disallow access to the previous instance of the range (sort of 
Rust moved-from objects), but it's currently no possible.


I proposed that idea because I have other uses for immutable 
ranges, unrelated to this discussion.

Re: Transient ranges

2016-05-29 Thread ZombineDev via Digitalmars-d


On Sunday, 29 May 2016 at 11:15:19 UTC, Dicebot wrote:

On 05/28/2016 08:27 PM, Joseph Rushton Wakeling wrote:
On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis 
wrote:

On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:
So what about the convention to explicitely declare a 
`.transient` enum member on a range, if the front element 
value can change?


Honestly, I don't think that supporting transient ranges is 
worth it.


I have personally wondered if there was a case for a 
TransientRange concept where the only primitives defined are 
`empty` and `front`.


`popFront()` is not defined because the whole point is that 
every single call to `front` will produce a different value.


I would prefer such ranges to not have `front` and return new 
item from `popFront` instead but yes, I would much prefer it to 
existing form, transient or not. It is impossible to correctly 
define input range without caching front which may not be 
always possible and may have negative performance impact. 
Because of that, a lot of Phobos ranges compromise `front` 
consistency in favor of speed up - which only seems to work 
because most algorithms need to access `front` once.


I believe this is biggest issue in D ranges design right now, 
by large margin.


+1
I think making popFront return a value for transient ranges is a 
sound idea. It would allow to easily distinguish between 
InputRange and TransientRange with very simple CT introspection. 
The biggest blocker is to teach the compiler to recognize 
TransientRange types in foreach.


Another option is to make popFront return a new range, ala 
slice[1..$] (like std.range.dropOne) which would have the benefit 
of allowing const/immutable ranges to work.

Re: Transient ranges

2016-05-29 Thread Dicebot via Digitalmars-d

On 05/28/2016 08:27 PM, Joseph Rushton Wakeling wrote:
> On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis wrote:
>> On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:
>>> So what about the convention to explicitely declare a `.transient`
>>> enum member on a range, if the front element value can change?
>>
>> Honestly, I don't think that supporting transient ranges is worth it.
> 
> I have personally wondered if there was a case for a TransientRange
> concept where the only primitives defined are `empty` and `front`.
> 
> `popFront()` is not defined because the whole point is that every single
> call to `front` will produce a different value.

I would prefer such ranges to not have `front` and return new item from
`popFront` instead but yes, I would much prefer it to existing form,
transient or not. It is impossible to correctly define input range
without caching front which may not be always possible and may have
negative performance impact. Because of that, a lot of Phobos ranges
compromise `front` consistency in favor of speed up - which only seems
to work because most algorithms need to access `front` once.

I believe this is biggest issue in D ranges design right now, by large
margin.

Re: Transient ranges

2016-05-28 Thread Joseph Rushton Wakeling via Digitalmars-d


On Saturday, 28 May 2016 at 21:32:15 UTC, Brad Roberts wrote:
On 5/28/2016 10:27 AM, Joseph Rushton Wakeling via 
Digitalmars-d wrote:
On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis 
wrote:

On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:
So what about the convention to explicitely declare a 
`.transient`
enum member on a range, if the front element value can 
change?


Honestly, I don't think that supporting transient ranges is 
worth it.


I have personally wondered if there was a case for a 
TransientRange
concept where the only primitives defined are `empty` and 
`front`.


`popFront()` is not defined because the whole point is that 
every single

call to `front` will produce a different value.


Um. that's wrong, just look at byLine as evidence.  It is 
popFront that changes the value to the 'next' one in the 
iteration.  Front called twice in a row does NOT change the 
value, and shouldn't.  I think you're misunderstanding 
something.


I'm not talking about how byLine currently behaves.  I'm talking 
about a meaningful API that could be used for a range that (by 
design) has a transient front, that would be unambiguous in its 
behaviour.

Re: Transient ranges

2016-05-28 Thread Brad Roberts via Digitalmars-d


On 5/28/2016 10:27 AM, Joseph Rushton Wakeling via Digitalmars-d wrote:

On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis wrote:

On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:

So what about the convention to explicitely declare a `.transient`
enum member on a range, if the front element value can change?


Honestly, I don't think that supporting transient ranges is worth it.


I have personally wondered if there was a case for a TransientRange
concept where the only primitives defined are `empty` and `front`.

`popFront()` is not defined because the whole point is that every single
call to `front` will produce a different value.


Um. that's wrong, just look at byLine as evidence.  It is popFront that 
changes the value to the 'next' one in the iteration.  Front called 
twice in a row does NOT change the value, and shouldn't.  I think you're 
misunderstanding something.

Re: Transient ranges

2016-05-28 Thread Seb via Digitalmars-d


On Saturday, 28 May 2016 at 19:09:09 UTC, Stefan Koch wrote:
On Saturday, 28 May 2016 at 17:27:17 UTC, Joseph Rushton 
Wakeling wrote:
On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis 
wrote:

On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:
So what about the convention to explicitely declare a 
`.transient` enum member on a range, if the front element 
value can change?


Honestly, I don't think that supporting transient ranges is 
worth it.


I have personally wondered if there was a case for a 
TransientRange concept where the only primitives defined are 
`empty` and `front`.


`popFront()` is not defined because the whole point is that 
every single call to `front` will produce a different value.


That is a rather sound idea.


I like this idea too!
`opApply` implies passing in a delegate which (if I am not 
mistaken) implies the use of GC at the moment and isn't as easy 
to optimize as `while (!file.empty) { writeln(file.front) }`

Re: Transient ranges

2016-05-28 Thread Stefan Koch via Digitalmars-d

On Saturday, 28 May 2016 at 17:27:17 UTC, Joseph Rushton Wakeling 
wrote:
On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis 
wrote:

On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:
So what about the convention to explicitely declare a 
`.transient` enum member on a range, if the front element 
value can change?


Honestly, I don't think that supporting transient ranges is 
worth it.


I have personally wondered if there was a case for a 
TransientRange concept where the only primitives defined are 
`empty` and `front`.


`popFront()` is not defined because the whole point is that 
every single call to `front` will produce a different value.


That is a rather sound idea.

Re: Transient ranges

2016-05-28 Thread Joseph Rushton Wakeling via Digitalmars-d


On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis wrote:

On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:
So what about the convention to explicitely declare a 
`.transient` enum member on a range, if the front element 
value can change?


Honestly, I don't think that supporting transient ranges is 
worth it.


I have personally wondered if there was a case for a 
TransientRange concept where the only primitives defined are 
`empty` and `front`.


`popFront()` is not defined because the whole point is that every 
single call to `front` will produce a different value.

Re: Transient ranges

2016-05-28 Thread cym13 via Digitalmars-d


On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis wrote:

On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:
So what about the convention to explicitely declare a 
`.transient` enum member on a range, if the front element 
value can change?


Honestly, I don't think that supporting transient ranges is 
worth it. Every single range-based function would have to 
either test that the "transient" enum wasn't there or take 
transient ranges into account, and realistically, that isn't 
going to happen. For better or worse, we do have byLine in 
std.stdio, which has a transient front, but aside from the 
performance benefits, it's been a disaster. It's way too 
error-prone. We now have byLineCopy to combat that, but of 
course, byLine is the more obvious function and thus more 
likely to be used (plus it's been around longer), so a _lot_ of 
code is going to end up using it, and a good chunk of that code 
really should be using byLineCopy.


I'm of the opinion that if you want a transient front, you 
should just use opApply and skip ranges entirely. Allowing for 
front to be transient - whether you can check for it or not - 
simply is not worth the extra complications. I'd love it if we 
deprecated byLine's range functions, and made it use opApply 
instead and just declare transient ranges to be completely 
unsupported. If you want to write your code to have a transient 
front, you can obviously take that risk, but you're on your own.


- Jonathan M Davis


Agreed, any time I see « you're code is wrong, byLine reuses its 
buffer you should use byLineCopy instead » I cringe as it means 
that people are supposed to know implementation details. It's 
obviously a leaky abstraction. While its speed makes it useful 
nonetheless I don't think such code should be encouraged.

Re: Transient ranges

2016-05-27 Thread Jonathan M Davis via Digitalmars-d

On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:
> So what about the convention to explicitely declare a
> `.transient` enum member on a range, if the front element value
> can change?

Honestly, I don't think that supporting transient ranges is worth it. Every
single range-based function would have to either test that the "transient"
enum wasn't there or take transient ranges into account, and realistically,
that isn't going to happen. For better or worse, we do have byLine in
std.stdio, which has a transient front, but aside from the performance
benefits, it's been a disaster. It's way too error-prone. We now have
byLineCopy to combat that, but of course, byLine is the more obvious
function and thus more likely to be used (plus it's been around longer), so
a _lot_ of code is going to end up using it, and a good chunk of that code
really should be using byLineCopy.

I'm of the opinion that if you want a transient front, you should just use
opApply and skip ranges entirely. Allowing for front to be transient -
whether you can check for it or not - simply is not worth the extra
complications. I'd love it if we deprecated byLine's range functions, and
made it use opApply instead and just declare transient ranges to be
completely unsupported. If you want to write your code to have a transient
front, you can obviously take that risk, but you're on your own.

- Jonathan M Davis

Transient ranges

2016-05-27 Thread Seb via Digitalmars-d

A couple of weeks ago on the next/shift convenience wrapper 
discussion [1], there was a nice discussion about transient 
ranges. It didn't come to a conclusion, so let's revie it. I just 
cite the best three quotes from the thread as a summary:


jmdavis:

The reality of the matter is that ranges with a transient front 
tend to break if you do much more than use foreach on them. 
They don't even work with std.array.array


schveiguy:

There is an implicit expectation that if you assign the front 
of a range to something, it's going to last forever, because 
many ranges do actually behave that way. But the range contract 
does not guarantee that, it just happens to work.


quickfur:

isTransient can only be implemented if the range implementor 
defines its value. There is no way to know whether a range is 
transient or not just by looking at its type and the signatures 
of front, popFront, etc.. Having said that, though, since 
transient ranges are generally the exception rather than the 
norm, we could adopt a convention that transient ranges declare 
themselves thus by defining a .transient enum member or 
something like that, and have isTransient return false if a 
range doesn't have such a member.


On a related note, there are a good number of Phobos algorithms 
that do not work  on transient ranges at all, because 
internally they (blindly) assume that assigning the value of 
.front to a local variable will retain its original value after 
popFront is called. I've fixed a couple of places where this 
happens, but I'm almost certain many other places haven't been 
fixed yet, and in some cases, cannot be fixed without 
restricting the algorithms to forward ranges only rather than 
input ranges. Some things simply cannot be implemented without 
copying .front somehow, and the only way to do this reliably 
with the current API is to require forward ranges and use save 
to keep track of the previous position of the range.


So what about the convention to explicitely declare a 
`.transient` enum member on a range, if the front element value 
can change?


[1] https://github.com/dlang/phobos/pull/4010

Re: One way to deal with transient ranges char[] buffers

2013-08-02 Thread Jakob Ovrum


On Friday, 2 August 2013 at 05:35:28 UTC, H. S. Teoh wrote:
Recently, I discovered an interesting idiom for dealing with 
transient
ranges, esp. w.r.t. strings / char[]. Many places in D require 
string,
but sometimes what you have is char[] which can't be converted 
to string
except by .idup. But you don't want to .idup in generic code, 
because if

the input's already a string, then it's wasteful duplication.


Places in D that require `string` either do so because they need 
the immutable guarantee or they do so out of error (e.g. should 
have used a string of const characters instead). The latter can 
of course be worked around, but the only *solution* involves 
fixing the upstream code, so I'll assume we're discussing the 
former case.


We don't have any generic mechanism for deep copying ranges. The 
`save` primitive is often implemented by means of copying, but 
conceptually is doing something very different, so it cannot be 
applied here. So, I don't see how your idea translates to ranges 
in general (not completely sure if it was intended to).


Thus, let's tackle the case of arrays/slices in particular, of 
which strings are the most common example. There is a precedent 
in D to push to the decision to copy an array upwards in the 
code. When the operations at hand require the immutable 
guarantee, state it in the interface of the code, such as by 
asking for `string` on a function's parameter. That's why so many 
functions take `string` when they need the immutable guarantee, 
as opposed to `const(char)[]` or a template parameter, followed 
by a GC copy operation. This way, copies are not only minimized, 
but centralized more in user code where they are more visible, 
and the method of making the copy - remember, not all client code 
is fine with rampant GC use - is also pushed up. Also, copies are 
one thing, but what if the caller had a string but in a different 
encoding? Not only does an allocation have to be made, but 
decoding and encoding is also necessary; the details of how to 
handle this are also pushed up, with the same benefits. It's a 
pretty mainstream idiom and is often reiterated by members of the 
community, such as in Ali's talk at dconf.


Your proposed solution only shares one benefit with the solution 
described above - that if the direct caller had a `string` 
already (or a range of `string`s), nothing has to be done. It 
forfeits all the other benefits for convenience. It also has 
problems with template bloat, which can be fixed but at a 
syntactical cost.


Overall I think it reduces the genericity of algorithms by trying 
to handle input types it doesn't actually support, which can be a 
big problem for performance-critical code.

Re: One way to deal with transient ranges char[] buffers

2013-08-02 Thread monarch_dodra


On Friday, 2 August 2013 at 05:35:28 UTC, H. S. Teoh wrote:

void func(S)(S input) if (isSomeString!S) {
string x = to!string(input);
... // use at will
}


+1. I saw this used recently, and I find it very clever.

Re: One way to deal with transient ranges char[] buffers

2013-08-02 Thread monarch_dodra


On Friday, 2 August 2013 at 07:50:28 UTC, Jakob Ovrum wrote:
Places in D that require `string` either do so because they 
need the immutable guarantee or they do so out of error (e.g. 
should have used a string of const characters instead). The 
latter can of course be worked around, but the only *solution* 
involves fixing the upstream code, so I'll assume we're 
discussing the former case.


One of the problems is often the return type. For example, readln 
will return a string, because it is simply more convenient in end 
user code. The standard string format in D is string, so 
that's what it returns.


However, if you call readln!(char[]), then you'll the *exact* 
same string, but of a non-immutable type.


This means that you *can* get the best out of both worlds, at no 
extra run-time cost. I think more functions should do this.


Without doing this, you face the eternal problem: Should I return 
a string, to give my end user more guarantees, when in fact my 
char array is perfectly mutable, or should I return a char[], 
forcing my end user to make an idup(or an unsafe cast) if he 
actually needed a string?


It's a tough problem to tackle.

Re: One way to deal with transient ranges char[] buffers

2013-08-02 Thread Tobias Pankrath


On Friday, 2 August 2013 at 08:46:18 UTC, monarch_dodra wrote:


Without doing this, you face the eternal problem: Should I 
return a string, to give my end user more guarantees, when in 
fact my char array is perfectly mutable, or should I return a 
char[], forcing my end user to make an idup(or an unsafe cast) 
if he actually needed a string?


It's a tough problem to tackle.


The solution has been posted to this newsgroup already in form of 
unique/unaliased types. Thats THE selling point for them.


If you have an array of char and it could be char[] as well as 
string, than let the type system infer the right one for you.

Re: One way to deal with transient ranges char[] buffers

2013-08-02 Thread monarch_dodra


On Friday, 2 August 2013 at 11:19:05 UTC, Tobias Pankrath wrote:

On Friday, 2 August 2013 at 08:46:18 UTC, monarch_dodra wrote:


Without doing this, you face the eternal problem: Should I 
return a string, to give my end user more guarantees, when 
in fact my char array is perfectly mutable, or should I return 
a char[], forcing my end user to make an idup(or an unsafe 
cast) if he actually needed a string?


It's a tough problem to tackle.


The solution has been posted to this newsgroup already in form 
of unique/unaliased types. Thats THE selling point for them.


If you have an array of char and it could be char[] as well as 
string, than let the type system infer the right one for you.


Interesting. Thanks for the tip. I'll look these up.

Re: One way to deal with transient ranges char[] buffers

2013-08-02 Thread Tobias Pankrath


On Friday, 2 August 2013 at 11:27:48 UTC, monarch_dodra wrote:

On Friday, 2 August 2013 at 11:19:05 UTC, Tobias Pankrath wrote:

On Friday, 2 August 2013 at 08:46:18 UTC, monarch_dodra wrote:


Without doing this, you face the eternal problem: Should I 
return a string, to give my end user more guarantees, when 
in fact my char array is perfectly mutable, or should I 
return a char[], forcing my end user to make an idup(or an 
unsafe cast) if he actually needed a string?


It's a tough problem to tackle.


The solution has been posted to this newsgroup already in form 
of unique/unaliased types. Thats THE selling point for them.


If you have an array of char and it could be char[] as well as 
string, than let the type system infer the right one for you.


Interesting. Thanks for the tip. I'll look these up.


http://www.digitalmars.com/d/archives/digitalmars/D/Immutable_and_unique_in_C_180572.html

Re: One way to deal with transient ranges char[] buffers

2013-08-02 Thread Jakob Ovrum


On Friday, 2 August 2013 at 08:37:50 UTC, monarch_dodra wrote:

On Friday, 2 August 2013 at 05:35:28 UTC, H. S. Teoh wrote:

void func(S)(S input) if (isSomeString!S) {
string x = to!string(input);
... // use at will
}


+1. I saw this used recently, and I find it very clever.


It's really not. It takes the decision to make a GC-allocated 
copy away from the client code in return for some minor 
convenience.

One way to deal with transient ranges char[] buffers

2013-08-01 Thread H. S. Teoh

Recently, I discovered an interesting idiom for dealing with transient
ranges, esp. w.r.t. strings / char[]. Many places in D require string,
but sometimes what you have is char[] which can't be converted to string
except by .idup. But you don't want to .idup in generic code, because if
the input's already a string, then it's wasteful duplication. The
solution is to do this:

void func(S)(S input) if (isSomeString!S) {
string x = to!string(input);
... // use at will
}

The nice thing about this is that if input is already a string,
to!string does nothing, and if it's char[] or const(char)[], to!()
handles calling .idup for you so you don't have to pollute generic code
with it.

When writing a generic function that takes a range of some string, and
you're not sure if the range is transient or not, the same trick helps
ensure that you don't run into transience-related problems:

auto func(R)(R range) if (isSomeString!(ElementType!S)) {
struct WrapperRange {
...
auto front() {
// This ensures we don't run into
// transience related problems, and that
// the range we return will *not* be
// transient.
return range.front.to!string();
}
}
return WrapperRange(...);
}


T

-- 
Freedom: (n.) Man's self-given right to be enslaved by his own depravity.

78 matches

Mail list logo