
Re: Range of n lines from stdin

2013-12-28 Thread Jakob Ovrum


On Friday, 27 December 2013 at 20:30:52 UTC, Ivan Kazmenko wrote:
Hmm?..  From my experience, attempting to use a range in a 
wrong way usually results in a compilation error.  For example, 
I can't do

n.iota.map!(_ => readln).sort()
since MapResult isn't a random access range with swappable 
elements. I can instead do

n.iota.map!(_ => readln).array().sort()
and it allocates an array and works as expected.  So, how do I 
misuse that range?


Yes, the idea is that ranges only present interfaces that make 
sense, so cases of misuse will result in a compilation error. 
However, hacks like using map with functions that ignore their 
argument(s) throw that out of the window: `r` in `auto r = 
n.iota.map!(_ => readln);` claims to support forward, 
bidirectional and random access (all read-only, as the argument 
function returns by value) as well as slicing, but none of these 
make any sense; all access primitives do exactly the same thing, 
with the result being different every time. Even the simplest 
invariants fail, such as `r.front == r.front`, and `popFront`, 
`popBack` and slicing only have a binary effect: whether or not 
the range is empty yet.
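
A minimal sketch of the broken invariant, assuming a hypothetical counter function `nextValue` as a stand-in for `readln` so that no input is needed. It is only an illustration of the point above, not code from the original post:

```d
import std.algorithm : map;
import std.range : iota;

// Hypothetical stand-in for readln: returns a different value on every call.
int calls;
int nextValue() { return ++calls; }

void main()
{
    int n = 3;
    auto r = n.iota.map!(_ => nextValue());

    // The range advertises a length and random access, inherited from iota ...
    assert(r.length == 3);

    // ... but every access primitive just calls nextValue again,
    // so even the simplest invariants do not hold.
    assert(r.front != r.front);
    assert(r[0] != r[0]);
}
```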


Hmm.  For example, that could be a RNG emitting (a range of) 
random numbers, then empty is always false.  But we still 
want a new random number each time.  Something like

n.iota.map!(_ => uniform(0, 10))


That would only provide `n` random numbers, not an infinite 
number.


All the random number generator types in `std.random` are 
infinite forward ranges of random numbers, which is completely 
fine. For any PRNG `r`, `r.front == r.front` is true, and remains 
the same number until `r.popFront()`, it correctly has no length 
and is always non-empty (infinite range), and `r.save` works 
correctly etc.
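
As a rough illustration of these properties, here is a small sketch using `std.random.Random`. The seed 42 is an arbitrary choice for the example:

```d
import std.algorithm : equal;
import std.random : Random;
import std.range : isForwardRange, isInfinite, take;

void main()
{
    // The default PRNG type is an infinite forward range.
    static assert(isForwardRange!Random);
    static assert(isInfinite!Random);

    auto rng = Random(42); // arbitrary fixed seed for the example

    // front is stable until popFront is called.
    assert(rng.front == rng.front);

    // save yields an independent copy that produces the same sequence.
    auto saved = rng.save;
    assert(equal(rng.take(5), saved.take(5)));

    rng.popFront();
    saved.popFront();
    assert(rng.front == saved.front);
}
```

To get `n` random numbers, `rng.take(n)` composes with such a range without breaking any range invariants.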



So, something like
n.iota.map !(_ => readln).writeln;
is bad style but
writeln (n.iota.map !(_ => readln));
better shows what's the main action?  Makes sense for me.


No, it has nothing to do with syntax. The two examples are 
completely equivalent, and the only problem is that it breaks the 
invariant that the result of map's transformation function should 
be derived from the arguments it was given. The fact that the 
transformation function is impure is not in itself a problem: 
pure functions can also ignore arguments, and impure functions 
can return consistent results while still being necessarily 
impure.


Perhaps there's a wholly different way of thinking about this 
in which the first definition makes much more sense than the 
second one from the start.  If so, please share it.


All you have to do is look at the signature of the function, 
which is the primary part of its documentation:


Repeat!T repeat(T)(T value);

It takes one value of any type T, not a function pointer or 
delegate that returns T. Even if you give it a function pointer 
or delegate (which your example does not), it will simply repeat 
that function pointer or delegate, never calling it.


As I already explained, `readln.repeat(n)` is just a different 
way of writing `readln().repeat(n)` which in turn is also 
equivalent to `repeat(readln(), n)`. This should make it 
perfectly clear what it does - `readln` is called and its return 
value is passed to `repeat`. Barring one relatively obscure 
exception[1], this is the only way to interpret the expression 
regardless of the signature of the function, as a consequence of 
basic language rules common to the entire C family of 
programming languages.
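
The evaluation order can be checked with a small sketch. The counter function `nextValue` is a hypothetical stand-in for `readln`:

```d
import std.algorithm : equal;
import std.range : repeat;

// Hypothetical stand-in for readln: counts how often it is called.
int calls;
int nextValue() { return ++calls; }

void main()
{
    // Same as repeat(nextValue(), 3): the argument is evaluated once,
    // before repeat is entered.
    auto r = nextValue.repeat(3);
    assert(calls == 1);

    // The range yields that one value three times ...
    assert(r.equal([1, 1, 1]));

    // ... without ever calling the function again.
    assert(calls == 1);
}
```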


[1] ... in D we have something (slightly controversial) called 
the `lazy` parameter storage class, but when used, it is clearly 
visible in the signature of the function. 
http://dlang.org/function.html#parameters

Re: Range of n lines from stdin

2013-12-28 Thread Ivan Kazmenko


Many thanks to Marco, Ali and Jakob for the answers!

On Saturday, 28 December 2013 at 08:56:53 UTC, Jakob Ovrum wrote:
On Friday, 27 December 2013 at 20:30:52 UTC, Ivan Kazmenko 
wrote:
Hmm?..  From my experience, attempting to use a range in a 
wrong way usually results in a compilation error.  For 
example, I can't do

n.iota.map!(_ => readln).sort()
since MapResult isn't a random access range with swappable 
elements. I can instead do

n.iota.map!(_ => readln).array().sort()
and it allocates an array and works as expected.  So, how do I 
misuse that range?


Yes, the idea is that ranges only present interfaces that make 
sense, so cases of misuse will result in a compilation error. 
However, hacks like using map with functions that ignore their 
argument(s) throw that out of the window: `r` in `auto r = 
n.iota.map!(_ => readln);` claims to support forward, 
bidirectional and random access (all read-only, as the argument 
function returns by value) as well as slicing, but none of 
these make any sense; all access primitives do exactly the same 
thing, with the result being different every time. Even the 
simplest invariants fail, such as `r.front == r.front`, and 
`popFront`, `popBack` and slicing only have a binary effect: 
whether or not the range is empty yet.


OK, I'm now beginning to understand how hacky that is.

All the random number generator types in `std.random` are 
infinite forward ranges of random numbers, which is completely 
fine. For any PRNG `r`, `r.front == r.front` is true, and 
remains the same number until `r.popFront()`, it correctly has 
no length and is always non-empty (infinite range), and 
`r.save` works correctly etc.


So, for both of my examples, the desired behavior is provided 
from the other side: not by a non-caching repeat for a given 
function, but by a range of lines or random numbers with the 
desired properties, used instead of such a function.  Maybe 
that's usually the right thing to do in the general case, too...


Perhaps there's a wholly different way of thinking about this 
in which the first definition makes much more sense than the 
second one from the start.  If so, please share it.


All you have to do is look at the signature of the function, 
which is the primary part of its documentation:


Repeat!T repeat(T)(T value);

It takes one value of any type T, not a function pointer or 
delegate that returns T. Even if you give it a function pointer 
or delegate (which your example does not), it will simply 
repeat that function pointer or delegate, never calling it.


So what I initially wanted is possible with something like (I 
just checked):

(&readln!(string)) . repeat(n) . map!(f => f('\n'))
However, I'm having a hard time trying to get rid of !(string) 
and '\n' to make something like the following work:

(&readln) . repeat(n) . map!(f => f())
Anyway, even the second line (which does not compile) looks a 
bit cryptic.  And it still has the problem of silently adding 
empty lines after end-of-file was reached.
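
The same pattern can be sketched without standard input, assuming a hypothetical counter function `nextValue` in place of `readln`. The function pointer is repeated, and only `map` actually calls it:

```d
import std.algorithm : map;
import std.array : array;
import std.range : repeat;

// Hypothetical stand-in for readln: counts how often it is called.
int calls;
int nextValue() { return ++calls; }

void main()
{
    auto fp = &nextValue;

    // A range of three identical function pointers; nothing is called yet.
    auto r = fp.repeat(3);
    assert(calls == 0);

    // map calls the pointer each time front is evaluated.
    auto values = r.map!(f => f()).array;
    assert(values == [1, 2, 3]);
}
```

Note that the mapped range still has the problem described earlier in the thread: each evaluation of `front` calls the function again.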


[1] ... in D we have something (slightly controversial) called 
the `lazy` parameter storage class, but when used, it is 
clearly visible in the signature of the function. 
http://dlang.org/function.html#parameters


Thank you for the link.  This is indeed what my other 
expectation for repeat was.


Ivan Kazmenko.

Re: Range of n lines from stdin

2013-12-27 Thread Marco Leise

On Fri, 27 Dec 2013 14:26:59 +
Ivan Kazmenko ga...@mail.ru wrote:

> Quick question.
>
> (1) I can do
> n.iota.map!(_ => readln)
> to get the next n lines from stdin.
>
> (2) However, when I do
> readln.repeat(n)
> it looks clearer but works differently: preserves front and reads
> only one line.
>
> (3) In the particular case of readln, we can substitute it with
> stdin.byLine.take(n)
> but the question remains for other impure functions.
>
> So, what I ask for is some non-caching repeat for functions with
> side effects.  More idiomatic than (1).  Is there something like
> that in Phobos?  Is it an OK style to have an impure function in
> an UFCS chain?
>
> If repeat could know whether its first argument is pure, it could
> then enable or disable front caching depending on purity... no
> way currently?

repeat() is only meant to repeat the same first element over
and over. I think it would be wrong if it changed its value
during iteration. A wrapper struct could be more idiomatic:

  FuncRange!readln.take(n)
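
`FuncRange` is not an existing Phobos type. One possible shape for such a wrapper is sketched below, assuming an infinite input range that calls the function once per element and caches the result in `front`. The counter function `nextValue` is a hypothetical stand-in for `readln`:

```d
import std.algorithm : equal;
import std.range : isInputRange, take;

// Hypothetical stand-in for readln: counts how often it is called.
int calls;
int nextValue() { return ++calls; }

// Hypothetical wrapper: an infinite input range over repeated calls to fun.
struct FuncRange(alias fun)
{
    private alias T = typeof(fun());
    private T cached;
    private bool primed; // true once the current element has been fetched

    enum bool empty = false; // never ends; the caller limits it with take

    @property T front()
    {
        if (!primed) { cached = fun(); primed = true; }
        return cached;
    }

    void popFront()
    {
        if (!primed) fun(); // skip an element that was never looked at
        primed = false;
    }
}

void main()
{
    static assert(isInputRange!(FuncRange!nextValue));

    auto r = FuncRange!nextValue();
    assert(r.front == r.front);          // front is stable
    assert(r.take(3).equal([1, 2, 3]));  // one call per element
}
```

This sketch deliberately offers only input range primitives, so algorithms that need `save`, `length` or indexing are rejected at compile time. It does not answer the question of an empty condition, for example at end-of-file.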

> Ivan Kazmenko.

-- 
Marco

Re: Range of n lines from stdin

2013-12-27 Thread Ali Çehreli


On 12/27/2013 06:26 AM, Ivan Kazmenko wrote:

> n.iota.map!(_ => readln)
> to get the next n lines from stdin.

> So, what I ask for is some non-caching repeat for functions with side
> effects.  More idiomatic than (1).

This request comes up once in a while.

> Is there something like that in Phobos?

As far as I know, no. Although, bearophile may have a bug report to 
track the issue. :)


> Is it an OK style to have an impure function in an UFCS chain?

I don't think it is any different from side effects in other parts of 
the language. In other words, side effects are a part of D. :)


Ali

Re: Range of n lines from stdin

2013-12-27 Thread Ivan Kazmenko


On Friday, 27 December 2013 at 18:32:29 UTC, Jakob Ovrum wrote:

(1) I can do
n.iota.map!(_ => readln)
to get the next n lines from stdin.


This has several issues:

 * The result claims to have all kinds of range capabilities 
that don't make sense at all. Attempting to actually use these 
capabilities, likely indirectly through range algorithms, can 
cause all kinds of havoc.


Hmm?..  From my experience, attempting to use a range in a wrong 
way usually results in a compilation error.  For example, I can't 
do

n.iota.map!(_ => readln).sort()
since MapResult isn't a random access range with swappable 
elements. I can instead do

n.iota.map!(_ => readln).array().sort()
and it allocates an array and works as expected.  So, how do I 
misuse that range?


 * It will allocate a new buffer for the read line every time 
`front` is called, which is less granular than `byLine`'s 
allocation behaviour.


 * If `stdin` (or whatever file) only has `i` number of lines 
left in it where `i < n`, the range will erroneously report `n 
- i` number of empty lines at the end.


 * It's not showing intent as clearly as it should.


Thank you for pointing these out!  So it's not performant, not 
correct and not idiomatic.  I understood only a part of that, but 
already asked for a better alternative.  Well, that's more 
arguments to the same point.  And yeah, stdin.byLine serves 
rather well in this particular case.


So, what I ask for is some non-caching repeat for functions 
with side effects.  More idiomatic than (1).  Is there 
something like that in Phobos?


It's hard to generalize. For one, what is the empty condition?


Hmm.  For example, that could be a RNG emitting (a range of) 
random numbers, then empty is always false.  But we still want 
a new random number each time.  Something like

n.iota.map!(_ => uniform(0, 10))


Is it an OK style to have an impure function in an UFCS chain?


I assume by UFCS chain you mean range compositions in 
particular.


It's not really about purity; impure links in the chain are 
fine (e.g. `byLine`). The issue is when the side effects are 
the only result - I think that is very bad style, and should 
either be rewritten in terms of return values, or rewritten to 
use an imperative style.


So, something like
n.iota.map !(_ => readln).writeln;
is bad style but
writeln (n.iota.map !(_ => readln));
better shows what's the main action?  Makes sense for me.

If repeat could know whether its first argument is pure, it 
could then enable or disable front caching depending on 
purity... no way currently?


`readln.repeat(n)` can also be written `repeat(readln(), n)`. 
Maybe that makes it more obvious what it does - reads one line 
from standard input and passes that to `repeat`, which returns 
a range that returns that same line `n` times.


The confusion for me is this: does repeat mean eagerly get a 
value once and then lazily repeat it n times or do what the 
first argument suggests (emit constant, call function, etc.) n 
times?  I guess it depends on the defaults of the language.  
Currently, I had no strong preference for one definition over the 
other when I saw the name.  Maybe I would indeed prefer the first 
definition if I knew D better, I don't know.


In the first definition, the eagerly vs. lazily contradiction 
in my mind is what scares me off from making it the default: if 
repeat is a lazy range by itself, why would it treat its 
argument eagerly?  What if the argument is a lazy range itself, 
having a new value each time repeat asks for it?


The first definition makes much more sense for me when I treat it 
this way: repeat expects its first argument to be pure (not able 
to change between calls).


Perhaps there's a wholly different way of thinking about this in 
which the first definition makes much more sense than the second 
one from the start.  If so, please share it.


Ivan Kazmenko.

Re: Range of n lines from stdin

2013-12-27 Thread Ivan Kazmenko


On Friday, 27 December 2013 at 20:30:52 UTC, Ivan Kazmenko wrote:

On Friday, 27 December 2013 at 18:32:29 UTC, Jakob Ovrum wrote:
If repeat could know whether its first argument is pure, it 
could then enable or disable front caching depending on 
purity... no way currently?


`readln.repeat(n)` can also be written `repeat(readln(), n)`. 
Maybe that makes it more obvious what it does - reads one line 
from standard input and passes that to `repeat`, which returns 
a range that returns that same line `n` times.


The confusion for me is this: does repeat mean eagerly get a 
value once and then lazily repeat it n times or do what the 
first argument suggests (emit constant, call function, etc.) n 
times?  I guess it depends on the defaults of the language.  
Currently, I had no strong preference for one definition over 
the other when I saw the name.  Maybe I would indeed prefer the 
first definition if I knew D better, I don't know.


In the first definition, the eagerly vs. lazily contradiction 
in my mind is what scares me off from making it the default: if 
repeat is a lazy range by itself, why would it treat its 
argument eagerly?  What if the argument is a lazy range itself, 
having a new value each time repeat asks for it?


The first definition makes much more sense for me when I treat 
it this way: repeat expects its first argument to be pure (not 
able to change between calls).


Perhaps there's a wholly different way of thinking about this 
in which the first definition makes much more sense than the 
second one from the start.  If so, please share it.


Maybe the imperative should be "repeat is a function, and 
arguments of functions should be evaluated only once"?  It does 
make sense from a language point of view, but somewhat breaks the 
abstraction for me.

Re: Range of n lines from stdin

2013-12-27 Thread Marco Leise

On Fri, 27 Dec 2013 20:34:02 +
Ivan Kazmenko ga...@mail.ru wrote:

> Maybe the imperative should be repeat is a function, and
> arguments of functions should be evaluated only once?  It does
> make sense from a language point of view, but somewhat breaks the
> abstraction for me.

The documentation is clear about it:

Repeats one value forever.

It has nothing to do with purity, whether the input range is
lazy or the element is fetched eagerly. If it was meant to do
what you expected it would read:

Constructs a range from lazily evaluating the expression
passed to it over and over.

This is not a limitation of the language either I think, since
arguments to functions can be declared lazy.
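
A small sketch of what `lazy` allows, assuming a hypothetical helper `evaluateTimes` and a counter function `nextValue`. Neither is part of Phobos:

```d
// Hypothetical stand-in for readln: counts how often it is called.
int calls;
int nextValue() { return ++calls; }

// Hypothetical helper: evaluates the lazy argument n times.
T[] evaluateTimes(T)(lazy T expr, size_t n)
{
    T[] result;
    foreach (i; 0 .. n)
        result ~= expr; // each use of a lazy parameter re-evaluates it
    return result;
}

void main()
{
    // The call expression is not evaluated at the call site;
    // it is evaluated once per use inside evaluateTimes.
    auto values = evaluateTimes(nextValue(), 3);
    assert(values == [1, 2, 3]);
    assert(calls == 3);
}
```

Because `lazy` appears in the signature, a reader can tell from the declaration alone that the argument may be evaluated more than once, which is not the case for `repeat`.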

-- 
Marco
