Re: Why is stdin.byLine.writeln so slow?

2014-06-14 Thread via Digitalmars-d-learn

On Friday, 13 June 2014 at 22:12:01 UTC, Ali Çehreli wrote:

On 06/13/2014 03:02 PM, monarch_dodra wrote:

 No, it just receives a range, so it does range formatting. eg:
 "[" ~ Element ~ ", " ~ Element ... "]".

It still looks like it could send the formatting characters as 
well as the elements separately to the output stream:


"["
Element
", "
...
"]"

I am assuming that the slowness in OP's example is due to 
constructing a long string.


It already does what you suggest, and doesn't construct one 
big string. You can test this:


void main() {
    import std.stdio;
    stdin.byLine.writeln;
}

When you type several lines into the terminal, it outputs the 
first element as soon as you press Enter for the first line.


Why is stdin.byLine.writeln so slow?

2014-06-13 Thread Jyxent via Digitalmars-d-learn

I've been playing around with D and noticed that:

stdin.byLine.writeln

takes ~20 times as long as:

foreach(line; stdin.byLine) writeln(line);

I asked on IRC and this was suggested:

stdin.byLine(KeepTerminator.yes).copy(stdout.lockingTextWriter)

which is slightly faster than the foreach case.

It was suggested that there is something slow about writeln 
taking the input range, but I'm not sure I see why.  If I follow 
the code correctly, formatRange in std.format will eventually be 
called and iterate over the range.


Re: Why is stdin.byLine.writeln so slow?

2014-06-13 Thread monarch_dodra via Digitalmars-d-learn

On Friday, 13 June 2014 at 20:48:16 UTC, Jyxent wrote:

[...]

It was suggested that there is something slow about writeln 
taking the input range, but I'm not sure I see why.  If I 
follow the code correctly, formatRange in std.format will 
eventually be called and iterate over the range.


Because:
stdin.byLine.writeln
and
foreach(line; stdin.byLine) writeln(line);
don't produce the same output. The first prints a range that 
contains strings, whereas the second repeatedly prints strings.


Given this input (the second line contains a tab between "line" and "2"):
line 1
line	2
Yo!

Then stdin.byLine.writeln will produce this string:
["line 1", "line\t2", "Yo!"]

So that's the extra overhead which is slowing you down, because 
*each* character needs to be individually parsed, and potentially 
escaped (eg: \t).


The copy option is the same as the foreach one, since each 
string is individually passed to the output, which doesn't parse 
your string. The lockingTextWriter is just sugar to squeeze out 
extra speed.
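
For anyone who wants to see the difference without piping a file through stdin, here is a minimal sketch using std.format.format on an in-memory array. The helper names asRange and asLines are made up for illustration; a string[] stands in for what byLine yields, on the assumption that its lines are formatted the same way as string elements.

```d
import std.format : format;

// Formats the whole range in one go, as writeln(range) does: brackets,
// ", " separators, and every string element quoted and escaped.
string asRange(string[] lines)
{
    return format("%s", lines);
}

// Formats each element on its own, as the foreach loop does: the
// characters of each line are passed through untouched.
string asLines(string[] lines)
{
    string result;
    foreach (line; lines)
        result ~= format("%s\n", line);
    return result;
}

void main()
{
    import std.stdio : write, writeln;

    auto lines = ["line 1", "line\t2", "Yo!"];
    writeln(asRange(lines)); // one bracketed line; the tab shows up as \t
    write(asLines(lines));   // three plain lines; the tab is a real tab
}
```

The first call has to look at every character to decide whether it needs escaping; the second never does.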


Re: Why is stdin.byLine.writeln so slow?

2014-06-13 Thread Ali Çehreli via Digitalmars-d-learn

On 06/13/2014 02:08 PM, monarch_dodra wrote:

 Given this input (the second line contains a tab between "line" and "2"):
 line 1
 line	2
 Yo!

 Then stdin.byLine.writeln will produce this string:
 ["line 1", "line\t2", "Yo!"]

Do you mean writeln() first generates an array and then prints that 
array? I've always imagined that it used the range interface and did 
something similar to what copy() does.


Is there a good reason why the imagined-by-me-range-overload of 
writeln() behaves that way?


Ali



Re: Why is stdin.byLine.writeln so slow?

2014-06-13 Thread Jyxent via Digitalmars-d-learn

On Friday, 13 June 2014 at 21:08:08 UTC, monarch_dodra wrote:

[...]


Hah.  You're right.  I had seen writeln being used this way and 
just assumed that it printed every line, without looking at the 
output too closely.


Thanks for clearing that up.


Re: Why is stdin.byLine.writeln so slow?

2014-06-13 Thread monarch_dodra via Digitalmars-d-learn

On Friday, 13 June 2014 at 21:17:27 UTC, Ali Çehreli wrote:

On 06/13/2014 02:08 PM, monarch_dodra wrote:

 Given this input (the second line contains a tab between "line" and "2"):
 line 1
 line	2
 Yo!

 Then stdin.byLine.writeln will produce this string:
 ["line 1", "line\t2", "Yo!"]

Do you mean writeln() first generates an array and then prints 
that array?


No, it just receives a range, so it does range formatting. eg:
"[" ~ Element ~ ", " ~ Element ... "]".

I've always imagined that it used the range interface and did 
something similar to what copy() does.


That wouldn't make sense. Then, if I did [1, 2, 3].writeln();, 
it would print:

123
instead of
[1, 2, 3]

Is there a good reason why the imagined-by-me-range-overload of 
writeln() behaves that way?


Ali


As I said, it's a range, so it prints a range. That's all there 
is to it.


That said, you can use one of D's most powerful formatting 
abilities: Range formatting:

writefln("%-(%s\n%)", stdin.byLine());

And BOOM. Does what you want. I freaking love range formatting.
More info here:
http://dlang.org/phobos/std_format.html#.formattedWrite

TLDR:
%( = range start
%) = range end
%-( = range start without element escape (for strings mostly).
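
To make the TLDR concrete, here is a small sketch of the two range specifiers side by side, using std.format.format so the result can be inspected as a string. The function names joinEscaped and joinRaw are invented for this example; the format strings follow the compound-specifier rules described in the std.format documentation.

```d
import std.format : format;

// "%(" ... "%)": elements are escaped, so strings come out quoted.
string joinEscaped(string[] items)
{
    return format("%(%s, %)", items);
}

// "%-(" ... "%)": same traversal, but elements are written raw.
string joinRaw(string[] items)
{
    return format("%-(%s, %)", items);
}

void main()
{
    import std.stdio : writeln;

    auto friends = ["John", "Nancy"];
    writeln(joinEscaped(friends)); // "John", "Nancy"
    writeln(joinRaw(friends));     // John, Nancy
}
```

Whatever follows the element specifier inside the parentheses (here ", ") is treated as the delimiter, so it is not written after the last element.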


Re: Why is stdin.byLine.writeln so slow?

2014-06-13 Thread Ali Çehreli via Digitalmars-d-learn

On 06/13/2014 03:02 PM, monarch_dodra wrote:

 No, it just receives a range, so it does range formatting. eg:
 "[" ~ Element ~ ", " ~ Element ... "]".

It still looks like it could send the formatting characters as well as 
the elements separately to the output stream:


"["
Element
", "
...
"]"

I am assuming that the slowness in OP's example is due to constructing a 
long string.


Ali



Re: Why is stdin.byLine.writeln so slow?

2014-06-13 Thread monarch_dodra via Digitalmars-d-learn

On Friday, 13 June 2014 at 22:12:01 UTC, Ali Çehreli wrote:

On 06/13/2014 03:02 PM, monarch_dodra wrote:

 No, it just receives a range, so it does range formatting. eg:
 "[" ~ Element ~ ", " ~ Element ... "]".

It still looks like it could send the formatting characters as 
well as the elements separately to the output stream:


"["
Element
", "
...
"]"

I am assuming that the slowness in OP's example is due to 
constructing a long string.


Ali


We'd have to check, but I don't think that formatted write 
actually ever allocates anywhere, so there should be no 
constructing of a long string. The real issue (I think) is that 
when you ask formatted write to write a string, it just pipes the 
entire char array at once to the underlying stream.


If the characters are escaped though (which is the case when you 
print an array of strings), then formattedWrite needs to check 
each character individually, and then also pass each character 
individually to the underlying stream. And *that* could 
definitely justify the order of magnitude slowdown observed.


What's more, this *may* trigger a per-character decode-encode 
loop. I'd have to check. But that shouldn't be observable next to 
the stream overhead anyway.
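
One way to check the "whole string at once versus character by character" guess is to format into an output range that only counts how often it is written to. This is a rough sketch, not a benchmark: CountingSink and countPuts are hypothetical names made up here, and the exact counts depend on the Phobos version, so no numbers are claimed.

```d
import std.format : formattedWrite;

// Hypothetical output range that only counts how often it is written to.
struct CountingSink
{
    size_t calls;
    void put(const(char)[] chunk) { ++calls; }
}

// Formats `value` with "%s" and reports how many put() calls it took.
size_t countPuts(T)(T value)
{
    CountingSink sink;
    formattedWrite(sink, "%s", value);
    return sink.calls;
}

void main()
{
    import std.stdio : writefln;

    string line = "a fairly long line of plain text, with no escapes";
    writefln("as a string:        %s put() calls", countPuts(line));
    writefln("as a range element: %s put() calls", countPuts([line]));
}
```

If the reasoning above holds, the second count should grow with the length of the line while the first stays small.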


Re: Why is stdin.byLine.writeln so slow?

2014-06-13 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Jun 13, 2014 at 10:02:49PM +, monarch_dodra via Digitalmars-d-learn 
wrote:
[...]
 That said, you can use one of D's most powerful formatting abilities:
 Range formatting:
 writefln("%-(%s\n%)", stdin.byLine());
 
 And BOOM. Does what you want. I freaking love range formatting.
 More info here:
 http://dlang.org/phobos/std_format.html#.formattedWrite
 
 TLDR:
 %( = range start
 %) = range end
 %-( = range start without element escape (for strings mostly).

I wrote part of that documentation, and my favorite example is matrix
formatting:

auto mat = [[1,2,3], [4,5,6], [7,8,9]];
writefln("[%([%(%d %)]%|\n %)]", mat);

Output:

[[1 2 3]
 [4 5 6]
 [7 8 9]]

D coolness at its finest!
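
For readers who have not met %| before, here is a short sketch of what it does in the matrix example, written with std.format.format so the result is a plain string. The helper names formatMatrix and formatTagged are invented for illustration.

```d
import std.format : format;

// Inner "%(%d %)" joins the numbers of one row with a space.
// In the outer spec, "%|" ends the per-row format "[...]" and starts
// the delimiter "\n ", so the closing "]" is kept on the last row too.
string formatMatrix(int[][] mat)
{
    return format("[%([%(%d %)]%|\n %)]", mat);
}

// A flat range with the same trick: "<%s>" per element, ", " in between.
string formatTagged(int[] items)
{
    return format("%(<%s>%|, %)", items);
}

void main()
{
    import std.stdio : writeln;

    writeln(formatMatrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]]));
    writeln(formatTagged([1, 2, 3])); // <1>, <2>, <3>
}
```

Everything before %| is repeated for each element, and everything after it is written only between elements.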

Whoever invented %(, %|, %) is a genius. It takes C's printf formatting
from weak sauce to whole new levels of awesome. I remember debugging
some range-based code, and being able to write stuff like:

debug writefln("%(%(%s, %); %)", buggyNestedRange().take(10));

at strategic spots in the code is just pure win.  In C/C++, you'd have
to manually write nested loops to print out the data, which may involve
manually calling accessor methods, manually counting them, perhaps
storing intermediate output fragments in temporary buffers,
encapsulating all this jazz in a throwaway function so that you can use
it at multiple strategic points (in D you just copy-n-paste the single
line above), etc.  Pure lose.

(Speaking of which, this might be an awesome lightning talk topic at the
next DConf. ;-) Or did somebody already do it?)


T

-- 
Having a smoking section in a restaurant is like having a peeing section in a 
swimming pool. -- Edward Burr 


Re: Why is stdin.byLine.writeln so slow?

2014-06-13 Thread monarch_dodra via Digitalmars-d-learn
On Friday, 13 June 2014 at 22:25:25 UTC, H. S. Teoh via 
Digitalmars-d-learn wrote:

In C/C++, you'd have to manually write nested loops to print out the 
data, which may involve manually calling accessor methods, manually 
counting them, perhaps storing intermediate output fragments in 
temporary buffers, encapsulating all this jazz in a throwaway 
function so that you can use it at multiple strategic points (in D 
you just copy-n-paste the single line above), etc.  Pure lose.

T


In C++, I usually use copy/transform:

*std::copy(begin(), end(), std::ostream_iterator<T>(std::cout, 
"\n")) = "\n";

or
*std::transform(begin(), end(), 
std::ostream_iterator<T>(std::cout, "\n"), [](???){???}) = "\n";


It's a bit verbose, and looks like ass to the non-initiated, but 
once you are used to it, it's quite convenient. It's just 
something that grows on you. You can stack on a foreach if you 
need more depth.


std::for_each(begin(), end(), [](R r){
  *std::copy(r.begin(), r.end(), 
std::ostream_iterator<T>(std::cout, "\n")) = "\n";
});

Though arguably, that's just a loop in disguise :)