Re: Leaving a pointer to it on the stack

2020-08-13 Thread Andre Pany via Digitalmars-d-learn
On Thursday, 13 August 2020 at 21:10:57 UTC, Steven Schveighoffer 
wrote:

On 8/13/20 4:51 PM, Andre Pany wrote:

[...]


So in your real world scenario, a non-D thread/program is 
calling sample, and it controls the location of *i? If so, then 
no, you can't depend on D not collecting that data, because D 
might not scan that location.



[...]


As long as you aren't allocating again later, yes. You can also 
disable collections and only run them when you know it's safe 
to do so.


-steve


Thanks for the answers. This clarifies all my questions.

Kind regards
Andre


Re: Leaving a pointer to it on the stack

2020-08-13 Thread Steven Schveighoffer via Digitalmars-d-learn

On 8/13/20 4:51 PM, Andre Pany wrote:

On Thursday, 13 August 2020 at 20:11:50 UTC, Steven Schveighoffer wrote:
The garbage collector scans all of the stack as if it were an array of 
pointers. So if you have a pointer to it anywhere on the stack, it 
won't be collected.


However, it only scans threads that the runtime knows about.



If I understand it right, this paragraph doesn't help in my real 
scenario where sample (Dll) is called from a delphi executable.


So in your real world scenario, a non-D thread/program is calling 
sample, and it controls the location of *i? If so, then no, you can't 
depend on D not collecting that data, because D might not scan that 
location.


It is another runtime (Delphi). But on the other hand, since the D GC only 
runs when something new is allocated on the D side, can I safely assume that 
the Delphi caller can access the heap variables? Of course, only as long as 
it doesn't store the references and use them later...


As long as you aren't allocating again later, yes. You can also disable 
collections and only run them when you know it's safe to do so.


-steve
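
For the Delphi case, where the caller's stack is not scanned, one option is to register the allocation with the GC explicitly instead of relying on a stack reference. The following is only a minimal sketch and not code from this thread: `releaseSample` is a hypothetical export that the host would have to call when it is done with the value, and `withCollectionsPaused` is a hypothetical helper illustrating the "disable collections" suggestion. It assumes the D runtime is already initialized, which a real DLL has to arrange itself.

```d
import core.memory : GC;

// Allocate on the GC heap and register the block as a root, so it survives
// collections even if the only reference lives in memory the GC never scans.
extern(C) export void sample(int** i)
{
    *i = new int;
    **i = 42;
    GC.addRoot(*i);
}

// Hypothetical counterpart: the host calls this when it no longer needs the
// value, which makes the block collectable again.
extern(C) export void releaseSample(int* p)
{
    GC.removeRoot(p);
}

// Hypothetical helper: run `work` with automatic collections paused.
// The runtime may still collect if it would otherwise run out of memory.
void withCollectionsPaused(scope void delegate() work)
{
    GC.disable();
    scope(exit) GC.enable();
    work();
}

void main()
{
    int* p;
    sample(&p);
    GC.collect();      // p is also on main's stack here, so this only
    assert(*p == 42);  // demonstrates the calls, not the foreign-host case

    withCollectionsPaused(() { auto tmp = new int[](1024); });

    releaseSample(p);
}
```

The root stays registered until `releaseSample` is called, so a host that forgets to call it leaks the allocation for the lifetime of the process.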


Re: generating random numbers

2020-08-13 Thread Andy Balba via Digitalmars-d-learn

On Monday, 10 August 2020 at 15:43:04 UTC, Andy Balba wrote:

On Monday, 10 August 2020 at 15:13:51 UTC, bachmeier wrote:

On Monday, 10 August 2020 at 14:20:23 UTC, bachmeier wrote:

On Monday, 10 August 2020 at 05:51:07 UTC, Andy Balba wrote:
generating random numbers using 
https://dlang.org/library/std/random/uniform01.html


I find the example given in this section totally 
incomprehensible

.. Can anyone help me answer two simple questions:
How to generate a random floating number in range [0,1) ?
How to set a seed value, prior to generating random values ?


Strange example for sure. I'd recommend checking out the 
examples on the landing page for std.random: 
https://dlang.org/library/std/random.html


I created a PR with a hopefully clearer example:
https://github.com/dlang/phobos/pull/7588


Ahhh yes, yes .. this is the way to write Dlang example code :
https://dlang.org/library/std/random.html



... a very neat random byte generator is at
 https://github.com/LightBender/SecureD/tree/master/source/secured

here's the essential code :

import std.digest;
import std.stdio;

// Read `bytes` random bytes from /dev/urandom (POSIX only).
// Returns null on failure.
@trusted ubyte[] random(uint bytes)
{
    if (bytes == 0)
    {
        printf("number of bytes must be > zero\n");
        return null;
    }

    ubyte[] buffer = new ubyte[bytes];

    try
    {
        File urandom = File("/dev/urandom", "rb");
        urandom.setvbuf(null, _IONBF); // unbuffered: read only what is requested
        scope(exit) urandom.close();

        try
        {
            buffer = urandom.rawRead(buffer);
        }
        catch (Exception ex) // also covers ErrnoException
        {
            printf("Can't get next random bytes\n");
            return null;
        }
    }
    catch (Exception ex) // also covers ErrnoException
    {
        printf("Can't initialize system RNG\n");
        return null;
    }

    return buffer;
}

void main()
{
    ubyte[] rnd1 = random(32);
    writeln("32Bytes: ", toHexString!(LetterCase.lower)(rnd1));

    ubyte[] rnd2 = random(128);
    writeln("128Bytes: ", toHexString!(LetterCase.lower)(rnd2));

    ubyte[] rnd3 = random(512);
    writeln("512Bytes:");
    writeln(toHexString!(LetterCase.lower)(rnd3));

    ubyte[] rnd4 = random(2048);
    writeln("2048 Bytes:");
    writeln(toHexString!(LetterCase.lower)(rnd4));
}
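
For the two questions that started this thread (a random floating point number in [0,1) and a seed set beforehand), a minimal sketch with std.random could look like the following. `firstSample` is a hypothetical helper that exists only for illustration; `Random`, `uniform01` and `unpredictableSeed` are the Phobos names.

```d
import std.random : Random, uniform01, unpredictableSeed;
import std.stdio : writeln;

// Hypothetical helper: first value in [0, 1) from a generator seeded with `seed`.
double firstSample(uint seed)
{
    auto rng = Random(seed);   // Random is the default Phobos engine
    return uniform01(rng);     // double in the half-open range [0, 1)
}

void main()
{
    // Fixed seed: the same sequence on every run.
    auto rng = Random(42);
    foreach (_; 0 .. 3)
        writeln(uniform01(rng));

    // Same seed, same first value.
    assert(firstSample(42) == firstSample(42));

    // Different sequence on every run.
    auto fresh = Random(unpredictableSeed);
    immutable x = uniform01(fresh);
    assert(x >= 0.0 && x < 1.0);
}
```

Note that this generator is meant for simulations and tests; for key material, a system source such as /dev/urandom remains the better choice.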


Re: Leaving a pointer to it on the stack

2020-08-13 Thread Andre Pany via Digitalmars-d-learn
On Thursday, 13 August 2020 at 20:11:50 UTC, Steven Schveighoffer 
wrote:

On 8/13/20 4:04 PM, Andre Pany wrote:

Hi,

in the specification 
https://dlang.org/spec/interfaceToC.html#storage_allocation 
there is this paragraph:
"Leaving a pointer to it on the stack (as a parameter or 
automatic variable), as the garbage collector will scan the 
stack."


I have some trouble understanding what this means. Given 
this example:


```
import std;

void main()
{
 int* i;
 sample(&i);
 writeln(*i);
}

extern(C) export void sample(int** i)
{
 *i = new int();
 **i = 42;
}
```

Int variable is created on the heap. How do I leave a pointer 
on the stack?
(In the real coding, sample function will be called from 
Delphi)




The garbage collector scans all of the stack as if it were an 
array of pointers. So if you have a pointer to it anywhere on 
the stack, it won't be collected.


However, it only scans threads that the runtime knows about.

-Steve


If I understand it right, this paragraph doesn't help in my real 
scenario where sample (Dll) is called from a delphi executable.


It is another runtime (Delphi). But on the other hand, since the D GC 
only runs when something new is allocated on the D side, can I 
safely assume that the Delphi caller can access the heap 
variables? Of course, only as long as it doesn't store the references 
and use them later...


Kind regards
Andre


Re: Leaving a pointer to it on the stack

2020-08-13 Thread Steven Schveighoffer via Digitalmars-d-learn

On 8/13/20 4:04 PM, Andre Pany wrote:

Hi,

in the specification 
https://dlang.org/spec/interfaceToC.html#storage_allocation there is 
this paragraph:
"Leaving a pointer to it on the stack (as a parameter or automatic 
variable), as the garbage collector will scan the stack."


I have some trouble understanding what this means. Given this example:

```
import std;

void main()
{
 int* i;
 sample(&i);
 writeln(*i);
}

extern(C) export void sample(int** i)
{
 *i = new int();
 **i = 42;
}
```

Int variable is created on the heap. How do I leave a pointer on the stack?
(In the real coding, sample function will be called from Delphi)



The garbage collector scans all of the stack as if it were an array of 
pointers. So if you have a pointer to it anywhere on the stack, it won't 
be collected.


However, it only scans threads that the runtime knows about.

-Steve


Re: Leaving a pointer to it on the stack

2020-08-13 Thread Adam D. Ruppe via Digitalmars-d-learn

On Thursday, 13 August 2020 at 20:04:59 UTC, Andre Pany wrote:

Hi,

in the specification 
https://dlang.org/spec/interfaceToC.html#storage_allocation 
there is this paragraph:
"Leaving a pointer to it on the stack (as a parameter or 
automatic variable), as the garbage collector will scan the 
stack."


I have some trouble understanding what this means. Given 
this example:


```
import std;

void main()
{
int* i;
sample(&i);
writeln(*i);
}

extern(C) export void sample(int** i)
{
*i = new int();
**i = 42;
}
```

Int variable is created on the heap. How do I leave a pointer 
on the stack?


You just did - the `int* i` is a pointer left on the stack for 
the duration of `main` so the GC won't collect it until after 
main returns.


But after main returns, even if `sample` kept a copy of it 
somewhere in some other location, the GC might reap it...


Leaving a pointer to it on the stack

2020-08-13 Thread Andre Pany via Digitalmars-d-learn

Hi,

in the specification 
https://dlang.org/spec/interfaceToC.html#storage_allocation there 
is this paragraph:
"Leaving a pointer to it on the stack (as a parameter or 
automatic variable), as the garbage collector will scan the 
stack."


I have some trouble understanding what this means. Given this 
example:


```
import std;

void main()
{
int* i;
sample(&i);
writeln(*i);
}

extern(C) export void sample(int** i)
{
*i = new int();
**i = 42;
}
```

Int variable is created on the heap. How do I leave a pointer on 
the stack?

(In the real coding, sample function will be called from Delphi)

Kind regards
André


Re: Reading from stdin significantly slower than reading file directly?

2020-08-13 Thread methonash via Digitalmars-d-learn

Thank you all very much for your detailed feedback!

I wound up pulling the "TREE_GRM_ESTN.csv" file referred to by 
Jon and used it in subsequent tests. I created D programs for 
reading directly through a File() struct, versus reading 
byLine() from the stdin alias.


After copying the large CSV file to /dev/shm/ (i.e. a ramdisk), I 
re-ran the two programs repeatedly, and I was able to approach 
the 20-30% overhead margin I would expect to see for using a 
shell pipe and its buffer; my results now similarly match Jon's 
above.


Lesson learned: be wary of networked I/O systems (e.g. Isilon 
storage arrays); all kinds of weirdness can happen there ...


Re: DMD: how to restore old unittest+main

2020-08-13 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Aug 13, 2020 at 08:30:44AM +, Jonathan via Digitalmars-d-learn 
wrote:
[...]
> Is there a reason you need to run all unittests every time you want to
> run the program?

During development, this is a valuable time-saver: instead of compiling
twice, once with -unittest and once without, then running each
individually, it helps to have unittests run before main() so that any
regressions are immediately noticed, and if unittests pass, then main()
runs and manual testing can proceed.

Obviously, for release builds unittests are useless, so in that case
there's no need to include unittests before running main().


> I personally compile with -unittest to make sure all my unittests
> pass, then recompile without the -unittest flag if I actually want to
> run the program.  This way, time isn't wasted running unittests every
> time the program is run.

Lots of time is wasted if you have to compile twice, once with -unittest
and once without, while you're in the code-compile-test cycle and need
to run main() for manual testing. (Not everything is testable with
unittests!)


T

-- 
"Holy war is an oxymoron." -- Lazarus Long


Re: Reading from stdin significantly slower than reading file directly?

2020-08-13 Thread Jon Degenhardt via Digitalmars-d-learn
On Thursday, 13 August 2020 at 14:41:02 UTC, Steven Schveighoffer 
wrote:
But for sure, reading from stdin doesn't do anything different 
than reading from a file if you are using the File struct.


A more appropriate test might be using the shell to feed the 
file into the D program:


dprogram < FILE

Which means the same code runs for both tests.


Indeed, using the 'prog < file' approach rather than 'cat file | 
prog' removes any distinction for 'tsv-select'. 
'tsv-select' uses File.rawRead rather than File.byLine.
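
To illustrate the difference between the two read styles, here is a minimal sketch of block-wise reading with File.rawRead. It is not the tsv-utils implementation; `countNewlines` and the 64 KiB buffer size are assumptions made for the example.

```d
import std.algorithm.searching : count;
import std.stdio : File, stdin, writeln;

// Hypothetical helper: count newline bytes by reading fixed-size blocks
// instead of going line by line.
size_t countNewlines(File input, size_t bufSize = 64 * 1024)
{
    auto buf = new ubyte[](bufSize);
    size_t n;
    while (true)
    {
        auto chunk = input.rawRead(buf);  // slice of buf that was filled
        if (chunk.length == 0)
            break;                        // end of input
        n += chunk.count(cast(ubyte) '\n');
    }
    return n;
}

void main(string[] args)
{
    // The same function handles a named file and standard input.
    auto input = args.length > 1 ? File(args[1], "rb") : stdin;
    writeln(countNewlines(input));
}
```

Because the function only sees a File, both 'prog FILE' and 'prog < FILE' run identical code.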




Re: vibe.d and my first web service

2020-08-13 Thread Steven Schveighoffer via Digitalmars-d-learn

On 8/13/20 3:28 AM, WebFreak001 wrote:

On Wednesday, 12 August 2020 at 21:11:54 UTC, Daniel Kozak wrote:

[...]

Unfortunately, I think vibe-d is dead. With every release it is worse 
than before and it seems there is almost no activity. So D really need 
new champion here maybe hunt will be next champion.


Can you give an example how vibe.d gets worse? It's really stable right 
now and while I would enjoy the feature PRs to go in more quickly, I 
think they are pretty well reviewed when they get reviewed and 
eventually once merged it's only good code that makes it in.


The environment just updated and made previous versions more unstable, 
and most of that is inherited by current versions too, but these things 
are getting fixed. For example, a lot of Linux distros changed to 
OpenSSL 1.1, which broke HTTPS client calls in vibe.d and was fixed in 
0.8.6, and MongoDB 3.4+ changed a lot of things like indexes, which then 
broke and were fixed in 0.9.0.


I haven't had a really big problem with vibe.d, except for the ctrl-c 
bug (which I've worked around as described).


I agree with the OP of this subthread that it is the most important 
problem for vibe.d (and vibed-core really).


I wish I knew how to fix it...

My experience with getting features into vibe.d has been good, I've 
added a few and Sonke has been very receptive, even if his responses are 
delayed (he must be really busy). I've been there too.


-Steve


Re: DMD: how to restore old unittest+main

2020-08-13 Thread Steven Schveighoffer via Digitalmars-d-learn

On 8/13/20 5:02 AM, Nils Lankila wrote:

On Thursday, 13 August 2020 at 08:49:21 UTC, WebFreak001 wrote:

On Thursday, 13 August 2020 at 07:52:07 UTC, novice3 wrote:

Hello.

I don't use dub.
I use Windows and *.d file association to compile small apps by dmd 
with "-i -unittest -g" switches.
Now I updated dmd and found that apps compiled with "-unittest" do not 
run main().


How can I restore the old behaviour (run all unittests, then main())
*without using the "--DRT-testmode=run-main" switch every time I start 
the compiled app.exe*?

I just want to press Enter on the app.d file, then press Enter on app.exe.
Any advice?

Thanks.


Try

version (unittest) extern(C) __gshared string[] rt_options = [ 
"testmode=run-main" ];


Yeah, that works, but we should really have a way to do this 
programmatically. In a way it already is, but it should be done by calling 
a function, not by way of a string that gets parsed.


https://dlang.org/phobos/core_runtime.html#.Runtime.extendedModuleUnitTester

Though I highly recommend using the rt_options mechanism if all you are 
after is the original behavior, it's much simpler.


-Steve
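
For completeness, the programmatic route through the linked core.runtime hook could be sketched as follows. This is an illustration modelled on the hook's documented usage, not code from the thread; `runTestsThenMain` is a hypothetical name, and the policy (run main() only when every unittest passed) is one possible choice.

```d
import core.runtime : Runtime, UnitTestResult;
import std.stdio : writeln;

// Hypothetical tester: run every module's unittests, then allow main()
// to run only if all of them passed.
UnitTestResult runTestsThenMain()
{
    UnitTestResult result;
    foreach (m; ModuleInfo)
    {
        if (m is null)
            continue;
        auto test = m.unitTest;
        if (test is null)
            continue;
        ++result.executed;
        try
        {
            test();
            ++result.passed;
        }
        catch (Throwable t)
        {
            writeln(t);   // report the failure and keep going
        }
    }
    result.runMain = result.executed == result.passed;
    result.summarize = !result.runMain;   // stay quiet when all is well
    return result;
}

// Install the hook before the runtime decides whether to run unittests.
shared static this()
{
    Runtime.extendedModuleUnitTester = &runTestsThenMain;
}

unittest
{
    assert(1 + 1 == 2);
}

void main()
{
    writeln("main() runs after the unittests");
}
```

Compared with the rt_options one-liner this is a lot of code, which is why the simpler mechanism is the better fit when only the old behaviour is wanted.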


Re: Reading from stdin significantly slower than reading file directly?

2020-08-13 Thread Steven Schveighoffer via Digitalmars-d-learn

On 8/12/20 6:44 PM, methonash wrote:

Hi,

Relative beginner to D-lang here, and I'm very confused by the apparent 
performance disparity I've noticed between programs that do the following:


1) cat some-large-file | D-program-reading-stdin-byLine()

2) D-program-directly-reading-file-byLine() using File() struct

The D-lang difference I've noticed from options (1) and (2) is somewhere 
in the range of 80% wall time taken (7.5s vs 4.1s), which seems pretty 
extreme.


For comparison, I attempted the same using Perl with the same large 
file, and I only noticed a 25% difference (10s vs 8s) in performance, 
which I imagine to be partially attributable to the overhead incurred by 
using a pipe and its buffer.


So, is this difference in D-lang performance typical? Is this expected 
behavior?


Was wondering if this may have anything to do with the library 
definition for std.stdio.stdin 
(https://dlang.org/library/std/stdio/stdin.html)? Does global 
file-locking significantly affect read-performance?


For reference: I'm trying to build a single-threaded application; my 
present use-case cannot benefit from parallelism, because its ultimate 
purpose is to serve as a single-threaded downstream filter from an 
upstream application consuming (n-1) system threads.


Are we missing the obvious here? cat needs to read from disk, write the 
results into a pipe buffer, then context-switch into your D program, 
then the D program reads from the pipe buffer.


Whereas, reading from a file just needs to read from the file.

The difference does seem a bit extreme, so maybe there is another more 
complex explanation.


But for sure, reading from stdin doesn't do anything different than 
reading from a file if you are using the File struct.


A more appropriate test might be using the shell to feed the file into 
the D program:


dprogram < FILE

Which means the same code runs for both tests.

-Steve
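
A minimal sketch of such a test program, assuming a hypothetical helper `countLines`: it reads a named file when one is given and standard input otherwise, so all three invocations (`dprogram FILE`, `dprogram < FILE`, `cat FILE | dprogram`) exercise exactly the same byLine loop.

```d
import std.stdio : File, stdin, writeln;

// Hypothetical helper: count lines with the built-in byLine range.
size_t countLines(File input)
{
    size_t n;
    foreach (line; input.byLine)
        ++n;
    return n;
}

void main(string[] args)
{
    // stdin is itself a File, so both branches have the same type.
    auto input = args.length > 1 ? File(args[1], "r") : stdin;
    writeln(countLines(input));
}
```

Timing the three invocations against each other separates the cost of the pipe from the cost of the D code.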


Re: Reading from stdin significantly slower than reading file directly?

2020-08-13 Thread wjoe via Digitalmars-d-learn

On Thursday, 13 August 2020 at 07:08:21 UTC, Jon Degenhardt wrote:

Test                           Elapsed  System   User
----                           -------  ------   ----
tsv-select -f 2,3 FILE           10.28    0.42   9.85
cat FILE | tsv-select -f 2,3     11.10    1.45  10.23
cut -f 2,3 FILE                  14.64    0.60  14.03
cat FILE | cut -f 2,3            14.36    1.03  14.19
wc -l FILE                        1.32    0.39   0.93
cat FILE | wc -l                  1.18    0.96   1.04


The TREE file:

Test                           Elapsed  System   User
----                           -------  ------   ----
tsv-select -f 2,3 FILE            3.77    0.95   2.81
cat FILE | tsv-select -f 2,3      4.54    2.65   3.28
cut -f 2,3 FILE                  17.78    1.53  16.24
cat FILE | cut -f 2,3            16.77    2.64  16.36
wc -l FILE                        1.38    0.91   0.46
cat FILE | wc -l                  2.02    2.63   0.77




Your table shows that when piping the output from one process to 
another, there's a lot more time spent in kernel mode. A switch 
from user mode to kernel mode is expensive [1].
It costs around 1000-1500 clock cycles for a call to getpid() on 
most systems. That's around 100 clock cycles for the actual 
switch and the rest is overhead.


My theory is this:
One of the reasons for the slowdown is very likely mutex 
un/locking of which there is more need when multiple processes 
and (global) resources are involved compared to a single instance.

Another is copying buffers.
 When you read a file the data is first read into a kernel buffer 
which is then copied to the user space buffer i.e. the buffer you 
allocated in your program (the reading part might not happen if 
the data is still in the cache).
If you read the file directly in your program, the data is copied 
once from kernel space to user space.
When you read from stdin (which is technically a file) it would 
seem that cat reads the file which means a copy from kernel to 
user space (cat), then cat outputs that buffer to stdout (also 
technically a file) which is another copy, then you read from 
stdin in your program which will cause another copy from stdout 
to stdin and finally to your allocated buffer.

Each of those steps may involve a mutex un/lock.
Also with pipes you start two programs. Starting a program takes 
a few ms.


PS. If you do your own caching, or if you don't care about it 
because you just read a file sequentially once, you may benefit 
from opening your file with the O_DIRECT flag which basically 
means that the kernel copies directly into user space buffers.


[1] https://en.wikipedia.org/wiki/Ring_(computer_security)


Re: vibe.d and my first web service

2020-08-13 Thread Mr. Backup via Digitalmars-d-learn

On Wednesday, 12 August 2020 at 13:46:06 UTC, James Blachly wrote:


Unfortunately the problem still occurs with Vibe.d 0.9.0

IMO **this is the single most important problem to fix** for 
vibe.d -- if the most basic of examples (indeed, supplied by 
dub itself) fails so spectacularly, the casual new user will 
not spend the time to find out why this is happening, but 
instead move on. The ctrl-C non-termination bug has existed 
since at least 2015 from what I can tell from the forums.




As a casual new novice, I really like dlang as such, and I think 
it should be the most widespread and popular language in the 
world. And as soon as I came across it, I wanted to use it in my 
project. But it has many packages for the same things, but these 
packages are unfinished. Everyone creates their own. You start 
comparing them and don't know what to choose for your job and 
then you find out that you should have chosen another and then 
find out that you should have written it yourself. And then I 
finally did it in golang after a while. I think the dlang community 
should focus on creating a quality standard library.


We live in the 21st century where there are web technologies 
everywhere around us, so I think that the http package should be 
part of a standard library.




Re: DMD: how to restore old unittest+main

2020-08-13 Thread novice3 via Digitalmars-d-learn

On Thursday, 13 August 2020 at 09:02:28 UTC, Nils Lankila wrote:
programmatically, in a way it is already, but by calling 
function


A compiler switch would be better, maybe...



Re: DMD: how to restore old unittest+main

2020-08-13 Thread Nils Lankila via Digitalmars-d-learn

On Thursday, 13 August 2020 at 08:49:21 UTC, WebFreak001 wrote:

On Thursday, 13 August 2020 at 07:52:07 UTC, novice3 wrote:

Hello.

I don't use dub.
I use Windows and *.d file association to compile small apps 
by dmd with "-i -unittest -g" switches.
Now I updated dmd and found that apps compiled with 
"-unittest" do not run main().


How can I restore the old behaviour (run all unittests, then main())
*without using the "--DRT-testmode=run-main" switch every time 
I start the compiled app.exe*?
I just want to press Enter on the app.d file, then press Enter on 
app.exe.

Any advice?

Thanks.


Try

version (unittest) extern(C) __gshared string[] rt_options = [ 
"testmode=run-main" ];


Yeah, that works, but we should really have a way to do this 
programmatically. In a way it already is, but it should be done by 
calling a function, not by way of a string that gets parsed.


Re: DMD: how to restore old unittest+main

2020-08-13 Thread novice3 via Digitalmars-d-learn

On Thursday, 13 August 2020 at 08:30:44 UTC, Jonathan wrote:
Is there a reason you need to run all unittests every time you 
want to run the program?


Starting the app with unittests during development is a frequent event for me.
Releasing the app is a rare event for me.
I want to do the frequent action without effort (just start and see that all 
is ok, both unittests and main),
and can do the rare action (release) with some effort (compile with 
other switches).




Re: DMD: how to restore old unittest+main

2020-08-13 Thread novice3 via Digitalmars-d-learn

On Thursday, 13 August 2020 at 08:49:21 UTC, WebFreak001 wrote:

Try

version (unittest) extern(C) __gshared string[] rt_options = 
["testmode=run-main" ];


Thanks! It works as needed.
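
For readers who want to try it, here is the one-liner in a complete program. This is a minimal sketch; `twice` is a hypothetical function that only gives the unittest something to check.

```d
import std.stdio : writeln;

// Only present in -unittest builds: tells the runtime to run main()
// after the unittests instead of exiting.
version (unittest)
    extern(C) __gshared string[] rt_options = ["testmode=run-main"];

// Hypothetical function under test.
int twice(int x)
{
    return 2 * x;
}

unittest
{
    assert(twice(21) == 42);
}

void main()
{
    writeln("main() runs after the unittests: ", twice(21));
}
```

Compiled with "dmd -unittest -g app.d", the resulting executable runs the unittest block first and then main(); compiled without "-unittest", the rt_options line is not part of the build at all.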


Re: DMD: how to restore old unittest+main

2020-08-13 Thread WebFreak001 via Digitalmars-d-learn

On Thursday, 13 August 2020 at 07:52:07 UTC, novice3 wrote:

Hello.

I don't use dub.
I use Windows and *.d file association to compile small apps by 
dmd with "-i -unittest -g" switches.
Now I updated dmd and found that apps compiled with 
"-unittest" do not run main().


How can I restore the old behaviour (run all unittests, then main())
*without using the "--DRT-testmode=run-main" switch every time I 
start the compiled app.exe*?
I just want to press Enter on the app.d file, then press Enter on 
app.exe.

Any advice?

Thanks.


Try

version (unittest) extern(C) __gshared string[] rt_options = [ 
"testmode=run-main" ];


Re: DMD: how to restore old unittest+main

2020-08-13 Thread novice3 via Digitalmars-d-learn

On Thursday, 13 August 2020 at 08:30:44 UTC, Jonathan wrote:
Is there a reason you need to run all unittests every time you 
want to run the program?


The app will be used by other people,
and I will release it without unittests after development.
Release is a rare action for me.
Running while developing is a frequent action.
I want to see that all is ok, both unittests and main(), on every test run 
during development without special effort from me.
Then I release with some effort (compile without "-unittest", with the 
"-release" switch etc).





Re: DMD: how to restore old unittest+main

2020-08-13 Thread Jonathan via Digitalmars-d-learn

On Thursday, 13 August 2020 at 07:52:07 UTC, novice3 wrote:

Hello.

I don't use dub.
I use Windows and *.d file association to compile small apps by 
dmd with "-i -unittest -g" switches.
Now I updated dmd and found that apps compiled with 
"-unittest" do not run main().


How can I restore the old behaviour (run all unittests, then main())
*without using the "--DRT-testmode=run-main" switch every time I 
start the compiled app.exe*?
I just want to press Enter on the app.d file, then press Enter on 
app.exe.

Any advice?

Thanks.


Is there a reason you need to run all unittests every time you 
want to run the program?


I personally compile with -unittest to make sure all my unittests 
pass, then recompile without the -unittest flag if I actually 
want to run the program.  This way, time isn't wasted running 
unittests every time the program is run.


DMD: how to restore old unittest+main

2020-08-13 Thread novice3 via Digitalmars-d-learn

Hello.

I don't use dub.
I use Windows and *.d file association to compile small apps by 
dmd with "-i -unittest -g" switches.
Now I updated dmd and found that apps compiled with "-unittest" 
do not run main().


How can I restore the old behaviour (run all unittests, then main())
*without using the "--DRT-testmode=run-main" switch every time I 
start the compiled app.exe*?
I just want to press Enter on the app.d file, then press Enter on 
app.exe.

Any advice?

Thanks.


Re: vibe.d and my first web service

2020-08-13 Thread WebFreak001 via Digitalmars-d-learn

On Wednesday, 12 August 2020 at 21:11:54 UTC, Daniel Kozak wrote:

[...]

Unfortunately, I think vibe-d is dead. With every release it is 
worse than before and it seems there is almost no activity. So 
D really need new champion here maybe hunt will be next 
champion.


Can you give an example how vibe.d gets worse? It's really stable 
right now and while I would enjoy the feature PRs to go in more 
quickly, I think they are pretty well reviewed when they get 
reviewed and eventually once merged it's only good code that 
makes it in.


The environment just updated and made previous versions more 
unstable, and most of that is inherited by current versions too, 
but these things are getting fixed. For example, a lot of 
Linux distros changed to OpenSSL 1.1, which broke HTTPS client 
calls in vibe.d and was fixed in 0.8.6, and MongoDB 3.4+ changed a 
lot of things like indexes, which then broke and were fixed 
in 0.9.0.


Re: Reading from stdin significantly slower than reading file directly?

2020-08-13 Thread Jon Degenhardt via Digitalmars-d-learn

On Wednesday, 12 August 2020 at 22:44:44 UTC, methonash wrote:

Hi,

Relative beginner to D-lang here, and I'm very confused by the 
apparent performance disparity I've noticed between programs 
that do the following:


1) cat some-large-file | D-program-reading-stdin-byLine()

2) D-program-directly-reading-file-byLine() using File() struct

The D-lang difference I've noticed from options (1) and (2) is 
somewhere in the range of 80% wall time taken (7.5s vs 4.1s), 
which seems pretty extreme.


I don't know enough details of the implementation to really 
answer the question, and I expect it's a bit complicated.


However, it's an interesting question, and I have relevant 
programs and data files, so I tried to get some actuals.


The tests I ran don't directly answer the question posed, but may 
be a useful proxy. I used Unix 'cut' (latest GNU version) and 
'tsv-select' from the tsv-utils package 
(https://github.com/eBay/tsv-utils). 'tsv-select' is written in 
D, and works like 'cut'. 'tsv-select' reads from stdin or a file 
via a 'File' struct. It's not using the built-in 'byLine' member 
though, it uses a version of 'byLine' that includes some 
additional buffering. Both stdin and a file system file are read 
this way.


I used a file from the google ngram collection 
(http://storage.googleapis.com/books/ngrams/books/datasetsv2.html) and the file TREE_GRM_ESTN.csv from https://apps.fs.usda.gov/fia/datamart/CSV/datamart_csv.html, converted to a tsv file.


The ngram file is a narrow file (21 bytes/line, 4 columns), the 
TREE file is wider (206 bytes/line, 49 columns). In both cases I 
cut the 2nd and 3rd columns. This tends to focus processing on 
input rather than processing and output. I also timed 'wc -l' for 
another data point.


I ran the benchmarks 5 times each way and recorded the median 
time below. Machine used is a MacMini (so Mac OS) with 16 GB RAM 
and SSD drives. The numbers are very consistent for this test on 
this machine. Differences in the reported times are real deltas, 
not system noise. The commands timed were:


* bash -c 'tsv-select -f 2,3 FILE > /dev/null'
* bash -c 'cat FILE | tsv-select -f 2,3 > /dev/null'
* bash -c 'gcut -f 2,3 FILE > /dev/null'
* bash -c 'cat FILE | gcut -f 2,3 > /dev/null'
* bash -c 'gwc -l FILE > /dev/null'
* bash -c 'cat FILE | gwc -l > /dev/null'

Note that 'gwc' and 'gcut' are the GNU versions of 'wc' and 'cut' 
installed by Homebrew.


Google ngram file (the 's' unigram file):

Test                           Elapsed  System   User
----                           -------  ------   ----
tsv-select -f 2,3 FILE           10.28    0.42   9.85
cat FILE | tsv-select -f 2,3     11.10    1.45  10.23
cut -f 2,3 FILE                  14.64    0.60  14.03
cat FILE | cut -f 2,3            14.36    1.03  14.19
wc -l FILE                        1.32    0.39   0.93
cat FILE | wc -l                  1.18    0.96   1.04


The TREE file:

Test                           Elapsed  System   User
----                           -------  ------   ----
tsv-select -f 2,3 FILE            3.77    0.95   2.81
cat FILE | tsv-select -f 2,3      4.54    2.65   3.28
cut -f 2,3 FILE                  17.78    1.53  16.24
cat FILE | cut -f 2,3            16.77    2.64  16.36
wc -l FILE                        1.38    0.91   0.46
cat FILE | wc -l                  2.02    2.63   0.77


What this shows is that 'tsv-select' (D program) was faster when 
reading from a file than when reading from standard input. It 
doesn't indicate why, or whether the delta is due to code in the D 
library or code in 'tsv-select'.


Interestingly, 'cut' showed the opposite behavior. It was faster 
when reading from standard input than when reading from the file. 
For 'wc', which method was faster was dependent on line length.


Again, I caution against reading too much into this regarding 
performance of reading from standard input vs a disk file. Much 
more definitive tests can be done. However, it is an interesting 
comparison.


Also, the D program is still fast in both cases.

--Jon