Re: What are the best available D (not C) File input/output options?

2023-11-05 Thread confuzzled via Digitalmars-d-learn

On 11/3/23 2:30 AM, Steven Schveighoffer wrote:

On Thursday, 2 November 2023 at 15:46:23 UTC, confuzzled wrote:

You should use a buffering library like iopipe to write properly here 
(it handles the encoding of text for you).




Thanks Steve, I will try that.



Re: What are the best available D (not C) File input/output options?

2023-11-05 Thread confuzzled via Digitalmars-d-learn



Good morning,

First, thanks to you, Steve, and Julian for responding to my inquiry.

On 11/3/23 4:59 AM, Sergey wrote:

On Thursday, 2 November 2023 at 15:46:23 UTC, confuzzled wrote:
I've ported a small script from C to D. The original C version takes 
roughly 6.5 minutes to parse a 12G file while the port originally took 
about 48 minutes.


In my experience I/O in D is quite slow.
But you can try to improve it:

Try to use std.outbuffer instead of writeln. And flush the result only 
in the end.


Unless I did it incorrectly, this did nothing for me. My understanding 
is that I should first prepare an OutBuffer to which I write all my 
output. Once complete, I then write the OutBuffer to file; which still 
requires the use of writeln, albeit not as often.


First I tried buffering the entire thing, but that turned out to be a 
big mistake. Next I tried writing and clearing the buffer every 100_000 
records (about 3000 writeln calls).


Not as bad as the first attempt but significantly worse than what I 
obtained with the fopen/fprintf combo. I even tried writing the buffer 
to disk with fprintf but jumped ship because it took far longer than 
fopen/fprintf. Can't say how much longer because I terminated execution 
at 14 minutes.


Also check this article. It is showing how manual buffers in D could 
speed up the processing of files significantly: 
https://tech.nextroll.com/blog/data/2014/11/17/d-is-for-data-science.html





The link above was quite helpful. Thanks. I am a bit slow on the uptake 
so it took a while to figure out how to apply the idea to my own use 
case. However, once I figured it out, the result was 2 minutes faster 
than the original C implementation and 3 minutes faster than the 
fopen/printf port.
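
For anyone landing here later, this is roughly the shape of the manual-buffer idea: a minimal sketch, not my actual program. The function name `writeRecords`, the 1 MiB threshold, and the placeholder field values are all assumptions made for illustration. Records are formatted into an in-memory `Appender`, and the file only sees a `rawWrite` when the buffer passes the threshold.

```d
import std.array : appender;
import std.format : formattedWrite;
import std.stdio : File;

/// Formats `count` records into an in-memory buffer and hands the buffer to
/// the file in large chunks. Returns the number of rawWrite calls made.
/// (Hypothetical helper; field values below are placeholders.)
size_t writeRecords(string path, uint count, size_t flushThreshold = 1 << 20)
{
    auto outFile = File(path, "wb");
    auto buf = appender!(char[])();
    buf.reserve(flushThreshold + 256);
    size_t flushes = 0;

    foreach (uint i; 0 .. count)
    {
        // Comma-separated layout similar to the fprintf call in the thread.
        buf.formattedWrite("%c,%d,%d,%d.%09d,%c\n",
                           'A', i, i * 2, i / 1000, i % 1000, 'B');

        if (buf.data.length >= flushThreshold)
        {
            outFile.rawWrite(buf.data);
            buf.clear();   // empties the buffer but keeps its capacity
            ++flushes;
        }
    }

    // Write whatever is left after the last full chunk.
    if (buf.data.length > 0)
    {
        outFile.rawWrite(buf.data);
        ++flushes;
    }
    return flushes;
}

void main()
{
    import std.stdio : writeln;
    // "buffered_demo.csv" is a hypothetical output path for this sketch.
    auto flushes = writeRecords("buffered_demo.csv", 100_000);
    writeln("rawWrite calls: ", flushes);
}
```

The threshold is the knob to experiment with; whether this beats a given fprintf loop will depend on the record format and the machine.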


Whether it did anything for the writeln implementation or not, I don't 
know. I wasn't willing to wait 45+ minutes for something that can feasibly be 
done in 6 minutes. I gave up at 12.


Haven't played with std.string.representation as suggested by Julian as 
yet but I plan to.



Thanks again.
--Confuzzled


Re: What are the best available D (not C) File input/output options?

2023-11-02 Thread Sergey via Digitalmars-d-learn

On Thursday, 2 November 2023 at 15:46:23 UTC, confuzzled wrote:
I've ported a small script from C to D. The original C version 
takes roughly 6.5 minutes to parse a 12G file while the port 
originally took about 48 minutes.


In my experience I/O in D is quite slow.
But you can try to improve it:

Try using std.outbuffer instead of writeln, and flush the result 
only at the end.


Also check this article. It shows how manual buffers in D 
can speed up the processing of files significantly: 
https://tech.nextroll.com/blog/data/2014/11/17/d-is-for-data-science.html





Re: What are the best available D (not C) File input/output options?

2023-11-02 Thread Steven Schveighoffer via Digitalmars-d-learn

On Thursday, 2 November 2023 at 15:46:23 UTC, confuzzled wrote:

I tried std.io but write() only outputs ubyte[] while I'm 
trying to output text so I abandoned idea early.


Just specifically to answer this, this is so you understand this 
is what is going into the file -- bytes.


You should use a buffering library like iopipe to write properly 
here (it handles the encoding of text for you).


And I really don't have a good formatting library, you can rely 
on formattedWrite maybe. A lot of things need to be better for 
this solution to be smooth, it's one of the things I have to work 
on.
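
A minimal sketch of the formattedWrite route using only Phobos, assuming the per-call locking in writefln is part of the cost (that is a guess to be measured, not a claim). The function name `writeLocked` and the record layout are made up for illustration. The file is locked once via `lockingTextWriter` and every record is formatted straight into that writer.

```d
import std.format : formattedWrite;
import std.stdio : File;

/// Writes `count` tab-separated records through a single locking writer.
/// (Hypothetical helper; the field values are placeholders.)
void writeLocked(string path, uint count)
{
    auto f = File(path, "w");
    auto w = f.lockingTextWriter();   // the stream stays locked while `w` lives
    foreach (uint i; 0 .. count)
        w.formattedWrite("%c\t%d\t%d\n", 'A', i, i * 2);
}

void main()
{
    // "locked_writer_demo.tsv" is a hypothetical output path.
    writeLocked("locked_writer_demo.tsv", 1_000);
}
```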


-Steve


Re: What are the best available D (not C) File input/output options?

2023-11-02 Thread Julian Fondren via Digitalmars-d-learn

On Thursday, 2 November 2023 at 15:46:23 UTC, confuzzled wrote:
I've ported a small script from C to D. The original C version 
takes roughly 6.5 minutes to parse a 12G file while the port 
originally took about 48 minutes. My naïve attempt to improve 
the situation pushed it over an hour and 15 minutes. However, 
replacing std.stdio:File with core.stdc.stdio:FILE* and 
changing my output code in this latest version from:


outputFile.writefln("%c\t%u\t%u\t%d.%09u\t%c", ...)

to
fprintf(outputFile, "%c,%u,%u,%llu.%09llu,%c\n", ...)

reduced the processing time to roughly 7.5 minutes. Why is 
File.writefln() so appallingly slow? Is there a better D 
alternative?


First, strace your program. The slowest thing about I/O is the 
syscall itself. If the D program does more syscalls, it's going 
to be slower almost no matter what else is going on. Both D and C 
are using libc to buffer I/O to reduce syscalls, but you might be 
defeating that by constantly flushing the buffer.




I tried std.io but write() only outputs ubyte[] while I'm 
trying to output text so I abandoned idea early.


string -> immutable(ubyte)[]: alias with 
std.string.representation(st)


'alias' meaning, this doesn't allocate. It gives you a byte slice 
of the same memory the string is using.


You'd still need to do the formatting, before writing.
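
A small sketch of that two-step approach, assuming a made-up record layout and a hypothetical helper name `recordBytes`: format first (this is where the allocation happens), then view the result as bytes with `representation`, which only reinterprets the slice.

```d
import std.format : format;
import std.string : representation;

/// Formats one record and returns it as bytes for a ubyte[]-only write().
/// (Hypothetical helper; the record layout is a placeholder.)
immutable(ubyte)[] recordBytes(char side, uint id, uint qty)
{
    string text = format("%c,%d,%d\n", side, id, qty);  // allocates the text
    return text.representation;                          // same memory, no copy
}

void main()
{
    import std.stdio : writeln;

    string s = "hello";
    auto bytes = s.representation;
    // Both slices start at the same address: nothing was copied.
    writeln(cast(const(void)*) bytes.ptr is cast(const(void)*) s.ptr);
    writeln(recordBytes('A', 42, 7).length);
}
```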

Now that I've got the program execution time within an 
acceptable range, I tried replacing core.stdc.fread() with 
std.io.read() but that increased the time to 24 minutes. Now 
I'm starting to think there is something seriously wrong with 
my understanding of how to use D correctly because there's no 
way D's input/output capabilities can suck so bad in comparison 
to C's.





What are the best available D (not C) File input/output options?

2023-11-02 Thread confuzzled via Digitalmars-d-learn
I've ported a small script from C to D. The original C version takes 
roughly 6.5 minutes to parse a 12G file while the port originally took 
about 48 minutes. My naïve attempt to improve the situation pushed it 
over an hour and 15 minutes. However, replacing std.stdio:File with 
core.stdc.stdio:FILE* and changing my output code in this latest version 
from:


outputFile.writefln("%c\t%u\t%u\t%d.%09u\t%c", ...)

to
fprintf(outputFile, "%c,%u,%u,%llu.%09llu,%c\n", ...)

reduced the processing time to roughly 7.5 minutes. Why is 
File.writefln() so appallingly slow? Is there a better D alternative?


I tried std.io but write() only outputs ubyte[] while I'm trying to 
output text so I abandoned the idea early. Now that I've got the program 
execution time within an acceptable range, I tried replacing 
core.stdc.fread() with std.io.read() but that increased the time to 24 
minutes. Now I'm starting to think there is something seriously wrong 
with my understanding of how to use D correctly because there's no way 
D's input/output capabilities can suck so bad in comparison to C's.


Re: File Input

2017-05-14 Thread Ali Çehreli via Digitalmars-d-learn

On 05/13/2017 09:15 PM, JV wrote:
> it doesn't pause and store but just keeps reading
>
> string studNum;
>
> readf("%s",&studNum);
> write(studNum);

That's the normal behavior for reading into strings. If you want to read 
to the end of the line, try this:


import std.stdio;
import std.string;

void main() {
write("What is your name? ");
string name = readln.strip;

writeln("Hello ", name, "!");
}

(It's the same as strip(readln()).)

Here is more information about readln() and strip() as well as 
formattedRead(), which can be more convenient:


  http://ddili.org/ders/d.en/strings.html
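
As a rough illustration of the formattedRead() route, here is a sketch that parses a name and a number from one line. The `Student` struct and `parseStudent` function are hypothetical names for this example, and a string literal stands in for readln() so it runs without input. Note that formattedRead throws if the line does not match the format.

```d
import std.format : formattedRead;
import std.string : strip;

/// Hypothetical record type for this sketch.
struct Student
{
    string name;
    int number;
}

/// Parses a line such as "Alice 42" into a Student.
Student parseStudent(string line)
{
    Student s;
    line = line.strip;                        // drop the trailing newline
    line.formattedRead("%s %d", s.name, s.number);
    return s;
}

void main()
{
    import std.stdio : writeln;
    // In a real program the argument would come from readln().
    auto s = parseStudent("Alice 42\n");
    writeln(s.name, " has number ", s.number);
}
```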

Ali



Re: File Input

2017-05-14 Thread k-five via Digitalmars-d-learn

On Sunday, 14 May 2017 at 04:15:02 UTC, JV wrote:

Hey i'm not sure if i should create a new post for this but
how should i fix this it doesn't pause and store but just keeps 
reading


string studNum;

readf("%s",&studNum);
write(studNum);


Can you say exactly what you need?
It is better to be familiar with C and FILE in C before using File 
in D

However, do not hesitate to ask; just feel free and ask


Re: File Input

2017-05-13 Thread JV via Digitalmars-d-learn

On Monday, 8 May 2017 at 10:34:42 UTC, k-five wrote:

On Monday, 8 May 2017 at 10:22:53 UTC, JV wrote:

On Monday, 8 May 2017 at 09:26:48 UTC, k-five wrote:

On Monday, 8 May 2017 at 08:54:50 UTC, JV wrote:


---

If I continue to learn D I will do but there is no guarantee
and it got ready :)
https://github.com/k-five/D-By-Example

if you have git, download them:
git clone https://github.com/k-five/D-By-Example.git


Hey i'm not sure if i should create a new post for this but
how should i fix this it doesn't pause and store but just keeps 
reading


string studNum;

readf("%s",&studNum);
write(studNum);


Re: File Input

2017-05-08 Thread k-five via Digitalmars-d-learn

On Monday, 8 May 2017 at 10:22:53 UTC, JV wrote:

On Monday, 8 May 2017 at 09:26:48 UTC, k-five wrote:

On Monday, 8 May 2017 at 08:54:50 UTC, JV wrote:


---

If I continue to learn D I will do but there is no guarantee
and it got ready :)
https://github.com/k-five/D-By-Example

if you have git, download them:
git clone https://github.com/k-five/D-By-Example.git


Re: File Input

2017-05-08 Thread JV via Digitalmars-d-learn

On Monday, 8 May 2017 at 09:26:48 UTC, k-five wrote:

On Monday, 8 May 2017 at 08:54:50 UTC, JV wrote:

[...]

---

Do not worry. Your request is not rude. I give you a better 
tool. I finished to collect some examples in D and in a few 
days I will share it in my githup

https://github.com/k-five/
and you can use it. I tested all examples and even put the 
output of each, beside the example.


[...]


Thank you and i hope to see more of you collection about D's 
filestream its very helpful


Re: File Input

2017-05-08 Thread k-five via Digitalmars-d-learn

On Monday, 8 May 2017 at 08:54:50 UTC, JV wrote:

On Sunday, 7 May 2017 at 16:40:50 UTC, k-five wrote:

On Sunday, 7 May 2017 at 15:59:25 UTC, JV wrote:

---

Do not worry. Your request is not rude. I will give you a better tool. 
I have finished collecting some examples in D and in a few days I will 
share them on my GitHub

https://github.com/k-five/
and you can use them. I tested all the examples and even put the output 
of each beside the example.


here is the list of them:

├── 01_overview
├── 02_environment
├── 03_basic_syntax
├── 04_variable
├── 05_data-type
├── 06_enums
├── 07_literals
├── 08_operators
├── 09_loops
├── 10_decision
├── 11_functons
├── 12_characters
├── 13_strings
├── 14_array
├── 15_associative_array
├── 16_pointers
├── 17_tuple
├── 18_struct
├── 19_union
├── 20_range
├── 21_alias
├── 22_mixin
├── 23_module
├── 24_template
├── 25_immutable_type
├── 26_file_io
├── 27_thread
├── 28_exception
├── 29_contract_programming
├── 30_conditional_compilation
├── 31_classes
├── 32_inheritance
├── 33_overloading
├── 34_encapsulation
├── 35_interface
└── 36_abstract

plus
some examples for D feature and how to install d-mode on emacs 
editor


I have only been familiar with D for two weeks, so I am a beginner too.
You will see these examples (almost 190) on my GitHub by 
Friday.






Re: File Input

2017-05-08 Thread JV via Digitalmars-d-learn

On Sunday, 7 May 2017 at 16:40:50 UTC, k-five wrote:

On Sunday, 7 May 2017 at 15:59:25 UTC, JV wrote:

[...]



[...]


--

You have the right for confusing :) there is many read and 
write names. But I assumed you are familiar with [Type] and 
[Object] concept.


in:
auto output_file_stream = File( "file.txt", "w" );

auto = File  == A type
File( "file.txt", "w" ); == Constructor

So this type has its own property, like read for "r" mode and 
write for "w" mode.


So you should use output_file_stream.write(), not readf or so 
on.


Still I am very new in D, but this is the same concept in other 
language like C++


in C++:
#include <iostream>
#include <fstream>
#include <string>

int main(int argc, char **argv)
{

std::ofstream ofs( "file.txt" );
std::string line = "This is the first line";
// write is a method in class ofstream
ofs.write( &*line.begin(), line.length() );
ofs.close();
}


Yeah i understand it very much like the other language like C/C++ 
and python..
since i'm self studying D language ..though i learn faster when 
there is a sample code
i don't know if it is rude but can i ask if you can give me a 
sample code for it?
a code for asking the user to enter something and then store it 
in a .txt file?


Thank you


Re: File Input

2017-05-07 Thread k-five via Digitalmars-d-learn

On Sunday, 7 May 2017 at 15:59:25 UTC, JV wrote:

On Sunday, 7 May 2017 at 15:16:58 UTC, k-five wrote:

On Sunday, 7 May 2017 at 13:57:47 UTC, JV wrote:


I'm kinda getting it but how do i write the stored user 
input(string) varaible into a .txt??im getting confused since D 
has so many read and write


 ->sample below
string num;
auto attendance= File("studAttendance.txt","a+");

writeln("Add Student Attendance");
readf("%s ",&num);//im not sure if this is correct but 
assuming it works
  //how do i write what is stored in 
num in the studAttendance.txt

  //file??

attendance.close();


--

You have every right to be confused :) there are many read and write 
names. But I assumed you are familiar with the [Type] and [Object] 
concepts.


in:
auto output_file_stream = File( "file.txt", "w" );

auto = File  == A type
File( "file.txt", "w" ); == Constructor

So this type has its own methods, like read for "r" mode and 
write for "w" mode.


So you should use output_file_stream.write(), not readf and so on.

I am still very new to D, but this is the same concept as in other 
languages like C++


in C++:
#include <iostream>
#include <fstream>
#include <string>

int main(int argc, char **argv)
{

std::ofstream ofs( "file.txt" );
std::string line = "This is the first line";
// write is a method in class ofstream
ofs.write( &*line.begin(), line.length() );
ofs.close();
}
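
The same idea in D, as a minimal sketch for the attendance question above: read one line from the console into a variable, then write that variable to the file through the File object. The helper name `appendLine` is made up for this example; the file name is the one from the question.

```d
import std.stdio : File, readln, write;
import std.string : strip;

/// Appends one line of text to the file at `path`, creating it if needed.
/// (Hypothetical helper for this sketch.)
void appendLine(string path, string text)
{
    auto f = File(path, "a");   // "a" = append mode
    f.writeln(text);            // the File's own writeln, not the console one
}

void main()
{
    write("Add Student Attendance: ");
    string num = readln().strip;          // read a line, drop the newline
    appendLine("studAttendance.txt", num);
}
```

The File closes itself when `f` goes out of scope, so an explicit close() is optional here.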


Re: File Input

2017-05-07 Thread JV via Digitalmars-d-learn

On Sunday, 7 May 2017 at 15:16:58 UTC, k-five wrote:

On Sunday, 7 May 2017 at 13:57:47 UTC, JV wrote:

Hi guys

I'd like to know how to get an input from the user to be 
stored in a .txt file using import std.file and is it possible 
to directly write in a .txt file without using a variable to 
store the user input?


Thanks for the answer in advance my mind is kinda jumbled 
about this since im new to this language.


First of all see here:
https://dlang.org/phobos/std_stdio.html#.File

also:

import std.stdio; // for File

void main(){

// an output file with name file.txt
// w for writing
auto ofs = File( "file.txt", "w" );

// output file stream:
ofs.write( stdin.readln() ); // get a line from console
ofs.close();
}


cat file.txt:
This is the first line.


and for std.file:
https://dlang.org/phobos/std_file.html




I'm kinda getting it but how do i write the stored user 
input(string) varaible into a .txt??im getting confused since D 
has so many read and write


 ->sample below
string num;
auto attendance= File("studAttendance.txt","a+");

writeln("Add Student Attendance");
readf("%s ",&num);//im not sure if this is correct but 
assuming it works
  //how do i write what is stored in num 
in the studAttendance.txt

  //file??

attendance.close();


Re: File Input

2017-05-07 Thread Suliman via Digitalmars-d-learn

On Sunday, 7 May 2017 at 13:57:47 UTC, JV wrote:

Hi guys

I'd like to know how to get an input from the user to be stored 
in a .txt file using import std.file and is it possible to 
directly write in a .txt file without using a variable to store 
the user input?


Thanks for the answer in advance my mind is kinda jumbled about 
this since im new to this language.


http://nomad.so/2015/09/working-with-files-in-the-d-programming-language/


Re: File Input

2017-05-07 Thread k-five via Digitalmars-d-learn

On Sunday, 7 May 2017 at 13:57:47 UTC, JV wrote:

Hi guys

I'd like to know how to get an input from the user to be stored 
in a .txt file using import std.file and is it possible to 
directly write in a .txt file without using a variable to store 
the user input?


Thanks for the answer in advance my mind is kinda jumbled about 
this since im new to this language.


First of all see here:
https://dlang.org/phobos/std_stdio.html#.File

also:

import std.stdio; // for File

void main(){

// an output file with name file.txt
// w for writing
auto ofs = File( "file.txt", "w" );

// output file stream:
ofs.write( stdin.readln() ); // get a line from console
ofs.close();
}


cat file.txt:
This is the first line.


and for std.file:
https://dlang.org/phobos/std_file.html


File Input

2017-05-07 Thread JV via Digitalmars-d-learn

Hi guys

I'd like to know how to get an input from the user to be stored 
in a .txt file using import std.file and is it possible to 
directly write in a .txt file without using a variable to store 
the user input?


Thanks for the answer in advance my mind is kinda jumbled about 
this since im new to this language.









Re: d2 file input performance

2011-10-17 Thread Marco Leise
On 04.09.2011 at 19:01, Christian Köstlin wrote:



On 9/3/11 7:53 , dennis luehring wrote:

On 26.08.2011 19:43, Christian Köstlin wrote:

Hi guys,


i started the thread:
http://stackoverflow.com/questions/7202710/fastest-way-of-reading-bytes-in-d2

on stackoverflow, because i ran into kind of a problem.

i wanted to read data from a file (or even better from a stream, but
lets stay with file), byte-by-byte. the whole thing was part of my
protobuf implementation for d2, and there you have to look at each byte
to read out the varints. i was very proud of my implementation until i
benchmarked it first against java (ok ... i was a little slower than
java) and then against c++ (ok ... this was a complete different game).

after some optimizing i got better, but was still way slower than c++.
so i started some small microbenchmarks regarding fileio:
https://github.com/gizmomogwai/performance in c++, java and d2.

could you help me improve on the d2 performance? i am sure, that i am
missing something fundamental, because i thing it should be at least
possible be equal or better than java.

thanks in advance

christian


i would change the test scenario a little bit

1. use a ramdisk - so stuff like location on disk, fragmentation, driver
speed will be reduced to a little bit of noise

2. make your scenario much bigger

3. it would be interesting to see, for example, cumulated times for every 1000
benchmark steps or something like that - to see caching coming in etc.

running 10.000 times

time for 1. 1000 steps xyzw
time for 2. 1000 steps xyzw
time for 3. 1000 steps xyzw
time for 4. 1000 steps xyzw
overall time ... xyz
...

good point ...
will see if i can adapt the tests...

cK


-release -O -inline -noboundscheck is the full set of options to use for D2. In D1,  
-release already included -noboundscheck.


Re: d2 file input performance

2011-09-04 Thread Christian Köstlin

On 9/3/11 7:53 , dennis luehring wrote:

On 26.08.2011 19:43, Christian Köstlin wrote:

Hi guys,


i started the thread:
http://stackoverflow.com/questions/7202710/fastest-way-of-reading-bytes-in-d2

on stackoverflow, because i ran into kind of a problem.

i wanted to read data from a file (or even better from a stream, but
lets stay with file), byte-by-byte. the whole thing was part of my
protobuf implementation for d2, and there you have to look at each byte
to read out the varints. i was very proud of my implementation until i
benchmarked it first against java (ok ... i was a little slower than
java) and then against c++ (ok ... this was a complete different game).

after some optimizing i got better, but was still way slower than c++.
so i started some small microbenchmarks regarding fileio:
https://github.com/gizmomogwai/performance in c++, java and d2.

could you help me improve on the d2 performance? i am sure, that i am
missing something fundamental, because i thing it should be at least
possible be equal or better than java.

thanks in advance

christian


i would change the test scenario a little bit

1. use a ramdisk - so stuff like location on disk, fragmentation, driver
speed will be reduced to a little bit of noise

2. make your scenario much bigger

3. it would be interesting to see, for example, cumulated times for every 1000
benchmark steps or something like that - to see caching coming in etc.

running 10.000 times

time for 1. 1000 steps xyzw
time for 2. 1000 steps xyzw
time for 3. 1000 steps xyzw
time for 4. 1000 steps xyzw
overall time ... xyz
...

good point ...
will see if i can adapt the tests...

cK



Re: d2 file input performance

2011-09-04 Thread Christian Köstlin

On 9/1/11 7:24 , David Nadlinger wrote:

On 9/1/11 7:12 AM, Christian Köstlin wrote:

Update:

I added performance tests for ldc and gdc with the same programs.
The results are interesting (please see the github page for the details).


Oh wow, LDC must accidentally call some druntime functions for the
ubyte[1] case, or something similar, could you please file a ticket at
http://dsource.org/projects/ldc/newticket?

Thanks,
David

hi david,

i am not sure what i have to do to open a ticket, i suppose that i 
should get a trac account and so on. but what would be the description? 
i suppose the time for this particular test is quite strange and out of 
bounds :)


right now my tests show that ldc seems to be faster than dmd and slower 
than gdc in most cases. even my c++ program runs slower compiled with 
llvm-c++.


perhaps you could open the bug report? feel free to point to the github 
repository or take the source and put it into the bugreport.


thanks for your feedback

christian


Re: d2 file input performance

2011-09-02 Thread dennis luehring

On 26.08.2011 19:43, Christian Köstlin wrote:

Hi guys,


i started the thread:
http://stackoverflow.com/questions/7202710/fastest-way-of-reading-bytes-in-d2
on stackoverflow, because i ran into kind of a problem.

i wanted to read data from a file (or even better from a stream, but
lets stay with file), byte-by-byte. the whole thing was part of my
protobuf implementation for d2, and there you have to look at each byte
to read out the varints. i was very proud of my implementation until i
benchmarked it first against java (ok ... i was a little slower than
java) and then against c++ (ok ... this was a complete different game).

after some optimizing i got better, but was still way slower than c++.
so i started some small microbenchmarks regarding fileio:
https://github.com/gizmomogwai/performance in c++, java and d2.

could you help me improve on the d2 performance? i am sure, that i am
missing something fundamental, because i thing it should be at least
possible be equal or better than java.

thanks in advance

christian


i would change the test scenario a little bit

1. use a ramdisk - so stuff like location on disk, fragmentation, driver 
speed will be reduced to a little bit of noise


2. make your scenario much bigger

3. it would be interesting to see, for example, cumulated times for every 1000 
benchmark steps or something like that - to see caching coming in etc.


running 10.000 times

time for 1. 1000 steps xyzw
time for 2. 1000 steps xyzw
time for 3. 1000 steps xyzw
time for 4. 1000 steps xyzw
overall time ... xyz
...


Re: d2 file input performance

2011-08-31 Thread David Nadlinger

On 9/1/11 7:12 AM, Christian Köstlin wrote:

Update:

I added performance tests for ldc and gdc with the same programs.
The results are interesting (please see the github page for the details).


Oh wow, LDC must accidentally call some druntime functions for the 
ubyte[1] case, or something similar, could you please file a ticket at 
http://dsource.org/projects/ldc/newticket?


Thanks,
David


Re: d2 file input performance

2011-08-31 Thread Christian Köstlin

Update:

I added performance tests for ldc and gdc with the same programs.
The results are interesting (please see the github page for the details).

regards

christian

On 8/26/11 19:43 , Christian Köstlin wrote:

Hi guys,


i started the thread:
http://stackoverflow.com/questions/7202710/fastest-way-of-reading-bytes-in-d2
on stackoverflow, because i ran into kind of a problem.

i wanted to read data from a file (or even better from a stream, but
lets stay with file), byte-by-byte. the whole thing was part of my
protobuf implementation for d2, and there you have to look at each byte
to read out the varints. i was very proud of my implementation until i
benchmarked it first against java (ok ... i was a little slower than
java) and then against c++ (ok ... this was a complete different game).

after some optimizing i got better, but was still way slower than c++.
so i started some small microbenchmarks regarding fileio:
https://github.com/gizmomogwai/performance in c++, java and d2.

could you help me improve on the d2 performance? i am sure, that i am
missing something fundamental, because i thing it should be at least
possible be equal or better than java.

thanks in advance

christian




Re: d2 file input performance

2011-08-29 Thread Christian Köstlin

On 8/26/11 23:56 , bearophile wrote:

Steven Schveighoffer:


In fact, it would probably be faster.


I suggest the OP to keep us updated on this matter. And later after some time, 
if no solutions are found, to bring the issue to the main D newsgroup and to 
Bugzilla too. This is a significant issue.

Bye,
bearophile

Small update:
I added some more example implementations as a reaction to Mehrdad's 
suggestion to make sure to use the same file-read api. So the c++ and 
the d version both load libc dynamically and from that the symbol fread.

respective times from c++ and d: 115ms vs. 504ms.

the only thing i could also try is to use ldc or gdc (but i first have 
to install those).



regards
christian



Re: d2 file input performance

2011-08-28 Thread Heywood Floyd
Christian Köstlin Wrote:
> after some optimizing i got better, but was still way slower than c++. 
> so i started some small microbenchmarks regarding fileio: 
> https://github.com/gizmomogwai/performance in c++, java and d2.
> 
> christian


Hello!

Thanks for your effort in putting this together!


I found this interesting and played around with some of your code examples.
My findings differ somewhat from yours, so I thought I'd post them.



From what I can tell, G++ does generate almost twice (~1.9x) as fast code, in 
the fread()/File-example, as DMD. Even though the D-code does handle errors 
encountered by fread(), that certainly can't explain the dramatic difference 
in speed alone.

It would be very interesting to see how GDC and LDC perform in these tests!! (I 
don't have them installed.)



Anyway, here are my notes:

I concentrated on the G++-fread-example and the DMD-File-example, as they seem 
comparable enough. However, I did some changes to the benchmark in order to 
"level" the playing field:

  1) Made sure both C++ and D used a 1 kb (fixed-size) buffer
  2) Made sure the underlying setvbuf() buffer is the same (64 kb)
  3) Made sure the read data has an actual side effect by printing out the 
accumulated data after each file. (Cheapo CRC)

The last point, 3, particularly seemed to make the G++-example considerably 
slower, perhaps hinting G++ is otherwise doing some clever optimization here. 
The second point, 2, seemed to have no effect on C++, but it helped somewhat 
for D. This may hint at C++ doing its own buffering or something. (?) In that 
case the benchmark is bogus.

Anyway, these are the results:

(G++ 4.2.1, fread()+crc, osx)
G++      1135 ms  (no flags)
G++       399 ms  -O1
G++       368 ms  -O2
G++       368 ms  -O3
G++nofx   156 ms  -O3 (Disqualified!)

(DMD 2.054, rawRead()+crc, osx)
DMD       995 ms  (no flags)
DMD       913 ms  -O
DMD       888 ms  -release
DMD       713 ms  -release -O -inline
DMD       703 ms  -release -O
DMD       693 ms  -release -O -inline -noboundscheck

Well, I suppose a possible (and to me plausible) explanation is that G++'s 
optimizations are a lot more extensive than DMD's.

Skipping printing out the CRC-value ("nofx") makes the C++ code more than twice 
as fast. Note that the code calculating the CRC-value is still in place, the 
value is just not printed out, and surely, calling printf() 10 times can hardly 
account for a 200 ms increase. (?) I think it's safe to assume code is simply 
being ignored here, as it's not having any side effect.

My gut feel is DMD is not doing inlining, at least not to the same extent G++ 
is, as that seems to be especially important since we're making a function call 
for every single byte here. (Using the -inline flag even seems to make the D 
code slower. Weird.) But of course I don't really know. Again, GDC and LDC 
would be interesting to see here.

Finally, to this I must add the size of the generated binary:

G++   15 kb
DMD   882 kb

Yikes. I believe there's nothing (large enough) to hide behind for DMD there.



That's it!
Kind regards
/HF





Here's the modified code: (Original https://github.com/gizmomogwai/performance)

// - - - - - - 8< - - - - - - 

import  std.stdio,
        std.datetime,
        core.stdc.stdio;

struct FileReader
{
private:
    File file;

    enum BUFFER_SIZE = 1024;
    ubyte[BUFFER_SIZE] readBuf;
    size_t pos, len;

    this(string name){
        file = File(name, "rb");
        //setbuf(file.getFP(), null); // No buffer
        setvbuf(file.getFP(), null, _IOFBF, BUFFER_SIZE * 64);
    }

    bool fillBuffer()
    {
        auto tmpBuf = file.rawRead(readBuf);
        len = tmpBuf.length;
        pos = 0;
        return len > 0;
    }

public:
    int read()
    {
        if(pos == len){
            if(fillBuffer() == false)
                return -1;
        }
        return readBuf[pos++];
    }
}

size_t readBytes()
{
    size_t count = 0;
    ulong crc = 0;
    for (int i=0; i<10; i++) {
        auto file = FileReader("/tmp/shop_with_ids.pb");
        auto data = file.read();
        while(data != -1){
            count++;
            crc += data;
            data = file.read();
        }
        writeln(crc);
    }
    return count;
}


int main(string[] args) {
  auto sw = StopWatch(AutoStart.no);
  sw.start();
  auto count = readBytes();
  sw.stop();
  writeln("d2-6-B", count, "", sw.peek().msecs, 
"using s

Re: d2 file input performance

2011-08-26 Thread bearophile
Steven Schveighoffer:

> In fact, it would probably be faster.

I suggest the OP keep us updated on this matter. Later, if no solutions are 
found after some time, the issue should be brought to the main D newsgroup and to 
Bugzilla too. This is a significant issue.

Bye,
bearophile


Re: d2 file input performance

2011-08-26 Thread Steven Schveighoffer
On Fri, 26 Aug 2011 13:43:23 -0400, Christian Köstlin  
 wrote:



Hi guys,


i started the thread:  
http://stackoverflow.com/questions/7202710/fastest-way-of-reading-bytes-in-d2  
on stackoverflow, because i ran into kind of a problem.


i wanted to read data from a file (or even better from a stream, but  
lets stay with file), byte-by-byte. the whole thing was part of my  
protobuf implementation for d2, and there you have to look at each byte  
to read out the varints. i was very proud of my implementation until i
benchmarked it first against java (ok ... i was a little slower than  
java) and then against c++ (ok ... this was a complete different game).


after some optimizing i got better, but was still way slower than c++.  
so i started some small microbenchmarks regarding fileio:  
https://github.com/gizmomogwai/performance in c++, java and d2.


could you help me improve on the d2 performance? i am sure, that i am  
missing something fundamental, because i thing it should be at least  
possible be equal or better than java.


thanks in advance


Two things:

First, there is a large difference:

C++ version:

 int read() {...}

D version:

int read(ubyte *bufferptr) {...}

This may not be optimized as well.  You should make it the same.

Second, use -inline, it will help tremendously.

I'd bet money that the largest slowdown is the function calls.  Inlining  
makes things run so much faster it's not even funny.


Also, note that FILE* is *already buffered*, there is no reason to do  
anything but fgetc.  In fact, it would probably be faster.
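
A minimal sketch of what the plain fgetc loop could look like from D, relying on libc's own FILE* buffering rather than a hand-rolled buffer. The function name `sumBytes` and the simple byte sum (standing in for real per-byte work) are assumptions for illustration; whether it is actually faster should be measured.

```d
import core.stdc.stdio : EOF, FILE, fclose, fgetc, fopen;
import std.string : toStringz;

/// Reads the file one byte at a time with fgetc, counting and summing bytes.
/// Returns false if the file cannot be opened. (Hypothetical helper.)
bool sumBytes(string path, out ulong count, out ulong sum)
{
    FILE* fp = fopen(path.toStringz, "rb");
    if (fp is null)
        return false;
    scope (exit) fclose(fp);

    int c;
    while ((c = fgetc(fp)) != EOF)   // libc refills its buffer as needed
    {
        ++count;
        sum += c;
    }
    return true;
}

void main(string[] args)
{
    import std.stdio : writeln;

    if (args.length < 2)
    {
        writeln("usage: ", args[0], " <file>");
        return;
    }
    ulong count, sum;
    if (sumBytes(args[1], count, sum))
        writeln(count, " bytes, checksum ", sum);
    else
        writeln("could not open ", args[1]);
}
```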


-Steve


d2 file input performance

2011-08-26 Thread Christian Köstlin

Hi guys,


i started the thread: 
http://stackoverflow.com/questions/7202710/fastest-way-of-reading-bytes-in-d2 
on stackoverflow, because i ran into kind of a problem.


i wanted to read data from a file (or even better from a stream, but 
lets stay with file), byte-by-byte. the whole thing was part of my 
protobuf implementation for d2, and there you have to look at each byte 
to read out the varints. i was very proud of my implementation until i
benchmarked it first against java (ok ... i was a little slower than 
java) and then against c++ (ok ... this was a complete different game).
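
to show why every byte has to be looked at, here is a minimal sketch of decoding one base-128 varint from a byte slice. it is not the code from my protobuf implementation; the name `readVarint` and its signature are made up for this example. each byte carries 7 payload bits, and a set high bit means another byte follows.

```d
/// Decodes one protobuf-style base-128 varint from the front of `data`.
/// On success stores the result in `value`, advances `data` past the
/// varint and returns true. On failure `value` may hold a partial result.
bool readVarint(ref const(ubyte)[] data, out ulong value)
{
    uint shift = 0;
    // At most 10 bytes (shift 0, 7, ..., 63) fit into a 64-bit value.
    for (size_t i = 0; i < data.length && shift < 64; ++i, shift += 7)
    {
        immutable ubyte b = data[i];
        value |= cast(ulong)(b & 0x7F) << shift;   // low 7 bits are payload
        if ((b & 0x80) == 0)                       // high bit clear: last byte
        {
            data = data[i + 1 .. $];
            return true;
        }
    }
    return false;   // truncated input or an over-long varint
}

void main()
{
    import std.stdio : writeln;

    const(ubyte)[] input = [0xAC, 0x02, 0x01];
    ulong v;
    while (readVarint(input, v))
        writeln(v);   // prints 300, then 1
}
```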


after some optimizing i got better, but was still way slower than c++. 
so i started some small microbenchmarks regarding fileio: 
https://github.com/gizmomogwai/performance in c++, java and d2.


could you help me improve on the d2 performance? i am sure that i am 
missing something fundamental, because i think it should at least be 
possible to be equal to or better than java.


thanks in advance

christian