[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981 Jerry DeLisle changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |FIXED --- Comment #14 from Jerry DeLisle --- Nothing, closing
[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981 Dominique d'Humieres changed: What|Removed |Added Status|NEW |WAITING --- Comment #13 from Dominique d'Humieres --- What is missing to close this PR as FIXED?
[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981 --- Comment #12 from Jerry DeLisle --- $ strace -c ./a.out % time seconds usecs/call callserrors syscall -- --- --- - - 99.970.078667 0 100 write 0.030.22 211 fstat 0.000.00 0 5 read 0.000.00 01710 open 0.000.00 0 7 close 0.000.00 0 8 6 stat 0.000.00 017 mmap 0.000.00 012 mprotect 0.000.00 0 1 munmap 0.000.00 0 3 brk 0.000.00 010 rt_sigaction 0.000.00 010 9 access 0.000.00 0 1 execve 0.000.00 0 1 readlink 0.000.00 0 1 arch_prctl -- --- --- - - 100.000.078689 100010425 total This is looking pretty good now!
[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981 --- Comment #11 from Jerry DeLisle --- Janne, can you post some new benchmarks for comparison?
[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981 --- Comment #10 from Janne Blomqvist --- Author: jb Date: Sun Jun 8 05:43:29 2014 New Revision: 211353 URL: http://gcc.gnu.org/viewcvs?rev=211353&root=gcc&view=rev Log: PR 56981 Flush buffer at record boundary if possible. 2014-06-08 Janne Blomqvist PR libfortran/56981 * io/unix.h (struct stream_vtable): Add new member function, markeor. (smarkeor): New inline function. (flush_if_unbuffered): Remove prototype. * io/unix.c (raw_markeor): New function. (raw_vtable): Initialize markeor member. (buf_markeor): New function. (buf_vtable): Initialize markeor member. (mem_vtable): Likewise. (mem4_vtable): Likewise. (flush_if_unbuffered): Remove function. * io/transfer.c (next_record): Call smarkeor instead of flush_if_unbuffered. Modified: trunk/libgfortran/ChangeLog trunk/libgfortran/io/transfer.c trunk/libgfortran/io/unix.c trunk/libgfortran/io/unix.h
[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981 Dominique d'Humieres changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2013-12-21 Ever confirmed|0 |1
[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981 --- Comment #9 from Tobias Burnus 2013-04-29 09:36:04 UTC --- Follow-up idea regarding the flushing of when the buffer is full to avoid unnecessary seeks: http://gcc.gnu.org/ml/fortran/2013-04/msg00258.html
[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981 --- Comment #8 from Tobias Burnus 2013-04-29 09:34:58 UTC --- Author: jb Date: Mon Apr 29 08:42:00 2013 New Revision: 198390 URL: http://gcc.gnu.org/viewcvs?rev=198390&root=gcc&view=rev Log: PR 56981 Improve unbuffered performance on special files. 2013-04-29 Janne Blomqvist PR fortran/56981 * io/transfer.c (next_record_w_unf): First fix head marker, then write tail. (next_record): Call flush_if_unbuffered. * io/unix.c (struct unix_stream): Add field unbuffered. (flush_if_unbuffered): New function. (fd_to_stream): New argument. (open_external): Fix fd_to_stream call. (input_stream): Likewise. (output_stream): Likewise. (error_stream): Likewise. * io/unix.h (flush_if_unbuffered): New prototype. Modified: trunk/libgfortran/ChangeLog trunk/libgfortran/io/transfer.c trunk/libgfortran/io/unix.c trunk/libgfortran/io/unix.h
[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981 Janne Blomqvist changed: What|Removed |Added URL||http://gcc.gnu.org/ml/gcc-p ||atches/2013-04/msg01127.htm ||l --- Comment #7 from Janne Blomqvist 2013-04-19 10:34:20 UTC --- Patch implementing the "unbuffered really means buffered but flush after every write" idea: http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01127.html
[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981 --- Comment #6 from Jerry DeLisle 2013-04-18 01:21:42 UTC --- I like Jannes idea with the flags. Also, it seems that at the time we open a file we know it is /dev/null or /dev/nul in some cases by the file name. It would be very low overhead in a few cases to disable some or all checks and even disable the writing completely. We would not get all situations, but the low hanging fruit we could. It could be done by setting a "NULL" bit. One could consider doing this at compile time in some cases where the frontend could have more elaborate configuration checks that determine the name of the null device on the target system and look for its use. (probably not really worth if fur NULL I/O The other idea to consider is a compiler flag, say -fast-IO or similar that also disables the extra error checking that is not critical to runtime after a program has been debugged.
[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981 --- Comment #5 from Tobias Burnus 2013-04-17 14:50:16 UTC --- (In reply to comment #4) > The reason why gfortran is slow here is that for non-regular files we use > unbuffered I/O. If you write to a regular file instead of /dev/null, you'll > see us doing ~8 KB writes at a time. > > The reason for this is that non-regular files (a.k.a. special files) are > special in many ways wrt seeking. Some allow seeking just fine, some always > return 0, some return an error (and which special files behave in which way is > to some extent different on different OS'es). I do not understand the argument regarding seek. If seek doesn't work - why should there be a problem with buffering but not without? At least with SEQUENTIAL one cannot do without (buffer exceeded or no buffering) and with STREAM no seek should be required. > Also, for special files users often expect non-buffered IO, e.g. they want > output on the terminal directly instead of waiting until the 8 KB buffer fills > up, programs communicating via pipes can deadlock if data sits in the buffers, > etc. But the code should be able to wait until a complete record has been written? That should be rather quick, unless one write a 2GB array. I am not talking about flushing the data only when 8kB are filled or when the file is closed. And doing buffering within a record avoids seeks. > One could of course make "unbuffered" I/O in gfortran really mean "flush > the buffer at the end of each I/O statement" rather than not using a buffer at > all. We should consider this. * * * I have now updated timings with writing to a file. Results for the example in comment 0, but writing to a file ("test.dat", tmpfs). Unformatted is much faster with a normal file, but some others compilers are still significantly faster. And for formatted, all other compilers are significantly faster. Timing in sec Unformatted Formatted real / user real / user Compiler --- --- - 0.378/0.352 2.815/2.804 GCC 4.8.0 (-Ofast, 20130308, Rev. 196547) 0.307/0.296 1.303/1.288 g95 4.0.3 (g95 0.93!) Aug 17 2010 (-O3) 0.210/0.196 0.555/0.532 Sun Fortran 95 8.3 Linux_i386 2007/05/03 0.208/0.184 0.920/0.888 PathScale 3.2.99 0.176/0.152 2.185/2.168 NAGWare Fortran 5.1 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 0.127/0.125 1.091/1.080 GCC 4.9 (trunk, -Ofast) 0.120/0.118 0.465/0.459 g95 4.0.3 (g95 0.94!) Dec 17 2012 0.136/0.131 0.527/0.524 PathScale EKOPath 4.9.0 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 0.335/0.316 2.866/2.860 GCC 4.7.2 20120920 (Cray Inc.) 0.204/0.188 0.659/0.628 Cray Fortran : Version 8.1.6 0.881/0.328 1.281/0.672 Intel 64, Version 13.1.1.163 0.444/0.432 0.884/0.864 pgf90 12.10-0 ---
[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981 --- Comment #4 from Janne Blomqvist 2013-04-17 10:50:07 UTC --- The reason why gfortran is slow here is that for non-regular files we use unbuffered I/O. If you write to a regular file instead of /dev/null, you'll see us doing ~8 KB writes at a time. On my system, timing writing to /dev/null gives real0m0.727s user0m0.272s sys 0m0.452s whereas writing to a file gives real0m0.202s user0m0.180s sys 0m0.020s The reason for this is that non-regular files (a.k.a. special files) are special in many ways wrt seeking. Some allow seeking just fine, some always return 0, some return an error (and which special files behave in which way is to some extent different on different OS'es). As the buffered IO keeps track of the logical file pointer position, it can easily get out of sync with the physical position if it doesn't behave as for a regular file. Also, for special files users often expect non-buffered IO, e.g. they want output on the terminal directly instead of waiting until the 8 KB buffer fills up, programs communicating via pipes can deadlock if data sits in the buffers, etc. One could of course make "unbuffered" I/O in gfortran really mean "flush the buffer at the end of each I/O statement" rather than not using a buffer at all and instead using the raw POSIX I/O syscalls. This would perhaps not be a bad idea per se, but would require making the buffered I/O code handle special files in some sensible way. Another reason for gfortran slowness is that we do quite a lot of checking in data_transfer_init(), which means that there's quite a lot of per-record overhead. Writing a single element unformatted is thus the worst case. One way to speed up data_transfer_init, I think, is that instead of checking each flag bit (which says which I/O specifiers are present) separately, create a variable with forbidden flags for each I/O type (unformatted/formatted, sequential/direct/stream => 6x), and check the entire flag variable once (flag & forbidden_flags == 0). Only if there is an error, do the bit-by-bit checking in order to generate the error message.
[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981 --- Comment #3 from Tobias Burnus 2013-04-17 09:39:58 UTC --- (In reply to comment #2) > There is a seek inside next_record_w_unf. That function is used for DIRECT > I/O. > Looks conceptually wrong to me for sequential unformatted. I won't have time > for a few days to look at this further. Well, what gfortran does is: * write place-holder record length in the heading record marker * write actual data * write tailing record marker (1st call to write_us_marker in next_record_w_unf) * write actual length of this record, i.e. seek back + write_us_marker + see to past the tailing record marker (all in next_record_w_unf) I think what other compilers do is to make use of the following item in the Fortran standard: "The value of the RECL= specifier shall be positive. It specifies the length of each record in a file being connected for direct access, or specifies the maximum length of a record in a file being connected for sequential access." (F2008, "9.5.6.15 RECL= specifier in the OPEN statement") I tried the following program: --- integer, allocatable :: array(:) integer :: rl, i open(99,file="/dev/null",form="unformatted") inquire(99,recl=rl) allocate(array(1024*1024*100)) array = 0 print *,rl, size(array)/4 write(99) (array, i=1,1000) close(99) end --- With gfortran, it takes only: 0.203s and one has: 19 mmap 26 open 392 lseek 1784 write The question is why there are that many seeks. There should be only a single record! With pathf95, it fails after 0.099s with the error: "This request exceeds the maximum record size." And with g95, it takes 4.946s (!) until it fails with "Writing more data than the record size (RECL)": 11 close 17 fstat 20 mprotect 21 stat 25 mmap 30 write 47 open where the mmap+munmap pairs seem to take the lion share of the time. However, one can do better: NAG f95 only needs 0.007s and does: 5 read 6 lseek 8 mprotect 10 fstat 23 stat 29 mmap 40 open 2003 write Maybe something like the following would work: * Create a reasonable sized buffer * Use it to buffer the writes, and if it fits, write the length, the buffer, the length. * If the argument is a (too) big array, write the length of data in the buffer plus array byte size, then the data - and only if another item comes, seek to the beginning and update the length. That should take care of: write(99) i, j, k write(99) i, j, k, small_array write(99) big_array and even write(99) i, j, k, big_array but it will not help for write(99) big_array1, big_array2 I think that covers the most important cases. One question is how large the buffer should be initially, whether it should be resizable - and how long it should remain allocated. Even a small buffer of 1024 kbyte (= 128 real(8) values) will help when writing small data like in the example of comment 0. If it is larger, the issue of freeing the data and/or resizing becomes more important - and one needs to be careful not to require huge amount of memory and/or do do very frequent memory allocation+freeing, which causes the problems with g95. * * * Closer look at NAG: It does the following (allocate moved before open, inquire removed): open("/dev/null", O_RDWR) = 3 mmap(NULL, 3856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ab0e000 mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ab22000 fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0 ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fff645b1c90) = -1 ENOTTY (Inappropriate ioctl for device) fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0 ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fff645b2200) = -1 ENOTTY (Inappropriate ioctl for device) mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ac22000 lseek(3, 0, SEEK_CUR) = 0 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 419426304) = 419426304 (and 999 further write lines) lseek(3, 0, SEEK_SET) = 0 read(3, "", 4096) = 0 lseek(3, 12, SEEK_CUR) = 0 write(3, "\0\0\0\250", 4) = 4 lseek(3, 18446744072233156608, SEEK_SET) = 0 read(3, "", 4096) = 0 lseek(3, 20, SEEK_CUR) = 0 lseek(3, 0, SEEK_CUR) = 0 ftruncate(3, 0) = -1 EINVAL (Invalid argument) close(3)= 0 munmap(0x2ac22000, 4096)= 0 munmap(0x
[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981 --- Comment #2 from Jerry DeLisle 2013-04-17 00:58:02 UTC --- There is a seek inside next_record_w_unf. That function is used for DIRECT I/O. Looks conceptually wrong to me for sequential unformatted. I won't have time for a few days to look at this further.
[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981 Tobias Burnus changed: What|Removed |Added CC||burnus at gcc dot gnu.org --- Comment #1 from Tobias Burnus 2013-04-16 15:20:20 UTC --- For unformatted pathf95 and g95 have (top 10 entries of strace): 3 brk 4 close 3 getrlimit 4 ioctl 8 read 4 rt_sigaction 10 close 6 fstat 13 fstat 6 mprotect 20 mprotect 7 brk 21 stat 10 mmap 22 mmap 17 stat 46 open 23 open 367 write 734 write gfortran has the following syscalls, which are at least invoked twice: 3 brk 5 read 7 close 10 rt_sigaction 11 access 11 fstat 12 mprotect 16 stat 18 mmap 26 open 200 lseek 400 write