[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well

2016-02-26 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981

Jerry DeLisle  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED

--- Comment #14 from Jerry DeLisle  ---
Nothing, closing

[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well

2016-02-25 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981

Dominique d'Humieres  changed:

   What|Removed |Added

 Status|NEW |WAITING

--- Comment #13 from Dominique d'Humieres  ---
What is missing to close this PR as FIXED?

[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well

2014-06-08 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981

--- Comment #12 from Jerry DeLisle  ---
$ strace -c ./a.out 
% time seconds  usecs/call callserrors syscall
-- --- --- - - 
 99.970.078667   0   100   write
  0.030.22   211   fstat
  0.000.00   0 5   read
  0.000.00   01710 open
  0.000.00   0 7   close
  0.000.00   0 8 6 stat
  0.000.00   017   mmap
  0.000.00   012   mprotect
  0.000.00   0 1   munmap
  0.000.00   0 3   brk
  0.000.00   010   rt_sigaction
  0.000.00   010 9 access
  0.000.00   0 1   execve
  0.000.00   0 1   readlink
  0.000.00   0 1   arch_prctl
-- --- --- - - 
100.000.078689   100010425 total

This is looking pretty good now!


[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well

2014-06-08 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981

--- Comment #11 from Jerry DeLisle  ---
Janne, can you post some new benchmarks for comparison?


[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well

2014-06-07 Thread jb at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981

--- Comment #10 from Janne Blomqvist  ---
Author: jb
Date: Sun Jun  8 05:43:29 2014
New Revision: 211353

URL: http://gcc.gnu.org/viewcvs?rev=211353&root=gcc&view=rev
Log:
PR 56981 Flush buffer at record boundary if possible.

2014-06-08  Janne Blomqvist  

PR libfortran/56981
* io/unix.h (struct stream_vtable): Add new member function,
markeor.
(smarkeor): New inline function.
(flush_if_unbuffered): Remove prototype.
* io/unix.c (raw_markeor): New function.
(raw_vtable): Initialize markeor member.
(buf_markeor): New function.
(buf_vtable): Initialize markeor member.
(mem_vtable): Likewise.
(mem4_vtable): Likewise.
(flush_if_unbuffered): Remove function.
* io/transfer.c (next_record): Call smarkeor instead of
flush_if_unbuffered.

Modified:
trunk/libgfortran/ChangeLog
trunk/libgfortran/io/transfer.c
trunk/libgfortran/io/unix.c
trunk/libgfortran/io/unix.h


[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well

2013-12-21 Thread dominiq at lps dot ens.fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981

Dominique d'Humieres  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2013-12-21
 Ever confirmed|0   |1


[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well

2013-04-29 Thread burnus at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981



--- Comment #9 from Tobias Burnus  2013-04-29 
09:36:04 UTC ---

Follow-up idea regarding the flushing of when the buffer is full to avoid

unnecessary seeks: http://gcc.gnu.org/ml/fortran/2013-04/msg00258.html


[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well

2013-04-29 Thread burnus at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981



--- Comment #8 from Tobias Burnus  2013-04-29 
09:34:58 UTC ---

Author: jb

Date: Mon Apr 29 08:42:00 2013

New Revision: 198390



URL: http://gcc.gnu.org/viewcvs?rev=198390&root=gcc&view=rev

Log:

PR 56981 Improve unbuffered performance on special files.



2013-04-29  Janne Blomqvist  



PR fortran/56981

* io/transfer.c (next_record_w_unf): First fix head marker, then

write tail.

(next_record): Call flush_if_unbuffered.

* io/unix.c (struct unix_stream): Add field unbuffered.

(flush_if_unbuffered): New function.

(fd_to_stream): New argument.

(open_external): Fix fd_to_stream call.

(input_stream): Likewise.

(output_stream): Likewise.

(error_stream): Likewise.

* io/unix.h (flush_if_unbuffered): New prototype.



Modified:

trunk/libgfortran/ChangeLog

trunk/libgfortran/io/transfer.c

trunk/libgfortran/io/unix.c

trunk/libgfortran/io/unix.h


[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well

2013-04-19 Thread jb at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981



Janne Blomqvist  changed:



   What|Removed |Added



URL||http://gcc.gnu.org/ml/gcc-p

   ||atches/2013-04/msg01127.htm

   ||l



--- Comment #7 from Janne Blomqvist  2013-04-19 10:34:20 
UTC ---

Patch implementing the "unbuffered really means buffered but flush after every

write" idea: http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01127.html


[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well

2013-04-17 Thread jvdelisle at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981



--- Comment #6 from Jerry DeLisle  2013-04-18 
01:21:42 UTC ---

I like Jannes idea with the flags.  Also, it seems that at the time we open a

file we know it is /dev/null or /dev/nul in some cases by the file name.  It

would be very low overhead in a few cases to disable some or all checks and

even disable the writing completely.  We would not get all situations, but the

low hanging fruit we could.  It could be done by setting a "NULL" bit.



One could consider doing this at compile time in some cases where the frontend

could have more elaborate configuration checks that determine the name of the

null device on the target system and look for its use. (probably not really

worth if fur NULL I/O



The other idea to consider is a compiler flag, say -fast-IO or similar that

also disables the extra error checking that is not critical to runtime after a

program has been debugged.


[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well

2013-04-17 Thread burnus at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981



--- Comment #5 from Tobias Burnus  2013-04-17 
14:50:16 UTC ---

(In reply to comment #4)

> The reason why gfortran is slow here is that for non-regular files we use

> unbuffered I/O. If you write to a regular file instead of /dev/null, you'll 

> see us doing ~8 KB writes at a time.

> 

> The reason for this is that non-regular files (a.k.a. special files) are

> special in many ways wrt seeking. Some allow seeking just fine, some always

> return 0, some return an error (and which special files behave in which way is

> to some extent different on different OS'es).



I do not understand the argument regarding seek. If seek doesn't work - why

should there be a problem with buffering but not without? At least with

SEQUENTIAL one cannot do without (buffer exceeded or no buffering) and with

STREAM no seek should be required.



> Also, for special files users often expect non-buffered IO, e.g. they want

> output on the terminal directly instead of waiting until the 8 KB buffer fills

> up, programs communicating via pipes can deadlock if data sits in the buffers,

> etc.



But the code should be able to wait until a complete record has been written?

That should be rather quick, unless one write a 2GB array. I am not talking

about flushing the data only when 8kB are filled or when the file is closed.

And doing buffering within a record avoids seeks.



> One could of course make "unbuffered" I/O in gfortran really mean "flush

> the buffer at the end of each I/O statement" rather than not using a buffer at

> all.



We should consider this.



 * * *



I have now updated timings with writing to a file.



Results for the example in comment 0, but writing to a file ("test.dat",

tmpfs). Unformatted is much faster with a normal file, but some others

compilers are still significantly faster. And for formatted, all other

compilers are significantly faster.



 Timing in sec 

Unformatted  Formatted

real / user  real / user  Compiler

---  ---  -

0.378/0.352  2.815/2.804  GCC 4.8.0 (-Ofast, 20130308, Rev. 196547)

0.307/0.296  1.303/1.288  g95 4.0.3 (g95 0.93!) Aug 17 2010 (-O3)

0.210/0.196  0.555/0.532  Sun Fortran 95 8.3 Linux_i386 2007/05/03

0.208/0.184  0.920/0.888  PathScale 3.2.99

0.176/0.152  2.185/2.168  NAGWare Fortran 5.1

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

0.127/0.125  1.091/1.080  GCC 4.9 (trunk, -Ofast)

0.120/0.118  0.465/0.459  g95 4.0.3 (g95 0.94!) Dec 17 2012

0.136/0.131  0.527/0.524  PathScale EKOPath 4.9.0

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

0.335/0.316  2.866/2.860  GCC 4.7.2 20120920 (Cray Inc.)

0.204/0.188  0.659/0.628  Cray Fortran : Version 8.1.6

0.881/0.328  1.281/0.672  Intel 64, Version 13.1.1.163

0.444/0.432  0.884/0.864  pgf90 12.10-0

---


[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well

2013-04-17 Thread jb at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981



--- Comment #4 from Janne Blomqvist  2013-04-17 10:50:07 
UTC ---

The reason why gfortran is slow here is that for non-regular files we use

unbuffered I/O. If you write to a regular file instead of /dev/null, you'll see

us doing ~8 KB writes at a time. On my system, timing writing to /dev/null

gives



real0m0.727s

user0m0.272s

sys 0m0.452s



whereas writing to a file gives



real0m0.202s

user0m0.180s

sys 0m0.020s





The reason for this is that non-regular files (a.k.a. special files) are

special in many ways wrt seeking. Some allow seeking just fine, some always

return 0, some return an error (and which special files behave in which way is

to some extent different on different OS'es). As the buffered IO keeps track of

the logical file pointer position, it can easily get out of sync with the

physical position if it doesn't behave as for a regular file.



Also, for special files users often expect non-buffered IO, e.g. they want

output on the terminal directly instead of waiting until the 8 KB buffer fills

up, programs communicating via pipes can deadlock if data sits in the buffers,

etc. One could of course make "unbuffered" I/O in gfortran really mean "flush

the buffer at the end of each I/O statement" rather than not using a buffer at

all and instead using the raw POSIX I/O syscalls. This would perhaps not be a

bad idea per se, but would require making the buffered I/O code handle special

files in some sensible way.



Another reason for gfortran slowness is that we do quite a lot of checking in

data_transfer_init(), which means that there's quite a lot of per-record

overhead. Writing a single element unformatted is thus the worst case. One way

to speed up data_transfer_init, I think, is that instead of checking each flag

bit (which says which I/O specifiers are present) separately, create a variable

with forbidden flags for each I/O type (unformatted/formatted,

sequential/direct/stream => 6x), and check the entire flag variable once (flag

& forbidden_flags == 0). Only if there is an error, do the bit-by-bit checking

in order to generate the error message.


[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well

2013-04-17 Thread burnus at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981



--- Comment #3 from Tobias Burnus  2013-04-17 
09:39:58 UTC ---

(In reply to comment #2)

> There is a seek inside next_record_w_unf. That function is used for DIRECT 
> I/O.

> Looks conceptually wrong to me for sequential unformatted.  I won't have time

> for a few days to look at this further.



Well, what gfortran does is:



* write place-holder record length in the heading record marker

* write actual data

* write tailing record marker (1st call to write_us_marker in

next_record_w_unf)

* write actual length of this record, i.e. seek back + write_us_marker + see to

past the tailing record marker (all in next_record_w_unf)





I think what other compilers do is to make use of the following item in the

Fortran standard:



"The value of the RECL= specifier shall be positive. It specifies the length of

each record in a file being connected for direct access, or specifies the

maximum length of a record in a file being connected for sequential access."

(F2008, "9.5.6.15 RECL= specifier in the OPEN statement")





I tried the following program:

---

integer, allocatable :: array(:)

integer :: rl, i

open(99,file="/dev/null",form="unformatted")

inquire(99,recl=rl)

allocate(array(1024*1024*100))

array = 0

print *,rl, size(array)/4

write(99) (array, i=1,1000)

close(99)

end

---



With gfortran, it takes only: 0.203s and one has:

 19 mmap

 26 open

392 lseek

   1784 write



The question is why there are that many seeks. There should be only a single

record!





With pathf95, it fails after 0.099s with the error:

 "This request exceeds the maximum record size."



And with g95, it takes 4.946s (!) until it fails with "Writing more data than

the record size (RECL)":

 11 close

 17 fstat

 20 mprotect

 21 stat

 25 mmap

 30 write

 47 open

where the mmap+munmap pairs seem to take the lion share of the time.





However, one can do better: NAG f95 only needs 0.007s and does:

  5 read

  6 lseek

  8 mprotect

 10 fstat

 23 stat

 29 mmap

 40 open

   2003 write





Maybe something like the following would work:

* Create a reasonable sized buffer

* Use it to buffer the writes, and if it fits, write the length, the buffer,

the length.

* If the argument is a (too) big array, write the length of data in the buffer

plus array byte size, then the data - and only if another item comes, seek to

the beginning and update the length.



That should take care of:

  write(99) i, j, k

  write(99) i, j, k, small_array

  write(99) big_array

and even

  write(99) i, j, k, big_array

but it will not help for

  write(99) big_array1, big_array2



I think that covers the most important cases. One question is how large the

buffer should be initially, whether it should be resizable - and how long it

should remain allocated. Even a small buffer of 1024 kbyte (= 128 real(8)

values) will help when writing small data like in the example of comment 0.



If it is larger, the issue of freeing the data and/or resizing becomes more

important - and one needs to be careful not to require huge amount of memory

and/or do do very frequent memory allocation+freeing, which causes the problems

with g95.



 * * *



Closer look at NAG: It does the following (allocate moved before open, inquire

removed):



open("/dev/null", O_RDWR)   = 3

mmap(NULL, 3856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =

0x2ab0e000

mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =

0x2ab22000

fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0

ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS,

0x7fff645b1c90) = -1 ENOTTY (Inappropriate ioctl for device)

fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0

ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS,

0x7fff645b2200) = -1 ENOTTY (Inappropriate ioctl for device)

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =

0x2ac22000

lseek(3, 0, SEEK_CUR)   = 0



write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,

4096) = 4096

write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,

419426304) = 419426304



(and 999 further write lines)



lseek(3, 0, SEEK_SET)   = 0

read(3, "", 4096)   = 0

lseek(3, 12, SEEK_CUR)  = 0

write(3, "\0\0\0\250", 4)   = 4

lseek(3, 18446744072233156608, SEEK_SET) = 0

read(3, "", 4096)   = 0

lseek(3, 20, SEEK_CUR)  = 0

lseek(3, 0, SEEK_CUR)   = 0

ftruncate(3, 0) = -1 EINVAL (Invalid argument)

close(3)= 0

munmap(0x2ac22000, 4096)= 0

munmap(0x

[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well

2013-04-16 Thread jvdelisle at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981



--- Comment #2 from Jerry DeLisle  2013-04-17 
00:58:02 UTC ---

There is a seek inside next_record_w_unf. That function is used for DIRECT I/O.

Looks conceptually wrong to me for sequential unformatted.  I won't have time

for a few days to look at this further.


[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well

2013-04-16 Thread burnus at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981



Tobias Burnus  changed:



   What|Removed |Added



 CC||burnus at gcc dot gnu.org



--- Comment #1 from Tobias Burnus  2013-04-16 
15:20:20 UTC ---

For unformatted pathf95 and g95 have (top 10 entries of strace):

  3 brk   4 close

  3 getrlimit 4 ioctl

  8 read  4 rt_sigaction

 10 close 6 fstat

 13 fstat 6 mprotect

 20 mprotect  7 brk

 21 stat 10 mmap

 22 mmap 17 stat

 46 open 23 open

367 write   734 write



gfortran has the following syscalls, which are at least invoked twice:

  3 brk

  5 read

  7 close

 10 rt_sigaction

 11 access

 11 fstat

 12 mprotect

 16 stat

 18 mmap

 26 open

200 lseek

400 write