http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48419
Summary: Reduce gfortran stack usage for procedures doing IO Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: fortran AssignedTo: unassig...@gcc.gnu.org ReportedBy: j...@gcc.gnu.org Consider program iostack integer, save :: diff = 0 print *, "Stack usage when executing print" call x(1) print *, "Stack usage in equivalent non-IO procedure" call y(1) contains recursive subroutine x(i) print *, diff - loc(i) diff = loc(i) if (i < 10) call x(i+1) return end subroutine x recursive subroutine y(i) call printy(diff - loc(i)) diff = loc(i) if (i < 10) call y(i+1) return end subroutine y recursive subroutine printy(i) print *, i end subroutine printy end program iostack On i686 I get Stack usage when executing print -134515072 1210538920 400 400 400 400 400 400 400 400 Stack usage in equivalent non-IO procedure -1210542120 1210538904 32 32 32 32 32 32 32 32 Which means that for the procedure with an IO call, we need an extra 400-32 = 368 bytes stack space. On 64-bit targets, more. This a) will limit recursion depth as most systems are by default configured with quite small stack sizes, and b) perhaps reduce performance as we put a largish amount of sparsely accessed data right in the cache-hottest memory. This large stack usage is due to allocating a large struct on the stack prior to making any IO calls. There are two main reasons for this. 1. Most of the struct is for library private use. Apart from storing the pointer to the gfc_unit, so that we don't need to lock the unit tree and traverse it for every transfer_*() call, I see little reason why the rest of this data cannot be part of the gfc_unit itself (which is stored on the heap), or in heap memory with only a pointer to it kept. Depending on how wants to structure it, one could maybe argue that it's useful to retain the pointer to the transfer function and another pointer to the transfer data if one wants to keep it separate from gfc_unit. But still, 1-3 pointers vs. the current 16 pointers and 32 ints. 2. Another part of the struct is fields for every possible IO specifier. As most IO calls use just a few specifiers, most of this data is never accessed (one field in the struct is a bitmap specifying which of the other fields are valid). Case #1 should be relatively easy to fix. For #2 there are a few options. Say, a) A char array containing all the data. Walk over the flags variable, and for each set bit, read the appropriate number of bytes from the char array and bump a pointer to the current position. This is probably the most space efficient, but might run into alignment issues on some targets? So maybe one needs to memcpy() the data to some properly aligned data before actually using it. b) Create a union of all the specifier data structs. Then pass an array of this union type, with the flags variable specifying the type of each element (that is, the length of the array is the number of set bits in the flag variable). This might waste a bit of space compared to the previous approach, but should ensure that everything is properly aligned. From a brief scan of current sources, the biggest element is probably for character variables, which is a pointer to the string and the length, so in no case will the union type be particularly large. c) Get rid of the flags variable, and instead pass a variable specifying the array length, the array elements being a struct of an enum specifying the element type, and the union type described in the previous approach. Personally, I think option c) looks the cleanest.