[Bug fortran/88076] Shared Memory implementation for Coarrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

Bug 88076 depends on bug 97530, which changed state.

Bug 97530 Summary: Segmentation fault compiling coarray program with option -fcoarray=shared (not with -fcoarray={lib,single})
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97530

           What    |Removed |Added
             Status|NEW     |RESOLVED
         Resolution|---     |FIXED
[Bug fortran/88076] Shared Memory implementation for Coarrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #19 from Jerry DeLisle ---
(In reply to Nicolas Koenig from comment #18)
> Created attachment 46723 [details]
> Compiler Diff
>
> I accidentally attached an old patch, here is the right one :) And thanks
> for helping, Jerry, what will you be working on?

My first step is to get what you have set up here, get some test cases in place, and try some things. Then I can look further at trying to understand it and go from there. I will let you know separately from this PR so we don't clutter it up here. I also need to coordinate with Steve as we look at it.
[Bug fortran/88076] Shared Memory implementation for Coarrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

Nicolas Koenig changed:

           What    |Removed |Added
  Attachment #46714|0       |1
        is obsolete|        |

--- Comment #18 from Nicolas Koenig ---
Created attachment 46723
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46723&action=edit
Compiler Diff

I accidentally attached an old patch, here is the right one :) And thanks for helping, Jerry, what will you be working on?
[Bug fortran/88076] Shared Memory implementation for Coarrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #17 from Jerry DeLisle ---
I am getting this at the moment after applying patches to trunk.

../../trunk/gcc/fortran/trans-stmt.c: In function ‘tree_node* gfc_trans_deallocate(gfc_code*)’:
../../trunk/gcc/fortran/trans-stmt.c:6925:4: error: ‘is_native_coarray’ was not declared in this scope
 6925 |    is_native_coarray = ar_attr.codimension;
      |    ^
../../trunk/gcc/fortran/trans-stmt.c:6928:45: error: ‘is_native_coarray’ was not declared in this scope
 6928 |       if (expr->rank || is_coarray_array || is_native_coarray)
      |                                             ^
make[2]: *** [Makefile:1118: fortran/trans-stmt.o] Error 1
make[2]: *** Waiting for unfinished jobs
make[1]: *** [Makefile:4360: all-gcc] Error 2
make: *** [Makefile:958: all] Error 2
[Bug fortran/88076] Shared Memory implementation for Coarrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #16 from Jerry DeLisle ---
(In reply to Steve Kargl from comment #15)
> On Wed, Aug 14, 2019 at 08:33:04PM +, koenigni at gcc dot gnu.org wrote:
> >
> > Yes, I'm still working on it (slowly, though, sorry :( ). Here is a diff of my
> > current trunk. I don't know what exactly changed since the last version, but
> > the biggest two things still missing are actually deallocating coarrays in
> > DEALLOCATE-statements or in types and all the intrinsics like CO_SUM. Locks
> > already work, though.
>
> Thanks for the update. No need to apologize. I assume you
> are off doing work, school, fun stuff, ...
>
> I'm running out of bugs that I can fix, so thought I would
> venture into coarray territory. Having a SHM backend might
> be easier to work with than opencoarray and openmpi.

I will be loading up these patches on my new machine here as well, so we can start to grind away on this.
[Bug fortran/88076] Shared Memory implementation for Coarrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #15 from Steve Kargl ---
On Wed, Aug 14, 2019 at 08:33:04PM +, koenigni at gcc dot gnu.org wrote:
>
> Yes, I'm still working on it (slowly, though, sorry :( ). Here is a diff of my
> current trunk. I don't know what exactly changed since the last version, but
> the biggest two things still missing are actually deallocating coarrays in
> DEALLOCATE-statements or in types and all the intrinsics like CO_SUM. Locks
> already work, though.

Thanks for the update. No need to apologize. I assume you are off doing work, school, fun stuff, ...

I'm running out of bugs that I can fix, so thought I would venture into coarray territory. Having a SHM backend might be easier to work with than opencoarray and openmpi.
[Bug fortran/88076] Shared Memory implementation for Coarrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #14 from Nicolas Koenig ---
Created attachment 46715
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46715&action=edit
Library Diff
[Bug fortran/88076] Shared Memory implementation for Coarrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

Nicolas Koenig changed:

           What    |Removed |Added
  Attachment #45669|0       |1
        is obsolete|        |

--- Comment #13 from Nicolas Koenig ---
Created attachment 46714
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46714&action=edit
Compiler Diff

Yes, I'm still working on it (slowly, though, sorry :( ). Here is a diff of my current trunk. I don't know what exactly changed since the last version, but the biggest two things still missing are actually deallocating coarrays in DEALLOCATE-statements or in types and all the intrinsics like CO_SUM. Locks already work, though.

Here's an approximate list of features already in the patch (based on the test cases I have lying around):

- coarray accesses (both implicit `ca` and explicit `ca[1]`)
- this_image/num_images (retrieving image numbers only, no arguments)
- statically allocated arrays
- dynamically allocated arrays
- syncing (SYNC ALL and SYNC IMAGES)
- locking (LOCK_TYPE)
[Bug fortran/88076] Shared Memory implementation for Coarrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

kargl at gcc dot gnu.org changed:

           What    |Removed |Added
                 CC|        |kargl at gcc dot gnu.org

--- Comment #12 from kargl at gcc dot gnu.org ---
Hi Nicolas,

Any progress/update on a shared memory coarray implementation?
[Bug fortran/88076] Shared Memory implementation for Coarrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

Nicolas Koenig changed:

           What    |Removed |Added
  Attachment #45536|0       |1
        is obsolete|        |

--- Comment #11 from Nicolas Koenig ---
Created attachment 45670
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45670&action=edit
library v2

Here is the new version of the library. They should really add the ability to attach more than one file. I will ask the mailing list about integration with libgfortran.
[Bug fortran/88076] Shared Memory implementation for Coarrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

Nicolas Koenig changed:

           What    |Removed |Added
  Attachment #45535|0       |1
        is obsolete|        |

--- Comment #10 from Nicolas Koenig ---
Created attachment 45669
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45669&action=edit
Proof-of-concept v2

Also, here is an updated version that adds (preliminary) support for implicit coarray accesses and coarrays in modules as well as fixing some bugs in the old one. It is now capable of compiling and running the following simple test:

module co
  integer:: a[*]
end module

program main
  use co, only: a
  implicit none
  a[next_image()] = this_image()
  sync all
  print *, 'Hi from', a, 'to', this_image()
contains
  function next_image()
    integer:: next_image
    next_image = mod(this_image(), num_images()) + 1
  end function
end program
[Bug fortran/88076] Shared Memory implementation for Coarrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #9 from Nicolas Koenig ---
Sorry for the late reply, there was a sad incident with my laptop and ice cream :D

(In reply to Damian Rouson from comment #8)
> (In reply to Nicolas Koenig from comment #7)
>
> > I actually opted to use multiprocessing with shared memory (shm_open() & co)
> > instead of multithreading, since it will be much easier and faster with
> > static variables, of which gfortran makes extensive use. Also, it greatly
> > simplifies interoperability with OpenMP.
>
> This sounds like a great choice. I have no prior familiarity with shm_open(),
> but I very much like the idea of simplifying interoperability with OpenMP.
>
> > The only real downsides I can think of are slower spinup times...
>
> It will be interesting to compare the performance with MPI. I also wonder if
> this would also someday provide for a hybrid implementation wherein shm_open()
> is used within a node and MPI is used across nodes, e.g., maybe images within
> a TEAM could use shm_open() to communicate, while any communication between
> TEAMs could use MPI.
>

I think that would be ideal. The only problem with this would be that we would have to maintain 3 implementations, which would be quite work intensive.

> >
> > I actually think it would be best not to turn it into a separate library but
> > instead integrate it into libgfortran.
>
> I agree.
>
> > This way, it will not be necessary to
> > install a separate library and thereby make it easier for people to start
> > using coarrays. Therefore, it would make sense to use the libgfortran
> > descriptors.
> >
> > At the moment, sync_all() is called after image creation.
>
> I think that will suffice.
[Bug fortran/88076] Shared Memory implementation for Coarrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #8 from Damian Rouson ---
(In reply to Nicolas Koenig from comment #7)

> I actually opted to use multiprocessing with shared memory (shm_open() & co)
> instead of multithreading, since it will be much easier and faster with
> static variables, of which gfortran makes extensive use. Also, it greatly
> simplifies interoperability with OpenMP.

This sounds like a great choice. I have no prior familiarity with shm_open(), but I very much like the idea of simplifying interoperability with OpenMP.

> The only real downsides I can think of are slower spinup times...

It will be interesting to compare the performance with MPI. I also wonder if this would also someday provide for a hybrid implementation wherein shm_open() is used within a node and MPI is used across nodes, e.g., maybe images within a TEAM could use shm_open() to communicate, while any communication between TEAMs could use MPI.

>
> I actually think it would be best not to turn it into a separate library but
> instead integrate it into libgfortran.

I agree.

> This way, it will not be necessary to
> install a separate library and thereby make it easier for people to start
> using coarrays. Therefore, it would make sense to use the libgfortran
> descriptors.
>
> At the moment, sync_all() is called after image creation.

I think that will suffice.
[Bug fortran/88076] Shared Memory implementation for Coarrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #7 from Nicolas Koenig ---
(In reply to Damian Rouson from comment #5)
> This is an exciting idea. When I gave some thought to writing a
> shared-memory alternative coarray ABI, it seemed to me that pthreads would
> be a better choice than OpenMP. Part of the problem is that I was
> considering writing the implementation in Fortran, and OpenMP lacked support
> for several modern Fortran features, including several object-oriented
> programming features. That of course won't be an issue for you, however,
> assuming you're going to write the implementation in C. I was going to
> leverage "forthreads," an open-source Fortran 2003 interface to pthreads.
> One thing that I think would be a major benefit of having a Fortran
> implementation of the library is that it would greatly expand the potential
> community of contributors to include more of the users of the compiler.
>

I actually opted to use multiprocessing with shared memory (shm_open() & co) instead of multithreading, since it will be much easier and faster with static variables, of which gfortran makes extensive use. Also, it greatly simplifies interoperability with OpenMP. The only real downsides I can think of are slower spinup times (~1 cycles for processes vs. ~1000 for threads), far slower context switches (only a problem if more images than cores are used) and slower allocation, since at the moment a mmap() call is needed for each one (the allocator tracks the offset and size in the memfile instead of the mmap()'ed memory regions. If this is too slow, I can just cache the pointers). As for writing it in Fortran, see below :)

>
> Another important consideration is whether to use the current gfortran
> descriptors as arguments in the library functions (as is currently used) or
> instead to use the Fortran 2018 CFI descriptors for which Paul recently
> committed support. If you go with the current gfortran descriptors, then
> there could be a lot of code to rewrite if gfortran later adopts the
> standard descriptors internally. Paul's recent commit adds functions that
> can translate between the gfortran and standard descriptors. I have a
> volunteer who I'm hoping will use the translation functions to develop a
> new, alternative coarray ABI that accepts the standard descriptors.
>

I actually think it would be best not to turn it into a separate library but instead integrate it into libgfortran. This way, it will not be necessary to install a separate library and thereby make it easier for people to start using coarrays. Therefore, it would make sense to use the libgfortran descriptors.

>
> On another note mentioned earlier in this PR, I believe it will be necessary
> to fork all threads at the beginning of execution and not join them at the
> end. Section 5.3.5 of the Fortran 2018 standard states, "Following the
> creation of a fixed number of images, execution begins on each image."
> Assuming there is a one-to-one correspondence between images and threads, I
> read that as implying that a fixed number of threads have to be set up
> before any one thread can execute. (Possibly there could also be additional
> non-image threads that get forked later also though.)

At the moment, sync_all() is called after image creation.

> I recall seeing several interesting papers from 10-15 years ago on SPMD-style
> programming using threads (OpenMP) so a literature search on this topic could
> be useful.
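
As an illustration of the remark in this comment that the allocator tracks offsets and sizes in the memfile rather than mmap()'ed addresses, here is a minimal C sketch. It is not taken from the attached library: the names memfile_header, memfile_map, memfile_alloc, demo_offsets and the object name /caf_offset_sketch are hypothetical, and the bump allocator is an assumption chosen for brevity. The same shared memory object is mapped twice to stand in for two images that may see it at different addresses; an offset resolves to the same data through either mapping, which a raw pointer would not be guaranteed to do. On some systems shm_open() needs -lrt at link time, as in the gfortran command line in comment #3.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define MEMFILE_SIZE 4096          /* assumed fixed size for this sketch */

/* Hypothetical header stored at offset 0 of the shared object. */
struct memfile_header {
    size_t next_free;              /* offset of the first unused byte */
};

/* Map the shared object; returns NULL on failure. */
static void *memfile_map(int fd)
{
    void *base = mmap(NULL, MEMFILE_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    return base == MAP_FAILED ? NULL : base;
}

/* Bump-allocate `size` bytes and return an OFFSET into the memfile,
   or 0 if the memfile is full. Not safe for concurrent callers. */
static size_t memfile_alloc(void *base, size_t size)
{
    assert(base != NULL);
    struct memfile_header *h = base;
    size_t rounded = (size + 15) & ~(size_t)15;   /* keep 16-byte alignment */
    if (h->next_free + rounded > MEMFILE_SIZE)
        return 0;
    size_t off = h->next_free;
    h->next_free += rounded;
    return off;
}

/* Write through one mapping, read through the other; returns the value
   read (42 on success) or -1 on failure. */
static int demo_offsets(void)
{
    const char *name = "/caf_offset_sketch";      /* hypothetical name */
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    shm_unlink(name);              /* existing mappings outlive the name */
    if (ftruncate(fd, MEMFILE_SIZE) != 0) {
        close(fd);
        return -1;
    }
    void *view_a = memfile_map(fd);   /* stands in for image 1 */
    void *view_b = memfile_map(fd);   /* stands in for image 2 */
    close(fd);
    if (view_a == NULL || view_b == NULL)
        return -1;

    ((struct memfile_header *)view_a)->next_free = 16;  /* skip the header */
    size_t off = memfile_alloc(view_a, sizeof(int));
    if (off == 0)
        return -1;

    *(int *)((char *)view_a + off) = 42;           /* store via view A */
    int seen = *(int *)((char *)view_b + off);     /* load via view B  */

    munmap(view_a, MEMFILE_SIZE);
    munmap(view_b, MEMFILE_SIZE);
    return seen;
}
```

Calling demo_offsets() from a main() should return 42 when shared memory is available. Caching the pointers, as suggested in the comment, would amount to remembering base + offset per image instead of recomputing or remapping it.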
[Bug fortran/88076] Shared Memory implementation for Coarrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076 --- Comment #6 from Damian Rouson --- Correction to the end of the first sentence of the final paragraph in Comment 5: "... not join them _until_ the end."
[Bug fortran/88076] Shared Memory implementation for Coarrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #5 from Damian Rouson ---
This is an exciting idea. When I gave some thought to writing a shared-memory alternative coarray ABI, it seemed to me that pthreads would be a better choice than OpenMP. Part of the problem is that I was considering writing the implementation in Fortran, and OpenMP lacked support for several modern Fortran features, including several object-oriented programming features. That of course won't be an issue for you, however, assuming you're going to write the implementation in C. I was going to leverage "forthreads," an open-source Fortran 2003 interface to pthreads. One thing that I think would be a major benefit of having a Fortran implementation of the library is that it would greatly expand the potential community of contributors to include more of the users of the compiler.

Another important consideration is whether to use the current gfortran descriptors as arguments in the library functions (as is currently used) or instead to use the Fortran 2018 CFI descriptors for which Paul recently committed support. If you go with the current gfortran descriptors, then there could be a lot of code to rewrite if gfortran later adopts the standard descriptors internally. Paul's recent commit adds functions that can translate between the gfortran and standard descriptors. I have a volunteer who I'm hoping will use the translation functions to develop a new, alternative coarray ABI that accepts the standard descriptors.

On another note mentioned earlier in this PR, I believe it will be necessary to fork all threads at the beginning of execution and not join them at the end. Section 5.3.5 of the Fortran 2018 standard states, "Following the creation of a fixed number of images, execution begins on each image." Assuming there is a one-to-one correspondence between images and threads, I read that as implying that a fixed number of threads have to be set up before any one thread can execute. (Possibly there could also be additional non-image threads that get forked later also though.)

I recall seeing several interesting papers from 10-15 years ago on SPMD-style programming using threads (OpenMP) so a literature search on this topic could be useful.
[Bug fortran/88076] Shared Memory implementation for Coarrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #4 from Nicolas Koenig ---
Created attachment 45536
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45536&action=edit
library

Here is the library. At the moment, it has an interprocess allocator and handles the creation and reaping of images. It also has a very simple synchronization function.

The library part still has to be integrated with libgfortran's build system. I would suggest handling it the same way libcaf_single is handled at the moment, i.e. linking against it if -fcoarray=native is specified.

To compile the library at the moment, the Makefile has to be edited to allow it to find libgfortran.h and libgfortran's config.h. After that, 'make' will build it.
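
To make "creation and reaping of images" plus "a very simple synchronization function" concrete, here is a hedged C sketch of the general pattern. It is not the attached library's code. Assumptions: images are created with fork(), the shared state lives in an anonymous MAP_SHARED mapping (swapped in here instead of shm_open() to keep the example short), and the barrier is a one-shot counter built on C11 atomics. The names shared_state, sync_all_once and run_images are hypothetical. Each image writes its number into the slot of the next image, waits at the barrier, then reports the value it received through its exit status.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_IMAGES 64

/* Hypothetical shared state: a[i-1] plays the role of coarray `a` on image i. */
struct shared_state {
    atomic_int arrived;            /* number of images that reached the barrier */
    int a[MAX_IMAGES];
};

/* One-shot barrier: every image increments the counter, then spins
   until all n images have arrived. */
static void sync_all_once(struct shared_state *s, int n)
{
    assert(n > 0);
    atomic_fetch_add(&s->arrived, 1);
    while (atomic_load(&s->arrived) < n)
        sched_yield();
}

/* Create n images, let each write to its ring neighbour, reap them all.
   received[i-1] gets the value image i saw after the barrier.
   Returns 0 on success, -1 on failure. Error handling is minimal: if
   fork() fails midway, already-created images are not cleaned up. */
static int run_images(int n, int *received)
{
    if (n < 1 || n > MAX_IMAGES)
        return -1;
    struct shared_state *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (s == MAP_FAILED)
        return -1;
    atomic_init(&s->arrived, 0);

    pid_t pids[MAX_IMAGES];
    for (int image = 1; image <= n; image++) {
        pids[image - 1] = fork();
        if (pids[image - 1] < 0)
            return -1;
        if (pids[image - 1] == 0) {               /* child: this is an image */
            int next = image % n + 1;             /* 1-based ring neighbour */
            s->a[next - 1] = image;               /* like a[next] = this_image() */
            sync_all_once(s, n);                  /* like SYNC ALL */
            _exit(s->a[image - 1]);               /* report the local value */
        }
    }

    for (int image = 1; image <= n; image++) {    /* reap every image */
        int status;
        if (waitpid(pids[image - 1], &status, 0) < 0 || !WIFEXITED(status))
            return -1;
        received[image - 1] = WEXITSTATUS(status);
    }
    munmap(s, sizeof *s);
    return 0;
}
```

With n = 4, run_images() fills received with 4, 1, 2, 3: each image sees the number of its predecessor in the ring. Without the barrier, an image could read its slot before the neighbour had written it.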
[Bug fortran/88076] Shared Memory implementation for Coarrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #3 from Nicolas Koenig ---
Created attachment 45535
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45535&action=edit
Proof-of-concept

As a little update, here is a proof-of-concept patch. It adds a new coarray option -fcoarray=native and allows (together with the library I will attach in the next post) the compilation and execution of the following very simple coarray program:

$ cat test.f90
program main
  implicit none
  integer:: a[*]
  a[next_image()] = this_image()
  sync all
  print *, 'Hi from', a[this_image()], 'to', this_image()
contains
  function next_image()
    integer:: next_image
    next_image = mod(this_image() + 1, num_images())
  end function
end program
$ gfortran -fcoarray=native -Lpath/to/native/coarray/library test.f90 -lcoarraynative -lrt
$ GFORTRAN_NUM_IMAGES=4 ./a.out
Hi from 2 to 3
Hi from 4 to 1
Hi from 1 to 2
Hi from 0 to 4

P.S.: I got a bit sidetracked the last few months, so this all took a bit longer than expected :D
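
For reference, the two neighbour formulas that appear in this thread can be compared with a small C sketch. The function names next_image_v1 and next_image_v2 are hypothetical. C's % agrees with Fortran's mod() for the positive arguments used here. The formula in this comment, mod(this_image() + 1, num_images()), produces values in 0..n-1, while the one in the comment #10 test, mod(this_image(), num_images()) + 1, stays within the valid image range 1..n.

```c
#include <assert.h>

/* Formula from the comment #3 test: mod(this_image() + 1, num_images()).
   Yields a value in 0 .. num_images-1. */
static int next_image_v1(int this_image, int num_images)
{
    assert(num_images > 0);
    return (this_image + 1) % num_images;
}

/* Formula from the comment #10 test: mod(this_image(), num_images()) + 1.
   Yields a value in 1 .. num_images, wrapping from the last image to 1. */
static int next_image_v2(int this_image, int num_images)
{
    assert(num_images > 0);
    return this_image % num_images + 1;
}
```

For four images, next_image_v1 maps images 1, 2, 3, 4 to 2, 3, 0, 1, whereas next_image_v2 maps them to 2, 3, 4, 1.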
[Bug fortran/88076] Shared Memory implementation for Coarrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #2 from Nicolas Koenig ---
> Once you are done on this, you might consider implementing a -parallel as in
> ifort.
>
> This could conveniently be triggered in frontend-passes.c, I suspect. ie.
> this would be a good place to check for dependencies within a do loop and
> signal, if there are none, that the loop can be parallelised. Then, with
> everything that you have learned about trans-*.c in dealing with coarrays,
> you should be able to do what is needed for do loops (and scalarization
> loops).
>

I have a few ideas how we could do that, but it might get quite interesting. Do we spin up the threads once in the beginning or only when needed? And do we use OpenMP (It might give problems if OpenMP is used by the code itself)? But I'll think about that once the coarrays are done, which might take a bit :D

> Just a thought.
>
> Paul
[Bug fortran/88076] Shared Memory implementation for Coarrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #1 from Paul Thomas ---
> I have opened this bug to track the progress and provide a forum for
> discussion :)

Nicolas,

Once you are done on this, you might consider implementing a -parallel as in ifort.

This could conveniently be triggered in frontend-passes.c, I suspect, i.e. this would be a good place to check for dependencies within a do loop and signal, if there are none, that the loop can be parallelised. Then, with everything that you have learned about trans-*.c in dealing with coarrays, you should be able to do what is needed for do loops (and scalarization loops).

Just a thought.

Paul

PS I am just going to look at your last email to have a stab at answering your question.
[Bug fortran/88076] Shared Memory implementation for Coarrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

Nicolas Koenig changed:

           What    |Removed     |Added
             Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed|            |2018-11-18
     Ever confirmed|0           |1