[Bug fortran/88076] Shared Memory implementation for Coarrays

2020-10-25 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076
Bug 88076 depends on bug 97530, which changed state.

Bug 97530 Summary: Segmentation fault compiling coarray program with option 
-fcoarray=shared (not with -fcoarray={lib,single})
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97530

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

[Bug fortran/88076] Shared Memory implementation for Coarrays

2019-08-16 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #19 from Jerry DeLisle  ---
(In reply to Nicolas Koenig from comment #18)
> Created attachment 46723 [details]
> Compiler Diff
> 
> I accidentally attached an old patch, here is the right one :) And thanks
> for helping, Jerry, what will you be working on?

My first step is to set up what you have here, get some test cases in place,
and try some things; then I can look further at trying to understand it and go
from there. I will let you know separately from this PR so we don't clutter it
up here. I also need to coordinate with Steve as we look at it.

[Bug fortran/88076] Shared Memory implementation for Coarrays

2019-08-16 Thread koenigni at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

Nicolas Koenig  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #46714|0                           |1
        is obsolete|                            |

--- Comment #18 from Nicolas Koenig  ---
Created attachment 46723
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46723&action=edit
Compiler Diff

I accidentally attached an old patch, here is the right one :) And thanks for
helping, Jerry, what will you be working on?

[Bug fortran/88076] Shared Memory implementation for Coarrays

2019-08-16 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #17 from Jerry DeLisle  ---
I am getting this at the moment after applying patches to trunk.

../../trunk/gcc/fortran/trans-stmt.c: In function ‘tree_node*
gfc_trans_deallocate(gfc_code*)’:
../../trunk/gcc/fortran/trans-stmt.c:6925:4: error: ‘is_native_coarray’ was not
declared in this scope
 6925 |is_native_coarray = ar_attr.codimension;
  |^
../../trunk/gcc/fortran/trans-stmt.c:6928:45: error: ‘is_native_coarray’ was
not declared in this scope
 6928 |   if (expr->rank || is_coarray_array || is_native_coarray)
  | ^
make[2]: *** [Makefile:1118: fortran/trans-stmt.o] Error 1
make[2]: *** Waiting for unfinished jobs
make[1]: *** [Makefile:4360: all-gcc] Error 2
make: *** [Makefile:958: all] Error 2

[Bug fortran/88076] Shared Memory implementation for Coarrays

2019-08-16 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #16 from Jerry DeLisle  ---
(In reply to Steve Kargl from comment #15)
> On Wed, Aug 14, 2019 at 08:33:04PM +, koenigni at gcc dot gnu.org wrote:
> > 
> > Yes, I'm still working on it (slowly, though, sorry :( ). Here is a diff of my
> > current trunk. I don't know what exactly changed since the last version, but
> > the biggest two things still missing are actually deallocating coarrays in
> > DEALLOCATE-statements or in types and all the intrinsics like CO_SUM. Locks
> > already work, though.
> 
> Thanks for the update.  No need to apologize.  I assume you
> are off doing work, school, fun stuff, ...
> 
> I'm running out of bugs that I can fix, so thought I would
> venture into coarray territory.  Having a SHM backend might
> be easier to work with than opencoarray and openmpi.

I will be loading these patches onto my new machine here as well, so we can
start to grind away on this.

[Bug fortran/88076] Shared Memory implementation for Coarrays

2019-08-14 Thread sgk at troutmask dot apl.washington.edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #15 from Steve Kargl  ---
On Wed, Aug 14, 2019 at 08:33:04PM +, koenigni at gcc dot gnu.org wrote:
> 
> Yes, I'm still working on it (slowly, though, sorry :( ). Here is a diff of my
> current trunk. I don't know what exactly changed since the last version, but
> the biggest two things still missing are actually deallocating coarrays in
> DEALLOCATE-statements or in types and all the intrinsics like CO_SUM. Locks
> already work, though.

Thanks for the update.  No need to apologize.  I assume you
are off doing work, school, fun stuff, ...

I'm running out of bugs that I can fix, so thought I would
venture into coarray territory.  Having a SHM backend might
be easier to work with than opencoarray and openmpi.

[Bug fortran/88076] Shared Memory implementation for Coarrays

2019-08-14 Thread koenigni at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #14 from Nicolas Koenig  ---
Created attachment 46715
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46715&action=edit
Library Diff

[Bug fortran/88076] Shared Memory implementation for Coarrays

2019-08-14 Thread koenigni at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

Nicolas Koenig  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #45669|0                           |1
        is obsolete|                            |

--- Comment #13 from Nicolas Koenig  ---
Created attachment 46714
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46714&action=edit
Compiler Diff

Yes, I'm still working on it (slowly, though, sorry :( ). Here is a diff of my
current trunk. I don't know what exactly changed since the last version, but
the biggest two things still missing are actually deallocating coarrays in
DEALLOCATE-statements or in types and all the intrinsics like CO_SUM. Locks
already work, though.

Here's an approximate list of features already in the patch (based on the test
cases I have lying around):

- coarray accesses (both implicit `ca` and explicit `ca[1]`)
- this_image/num_images (retrieving image numbers only, no arguments)
- statically allocated arrays
- dynamically allocated arrays
- syncing (SYNC ALL and SYNC IMAGES)
- locking (LOCK_TYPE)

[Bug fortran/88076] Shared Memory implementation for Coarrays

2019-08-07 Thread kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

kargl at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kargl at gcc dot gnu.org

--- Comment #12 from kargl at gcc dot gnu.org ---
Hi Nicolas,

Any progress/update on a shared memory coarray implementation?

[Bug fortran/88076] Shared Memory implementation for Coarrays

2019-02-12 Thread koenigni at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

Nicolas Koenig  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #45536|0                           |1
        is obsolete|                            |

--- Comment #11 from Nicolas Koenig  ---
Created attachment 45670
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45670&action=edit
library v2

Here is the new version of the library. They should really add the ability to
attach more than one file. I will ask the mailing list about integration with
libgfortran.

[Bug fortran/88076] Shared Memory implementation for Coarrays

2019-02-12 Thread koenigni at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

Nicolas Koenig  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #45535|0                           |1
        is obsolete|                            |

--- Comment #10 from Nicolas Koenig  ---
Created attachment 45669
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45669&action=edit
Proof-of-concept v2

Also, here is an updated version that adds (preliminary) support for implicit
coarray accesses and coarrays in modules as well as fixing some bugs in the old
one. It is now capable of compiling and running the following simple test:

module co
  integer:: a[*]
end module

program main
  use co, only: a
  implicit none
  a[next_image()] = this_image()
  sync all
  print *, 'Hi from', a, 'to', this_image()
contains
  function next_image()
    integer:: next_image
    next_image = mod(this_image(), num_images()) + 1
  end function
end program
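
The ring-neighbour arithmetic in next_image() can be checked in isolation. The
sketch below is in C purely for illustration; next_image_poc is a hypothetical
name for the earlier form mod(this_image() + 1, num_images()) from the first
proof-of-concept, which yields 0 for image num_images - 1 and never targets
image num_images.

```c
#include <assert.h>

/* 1-based ring neighbour, mirroring mod(this_image(), num_images()) + 1. */
int next_image(int this_image, int num_images)
{
    assert(this_image >= 1 && this_image <= num_images);
    return this_image % num_images + 1;
}

/* Earlier form, mod(this_image() + 1, num_images()).  Shown only for
   comparison: it returns 0, which is not a valid image index, when
   this_image == num_images - 1. */
int next_image_poc(int this_image, int num_images)
{
    return (this_image + 1) % num_images;
}
```

With four images, next_image maps 1, 2, 3, 4 to 2, 3, 4, 1, so every image is
written exactly once.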

[Bug fortran/88076] Shared Memory implementation for Coarrays

2019-02-12 Thread koenigni at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #9 from Nicolas Koenig  ---
Sorry for the late reply, there was a sad incident with my laptop and ice
cream :D

(In reply to Damian Rouson from comment #8)
> (In reply to Nicolas Koenig from comment #7)
> 
> > I actually opted to use multiprocessing with shared memory (shm_open() & co)
> > instead of multithreading, since it will be much easier and faster with
> > static variables, of which gfortran makes extensive use. Also, it greatly
> > simplifies interoperability with OpenMP. 
> 
> This sounds like a great choice.  I have no prior familiarity with shm_open(),
> but I very much like the idea of simplifying interoperability with OpenMP. 
> 
> > The only real downsides I can think of are slower spinup times... 
> 
> It will be interesting to compare the performance with MPI.  I also wonder if
> this would also someday provide for a hybrid implementation wherein shm_open()
> is used within a node and MPI is used across nodes, e.g., maybe images within
> a TEAM could use shm_open() to communicate, while any communication between
> TEAMs could use MPI.
> 

I think that would be ideal. The only problem with this would be that we would
have to maintain 3 implementations, which would be quite work intensive.

>
> > 
> > I actually think it would be best not to turn it into a separate library but
> > instead integrate it into libgfortran. 
> 
> I agree. 
> 
> > This way, it will not be necessary to
> install a separate library and thereby make it easier for people to start
> > using coarrays. Therefore, it would make sense to use the libgfortran
> > descriptors.
> 
> > 
> > At the moment, sync_all() is called after image creation.
> 
> I think that will suffice.

[Bug fortran/88076] Shared Memory implementation for Coarrays

2019-01-29 Thread damian at sourceryinstitute dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #8 from Damian Rouson  ---
(In reply to Nicolas Koenig from comment #7)

> I actually opted to use multiprocessing with shared memory (shm_open() & co)
> instead of multithreading, since it will be much easier and faster with
> static variables, of which gfortran makes extensive use. Also, it greatly
> simplifies interoperability with OpenMP. 

This sounds like a great choice.  I have no prior familiarity with shm_open(),
but I very much like the idea of simplifying interoperability with OpenMP. 

> The only real downsides I can think of are slower spinup times... 

It will be interesting to compare the performance with MPI.  I also wonder if
this would also someday provide for a hybrid implementation wherein shm_open()
is used within a node and MPI is used across nodes, e.g., maybe images within
a TEAM could use shm_open() to communicate, while any communication between
TEAMs could use MPI.

> 
> I actually think it would be best not to turn it into a separate library but
> instead integrate it into libgfortran. 

I agree. 

> This way, it will not be necessary to
> install a separate library and thereby make it easier for people to start
> using coarrays. Therefore, it would make sense to use the libgfortran
> descriptors.

> 
> At the moment, sync_all() is called after image creation.

I think that will suffice.

[Bug fortran/88076] Shared Memory implementation for Coarrays

2019-01-29 Thread koenigni at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #7 from Nicolas Koenig  ---
(In reply to Damian Rouson from comment #5)
> This is an exciting idea.  When I gave some thought to writing a
> shared-memory alternative coarray ABI, it seemed to me that pthreads would
> be a better choice than OpenMP.  Part of the problem is that I was
> considering writing the implementation in Fortran, and OpenMP lacked support
> for several modern Fortran features, including several object-oriented
> programming features.  That of course won't be an issue for you, however,
> assuming you're going to write the implementation in C.  I was going to
> leverage "forthreads," an open-source Fortran 2003 interface to pthreads. 
> One thing that I think would be a major benefit of having a Fortran
> implementation of the library is that it would greatly expand the potential
> community of contributors to include more of the users of the compiler.
> 

I actually opted to use multiprocessing with shared memory (shm_open() & co)
instead of multithreading, since it will be much easier and faster with static
variables, of which gfortran makes extensive use. Also, it greatly simplifies
interoperability with OpenMP. The only real downsides I can think of are slower
spinup times (~1 cycles for processes vs. ~1000 for threads), far slower
context switches (only a problem if more images than cores are used) and
slower allocation, since at the moment a mmap() call is needed for each one
(the allocator tracks the offset and size in the memfile instead of the
mmap()'ed memory regions; if this is too slow, I can just cache the pointers).
As for writing it in Fortran, see below :)
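
For readers unfamiliar with shm_open(), here is a minimal sketch of the
process-per-image idea, not the code from the attached patches. It assumes
POSIX shm_open()/mmap()/fork(); the function name run_images, the object name
"/caf_sketch_<pid>" and the one-int-per-image layout are made up for
illustration. On older glibc, shm_open() needs -lrt at link time.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start num_images processes that share one int slot
   per image.  Each image stores its own number in its ring neighbour's slot.
   Returns 0 on success and copies the final slots into out[0..num_images-1]. */
int run_images(int num_images, int *out)
{
    char name[64];
    snprintf(name, sizeof name, "/caf_sketch_%ld", (long) getpid());

    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    shm_unlink(name);               /* the open descriptor keeps it alive */

    size_t size = (size_t) num_images * sizeof(int);
    if (ftruncate(fd, (off_t) size) != 0) {
        close(fd);
        return -1;
    }

    int *slots = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (slots == MAP_FAILED)
        return -1;

    int result = 0;
    int spawned = 0;
    for (int image = 1; image <= num_images; image++) {
        pid_t pid = fork();
        if (pid < 0) {
            result = -1;
            break;
        }
        if (pid == 0) {
            /* Child: this is "image" number image. */
            int next = image % num_images + 1;
            slots[next - 1] = image;
            _exit(0);
        }
        spawned++;
    }

    /* Reap every image before reading the shared slots. */
    for (int i = 0; i < spawned; i++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            result = -1;
    }

    for (int i = 0; i < num_images; i++)
        out[i] = slots[i];
    munmap(slots, size);
    return result;
}
```

Because each image is a separate process, ordinary static variables stay
private to that image, and only the mapped region is shared.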

>
> Another important consideration is whether to use the current gfortran
> descriptors as arguments in the library functions (as is currently used) or
> instead to use the Fortran 2018 CFI descriptors for which Paul recently
> committed support.  If you go with the current gfortran descriptors, then
> there could be a lot of code to rewrite if gfortran later adopts the
> standard descriptors internally.  Paul's recent commit adds functions that
> can translate between the gfortran and standard descriptors. I have a
> volunteer who I'm hoping will use the translation functions to develop a
> new, alternative coarray ABI that accepts the standard descriptors.
> 

I actually think it would be best not to turn it into a separate library but
instead integrate it into libgfortran. This way, it will not be necessary to
install a separate library and thereby make it easier for people to start using
coarrays. Therefore, it would make sense to use the libgfortran descriptors.

>
> On another note mentioned earlier in this PR, I believe it will be necessary
> to fork all threads at the beginning of execution and not join them at the
> end.  Section 5.3.5 of the Fortran 2018 standard states, "Following the
> creation of a fixed number of images, execution begins on each image."
> Assuming there is a one-to-one correspondence between images and threads, I
> read that as implying that a fixed number of threads have to be set up
> before any one thread can execute.  (Possibly there could also be additional
> non-image threads that get forked later also though.) 

At the moment, sync_all() is called after image creation.
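
As an illustration of why a barrier right after image creation is enough, here
is a rough sketch under stated assumptions: it uses a one-shot spin barrier on
a C11 atomic counter in an anonymous shared mapping, which is a stand-in
technique and not necessarily what sync_all() does in the library. All names
(startup_state, sync_all_once, images_past_startup_barrier, MAX_IMAGES) are
hypothetical.

```c
#define _DEFAULT_SOURCE
#include <sched.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_IMAGES 64   /* arbitrary cap for this sketch */

struct startup_state {
    atomic_int arrived;              /* images that reached the barrier */
    atomic_int started[MAX_IMAGES];  /* set by each image before the barrier */
    atomic_int saw_all[MAX_IMAGES];  /* 1 if the image saw every flag set */
};

/* One-shot barrier: spin until num_images processes have arrived. */
static void sync_all_once(struct startup_state *s, int num_images)
{
    atomic_fetch_add(&s->arrived, 1);
    while (atomic_load(&s->arrived) < num_images)
        sched_yield();
}

/* Returns how many images observed all images started after the barrier,
   or -1 on error. */
int images_past_startup_barrier(int num_images)
{
    if (num_images < 1 || num_images > MAX_IMAGES)
        return -1;

    struct startup_state *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (s == MAP_FAILED)
        return -1;

    atomic_init(&s->arrived, 0);
    for (int i = 0; i < MAX_IMAGES; i++) {
        atomic_init(&s->started[i], 0);
        atomic_init(&s->saw_all[i], 0);
    }

    int spawned = 0;
    for (int image = 1; image <= num_images; image++) {
        pid_t pid = fork();
        if (pid < 0) {
            /* Release images that are already waiting, then give up. */
            atomic_store(&s->arrived, num_images);
            break;
        }
        if (pid == 0) {
            atomic_store(&s->started[image - 1], 1);
            sync_all_once(s, num_images);
            int all = 1;
            for (int i = 0; i < num_images; i++)
                all &= atomic_load(&s->started[i]);
            atomic_store(&s->saw_all[image - 1], all);
            _exit(0);
        }
        spawned++;
    }

    for (int i = 0; i < spawned; i++)
        wait(NULL);

    int count = 0;
    for (int i = 0; i < num_images; i++)
        count += atomic_load(&s->saw_all[i]);
    munmap(s, sizeof *s);
    return count;
}
```

No image gets past sync_all_once() until all of them exist, which is the
property the quoted passage from the standard asks for.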

> I recall seeing several interesting papers from 10-15 years ago on SPMD-style
> programming using threads (OpenMP), so a literature search on this topic
> might be useful.

[Bug fortran/88076] Shared Memory implementation for Coarrays

2019-01-27 Thread damian at sourceryinstitute dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #6 from Damian Rouson  ---
Correction to the end of the first sentence of the final paragraph in Comment
5: "... not join them _until_ the end."

[Bug fortran/88076] Shared Memory implementation for Coarrays

2019-01-27 Thread damian at sourceryinstitute dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #5 from Damian Rouson  ---
This is an exciting idea.  When I gave some thought to writing a shared-memory
alternative coarray ABI, it seemed to me that pthreads would be a better choice
than OpenMP.  Part of the problem is that I was considering writing the
implementation in Fortran, and OpenMP lacked support for several modern Fortran
features, including several object-oriented programming features.  That of
course won't be an issue for you, however, assuming you're going to write the
implementation in C.  I was going to leverage "forthreads," an open-source
Fortran 2003 interface to pthreads.  One thing that I think would be a major
benefit of having a Fortran implementation of the library is that it would
greatly expand the potential community of contributors to include more of the
users of the compiler.

Another important consideration is whether to use the current gfortran
descriptors as arguments in the library functions (as is currently used) or
instead to use the Fortran 2018 CFI descriptors for which Paul recently
committed support.  If you go with the current gfortran descriptors, then there
could be a lot of code to rewrite if gfortran later adopts the standard
descriptors internally.  Paul's recent commit adds functions that can translate
between the gfortran and standard descriptors. I have a volunteer who I'm
hoping will use the translation functions to develop a new, alternative coarray
ABI that accepts the standard descriptors.

On another note mentioned earlier in this PR, I believe it will be necessary to
fork all threads at the beginning of execution and not join them at the end. 
Section 5.3.5 of the Fortran 2018 standard states, "Following the creation of a
fixed number of images, execution begins on each image."  Assuming there is a
one-to-one correspondence between images and threads, I read that as implying
that a fixed number of threads have to be set up before any one thread can
execute.  (Possibly there could also be additional non-image threads that get
forked later also though.)  I recall seeing several interesting papers from
10-15 years ago on SPMD-style programming using threads (OpenMP), so a
literature search on this topic might be useful.

[Bug fortran/88076] Shared Memory implementation for Coarrays

2019-01-27 Thread koenigni at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #4 from Nicolas Koenig  ---
Created attachment 45536
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45536&action=edit
library

Here is the library. At the moment, it has an interprocess allocator and
handles the creation and reaping of images. It also has a very simple
synchronization function.
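
To make the "interprocess allocator" idea concrete, here is a small sketch,
not the attached library's code. It assumes a bump allocator that hands out
offsets into the shared file rather than pointers, since each process may map
the file at a different address. The names shm_arena, shm_alloc, shm_ptr and
SHM_NO_SPACE are hypothetical, and a real allocator shared between processes
would also need a lock around shm_alloc.

```c
#include <stddef.h>

#define SHM_NO_SPACE ((size_t) -1)

struct shm_arena {
    size_t capacity;   /* total bytes in the backing file */
    size_t next;       /* offset of the first free byte */
};

/* Reserve size bytes aligned to align (a power of two).
   Returns the offset of the block, or SHM_NO_SPACE if it does not fit. */
size_t shm_alloc(struct shm_arena *arena, size_t size, size_t align)
{
    size_t start = (arena->next + align - 1) & ~(align - 1);
    if (start < arena->next || start > arena->capacity
        || size > arena->capacity - start)
        return SHM_NO_SPACE;
    arena->next = start + size;
    return start;
}

/* Turn an offset into a pointer that is valid only in the calling process,
   where base is the address at which that process mapped the file. */
void *shm_ptr(void *base, size_t offset)
{
    return (char *) base + offset;
}
```

Storing offsets keeps the bookkeeping meaningful in every image; each image
adds its own base address when it needs a real pointer.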

The library part still has to be integrated with libgfortran's build system. I
would suggest handling it the same way libcaf_single is handled at the moment,
so that it is linked against if -fcoarray=native is specified.

To compile the library at the moment, the Makefile has to be edited to allow it
to find libgfortran.h and libgfortran's config.h. After that, 'make' will build
it.

[Bug fortran/88076] Shared Memory implementation for Coarrays

2019-01-27 Thread koenigni at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #3 from Nicolas Koenig  ---
Created attachment 45535
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45535&action=edit
Proof-of-concept

As a little update, here is a proof-of-concept patch. It adds a new coarray
option -fcoarray=native and allows (together with the library I will attach in
the next post) the compilation and execution of the following very simple
coarray program:

$ cat test.f90
program main
  implicit none
  integer:: a[*]
  a[next_image()] = this_image()
  sync all
  print *, 'Hi from', a[this_image()], 'to', this_image()
contains
  function next_image()
    integer:: next_image
    next_image = mod(this_image() + 1, num_images())
  end function
end program
$ gfortran -fcoarray=native -Lpath/to/native/coarray/library test.f90 -lcoarraynative -lrt
$ GFORTRAN_NUM_IMAGES=4 ./a.out
 Hi from   2 to   3
 Hi from   4 to   1
 Hi from   1 to   2
 Hi from   0 to   4
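
The run above picks the image count from the GFORTRAN_NUM_IMAGES environment
variable. How the library parses it is not shown in this thread; the following
is only a guess at a reasonable reading routine. The function name
num_images_from_env, the fallback argument and the upper bound of 65536 are
assumptions made for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: read GFORTRAN_NUM_IMAGES as a positive integer.
   Returns fallback when the variable is unset, empty, or not a number
   in the range 1..65536 (an arbitrary limit for this sketch). */
int num_images_from_env(int fallback)
{
    const char *text = getenv("GFORTRAN_NUM_IMAGES");
    if (text == NULL || *text == '\0')
        return fallback;

    char *end;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value < 1 || value > 65536)
        return fallback;
    return (int) value;
}
```

A natural fallback would be the number of available cores, but that choice is
not stated anywhere in this thread.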

P.S.: I got a bit sidetracked the last few months, so this all took a bit
longer than expected :D

[Bug fortran/88076] Shared Memory implementation for Coarrays

2018-11-22 Thread koenigni at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #2 from Nicolas Koenig  ---
> Once you are done on this, you might consider implementing a -parallel as in
> ifort.
> 
> This could conveniently be triggered in frontend-passes.c, I suspect, i.e.
> this would be a good place to check for dependencies within a do loop and
> signal, if there are none, that the loop can be parallelised. Then, with
> everything that you have learned about trans-*.c in dealing with coarrays,
> you should be able to do what is needed for do loops (and scalarization
> loops).
> 

I have a few ideas how we could do that, but it might get quite interesting. Do
we spin up the threads once in the beginning or only when needed? And do we use
OpenMP (It might give problems if OpenMP is used by the code itself)? But I'll
think about that once the coarrays are done, which might take a bit :D

> Just a thought.
> 
> Paul

[Bug fortran/88076] Shared Memory implementation for Coarrays

2018-11-18 Thread pault at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

--- Comment #1 from Paul Thomas  ---

> I have opened this bug to track the progress and provide a forum for
> discussion :)

Nicolas,

Once you are done on this, you might consider implementing a -parallel as in
ifort.

This could conveniently be triggered in frontend-passes.c, I suspect, i.e. this
would be a good place to check for dependencies within a do loop and signal, if
there are none, that the loop can be parallelised. Then, with everything that
you have learned about trans-*.c in dealing with coarrays, you should be able
to do what is needed for do loops (and scalarization loops).

Just a thought.

Paul

PS I am just going to look at your last email to have a stab at answering your
question.

[Bug fortran/88076] Shared Memory implementation for Coarrays

2018-11-18 Thread koenigni at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88076

Nicolas Koenig  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2018-11-18
     Ever confirmed|0                           |1