RFC: Merge the GUPC branch into the GCC 4.8 trunk (patch 0 of 16)

2012-10-15 Thread Gary Funck
We have maintained the gupc (GNU Unified Parallel C) branch for
a couple of years now, and would like to merge these changes into
the GCC trunk.

It is our goal to integrate the GUPC changes into the GCC 4.8
trunk, in order to provide a UPC (Unified Parallel C) capability
in the subsequent GCC 4.8 release.

The purpose of this note is to introduce the GUPC project,
to provide an overview of the UPC-related changes, and to introduce
the subsequent sets of patches which merge the GUPC branch into
GCC 4.8.

For reference,

The GUPC project page is here:
http://gcc.gnu.org/projects/gupc.html

The current GUPC release is distributed here:
http://gccupc.org

Roughly a year ago, we described the front-end-related
changes as they stood at the time:
http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00081.html

We merge the GCC trunk into the gupc branch on approximately
a weekly basis.  The current GUPC branch is based upon a recent
version of the GCC trunk (revision 192449, dated 2012-10-15), and has
been bootstrapped on x86_64/i686 Linux, PPC/POWER7/Linux and
IA64/Altix Linux.  In earlier versions, GUPC was successfully
ported to SGI/MIPS (big endian) and SiCortex/MIPS (little endian).

The UPC-related source code differences
can be viewed here in various formats:
  http://gccupc.org/gupc-changes

In the discussion below, the changes are
excerpted in order to highlight important
aspects of the UPC-related changes.  The excerpts are
taken from revision 190707.

UPC's Shared Qualifier and Layout Qualifier
-------------------------------------------

The UPC language specification describes
the language syntax and semantics:
  http://upc.gwu.edu/docs/upc_specs_1.2.pdf

UPC introduces a new qualifier, "shared",
that indicates that the qualified object
is located in a global shared address space
that is accessible by all UPC threads.
Additional qualifiers ("strict" and "relaxed")
further specify the semantics of accesses to
UPC shared objects.

In UPC, a shared-qualified array can further
specify a layout qualifier that indicates
how the shared data is blocked and distributed.

There are two predefined identifiers in the language:
THREADS gives the number of threads that
will be created when the program starts,
and MYTHREAD gives the current (zero-based)
thread number.  Typically, a UPC thread is implemented
as an operating system process.  Access to UPC
shared memory may be implemented locally via
OS-provided facilities (for example, mmap),
or across nodes via a high-speed network
interconnect (for example, InfiniBand).

GUPC provides a runtime (libgupc) that targets
an SMP-based system and uses mmap() to implement
global shared memory.  
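
As a rough illustration of the mechanism only (this is not libgupc
code), the sketch below shows two OS processes sharing one int through
an anonymous MAP_SHARED mapping.  The function name share_int_via_mmap
is hypothetical, and the sketch assumes a Linux-like system where
MAP_ANONYMOUS is available.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map one int that is shared between a parent and
   a child process, let the child store `value` into it, and return what
   the parent reads back.  Returns -1 on any error.  */
static int share_int_via_mmap(int value)
{
    int *cell = mmap(NULL, sizeof *cell, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (cell == MAP_FAILED)
        return -1;
    *cell = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(cell, sizeof *cell);
        return -1;
    }
    if (pid == 0) {
        /* Child: the store is visible to the parent because the
           mapping is MAP_SHARED rather than MAP_PRIVATE.  */
        *cell = value;
        _exit(0);
    }

    /* Parent: wait for the child, then read the shared cell.  */
    int seen = -1;
    if (waitpid(pid, NULL, 0) == pid)
        seen = *cell;
    munmap(cell, sizeof *cell);
    return seen;
}
```

Calling share_int_via_mmap(42) from a small test program should return
the value written by the child process.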

Optionally, GUPC can use the more general and
more capable Berkeley UPCR runtime:
  http://upc.lbl.gov/download/source.shtml#runtime
The UPCR runtime supports a number of network
topologies, and has been ported to most of the
current High Performance Computing (HPC) systems.

The following example illustrates
the use of the UPC shared qualifier
combined with a layout qualifier.

#define BLKSIZE 5
#define N_PER_THREAD (4 * BLKSIZE)
shared [BLKSIZE] double A[N_PER_THREAD*THREADS];

Above, the [BLKSIZE] construct is the UPC
layout factor; this specifies that the shared
array, A, distributes its elements across
the threads in blocks of 5 elements.  If the
program is run with two threads, then A is
distributed as shown below:

Thread 0    Thread 1
---------   ---------
A[ 0.. 4]   A[ 5.. 9]
A[10..14]   A[15..19]
A[20..24]   A[25..29]
A[30..34]   A[35..39]

Above, the elements shown for thread 0
are defined as having affinity to thread 0.
Similarly, those elements shown for thread 1
have affinity to thread 1.  In UPC, a pointer
to a shared object can be cast to a
thread-local pointer (a C pointer) when the
designated shared object has affinity
to the referencing thread.
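
The distribution in the table follows a block-cyclic rule: blocks are
dealt out to the threads in round-robin order.  A minimal sketch of
that arithmetic in plain C is given below; the function names are
hypothetical and are not part of UPC or GUPC.

```c
#include <stddef.h>

/* Thread that element `index` has affinity to, for an array declared as
   shared [block_size] T A[...] and run with `threads` UPC threads.  */
static size_t affinity_thread(size_t index, size_t block_size, size_t threads)
{
    return (index / block_size) % threads;
}

/* Position of element `index` within its block (0 .. block_size - 1).  */
static size_t offset_in_block(size_t index, size_t block_size)
{
    return index % block_size;
}
```

With block_size 5 and 2 threads, affinity_thread() maps A[12] to
thread 0 and A[39] to thread 1, in agreement with the table.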

A UPC pointer-to-shared (PTS) is a pointer
that references a UPC shared object.
A UPC pointer-to-shared is a fat pointer
with the following logical fields:
   (virt_addr, thread, offset)

The virtual address (virt_addr) field is combined with
the thread number (thread) and offset within the
block (offset), to derive the location of the
referenced object within the UPC shared address space.

GUPC implements pointer-to-shared objects using
either a packed representation or a struct
representation.  The user can select the
pointer-to-shared representation with a configure
parameter.  The packed representation is the default.

The packed pointer-to-shared representation
limits the range of the various fields within
the pointer-to-shared in order to gain efficiency.
Packed pointer-to-shared values encode the
three-part shared address (described above) as a 64-bit
value (on both 64-bit and 32-bit platforms).

The struct representation provides a wider
addressing range at the expense of requiring
twice the number of bits (128) needed to encode
the pointer-to-shared value.
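
To make the trade-off concrete, here is a sketch of the two
representations in plain C.  The type names, function names and, in
particular, the field widths (34/10/20 bits) are assumptions chosen
for illustration; they are not taken from the GUPC sources.

```c
#include <stdint.h>

/* Hypothetical field widths for the packed form; 34 + 10 + 20 = 64.  */
enum { VADDR_BITS = 34, THREAD_BITS = 10, OFFSET_BITS = 20 };

/* Struct form: wider fields, 128 bits in total.  */
struct pts_struct {
    uint64_t virt_addr;
    uint32_t thread;
    uint32_t offset;
};

/* Packed form: all three fields in one 64-bit word.  */
typedef uint64_t pts_packed;

#define FIELD_MASK(bits) ((UINT64_C(1) << (bits)) - 1)

/* Pack the three fields; values wider than a field are truncated.  */
static pts_packed pts_pack(uint64_t virt_addr, uint32_t thread, uint32_t offset)
{
    return ((virt_addr & FIELD_MASK(VADDR_BITS)) << (THREAD_BITS + OFFSET_BITS))
         | (((uint64_t)thread & FIELD_MASK(THREAD_BITS)) << OFFSET_BITS)
         | ((uint64_t)offset & FIELD_MASK(OFFSET_BITS));
}

/* Recover the logical (virt_addr, thread, offset) triple.  */
static struct pts_struct pts_unpack(pts_packed p)
{
    struct pts_struct s;
    s.offset = (uint32_t)(p & FIELD_MASK(OFFSET_BITS));
    s.thread = (uint32_t)((p >> OFFSET_BITS) & FIELD_MASK(THREAD_BITS));
    s.virt_addr = p >> (THREAD_BITS + OFFSET_BITS);
    return s;
}
```

The packed form fits in a single 64-bit register, at the cost of a
bounded range for each field; the struct form removes those bounds but
doubles the storage.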

UPC-Related Front-End Changes

Re: RFC: Merge the GUPC branch into the GCC 4.8 trunk (patch 0 of 16)

2012-10-15 Thread Richard Biener
On Mon, Oct 15, 2012 at 5:47 PM, Gary Funck g...@intrepid.com wrote:
[...]

Re: RFC: Merge the GUPC branch into the GCC 4.8 trunk (patch 0 of 16)

2012-10-15 Thread Gary Funck
On 10/15/12 17:51:14, Richard Guenther wrote:
> On Mon, Oct 15, 2012 at 5:47 PM, Gary Funck g...@intrepid.com wrote:
[...]
> > UPC-Related Front-End Changes
> > -----------------------------
> >
> > GCC's internal tree representation is
> > extended to record the UPC shared,
> > strict, relaxed qualifiers,
> > and the layout qualifier.
[...]
>
> What immediately comes to my mind is that apart from parsing
> the core machinery should be shareable with Cilk+, no?

I haven't looked at Cilk in detail, but my understanding is
that Cilk and UPC have different runtime models, and clearly
different language syntax and semantics.  Perhaps those
knowledgeable in the Cilk implementation can comment further.

- Gary


Re: RFC: Merge the GUPC branch into the GCC 4.8 trunk (patch 0 of 16)

2012-10-15 Thread Joseph S. Myers
On Mon, 15 Oct 2012, Gary Funck wrote:

> Various UPC language related checks and operations
> are called in the C front-end and middle-end.
> To ensure that these operations are defined,
> when linked with the other language front-ends
> and compilers, these functions are stubbed,
> in a fashion similar to Objective C:

Is there a reason you chose this approach rather than the -fcilkplus 
approach of enabling an extension in the C front end given a command-line 
option?  (If you don't want to support e.g. the ObjC / UPC combination, 
you can always give an error in such cases.)  In general I think such 
conditionals are preferable to linking in stub variants of functions - and 
I'm sure people doing all-languages LTO bootstraps will appreciate not 
having to do link-time optimization of the language-independent parts of 
the compiler yet more times because of yet another binary like cc1, 
cc1plus, ... that links in much the same code.  The functions you stub out 
would then all start with assertions that they are only ever called in UPC 
mode - or if they are meant to be called in C mode but do nothing in that 
case, with appropriate checks that return early for C (if needed).
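
The two variants described above might look roughly like this in C.
This is only a sketch: the flag upc_enabled and both function names
are hypothetical stand-ins, not names from the GCC or GUPC sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag standing in for a command-line option that
   enables the UPC extension in the C front end.  */
static bool upc_enabled = false;

/* Hypothetical check that is meaningful only for UPC input: it
   asserts that it is never reached in plain C mode.  */
static int upc_blocking_factor_is_valid(long factor)
{
    assert(upc_enabled);
    return factor >= 0;
}

/* Hypothetical hook that is also called in plain C mode: it returns
   early and does nothing in that case.  */
static int upc_count_shared_operand(bool operand_is_shared)
{
    if (!upc_enabled)
        return 0;
    return operand_is_shared ? 1 : 0;
}
```

Either way, the functions live in the one cc1 binary, so no stub
variants need to be linked in for the other languages.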

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: RFC: Merge the GUPC branch into the GCC 4.8 trunk (patch 0 of 16)

2012-10-15 Thread Gary Funck
On 10/15/12 17:06:28, Joseph S. Myers wrote:
> On Mon, 15 Oct 2012, Gary Funck wrote:
> > Various UPC language related checks and operations
> > are called in the C front-end and middle-end.
> > To ensure that these operations are defined,
> > when linked with the other language front-ends
> > and compilers, these functions are stubbed,
> > in a fashion similar to Objective C:
>
> Is there a reason you chose this approach rather than the -fcilkplus
> approach of enabling an extension in the C front end given a command-line
> option?  (If you don't want to support e.g. the ObjC / UPC combination,
> you can always give an error in such cases.)

Back when we began to develop GUPC, it was recommended that we
introduce the UPC capability as a language dialect, similar to
Objective C.  That is the approach that we have taken.

> In general I think such conditionals are preferable to linking
> in stub variants of functions - and
> I'm sure people doing all-languages LTO bootstraps will appreciate not
> having to do link-time optimization of the language-independent parts of
> the compiler yet more times because of yet another binary like cc1,
> cc1plus, ... that links in much the same code.

I agree that there is no de facto reason that cc1upc is built
other than the fact we use a similar approach to Objective C.
However, I think that re-working this aspect of how GUPC is
implemented will require a fair amount of time/effort.  If we
can find a way to make that happen in the GCC 4.8 time frame,
or if other GCC contributors are willing to help on this,
then perhaps such a change is feasible.

- Gary


Re: RFC: Merge the GUPC branch into the GCC 4.8 trunk (patch 0 of 16)

2012-10-15 Thread Joseph S. Myers
On Mon, 15 Oct 2012, Gary Funck wrote:

> On 10/15/12 17:06:28, Joseph S. Myers wrote:
> > On Mon, 15 Oct 2012, Gary Funck wrote:
> > > Various UPC language related checks and operations
> > > are called in the C front-end and middle-end.
> > > To ensure that these operations are defined,
> > > when linked with the other language front-ends
> > > and compilers, these functions are stubbed,
> > > in a fashion similar to Objective C:
> >
> > Is there a reason you chose this approach rather than the -fcilkplus
> > approach of enabling an extension in the C front end given a command-line
> > option?  (If you don't want to support e.g. the ObjC / UPC combination,
> > you can always give an error in such cases.)
>
> Back when we began to develop GUPC, it was recommended that we
> introduce the UPC capability as a language dialect, similar to
> Objective C.  That is the approach that we have taken.

Recommended where?  I think that approach has been a bad idea for a long 
time and the approach of building into cc1, as taken by the cilkplus 
patches, is better (and that really most objc/ code should be like 
c-family/, built once and linked into both cc1 and cc1plus, though in its 
present state that's much harder to achieve).

> I agree that there is no de facto reason that cc1upc is built
> other than the fact we use a similar approach to Objective C.
> However, I think that re-working this aspect of how GUPC is
> implemented will require a fair amount of time/effort.  If we

I'd expect it to be a fairly straightforward rework (as would making ObjC 
a mode of cc1, if ObjC++ didn't exist).

-- 
Joseph S. Myers
jos...@codesourcery.com