RFC: Merge the GUPC branch into the GCC 4.8 trunk (patch 0 of 16)
We have maintained the gupc (GNU Unified Parallel C) branch for a couple of years now, and would like to merge these changes into the GCC trunk. Our goal is to integrate the GUPC changes into the GCC 4.8 trunk, in order to provide a UPC (Unified Parallel C) capability in the subsequent GCC 4.8 release.

The purpose of this note is to introduce the GUPC project, provide an overview of the UPC-related changes, and introduce the subsequent sets of patches which merge the GUPC branch into GCC 4.8.

For reference, the GUPC project page is here:
http://gcc.gnu.org/projects/gupc.html

The current GUPC release is distributed here:
http://gccupc.org

Roughly a year ago, we described the front-end changes as they stood at the time:
http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00081.html

We merge the GCC trunk into the gupc branch on approximately a weekly basis. The current GUPC branch is based upon a recent version of the GCC trunk (revision 192449, dated 2012-10-15), and has been bootstrapped on x86_64/i686 Linux, PPC/POWER7/Linux and IA64/Altix Linux. In earlier versions, GUPC was successfully ported to SGI/MIPS (big-endian) and SciCortex/MIPS (little-endian).

The UPC-related source code differences can be viewed here in various formats:
http://gccupc.org/gupc-changes

In the discussion below, the changes are excerpted in order to highlight important aspects of the UPC-related changes. The version used in this presentation is revision 190707.

UPC's Shared Qualifier and Layout Qualifier
-------------------------------------------

The UPC language specification describes the language syntax and semantics:
http://upc.gwu.edu/docs/upc_specs_1.2.pdf

UPC introduces a new qualifier, shared, which indicates that the qualified object is located in a global shared address space that is accessible by all UPC threads. Additional qualifiers (strict and relaxed) further specify the semantics of accesses to UPC shared objects.
In UPC, a shared-qualified array can further specify a layout qualifier that indicates how the shared data is blocked and distributed.

There are two language-predefined identifiers: THREADS, the number of threads that will be created when the program starts, and MYTHREAD, the current (zero-based) thread number. Typically, a UPC thread is implemented as an operating system process. Access to UPC shared memory may be implemented locally via OS-provided facilities (for example, mmap), or across nodes via a high-speed network interconnect (for example, InfiniBand).

GUPC provides a runtime (libgupc) that targets an SMP-based system and uses mmap() to implement global shared memory. Optionally, GUPC can use the more general and more capable Berkeley UPCR runtime:
http://upc.lbl.gov/download/source.shtml#runtime

The UPCR runtime supports a number of network topologies, and has been ported to most of the current High Performance Computing (HPC) systems.

The following example illustrates the use of the UPC shared qualifier combined with a layout qualifier.

    #define BLKSIZE 5
    #define N_PER_THREAD (4 * BLKSIZE)
    shared [BLKSIZE] double A[N_PER_THREAD*THREADS];

Above, the [BLKSIZE] construct is the UPC layout qualifier; it specifies that the shared array A distributes its elements across the threads in blocks of 5 elements. If the program is run with two threads, then A is distributed as shown below:

    Thread 0      Thread 1
    ---------     ---------
    A[ 0.. 4]     A[ 5.. 9]
    A[10..14]     A[15..19]
    A[20..24]     A[25..29]
    A[30..34]     A[35..39]

The elements shown for thread 0 are defined as having affinity to thread 0; similarly, the elements shown for thread 1 have affinity to thread 1. In UPC, a pointer to a shared object can be cast to a thread-local pointer (a C pointer) when the designated shared object has affinity to the referencing thread.

A UPC pointer-to-shared (PTS) is a pointer that references a UPC shared object.
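The block-cyclic distribution in the table above can be written as plain integer arithmetic. The C sketch below is illustrative only: the helper names are hypothetical and are not part of GUPC, libgupc, or the UPC library. It assumes the usual UPC rule for an array declared shared [B]: element i lives on thread (i / B) mod THREADS, at offset i mod B within its block. An element has affinity to the running thread exactly when elem_thread(...) equals MYTHREAD, which is the condition under which the cast to a local C pointer is valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helpers (not GUPC or libgupc functions) for an array
   declared in UPC as
       shared [block] T A[...];
   and run with nthreads UPC threads.  Blocks of `block` consecutive
   elements are dealt out to the threads round-robin.  */

/* Thread that has affinity to element i. */
static size_t elem_thread(size_t i, size_t block, size_t nthreads)
{
    assert(block > 0 && nthreads > 0);
    return (i / block) % nthreads;
}

/* Offset of element i within its block (0 .. block - 1). */
static size_t elem_offset(size_t i, size_t block)
{
    assert(block > 0);
    return i % block;
}

/* Which of the owning thread's local blocks holds element i. */
static size_t elem_local_block(size_t i, size_t block, size_t nthreads)
{
    assert(block > 0 && nthreads > 0);
    return i / (block * nthreads);
}

/* Print the distribution of an n-element array, one line per thread. */
static void print_layout(size_t n, size_t block, size_t nthreads)
{
    for (size_t t = 0; t < nthreads; t++) {
        printf("Thread %zu:", t);
        for (size_t i = 0; i < n; i += block) {
            if (elem_thread(i, block, nthreads) == t) {
                /* The last block may be partial if n % block != 0. */
                size_t last = (i + block - 1 < n) ? i + block - 1 : n - 1;
                printf(" A[%2zu..%2zu]", i, last);
            }
        }
        printf("\n");
    }
}
```

For example, print_layout(40, 5, 2) (N_PER_THREAD * THREADS = 40 elements, BLKSIZE = 5, two threads) prints:

    Thread 0: A[ 0.. 4] A[10..14] A[20..24] A[30..34]
    Thread 1: A[ 5.. 9] A[15..19] A[25..29] A[35..39]

which matches the table above.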
A UPC pointer-to-shared is a fat pointer with the following logical fields:

    (virt_addr, thread, offset)

The virtual address (virt_addr) field is combined with the thread number (thread) and the offset within the block (offset) to derive the location of the referenced object within the UPC shared address space.

GUPC implements pointer-to-shared objects using either a packed representation or a struct representation. The user can select the pointer-to-shared representation with a configure parameter; the packed representation is the default.

The packed pointer-to-shared representation limits the range of the various fields within the pointer-to-shared in order to gain efficiency. Packed pointer-to-shared values encode the three-part shared address (described above) as a 64-bit value, on both 64-bit and 32-bit platforms. The struct representation provides a wider addressing range at the expense of requiring twice the number of bits (128) to encode the pointer-to-shared value.

UPC-Related Front-End Changes
-----------------------------
Re: RFC: Merge the GUPC branch into the GCC 4.8 trunk (patch 0 of 16)
On Mon, Oct 15, 2012 at 5:47 PM, Gary Funck g...@intrepid.com wrote:
> [...]
Re: RFC: Merge the GUPC branch into the GCC 4.8 trunk (patch 0 of 16)
On 10/15/12 17:51:14, Richard Guenther wrote:
> On Mon, Oct 15, 2012 at 5:47 PM, Gary Funck g...@intrepid.com wrote:
>> [...]
>> UPC-Related Front-End Changes
>> -----------------------------
>> GCC's internal tree representation is extended to record the UPC shared, strict, relaxed qualifiers, and the layout qualifier.
>> [...]
>
> What immediately comes to my mind is that, apart from parsing, the core machinery should be shareable with Cilk+, no?

I haven't looked at Cilk in detail, but my understanding is that Cilk and UPC have different runtime models, and clearly different language syntax and semantics. Perhaps those knowledgeable in the Cilk implementation can comment further.

- Gary
Re: RFC: Merge the GUPC branch into the GCC 4.8 trunk (patch 0 of 16)
On Mon, 15 Oct 2012, Gary Funck wrote:
> Various UPC language related checks and operations are called in the C front-end and middle-end. To ensure that these operations are defined, when linked with the other language front-ends and compilers, these functions are stubbed, in a fashion similar to Objective-C:

Is there a reason you chose this approach rather than the -fcilkplus approach of enabling an extension in the C front end given a command-line option? (If you don't want to support e.g. the ObjC / UPC combination, you can always give an error in such cases.)

In general I think such conditionals are preferable to linking in stub variants of functions, and I'm sure people doing all-languages LTO bootstraps will appreciate not having to do link-time optimization of the language-independent parts of the compiler yet more times because of yet another binary like cc1, cc1plus, ... that links in much the same code.

The functions you stub out would then all start with assertions that they are only ever called in UPC mode, or, if they are meant to be called in C mode but do nothing in that case, with appropriate checks that return early for C (if needed).

-- 
Joseph S. Myers
jos...@codesourcery.com
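The design suggested in this message (a single cc1 with UPC support enabled by a command-line option, in the style of the -fcilkplus approach, instead of a separate cc1upc linked against stub functions) can be sketched in a few lines of C. Everything below is hypothetical and only illustrates the pattern: upc_mode stands in for whatever option flag would actually be used; upc_pts_size_bits is a UPC-only helper that asserts it is never reached in plain C mode (it returns the 64-bit packed or 128-bit struct pointer-to-shared size described in the RFC); upc_shared_decl_p is a hook called from common C front-end code that returns early in C mode; and handle_upc_option diagnoses an unsupported combination such as ObjC plus UPC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical illustration only; none of these names are real GCC
   identifiers.  UPC is a mode of the single C compiler (cc1), switched
   on by a command-line option, rather than a separate cc1upc binary
   with stubbed-out helpers.  */

/* Set during option processing when the (hypothetical) UPC option is seen. */
static bool upc_mode = false;

/* Option handler: reject combinations that UPC mode does not support
   instead of silently accepting them.  Returns 0 on success.  */
static int handle_upc_option(bool objc_mode)
{
    if (objc_mode) {
        fprintf(stderr, "error: UPC mode cannot be combined with Objective-C\n");
        return 1;
    }
    upc_mode = true;
    return 0;
}

/* UPC-only helper: a stub-based design would link a dummy version into
   the other compilers; here it simply asserts it is never reached in
   C mode.  Returns the size in bits of a pointer-to-shared value.  */
static int upc_pts_size_bits(bool packed_pts)
{
    assert(upc_mode && "UPC-only helper called outside UPC mode");
    return packed_pts ? 64 : 128;
}

/* Hook called unconditionally from common C front-end code: it returns
   early, doing nothing, when compiling plain C.  */
static bool upc_shared_decl_p(bool has_shared_qualifier)
{
    if (!upc_mode)
        return false;   /* plain C: "shared" is not a qualifier */
    return has_shared_qualifier;
}
```

In plain C mode, upc_shared_decl_p(true) returns false without touching any UPC state; once handle_upc_option(false) succeeds, the same call returns true and the UPC-only helpers become reachable. The language-independent code is compiled and linked once, which is the LTO-bootstrap saving mentioned above.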
Re: RFC: Merge the GUPC branch into the GCC 4.8 trunk (patch 0 of 16)
On 10/15/12 17:06:28, Joseph S. Myers wrote:
> On Mon, 15 Oct 2012, Gary Funck wrote:
>> Various UPC language related checks and operations are called in the C front-end and middle-end. To ensure that these operations are defined, when linked with the other language front-ends and compilers, these functions are stubbed, in a fashion similar to Objective-C:
>
> Is there a reason you chose this approach rather than the -fcilkplus approach of enabling an extension in the C front end given a command-line option? (If you don't want to support e.g. the ObjC / UPC combination, you can always give an error in such cases.)

Back when we began to develop GUPC, it was recommended that we introduce the UPC capability as a language dialect, similar to Objective-C. That is the approach that we have taken.

> In general I think such conditionals are preferable to linking in stub variants of functions, and I'm sure people doing all-languages LTO bootstraps will appreciate not having to do link-time optimization of the language-independent parts of the compiler yet more times because of yet another binary like cc1, cc1plus, ... that links in much the same code.

I agree that there is no particular reason that cc1upc is built, other than the fact that we use a similar approach to Objective-C. However, I think that reworking this aspect of how GUPC is implemented will require a fair amount of time and effort. If we can find a way to make that happen in the GCC 4.8 time frame, or if other GCC contributors are willing to help with this, then perhaps such a change is feasible.

- Gary
Re: RFC: Merge the GUPC branch into the GCC 4.8 trunk (patch 0 of 16)
On Mon, 15 Oct 2012, Gary Funck wrote:
> On 10/15/12 17:06:28, Joseph S. Myers wrote:
>> On Mon, 15 Oct 2012, Gary Funck wrote:
>>> Various UPC language related checks and operations are called in the C front-end and middle-end. To ensure that these operations are defined, when linked with the other language front-ends and compilers, these functions are stubbed, in a fashion similar to Objective-C:
>>
>> Is there a reason you chose this approach rather than the -fcilkplus approach of enabling an extension in the C front end given a command-line option? (If you don't want to support e.g. the ObjC / UPC combination, you can always give an error in such cases.)
>
> Back when we began to develop GUPC, it was recommended that we introduce the UPC capability as a language dialect, similar to Objective-C. That is the approach that we have taken.

Recommended where? I think that approach has been a bad idea for a long time, and the approach of building into cc1, as taken by the cilkplus patches, is better (and that really most objc/ code should be like c-family/, built once and linked into both cc1 and cc1plus, though in its present state that's much harder to achieve).

> I agree that there is no particular reason that cc1upc is built, other than the fact that we use a similar approach to Objective-C. However, I think that reworking this aspect of how GUPC is implemented will require a fair amount of time and effort. If we

I'd expect it to be a fairly straightforward rework (as would making ObjC a mode of cc1, if ObjC++ didn't exist).

-- 
Joseph S. Myers
jos...@codesourcery.com