Here is the nvcc --help output you requested.  Let me know if I can supply 
anything else:

Options for steering GPU code generation
========================================

--gpu-architecture <gpu architecture name>  (-arch)
        Specify the name of the class of nVidia GPU architectures for which the
        cuda input files must be compiled.
        With the exception as described for the shorthand below, the
        architecture specified with this option must be a virtual architecture
        (such as compute_10), and it will be the assumed architecture during
        the cicc compilation stage.
        This option will cause no code to be generated (that is the role of
        nvcc option '--gpu-code', see below); rather, its purpose is to steer
        the cicc stage, influencing the architecture of the generated ptx
        intermediate.
        For convenience in case of simple nvcc compilations the following
        shorthand is supported: if no value for option '--gpu-code' is
        specified, then the value of this option defaults to the value of
        '--gpu-architecture'. In this situation, as only exception to the
        description above, the value specified for '--gpu-architecture' may be
        a 'real' architecture (such as a sm_13), in which case nvcc uses the
        closest virtual architecture as effective architecture value. For
        example, 'nvcc -arch=sm_13' is equivalent to 'nvcc -arch=compute_13
        -code=sm_13'.
        Allowed values for this option:  'compute_10','compute_11','compute_12',
        'compute_13','compute_20','sm_10','sm_11','sm_12','sm_13','sm_20',
        'sm_21'.

--gpu-code <gpu architecture name>,...      (-code)
        Specify the names of nVidia gpus to generate code for.
        Unless option -export-dir is specified (see below), nvcc will embed a
        compiled code image in the resulting executable for each specified
        'code' architecture. This code image will be a true binary load image
        for each 'real' architecture (such as a sm_13), and ptx intermediate
        code for each virtual architecture (such as compute_10). During
        runtime, in case no better binary load image is found, and provided
        that the ptx architecture is compatible with the 'current' GPU, such
        embedded ptx code will be dynamically translated for this current GPU
        by the cuda runtime system.
        Architectures specified for this option can be virtual as well as real,
        but each of these 'code' architectures must be compatible with the
        architecture specified with option '--gpu-architecture'.
        For instance, 'arch'=compute_13 is not compatible with 'code'=sm_10,
        because the generated ptx code will assume the availability of
        compute_13 features that are not present on sm_10.
        Allowed values for this option:  'compute_10','compute_11','compute_12',
        'compute_13','compute_20','sm_10','sm_11','sm_12','sm_13','sm_20',
        'sm_21'.
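
For what it's worth, here is how I understand those two options combining (a
sketch against a hypothetical kernel.cu; I have not actually run these exact
lines on this machine):

    # ptx is generated for compute_13; binaries are embedded for sm_13 and
    # sm_20, plus the compute_13 ptx itself so that newer GPUs can still have
    # it JIT-compiled by the driver at run time
    nvcc -arch=compute_13 -code=compute_13,sm_13,sm_20 -c kernel.cu -o kernel.o

    # rejected by nvcc: sm_10 cannot run ptx that assumes compute_13 features
    # nvcc -arch=compute_13 -code=sm_10 -c kernel.cu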

On Feb 13, 2012, at 6:23 PM, Dave Cunningham wrote:

I would guess you are right; they have probably broken backwards compatibility 
in that version. I'll need to get the latest CUDA installed and try it.  In the 
meantime, can you paste the output of nvcc --help on your system, at this 
point:

--gpu-architecture <gpu architecture name>  (-arch)
[...]
        Allowed values for this option:  'compute_10','compute_11','compute_12',
        'compute_13','compute_20','compute_30','sm_10','sm_11','sm_12','sm_13',
        'sm_20','sm_21','sm_22','sm_23','sm_30'.

Probably I should be using --gpu-code instead
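
If so, then instead of what the generated build step presumably passes now
(the fatal error below suggests it hands nvcc -arch=sm_30 directly), something
along these lines should be accepted by the older toolkit while still letting
a Kepler-class driver JIT the embedded ptx (untested, just to illustrate the
idea):

    nvcc -arch=compute_20 -code=compute_20 ...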

thanks for the report

On Mon, Feb 13, 2012 at 1:05 PM, David E Hudak <dhu...@osc.edu> wrote:
OK, I followed these instructions:
http://x10-lang.org/documentation/practical-x10-programming/x10-on-gpus

…and got CUDATopology to work:
1004  x10c++ -O -NO_CHECKS -x10rt mpi CUDATopology.x10 -o CUDATopology
 1005  X10RT_ACCELS=ALL mpiexec -pernode ./CUDATopology
dhudak@n0282 1012%> !1005
X10RT_ACCELS=ALL mpiexec -pernode ./CUDATopology
Dumping places at place: Place(0)
Place: Place(0)
 Parent: Place(0)
 NumChildren: 2
 Is a Host place
 Child 0: Place(2)
   Parent: Place(0)
   NumChildren: 0
   Is a CUDA place
 Child 1: Place(3)
   Parent: Place(0)
   NumChildren: 0
   Is a CUDA place
Place: Place(1)
 Parent: Place(1)
 NumChildren: 2
 Is a Host place
 Child 0: Place(4)
   Parent: Place(1)
   NumChildren: 0
   Is a CUDA place
 Child 1: Place(5)
   Parent: Place(1)
   NumChildren: 0
   Is a CUDA place

Dumping places at place: Place(1)
Place: Place(0)
 Parent: Place(0)
 NumChildren: 2
 Is a Host place
 Child 0: Place(2)
   Parent: Place(0)
   NumChildren: 0
   Is a CUDA place
 Child 1: Place(3)
   Parent: Place(0)
   NumChildren: 0
   Is a CUDA place
Place: Place(1)
 Parent: Place(1)
 NumChildren: 2
 Is a Host place
 Child 0: Place(4)
   Parent: Place(1)
   NumChildren: 0
   Is a CUDA place
 Child 1: Place(5)
   Parent: Place(1)
   NumChildren: 0
   Is a CUDA place

…but other examples are not building.  I am assuming it's the combination of 
the new version of X10 and the new version of CUDA, but I figured I would pass 
it along to the mailing list.

dhudak@oak-rw 999%> module list
Currently Loaded Modules:
 1) torque/2.5.10  2) moab/6.1.4  3) modules/1.0  4) gnu/4.4.5  5) mvapich2/1.7 
 6) mkl/10.3.0  7) cuda/4.1.28  8) java/1.7.0_02  9) x10/2.2.2-cuda
dhudak@oak-rw 1000%> which nvcc
/usr/local/cuda/4.1.28/bin/nvcc
dhudak@oak-rw 1001%> x10c++ -O -NO_CHECKS -x10rt mpi CUDA3DFD.x10 -o CUDA3DFD

x10c++: nvcc fatal : Value 'sm_30' is not defined for option 'gpu-architecture'
x10c++: Non-zero return code: 255
x10c++: Found @CUDA annotation, but not compiling for GPU because nvcc could 
not be run (check your $PATH).
dhudak@oak-rw 1002%>
dhudak@oak-rw 1002%> x10c++ -O -NO_CHECKS -x10rt mpi CUDAKernelTest.x10 -o 
CUDAKernelTest
x10c++: ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, 
assuming global memory space
    ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, 
assuming global memory space
x10c++: ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, 
assuming global memory space
    ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, 
assuming global memory space
x10c++: ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, 
assuming global memory space
    ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, 
assuming global memory space
x10c++: ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, 
assuming global memory space
    ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, 
assuming global memory space
x10c++: nvcc fatal : Value 'sm_30' is not defined for option 'gpu-architecture'
x10c++: Non-zero return code: 255
x10c++: Found @CUDA annotation, but not compiling for GPU because nvcc could 
not be run (check your $PATH).
dhudak@oak-rw 1003%> which nvcc
/usr/local/cuda/4.1.28/bin/nvcc
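
In case it helps, the quickest way I know to double-check which architecture
values this particular nvcc accepts (assuming the cuda/4.1.28 module above is
really the toolkit being picked up) is:

    nvcc --version
    nvcc --help | grep -A 2 'Allowed values'

which should make it obvious whether sm_30 appears in the allowed values list
for --gpu-architecture at all.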

Regards,
Dave

On Feb 11, 2012, at 5:09 PM, David E Hudak wrote:

> Hi All,
>
> I have a code sample that I want to try on our new cluster.  These are 
> dual-socket nodes with dual-M2070 cards connected by QDR IB.
>
> I configured my local environment and built the code as follows:
> svn co https://x10.svn.sourceforge.net/svnroot/x10/tags/SF_RELEASE_2_2_2 
> x10-2.2.2
> cd x10-2.2.2/x10.dist
> ant -DNO_CHECKS=true -Doptimize=true -DX10RT_MPI=true -DX10RT_CUDA=true diet
>
> Things build.
>
> And then I get an interactive PBS job on 2 nodes.  I would like to launch 
> the program with 2 X10 places per node, with each X10 place having one child 
> place for a GPU.  Does anyone have the incantation that would launch this 
> configuration?
>
> By the way, is there a hostname function in X10 I can call to verify which 
> node I am running on?
>
> So, first I tried...
>
> dhudak@n0282 1021%> mpiexec -pernode ./CUDATopology
> Dumping places at place: Place(0)
> Place: Place(0)
>  Parent: Place(0)
>  NumChildren: 0
>  Is a Host place
>
> Dumping places at place: Place(0)
> Place: Place(0)
>  Parent: Place(0)
>  NumChildren: 0
>  Is a Host place
>
> …and it ran two copies of the program, one on each of the two nodes.  (I 
> verified by running top on the other node and seeing a CUDATopology process 
> running.)
>
> If I add the X10RT_ACCELS variable, each copy finds the two cards:
>
> dhudak@n0282 1012%> X10RT_ACCELS=ALL mpiexec -pernode ./CUDATopology
> Dumping places at place: Place(0)
> Place: Place(0)
>  Parent: Place(0)
>  NumChildren: 2
>  Is a Host place
>  Child 0: Place(1)
>    Parent: Place(0)
>    NumChildren: 0
>    Is a CUDA place
>  Child 1: Place(2)
>    Parent: Place(0)
>    NumChildren: 0
>    Is a CUDA place
>
> Dumping places at place: Place(0)
> Place: Place(0)
>  Parent: Place(0)
>  NumChildren: 2
>  Is a Host place
>  Child 0: Place(1)
>    Parent: Place(0)
>    NumChildren: 0
>    Is a CUDA place
>  Child 1: Place(2)
>    Parent: Place(0)
>    NumChildren: 0
>    Is a CUDA place
>
> OK, so I wanted place 1 on one node and place 2 on another node:
>
> dhudak@n0282 1029%> X10RT_ACCELS=ALL X10_NPLACES=2 mpiexec -pernode 
> ./CUDATopology
> Dumping places at place: Place(0)
> Place: Place(0)
>  Parent: Place(0)
>  NumChildren: 2
>  Is a Host place
>  Child 0: Place(2)
>    Parent: Place(0)
>    NumChildren: 0
>    Is a CUDA place
>  Child 1: Place(3)
>    Parent: Place(0)
>    NumChildren: 0
>    Is a CUDA place
> Place: Place(1)
>  Parent: Place(1)
>  NumChildren: 2
>  Is a Host place
>  Child 0: Place(4)
>    Parent: Place(1)
>    NumChildren: 0
>    Is a CUDA place
>  Child 1: Place(5)
>    Parent: Place(1)
>    NumChildren: 0
>    Is a CUDA place
>
> Dumping places at place: Place(1)
> Place: Place(0)
>  Parent: Place(0)
>  NumChildren: 2
>  Is a Host place
>  Child 0: Place(2)
>    Parent: Place(0)
>    NumChildren: 0
>    Is a CUDA place
>  Child 1: Place(3)
>    Parent: Place(0)
>    NumChildren: 0
>    Is a CUDA place
> Place: Place(1)
>  Parent: Place(1)
>  NumChildren: 2
>  Is a Host place
>  Child 0: Place(4)
>    Parent: Place(1)
>    NumChildren: 0
>    Is a CUDA place
>  Child 1: Place(5)
>    Parent: Place(1)
>    NumChildren: 0
>    Is a CUDA place
>
> Dumping places at place: Place(0)
> Place: Place(0)
>  Parent: Place(0)
>  NumChildren: 2
>  Is a Host place
>  Child 0: Place(2)
>    Parent: Place(0)
>    NumChildren: 0
>    Is a CUDA place
>  Child 1: Place(3)
>    Parent: Place(0)
>    NumChildren: 0
>    Is a CUDA place
> Place: Place(1)
>  Parent: Place(1)
>  NumChildren: 2
>  Is a Host place
>  Child 0: Place(4)
>    Parent: Place(1)
>    NumChildren: 0
>    Is a CUDA place
>  Child 1: Place(5)
>    Parent: Place(1)
>    NumChildren: 0
>    Is a CUDA place
>
> Dumping places at place: Place(1)
> Place: Place(0)
>  Parent: Place(0)
>  NumChildren: 2
>  Is a Host place
>  Child 0: Place(2)
>    Parent: Place(0)
>    NumChildren: 0
>    Is a CUDA place
>  Child 1: Place(3)
>    Parent: Place(0)
>    NumChildren: 0
>    Is a CUDA place
> Place: Place(1)
>  Parent: Place(1)
>  NumChildren: 2
>  Is a Host place
>  Child 0: Place(4)
>    Parent: Place(1)
>    NumChildren: 0
>    Is a CUDA place
>  Child 1: Place(5)
>    Parent: Place(1)
>    NumChildren: 0
>    Is a CUDA place
>
> Does anyone have any advice?
>
> Thanks,
> Dave
> ---
> David E. Hudak, Ph.D.          dhu...@osc.edu
> Program Director, HPC Engineering
> Ohio Supercomputer Center
> http://www.osc.edu

---
David E. Hudak, Ph.D.          dhu...@osc.edu
Program Director, HPC Engineering
Ohio Supercomputer Center
http://www.osc.edu