Apparently the hardware for sm_30 doesn't exist yet, although that doesn't
explain why NVIDIA added it as a compilation target and then removed it
again...

But at any rate, we had no business compiling for that profile, so I'm going
to make the following change in X10 trunk:

x10.compiler/src/x10cpp/postcompiler/CXXCommandBuilder.java

    public List<String> getCUDAArchitectures() {
        // Architectures we ask nvcc to compile for via -arch.
        ArrayList<String> ans = new ArrayList<String>();
        ans.add("sm_10");
        ans.add("sm_11");
        ans.add("sm_12");
        ans.add("sm_13");
        ans.add("sm_20");
        ans.add("sm_21");
        //ans.add("sm_30");  // CUDA 4.1's nvcc rejects sm_30, so leave it out.
        return ans;
    }

If you're happy doing so, you can make this change locally, but I think you
can also just ignore the error message you're getting: you should still get
a binary out of the compiler and be able to run it.
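
If we wanted x10c++ to track whatever nvcc is installed instead of
hard-coding the list, another option would be to scrape the accepted sm_*
values out of 'nvcc --help'. A rough, untested sketch (the class name and
its wiring into CXXCommandBuilder are hypothetical):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.TreeSet;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical helper: ask the local nvcc which sm_* targets it accepts,
    // so the architecture list cannot go stale when NVIDIA adds or removes one.
    public class NvccArchProbe {
        public static List<String> supportedArchitectures() {
            TreeSet<String> archs = new TreeSet<String>();  // sorted, deduplicated
            try {
                Process p = Runtime.getRuntime().exec(new String[] { "nvcc", "--help" });
                BufferedReader r =
                    new BufferedReader(new InputStreamReader(p.getInputStream()));
                Pattern sm = Pattern.compile("'(sm_\\d+)'");  // matches e.g. 'sm_21'
                for (String line = r.readLine(); line != null; line = r.readLine()) {
                    Matcher m = sm.matcher(line);
                    while (m.find()) archs.add(m.group(1));
                }
                p.waitFor();
            } catch (Exception e) {
                // nvcc missing or not runnable: return an empty list (no GPU targets).
            }
            return new ArrayList<String>(archs);
        }
    }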

On Tue, Feb 14, 2012 at 3:14 PM, Dave Cunningham <sparkpr...@gmail.com> wrote:

> Thanks
>
> Strange that Nvidia removed support for sm_30...
>
> I'll have to find out why they did this in order to decide what to do
> about it.
>
>
>
> On Mon, Feb 13, 2012 at 7:43 PM, David E Hudak <dhu...@osc.edu> wrote:
>
>>  Here is the nvcc --help output you requested.  Let me know if I can
>> supply anything else:
>>
>> Options for steering GPU code generation
>> ========================================
>>
>> --gpu-architecture <gpu architecture name>  (-arch)
>>
>>         Specify the name of the class of nVidia GPU architectures for which
>>         the cuda input files must be compiled.
>>         With the exception as described for the shorthand below, the
>>         architecture specified with this option must be a virtual architecture
>>         (such as compute_10), and it will be the assumed architecture during
>>         the cicc compilation stage.
>>         This option will cause no code to be generated (that is the role of
>>         nvcc option '--gpu-code', see below); rather, its purpose is to steer
>>         the cicc stage, influencing the architecture of the generated ptx
>>         intermediate.
>>         For convenience in case of simple nvcc compilations the following
>>         shorthand is supported: if no value for option '--gpu-code' is
>>         specified, then the value of this option defaults to the value of
>>         '--gpu-architecture'. In this situation, as only exception to the
>>         description above, the value specified for '--gpu-architecture' may be
>>         a 'real' architecture (such as a sm_13), in which case nvcc uses the
>>         closest virtual architecture as effective architecture value. For
>>         example, 'nvcc -arch=sm_13' is equivalent to 'nvcc -arch=compute_13
>>         -code=sm_13'.
>>         Allowed values for this option: 'compute_10','compute_11','compute_12',
>>         'compute_13','compute_20','sm_10','sm_11','sm_12','sm_13','sm_20',
>>         'sm_21'.
>>
>> --gpu-code <gpu architecture name>,...      (-code)
>>
>>         Specify the names of nVidia gpus to generate code for.
>>         Unless option -export-dir is specified (see below), nvcc will embed a
>>         compiled code image in the resulting executable for each specified
>>         'code' architecture. This code image will be a true binary load image
>>         for each 'real' architecture (such as a sm_13), and ptx intermediate
>>         code for each virtual architecture (such as compute_10). During
>>         runtime, in case no better binary load image is found, and provided
>>         that the ptx architecture is compatible with the 'current' GPU, such
>>         embedded ptx code will be dynamically translated for this current GPU
>>         by the cuda runtime system.
>>         Architectures specified for this option can be virtual as well as real,
>>         but each of these 'code' architectures must be compatible with the
>>         architecture specified with option '--gpu-architecture'.
>>         For instance, 'arch'=compute_13 is not compatible with 'code'=sm_10,
>>         because the generated ptx code will assume the availability of
>>         compute_13 features that are not present on sm_10.
>>         Allowed values for this option: 'compute_10','compute_11','compute_12',
>>         'compute_13','compute_20','sm_10','sm_11','sm_12','sm_13','sm_20',
>>         'sm_21'.
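>>
>> To make the -arch/-code pairing concrete, a hypothetical invocation (an
>> illustration of the rules above; 'kernel.cu' is a made-up file name, and
>> this is not a command taken from the X10 build) that would embed
>> compute_13 ptx plus sm_13 and sm_20 binaries is:
>>
>>     nvcc -arch=compute_13 -code=compute_13,sm_13,sm_20 kernel.cu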
>>
>>  On Feb 13, 2012, at 6:23 PM, Dave Cunningham wrote:
>>
>> I would guess you are right; they have probably broken backwards
>> compatibility in that version. I'll need to get the latest CUDA installed
>> and try it. In the meantime, can you paste the output of nvcc --help on
>> your system, at this point:
>>
>> --gpu-architecture <gpu architecture name>  (-arch)
>>
>> [...]
>>         Allowed values for this option: 'compute_10','compute_11','compute_12',
>>         'compute_13','compute_20','compute_30','sm_10','sm_11','sm_12','sm_13',
>>         'sm_20','sm_21','sm_22','sm_23','sm_30'.
>>
>> Probably I should be using --gpu-code instead.
>>
>> Thanks for the report.
>>
>> On Mon, Feb 13, 2012 at 1:05 PM, David E Hudak <dhu...@osc.edu> wrote:
>>
>>> OK, I followed these instructions:
>>> http://x10-lang.org/documentation/practical-x10-programming/x10-on-gpus
>>>
>>> …and got CUDATopology to work:
>>> 1004  x10c++ -O -NO_CHECKS -x10rt mpi CUDATopology.x10 -o CUDATopology
>>>  1005  X10RT_ACCELS=ALL mpiexec -pernode ./CUDATopology
>>> dhudak@n0282 1012%> !1005
>>> X10RT_ACCELS=ALL mpiexec -pernode ./CUDATopology
>>> Dumping places at place: Place(0)
>>> Place: Place(0)
>>>  Parent: Place(0)
>>>  NumChildren: 2
>>>  Is a Host place
>>>  Child 0: Place(2)
>>>    Parent: Place(0)
>>>    NumChildren: 0
>>>    Is a CUDA place
>>>  Child 1: Place(3)
>>>    Parent: Place(0)
>>>    NumChildren: 0
>>>    Is a CUDA place
>>> Place: Place(1)
>>>  Parent: Place(1)
>>>  NumChildren: 2
>>>  Is a Host place
>>>  Child 0: Place(4)
>>>    Parent: Place(1)
>>>    NumChildren: 0
>>>    Is a CUDA place
>>>  Child 1: Place(5)
>>>    Parent: Place(1)
>>>    NumChildren: 0
>>>    Is a CUDA place
>>>
>>> Dumping places at place: Place(1)
>>> Place: Place(0)
>>>  Parent: Place(0)
>>>  NumChildren: 2
>>>  Is a Host place
>>>  Child 0: Place(2)
>>>    Parent: Place(0)
>>>    NumChildren: 0
>>>    Is a CUDA place
>>>  Child 1: Place(3)
>>>    Parent: Place(0)
>>>    NumChildren: 0
>>>    Is a CUDA place
>>> Place: Place(1)
>>>  Parent: Place(1)
>>>  NumChildren: 2
>>>  Is a Host place
>>>  Child 0: Place(4)
>>>    Parent: Place(1)
>>>    NumChildren: 0
>>>    Is a CUDA place
>>>  Child 1: Place(5)
>>>    Parent: Place(1)
>>>    NumChildren: 0
>>>    Is a CUDA place
>>>
>>> …but other examples are not building.  I am assuming it's the new
>>> version of X10 along with the new version of CUDA, but I figured I would
>>> pass it along to the mailing list.
>>>
>>> dhudak@oak-rw 999%> module list
>>> Currently Loaded Modules:
>>>   1) torque/2.5.10   2) moab/6.1.4    3) modules/1.0   4) gnu/4.4.5
>>>   5) mvapich2/1.7    6) mkl/10.3.0    7) cuda/4.1.28   8) java/1.7.0_02
>>>   9) x10/2.2.2-cuda
>>> dhudak@oak-rw 1000%> which nvcc
>>> /usr/local/cuda/4.1.28/bin/nvcc
>>> dhudak@oak-rw 1001%> x10c++ -O -NO_CHECKS -x10rt mpi CUDA3DFD.x10 -o CUDA3DFD
>>>
>>> x10c++: nvcc fatal : Value 'sm_30' is not defined for option 'gpu-architecture'
>>> x10c++: Non-zero return code: 255
>>> x10c++: Found @CUDA annotation, but not compiling for GPU because nvcc
>>> could not be run (check your $PATH).
>>> dhudak@oak-rw 1002%>
>>> dhudak@oak-rw 1002%> x10c++ -O -NO_CHECKS -x10rt mpi CUDAKernelTest.x10 -o CUDAKernelTest
>>> x10c++: ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, assuming global memory space
>>>     ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, assuming global memory space
>>> x10c++: ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, assuming global memory space
>>>     ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, assuming global memory space
>>> x10c++: ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, assuming global memory space
>>>     ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, assuming global memory space
>>> x10c++: ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, assuming global memory space
>>>     ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, assuming global memory space
>>> x10c++: nvcc fatal : Value 'sm_30' is not defined for option 'gpu-architecture'
>>> x10c++: Non-zero return code: 255
>>> x10c++: Found @CUDA annotation, but not compiling for GPU because nvcc
>>> could not be run (check your $PATH).
>>> dhudak@oak-rw 1003%> which nvcc
>>> /usr/local/cuda/4.1.28/bin/nvcc
>>>
>>> Regards,
>>> Dave
>>>
>>> On Feb 11, 2012, at 5:09 PM, David E Hudak wrote:
>>>
>>> > Hi All,
>>> >
>>> > I have a code sample that I want to try on our new cluster.  These are
>>> dual-socket nodes with dual-M2070 cards connected by QDR IB.
>>> >
>>> > I configured my local environment and built the code as follows:
>>> > svn co https://x10.svn.sourceforge.net/svnroot/x10/tags/SF_RELEASE_2_2_2 x10-2.2.2
>>> > cd x10-2.2.2/x10.dist
>>> > ant -DNO_CHECKS=true -Doptimize=true -DX10RT_MPI=true -DX10RT_CUDA=true diet
>>> >
>>> > Things build.
>>> >
>>> > And then I get an interactive PBS job on 2 nodes.  I would like to
>>> > launch the program with 2 X10 places per node, with each X10 place
>>> > having one child place for a GPU.  Does anyone have the incantation
>>> > that would launch this configuration?
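>>> >
>>> > (Untested sketch: with the MPI backend the number of X10 places
>>> > normally equals the number of MPI ranks, so two places per node means
>>> > launching two ranks per node. If this mpiexec supports -npernode,
>>> > something like
>>> >
>>> >     X10RT_ACCELS=ALL mpiexec -npernode 2 ./CUDATopology
>>> >
>>> > might do it, assuming the environment is forwarded the same way it is
>>> > for the -pernode runs below.)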
>>> >
>>> > By the way, is there a hostname function in X10 I can call to verify
>>> which node I am running on?
>>> >
>>> > So, first I tried...
>>> >
>>> > dhudak@n0282 1021%> mpiexec -pernode ./CUDATopology
>>> > Dumping places at place: Place(0)
>>> > Place: Place(0)
>>> >  Parent: Place(0)
>>> >  NumChildren: 0
>>> >  Is a Host place
>>> >
>>> > Dumping places at place: Place(0)
>>> > Place: Place(0)
>>> >  Parent: Place(0)
>>> >  NumChildren: 0
>>> >  Is a Host place
>>> >
>>> > …and it ran two copies of the program, one on each of the two nodes.
>>> > (I verified by running top on the other node and seeing a CUDATopology
>>> > process running.)
>>> >
>>> > If I add the X10RT_ACCELS variable, each copy finds the two cards:
>>> >
>>> > dhudak@n0282 1012%> X10RT_ACCELS=ALL mpiexec -pernode ./CUDATopology
>>> > Dumping places at place: Place(0)
>>> > Place: Place(0)
>>> >  Parent: Place(0)
>>> >  NumChildren: 2
>>> >  Is a Host place
>>> >  Child 0: Place(1)
>>> >    Parent: Place(0)
>>> >    NumChildren: 0
>>> >    Is a CUDA place
>>> >  Child 1: Place(2)
>>> >    Parent: Place(0)
>>> >    NumChildren: 0
>>> >    Is a CUDA place
>>> >
>>> > Dumping places at place: Place(0)
>>> > Place: Place(0)
>>> >  Parent: Place(0)
>>> >  NumChildren: 2
>>> >  Is a Host place
>>> >  Child 0: Place(1)
>>> >    Parent: Place(0)
>>> >    NumChildren: 0
>>> >    Is a CUDA place
>>> >  Child 1: Place(2)
>>> >    Parent: Place(0)
>>> >    NumChildren: 0
>>> >    Is a CUDA place
>>> >
>>> > OK, so I wanted place 1 on one node and place 2 on another node:
>>> >
>>> > dhudak@n0282 1029%> X10RT_ACCELS=ALL X10_NPLACES=2 mpiexec -pernode ./CUDATopology
>>> > Dumping places at place: Place(0)
>>> > Place: Place(0)
>>> >  Parent: Place(0)
>>> >  NumChildren: 2
>>> >  Is a Host place
>>> >  Child 0: Place(2)
>>> >    Parent: Place(0)
>>> >    NumChildren: 0
>>> >    Is a CUDA place
>>> >  Child 1: Place(3)
>>> >    Parent: Place(0)
>>> >    NumChildren: 0
>>> >    Is a CUDA place
>>> > Place: Place(1)
>>> >  Parent: Place(1)
>>> >  NumChildren: 2
>>> >  Is a Host place
>>> >  Child 0: Place(4)
>>> >    Parent: Place(1)
>>> >    NumChildren: 0
>>> >    Is a CUDA place
>>> >  Child 1: Place(5)
>>> >    Parent: Place(1)
>>> >    NumChildren: 0
>>> >    Is a CUDA place
>>> >
>>> > Dumping places at place: Place(1)
>>> > Place: Place(0)
>>> >  Parent: Place(0)
>>> >  NumChildren: 2
>>> >  Is a Host place
>>> >  Child 0: Place(2)
>>> >    Parent: Place(0)
>>> >    NumChildren: 0
>>> >    Is a CUDA place
>>> >  Child 1: Place(3)
>>> >    Parent: Place(0)
>>> >    NumChildren: 0
>>> >    Is a CUDA place
>>> > Place: Place(1)
>>> >  Parent: Place(1)
>>> >  NumChildren: 2
>>> >  Is a Host place
>>> >  Child 0: Place(4)
>>> >    Parent: Place(1)
>>> >    NumChildren: 0
>>> >    Is a CUDA place
>>> >  Child 1: Place(5)
>>> >    Parent: Place(1)
>>> >    NumChildren: 0
>>> >    Is a CUDA place
>>> >
>>> > Dumping places at place: Place(0)
>>> > Place: Place(0)
>>> >  Parent: Place(0)
>>> >  NumChildren: 2
>>> >  Is a Host place
>>> >  Child 0: Place(2)
>>> >    Parent: Place(0)
>>> >    NumChildren: 0
>>> >    Is a CUDA place
>>> >  Child 1: Place(3)
>>> >    Parent: Place(0)
>>> >    NumChildren: 0
>>> >    Is a CUDA place
>>> > Place: Place(1)
>>> >  Parent: Place(1)
>>> >  NumChildren: 2
>>> >  Is a Host place
>>> >  Child 0: Place(4)
>>> >    Parent: Place(1)
>>> >    NumChildren: 0
>>> >    Is a CUDA place
>>> >  Child 1: Place(5)
>>> >    Parent: Place(1)
>>> >    NumChildren: 0
>>> >    Is a CUDA place
>>> >
>>> > Dumping places at place: Place(1)
>>> > Place: Place(0)
>>> >  Parent: Place(0)
>>> >  NumChildren: 2
>>> >  Is a Host place
>>> >  Child 0: Place(2)
>>> >    Parent: Place(0)
>>> >    NumChildren: 0
>>> >    Is a CUDA place
>>> >  Child 1: Place(3)
>>> >    Parent: Place(0)
>>> >    NumChildren: 0
>>> >    Is a CUDA place
>>> > Place: Place(1)
>>> >  Parent: Place(1)
>>> >  NumChildren: 2
>>> >  Is a Host place
>>> >  Child 0: Place(4)
>>> >    Parent: Place(1)
>>> >    NumChildren: 0
>>> >    Is a CUDA place
>>> >  Child 1: Place(5)
>>> >    Parent: Place(1)
>>> >    NumChildren: 0
>>> >    Is a CUDA place
>>> >
>>> > Does anyone have any advice?
>>> >
>>> > Thanks,
>>> > Dave
>>> > ---
>>> > David E. Hudak, Ph.D.          dhu...@osc.edu
>>> > Program Director, HPC Engineering
>>> > Ohio Supercomputer Center
>>> > http://www.osc.edu
>>>
>>> ---
>>> David E. Hudak, Ph.D.          dhu...@osc.edu
>>> Program Director, HPC Engineering
>>> Ohio Supercomputer Center
>>> http://www.osc.edu
>>
>> ---
>> David E. Hudak, Ph.D.          dhu...@osc.edu
>> Program Director, HPC Engineering
>> Ohio Supercomputer Center
>> http://www.osc.edu