Thanks.
Strange that Nvidia removed support for sm_30...
I'll have to find out why they did this in order to decide what to do about it.
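A quick way to check whether a given nvcc knows an architecture at all
(plain shell sketch, nothing X10-specific):

  nvcc --help | grep -q sm_30 && echo supported || echo "sm_30 not in this nvcc"

If that reports unsupported, the toolkit has simply never heard of (or has
dropped) the target.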
On Mon, Feb 13, 2012 at 7:43 PM, David E Hudak <dhu...@osc.edu> wrote:
> Here is the nvcc --help output you requested. Let me know if I can
> supply anything else:
>
> Options for steering GPU code generation
> ========================================
>
> --gpu-architecture <gpu architecture name> (-arch)
>
>         Specify the name of the class of nVidia GPU architectures for which
>         the cuda input files must be compiled.
>         With the exception as described for the shorthand below, the
>         architecture specified with this option must be a virtual
>         architecture (such as compute_10), and it will be the assumed
>         architecture during the cicc compilation stage.
>         This option will cause no code to be generated (that is the role of
>         nvcc option '--gpu-code', see below); rather, its purpose is to
>         steer the cicc stage, influencing the architecture of the generated
>         ptx intermediate.
>         For convenience in case of simple nvcc compilations the following
>         shorthand is supported: if no value for option '--gpu-code' is
>         specified, then the value of this option defaults to the value of
>         '--gpu-architecture'. In this situation, as only exception to the
>         description above, the value specified for '--gpu-architecture' may
>         be a 'real' architecture (such as a sm_13), in which case nvcc uses
>         the closest virtual architecture as effective architecture value.
>         For example, 'nvcc -arch=sm_13' is equivalent to
>         'nvcc -arch=compute_13 -code=sm_13'.
>         Allowed values for this option:
>         'compute_10','compute_11','compute_12','compute_13','compute_20',
>         'sm_10','sm_11','sm_12','sm_13','sm_20','sm_21'.
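>
> (One practical consequence for the sm_30 question, sketched with a
> placeholder file name: compiling with
>
>     nvcc -arch=compute_20 kernel.cu
>
> keeps the architecture virtual, so the shorthand defaulting described
> above embeds compute_20 ptx that newer GPUs can still JIT at runtime.)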
>
> --gpu-code <gpu architecture name>,... (-code)
>
>         Specify the names of nVidia gpus to generate code for.
>         Unless option -export-dir is specified (see below), nvcc will embed
>         a compiled code image in the resulting executable for each
>         specified 'code' architecture. This code image will be a true
>         binary load image for each 'real' architecture (such as a sm_13),
>         and ptx intermediate code for each virtual architecture (such as
>         compute_10). During runtime, in case no better binary load image is
>         found, and provided that the ptx architecture is compatible with
>         the 'current' GPU, such embedded ptx code will be dynamically
>         translated for this current GPU by the cuda runtime system.
>         Architectures specified for this option can be virtual as well as
>         real, but each of these 'code' architectures must be compatible
>         with the architecture specified with option '--gpu-architecture'.
>         For instance, 'arch'=compute_13 is not compatible with
>         'code'=sm_10, because the generated ptx code will assume the
>         availability of compute_13 features that are not present on sm_10.
>         Allowed values for this option:
>         'compute_10','compute_11','compute_12','compute_13','compute_20',
>         'sm_10','sm_11','sm_12','sm_13','sm_20','sm_21'.
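>
> (Again with a placeholder file name, a fat-binary example along the
> lines described above:
>
>     nvcc -arch=compute_13 -code=compute_13,sm_13,sm_20 kernel.cu
>
> embeds sm_13 and sm_20 binary images plus compute_13 ptx, so a GPU
> newer than both can still run the code via the ptx translation path.)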
>
> On Feb 13, 2012, at 6:23 PM, Dave Cunningham wrote:
>
> I would guess you are right; they have probably broken backwards
> compatibility in that version. I'll need to get the latest CUDA installed
> and try it. In the meantime, can you paste the output of nvcc --help on
> your system, at this point:
>
> --gpu-architecture <gpu architecture name> (-arch)
>
> [...]
> Allowed values for this option:
> 'compute_10','compute_11','compute_12','compute_13','compute_20',
> 'compute_30','sm_10','sm_11','sm_12','sm_13','sm_20','sm_21','sm_22',
> 'sm_23','sm_30'.
>
> Probably I should be using --gpu-code instead.
>
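> A sketch of what that could look like (the exact flags x10c++ passes to
> nvcc are an assumption on my part):
>
>     nvcc -arch=compute_20 -code=compute_20,sm_20 ...
>
> i.e. target a virtual architecture that every recent nvcc understands,
> and let the cuda runtime JIT the embedded ptx for anything newer rather
> than hard-coding sm_30.
>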
> Thanks for the report.
>
> On Mon, Feb 13, 2012 at 1:05 PM, David E Hudak <dhu...@osc.edu> wrote:
>
>> OK, I followed these instructions:
>> http://x10-lang.org/documentation/practical-x10-programming/x10-on-gpus
>>
>> …and got CUDATopology to work:
>> 1004 x10c++ -O -NO_CHECKS -x10rt mpi CUDATopology.x10 -o CUDATopology
>> 1005 X10RT_ACCELS=ALL mpiexec -pernode ./CUDATopology
>> dhudak@n0282 1012%> !1005
>> X10RT_ACCELS=ALL mpiexec -pernode ./CUDATopology
>> Dumping places at place: Place(0)
>> Place: Place(0)
>> Parent: Place(0)
>> NumChildren: 2
>> Is a Host place
>> Child 0: Place(2)
>> Parent: Place(0)
>> NumChildren: 0
>> Is a CUDA place
>> Child 1: Place(3)
>> Parent: Place(0)
>> NumChildren: 0
>> Is a CUDA place
>> Place: Place(1)
>> Parent: Place(1)
>> NumChildren: 2
>> Is a Host place
>> Child 0: Place(4)
>> Parent: Place(1)
>> NumChildren: 0
>> Is a CUDA place
>> Child 1: Place(5)
>> Parent: Place(1)
>> NumChildren: 0
>> Is a CUDA place
>>
>> Dumping places at place: Place(1)
>> Place: Place(0)
>> Parent: Place(0)
>> NumChildren: 2
>> Is a Host place
>> Child 0: Place(2)
>> Parent: Place(0)
>> NumChildren: 0
>> Is a CUDA place
>> Child 1: Place(3)
>> Parent: Place(0)
>> NumChildren: 0
>> Is a CUDA place
>> Place: Place(1)
>> Parent: Place(1)
>> NumChildren: 2
>> Is a Host place
>> Child 0: Place(4)
>> Parent: Place(1)
>> NumChildren: 0
>> Is a CUDA place
>> Child 1: Place(5)
>> Parent: Place(1)
>> NumChildren: 0
>> Is a CUDA place
>>
>> …but, other examples are not building. I am assuming it's the new
>> version of X10 along with the new version of CUDA, but I figured I would
>> pass it along to the mailing list.
>>
>> dhudak@oak-rw 999%> module list
>> Currently Loaded Modules:
>>   1) torque/2.5.10   2) moab/6.1.4     3) modules/1.0   4) gnu/4.4.5
>>   5) mvapich2/1.7    6) mkl/10.3.0     7) cuda/4.1.28   8) java/1.7.0_02
>>   9) x10/2.2.2-cuda
>> dhudak@oak-rw 1000%> which nvcc
>> /usr/local/cuda/4.1.28/bin/nvcc
>> dhudak@oak-rw 1001%> x10c++ -O -NO_CHECKS -x10rt mpi CUDA3DFD.x10 -o CUDA3DFD
>>
>> x10c++: nvcc fatal : Value 'sm_30' is not defined for option 'gpu-architecture'
>> x10c++: Non-zero return code: 255
>> x10c++: Found @CUDA annotation, but not compiling for GPU because nvcc could not be run (check your $PATH).
>> dhudak@oak-rw 1002%>
>> dhudak@oak-rw 1002%> x10c++ -O -NO_CHECKS -x10rt mpi CUDAKernelTest.x10 -o CUDAKernelTest
>> x10c++: ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, assuming global memory space
>> ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, assuming global memory space
>> x10c++: ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, assuming global memory space
>> ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, assuming global memory space
>> x10c++: ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, assuming global memory space
>> ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, assuming global memory space
>> x10c++: ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, assuming global memory space
>> ./CUDAKernelTest.cu(56): Warning: Cannot tell what pointer points to, assuming global memory space
>> x10c++: nvcc fatal : Value 'sm_30' is not defined for option 'gpu-architecture'
>> x10c++: Non-zero return code: 255
>> x10c++: Found @CUDA annotation, but not compiling for GPU because nvcc could not be run (check your $PATH).
>> dhudak@oak-rw 1003%> which nvcc
>> /usr/local/cuda/4.1.28/bin/nvcc
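>>
>> (One possible stopgap while this gets fixed, strictly a sketch: a
>> wrapper named nvcc, placed ahead of the real one on $PATH, that
>> downgrades the unsupported value; the install path is the one 'which
>> nvcc' reports above.)
>>
>>     #!/bin/sh
>>     # sketch: rewrite sm_30/compute_30 requests to sm_20/compute_20
>>     # (the M2070s here are sm_20 parts), then hand off to the real nvcc
>>     args=$(echo "$*" | sed -e 's/sm_30/sm_20/g' -e 's/compute_30/compute_20/g')
>>     exec /usr/local/cuda/4.1.28/bin/nvcc $args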
>>
>> Regards,
>> Dave
>>
>> On Feb 11, 2012, at 5:09 PM, David E Hudak wrote:
>>
>> > Hi All,
>> >
>> > I have a code sample that I want to try on our new cluster. These are
>> > dual-socket nodes with dual-M2070 cards connected by QDR IB.
>> >
>> > I configured my local environment and built the code as follows:
>> > svn co https://x10.svn.sourceforge.net/svnroot/x10/tags/SF_RELEASE_2_2_2 x10-2.2.2
>> > cd x10-2.2.2/x10.dist
>> > ant -DNO_CHECKS=true -Doptimize=true -DX10RT_MPI=true -DX10RT_CUDA=true diet
>> >
>> > Things build.
>> >
>> > And, then I get an interactive PBS job on 2 nodes. I would like to
>> > launch the program with 2 X10 places per node, with each X10 place
>> > having one child place for a GPU. Does anyone have the incantation that
>> > would launch this configuration?
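>> >
>> > (A guess at the incantation, assuming that with the MPI backend each
>> > MPI rank becomes one X10 place: ask the launcher for two ranks per
>> > node, e.g.
>> >
>> >     X10RT_ACCELS=ALL mpiexec -npernode 2 ./CUDATopology
>> >
>> > where the per-node rank option's spelling (-npernode, -ppn, ...)
>> > depends on the MPI in use.)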
>> >
>> > By the way, is there a hostname function in X10 I can call to verify
>> > which node I am running on?
>> >
>> > So, first I tried...
>> >
>> > dhudak@n0282 1021%> mpiexec -pernode ./CUDATopology
>> > Dumping places at place: Place(0)
>> > Place: Place(0)
>> > Parent: Place(0)
>> > NumChildren: 0
>> > Is a Host place
>> >
>> > Dumping places at place: Place(0)
>> > Place: Place(0)
>> > Parent: Place(0)
>> > NumChildren: 0
>> > Is a Host place
>> >
>> > …and it ran two copies of the program, one on each of the two nodes.
>> > (I verified by running top on the other node and seeing a CUDATopology
>> > process running.)
>> >
>> > If I add the X10RT_ACCELS variable, each copy finds the two cards:
>> >
>> > dhudak@n0282 1012%> X10RT_ACCELS=ALL mpiexec -pernode ./CUDATopology
>> > Dumping places at place: Place(0)
>> > Place: Place(0)
>> > Parent: Place(0)
>> > NumChildren: 2
>> > Is a Host place
>> > Child 0: Place(1)
>> > Parent: Place(0)
>> > NumChildren: 0
>> > Is a CUDA place
>> > Child 1: Place(2)
>> > Parent: Place(0)
>> > NumChildren: 0
>> > Is a CUDA place
>> >
>> > Dumping places at place: Place(0)
>> > Place: Place(0)
>> > Parent: Place(0)
>> > NumChildren: 2
>> > Is a Host place
>> > Child 0: Place(1)
>> > Parent: Place(0)
>> > NumChildren: 0
>> > Is a CUDA place
>> > Child 1: Place(2)
>> > Parent: Place(0)
>> > NumChildren: 0
>> > Is a CUDA place
>> >
>> > OK, so I wanted place 0 on one node and place 1 on the other node:
>> >
>> > dhudak@n0282 1029%> X10RT_ACCELS=ALL X10_NPLACES=2 mpiexec -pernode ./CUDATopology
>> > Dumping places at place: Place(0)
>> > Place: Place(0)
>> > Parent: Place(0)
>> > NumChildren: 2
>> > Is a Host place
>> > Child 0: Place(2)
>> > Parent: Place(0)
>> > NumChildren: 0
>> > Is a CUDA place
>> > Child 1: Place(3)
>> > Parent: Place(0)
>> > NumChildren: 0
>> > Is a CUDA place
>> > Place: Place(1)
>> > Parent: Place(1)
>> > NumChildren: 2
>> > Is a Host place
>> > Child 0: Place(4)
>> > Parent: Place(1)
>> > NumChildren: 0
>> > Is a CUDA place
>> > Child 1: Place(5)
>> > Parent: Place(1)
>> > NumChildren: 0
>> > Is a CUDA place
>> >
>> > Dumping places at place: Place(1)
>> > Place: Place(0)
>> > Parent: Place(0)
>> > NumChildren: 2
>> > Is a Host place
>> > Child 0: Place(2)
>> > Parent: Place(0)
>> > NumChildren: 0
>> > Is a CUDA place
>> > Child 1: Place(3)
>> > Parent: Place(0)
>> > NumChildren: 0
>> > Is a CUDA place
>> > Place: Place(1)
>> > Parent: Place(1)
>> > NumChildren: 2
>> > Is a Host place
>> > Child 0: Place(4)
>> > Parent: Place(1)
>> > NumChildren: 0
>> > Is a CUDA place
>> > Child 1: Place(5)
>> > Parent: Place(1)
>> > NumChildren: 0
>> > Is a CUDA place
>> >
>> > Dumping places at place: Place(0)
>> > Place: Place(0)
>> > Parent: Place(0)
>> > NumChildren: 2
>> > Is a Host place
>> > Child 0: Place(2)
>> > Parent: Place(0)
>> > NumChildren: 0
>> > Is a CUDA place
>> > Child 1: Place(3)
>> > Parent: Place(0)
>> > NumChildren: 0
>> > Is a CUDA place
>> > Place: Place(1)
>> > Parent: Place(1)
>> > NumChildren: 2
>> > Is a Host place
>> > Child 0: Place(4)
>> > Parent: Place(1)
>> > NumChildren: 0
>> > Is a CUDA place
>> > Child 1: Place(5)
>> > Parent: Place(1)
>> > NumChildren: 0
>> > Is a CUDA place
>> >
>> > Dumping places at place: Place(1)
>> > Place: Place(0)
>> > Parent: Place(0)
>> > NumChildren: 2
>> > Is a Host place
>> > Child 0: Place(2)
>> > Parent: Place(0)
>> > NumChildren: 0
>> > Is a CUDA place
>> > Child 1: Place(3)
>> > Parent: Place(0)
>> > NumChildren: 0
>> > Is a CUDA place
>> > Place: Place(1)
>> > Parent: Place(1)
>> > NumChildren: 2
>> > Is a Host place
>> > Child 0: Place(4)
>> > Parent: Place(1)
>> > NumChildren: 0
>> > Is a CUDA place
>> > Child 1: Place(5)
>> > Parent: Place(1)
>> > NumChildren: 0
>> > Is a CUDA place
>> >
>> > Does anyone have any advice?
>> >
>> > Thanks,
>> > Dave
>> > ---
>> > David E. Hudak, Ph.D. dhu...@osc.edu
>> > Program Director, HPC Engineering
>> > Ohio Supercomputer Center
>> > http://www.osc.edu
>>
>> ---
>> David E. Hudak, Ph.D. dhu...@osc.edu
>> Program Director, HPC Engineering
>> Ohio Supercomputer Center
>> http://www.osc.edu
>
>
> ---
> David E. Hudak, Ph.D. dhu...@osc.edu
> Program Director, HPC Engineering
> Ohio Supercomputer Center
> http://www.osc.edu