Hi Paul,

This is very interesting. I tried building the code with --download-txpetscgpu and it doesn't work for me. It runs out of memory, no matter how small the problem (this is ex2 from src/ksp/ksp/examples/tutorials):
  mpirun -np 1 ./ex2 -n 10 -m 10 -ksp_type cg -pc_type sacusp -mat_type aijcusp -vec_type cusp -cusp_storage_format csr -use_cusparse 0

  terminate called after throwing an instance of 'thrust::system::detail::bad_alloc'
    what():  std::bad_alloc: out of memory
  MPI Application rank 0 killed before MPI_Finalize() with signal 6

This example works fine when I build without your GPU additions (and for much larger problems too). Am I doing something wrong?

For reference, I'm using CUDA 4.1, CUSP 0.3, and Thrust 1.5.1.

John

On Fri, Feb 10, 2012 at 5:04 PM, Paul Mullowney <paulm at txcorp.com> wrote:
> Hi All,
>
> I've been developing GPU capabilities for PETSc. The development has
> focused mostly on
> (1) an efficient multi-GPU SpMV, i.e. MatMult. This is working well.
> (2) the triangular solve used in ILU preconditioners, i.e. MatSolve. The
> performance of this ... is what it is :|
>
> This code is in beta mode. Keep that in mind if you decide to use it. It
> supports single and double precision, real numbers only! Complex will be
> supported at some point in the future, but not any time soon.
>
> To build with these capabilities, add the following to your configure line:
> --download-txpetscgpu=yes
>
> The capabilities of the SpMV code are accessed with the following two
> command line flags:
> -cusp_storage_format csr (other options are coo (coordinate), ell
> (ellpack), and dia (diagonal); hyb (hybrid) is not yet supported)
> -use_cusparse (this is a boolean; at the moment it is only supported with
> csr format matrices. In the future, cusparse will work with ell, coo, and
> hyb formats.)
>
> Regarding the number of GPUs to run on:
> Imagine a system with P nodes, N cores per node, and M GPUs per node.
> Then, to use only the GPUs, I would run with M ranks per node over P
> nodes. As an example, I have a system with 2 nodes. Each node has 8 cores
> and 4 GPUs (P=2, N=8, M=4). In a PBS queue script, one would use 2 nodes
> at 4 processors per node. Each MPI rank (CPU process) will then be
> attached to a GPU.
>
> You do not need to explicitly manage the GPUs, apart from understanding
> what type of system you are running on. To learn how many devices are
> available per node, use the command line flag:
> -cuda_show_devices
>
> -Paul
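
PS: In case it helps anyone else set up the multi-GPU runs Paul describes
above (2 nodes, 4 ranks per node), here is a rough sketch of what I would
expect the PBS script to look like. The walltime, problem size, and exact
mpirun invocation are placeholders of my own (they depend on the local
scheduler and MPI stack); I am simply combining the command line flags
Paul listed:

  #!/bin/bash
  #PBS -l nodes=2:ppn=4        # P=2 nodes, 4 MPI ranks per node (one rank per GPU)
  #PBS -l walltime=00:10:00    # placeholder walltime

  cd $PBS_O_WORKDIR

  # 2 nodes x 4 ranks per node = 8 MPI ranks total, each attached to a GPU
  mpirun -np 8 ./ex2 -m 100 -n 100 -ksp_type cg -pc_type sacusp \
         -mat_type aijcusp -vec_type cusp -cusp_storage_format csr

On a single workstation one can skip the batch script and just call
mpirun -np M directly, as in the command at the top of this mail.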
