FYI, I think Paul is still on vacation through the end of this week.
I have also had problems running with sacusp and sacusppoly, with thrust
complaining that it could not allocate memory. I had not connected that with
building with the txpetscgpu package; perhaps I'll try to confirm that as well.
It's hard for me to imagine doing without the txpetscgpu package right now.
The package allows for some definite performance gains in the matvec through
its different matrix storage formats. I've seen my matvecs run anywhere from
2x to 5x faster on some problems using the "dia" format versus the default
"csr" format. Also, Paul has added some support for running on multiple GPUs,
which I am using; I'm not sure what is available in that area without his
package.
Thanks,
Dave
--
Dave Nystrom
LANL HPC-5
Phone: 505-667-7913
Email: wdn at lanl.gov
Smail: Mail Stop B272
Group HPC-5
Los Alamos National Laboratory
Los Alamos, NM 87545
________________________________
From: petsc-dev-bounces at mcs.anl.gov on behalf of John Fettig
[[email protected]]
Sent: Monday, February 27, 2012 2:48 PM
To: For users of the development version of PETSc
Subject: Re: [petsc-dev] PETSc GPU capabilities
Hi Paul,
This is very interesting. I tried building the code with --download-txpetscgpu
and it doesn't work for me. It runs out of memory, no matter how small the
problem (this is ex2 from src/ksp/ksp/examples/tutorials):
mpirun -np 1 ./ex2 -n 10 -m 10 -ksp_type cg -pc_type sacusp -mat_type aijcusp \
  -vec_type cusp -cusp_storage_format csr -use_cusparse 0
terminate called after throwing an instance of
'thrust::system::detail::bad_alloc'
what(): std::bad_alloc: out of memory
MPI Application rank 0 killed before MPI_Finalize() with signal 6
This example works fine when I build without your GPU additions (and for much
larger problems too). Am I doing something wrong?
For reference, I'm using CUDA 4.1, CUSP 0.3, and Thrust 1.5.1.
John
On Fri, Feb 10, 2012 at 5:04 PM, Paul Mullowney <paulm at txcorp.com> wrote:
Hi All,
I've been developing GPU capabilities for PETSc. The development has focused
mostly on
(1) An efficient multi-GPU SpMV, i.e. MatMult. This is working well.
(2) The triangular solve used in ILU preconditioners, i.e. MatSolve. The
performance of this ... is what it is :|
This code is in beta mode; keep that in mind if you decide to use it. It
supports single and double precision, real numbers only! Complex will be
supported at some point in the future, but not any time soon.
To build with these capabilities, add the following to your configure line:
--download-txpetscgpu=yes
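For example, a complete configure line for a double-precision, real build
might look like this (the exact CUDA/CUSP/Thrust options depend on your
installation, so treat this as a sketch):
./configure --with-cuda=1 --with-cusp=1 --with-thrust=1 \
  --with-precision=double --with-scalar-type=real --download-txpetscgpu=yes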
The capabilities of the SpMV code are accessed with the following two command
line flags (an example invocation follows the list):
-cusp_storage_format csr (other options are coo (coordinate), ell (ellpack),
and dia (diagonal); hyb (hybrid) is not yet supported)
-use_cusparse (this is a boolean and at the moment is only supported with csr
format matrices; in the future, cusparse will work with ell, coo, and hyb
formats)
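As a sketch, a single-GPU run of the KSP tutorial ex2 selecting the dia
format might look like this (the problem sizes are just illustrative):
mpirun -np 1 ./ex2 -m 100 -n 100 -ksp_type cg -pc_type sacusp \
  -mat_type aijcusp -vec_type cusp -cusp_storage_format dia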
Regarding the number of GPUs to run on:
Imagine a system with P nodes, N cores per node, and M GPUs per node. Then, to
use only the GPUs, I would run with M ranks per node over P nodes. As an
example, I have a system with 2 nodes, each with 8 cores and 4 GPUs attached
(P=2, N=8, M=4). In a PBS queue script, one would request 2 nodes at 4
processors per node (a sketch follows). Each MPI rank (CPU process) will be
attached to a GPU.
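A minimal PBS script for that 2-node, 4-GPU-per-node example might look like
this (the executable and solver options are placeholders):
#PBS -l nodes=2:ppn=4
cd $PBS_O_WORKDIR
mpirun -np 8 ./ex2 -mat_type aijcusp -vec_type cusp
The 8 ranks (2 nodes x 4 ranks per node) match the number of GPUs, so each
rank gets its own device.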
You do not need to explicitly manage the GPUs, apart from understanding what
type of system you are running on. To learn how many devices are available per
node, use the command line flag:
-cuda_show_devices
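For example:
mpirun -np 1 ./ex2 -cuda_show_devices
This reports the CUDA devices available on the node, which you can use to
decide how many ranks per node to request.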
-Paul