On 12/12/2013 04:38 PM, Paul Mullowney wrote:
Provided you have a good parallel sparse direct solve for a single SM,
you could unleash 32 direct solves (or perhaps 16) which run
concurrently on the K20x. One only needs to set an environment
variable to use Hyper-Q.
On Titan all you need to do is
  $ export CRAY_CUDA_PROXY=1
See here:
https://www.olcf.ornl.gov/tutorials/cuda-proxy-managing-gpu-context/
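To make that concrete, a minimal Titan batch-script fragment might look like the sketch below. This is a config fragment, not a tested script: the node/rank counts and the `./my_solver` binary name are illustrative assumptions, and the `aprun` flags should be adjusted to your actual layout.

```shell
#PBS -l nodes=2
#PBS -l walltime=00:10:00

# Enable the Cray CUDA proxy (Hyper-Q) so multiple MPI ranks
# can share each node's K20X without serializing GPU contexts.
export CRAY_CUDA_PROXY=1

# Oversubscribe the GPU: 16 ranks per node, all hitting one K20X.
# -n = total ranks, -N = ranks per node (hypothetical values).
aprun -n 32 -N 16 ./my_solver
```

Without `CRAY_CUDA_PROXY=1`, the ranks' CUDA contexts would be time-sliced on the GPU rather than running concurrently.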
Cheers,
Dominic
I don't know of any good parallel sparse direct solver for small systems.
-Paul
On Thu, Dec 12, 2013 at 4:29 PM, Dominic Meiser <[email protected]> wrote:
Hi Karli,
On 12/12/2013 02:50 PM, Karl Rupp wrote:
Hmm, this does not sound like something I would consider a good
fit for GPUs. With 16 MPI processes you have additional
congestion of the one or two GPUs per node, so you would have to
rethink the solution procedure as a whole.
Are you sure about that for Titan? Supposedly the K20X's can deal
with multiple MPI processes hitting a single GPU pretty well using
Hyper-Q. Paul has seen pretty good speed up with small GPU kernels
simply by over-subscribing each GPU with 4 MPI processes.
See here:
http://blogs.nvidia.com/blog/2012/08/23/unleash-legacy-mpi-codes-with-keplers-hyper-q/
Cheers,
Dominic
--
Dominic Meiser
Tech-X Corporation
5621 Arapahoe Avenue
Boulder, CO 80303
USA
Telephone: 303-996-2036
Fax: 303-448-7756
www.txcorp.com