On 12/12/2013 04:38 PM, Paul Mullowney wrote:
Provided you have a good parallel sparse direct solver for a single SM, you could unleash 32 direct solves (or perhaps 16) that run concurrently on the K20X. One only needs to set an environment variable to use Hyper-Q.
On Titan all you need to do is
    $ export CRAY_CUDA_PROXY=1

See here:
https://www.olcf.ornl.gov/tutorials/cuda-proxy-managing-gpu-context/
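As a concrete illustration, a Titan batch-script fragment that enables the proxy and over-subscribes one node's K20X with several MPI ranks might look like the sketch below. The executable name (./my_solver), the PBS directive, and the rank counts are placeholders, not taken from this thread:

```shell
#!/bin/bash
#PBS -l nodes=1
# Enable the CUDA proxy so all MPI ranks on the node can share the
# K20X through Hyper-Q instead of serializing on a single GPU context.
export CRAY_CUDA_PROXY=1
# Launch 16 ranks on the one node; each rank submits its own small
# GPU kernels, which the device can then overlap.
aprun -n 16 -N 16 ./my_solver
```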

Cheers,
Dominic

I don't know of any good parallel sparse direct solver for small systems.

-Paul


On Thu, Dec 12, 2013 at 4:29 PM, Dominic Meiser <[email protected]> wrote:

    Hi Karli,


    On 12/12/2013 02:50 PM, Karl Rupp wrote:

    Hmm, this does not sound like something I would consider a good
    fit for GPUs. With 16 MPI processes you have additional
    congestion of the one or two GPUs per node, so you would have to
    rethink the solution procedure as a whole.

    Are you sure about that for Titan? Supposedly the K20Xs can deal
    with multiple MPI processes hitting a single GPU pretty well using
    Hyper-Q. Paul has seen good speed-ups with small GPU kernels
    simply by over-subscribing each GPU with 4 MPI processes.

    See here:
    
http://blogs.nvidia.com/blog/2012/08/23/unleash-legacy-mpi-codes-with-keplers-hyper-q/


    Cheers,
    Dominic


    --
    Dominic Meiser
    Tech-X Corporation
    5621 Arapahoe Avenue
    Boulder, CO 80303
    USA
    Telephone: 303-996-2036
    Fax: 303-448-7756
    www.txcorp.com




--
Dominic Meiser
Tech-X Corporation
5621 Arapahoe Avenue
Boulder, CO 80303
USA
Telephone: 303-996-2036
Fax: 303-448-7756
www.txcorp.com
