Hello,
thank you for describing your application in great detail! With a system
size of 2000 to 50000 unknowns you are most likely better off staying on
the CPU (assuming that your system is indeed rather sparse, with fewer
than about 100 nonzeros per row on average). This is because each GPU
kernel launch involves a couple of microseconds of latency; this doesn't
sound like much, but it accumulates over many kernel launches.
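To get a feeling for how launch latency adds up, here is a rough back-of-the-envelope sketch. Every number in it (5 microseconds per launch, 10 kernel launches per solver iteration, 200 iterations per solve, 10 right-hand sides, 1000 outer steps) is an illustrative assumption, not a measurement of any particular code, and the function name is made up for this example.

```python
# Rough estimate of time spent purely on kernel launch latency.
# All inputs are illustrative assumptions, not measurements.

def launch_overhead_seconds(latency_us, launches_per_iteration,
                            iterations_per_solve, num_rhs, outer_steps):
    """Return the accumulated launch latency in seconds."""
    total_launches = (launches_per_iteration * iterations_per_solve
                      * num_rhs * outer_steps)
    return total_launches * latency_us / 1e6  # microseconds -> seconds


overhead = launch_overhead_seconds(
    latency_us=5.0,             # assumed latency per kernel launch
    launches_per_iteration=10,  # assumed kernels per solver iteration
    iterations_per_solve=200,   # assumed iterations until convergence
    num_rhs=10,                 # assumed number of right-hand sides
    outer_steps=1000,           # assumed passes through the outer loop
)
print(f"launch latency alone: {overhead:.1f} s")
```

For small systems the actual arithmetic per kernel is tiny, so this fixed per-launch cost can make up a large share of the run time.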
Also, with multiple right-hand sides I recommend computing a sparse LU
factorization (PARDISO, SuperLU, etc.) once and then applying this
factorization to each of the right-hand sides. This will be more
efficient than calling iterative solvers (which is the standard approach
for GPUs). Sparse factorizations on the GPU don't really work that well
and, to the best of my knowledge, merely match those on equally powerful
CPUs (i.e. CPUs with similar energy consumption).
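The "factor once, solve many times" pattern can be sketched as follows. This is only a small dense stand-in written for illustration: PARDISO and SuperLU do the same thing with sparse data structures and fill-reducing orderings, and the helper names lu_factor and lu_solve as well as the 3x3 test matrix are assumptions made up for this example.

```python
def lu_factor(a):
    """LU factorization with partial pivoting of a square list-of-rows matrix.

    Returns (lu, perm): lu stores the unit lower factor L below the diagonal
    and U on and above it; perm[i] is the original index of row i.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b using the factors returned by lu_factor."""
    n = len(lu)
    y = [b[perm[i]] for i in range(n)]   # apply the row permutation
    for i in range(n):                   # forward substitution with L
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):         # backward substitution with U
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
rhs_list = [[5.0, 6.0, 5.0], [4.0, 1.0, 0.0], [1.0, 2.0, 3.0]]

lu, perm = lu_factor(A)                                 # expensive, done once
solutions = [lu_solve(lu, perm, b) for b in rhs_list]   # cheap, per rhs
```

The factorization dominates the cost; each additional right-hand side only needs one forward and one backward substitution.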
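Regarding symmetry: You can exploit the symmetry by computing a sparse
Cholesky factorization instead of an LU factorization (or a symmetric
indefinite LDL^T-type factorization in the cases where the matrix is not
positive definite). This, again, fits better onto a CPU than a GPU.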
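As a small illustration of how a Cholesky factorization exploits symmetry, the sketch below computes A = U^T U while reading only the upper triangle of A. It is a dense toy version under the assumption that A is symmetric positive definite; the function names and the test matrix are made up for this example, and a real sparse solver works on compressed storage instead.

```python
import math


def cholesky_upper(a):
    """Return upper-triangular U with A = U^T U.

    Only the entries a[i][j] with j >= i are read, so the lower triangle
    of the input never needs to be stored.
    """
    n = len(a)
    u = [[0.0] * n for _ in range(n)]
    for i in range(n):
        d = a[i][i] - sum(u[k][i] ** 2 for k in range(i))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        u[i][i] = math.sqrt(d)
        for j in range(i + 1, n):
            s = a[i][j] - sum(u[k][i] * u[k][j] for k in range(i))
            u[i][j] = s / u[i][i]
    return u


def cholesky_solve(u, b):
    """Solve A x = b given U from cholesky_upper."""
    n = len(u)
    y = list(b)
    for i in range(n):                   # forward substitution: U^T y = b
        for k in range(i):
            y[i] -= u[k][i] * y[k]
        y[i] /= u[i][i]
    for i in reversed(range(n)):         # backward substitution: U x = y
        for k in range(i + 1, n):
            y[i] -= u[i][k] * y[k]
        y[i] /= u[i][i]
    return y


# Upper triangle of a symmetric positive definite matrix; the zeros below
# the diagonal are placeholders that are never read.
A_upper = [[4.0, 1.0, 0.0],
           [0.0, 4.0, 1.0],
           [0.0, 0.0, 4.0]]
U = cholesky_upper(A_upper)
x = cholesky_solve(U, [5.0, 6.0, 5.0])
```

For an indefinite matrix this sketch raises a ValueError, which is the situation where a symmetric indefinite factorization is needed instead.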
Overall, I *think* that you can reuse the parallelization approaches
(esp. data structures) you developed for the GPU to also speed up your
CPU code (OpenMP, MPI, etc.). In terms of solving these systems, sparse
direct solvers on the CPU will be hard to beat at the system sizes you
mentioned. Productivity-wise, your best option is most likely to stay
with the CPU and not worry about GPUs for this particular problem.
Best regards,
Karli
On 9/10/21 15:28, Arno Gehrer wrote:
Good afternoon!
Maybe you can help me find out whether it would make sense to apply
ViennaCL to my problem?
Background:
· In the context of a reverse engineering problem I need to solve a
  linear system of equations.
  The number of unknowns is in the range of n = 2000 … 50000 and the
  system needs to be solved many times within an iteration loop.
· The matrix is symmetric, hence only the upper triangle is stored in
  CSR format.
· I need to solve this system with multiple right-hand side vectors.
· At present, I’m using Intel MKL / PARDISO to solve the linear system
  with mtype = 2 (real and symmetric positive definite) or -2 (in some
  cases, the matrix is real and symmetric indefinite), which works very
  well.
· Recently, I managed to speed up the whole algorithm by setting up the
  system on the GPU with CUDA and I’m looking for a suitable library to
  solve the system on the GPU as well.
  o I have already tried to solve the system with cuSOLVER (using
    cusolverSpDcsrlsvchol or cusolverSpDcsrlsvqr), which in principle
    worked. However, I did not find a way to solve multiple right-hand
    sides simultaneously, and the symmetric storage is not supported by
    cusolverSp. So I had to expand the upper triangle to a full matrix
    and solve the system for each rhs separately, which in total was
    much slower than solving the system on the CPU by means of PARDISO.
So, after this lengthy introduction, my question is:
Is it possible to apply ViennaCL to such a problem and can I expect a
significant speedup compared to MKL?
· The perfect solution would be if I could directly pass the matrix in
  CSR format and the rhs vectors (which are all stored in GPU memory)
  to a suitable solver that replaces PARDISO with mtype 2 / -2 (I
  currently copy these data to the host and pass them to PARDISO).
My environment for development is Win10(x64) / Visual Studio 2019 / MKL
2017 / CUDA 11.2 and the code also compiles on Linux where CUDA 7.5 is
installed.
Thanks for your feedback,
Arno Gehrer
_______________________________________________
ViennaCL-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/viennacl-support