Incomplete factorizations have been around for a while, and on recent hardware they tend to be less competitive (note that the benchmarks in the paper use a Tesla 2050, which is about 7 years old).

The fine-grained parallel version here:
is an attractive alternative (and is available in the master branch of PETSc through ViennaCL), but it also has drawbacks.
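To illustrate what "fine-grained parallel" means here, below is a minimal sketch assuming the method in question is the fixed-point ILU(0) of Chow and Patel. Instead of running a sequential elimination, it treats every nonzero of L and U as an unknown and updates all of them independently in each sweep, so a GPU can assign one thread per nonzero. The function name `fine_grained_ilu0` and the `sweeps` parameter are hypothetical, and dense NumPy storage is used only for readability; a real GPU implementation would work on a sparse format such as CSR.

```python
import numpy as np


def fine_grained_ilu0(A, sweeps=10):
    """Hypothetical sketch of a Chow-Patel style fixed-point ILU(0).

    L is unit lower triangular, U is upper triangular, and both keep the
    nonzero pattern of A (no fill-in). Dense arrays are used for clarity only.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    rows, cols = np.nonzero(A)  # ILU(0): only entries in A's pattern are unknowns

    # Initial guess: l_ij = a_ij / a_jj below the diagonal, u_ij = a_ij on/above it.
    d = np.diag(A)
    L = np.eye(n) + np.tril(A, -1) / d[np.newaxis, :]
    U = np.triu(A)

    for _ in range(sweeps):
        # Jacobi-style sweep: every update reads only the previous iterate,
        # so all (i, j) updates are independent and could run in parallel.
        L_old, U_old = L.copy(), U.copy()
        for i, j in zip(rows, cols):
            m = min(i, j)
            s = L_old[i, :m] @ U_old[:m, j]  # sum over k < min(i, j)
            if i > j:
                L[i, j] = (A[i, j] - s) / U_old[j, j]
            else:
                U[i, j] = A[i, j] - s
    return L, U


if __name__ == "__main__":
    # 2D 5-point Laplacian on a 3x3 grid (diagonal 4, off-diagonals -1).
    T = 2 * np.eye(3) - np.eye(3, k=1) - np.eye(3, k=-1)
    A = np.kron(np.eye(3), T) + np.kron(T, np.eye(3))
    L, U = fine_grained_ilu0(A, sweeps=30)
    # At the ILU(0) fixed point, (L U)_ij matches a_ij on the pattern of A.
    print(np.abs((L @ U - A)[A != 0]).max())
```

In practice only a few sweeps are run, so the factors are an approximation of the classical ILU(0) factors; the trade-off is that every sweep exposes one independent task per nonzero, which suits GPUs far better than the sequential dependencies of a conventional incomplete factorization.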

I'm not sure if this is the right place to post this, but I wanted to
point out a new white paper I stumbled across about preconditioned
iterative solvers on GPUs:
The speed-ups are not huge, but they're not negligible either. I thought
it might be of interest to some of you.


