I see that the example for solving A.x_i = b_i (i.e. multiple right-hand sides for the same matrix) simply loops over multiple separate KSP calls.
Wouldn't there be some benefit in having a matrix-vector routine that computed y_i = A.x_i for multiple "i" values at once?

The parallel overhead comes from communicating the off-process x values. As this is probably latency dominated (especially as we go to many processes), the communication cost would rise quite slowly for each additional vector. Likewise, the cache utilisation of the sparse A matrix entries would be helped by doing several multiplications at once.

Has this already been implemented, or is it maybe in the pipeline? Or am I somehow missing the point ...

Thanks!

David

--
Dr David Henty
EPCC, The University of Edinburgh
HPC Training and Support
Edinburgh EH9 3JZ, UK
d.henty at epcc.ed.ac.uk
Tel: +44 (0)131 650 5960
Fax: +44 (0)131 650 6555
http://www.epcc.ed.ac.uk/~dsh/

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
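
P.S. To make the idea concrete, here is a minimal serial sketch of what I mean by a multi-vector product. It assumes A is stored in CSR form and uses plain Python lists; the function name csr_multi_matvec and the small 3x3 test matrix are made up for illustration and are not part of any library's API. It only shows the cache-reuse side of the argument: each nonzero of A is loaded once and applied to every x_i. The communication side (sending the off-process x values for all vectors in a single message) is not modelled here.

```python
# Hypothetical illustration only: plain-Python CSR storage, not a library API.

def csr_multi_matvec(row_ptr, col_idx, values, xs):
    """Compute y_i = A.x_i for every vector x_i in xs in one sweep over A.

    A is held in CSR form (row_ptr, col_idx, values). Each nonzero of A is
    read once and applied to all vectors, rather than once per vector.
    """
    n_rows = len(row_ptr) - 1
    n_vecs = len(xs)
    ys = [[0.0] * n_rows for _ in range(n_vecs)]
    for row in range(n_rows):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            a = values[k]          # matrix entry loaded once ...
            col = col_idx[k]
            for i in range(n_vecs):
                ys[i][row] += a * xs[i][col]   # ... and reused for every x_i
    return ys


# Small example: A = [[2, 0, 1],
#                     [0, 3, 0],
#                     [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [2.0, 1.0, 3.0, 4.0, 5.0]

xs = [[1.0, 1.0, 1.0], [1.0, 2.0, 3.0]]
ys = csr_multi_matvec(row_ptr, col_idx, values, xs)
```

A real implementation would presumably interleave the x_i in memory so that the innermost loop runs over contiguous data, but the loop ordering above is the essential point.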
