Hi Paul,

> For GMRES, the current performance of VecMDot_SeqCUSP sucks. I have a
> solution, but I haven't tested all cases yet.
> For BCGS, some part of the algorithm is broken but I don't know what it
> is. By broken, I mean that CPU and GPU residuals diverge fairly quickly.
Since I just stumbled over VecMDot_SeqCUSP() while interfacing with ViennaCL: do you know why the 'old' version was replaced by this expensive call to gemv(), which includes creating temporaries, etc.? Just writing a custom kernel with one work group per dot product should do the job perfectly, shouldn't it?

Best regards,
Karli
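For illustration, here is a minimal sketch of the "one work group per dot product" idea for computing z_j = x . y_j for all j at once. It is deliberately a sequential C++ model, not actual CUDA or OpenCL code: the outer loop stands in for the work groups, the inner loops mimic what the threads of one group would do (strided partial sums, then a tree reduction in shared/local memory). The function name mdot_one_group_per_dot and the group size of 128 are hypothetical choices for this sketch, not part of PETSc, CUSP, or ViennaCL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed work-group size; must be a power of two for the tree reduction below.
constexpr std::size_t kThreadsPerGroup = 128;

// Hypothetical model of a VecMDot-style kernel: z[j] = dot(x, ys[j]).
// Each iteration of the outer loop plays the role of one GPU work group,
// so no dense matrix, gemv() call, or extra temporary vectors are needed.
std::vector<double> mdot_one_group_per_dot(const std::vector<double>& x,
                                           const std::vector<std::vector<double>>& ys) {
  std::vector<double> z(ys.size(), 0.0);
  for (std::size_t group = 0; group < ys.size(); ++group) {  // one work group per y_j
    const std::vector<double>& y = ys[group];
    assert(y.size() == x.size());

    double shared[kThreadsPerGroup];  // models the group's shared/local memory

    // Phase 1: "thread" t accumulates x[i] * y[i] for i = t, t + T, t + 2T, ...
    for (std::size_t t = 0; t < kThreadsPerGroup; ++t) {
      double partial = 0.0;
      for (std::size_t i = t; i < x.size(); i += kThreadsPerGroup) {
        partial += x[i] * y[i];
      }
      shared[t] = partial;
    }

    // Phase 2: tree reduction; on a GPU a barrier would separate each stride.
    for (std::size_t stride = kThreadsPerGroup / 2; stride > 0; stride /= 2) {
      for (std::size_t t = 0; t < stride; ++t) {
        shared[t] += shared[t + stride];
      }
    }

    z[group] = shared[0];  // "thread 0" writes this group's dot product
  }
  return z;
}
```

On a GPU the k groups would run concurrently and each writes exactly one result, so a single kernel launch suffices. One trade-off to keep in mind: with only k work groups, a small k (e.g. a short GMRES restart) may leave much of the device idle.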
