On 16/03/2013, at 00:46, Karl Rupp wrote:

> Hi Paul,
>
>> For GMRES, the current performance of VecMDot_SeqCUSP sucks. I have a
>> solution, but I haven't tested all cases yet.
>> For BCGS, some part of the algorithm is broken but I don't know what it
>> is. By broken, I mean that CPU and GPU residuals diverge fairly quickly.
>
> Since I just stumbled over VecMDot_SeqCUSP() when interfacing ViennaCL: do
> you know why the 'old' version was replaced by this expensive call to
> gemv(), including the creation of temporaries, etc.? Just writing a custom
> kernel with one work group per dot product should do the job perfectly,
> shouldn't it?
>
> Best regards,
> Karli
My fault: https://bitbucket.org/petsc/petsc-hg/commits/ec7a7de2acd477e5edd24cc5a3af441ce7a68a36

The motivation was that the previous version performed even worse for me (VecMDot is used heavily in SLEPc, and its GPU performance was really bad). At the time I did not have time to write a custom kernel. If you write one, I can help test it and measure its performance.

Jose
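For reference, the "one work group per dot product" kernel Karl suggests can be sketched roughly as below. This is a plain C emulation on the CPU, not PETSc, CUSP or ViennaCL code: the names `work_group_dot`, `mdot_one_group_per_dot` and `WORK_GROUP_SIZE` are hypothetical, and real scalars are assumed (for complex scalars VecMDot would need the conjugate of each `y[j]`). On a GPU, the outer loop over `j` would become the block/work-group index, the `tid` loops would be the threads of that group, and each halving step of the tree reduction would be separated by a barrier (`__syncthreads()` in CUDA).

```c
#include <stddef.h>

/* Hypothetical work-group size; must be a power of two for the tree reduction. */
#define WORK_GROUP_SIZE 128

/* Emulates one work group computing dot(x, y):
   each "thread" tid accumulates a strided partial sum into shared memory,
   then the partial sums are combined by a pairwise tree reduction. */
static double work_group_dot(const double *x, const double *y, size_t n)
{
    double shared[WORK_GROUP_SIZE];

    /* Phase 1: strided partial sums (threads with no elements contribute 0). */
    for (size_t tid = 0; tid < WORK_GROUP_SIZE; ++tid) {
        double sum = 0.0;
        for (size_t i = tid; i < n; i += WORK_GROUP_SIZE)
            sum += x[i] * y[i];
        shared[tid] = sum;
    }

    /* Phase 2: tree reduction; on a GPU a barrier separates each stride. */
    for (size_t stride = WORK_GROUP_SIZE / 2; stride > 0; stride /= 2)
        for (size_t tid = 0; tid < stride; ++tid)
            shared[tid] += shared[tid + stride];

    return shared[0];
}

/* Multi-dot (real case): z[j] = dot(x, y[j]) for j = 0..nv-1,
   with one (emulated) work group assigned to each dot product. */
void mdot_one_group_per_dot(const double *x, const double *const *y,
                            size_t nv, size_t n, double *z)
{
    for (size_t j = 0; j < nv; ++j) /* on a GPU: j == work-group index */
        z[j] = work_group_dot(x, y[j], n);
}
```

In this layout each work group reads `x` once per dot product and needs no temporaries beyond the group's shared buffer. Whether that actually beats the current gemv()-based version is exactly what the benchmarking Jose offers would have to show.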
