On 16/03/2013, at 00:46, Karl Rupp wrote:

> Hi Paul,
> 
>> For GMRES, the current performance of VecMDot_SeqCUSP sucks. I have a
>> solution, but I haven't tested all cases yet.
>> For BCGS, some part of the algorithm is broken, but I don't know what it
>> is. By broken, I mean that CPU and GPU residuals diverge fairly quickly.
> 
> Since I just stumbled over VecMDot_SeqCUSP() when interfacing ViennaCL: do
> you know why the 'old' version was replaced by this expensive call to
> gemv(), including the creation of temporaries, etc.? Just writing a custom
> kernel with one work group per dot product should do the job perfectly,
> shouldn't it?
> 
> Best regards,
> Karli

My fault: 
https://bitbucket.org/petsc/petsc-hg/commits/ec7a7de2acd477e5edd24cc5a3af441ce7a68a36

The motivation was that the previous version performed even worse for me (VecMDot
is used heavily in SLEPc, and GPU performance was really bad). At that time I did
not have the time to write a custom kernel. If you write one, I could help with
testing and measuring performance.

Jose
