[petsc-dev] What is this? "Optimize VecNorm_MPI. Use BLASdot_ instead of BLASnrm2_"

Barry Smith Tue, 3 Jan 2012 18:09:26 -0600

On Jan 3, 2012, at 6:00 PM, Jed Brown wrote:

> On Tue, Jan 3, 2012 at 17:48, Barry Smith <bsmith at mcs.anl.gov> wrote:
> Yes, the BLAS norm is often a good bit (much) slower than the BLAS dot for the 
> reason Jack points out. This is a very real, measurable result when using a 
> BLAS built from the Fortran reference that has not been optimized (by taking 
> out the stability crap).
> 
> It seems silly to optimize for the reference BLAS.


   It is not just the reference BLAS. It is also the -lblas that comes on many 
Linux systems by default (which is not much more than a compiled version of the 
reference BLAS).

   Now you can say that you don't care about that situation, and that those BLAS 
are stupid, but it is a common situation, and saying it is stupid doesn't help 
all those users who spend way too much time in the norm.


> If the concern is just this routine and just on x86-64, I would be inclined 
> to write a simple vectorized implementation (probably using SSE intrinsics) 
> that still includes the stability stuff.
> 
    I don't think the stability stuff is needed for how norm() is used in PETSc 
(if it is important, how come it is not important for the dot products also?). 
It is just there for pathological matrices the LINPACK guys knew about; I 
consider it just a fetish that got the LINPACK guys excited.


> Whatever the case, I'm not a fan of replacing nrm2() with dot().

   Why not? If the dot is highly optimized, it may be faster than your own 
hand-coded BLAS thing.

   So you are saying we need Matt to write another BuildSystem test?


   Barry
