On Jan 3, 2012, at 4:47 PM, Jed Brown wrote:

> On Tue, Jan 3, 2012 at 16:44, Jack Poulson <jack.poulson at gmail.com> wrote:
> It is possible, though unlikely, that the BLAS dot could be faster than the 
> BLAS nrm2, though I am skeptical. The reason is that the result of dnrm2 on a 
> vector u is more stable than the square root of the inner product of u with 
> itself via ddot, as it scales the temporary products of the norm to make the 
> computation more accurate:
> http://www.netlib.org/blas/dnrm2.f
> 
> Ah, thanks for pointing this out.
> 
> Thus, if you don't care about accuracy, then it is _possible_ that ddot would 
> be faster, but I doubt it, and it is likely a bad idea to give up on some 
> stability.
> 
> Agreed.

  Yes, the BLAS norm is often a good bit (much) slower than the BLAS dot, for 
the reason Jack points out. This is a very real, measurable result when using a 
BLAS built from the Fortran reference that has not been optimized (by taking 
out the stability crap), as with some of the Linux bundled BLASes; the BLAS 
norm can give less than half the flop rate of the BLAS dot on real machines on 
real codes. In those same situations, just writing a loop to do the norm is 
faster than calling the BLAS.

    Now ideally configure would run both, get the timings, and then only use 
the norm version if it is not significantly slower than the dot version. But 
since Matt is the only person who can wrangle this stuff out of BuildSystem ......

     I used to have a PETSC_BLAS_NORM_SLOW or something that allowed switching 
off the BLAS norm, but that got lost over the years.

    Given that this is a real problem (despite your skepticism) how do you 
suggest handling it?  Just live with the crappy performance, have a bunch of 
#if defined() to switch based on configure flags, have Matt wrangle BuildSystem?



   Barry


