On Jan 3, 2012, at 4:47 PM, Jed Brown wrote:
> On Tue, Jan 3, 2012 at 16:44, Jack Poulson <jack.poulson at gmail.com> wrote:
> It is possible, though unlikely, that the BLAS dot could be faster than the
> BLAS nrm2. I am skeptical because the result of dnrm2 on a vector u is more
> stable than the square root of the inner product of u with itself via ddot:
> dnrm2 scales the intermediate products of the norm to make the computation
> more accurate:
> http://www.netlib.org/blas/dnrm2.f
>
> Ah, thanks for pointing this out.
>
> Thus, if you don't care about accuracy, then it is _possible_ that ddot would
> be faster, but I doubt it, and it is likely a bad idea to give up on some
> stability.
>
> Agreed.
Yes, the BLAS norm is often a good bit (much) slower than the BLAS dot, for the
reason Jack points out. This is a very real, measurable result when using a BLAS
built from the Fortran reference that has not been optimized (by taking out
the stability crap, i.e. the scaling); some of the Linux-bundled BLASes are like
this. The BLAS norm can give less than half the flop rate of the BLAS dot on
real machines on real codes. In those same situations just writing a loop to do
the norm is faster than calling the BLAS.
Now ideally configure would run both, get the timings, and then only use the
norm version if it is not significantly slower than the dot version. But since
Matt is the only person who can wrangle this stuff out of BuildSystem ......
I used to have a PETSC_BLAS_NORM_SLOW or something that allowed switching
off the BLAS norm, but that got lost over the years.
Given that this is a real problem (despite your skepticism), how do you
suggest handling it? Just live with the crappy performance, have a bunch of
#if defined() to switch based on configure flags, or have Matt wrangle BuildSystem?
Barry