I get similar results for OpenBLAS. I expect that axpy gains more from
vectorization than dot.
On Fri, Sep 9, 2016 at 5:31 PM, Sheehan Olver wrote:
I did blas_set_num_threads(1) with the same profile numbers. This is using
Apple’s BLAS.
Maybe I’ll try 0.5 and OpenBLAS for comparison.
On 10 Sep 2016, at 2:34 AM, Andreas Noack wrote:
Try to time it again with threading disabled. Sometimes the
threading heuristics can cause unintuitive performance.
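
For reference, a minimal sketch of the single-threaded timing suggested above. It assumes a current Julia (1.x), where the call is `BLAS.set_num_threads` from the `LinearAlgebra` standard library; the `blas_set_num_threads(1)` used elsewhere in this thread is the older spelling. The helper `best_time`, the vector length, and the scalar `2.0` are illustrative choices, not taken from the original routine.

```julia
using LinearAlgebra  # assumption: Julia 1.x, where BLAS lives in LinearAlgebra

# Hypothetical helper: best wall-clock time of f() over a few repetitions.
function best_time(f; reps = 5)
    f()  # warm-up call so compilation is not included in the timing
    return minimum(@elapsed(f()) for _ in 1:reps)
end

BLAS.set_num_threads(1)  # disable BLAS threading before timing

n = 1_000_000  # arbitrary illustrative length
x = rand(n)
y = rand(n)

# dot reads both vectors and reduces to a scalar.
t_dot = best_time(() -> BLAS.dot(n, x, 1, y, 1))

# axpy! overwrites y with 2.0 * x + y on every call.
t_axpy = best_time(() -> BLAS.axpy!(2.0, x, y))

println("dot:  ", t_dot, " s")
println("axpy: ", t_axpy, " s")
```

Taking the minimum over several runs is only one way to reduce timing noise; the absolute numbers will depend on the BLAS library and the machine.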
On Friday, September 9, 2016 at 6:39:13 AM UTC-4, Sheehan Olver wrote:
>
>
> I have the following code that is part of a Householder routine, where
> j::Int64,
> N::Int64,