Re: [Numpy-discussion] C-coded dot 1000x faster than numpy?

2021-02-24 Thread Neal Becker
Supposedly it can be controlled through environment variables, but I didn't see any effect.
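A minimal sketch of that kind of experiment (assuming the Fedora FlexiBLAS wrapper honors the FLEXIBLAS environment variable; the backend name "OPENBLAS" is only an illustrative value and depends on the local FlexiBLAS configuration):

    # Hypothetical check: pick a FlexiBLAS backend before numpy loads its BLAS.
    # The variable must be set before the first "import numpy".
    import os
    os.environ["FLEXIBLAS"] = "OPENBLAS"   # assumed backend name; see `flexiblas list`

    import numpy as np
    np.show_config()   # inspect which BLAS/LAPACK numpy reports afterwards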

Re: [Numpy-discussion] C-coded dot 1000x faster than numpy?

2021-02-24 Thread Charles R Harris
On Wed, Feb 24, 2021 at 8:02 AM Charles R Harris wrote: > On Wed, Feb 24, 2021 at 5:36 AM Neal Becker wrote: >> See my earlier email - this is fedora 33, python3.9. >> I'm using fedora 33 standard numpy. >> ldd says: >> /usr/lib64/python3.9/site-packages/numpy/core/_

Re: [Numpy-discussion] C-coded dot 1000x faster than numpy?

2021-02-24 Thread Charles R Harris
On Wed, Feb 24, 2021 at 5:36 AM Neal Becker wrote: > See my earlier email - this is fedora 33, python3.9. > I'm using fedora 33 standard numpy. > ldd says: > /usr/lib64/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so: > linux-vdso.so.1

Re: [Numpy-discussion] C-coded dot 1000x faster than numpy?

2021-02-24 Thread Neal Becker
See my earlier email - this is fedora 33, python3.9. I'm using fedora 33 standard numpy. ldd says:
/usr/lib64/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so:
    linux-vdso.so.1 (0x7ffdd1487000)
    libflexiblas.so.3 => /lib64/libflexiblas.so.3

Re: [Numpy-discussion] C-coded dot 1000x faster than numpy?

2021-02-23 Thread Charles R Harris
On Tue, Feb 23, 2021 at 5:47 PM Charles R Harris wrote: > On Tue, Feb 23, 2021 at 11:10 AM Neal Becker wrote: >> I have code that performs a dot product of a 2D matrix of size (on the order of) [1000,16] with a vector of size [1000]. The matrix is float64 and the vector is

Re: [Numpy-discussion] C-coded dot 1000x faster than numpy?

2021-02-23 Thread Charles R Harris
On Tue, Feb 23, 2021 at 11:10 AM Neal Becker wrote: > I have code that performs a dot product of a 2D matrix of size (on the order of) [1000,16] with a vector of size [1000]. The matrix is float64 and the vector is complex128. I was using numpy.dot but it turned out to be a bottleneck.

Re: [Numpy-discussion] C-coded dot 1000x faster than numpy?

2021-02-23 Thread Carl Kleffner
The stackoverflow link above contains a simple testcase:
>>> from scipy.linalg import get_blas_funcs
>>> gemm = get_blas_funcs("gemm", [X, Y])
>>> np.all(gemm(1, X, Y) == np.dot(X, Y))
True
It would be of interest to benchmark gemm against np.dot. Maybe np.dot doesn't use BLAS at all for
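A rough timing sketch along those lines, using the shapes and dtypes from the original post. Upcasting A to complex128 is an assumption made here so that a single zgemm call applies; the original arrays were float64 x complex128:

    import numpy as np
    from scipy.linalg import get_blas_funcs
    from timeit import timeit

    A = np.random.rand(1000, 16)                           # float64 matrix
    v = np.random.rand(1000) + 1j * np.random.rand(1000)   # complex128 vector

    Ac = A.astype(np.complex128)          # make both operands complex for zgemm
    gemm = get_blas_funcs("gemm", [Ac, v])

    t_dot  = timeit(lambda: v.dot(A), number=1000)
    t_gemm = timeit(lambda: gemm(1.0, v.reshape(1, -1), Ac), number=1000)
    print(f"np.dot: {t_dot:.4f} s   gemm: {t_gemm:.4f} s  (1000 calls each)")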

Re: [Numpy-discussion] C-coded dot 1000x faster than numpy?

2021-02-23 Thread David Menéndez Hurtado
On Tue, 23 Feb 2021, 7:41 pm Roman Yurchak wrote: > For the first benchmark, apparently A.dot(B) with A real and B complex is a known performance issue: https://github.com/numpy/numpy/issues/10468 I split B into a vector of size (N, 2) for the real and imaginary parts, and that makes
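A sketch of that workaround, with array names and shapes assumed from the original post; the real-valued matrix multiplies the real and imaginary parts separately, so both products stay on the fast real-BLAS path:

    import numpy as np

    A = np.random.rand(1000, 16)                           # float64 matrix
    v = np.random.rand(1000) + 1j * np.random.rand(1000)   # complex128 vector

    # Mixed-type path that hits the slow case:
    slow = v.dot(A)

    # Split path: two real dot products, recombined at the end.
    fast = v.real.dot(A) + 1j * v.imag.dot(A)

    # Equivalent (N, 2) view trick: one real matrix product, then reinterpret
    # the contiguous (16, 2) result as complex.
    fast2 = A.T.dot(v.view(np.float64).reshape(-1, 2)).view(np.complex128).ravel()

    assert np.allclose(slow, fast) and np.allclose(slow, fast2)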

Re: [Numpy-discussion] C-coded dot 1000x faster than numpy?

2021-02-23 Thread Neal Becker
I'm using fedora 33 standard numpy. ldd says:
/usr/lib64/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so:
    linux-vdso.so.1 (0x7ffdd1487000)
    libflexiblas.so.3 => /lib64/libflexiblas.so.3 (0x7f0512787000)
So whatever flexiblas is doing controls BLAS.

Re: [Numpy-discussion] C-coded dot 1000x faster than numpy?

2021-02-23 Thread Carl Kleffner
https://stackoverflow.com/questions/19839539/how-to-get-faster-code-than-numpy-dot-for-matrix-multiplication maybe C_CONTIGUOUS vs F_CONTIGUOUS? Carl On Tue, 23 Feb 2021 at 19:52, Neal Becker wrote: > One suspect is that the numpy version appears to be multi-threading. > This isn't
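A quick way to check whether memory layout is the culprit (a sketch; the array names and shapes are assumptions based on the original post):

    import numpy as np

    A = np.random.rand(1000, 16)
    v = np.random.rand(1000) + 1j * np.random.rand(1000)

    # np.dot hands C-ordered and Fortran-ordered operands to BLAS differently,
    # so it is worth checking the flags of the actual arrays in the simulation.
    print(A.flags['C_CONTIGUOUS'], A.flags['F_CONTIGUOUS'])

    # Forcing one layout or the other before timing isolates the layout effect:
    A_c = np.ascontiguousarray(A)   # C order
    A_f = np.asfortranarray(A)      # Fortran order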

Re: [Numpy-discussion] C-coded dot 1000x faster than numpy?

2021-02-23 Thread Neal Becker
One suspect is that the numpy version appears to be multi-threading. This isn't useful here, because I'm running parallel Monte Carlo simulations using all cores. Perhaps this is perversely slowing things down? I don't know how to account for a 1000x slowdown, though.
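One way to test that hypothesis is to pin the BLAS thread pool to a single thread and re-run the timing (a sketch using the threadpoolctl package, which is separate from numpy and would need to be installed; array names assumed from the original post):

    import numpy as np
    from threadpoolctl import threadpool_limits

    A = np.random.rand(1000, 16)
    v = np.random.rand(1000) + 1j * np.random.rand(1000)

    # Restrict BLAS to one thread inside this block; if oversubscription with
    # the Monte Carlo workers is the problem, np.dot should get faster here.
    with threadpool_limits(limits=1, user_api='blas'):
        result = v.dot(A)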

Re: [Numpy-discussion] C-coded dot 1000x faster than numpy?

2021-02-23 Thread Roman Yurchak
For the first benchmark, apparently A.dot(B) with A real and B complex is a known performance issue: https://github.com/numpy/numpy/issues/10468 In general, it might be worth trying different BLAS backends. For instance, if you install numpy from conda-forge you should be able to switch
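To see which backend a given numpy build actually loaded, something like the following should work; it is a sketch and not specific to conda-forge, and threadpool_info comes from the separate threadpoolctl package:

    import numpy as np
    np.show_config()   # build-time BLAS/LAPACK information

    # Runtime view of the loaded BLAS (OpenBLAS, MKL, BLIS, ...) and its threads:
    from threadpoolctl import threadpool_info
    for lib in threadpool_info():
        print(lib.get('internal_api'), lib.get('num_threads'))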

Re: [Numpy-discussion] C-coded dot 1000x faster than numpy?

2021-02-23 Thread Andrea Gavana
Hi, On Tue, 23 Feb 2021 at 19.11, Neal Becker wrote: > I have code that performs a dot product of a 2D matrix of size (on the order of) [1000,16] with a vector of size [1000]. The matrix is float64 and the vector is complex128. I was using numpy.dot but it turned out to be a bottleneck.

[Numpy-discussion] C-coded dot 1000x faster than numpy?

2021-02-23 Thread Neal Becker
I have code that performs a dot product of a 2D matrix of size (on the order of) [1000,16] with a vector of size [1000]. The matrix is float64 and the vector is complex128. I was using numpy.dot but it turned out to be a bottleneck. So I coded dot2x1 in C++ (using xtensor-python just for the
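A minimal pure-numpy reproduction of the operation being timed (array names and the exact call are assumptions; the dot2x1 C++ implementation itself is not shown in the thread excerpt):

    import numpy as np
    from timeit import timeit

    A = np.random.rand(1000, 16)                           # float64 matrix
    v = np.random.rand(1000) + 1j * np.random.rand(1000)   # complex128 vector

    # The mixed real/complex product that turned out to be the bottleneck.
    # 1000 repetitions, so the total in seconds equals milliseconds per call.
    t = timeit(lambda: v.dot(A), number=1000)
    print(f"np.dot, float64 x complex128: {t:.3f} ms per call")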