Re: [sage-devel] multithreading performance issues

2016-10-06 Thread Jean-Pierre Flori


On Thursday, October 6, 2016 at 1:39:05 PM UTC+2, Jonathan Bober wrote:
>
> I understand the reasons why OpenBLAS shouldn't be multithreading 
> everything, and why it shouldn't necessarily use all available cpu cores 
> when it does do multithreading, but the point is: it currently uses all or 
> one, and sometimes it decides to use multithreading even when using 2 
> threads doesn't really seem to give me a benefit. So I guess there are two 
> points to consider. One is a "public service announcement" that if things 
> don't change, then in Sage 7.4 users might want to strongly consider 
> setting OpenBLAS to be single threaded. The other is that we may want to 
> reconsider the OpenBLAS defaults in Sage.
>
That seems to be the better option at the moment.
And there is still time to open a ticket and get it reviewed in time for 7.4. 

>
> One possibility might be to expose the openblas_set_num_threads function 
> at the top level, and keep the default at 1. Another possibility is to 
> build OpenBLAS single threaded by default and force someone compiling Sage 
> to pass some option for multithreaded OpenBLAS; that way, at least, only 
> "advanced" users will run into and have to deal with the sub-par 
> multithreading behavior.
>
> On Wed, Oct 5, 2016 at 10:34 AM, Clement Pernet wrote:
>
>> To follow up on Jean-Pierre's summary of the situation:
>>
>> The current version of fflas-ffpack in Sage (v2.2.2) uses the BLAS 
>> provided as is. Running it with a multithreaded BLAS may result in slower 
>> code than with a single-threaded BLAS. This is very likely due to memory 
>> transfer and coherence problems.
>>
>> More generally, we strongly suggest using a single-threaded BLAS and 
>> letting fflas-ffpack deal with the parallelization. This is common 
>> practice, for example, with parallel versions of LAPACK.
>>
>> Therefore, after the discussion at https://trac.sagemath.org/ticket/21323, 
>> we have decided to give fflas-ffpack the ability to force the number of 
>> threads that OpenBLAS can use at runtime. In this context we will force it 
>> to 1.
>> This is available upstream and I plan to update Sage's fflas-ffpack 
>> when we release v2.3.0.
>>
>> Clément
>>
>>
>> On 05/10/2016 at 11:24, Jean-Pierre Flori wrote:
>>
>>> Currently OpenBLAS does what it wants for multithreading.
>>> We hesitated to disable it but preferred to wait and think about it:
>>> see https://trac.sagemath.org/ticket/21323.
>>>
>>> You can still influence its use of threads by setting OPENBLAS_NUM_THREADS.
>>> See the trac ticket; just note that this is not Sage specific.
>>> And as you discovered, it seems it is also influenced by 
>>> OMP_NUM_THREADS...
>>>
>>> On Wednesday, October 5, 2016 at 9:28:23 AM UTC+2, tdumont wrote:
>>>
>>> What is the size of the matrix you use?
>>> Whatever you do, OpenMP in BLAS is interesting only if you compute with
>>> large matrices.
>>> If your computations are embedded in an @parallel and launch n
>>> processes, be careful that your OMP_NUM_THREADS is less than or equal to
>>> ncores/n.
>>>
>>> My experience (I am doing numerical computations) is that there are
>>> very few cases where using OpenMP in BLAS libraries is interesting.
>>> Parallelism should generally be sought at a higher level.
>>>
>>> One of the interests of multithreaded BLAS is for hardware vendors: with
>>> Intel's MKL BLAS, you can obtain the maximum possible performance of the
>>> machine when you use DGEMM (i.e. matrix-matrix products), due to their
>>> high arithmetic intensity. On my 2x8-core Sandy Bridge at 2.7 GHz, I have
>>> obtained more than 300 gigaflops, but only with matrices of size > 1000!
>>> And this is only true for DGEMM.
>>>
>>> t.d.
>>>
>>> On 04/10/2016 at 20:26, Jonathan Bober wrote:
>>> > See the following timings: If I start Sage with OMP_NUM_THREADS=1, a
>>> > particular computation takes 1.52 cpu seconds and 1.56 wall seconds.
>>> >
>>> > The same computation without OMP_NUM_THREADS set takes 12.8 cpu seconds
>>> > and 1.69 wall seconds. This is particularly devastating when I'm running
>>> > with @parallel to use all of my cpu cores.
>>> >
>>> > My guess is that this is Linbox related, since these computations do
>>> > some exact linear algebra, and Linbox can do some multithreading, which
>>> > perhaps uses OpenMP.
>>> >
>>> > jb12407@lmfdb1:~$ OMP_NUM_THREADS=1 sage
>>> > [...]
>>> > SageMath version 7.4.beta6, Release Date: 2016-09-24
>>> > [...]
>>> > Warning: this is a prerelease version, and it may be unstable.
>>> > [...]
>>> > sage: %time M = ModularSymbols(5113, 2, -1)
>>> > CPU times: user 509 ms, sys: 21 ms, total: 530 ms
>>> > Wall time: 530 ms
>>> > sage: %time S = M.cuspidal_subspace().new_subspace()
>>> > CPU times: user 1.42 s, sys: 97 ms, total: 1.52 s
>>> > Wall time: 1.56 s

Re: [sage-devel] multithreading performance issues

2016-10-06 Thread Jonathan Bober
I understand the reasons why OpenBLAS shouldn't be multithreading
everything, and why it shouldn't necessarily use all available cpu cores
when it does do multithreading, but the point is: it currently uses all or
one, and sometimes it decides to use multithreading even when using 2
threads doesn't really seem to give me a benefit. So I guess there are two
points to consider. One is a "public service announcement" that if things
don't change, then in Sage 7.4 users might want to strongly consider
setting OpenBLAS to be single threaded. The other is that we may want to
reconsider the OpenBLAS defaults in Sage.

One possibility might be to expose the openblas_set_num_threads function at
the top level, and keep the default at 1. Another possibility is to build
OpenBLAS single threaded by default and force someone compiling Sage to
pass some option for multithreaded OpenBLAS; that way, at least, only
"advanced" users will run into and have to deal with the sub-par
multithreading behavior.
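
For illustration, a minimal sketch (not Sage's actual interface) of what
exposing that function from Python could look like, via ctypes. The library
lookup here is an assumption; Sage would need to locate its own libopenblas:

import ctypes
import ctypes.util

# Find and load OpenBLAS; an explicit path may be needed in practice.
libname = ctypes.util.find_library("openblas")
openblas = ctypes.CDLL(libname)

# openblas_set_num_threads() takes the desired thread count; forcing it
# to 1 keeps BLAS single-threaded by default.
openblas.openblas_set_num_threads(1)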

On Wed, Oct 5, 2016 at 10:34 AM, Clement Pernet wrote:

> To follow up on Jean-Pierre's summary of the situation:
>
> The current version of fflas-ffpack in Sage (v2.2.2) uses the BLAS
> provided as is. Running it with a multithreaded BLAS may result in slower
> code than with a single-threaded BLAS. This is very likely due to memory
> transfer and coherence problems.
>
> More generally, we strongly suggest using a single-threaded BLAS and letting
> fflas-ffpack deal with the parallelization. This is common practice, for
> example, with parallel versions of LAPACK.
>
> Therefore, after the discussion at https://trac.sagemath.org/ticket/21323, we
> have decided to give fflas-ffpack the ability to force the number of
> threads that OpenBLAS can use at runtime. In this context we will force it
> to 1.
> This is available upstream and I plan to update Sage's fflas-ffpack
> when we release v2.3.0.
>
> Clément
>
>
> On 05/10/2016 at 11:24, Jean-Pierre Flori wrote:
>
>> Currently OpenBLAS does what it wants for multithreading.
>> We hesitated to disable it but preferred to wait and think about it:
>> see https://trac.sagemath.org/ticket/21323.
>>
>> You can still influence its use of threads by setting OPENBLAS_NUM_THREADS.
>> See the trac ticket; just note that this is not Sage specific.
>> And as you discovered, it seems it is also influenced by
>> OMP_NUM_THREADS...
>>
>> On Wednesday, October 5, 2016 at 9:28:23 AM UTC+2, tdumont wrote:
>>
>> What is the size of the matrix you use?
>> Whatever you do, OpenMP in BLAS is interesting only if you compute with
>> large matrices.
>> If your computations are embedded in an @parallel and launch n
>> processes, be careful that your OMP_NUM_THREADS is less than or equal to
>> ncores/n.
>>
>> My experience (I am doing numerical computations) is that there are
>> very few cases where using OpenMP in BLAS libraries is interesting.
>> Parallelism should generally be sought at a higher level.
>>
>> One of the interests of multithreaded BLAS is for hardware vendors: with
>> Intel's MKL BLAS, you can obtain the maximum possible performance of the
>> machine when you use DGEMM (i.e. matrix-matrix products), due to their
>> high arithmetic intensity. On my 2x8-core Sandy Bridge at 2.7 GHz, I have
>> obtained more than 300 gigaflops, but only with matrices of size > 1000!
>> And this is only true for DGEMM.
>>
>> t.d.
>>
>> On 04/10/2016 at 20:26, Jonathan Bober wrote:
>> > See the following timings: If I start Sage with OMP_NUM_THREADS=1, a
>> > particular computation takes 1.52 cpu seconds and 1.56 wall seconds.
>> >
>> > The same computation without OMP_NUM_THREADS set takes 12.8 cpu seconds
>> > and 1.69 wall seconds. This is particularly devastating when I'm running
>> > with @parallel to use all of my cpu cores.
>> >
>> > My guess is that this is Linbox related, since these computations do
>> > some exact linear algebra, and Linbox can do some multithreading, which
>> > perhaps uses OpenMP.
>> >
>> > jb12407@lmfdb1:~$ OMP_NUM_THREADS=1 sage
>> > [...]
>> > SageMath version 7.4.beta6, Release Date: 2016-09-24
>> > [...]
>> > Warning: this is a prerelease version, and it may be unstable.
>> > [...]
>> > sage: %time M = ModularSymbols(5113, 2, -1)
>> > CPU times: user 509 ms, sys: 21 ms, total: 530 ms
>> > Wall time: 530 ms
>> > sage: %time S = M.cuspidal_subspace().new_subspace()
>> > CPU times: user 1.42 s, sys: 97 ms, total: 1.52 s
>> > Wall time: 1.56 s
>> >
>> >
>> > jb12407@lmfdb1:~$ sage
>> > [...]
>> > SageMath version 7.4.beta6, Release Date: 2016-09-24
>> > [...]
>> > sage: %time M = ModularSymbols(5113, 2, -1)
>> > CPU times: user 570 ms, sys: 18 ms, total: 588 ms
>> > Wall time: 591 ms
>> > sage: %time S = M.cuspidal_subspace().new_subspace()
>> > CPU times: user 3.76 s, sys: 9.01 s, total: 12.8 s
>> > Wall time: 1.69 s

Re: [sage-devel] multithreading performance issues

2016-10-05 Thread Clement Pernet

To follow up on Jean-Pierre's summary of the situation:

The current version of fflas-ffpack in Sage (v2.2.2) uses the BLAS provided as is. Running it with a 
multithreaded BLAS may result in slower code than with a single-threaded BLAS. This is very likely 
due to memory transfer and coherence problems.


More generally, we strongly suggest using a single-threaded BLAS and letting fflas-ffpack deal with the 
parallelization. This is common practice, for example, with parallel versions of LAPACK.


Therefore, after the discussion at https://trac.sagemath.org/ticket/21323, we have decided to give 
fflas-ffpack the ability to force the number of threads that OpenBLAS can use at runtime. In 
this context we will force it to 1.

This is available upstream and I plan to update Sage's fflas-ffpack when we 
release v2.3.0.

Clément

On 05/10/2016 at 11:24, Jean-Pierre Flori wrote:

Currently OpenBLAS does what it wants for multithreading.
We hesitated to disable it but preferred to wait and think about it:
see https://trac.sagemath.org/ticket/21323.

You can still influence its use of threads by setting OPENBLAS_NUM_THREADS.
See the trac ticket; just note that this is not Sage specific.
And as you discovered, it seems it is also influenced by OMP_NUM_THREADS...

On Wednesday, October 5, 2016 at 9:28:23 AM UTC+2, tdumont wrote:

What is the size of the matrix you use?
Whatever you do, OpenMP in BLAS is interesting only if you compute with
large matrices.
If your computations are embedded in an @parallel and launch n
processes, be careful that your OMP_NUM_THREADS is less than or equal to
ncores/n.

My experience (I am doing numerical computations) is that there are
very few cases where using OpenMP in BLAS libraries is interesting.
Parallelism should generally be sought at a higher level.

One of the interests of multithreaded BLAS is for hardware vendors: with
Intel's MKL BLAS, you can obtain the maximum possible performance of the
machine when you use DGEMM (i.e. matrix-matrix products), due to their
high arithmetic intensity. On my 2x8-core Sandy Bridge at 2.7 GHz, I have
obtained more than 300 gigaflops, but only with matrices of size > 1000!
And this is only true for DGEMM.

t.d.

On 04/10/2016 at 20:26, Jonathan Bober wrote:
> See the following timings: If I start Sage with OMP_NUM_THREADS=1, a
> particular computation takes 1.52 cpu seconds and 1.56 wall seconds.
>
> The same computation without OMP_NUM_THREADS set takes 12.8 cpu seconds
> and 1.69 wall seconds. This is particularly devastating when I'm running
> with @parallel to use all of my cpu cores.
>
> My guess is that this is Linbox related, since these computations do
> some exact linear algebra, and Linbox can do some multithreading, which
> perhaps uses OpenMP.
>
> jb12407@lmfdb1:~$ OMP_NUM_THREADS=1 sage
> [...]
> SageMath version 7.4.beta6, Release Date: 2016-09-24
> [...]
> Warning: this is a prerelease version, and it may be unstable.
> [...]
> sage: %time M = ModularSymbols(5113, 2, -1)
> CPU times: user 509 ms, sys: 21 ms, total: 530 ms
> Wall time: 530 ms
> sage: %time S = M.cuspidal_subspace().new_subspace()
> CPU times: user 1.42 s, sys: 97 ms, total: 1.52 s
> Wall time: 1.56 s
>
>
> jb12407@lmfdb1:~$ sage
> [...]
> SageMath version 7.4.beta6, Release Date: 2016-09-24
> [...]
> sage: %time M = ModularSymbols(5113, 2, -1)
> CPU times: user 570 ms, sys: 18 ms, total: 588 ms
> Wall time: 591 ms
> sage: %time S = M.cuspidal_subspace().new_subspace()
> CPU times: user 3.76 s, sys: 9.01 s, total: 12.8 s
> Wall time: 1.69 s
>


Re: [sage-devel] multithreading performance issues

2016-10-05 Thread Jean-Pierre Flori
Currently OpenBLAS does what it wants for multithreading.
We hesitated to disable it but preferred to wait and think about it:
see https://trac.sagemath.org/ticket/21323.

You can still influence its use of threads by setting OPENBLAS_NUM_THREADS.
See the trac ticket; just note that this is not Sage specific.
And as you discovered, it seems it is also influenced by OMP_NUM_THREADS...
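
For anyone scripting this, a minimal sketch of pinning the thread count from
Python rather than from the shell, assuming it runs before OpenBLAS is first
loaded (the variable is read at library initialization):

import os

# Equivalent to running OPENBLAS_NUM_THREADS=1 sage from the shell;
# must be set before OpenBLAS starts up.
os.environ["OPENBLAS_NUM_THREADS"] = "1"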

On Wednesday, October 5, 2016 at 9:28:23 AM UTC+2, tdumont wrote:
>
> What is the size of the matrix you use? 
> Whatever you do, OpenMP in BLAS is interesting only if you compute with 
> large matrices. 
> If your computations are embedded in an @parallel and launch n 
> processes, be careful that your OMP_NUM_THREADS is less than or equal to 
> ncores/n. 
>
> My experience (I am doing numerical computations) is that there are 
> very few cases where using OpenMP in BLAS libraries is interesting. 
> Parallelism should generally be sought at a higher level. 
>
> One of the interests of multithreaded BLAS is for hardware vendors: with 
> Intel's MKL BLAS, you can obtain the maximum possible performance of the 
> machine when you use DGEMM (i.e. matrix-matrix products), due to their 
> high arithmetic intensity. On my 2x8-core Sandy Bridge at 2.7 GHz, I have 
> obtained more than 300 gigaflops, but only with matrices of size > 1000! 
> And this is only true for DGEMM. 
>
> t.d. 
>
> On 04/10/2016 at 20:26, Jonathan Bober wrote: 
> > See the following timings: If I start Sage with OMP_NUM_THREADS=1, a 
> > particular computation takes 1.52 cpu seconds and 1.56 wall seconds. 
> > 
> > The same computation without OMP_NUM_THREADS set takes 12.8 cpu seconds 
> > and 1.69 wall seconds. This is particularly devastating when I'm running 
> > with @parallel to use all of my cpu cores. 
> > 
> > My guess is that this is Linbox related, since these computations do 
> > some exact linear algebra, and Linbox can do some multithreading, which 
> > perhaps uses OpenMP. 
> > 
> > jb12407@lmfdb1:~$ OMP_NUM_THREADS=1 sage 
> > [...] 
> > SageMath version 7.4.beta6, Release Date: 2016-09-24 
> > [...] 
> > Warning: this is a prerelease version, and it may be unstable. 
> > [...] 
> > sage: %time M = ModularSymbols(5113, 2, -1) 
> > CPU times: user 509 ms, sys: 21 ms, total: 530 ms 
> > Wall time: 530 ms 
> > sage: %time S = M.cuspidal_subspace().new_subspace() 
> > CPU times: user 1.42 s, sys: 97 ms, total: 1.52 s 
> > Wall time: 1.56 s 
> > 
> > 
> > jb12407@lmfdb1:~$ sage 
> > [...] 
> > SageMath version 7.4.beta6, Release Date: 2016-09-24 
> > [...] 
> > sage: %time M = ModularSymbols(5113, 2, -1) 
> > CPU times: user 570 ms, sys: 18 ms, total: 588 ms 
> > Wall time: 591 ms 
> > sage: %time S = M.cuspidal_subspace().new_subspace() 
> > CPU times: user 3.76 s, sys: 9.01 s, total: 12.8 s 
> > Wall time: 1.69 s 
> > 


Re: [sage-devel] multithreading performance issues

2016-10-05 Thread Thierry Dumont
What is the size of the matrix you use?
Whatever you do, OpenMP in BLAS is interesting only if you compute with
large matrices.
If your computations are embedded in an @parallel and launch n
processes, be careful that your OMP_NUM_THREADS is less than or equal to
ncores/n.
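
As a quick sketch of that rule of thumb (the worker count n below is a
hypothetical value for illustration):

import multiprocessing
import os

ncores = multiprocessing.cpu_count()
n = 8  # hypothetical number of @parallel worker processes
# Give each worker an equal share of the cores, at least 1 thread each.
os.environ["OMP_NUM_THREADS"] = str(max(1, ncores // n))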

My experience (I am doing numerical computations) is that there are
very few cases where using OpenMP in BLAS libraries is interesting.
Parallelism should generally be sought at a higher level.

One of the interests of multithreaded BLAS is for hardware vendors: with
Intel's MKL BLAS, you can obtain the maximum possible performance of the
machine when you use DGEMM (i.e. matrix-matrix products), due to their
high arithmetic intensity. On my 2x8-core Sandy Bridge at 2.7 GHz, I have
obtained more than 300 gigaflops, but only with matrices of size > 1000!
And this is only true for DGEMM.

t.d.

On 04/10/2016 at 20:26, Jonathan Bober wrote:
> See the following timings: If I start Sage with OMP_NUM_THREADS=1, a
> particular computation takes 1.52 cpu seconds and 1.56 wall seconds.
> 
> The same computation without OMP_NUM_THREADS set takes 12.8 cpu seconds
> and 1.69 wall seconds. This is particularly devastating when I'm running
> with @parallel to use all of my cpu cores.
> 
> My guess is that this is Linbox related, since these computations do
> some exact linear algebra, and Linbox can do some multithreading, which
> perhaps uses OpenMP.
> 
> jb12407@lmfdb1:~$ OMP_NUM_THREADS=1 sage
> [...]
> SageMath version 7.4.beta6, Release Date: 2016-09-24
> [...]
> Warning: this is a prerelease version, and it may be unstable.
> [...]
> sage: %time M = ModularSymbols(5113, 2, -1)
> CPU times: user 509 ms, sys: 21 ms, total: 530 ms
> Wall time: 530 ms
> sage: %time S = M.cuspidal_subspace().new_subspace()
> CPU times: user 1.42 s, sys: 97 ms, total: 1.52 s
> Wall time: 1.56 s
> 
> 
> jb12407@lmfdb1:~$ sage
> [...]
> SageMath version 7.4.beta6, Release Date: 2016-09-24
> [...]
> sage: %time M = ModularSymbols(5113, 2, -1)
> CPU times: user 570 ms, sys: 18 ms, total: 588 ms
> Wall time: 591 ms
> sage: %time S = M.cuspidal_subspace().new_subspace()
> CPU times: user 3.76 s, sys: 9.01 s, total: 12.8 s
> Wall time: 1.69 s
> 

Re: [sage-devel] multithreading performance issues

2016-10-04 Thread Jonathan Bober
I've done a few more tests finding bad performance (and some decent
improvements with a few threads). Also, I double checked that the default
behavior for me seems to be the same as setting OMP_NUM_THREADS=64. I
wonder if others who have a recent development version of Sage see similar
results. I'm using OpenBLAS 0.2.19, the upgrade to which is now
https://trac.sagemath.org/ticket/21627. (I suppose I ought to try this on
my laptop.)

(These tests are a bit messed up because I neglected to ignore the time to
generate the random matrices.)

I don't know what these results say about what sensible defaults should be,
but I think I'm adding OMP_NUM_THREADS=1 to my bashrc, since I don't think I
use OpenMP for anything else.

Computing eigenvalues:

jb12407@lmfdb1:~/test$ cat omptest.py
import os
import sys

size = sys.argv[1]

for n in range(1, 65):
    os.system('OMP_NUM_THREADS={n} sage -c "import time; '
              'm = random_matrix(RDF,{size}); s = time.time(); '
              'e = m.eigenvalues(); '
              'print({n}, time.clock(), time.time() - s)"'
              .format(n=n, size=size))

jb12407@lmfdb1:~/test$ python omptest.py 1000
(1, 8.28, 5.560720920562744)
(2, 13.71, 5.4581358432769775)
(3, 18.12, 5.155802011489868)
(4, 24.12, 5.381717205047607)
(5, 29.33, 5.332219123840332)
(6, 34.29, 5.307264089584351)
(7, 38.93, 5.198814153671265)
(8, 44.84, 5.271445989608765)
(9, 51.63, 5.453015089035034)
(10, 57.66, 5.515641927719116)
[...]
(61, 422.21, 6.9586780071258545)
(62, 419.21, 6.779545068740845)
(63, 427.15, 6.788045167922974)
(64, 448.9, 7.169056177139282)

Matrix multiplication:

jb12407@lmfdb1:~/test$ cat omptest2.py
import os
import sys

size = sys.argv[1]

for n in range(1, 65):
    os.system('OMP_NUM_THREADS={n} sage -c "import time; '
              'm = random_matrix(RDF,{size}); s = time.time(); '
              'm2 = m^30; '
              'print({n}, time.clock(), time.time() - s)"'
              .format(n=n, size=size))

(1, 3.52, 0.7552590370178223)
(2, 3.66, 0.41131114959716797)
(3, 3.82, 0.31482601165771484)
(4, 4.02, 0.2474370002746582)
(5, 4.2, 0.21387481689453125)
(6, 4.47, 0.19179105758666992)
(7, 4.53, 0.17720603942871094)
(8, 4.89, 0.17597389221191406)
(9, 5.15, 0.17040705680847168)
(10, 5.26, 0.17317700386047363)
[...]
(60, 18.88, 0.17498207092285156)
(61, 18.49, 0.1627058982849121)
(62, 20.46, 0.19742107391357422)
(63, 20.07, 0.18258190155029297)
(64, 20.76, 0.18776202201843262)

Matrix multiplication with a bigger matrix:

jb12407@lmfdb1:~/test$ python omptest2.py 5000
(1, 99.97, 90.38103914260864)
(2, 101.71, 46.28921890258789)
(3, 103.96, 31.841789960861206)
(4, 107.98, 24.800616025924683)
(5, 108.59, 20.051285982131958)
(6, 112.46, 17.170204877853394)
(7, 116.25, 15.497264862060547)
(8, 125.38, 14.533391952514648)
(9, 130.57, 13.497469902038574)
(10, 123.67, 11.505426168441772)
[...]
(60, 779.12, 12.92886209487915)
(61, 875.74, 14.310442924499512)
(62, 869.82, 14.241307973861694)
(63, 813.99, 13.089143991470337)
(64, 728.52, 11.443121910095215)



On Tue, Oct 4, 2016 at 9:40 PM, Jonathan Bober wrote:

> On Tue, Oct 4, 2016 at 9:03 PM, William Stein wrote:
>
>> On Tue, Oct 4, 2016 at 12:58 PM, Jonathan Bober wrote:
>> > No, in 7.3 Sage isn't multithreading in this example:
>> >
>> > jb12407@lmfdb1:~$ sage73
>> > sage: %time M = ModularSymbols(5113, 2, -1)
>> > CPU times: user 599 ms, sys: 25 ms, total: 624 ms
>> > Wall time: 612 ms
>> > sage: %time S = M.cuspidal_subspace().new_subspace()
>> > CPU times: user 1.32 s, sys: 89 ms, total: 1.41 s
>> > Wall time: 1.44 s
>> >
>> > I guess the issue may be OpenBLAS rather than Linbox, then, since LinBox
>> > uses BLAS. I misread https://trac.sagemath.org/ticket/21323, which I now
>> > realize says "LinBox parallel routines (not yet exposed in SageMath)", when
>> > I thought that the cause may be LinBox. My Sage 7.3 uses the system ATLAS,
>> > and I don't know whether that might sometimes use multithreading.
>>
>> If you care about performance you should build ATLAS from source.  You
>> can (somehow... I can't remember how) specify how many cores it will
>> use, and it will greatly benefit from multithreading.
>>
>>
> Yes, I probably should. But for linear algebra I've generally been happy
> with "reasonable" performance. (And the system ATLAS, though not
> specifically tuned, is at least compiled for sse3 and x86_64.)
>
> Also, setting the number of threads isn't so easy. In this case I want to
> minimize cpu time, rather than wall time, because I can run 64 processes in
> parallel. Single threading should always do that, and in some problems some
> extra threads don't hurt, but here they definitely do.
>
> Possibly part of the issue here is that OpenBLAS seems to use either 1
> thread or #CPU threads, which is 64 in this case. In my case using 2
> threads might improve the wall time a trifle for a single process, but is
> bad for total performance.
>
>
>> >
>> >
>> > On Tue, Oct 4, 2016 at 8:06 PM, Francois Bissey wrote:
>> >>
>> >> openmp is disabled in linbox/ffpack-fflas so it must come from
>> >> somewhere else.

Re: [sage-devel] multithreading performance issues

2016-10-04 Thread Jonathan Bober
On Tue, Oct 4, 2016 at 9:03 PM, William Stein wrote:

> On Tue, Oct 4, 2016 at 12:58 PM, Jonathan Bober wrote:
> > No, in 7.3 Sage isn't multithreading in this example:
> >
> > jb12407@lmfdb1:~$ sage73
> > sage: %time M = ModularSymbols(5113, 2, -1)
> > CPU times: user 599 ms, sys: 25 ms, total: 624 ms
> > Wall time: 612 ms
> > sage: %time S = M.cuspidal_subspace().new_subspace()
> > CPU times: user 1.32 s, sys: 89 ms, total: 1.41 s
> > Wall time: 1.44 s
> >
> > I guess the issue may be OpenBLAS rather than Linbox, then, since LinBox
> > uses BLAS. I misread https://trac.sagemath.org/ticket/21323, which I now
> > realize says "LinBox parallel routines (not yet exposed in SageMath)", when
> > I thought that the cause may be LinBox. My Sage 7.3 uses the system ATLAS,
> > and I don't know whether that might sometimes use multithreading.
>
> If you care about performance you should build ATLAS from source.  You
> can (somehow... I can't remember how) specify how many cores it will
> use, and it will greatly benefit from multithreading.
>
>
Yes, I probably should. But for linear algebra I've generally been happy
with "reasonable" performance. (And the system ATLAS, though not
specifically tuned, is at least compiled for sse3 and x86_64.)

Also, setting the number of threads isn't so easy. In this case I want to
minimize cpu time, rather than wall time, because I can run 64 processes in
parallel. Single threading should always do that, and in some problems some
extra threads don't hurt, but here they definitely do.

Possibly part of the issue here is that OpenBLAS seems to use either 1
thread or #CPU threads, which is 64 in this case. In my case using 2
threads might improve the wall time a trifle for a single process, but is
bad for total performance.
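
For reference, a minimal sketch of the kind of cpu-versus-wall measurement
behind these numbers; numpy is just a stand-in BLAS-backed workload here, and
it is an assumption that it links against the same OpenBLAS:

import time
import numpy as np

m = np.random.rand(1000, 1000)
c0, w0 = time.clock(), time.time()
e = np.linalg.eigvals(m)  # BLAS/LAPACK-backed computation
c, w = time.clock() - c0, time.time() - w0
# cpu seconds, wall seconds; the ratio approximates how many threads
# actually did work.
print(c, w, c / w)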


> >
> >
> > On Tue, Oct 4, 2016 at 8:06 PM, Francois Bissey wrote:
> >>
> >> openmp is disabled in linbox/ffpack-fflas so it must come from somewhere
> >> else.
> >> Only R seems to be linked to libgomp (openmp) on my vanilla install.
> >> Curiosity: do you observe the same behaviour in 7.3?
> >>
> >> François
> >>
> >> > On 5/10/2016, at 07:26, Jonathan Bober wrote:
> >> >
> >> > See the following timings: If I start Sage with OMP_NUM_THREADS=1, a
> >> > particular computation takes 1.52 cpu seconds and 1.56 wall seconds.
> >> >
> >> > The same computation without OMP_NUM_THREADS set takes 12.8 cpu seconds
> >> > and 1.69 wall seconds. This is particularly devastating when I'm running
> >> > with @parallel to use all of my cpu cores.
> >> >
> >> > My guess is that this is Linbox related, since these computations do
> >> > some exact linear algebra, and Linbox can do some multithreading, which
> >> > perhaps uses OpenMP.
> >> >
> >> > jb12407@lmfdb1:~$ OMP_NUM_THREADS=1 sage
> >> > [...]
> >> > SageMath version 7.4.beta6, Release Date: 2016-09-24
> >> > [...]
> >> > Warning: this is a prerelease version, and it may be unstable.
> >> > [...]
> >> > sage: %time M = ModularSymbols(5113, 2, -1)
> >> > CPU times: user 509 ms, sys: 21 ms, total: 530 ms
> >> > Wall time: 530 ms
> >> > sage: %time S = M.cuspidal_subspace().new_subspace()
> >> > CPU times: user 1.42 s, sys: 97 ms, total: 1.52 s
> >> > Wall time: 1.56 s
> >> >
> >> >
> >> > jb12407@lmfdb1:~$ sage
> >> > [...]
> >> > SageMath version 7.4.beta6, Release Date: 2016-09-24
> >> > [...]
> >> > sage: %time M = ModularSymbols(5113, 2, -1)
> >> > CPU times: user 570 ms, sys: 18 ms, total: 588 ms
> >> > Wall time: 591 ms
> >> > sage: %time S = M.cuspidal_subspace().new_subspace()
> >> > CPU times: user 3.76 s, sys: 9.01 s, total: 12.8 s
> >> > Wall time: 1.69 s
> >> >
> >> >

Re: [sage-devel] multithreading performance issues

2016-10-04 Thread William Stein
On Tue, Oct 4, 2016 at 12:58 PM, Jonathan Bober wrote:
> No, in 7.3 Sage isn't multithreading in this example:
>
> jb12407@lmfdb1:~$ sage73
> sage: %time M = ModularSymbols(5113, 2, -1)
> CPU times: user 599 ms, sys: 25 ms, total: 624 ms
> Wall time: 612 ms
> sage: %time S = M.cuspidal_subspace().new_subspace()
> CPU times: user 1.32 s, sys: 89 ms, total: 1.41 s
> Wall time: 1.44 s
>
> I guess the issue may be OpenBLAS rather than Linbox, then, since LinBox
> uses BLAS. I misread https://trac.sagemath.org/ticket/21323, which I now
> realize says "LinBox parallel routines (not yet exposed in SageMath)", when
> I thought that the cause may be LinBox. My Sage 7.3 uses the system ATLAS,
> and I don't know whether that might sometimes use multithreading.

If you care about performance you should build ATLAS from source.  You
can (somehow... I can't remember how) specify how many cores it will
use, and it will greatly benefit from multithreading.

>
>
> On Tue, Oct 4, 2016 at 8:06 PM, Francois Bissey wrote:
>>
>> openmp is disabled in linbox/ffpack-fflas so it must come from somewhere
>> else.
>> Only R seems to be linked to libgomp (openmp) on my vanilla install.
>> Curiosity: do you observe the same behaviour in 7.3?
>>
>> François
>>
>> > On 5/10/2016, at 07:26, Jonathan Bober wrote:
>> >
>> > See the following timings: If I start Sage with OMP_NUM_THREADS=1, a
>> > particular computation takes 1.52 cpu seconds and 1.56 wall seconds.
>> >
>> > The same computation without OMP_NUM_THREADS set takes 12.8 cpu seconds
>> > and 1.69 wall seconds. This is particularly devastating when I'm running
>> > with @parallel to use all of my cpu cores.
>> >
>> > My guess is that this is Linbox related, since these computations do
>> > some exact linear algebra, and Linbox can do some multithreading, which
>> > perhaps uses OpenMP.
>> >
>> > jb12407@lmfdb1:~$ OMP_NUM_THREADS=1 sage
>> > [...]
>> > SageMath version 7.4.beta6, Release Date: 2016-09-24
>> > [...]
>> > Warning: this is a prerelease version, and it may be unstable.
>> > [...]
>> > sage: %time M = ModularSymbols(5113, 2, -1)
>> > CPU times: user 509 ms, sys: 21 ms, total: 530 ms
>> > Wall time: 530 ms
>> > sage: %time S = M.cuspidal_subspace().new_subspace()
>> > CPU times: user 1.42 s, sys: 97 ms, total: 1.52 s
>> > Wall time: 1.56 s
>> >
>> >
>> > jb12407@lmfdb1:~$ sage
>> > [...]
>> > SageMath version 7.4.beta6, Release Date: 2016-09-24
>> > [...]
>> > sage: %time M = ModularSymbols(5113, 2, -1)
>> > CPU times: user 570 ms, sys: 18 ms, total: 588 ms
>> > Wall time: 591 ms
>> > sage: %time S = M.cuspidal_subspace().new_subspace()
>> > CPU times: user 3.76 s, sys: 9.01 s, total: 12.8 s
>> > Wall time: 1.69 s
>> >
>> >



-- 
William (http://wstein.org)



Re: [sage-devel] multithreading performance issues

2016-10-04 Thread Jonathan Bober
No, in 7.3 Sage isn't multithreading in this example:

jb12407@lmfdb1:~$ sage73
sage: %time M = ModularSymbols(5113, 2, -1)
CPU times: user 599 ms, sys: 25 ms, total: 624 ms
Wall time: 612 ms
sage: %time S = M.cuspidal_subspace().new_subspace()
CPU times: user 1.32 s, sys: 89 ms, total: 1.41 s
Wall time: 1.44 s

I guess the issue may be OpenBLAS rather than Linbox, then, since LinBox
uses BLAS. I misread https://trac.sagemath.org/ticket/21323, which I now
realize says "LinBox parallel routines (not yet exposed in SageMath)", when
I thought that the cause may be LinBox. My Sage 7.3 uses the system ATLAS,
and I don't know whether that might sometimes use multithreading.


On Tue, Oct 4, 2016 at 8:06 PM, Francois Bissey <francois.bis...@canterbury.ac.nz> wrote:

> openmp is disabled in linbox/ffpack-fflas so it must come from somewhere
> else.
> Only R seems to be linked to libgomp (openmp) on my vanilla install.
> Curiosity: do you observe the same behaviour in 7.3?
>
> François
>
> > On 5/10/2016, at 07:26, Jonathan Bober wrote:
> >
> > See the following timings: If I start Sage with OMP_NUM_THREADS=1, a
> particular computation takes 1.52 cpu seconds and 1.56 wall seconds.
> >
> > The same computation without OMP_NUM_THREADS set takes 12.8 cpu seconds
> and 1.69 wall seconds. This is particularly devastating when I'm running
> with @parallel to use all of my cpu cores.
> >
> > My guess is that this is Linbox related, since these computations do
> some exact linear algebra, and Linbox can do some multithreading, which
> perhaps uses OpenMP.
> >
> > jb12407@lmfdb1:~$ OMP_NUM_THREADS=1 sage
> > [...]
> > SageMath version 7.4.beta6, Release Date: 2016-09-24
> > [...]
> > Warning: this is a prerelease version, and it may be unstable.
> > [...]
> > sage: %time M = ModularSymbols(5113, 2, -1)
> > CPU times: user 509 ms, sys: 21 ms, total: 530 ms
> > Wall time: 530 ms
> > sage: %time S = M.cuspidal_subspace().new_subspace()
> > CPU times: user 1.42 s, sys: 97 ms, total: 1.52 s
> > Wall time: 1.56 s
> >
> >
> > jb12407@lmfdb1:~$ sage
> > [...]
> > SageMath version 7.4.beta6, Release Date: 2016-09-24
> > [...]
> > sage: %time M = ModularSymbols(5113, 2, -1)
> > CPU times: user 570 ms, sys: 18 ms, total: 588 ms
> > Wall time: 591 ms
> > sage: %time S = M.cuspidal_subspace().new_subspace()
> > CPU times: user 3.76 s, sys: 9.01 s, total: 12.8 s
> > Wall time: 1.69 s
> >
> >


Re: [sage-devel] multithreading performance issues

2016-10-04 Thread Francois Bissey
openmp is disabled in linbox/ffpack-fflas so it must come from somewhere else.
Only R seems to be linked to libgomp (openmp) on my vanilla install.
Curiosity: do you observe the same behaviour in 7.3?

François

> On 5/10/2016, at 07:26, Jonathan Bober wrote:
> 
> See the following timings: If I start Sage with OMP_NUM_THREADS=1, a 
> particular computation takes 1.52 cpu seconds and 1.56 wall seconds.
> 
> The same computation without OMP_NUM_THREADS set takes 12.8 cpu seconds and 
> 1.69 wall seconds. This is particularly devastating when I'm running with 
> @parallel to use all of my cpu cores.
> 
> My guess is that this is Linbox related, since these computations do some 
> exact linear algebra, and Linbox can do some multithreading, which perhaps 
> uses OpenMP.
> 
> jb12407@lmfdb1:~$ OMP_NUM_THREADS=1 sage
> [...]
> SageMath version 7.4.beta6, Release Date: 2016-09-24
> [...]
> Warning: this is a prerelease version, and it may be unstable.
> [...]
> sage: %time M = ModularSymbols(5113, 2, -1)
> CPU times: user 509 ms, sys: 21 ms, total: 530 ms
> Wall time: 530 ms
> sage: %time S = M.cuspidal_subspace().new_subspace()
> CPU times: user 1.42 s, sys: 97 ms, total: 1.52 s
> Wall time: 1.56 s
> 
> 
> jb12407@lmfdb1:~$ sage
> [...]
> SageMath version 7.4.beta6, Release Date: 2016-09-24
> [...]
> sage: %time M = ModularSymbols(5113, 2, -1)
> CPU times: user 570 ms, sys: 18 ms, total: 588 ms
> Wall time: 591 ms
> sage: %time S = M.cuspidal_subspace().new_subspace()
> CPU times: user 3.76 s, sys: 9.01 s, total: 12.8 s
> Wall time: 1.69 s
> 
> 


[sage-devel] multithreading performance issues

2016-10-04 Thread Jonathan Bober
See the following timings: If I start Sage with OMP_NUM_THREADS=1, a
particular computation takes 1.52 cpu seconds and 1.56 wall seconds.

The same computation without OMP_NUM_THREADS set takes 12.8 cpu seconds and
1.69 wall seconds. This is particularly devastating when I'm running with
@parallel to use all of my cpu cores.

My guess is that this is Linbox related, since these computations do some
exact linear algebra, and Linbox can do some multithreading, which perhaps
uses OpenMP.

jb12407@lmfdb1:~$ OMP_NUM_THREADS=1 sage
[...]
SageMath version 7.4.beta6, Release Date: 2016-09-24
[...]
Warning: this is a prerelease version, and it may be unstable.
[...]
sage: %time M = ModularSymbols(5113, 2, -1)
CPU times: user 509 ms, sys: 21 ms, total: 530 ms
Wall time: 530 ms
sage: %time S = M.cuspidal_subspace().new_subspace()
CPU times: user 1.42 s, sys: 97 ms, total: 1.52 s
Wall time: 1.56 s


jb12407@lmfdb1:~$ sage
[...]
SageMath version 7.4.beta6, Release Date: 2016-09-24
[...]
sage: %time M = ModularSymbols(5113, 2, -1)
CPU times: user 570 ms, sys: 18 ms, total: 588 ms
Wall time: 591 ms
sage: %time S = M.cuspidal_subspace().new_subspace()
CPU times: user 3.76 s, sys: 9.01 s, total: 12.8 s
Wall time: 1.69 s
