Re: [scikit-learn] Scikit Learn in a Cray computer

2019-06-30 Thread desitter . gravity
Dear All,

Alex Lovell-Troy heads up innovation/cloud supercomputing at Cray (cc'd)
and he is a great resource for all things. I thought he might find this
thread useful.

Best, Alex

On Fri, Jun 28, 2019 at 11:45 PM Olivier Grisel 
wrote:

> You have to use a dedicated framework to distribute the computation on a
> cluster like your Cray system.
>
> You can use MPI, or Dask with dask-jobqueue, but you also need to run
> parallel algorithms that are efficient in a distributed setting with a
> high cost of communication between worker nodes.
>
> I am not sure that the DBSCAN implementation in scikit-learn would benefit
> much from naively running in distributed mode.
>
> On Fri, Jun 28, 2019, 22:06, Mauricio Reis wrote:
>
>> Sorry, but just now I reread your answer more closely.
>>
>> It seems that the "n_jobs" parameter of the DBSCAN routine brings no
>> benefit to performance. If I want to improve the performance of the
>> DBSCAN routine, I will have to redesign the solution to use MPI
>> resources.
>>
>> Is that correct?
>>
>> ---
>> Ats.,
>> Mauricio Reis
>>
>> On 28/06/2019 16:47, Mauricio Reis wrote:
>> > My laptop has an Intel i7 processor with 4 cores. When I run the program
>> > on Windows 10, the "joblib.cpu_count()" routine returns "4". On this
>> > machine, the same test I ran on the Cray computer caused a 10% increase
>> > in the processing time of the DBSCAN routine when I used the "n_jobs =
>> > 4" parameter, compared to the processing time of that routine without
>> > this parameter. Do you know the cause of the longer processing
>> > time when I use "n_jobs = 4" on my laptop?
>> >
>> > ---
>> > Ats.,
>> > Mauricio Reis
>> >
>> > On 28/06/2019 06:29, Brown J.B. via scikit-learn wrote:
>> >>> where you can see "ncpus = 1" (I still do not know why 4 lines were
>> >>> printed -
>> >>>
>> >>> (total of 40 nodes) and each node has 1 CPU and 1 GPU!
>> >>
>> >>> #PBS -l select=1:ncpus=8:mpiprocs=8
>> >>> aprun -n 4 p.sh ./ncpus.py
>> >>
>> >> You can request 8 CPUs from a job scheduler, but if each node the
>> >> script runs on contains only one virtual/physical core, then
>> >> cpu_count() will return 1.
>> >> If that CPU supports multi-threading, you would typically get 2.
>> >>
>> >> For example, on my workstation:
>> >> `--> egrep "processor|model name|core id" /proc/cpuinfo
>> >> processor : 0
>> >> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
>> >> core id : 0
>> >> processor : 1
>> >> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
>> >> core id : 1
>> >> processor : 2
>> >> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
>> >> core id : 0
>> >> processor : 3
>> >> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
>> >> core id : 1
>> >> `--> python3 -c "from sklearn.externals import joblib;
>> >> print(joblib.cpu_count())"
>> >> 4
>> >>
>> >> It seems that in this situation, if you want to parallelize
>> >> *independent* sklearn calculations (e.g., changing dataset or random
>> >> seed), you'll request the MPI processes via PBS as you have done, but
>> >> you'll need to place the sklearn computations in a function and then
>> >> take care of distributing that function call across the MPI processes.
>> >>
>> >> Then again, if the runs are independent, it's a lot easier to write a
>> >> for loop in a shell script that changes the dataset/seed and submits
>> >> it to the job scheduler to let the job handler take care of the
>> >> parallel distribution.
>> >> (I do this when performing 10+ independent runs of sklearn modeling,
>> >> where models use multiple threads during calculations; in my case,
>> >> SLURM then takes care of finding the available nodes to distribute the
>> >> work to.)
>> >>
>> >> Hope this helps.
>> >> J.B.


-- 

Alex Morrise, PhD
Co-Founder & CTO, StayOpen.com
Chief Science Officer, MediaJel.com 
Professional Bio:  Machine Learning Intelligence

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Scikit Learn in a Cray computer

2019-06-29 Thread Olivier Grisel
You have to use a dedicated framework to distribute the computation on a
cluster like your Cray system.

You can use MPI, or Dask with dask-jobqueue, but you also need to run
parallel algorithms that are efficient in a distributed setting with a
high cost of communication between worker nodes.

I am not sure that the DBSCAN implementation in scikit-learn would benefit
much from naively running in distributed mode.
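
As a rough illustration of this approach, here is a minimal sketch, assuming
dask, distributed and dask-jobqueue are installed on the cluster; the queue
name, resource sizes and DBSCAN parameters below are placeholders rather than
values taken from this Cray system:

from dask.distributed import Client
from dask_jobqueue import PBSCluster
import joblib
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Launch Dask workers through the PBS scheduler (queue/resources are placeholders).
cluster = PBSCluster(queue="workq", cores=8, memory="16GB")
cluster.scale(4)            # ask for 4 workers
client = Client(cluster)

X, _ = make_blobs(n_samples=100_000, centers=10, random_state=0)

# joblib can ship its parallel tasks to the Dask workers, but DBSCAN only
# parallelizes the neighbor queries, so the distributed gain may be modest.
with joblib.parallel_backend("dask"):
    labels = DBSCAN(eps=0.3, min_samples=10, n_jobs=-1).fit_predict(X)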

On Fri, Jun 28, 2019, 22:06, Mauricio Reis wrote:

> Sorry, but just now I reread your answer more closely.
>
> It seems that the "n_jobs" parameter of the DBSCAN routine brings no
> benefit to performance. If I want to improve the performance of the
> DBSCAN routine, I will have to redesign the solution to use MPI
> resources.
>
> Is that correct?
>
> ---
> Ats.,
> Mauricio Reis
>
> On 28/06/2019 16:47, Mauricio Reis wrote:
> > My laptop has an Intel i7 processor with 4 cores. When I run the program
> > on Windows 10, the "joblib.cpu_count()" routine returns "4". On this
> > machine, the same test I ran on the Cray computer caused a 10% increase
> > in the processing time of the DBSCAN routine when I used the "n_jobs =
> > 4" parameter, compared to the processing time of that routine without
> > this parameter. Do you know the cause of the longer processing
> > time when I use "n_jobs = 4" on my laptop?
> >
> > ---
> > Ats.,
> > Mauricio Reis
> >
> > On 28/06/2019 06:29, Brown J.B. via scikit-learn wrote:
> >>> where you can see "ncpus = 1" (I still do not know why 4 lines were
> >>> printed -
> >>>
> >>> (total of 40 nodes) and each node has 1 CPU and 1 GPU!
> >>
> >>> #PBS -l select=1:ncpus=8:mpiprocs=8
> >>> aprun -n 4 p.sh ./ncpus.py
> >>
> >> You can request 8 CPUs from a job scheduler, but if each node the
> >> script runs on contains only one virtual/physical core, then
> >> cpu_count() will return 1.
> >> If that CPU supports multi-threading, you would typically get 2.
> >>
> >> For example, on my workstation:
> >> `--> egrep "processor|model name|core id" /proc/cpuinfo
> >> processor : 0
> >> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
> >> core id : 0
> >> processor : 1
> >> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
> >> core id : 1
> >> processor : 2
> >> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
> >> core id : 0
> >> processor : 3
> >> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
> >> core id : 1
> >> `--> python3 -c "from sklearn.externals import joblib;
> >> print(joblib.cpu_count())"
> >> 4
> >>
> >> It seems that in this situation, if you want to parallelize
> >> *independent* sklearn calculations (e.g., changing dataset or random
> >> seed), you'll request the MPI processes via PBS as you have done, but
> >> you'll need to place the sklearn computations in a function and then
> >> take care of distributing that function call across the MPI processes.
> >>
> >> Then again, if the runs are independent, it's a lot easier to write a
> >> for loop in a shell script that changes the dataset/seed and submits
> >> it to the job scheduler to let the job handler take care of the
> >> parallel distribution.
> >> (I do this when performing 10+ independent runs of sklearn modeling,
> >> where models use multiple threads during calculations; in my case,
> >> SLURM then takes care of finding the available nodes to distribute the
> >> work to.)
> >>
> >> Hope this helps.
> >> J.B.
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Scikit Learn in a Cray computer

2019-06-28 Thread Mauricio Reis

Sorry, but just now I reread your answer more closely.

It seems that the "n_jobs" parameter of the DBSCAN routine brings no
benefit to performance. If I want to improve the performance of the
DBSCAN routine, I will have to redesign the solution to use MPI
resources.

Is that correct?

---
Ats.,
Mauricio Reis

On 28/06/2019 16:47, Mauricio Reis wrote:

My laptop has an Intel i7 processor with 4 cores. When I run the program
on Windows 10, the "joblib.cpu_count()" routine returns "4". On this
machine, the same test I ran on the Cray computer caused a 10% increase
in the processing time of the DBSCAN routine when I used the "n_jobs =
4" parameter, compared to the processing time of that routine without
this parameter. Do you know the cause of the longer processing
time when I use "n_jobs = 4" on my laptop?

---
Ats.,
Mauricio Reis

On 28/06/2019 06:29, Brown J.B. via scikit-learn wrote:

where you can see "ncpus = 1" (I still do not know why 4 lines were
printed -

(total of 40 nodes) and each node has 1 CPU and 1 GPU!



#PBS -l select=1:ncpus=8:mpiprocs=8
aprun -n 4 p.sh ./ncpus.py


You can request 8 CPUs from a job scheduler, but if each node the
script runs on contains only one virtual/physical core, then
cpu_count() will return 1.
If that CPU supports multi-threading, you would typically get 2.

For example, on my workstation:
`--> egrep "processor|model name|core id" /proc/cpuinfo
processor : 0
model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
core id : 0
processor : 1
model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
core id : 1
processor : 2
model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
core id : 0
processor : 3
model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
core id : 1
`--> python3 -c "from sklearn.externals import joblib;
print(joblib.cpu_count())"
4

It seems that in this situation, if you want to parallelize
*independent* sklearn calculations (e.g., changing dataset or random
seed), you'll request the MPI processes via PBS as you have done, but
you'll need to place the sklearn computations in a function and then
take care of distributing that function call across the MPI processes.

Then again, if the runs are independent, it's a lot easier to write a
for loop in a shell script that changes the dataset/seed and submits
it to the job scheduler to let the job handler take care of the
parallel distribution.
(I do this when performing 10+ independent runs of sklearn modeling,
where models use multiple threads during calculations; in my case,
SLURM then takes care of finding the available nodes to distribute the
work to.)

Hope this helps.
J.B.

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Scikit Learn in a Cray computer

2019-06-28 Thread Mauricio Reis
My laptop has an Intel i7 processor with 4 cores. When I run the program on
Windows 10, the "joblib.cpu_count()" routine returns "4". On this machine,
the same test I ran on the Cray computer caused a 10% increase in
the processing time of the DBSCAN routine when I used the "n_jobs = 4"
parameter, compared to the processing time of that routine without this
parameter. Do you know the cause of the longer processing time
when I use "n_jobs = 4" on my laptop?


---
Ats.,
Mauricio Reis

On 28/06/2019 06:29, Brown J.B. via scikit-learn wrote:

where you can see "ncpus = 1" (I still do not know why 4 lines were
printed -

(total of 40 nodes) and each node has 1 CPU and 1 GPU!



#PBS -l select=1:ncpus=8:mpiprocs=8
aprun -n 4 p.sh ./ncpus.py


You can request 8 CPUs from a job scheduler, but if each node the
script runs on contains only one virtual/physical core, then
cpu_count() will return 1.
If that CPU supports multi-threading, you would typically get 2.

For example, on my workstation:
`--> egrep "processor|model name|core id" /proc/cpuinfo
processor : 0
model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
core id : 0
processor : 1
model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
core id : 1
processor : 2
model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
core id : 0
processor : 3
model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
core id : 1
`--> python3 -c "from sklearn.externals import joblib;
print(joblib.cpu_count())"
4

It seems that in this situation, if you want to parallelize
*independent* sklearn calculations (e.g., changing dataset or random
seed), you'll request the MPI processes via PBS as you have done, but
you'll need to place the sklearn computations in a function and then
take care of distributing that function call across the MPI processes.

Then again, if the runs are independent, it's a lot easier to write a
for loop in a shell script that changes the dataset/seed and submits
it to the job scheduler to let the job handler take care of the
parallel distribution.
(I do this when performing 10+ independent runs of sklearn modeling,
where models use multiple threads during calculations; in my case,
SLURM then takes care of finding the available nodes to distribute the
work to.)

Hope this helps.
J.B.

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Scikit Learn in a Cray computer

2019-06-28 Thread Brown J.B. via scikit-learn
>
> where you can see "ncpus = 1" (I still do not know why 4 lines were
> printed -
>
> (total of 40 nodes) and each node has 1 CPU and 1 GPU!
>


> #PBS -l select=1:ncpus=8:mpiprocs=8
> aprun -n 4 p.sh ./ncpus.py
>

You can request 8 CPUs from a job scheduler, but if each node the script
runs on contains only one virtual/physical core, then cpu_count() will
return 1.
If that CPU supports multi-threading, you would typically get 2.

For example, on my workstation:
`--> egrep "processor|model name|core id" /proc/cpuinfo
processor : 0
model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
core id : 0
processor : 1
model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
core id : 1
processor : 2
model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
core id : 0
processor : 3
model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
core id : 1
`--> python3 -c "from sklearn.externals import joblib;
print(joblib.cpu_count())"
4

It seems that in this situation, if you want to parallelize
*independent* sklearn calculations (e.g., changing dataset or random seed),
you'll request the MPI processes via PBS as you have done, but you'll need to
place the sklearn computations in a function and then take care of
distributing that function call across the MPI processes.
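
A minimal sketch of that idea, assuming mpi4py is available on the compute
nodes; the dataset, DBSCAN parameters and the use of the rank as a random
seed are illustrative only:

from mpi4py import MPI
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

def run_one(seed):
    # One independent scikit-learn computation per MPI process.
    X, _ = make_blobs(n_samples=100_000, centers=10, random_state=seed)
    return DBSCAN(eps=0.3, min_samples=10).fit_predict(X)

labels = run_one(seed=rank)                 # each rank works on its own seed
all_labels = comm.gather(labels, root=0)    # collect the results on rank 0

Launched with something like "aprun -n 4 python script.py" (or mpirun), each
of the 4 ranks would then run its own independent fit.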

Then again, if the runs are independent, it's a lot easier to write a for
loop in a shell script that changes the dataset/seed and submits it to the
job scheduler to let the job handler take care of the parallel distribution.
(I do this when performing 10+ independent runs of sklearn modeling, where
models use multiple threads during calculations; in my case, SLURM then
takes care of finding the available nodes to distribute the work to.)
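
A minimal sketch of that alternative, written in Python for consistency with
the rest of this thread; the PBS resource line and the script name
my_experiment.py are placeholders for whatever your actual job looks like:

import subprocess

pbs_template = """#!/bin/bash
#PBS -l select=1:ncpus=8:mpiprocs=1
#PBS -j oe
cd $PBS_O_WORKDIR
aprun -n 1 python ./my_experiment.py --seed {seed}
"""

# One job per seed; the scheduler takes care of the parallel distribution.
for seed in range(10):
    script = "job_seed_%d.pbs" % seed
    with open(script, "w") as f:
        f.write(pbs_template.format(seed=seed))
    subprocess.run(["qsub", script], check=True)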

Hope this helps.
J.B.
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Scikit Learn in a Cray computer

2019-06-27 Thread Mauricio Reis
Finally I was able to access the Cray computer and run the suggested
routine.


I am sending below the files and commands I used and the result obtained,
where you can see "ncpus = 1" (I still do not know why 4 lines were
printed - I only know that this number depends on the value used in the
"aprun" command in the file "ncpus.pbs"). I do not know whether you are
familiar with the Cray computer environment and will understand what I did!


I use a Cray XK7 computer which has 10 blades; each blade has 4 nodes
(a total of 40 nodes), and each node has 1 CPU and 1 GPU!


---
Ats.,
Mauricio Reis


--
=== p.sh ===
#!/bin/bash
/usr/local/python_3.7/bin/python3.7 $1

=== ncpus.py ===
from sklearn.externals import joblib
import sklearn
print('The scikit-learn version is {}.'.format(sklearn.__version__))
ncpus = joblib.cpu_count()
print("--- ncpus =", ncpus)

=== ncpus.pbs ===
#!/bin/bash
#PBS -l select=1:ncpus=8:mpiprocs=8
#PBS -j oe
#PBS -l walltime=00:00:10

date

echo "[$PBS_O_WORKDIR]"
cd $PBS_O_WORKDIR

aprun -n 4 p.sh ./ncpus.py

=== command ===
qsub ncpus.pbs

=== output ===
Thu Jun 27 05:22:35 BRT 2019
[/home/reismc]
The scikit-learn version is 0.20.3.
The scikit-learn version is 0.20.3.
The scikit-learn version is 0.20.3.
The scikit-learn version is 0.20.3.
--- ncpus = 1
--- ncpus = 1
--- ncpus = 1
--- ncpus = 1
Application 32826 resources: utime ~8s, stime ~1s, Rss ~43168, inblocks 
~102981, outblocks ~0

--


On 19/06/2019 17:44, Olivier Grisel wrote:

How many cores do you have on this machine?

joblib.cpu_count()

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Scikit Learn in a Cray computer

2019-06-19 Thread Mauricio Reis
I cannot access the Cray computer at this moment to run the suggested
code. Once I have access, I'll let you know.


But the documentation (provided by a teacher in charge of the Cray computer)
shows:

- 10 blades
- 4 nodes per blade = 40 nodes
- each node: 1 CPU, 1 GPU, 32 GBytes

---
Ats.,
Mauricio Reis

On 19/06/2019 17:44, Olivier Grisel wrote:

How many cores do you have on this machine?

joblib.cpu_count()

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Scikit Learn in a Cray computer

2019-06-19 Thread Olivier Grisel
How many cores do you have on this machine?

joblib.cpu_count()
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] Scikit Learn in a Cray computer

2019-06-19 Thread Mauricio Reis
I'd like to understand how parallelism works in the DBSCAN routine in
scikit-learn running on the Cray computer, and what I should do to
improve the results I'm seeing.


I have adapted the existing example in
[https://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html#sphx-glr-auto-examples-cluster-plot-dbscan-py]
to run with 100,000 points, so that the processing time is long enough to
allow a reasonable evaluation. I varied the parameter "n_jobs = x", with
"x" ranging from 1 to 6, repeated the same experiments several times, and
calculated the average processing times.


n_jobs  time
1   21.3
2   15.1
3   14.8
4   15.2
5   15.5
6   15.0

I then get the times that appear in the table above and in the attached
image. As can be seen, there was an effective gain only with "n_jobs = 2",
and essentially no difference for larger values. Even then, the gain was
less than 30%!


Why were the gains so small? Why was there no greater gain for larger
values of the "n_jobs" parameter? Is it possible to improve the results I
have obtained?
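
For reference, here is a minimal sketch of the kind of timing loop described
above (based on the adapted plot_dbscan example; the blob layout and DBSCAN
parameters are illustrative, not the exact values used for the table):

import time
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# 100,000 points drawn around three centers, as in the adapted example.
X, _ = make_blobs(n_samples=100_000, centers=[[1, 1], [-1, -1], [1, -1]],
                  cluster_std=0.4, random_state=0)
X = StandardScaler().fit_transform(X)

for n_jobs in range(1, 7):
    start = time.perf_counter()
    DBSCAN(eps=0.3, min_samples=10, n_jobs=n_jobs).fit(X)
    print("n_jobs = %d: %.1f s" % (n_jobs, time.perf_counter() - start))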


--
Ats.,
Mauricio Reis
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn