Dear All,

Alex Lovell-Troy heads up innovation/cloud supercomputing at Cray (cc'd), and he is a great resource for all things. I thought he might find this thread useful.

Best,
Alex
On Fri, Jun 28, 2019 at 11:45 PM Olivier Grisel <olivier.gri...@ensta.org> wrote:

> You have to use a dedicated framework to distribute the computation on a
> cluster like your Cray system.
>
> You can use MPI, or dask with dask-jobqueue, but you also need to run
> parallel algorithms that remain efficient in a distributed setting, where
> communication between distributed worker nodes has a high cost.
>
> I am not sure that the DBSCAN implementation in scikit-learn would
> benefit much from naively running in distributed mode.
>
> On Fri, Jun 28, 2019 at 10:06 PM, Mauricio Reis <rei...@ime.eb.br> wrote:
>
>> Sorry, but only now did I reread your answer more closely.
>>
>> It seems that the "n_jobs" parameter of the DBSCAN routine brings no
>> benefit to performance. If I want to improve the performance of the
>> DBSCAN routine, I will have to redesign the solution to use MPI
>> resources.
>>
>> Is that correct?
>>
>> ---
>> Regards,
>> Mauricio Reis
>>
>> On 28/06/2019 16:47, Mauricio Reis wrote:
>>> My laptop has an Intel i7 processor with 4 cores. When I run the
>>> program on Windows 10, the "joblib.cpu_count()" routine returns "4".
>>> There, the same test that I had run on the Cray computer caused a 10%
>>> increase in the processing time of the DBSCAN routine when I used the
>>> "n_jobs = 4" parameter, compared to the processing time of that
>>> routine without this parameter. Do you know what causes the longer
>>> processing time when I use "n_jobs = 4" on my laptop?
>>>
>>> ---
>>> Regards,
>>> Mauricio Reis
>>>
>>> On 28/06/2019 06:29, Brown J.B. via scikit-learn wrote:
>>>>> where you can see "ncpus = 1" (I still do not know why 4 lines were
>>>>> printed -
>>>>>
>>>>> (total of 40 nodes) and each node has 1 CPU and 1 GPU!
>>>>
>>>>> #PBS -l select=1:ncpus=8:mpiprocs=8
>>>>> aprun -n 4 p.sh ./ncpus.py
>>>>
>>>> You can request 8 CPUs from a job scheduler, but if each node the
>>>> script runs on contains only one virtual/physical core, then
>>>> cpu_count() will return 1.
>>>> If that CPU supports multi-threading, you would typically get 2.
>>>>
>>>> For example, on my workstation:
>>>>
>>>> `--> egrep "processor|model name|core id" /proc/cpuinfo
>>>> processor : 0
>>>> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
>>>> core id : 0
>>>> processor : 1
>>>> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
>>>> core id : 1
>>>> processor : 2
>>>> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
>>>> core id : 0
>>>> processor : 3
>>>> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
>>>> core id : 1
>>>> `--> python3 -c "from sklearn.externals import joblib; print(joblib.cpu_count())"
>>>> 4
>>>>
>>>> It seems that in this situation, if you want to parallelize
>>>> *independent* scikit-learn calculations (e.g., changing the dataset
>>>> or random seed), you would request the MPI processes from PBS as you
>>>> have done, but you will need to place the scikit-learn computation
>>>> in a function and then take care of distributing that function call
>>>> across the MPI processes.
>>>>
>>>> Then again, if the runs are independent, it is a lot easier to write
>>>> a for loop in a shell script that changes the dataset/seed and
>>>> submits each run to the job scheduler, letting the scheduler take
>>>> care of the parallel distribution.
>>>> (I do this when performing 10+ independent runs of scikit-learn
>>>> modeling, where the models use multiple threads during calculation;
>>>> in my case, SLURM then takes care of finding the available nodes to
>>>> distribute the work to.)
>>>>
>>>> Hope this helps.
>>>> J.B.

--
Alex Morrise, PhD
Co-Founder & CTO, StayOpen.com
Chief Science Officer, MediaJel.com <http://mediajel.com/>
Professional Bio: Machine Learning Intelligence <http://www.linkedin.com/in/amorrise>
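For anyone landing on this thread later, here is a minimal sketch of what J.B. describes: wrap the scikit-learn computation in a function and run one independent fit per MPI process. The use of mpi4py, the script name, the per-rank seeds, and the eps grid are my assumptions, not something from the thread.

# Minimal sketch: one independent scikit-learn run per MPI process.
# Launch with something like: aprun -n 4 python run_dbscan_mpi.py
# (the script name and the eps grid are illustrative assumptions)
from mpi4py import MPI
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

def run_one(seed, eps):
    # The independent unit of work: build a dataset and fit DBSCAN.
    X, _ = make_blobs(n_samples=5000, random_state=seed)
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    return int(labels.max()) + 1  # clusters found (noise is labeled -1)

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

eps_grid = [0.3, 0.5, 0.7, 0.9]  # one parameter setting per rank
n_clusters = run_one(seed=rank, eps=eps_grid[rank % len(eps_grid)])

# Collect the per-rank results on rank 0 and report them.
results = comm.gather((rank, n_clusters), root=0)
if rank == 0:
    for r, n in sorted(results):
        print("rank %d found %d clusters" % (r, n))

Because each rank does its own fit and only small results are gathered, this avoids the inter-node communication cost Olivier warns about.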
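And here is a minimal sketch of the dask + dask-jobqueue route Olivier suggests, assuming a PBS-style scheduler like the one in Mauricio's #PBS snippet; the queue name, resource sizes, job count, and the use of joblib's dask backend are placeholder assumptions, not a tested site configuration.

# Minimal sketch, not a tested configuration: run scikit-learn on a
# PBS-managed cluster through dask-jobqueue. Queue name, cores, memory,
# and job count below are placeholder assumptions for your site.
from dask.distributed import Client
from dask_jobqueue import PBSCluster
import joblib
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

cluster = PBSCluster(
    queue="workq",       # hypothetical queue name; use your site's queue
    cores=8,             # cores per PBS job
    memory="16GB",       # memory per PBS job
    walltime="01:00:00",
)
cluster.scale(jobs=4)    # submit 4 PBS jobs that act as dask workers
client = Client(cluster)

X, _ = make_blobs(n_samples=10000, random_state=0)

# Route joblib-based parallelism (what n_jobs uses) to the dask workers.
with joblib.parallel_backend("dask"):
    labels = DBSCAN(eps=0.5, min_samples=5, n_jobs=-1).fit_predict(X)

As Olivier says, this only pays off when the parallel parts of the algorithm are coarse enough to amortize the communication cost between workers, and DBSCAN's neighbor searches may not be.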
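On Mauricio's original question: in scikit-learn's DBSCAN, n_jobs parallelizes the neighbor searches via joblib, and on small datasets the overhead of coordinating the workers can exceed the compute it saves, which would explain a 10% slowdown. A quick way to see where the crossover lies on a given machine is sketched below; the dataset sizes and parameters are arbitrary.

# Minimal sketch: compare DBSCAN wall time with and without joblib
# parallelism. Sizes are arbitrary; the crossover depends on the machine.
from time import perf_counter
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

for n_samples in (1000, 10000, 20000):
    X, _ = make_blobs(n_samples=n_samples, centers=10, random_state=0)
    for n_jobs in (1, 4):
        start = perf_counter()
        DBSCAN(eps=0.3, min_samples=5, n_jobs=n_jobs).fit(X)
        print("n_samples=%d n_jobs=%d: %.2fs"
              % (n_samples, n_jobs, perf_counter() - start))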
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn