Hi,

On 20.08.2014, at 06:26, Tetsuya Mishima wrote:

> Reuti and Oscar,
> 
> I'm a Torque user and I myself have never used SGE, so I hesitated to join 
> the discussion.
> 
> From my experience with Torque, the Open MPI 1.8 series has already 
> resolved the issue you pointed out regarding combining MPI with OpenMP. 
> 
> Please try adding the --map-by slot:pe=8 option if you want to use 8 threads. 
> Then Open MPI 1.8 should allocate processes properly without any modification 
> of the hostfile provided by Torque.
> 
> In your case (8 threads and 10 procs):
> 
> # you have to request 80 slots using SGE command before mpirun 
> mpirun --map-by slot:pe=8 -np 10 ./inverse.exe

Thanks for pointing me to this option; for now I can't get it working though (in 
fact, I essentially want to use it without binding). It allows telling Open 
MPI to bind more cores to each of the MPI processes - ok, but does it also lower the 
slot count granted by Torque? I mean, was your submission command like:

$ qsub -l nodes=10:ppn=8 ...

so that Torque knows it should grant and record a total slot count of 80 for 
correct accounting?
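
In other words, a minimal sketch of the matched request I have in mind (purely 
illustrative and untested, assuming 8-core nodes; "job.sh" is just a 
placeholder jobscript name):

# request 10 nodes with 8 cores each, i.e. 80 slots in total
$ qsub -l nodes=10:ppn=8 job.sh

# inside job.sh: 10 MPI processes, with 8 cores bound to each of them
export OMP_NUM_THREADS=8
mpirun --map-by slot:pe=$OMP_NUM_THREADS -np 10 ./inverse.exe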

-- Reuti


> where you can omit the --bind-to option because --bind-to core is assumed
> as the default when pe=N is provided by the user.
> Regards,
> Tetsuya
> 
>> Hi,
>> 
>> On 19.08.2014, at 19:06, Oscar Mojica wrote:
>> 
>>> I discovered what the error was. I forgot to include '-fopenmp' when I 
>>> compiled the objects in the Makefile, so the program worked but it didn't 
>>> divide the job into threads. Now the program is working and I can use up to 
>>> 15 cores per machine in the queue one.q.
>>> 
>>> Anyway, I would like to try to implement your advice. I'm not alone on the 
>>> cluster, so I must go with your second suggestion. The steps are
>>> 
>>> a) Use '$ qconf -mp orte' to change the allocation rule to 8
>> 
>> Was the number of slots defined in your queue one.q also increased to 8 
>> (`qconf -sq one.q`)?
>> 
>> 
>>> b) Set '#$ -pe orte 80' in the script
>> 
>> Fine.
>> 
>> 
>>> c) I'm not sure how to do this step. I'd appreciate your help here. I can 
>>> add some lines to the script to determine the PE_HOSTFILE path and 
>>> contents, but I don't know how to alter it.
>> 
>> For now you can put this in your jobscript (just after OMP_NUM_THREADS is 
>> exported):
>> 
>> awk -v omp_num_threads=$OMP_NUM_THREADS '{ $2/=omp_num_threads; print }' $PE_HOSTFILE > $TMPDIR/machines
>> export PE_HOSTFILE=$TMPDIR/machines
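
To illustrate with a hypothetical example (the exact columns depend on the 
installation, but the slot count is in the second field): with 
OMP_NUM_THREADS=8 and a $PE_HOSTFILE like

compute-1-2.local 8 one.q@compute-1-2.local UNDEFINED
compute-1-5.local 8 one.q@compute-1-5.local UNDEFINED

the awk line above would write to $TMPDIR/machines:

compute-1-2.local 1 one.q@compute-1-2.local UNDEFINED
compute-1-5.local 1 one.q@compute-1-5.local UNDEFINED

so a Tight Integration start then launches only one process per host, and the 
OpenMP threads can fill the remaining cores.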
>> 
>> =============
>> 
>> Unfortunately no one stepped into this discussion; in my opinion it's a much 
>> broader issue which affects all users who want to combine MPI with OpenMP. 
>> The queuing system should get a proper request for the overall amount of 
>> slots the user needs. For now this will be forwarded to Open MPI, which will 
>> use this information to start the appropriate number of processes (which was 
>> an achievement for the out-of-the-box Tight Integration, of course) and 
>> ignore any setting of OMP_NUM_THREADS. So, where should the generated list 
>> of machines be adjusted? There are several options:
>> 
>> a) The PE of the queuingsystem should do it:
>> 
>> + a one time setup for the admin
>> + in SGE the "start_proc_args" of the PE could alter the $PE_HOSTFILE
>> - the "start_proc_args" would need to know the number of threads, i.e. 
>> OMP_NUM_THREADS must be defined via "qsub -v ..." outside of the jobscript 
>> (tricky scanning of the submitted jobscript for OMP_NUM_THREADS would be too 
>> nasty)
>> - limits the jobscript to using only libraries that behave in the same way 
>> as Open MPI
>> 
>> 
>> b) The particular queue should do it in a queue prolog:
>> 
>> same as a) I think
>> 
>> 
>> c) The user should do it
>> 
>> + no change in the SGE installation
>> - each and every user must include it in all jobscripts to adjust the list 
>> and export the pointer to the altered $PE_HOSTFILE, though it could be 
>> changed back and forth for different steps of the jobscript
>> 
>> 
>> d) Open MPI should do it
>> 
>> + no change in the SGE installation
>> + no change to the jobscript
>> + OMP_NUM_THREADS can be altered for different steps of the jobscript while 
>> staying inside the granted allocation automatically
>> o should MKL_NUM_THREADS be covered too (does it use OMP_NUM_THREADS 
>> already)?
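
For option a), a rough and untested sketch of how the PE itself could do the 
adjustment (it assumes OMP_NUM_THREADS is passed to the job via `qsub -v 
OMP_NUM_THREADS=8 ...` and that the spooled pe_hostfile is writable at this 
point; the script path is hypothetical):

# in the PE definition (qconf -mp ...):
#   start_proc_args  /path/to/adjust_pe_hostfile.sh $pe_hostfile

# contents of /path/to/adjust_pe_hostfile.sh:
#!/bin/sh
pe_file="$1"
: ${OMP_NUM_THREADS:=1}
# divide the granted slots per host by the thread count
awk -v t="$OMP_NUM_THREADS" '{ $2 /= t; print }' "$pe_file" > "$pe_file.new" &&
  mv "$pe_file.new" "$pe_file"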
>> 
>> -- Reuti
>> 
>> 
>>> echo "PE_HOSTFILE:"
>>> echo $PE_HOSTFILE
>>> echo
>>> echo "cat PE_HOSTFILE:"
>>> cat $PE_HOSTFILE 
>>> 
>>> Thanks for taking the time to answer these emails; your advice has been 
>>> very useful.
>>> 
>>> PS: The version of SGE is   OGS/GE 2011.11p1
>>> 
>>> 
>>> Oscar Fabian Mojica Ladino
>>> Geologist M.S. in  Geophysics
>>> 
>>> 
>>>> From: re...@staff.uni-marburg.de
>>>> Date: Fri, 15 Aug 2014 20:38:12 +0200
>>>> To: us...@open-mpi.org
>>>> Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
>>>> 
>>>> Hi,
>>>> 
>>>> On 15.08.2014, at 19:56, Oscar Mojica wrote:
>>>> 
>>>>> Yes, my installation of Open MPI is SGE-aware. I got the following
>>>>> 
>>>>> [oscar@compute-1-2 ~]$ ompi_info | grep grid
>>>>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.2)
>>>> 
>>>> Fine.
>>>> 
>>>> 
>>>>> I'm a bit slow and I didn't understand the last part of your message, so I 
>>>>> made a test to resolve my doubts.
>>>>> This is the cluster configuration; some machines are turned off, but that 
>>>>> is not a problem.
>>>>> 
>>>>> [oscar@aguia free-noise]$ qhost
>>>>> HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
>>>>> -------------------------------------------------------------------------------
>>>>> global - - - - - - -
>>>>> compute-1-10 linux-x64 16 0.97 23.6G 558.6M 996.2M 0.0
>>>>> compute-1-11 linux-x64 16 - 23.6G - 996.2M -
>>>>> compute-1-12 linux-x64 16 0.97 23.6G 561.1M 996.2M 0.0
>>>>> compute-1-13 linux-x64 16 0.99 23.6G 558.7M 996.2M 0.0
>>>>> compute-1-14 linux-x64 16 1.00 23.6G 555.1M 996.2M 0.0
>>>>> compute-1-15 linux-x64 16 0.97 23.6G 555.5M 996.2M 0.0
>>>>> compute-1-16 linux-x64 8 0.00 15.7G 296.9M 1000.0M 0.0
>>>>> compute-1-17 linux-x64 8 0.00 15.7G 299.4M 1000.0M 0.0
>>>>> compute-1-18 linux-x64 8 - 15.7G - 1000.0M -
>>>>> compute-1-19 linux-x64 8 - 15.7G - 996.2M -
>>>>> compute-1-2 linux-x64 16 1.19 23.6G 468.1M 1000.0M 0.0
>>>>> compute-1-20 linux-x64 8 0.04 15.7G 297.2M 1000.0M 0.0
>>>>> compute-1-21 linux-x64 8 - 15.7G - 1000.0M -
>>>>> compute-1-22 linux-x64 8 0.00 15.7G 297.2M 1000.0M 0.0
>>>>> compute-1-23 linux-x64 8 0.16 15.7G 299.6M 1000.0M 0.0
>>>>> compute-1-24 linux-x64 8 0.00 15.7G 291.5M 996.2M 0.0
>>>>> compute-1-25 linux-x64 8 0.04 15.7G 293.4M 996.2M 0.0
>>>>> compute-1-26 linux-x64 8 - 15.7G - 1000.0M -
>>>>> compute-1-27 linux-x64 8 0.00 15.7G 297.0M 1000.0M 0.0
>>>>> compute-1-29 linux-x64 8 - 15.7G - 1000.0M -
>>>>> compute-1-3 linux-x64 16 - 23.6G - 996.2M -
>>>>> compute-1-30 linux-x64 16 - 23.6G - 996.2M -
>>>>> compute-1-4 linux-x64 16 0.97 23.6G 571.6M 996.2M 0.0
>>>>> compute-1-5 linux-x64 16 1.00 23.6G 559.6M 996.2M 0.0
>>>>> compute-1-6 linux-x64 16 0.66 23.6G 403.1M 996.2M 0.0
>>>>> compute-1-7 linux-x64 16 0.95 23.6G 402.7M 996.2M 0.0
>>>>> compute-1-8 linux-x64 16 0.97 23.6G 556.8M 996.2M 0.0
>>>>> compute-1-9 linux-x64 16 1.02 23.6G 566.0M 1000.0M 0.0 
>>>>> 
>>>>> I ran my program using only MPI with 10 processors from the queue one.q, 
>>>>> which has 14 machines (compute-1-2 to compute-1-15). With 'qstat -t' I 
>>>>> got:
>>>>> 
>>>>> [oscar@aguia free-noise]$ qstat -t
>>>>> job-ID prior name user state submit/start at queue master ja-task-ID 
>>>>> task-ID state cpu mem io stat failed 
>>>>> 
>>>>> -----------------------------------------------------------------------------------------------
>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-2.local 
>>>>> MASTER r 00:49:12 554.13753 0.09163 
>>>>> one.q@compute-1-2.local SLAVE 
>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-5.local 
>>>>> SLAVE 1.compute-1-5 r 00:48:53 551.49022 0.09410 
>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-9.local 
>>>>> SLAVE 1.compute-1-9 r 00:50:00 564.22764 0.09409 
>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-12.local 
>>>>> SLAVE 1.compute-1-12 r 00:47:30 535.30379 0.09379 
>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-13.local 
>>>>> SLAVE 1.compute-1-13 r 00:49:51 561.69868 0.09379 
>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-14.local 
>>>>> SLAVE 1.compute-1-14 r 00:49:14 554.60818 0.09379 
>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-10.local 
>>>>> SLAVE 1.compute-1-10 r 00:49:59 562.95487 0.09349 
>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-15.local 
>>>>> SLAVE 1.compute-1-15 r 00:50:01 563.27221 0.09361 
>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-8.local 
>>>>> SLAVE 1.compute-1-8 r 00:49:26 556.68431 0.09349 
>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-4.local 
>>>>> SLAVE 1.compute-1-4 r 00:49:27 556.87510 0.04967 
>>>> 
>>>> Yes, here you got 10 slots (= cores) granted by SGE. So there is no free 
>>>> core left inside the SGE allocation to allow the use of additional cores 
>>>> for your threads. If you use more cores than granted by SGE, you will 
>>>> oversubscribe the machines.
>>>> 
>>>> The issue is now:
>>>> 
>>>> a) If you want 8 threads per MPI process, your job will use 80 cores in 
>>>> total - for now SGE isn't aware of it.
>>>> 
>>>> b) Although you specified $fill_up as the allocation rule, it looks like 
>>>> $round_robin. Is there more than one slot defined in the queue definition 
>>>> of one.q to get exclusive access?
>>>> 
>>>> c) What version of SGE are you using? Certain versions use cgroups or bind 
>>>> processes directly to cores (although this usually needs to be requested 
>>>> by the job: see the first line of `qconf -help`).
>>>> 
>>>> 
>>>> In case you are alone on the cluster, you could bypass the allocation as 
>>>> in b) (unless you are hit by c)). But with a mixture of users and jobs, a 
>>>> different handling would be necessary to do this properly, IMO:
>>>> 
>>>> a) having a PE with a fixed allocation rule of 8
>>>> 
>>>> b) requesting this PE with an overall slot count of 80
>>>> 
>>>> c) copy and alter the $PE_HOSTFILE to show only (granted core count per 
>>>> machine) divided by (OMP_NUM_THREADS) per entry, and change $PE_HOSTFILE 
>>>> so that it points to the altered file
>>>> 
>>>> d) Open MPI with a Tight Integration will now start only N processes per 
>>>> machine according to the altered hostfile - in your case one
>>>> 
>>>> e) Your application can start the desired threads and you stay inside the 
>>>> granted allocation
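
Putting a) through e) together, a minimal jobscript sketch (assumptions: a PE 
named "orte8" with a fixed allocation rule of 8 exists, and 10 processes with 
8 threads each are wanted; adjust the names and numbers to your setup):

#!/bin/bash
#$ -cwd
#$ -S /bin/bash
#$ -pe orte8 80
#$ -q one.q

export OMP_NUM_THREADS=8

# shrink the slot count per host so that Open MPI starts one process per machine
awk -v omp_num_threads=$OMP_NUM_THREADS '{ $2/=omp_num_threads; print }' $PE_HOSTFILE > $TMPDIR/machines
export PE_HOSTFILE=$TMPDIR/machines

/usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -np 10 ./inverse.exe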
>>>> 
>>>> -- Reuti
>>>> 
>>>> 
>>>>> I logged into the MASTER node with 'ssh compute-1-2.local', ran '$ ps -e f', 
>>>>> and got this (I'm showing only the last lines):
>>>>> 
>>>>> 2506 ? Ss 0:00 /usr/sbin/atd
>>>>> 2548 tty1 Ss+ 0:00 /sbin/mingetty /dev/tty1
>>>>> 2550 tty2 Ss+ 0:00 /sbin/mingetty /dev/tty2
>>>>> 2552 tty3 Ss+ 0:00 /sbin/mingetty /dev/tty3
>>>>> 2554 tty4 Ss+ 0:00 /sbin/mingetty /dev/tty4
>>>>> 2556 tty5 Ss+ 0:00 /sbin/mingetty /dev/tty5
>>>>> 2558 tty6 Ss+ 0:00 /sbin/mingetty /dev/tty6
>>>>> 3325 ? Sl 0:04 /opt/gridengine/bin/linux-x64/sge_execd
>>>>> 17688 ? S 0:00 \_ sge_shepherd-2726 -bg
>>>>> 17695 ? Ss 0:00 \_ -bash 
>>>>> /opt/gridengine/default/spool/compute-1-2/job_scripts/2726
>>>>> 17797 ? S 0:00 \_ /usr/bin/time -f %E /opt/openmpi/bin/mpirun -v -np 10 
>>>>> ./inverse.exe
>>>>> 17798 ? S 0:01 \_ /opt/openmpi/bin/mpirun -v -np 10 ./inverse.exe
>>>>> 17799 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin 
>>>>> -V compute-1-5.local PATH=/opt/openmpi/bin:$PATH ; expo
>>>>> 17800 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin 
>>>>> -V compute-1-9.local PATH=/opt/openmpi/bin:$PATH ; expo
>>>>> 17801 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin 
>>>>> -V compute-1-12.local PATH=/opt/openmpi/bin:$PATH ; exp
>>>>> 17802 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin 
>>>>> -V compute-1-13.local PATH=/opt/openmpi/bin:$PATH ; exp
>>>>> 17803 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin 
>>>>> -V compute-1-14.local PATH=/opt/openmpi/bin:$PATH ; exp
>>>>> 17804 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin 
>>>>> -V compute-1-10.local PATH=/opt/openmpi/bin:$PATH ; exp
>>>>> 17805 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin 
>>>>> -V compute-1-15.local PATH=/opt/openmpi/bin:$PATH ; exp
>>>>> 17806 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin 
>>>>> -V compute-1-8.local PATH=/opt/openmpi/bin:$PATH ; expo
>>>>> 17807 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin 
>>>>> -V compute-1-4.local PATH=/opt/openmpi/bin:$PATH ; expo
>>>>> 17826 ? R 31:36 \_ ./inverse.exe
>>>>> 3429 ? Ssl 0:00 automount --pid-file /var/run/autofs.pid 
>>>>> 
>>>>> So the job is using the 10 machines; up to here everything is OK. Do you 
>>>>> think that by changing the "allocation_rule" to a number instead of 
>>>>> $fill_up, the MPI processes would divide the work into that number of 
>>>>> threads?
>>>>> 
>>>>> Thanks a lot 
>>>>> 
>>>>> Oscar Fabian Mojica Ladino
>>>>> Geologist M.S. in Geophysics
>>>>> 
>>>>> 
>>>>> PS: I have another question: what is a slot? Is it a physical core?
>>>>> 
>>>>> 
>>>>>> From: re...@staff.uni-marburg.de
>>>>>> Date: Thu, 14 Aug 2014 23:54:22 +0200
>>>>>> To: us...@open-mpi.org
>>>>>> Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I think this is a broader issue whenever an MPI library is used in 
>>>>>> conjunction with threads while running inside a queuing system. First, 
>>>>>> you can check whether your actual installation of Open MPI is SGE-aware 
>>>>>> with:
>>>>>> 
>>>>>> $ ompi_info | grep grid
>>>>>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)
>>>>>> 
>>>>>> Then we can look at the definition of your PE: "allocation_rule 
>>>>>> $fill_up". This means that SGE will grant you 14 slots in total in any 
>>>>>> combination on the available machines, meaning an allocation of 8+4+2 
>>>>>> slots is allowed, as is 4+4+3+3, and so on. Depending on the 
>>>>>> SGE-awareness the question is: will your application just start 
>>>>>> processes on all nodes and completely disregard the granted allocation, 
>>>>>> or, at the other extreme, does it stay on one and the same machine for 
>>>>>> all started processes? On the master node of the parallel job you can 
>>>>>> issue:
>>>>>> 
>>>>>> $ ps -e f
>>>>>> 
>>>>>> (f without a dash) to see whether `ssh` or `qrsh -inherit ...` is used 
>>>>>> to reach the other machines with their requested process count.
>>>>>> 
>>>>>> 
>>>>>> Now to the common problem in such a set up:
>>>>>> 
>>>>>> AFAICS, for now there is no way in the Open MPI + SGE combination to 
>>>>>> specify the number of MPI processes and the intended number of threads 
>>>>>> such that they are automatically read by Open MPI while staying inside 
>>>>>> the granted slot count and allocation. So it seems to be necessary to 
>>>>>> have the intended number of threads honored by Open MPI too.
>>>>>> 
>>>>>> Hence specifying e.g. "allocation_rule 8" in such a setup while 
>>>>>> requesting 32 processes would for now already start 32 MPI processes, 
>>>>>> as Open MPI reads the $PE_HOSTFILE and acts accordingly.
>>>>>> 
>>>>>> Open MPI would have to read the generated machine file in a slightly 
>>>>>> different way regarding threads: a) read the $PE_HOSTFILE, b) divide the 
>>>>>> granted slots per machine by OMP_NUM_THREADS, c) throw an error in case 
>>>>>> it is not divisible by OMP_NUM_THREADS. Then start one process per 
>>>>>> quotient.
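
A sketch of how that division (including the divisibility check) could already 
be done today on the jobscript side (hypothetical; this is not something Open 
MPI itself provides, it just mirrors steps a)-c) above in the shell):

: ${OMP_NUM_THREADS:=1}
awk -v t="$OMP_NUM_THREADS" '
  $2 % t != 0 { print "slots on " $1 " not divisible by " t > "/dev/stderr"; exit 1 }
  { $2 /= t; print }
' $PE_HOSTFILE > $TMPDIR/machines || exit 1
export PE_HOSTFILE=$TMPDIR/machines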
>>>>>> 
>>>>>> Would this work for you?
>>>>>> 
>>>>>> -- Reuti
>>>>>> 
>>>>>> PS: This would also mean having a couple of PEs in SGE with a fixed 
>>>>>> "allocation_rule". While this works right now, an extension in SGE could 
>>>>>> be "$fill_up_omp"/"$round_robin_omp", using OMP_NUM_THREADS there too; 
>>>>>> hence it must not be specified as an `export` in the jobscript but 
>>>>>> either on the command line or inside the jobscript in #$ lines as a job 
>>>>>> request. This would mean collecting slots in bunches of OMP_NUM_THREADS 
>>>>>> on each machine to reach the overall specified slot count. Whether 
>>>>>> OMP_NUM_THREADS or n times OMP_NUM_THREADS is allowed per machine would 
>>>>>> need to be discussed.
>>>>>> 
>>>>>> PS2: As Univa SGE can also supply a list of granted cores in the 
>>>>>> $PE_HOSTFILE, it would be an extension to feed this to Open MPI to allow 
>>>>>> a UGE-aware binding.
>>>>>> 
>>>>>> 
>>>>>> On 14.08.2014, at 21:52, Oscar Mojica wrote:
>>>>>> 
>>>>>>> Guys
>>>>>>> 
>>>>>>> I changed the line that runs the program in the script to each of 
>>>>>>> these options:
>>>>>>> /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v --bind-to-none -np $NSLOTS ./inverse.exe
>>>>>>> /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v --bind-to-socket -np $NSLOTS ./inverse.exe
>>>>>>> 
>>>>>>> but I got the same results. When I check `man mpirun` it shows:
>>>>>>> 
>>>>>>> -bind-to-none, --bind-to-none
>>>>>>> Do not bind processes. (Default.)
>>>>>>> 
>>>>>>> and the output of 'qconf -sp orte' is
>>>>>>> 
>>>>>>> pe_name orte
>>>>>>> slots 9999
>>>>>>> user_lists NONE
>>>>>>> xuser_lists NONE
>>>>>>> start_proc_args /bin/true
>>>>>>> stop_proc_args /bin/true
>>>>>>> allocation_rule $fill_up
>>>>>>> control_slaves TRUE
>>>>>>> job_is_first_task FALSE
>>>>>>> urgency_slots min
>>>>>>> accounting_summary TRUE
>>>>>>> 
>>>>>>> I don't know if the installed Open MPI was compiled with '--with-sge'. 
>>>>>>> How can I check that?
>>>>>>> Before thinking about a hybrid application I was using only MPI, and 
>>>>>>> the program used few processors (14). The cluster has 28 machines, 15 
>>>>>>> with 16 cores and 13 with 8 cores, totaling 344 processing units. When 
>>>>>>> I submitted the job (only MPI), the MPI processes were spread across 
>>>>>>> the cores directly; for that reason I created a new queue with 14 
>>>>>>> machines, trying to gain more time. The results were the same in both 
>>>>>>> cases. In the last case I could verify that the processes were 
>>>>>>> distributed to all machines correctly.
>>>>>>> 
>>>>>>> What must I do?
>>>>>>> Thanks 
>>>>>>> 
>>>>>>> Oscar Fabian Mojica Ladino
>>>>>>> Geologist M.S. in Geophysics
>>>>>>> 
>>>>>>> 
>>>>>>>> Date: Thu, 14 Aug 2014 10:10:17 -0400
>>>>>>>> From: maxime.boissonnea...@calculquebec.ca
>>>>>>>> To: us...@open-mpi.org
>>>>>>>> Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> You DEFINITELY need to disable OpenMPI's new default binding. 
>>>>>>>> Otherwise, 
>>>>>>>> your N threads will run on a single core. --bind-to socket would be my 
>>>>>>>> recommendation for hybrid jobs.
>>>>>>>> 
>>>>>>>> Maxime
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 2014-08-14 10:04, Jeff Squyres (jsquyres) wrote:
>>>>>>>>> I don't know much about OpenMP, but do you need to disable Open MPI's 
>>>>>>>>> default bind-to-core functionality (I'm assuming you're using Open 
>>>>>>>>> MPI 1.8.x)?
>>>>>>>>> 
>>>>>>>>> You can try "mpirun --bind-to none ...", which will have Open MPI not 
>>>>>>>>> bind MPI processes to cores, which might allow OpenMP to think that 
>>>>>>>>> it can use all the cores, and therefore it will spawn num_cores 
>>>>>>>>> threads...?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Aug 14, 2014, at 9:50 AM, Oscar Mojica <o_moji...@hotmail.com> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hello everybody
>>>>>>>>>> 
>>>>>>>>>> I am trying to run a hybrid MPI + OpenMP program on a cluster. I 
>>>>>>>>>> created a queue with 14 machines, each one with 16 cores. The 
>>>>>>>>>> program divides the work among the 14 processes with MPI, and within 
>>>>>>>>>> each process a loop is also divided into, for example, 8 threads 
>>>>>>>>>> using OpenMP. The problem is that when I submit the job to the 
>>>>>>>>>> queue, the MPI processes don't divide the work into threads and the 
>>>>>>>>>> program prints the number of threads working within each process as 
>>>>>>>>>> one.
>>>>>>>>>> 
>>>>>>>>>> I made a simple test program that uses OpenMP and logged into one of 
>>>>>>>>>> the fourteen machines. I compiled it using gfortran -fopenmp 
>>>>>>>>>> program.f -o exe, set the OMP_NUM_THREADS environment variable to 8, 
>>>>>>>>>> and when I ran it directly in the terminal the loop was effectively 
>>>>>>>>>> divided among the cores; in this case the program printed the number 
>>>>>>>>>> of threads as 8.
>>>>>>>>>> 
>>>>>>>>>> This is my Makefile
>>>>>>>>>> 
>>>>>>>>>> # Start of the makefile
>>>>>>>>>> # Defining variables
>>>>>>>>>> objects = inv_grav3d.o funcpdf.o gr3dprm.o fdjac.o dsvd.o
>>>>>>>>>> #f90comp = /opt/openmpi/bin/mpif90
>>>>>>>>>> f90comp = /usr/bin/mpif90
>>>>>>>>>> #switch = -O3
>>>>>>>>>> executable = inverse.exe
>>>>>>>>>> # Makefile
>>>>>>>>>> all : $(executable)
>>>>>>>>>> $(executable) : $(objects)
>>>>>>>>>> 	$(f90comp) -fopenmp -g -O -o $(executable) $(objects)
>>>>>>>>>> 	rm $(objects)
>>>>>>>>>> %.o: %.f
>>>>>>>>>> 	$(f90comp) -c $<
>>>>>>>>>> # Cleaning everything
>>>>>>>>>> clean:
>>>>>>>>>> 	rm $(executable)
>>>>>>>>>> #	rm $(objects)
>>>>>>>>>> # End of the makefile
>>>>>>>>>> 
>>>>>>>>>> and the script that I am using is:
>>>>>>>>>> 
>>>>>>>>>> #!/bin/bash
>>>>>>>>>> #$ -cwd
>>>>>>>>>> #$ -j y
>>>>>>>>>> #$ -S /bin/bash
>>>>>>>>>> #$ -pe orte 14
>>>>>>>>>> #$ -N job
>>>>>>>>>> #$ -q new.q
>>>>>>>>>> 
>>>>>>>>>> export OMP_NUM_THREADS=8
>>>>>>>>>> /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v -np $NSLOTS ./inverse.exe
>>>>>>>>>> 
>>>>>>>>>> am I forgetting something?
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> 
>>>>>>>>>> Oscar Fabian Mojica Ladino
>>>>>>>>>> Geologist M.S. in Geophysics
>>>>>>>>>> _______________________________________________
>>>>>>>>>> users mailing list
>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>> Link to this post: 
>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2014/08/25016.php
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -- 
>>>>>>>> ---------------------------------
>>>>>>>> Maxime Boissonneault
>>>>>>>> Analyste de calcul - Calcul Québec, Université Laval
>>>>>>>> Ph. D. en physique
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> users mailing list
>>>>>>>> us...@open-mpi.org
>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>> Link to this post: 
>>>>>>>> http://www.open-mpi.org/community/lists/users/2014/08/25020.php
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> us...@open-mpi.org
>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>> Link to this post: 
>>>>>>> http://www.open-mpi.org/community/lists/users/2014/08/25032.php
>>>>>> 
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>> Link to this post: 
>>>>>> http://www.open-mpi.org/community/lists/users/2014/08/25034.php
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>> Link to this post: 
>>>>> http://www.open-mpi.org/community/lists/users/2014/08/25037.php
>>>> 
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post: 
>>>> http://www.open-mpi.org/community/lists/users/2014/08/25038.php
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/users/2014/08/25079.php
>> 
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/08/25080.php
> 
> ----
> Tetsuya Mishima  tmish...@jcity.maeda.co.jp
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/08/25081.php
