[OMPI users] openmpi 1.4.1
Ooops, found the problem: I hadn't restarted PBS after changing the node lists, and the job had been put onto a node with a faulty Myrinet connection on the switch.

Regards
[OMPI users] openmpi 1.4.1
Hi All,

I am receiving an error message:

[grid-admin@ng2 ~]$ cat dml_test.err
[hydra010:22914] [btl_gm_proc.c:191] error in converting global to local id
[hydra002:07435] [btl_gm_proc.c:191] error in converting global to local id
[hydra009:31492] [btl_gm_proc.c:191] error in converting global to local id
[hydra008:29253] [btl_gm_proc.c:191] error in converting global to local id
[hydra007:02552] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra005:27967] [btl_gm_proc.c:191] error in converting global to local id
[hydra006:19420] [btl_gm_proc.c:191] error in converting global to local id
[hydra010:22914] [btl_gm.c:489] send completed with unhandled gm error 18
[hydra010:22914] pml_ob1_sendreq.c:211 FATAL
--
mpirun has exited due to process rank 0 with PID 22914 on node hydra010 exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
--
[grid-admin@ng2 ~]$

I've searched and googled, only to find nothing that points me to where this problem may lie. I've looked at the source code and can't see anything glaringly obvious, and am wondering whether this might be a GM issue? It does appear to start up OK:

GM: Version 2.1.30_Linux build 2.1.30_Linux root@hydra115:/usr/local/src/gm-2.1.30_Linux Tue Apr 27 12:29:17 CST 2010
GM: On i686, kernel version: 2.6.18-92.1.10.el5 #1 SMP Tue Aug 5 07:41:53 EDT 2008
GM: Highmem memory configuration:
GM: PFN_ZERO=0x0, PFN_MAX=0x7fffc, KERNEL_PFN_MAX=0x38000
GM: Memory available for registration: 259456 pages (1013 MBytes)
GM: MCP for unit 0: L9 4K
GM: LANai rate set to 132 MHz (max = 134 MHz)
GM: Board 0 supports 2815 remote nodes.
GM: Board 0 page hash cache has 16384 bins.
GM: Board 0 has 1 packet interfaces.
GM: NOTICE: /usr/local/src/gm-2.1.30_Linux/drivers/linux/kbuild/gm_arch_k.c:4828:():kernel
GM: ServerWorks chipset detected: avoiding PIO read.
GM: Allocated IRQ10
GM: 1 Myrinet board(s) found and initialized

Any ideas as to where to look would be most appreciated.

Thanks

--
David Logan
eResearch SA, ARCS Grid Administrator
Level 1, School of Physics and Chemistry
North Terrace, Adelaide, 5005
(W) 08 8303 7301 (M) 0458 631 117
Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?
Hi Jeff

Answers inline.

Jeff Squyres wrote:
> On May 6, 2010, at 2:01 PM, Gus Correa wrote:
>> 1) Now I can see and use the btl_sm_num_fifos component:
>>
>> I had committed already "btl = ^sm" to the openmpi-mca-params.conf file. This apparently hides the btl_sm_num_fifos from ompi_info.
>>
>> After I switched to no options in openmpi-mca-params.conf, then ompi_info showed the btl_sm_num_fifos component.
>>
>> ompi_info --all | grep btl_sm_num_fifos
>> MCA btl: parameter "btl_sm_num_fifos" (current value: "1", data source: default value)
>>
>> A side comment: This means that the system administrator can hide some Open MPI options from the users, depending on what he puts in the openmpi-mca-params.conf file, right?
>
> Correct. BUT: a user can always override the "btl" MCA param and see them again. For example, you could also have done this:
>
> echo "btl =" > ~/.openmpi/mca-params.conf
> ompi_info --all | grep btl_sm_num_fifos
> # ...will show the sm params...

Aha! Can they override my settings?! Can't anymore. I'm gonna write a BOFH cron script to run every 10 minutes, check for and delete any ~/.openmpi directory, shut down the recalcitrant account, make a tarball of its ~, and send it to the mass store. Quarantined. :)

>> 2) However, running with "sm" still breaks, unfortunately:
>>
>> Boomer!
>
> Doh!
>
>> I get the same errors that I reported in my very first email, if I increase the number of processes to 16, to explore the hyperthreading range.
>>
>> This is using "sm" (i.e. not excluded in the mca config file), and btl_sm_num_fifos (mpiexec command line).
>>
>> The machine hangs, requires a hard reboot, etc, etc, as reported earlier. See below, please.
>
> I saw that only some probably-unrelated dmesg messages were emitted. Was there anything else revealing on the console and/or /var/log/* files? Hard reboots absolutely should not be caused by Open MPI.

I don't think the problem is with Open MPI. So it may not be easy to find a logical link between the kernel messages and the MPI hello_c that was running.

>> So, I guess the conclusion is that I can use sm, but I have to remain within the range of physical cores (8), not oversubscribe, not try to explore the HT range. Should I expect it to work also for np>number of physical cores?
>
> Your prior explanations of when HT is useful seemed pretty reasonable to me. Meaning: Nehalem HT will help only in some kinds of codes. Dense computation codes with few conditional branches may not benefit much from HT.

When there aren't frequent requests to change the code, to include new features, one can think about optimizing for dense computation, avoiding inner loop branches, etc. That is the situation reported by Doug Reeder on this thread, where his optimized finite element code shows a 2/3 degraded speed when HT is used.

However, most of the codes we run here seem to have been optimized at some point of their early life, but then aggregated so many new features that the if/elseif/elseif... branches are abundant. The logic can get so complicated to de-tangle and streamline that nobody dares to rewrite the code, afraid to produce wrong results, or to have to face a long code re-development cycle (without support). It is like fixing the plumbing or wiring of an old house. OO that goes OOverboard also plays a role, often misses the point, and can add more overhead. I would guess that this is not a situation specific to Earth Science applications (which tend to be big and complex).

So, chances are that hyperthreading may give us a little edge, harnessing the code imperfections. Not a big one, maybe 10-20%, I would guess. I experienced that type of speedup with SMT/HT on an IBM machine with one of these big codes.

> But OMPI applications should always run *correctly*, regardless of HT or not-HT -- even if you're oversubscribing. The performance may suffer (sometimes dramatically) if you oversubscribe physical cores with dense computational code, but it should always run *correctly*.

That is what I was seeking in the first place. Not performance with HT, but correctness with HT. Whether we would use HT or not was to be decided later, after testing how the atmospheric model would perform with and without HT.

>> I wonder if this would still work with np<=8, but with heavier code. (I only used hello_c.c so far.)
>
> If hello_c is crashing your computer - even if you're running np>8 or np>16 -- something is wrong outside of Open MPI. I routinely run np=100 hello_c on machines.

I've got hello_c to run correctly with heavy oversubscription on our cluster nodes (up to 1024 on an 8-core node, IIRR). Heavier programs don't go this far, but still run with light oversubscription. But on that Nehalem + Fedora 12 machine it doesn't work. So, the evidence is clear. The problem is not with Open MPI.
Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?
Hi Gus,

Doh! I didn't see the kernel-related messages after the segfault message. Definitely some weirdness here that is beyond your control... Sorry about that.

--
Samuel K. Gutierrez
Los Alamos National Laboratory
Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?
On May 6, 2010, at 2:01 PM, Gus Correa wrote:

> 1) Now I can see and use the btl_sm_num_fifos component:
>
> I had committed already "btl = ^sm" to the openmpi-mca-params.conf file. This apparently hides the btl_sm_num_fifos from ompi_info.
>
> After I switched to no options in openmpi-mca-params.conf, then ompi_info showed the btl_sm_num_fifos component.
>
> ompi_info --all | grep btl_sm_num_fifos
> MCA btl: parameter "btl_sm_num_fifos" (current value: "1", data source: default value)
>
> A side comment:
> This means that the system administrator can hide some Open MPI options from the users, depending on what he puts in the openmpi-mca-params.conf file, right?

Correct. BUT: a user can always override the "btl" MCA param and see them again. For example, you could also have done this:

echo "btl =" > ~/.openmpi/mca-params.conf
ompi_info --all | grep btl_sm_num_fifos
# ...will show the sm params...

> 2) However, running with "sm" still breaks, unfortunately:
>
> Boomer!

Doh!

> I get the same errors that I reported in my very first email, if I increase the number of processes to 16, to explore the hyperthreading range.
>
> This is using "sm" (i.e. not excluded in the mca config file), and btl_sm_num_fifos (mpiexec command line).
>
> The machine hangs, requires a hard reboot, etc, etc, as reported earlier. See below, please.

I saw that only some probably-unrelated dmesg messages were emitted. Was there anything else revealing on the console and/or /var/log/* files? Hard reboots absolutely should not be caused by Open MPI.

> So, I guess the conclusion is that I can use sm, but I have to remain within the range of physical cores (8), not oversubscribe, not try to explore the HT range.
> Should I expect it to work also for np>number of physical cores?

Your prior explanations of when HT is useful seemed pretty reasonable to me. Meaning: Nehalem HT will help only in some kinds of codes. Dense computation codes with few conditional branches may not benefit much from HT.

But OMPI applications should always run *correctly*, regardless of HT or not-HT -- even if you're oversubscribing. The performance may suffer (sometimes dramatically) if you oversubscribe physical cores with dense computational code, but it should always run *correctly*.

> I wonder if this would still work with np<=8, but with heavier code.
> (I only used hello_c.c so far.)

If hello_c is crashing your computer - even if you're running np>8 or np>16 -- something is wrong outside of Open MPI. I routinely run np=100 hello_c on machines.

> $ mpiexec -mca btl_sm_num_fifos 16 -np 16 a.out
> --
> mpiexec noticed that process rank 8 with PID 3659 on node spinoza.ldeo.columbia.edu exited on signal 11 (Segmentation fault).
> --
> $
>
> Message from syslogd@spinoza at May 6 13:38:13 ...
> kernel:[ cut here ]
>
> Message from syslogd@spinoza at May 6 13:38:13 ...
> kernel:invalid opcode: [#1] SMP
>
> Message from syslogd@spinoza at May 6 13:38:13 ...
> kernel:last sysfs file: /sys/devices/system/cpu/cpu15/topology/physical_package_id
>
> Message from syslogd@spinoza at May 6 13:38:13 ...
> kernel:Stack:
>
> Message from syslogd@spinoza at May 6 13:38:13 ...
> kernel:Call Trace:
>
> Message from syslogd@spinoza at May 6 13:38:13 ...
> kernel:Code: 48 89 45 a0 4c 89 ff e8 e0 dd 2b 00 41 8b b6 58 03 00 00 4c 89 e7 ff c6 e8 b5 bc ff ff 41 8b 96 5c 03 00 00 48 98 48 39 d0 73 04 <0f> 0b eb fe 48 29 d0 48 89 45 a8 66 41 ff 07 49 8b 94 24 00 01

I unfortunately don't know what these messages mean...

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?
Hi Samuel

Samuel K. Gutierrez wrote:
> Hi Gus,
>
> This may not help, but it's worth a try. If it's not too much trouble, can you please reconfigure your Open MPI installation with --enable-debug and then rebuild? After that, may we see the stack trace from a core file that is produced after the segmentation fault?
>
> Thanks,
> --
> Samuel K. Gutierrez
> Los Alamos National Laboratory

Thank you for the suggestion. I am a bit reluctant to try this because when it fails, it *really* fails. Most of the time the machine doesn't even return the prompt, and in all cases it freezes and requires a hard reboot. It is not a segfault that the OS can catch, I guess. I wonder if enabling debug mode would do much for us, and get to the point of dumping a core, or just die before that.

Gus Correa

-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-
Re: [OMPI users] MPI_Bsend vs. MPI_Ibsend (2)
First, to minimize ambiguity, it may make sense to distinguish explicitly between two buffers: the send buffer (specified in the MPI_Send or MPI_Bsend call) and the attached buffer (specified in some MPI_Buffer_attach call).

Jovana Knezevic wrote:
> On the other hand, a slight confusion where Buffered send is concerned remains: In my understanding, MPI_SEND (standard, blocking) does not return until the send operation it invoked has completed. Completion can mean the message was copied into an MPI internal buffer, or it can mean the sending and receiving processes synchronized on the message.

MPI_Send will return when it is safe to reuse the send buffer. No guarantees about anything having to do with the receiver.

> So, if we decide to use buffered send (Bsend, so blocking), and we say "I want to allocate a large enough buffer, I want my data to be copied into the buffer then, because I do not want anyone else to decide if I am going to synchronize my sends and receives completely on the message - I know what I'm doing :-)!", then as soon as the data is copied to the buffer, the call returns and the buffer can be reused.

MPI_Bsend will return when it is safe to reuse the send buffer. The message data might simply have been copied to the local attached buffer.

> Is the difference in comparison to Ibsend that with Ibsend the data doesn't even have to be copied to the buffer when the call returns,

Right.

> or something like that? Because otherwise, I still do not see the difference: data copied into buffer -> call returns! Why wouldn't I reuse my message-buffer then?!
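To make the distinction concrete, here is a minimal C sketch, added for illustration (it is not code from this thread); the message size, tags, and the 'dest' rank are arbitrary, and matching receives on the peer are assumed.

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: when may the *send* buffer be reused after Bsend vs. Ibsend? */
void bsend_vs_ibsend(int dest)
{
    char msg[100] = "hello";
    /* Room for two pending messages, plus the per-message overhead. */
    int bufsize = 2 * (100 + MPI_BSEND_OVERHEAD);
    char *attached = malloc(bufsize);

    MPI_Buffer_attach(attached, bufsize);

    /* Blocking buffered send: on return the data has been copied out
       (typically into the attached buffer), so 'msg' is reusable now. */
    MPI_Bsend(msg, 100, MPI_CHAR, dest, 0, MPI_COMM_WORLD);
    strcpy(msg, "safe to overwrite immediately");

    /* Nonblocking buffered send: the call may return before any copy
       has happened, so 'msg' must not be touched until completion. */
    MPI_Request req;
    MPI_Ibsend(msg, 100, MPI_CHAR, dest, 1, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    strcpy(msg, "safe only after the Wait (or a successful Test)");

    /* Detach blocks until all buffered messages have been sent. */
    MPI_Buffer_detach(&attached, &bufsize);
    free(attached);
}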
Re: [OMPI users] MPI_Bsend vs. MPI_Ibsend (2)
Bsend does not guarantee use of the attached buffer. Return from MPI_Ibsend does not guarantee you can modify the application send buffer.

Maybe the implementation would try to optimize by scheduling a nonblocking send from the application buffer that bypasses the copy to the attached buffer. When you call WAIT, if the message had gone from the application send buffer in the interim, the copy cost is saved. If it has not, the WAIT could copy into the attached buffer and let the send go from there when the recv is posted. I am not aware of an MPI that does this, but it would be a reasonable optimization.

Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363
[OMPI users] MPI_Bsend vs. MPI_Ibsend (2)
Thank you all!

Regarding the posted Recv, I am aware that neither send nor buffered send tell the sender if it is posted. Regarding the distinction between blocking and nonblocking calls in general, everything is clear as well. On the other hand, a slight confusion where Buffered send is concerned remains:

In my understanding, MPI_SEND (standard, blocking) does not return until the send operation it invoked has completed. Completion can mean the message was copied into an MPI internal buffer, or it can mean the sending and receiving processes synchronized on the message.

So, if we decide to use buffered send (Bsend, so blocking), and we say "I want to allocate a large enough buffer, I want my data to be copied into the buffer then, because I do not want anyone else to decide if I am going to synchronize my sends and receives completely on the message - I know what I'm doing :-)!", then as soon as the data is copied to the buffer, the call returns and the buffer can be reused.

Is the difference in comparison to Ibsend that with Ibsend the data doesn't even have to be copied to the buffer when the call returns, or something like that? Because otherwise, I still do not see the difference: data copied into buffer -> call returns! Why wouldn't I reuse my message-buffer then?!

Sorry for bothering you so much, but for the type of applications I am involved in this is a very important issue, thus it is crucial that this becomes completely clear to me.

Thank you again!
Cheers,
Jovana

> An MPI send (of any kind) is defined by "local completion semantics". When a send is complete, the send buffer may be reused. The only kind of send that gives any indication whether the receive is posted is the synchronous send. Neither standard send nor buffered send tell the sender if the recv was posted.
>
> The difference between blocking and nonblocking is that a return from a blocking send call indicates the send buffer may be reused. A return from a nonblocking send does not allow the send buffer to be reused (but other things can be done). The send buffer becomes available to reuse after a wait or successful test.
>
> Dick Treumann - MPI Team
> IBM Systems & Technology Group
> Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
> Tele (845) 433-7846 Fax (845) 433-8363
>
> From: Bill Rankin
> To: Open MPI Users
> Date: 05/06/2010 10:35 AM
> Subject: Re: [OMPI users] MPI_Bsend vs. MPI_Ibsend
> Sent by: users-boun...@open-mpi.org
>
> Actually the 'B' in MPI_Bsend() specifies that it is a blocking *buffered* send. So if I remember my standards correctly, this call requires:
>
> 1) you will have to explicitly manage the send buffers via
Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?
Gus Correa wrote:
> 2) However, running with "sm" still breaks, unfortunately: I get the same errors that I reported in my very first email, if I increase the number of processes to 16, to explore the hyperthreading range. This is using "sm" (i.e. not excluded in the mca config file), and btl_sm_num_fifos (mpiexec command line). The machine hangs, requires a hard reboot, etc, etc, as reported earlier.

Okay. I think this is different from trac 2043, then, since that involved a race condition that can be worked around by giving each sender its own FIFO.

> So, I guess the conclusion is that I can use sm, but I have to remain within the range of physical cores (8), not oversubscribe, not try to explore the HT range. Should I expect it to work also for np>number of physical cores?

Yes, I believe that would be a reasonable expectation (under circumstances other than the ones you're facing, in any case). I just ran the examples/connectivity_c.c test with GCC on an 8-core Nehalem system with HT turned on and tested up to np=64.
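For anyone who wants to repeat that experiment: connectivity_c.c ships in the examples/ directory of the Open MPI source tree, and something like

mpicc examples/connectivity_c.c -o connectivity_c
mpirun -np 64 ./connectivity_c

should reproduce the np=64 run described above (the paths assume the top of the extracted tarball, using the mpicc and mpirun of the installation under test).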
Re: [OMPI users] Can NWChem be run with OpenMPI over an InfiniBand interconnect ... ??
Yeah, you just need to set the param specified in the warning message. We inserted that to ensure that people understand that IB doesn't play well with fork'd processes, so you need to be careful when doing so.

On May 6, 2010, at 12:27 PM, Addepalli, Srirangam V wrote:

> Hello Richard,
> Yes, NWCHEM can be run on IB using 1.4.1, if you have built openmpi with IB support.
> Note: If your IB cards are qlogic you need to compile NWCHEM with MPI-SPAWN.
> Rangam
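For reference, the parameter named in the warning is set like any other MCA parameter, e.g. on the command line:

mpirun --mca mpi_warn_on_fork 0 ...

or via the environment, using the same OMPI_MCA_ prefix shown elsewhere in this digest:

export OMPI_MCA_mpi_warn_on_fork=0

Note that this only silences the warning; it does not make fork() over IB any safer.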
Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?
I know a few national labs that run OMPI w/Fedora 9, but that isn't on Nehalem hardware and is using gcc 4.3.x. However, I think the key issue really is the compiler. I have seen similar problems on multiple platforms and OS's whenever I use GCC 4.4.x - I -think- it has to do with the automatic vectorization in that compiler, but I can't swear to it.

You can always install a personal copy of gcc for your own use on the system and see if that solves the problem. Just download a version like 4.3.x from the gnu site. I know 4.3.x doesn't have a problem, though again I haven't tried it on Nehalem.
Re: [OMPI users] Can NWChem be run with OpenMPI over an InfiniBand interconnect ... ??
Hello Richard,

Yes, NWCHEM can be run on IB using 1.4.1, if you have built openmpi with IB support.

Note: If your IB cards are qlogic you need to compile NWCHEM with MPI-SPAWN.

Rangam

Settings for my build with MPI-SPAWN:

export ARMCI_NETWORK=MPI-SPAWN
export IB_HOME=/usr
export IB_INCLUDE=/usr/include
export IB_LIB=/usr/lib64
export IB_LIB_NAME="-libverbs -libumad -lpthread "
export NWCHEM_TOP=/lustre/work/apps/nwchem-5.1.1
export NWCHEM_MODULES="venus geninterface all"
export LIBMPI="-lmpi"
export ARMCI_DEFAULT_SHMMAX=256
export BLASLIB=goto2_penrynp-r1.00
export BLASLOC=/lustre/work/apps/goto/
export BLASOPT="-L/lustre/work/apps/goto/ -l$BLASLIB"
export CC=icc
export CFLG="-xP -fPIC"
export CXX=icpc
export F77=ifort
export F90=ifort
export FC=ifort
export FL=ifort
export LARGE_FILES=TRUE
export LIB_DEFINES=-DDFLT_TOT_MEM=2147483647
export MPI_LOC=/lustre/work/apps/IB-ICC-IFORT-OPENMPI-IB/
export MPI_INCLUDE=$MPI_LOC/include
export MPI_LIB=$MPI_LOC/lib
export MPI_BIN=$MPI_LOC/bin
export NWCHEM_TARGET=LINUX64
export TARGET=LINUX64
export USE_MPI=y

Settings with OPENIB:

export ARMCI_NETWORK=OPENIB
export IB_HOME=/usr
export IB_INCLUDE=/usr/include
export IB_LIB=/usr/lib64
export IBV_FORK_SAFE=1
export NWCHEM_TOP=/lustre/work/apps/nwchem-5.1.1
export NWCHEM_MODULES="all qm geninterface"
#export LIBMPI="-lmpich -libumad -libverbs -lrdmacm -pthread"
export LIBMPI="-lmpi -pthread -libumad -libverbs -lrdmacm -pthread"
export ARMCI_DEFAULT_SHMMAX=256
export BLASLIB=goto2_penrynp-r1.00
export BLASLOC=/lustre/work/apps/goto/
export BLASOPT="-L/lustre/work/apps/goto/ -l$BLASLIB"
export CC=icc
export CFLG="-xP -fPIC"
export CXX=icpc
export F77=ifort
export F90=ifort
export FC=ifort
export FL=ifort
export GOTO_NUM_THREADS=1
export LARGE_FILES=TRUE
export LIB_DEFINES=-DDFLT_TOT_MEM=2147483647
export MA_USE_ARMCI_MEM=1
export MPI_LOC=/lustre/work/apps/IB-ICC-IFORT-OPENMPI
export MPI_INCLUDE=$MPI_LOC/include
export MPI_LIB=$MPI_LOC/lib
export MPI_BIN=$MPI_LOC/bin
export NWCHEM_TARGET=LINUX64
export OMP_NUM_THREADS=1
export TARGET=LINUX64
export USE_MPI=y
Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?
Hi Gus,

This may not help, but it's worth a try. If it's not too much trouble, can you please reconfigure your Open MPI installation with --enable-debug and then rebuild? After that, may we see the stack trace from a core file that is produced after the segmentation fault?

Thanks,
--
Samuel K. Gutierrez
Los Alamos National Laboratory

On May 6, 2010, at 12:01 PM, Gus Correa wrote:
> Hi Eugene
> Thanks for the detailed answer.
Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?
Hi Jeff

Thank you for your testimony.

So now I have two important data points (you and Douglas Guptill) to support the argument here that installing Fedora on machines meant to do scientific and parallel computation is to ask for trouble.

I use CentOS in our cluster, but this is a standalone machine I don't have control of.

Anybody out there using Open MPI + Fedora Core + Nehalem? Happy?

Regards,
Gus Correa
[OMPI users] Can NWChem be run with OpenMPI over an InfiniBand interconnect ... ??
All,

I have built NWChem successfully, and am trying to run it with an Intel-built version of OpenMPI 1.4.1. If I force it to run over the 1 GigE maintenance interconnect it works, but when I try it over the default InfiniBand communications network it fails with:

--
An MPI process has executed an operation involving a call to the "fork()" system call to create a child process. Open MPI is currently operating in a condition that could result in memory corruption or other system errors; your MPI job may hang, crash, or produce silent data corruption. The use of fork() (or system() or other calls that create child processes) is strongly discouraged.

The process that invoked fork was:

Local host: gpute-2 (PID 15996)
MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully and correctly survive a call to fork(), you may disable this warning by setting the mpi_warn_on_fork MCA parameter to 0.
--

This looks to be a known problem. Is there a way I can get around it? I have seen it suggested in some places that I need to use Mellanox's version of MPI, which is not an option, and it surprises me as they are a big OFED contributor.

What are my options ... other than using GigE ... ??

Thanks,

rbw

Richard Walsh
Parallel Applications and Systems Manager
CUNY HPC Center, Staten Island, NY
718-982-3319
612-382-4620

Mighty the Wizard
Who found me at sunrise
Sleeping, and woke me
And learn'd me Magic!

Think green before you print this email.
Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?
Hi Eugene

Thanks for the detailed answer.

1) Now I can see and use the btl_sm_num_fifos component:

I had committed already "btl = ^sm" to the openmpi-mca-params.conf file. This apparently hides the btl_sm_num_fifos from ompi_info.

After I switched to no options in openmpi-mca-params.conf, then ompi_info showed the btl_sm_num_fifos component.

ompi_info --all | grep btl_sm_num_fifos
MCA btl: parameter "btl_sm_num_fifos" (current value: "1", data source: default value)

A side comment: This means that the system administrator can hide some Open MPI options from the users, depending on what he puts in the openmpi-mca-params.conf file, right?

2) However, running with "sm" still breaks, unfortunately:

Boomer! I get the same errors that I reported in my very first email, if I increase the number of processes to 16, to explore the hyperthreading range.

This is using "sm" (i.e. not excluded in the mca config file), and btl_sm_num_fifos (mpiexec command line).

The machine hangs, requires a hard reboot, etc, etc, as reported earlier. See below, please.

So, I guess the conclusion is that I can use sm, but I have to remain within the range of physical cores (8), not oversubscribe, not try to explore the HT range. Should I expect it to work also for np>number of physical cores?

I wonder if this would still work with np<=8, but with heavier code. (I only used hello_c.c so far.) Not sure I'll be able to test this, the user wants to use the machine.

$ mpiexec -mca btl_sm_num_fifos 4 -np 4 a.out
Hello, world, I am 0 of 4
Hello, world, I am 1 of 4
Hello, world, I am 2 of 4
Hello, world, I am 3 of 4

$ mpiexec -mca btl_sm_num_fifos 8 -np 8 a.out
Hello, world, I am 0 of 8
Hello, world, I am 1 of 8
Hello, world, I am 2 of 8
Hello, world, I am 3 of 8
Hello, world, I am 4 of 8
Hello, world, I am 5 of 8
Hello, world, I am 6 of 8
Hello, world, I am 7 of 8

$ mpiexec -mca btl_sm_num_fifos 16 -np 16 a.out
--
mpiexec noticed that process rank 8 with PID 3659 on node spinoza.ldeo.columbia.edu exited on signal 11 (Segmentation fault).
--
$

Message from syslogd@spinoza at May 6 13:38:13 ...
kernel:[ cut here ]

Message from syslogd@spinoza at May 6 13:38:13 ...
kernel:invalid opcode: [#1] SMP

Message from syslogd@spinoza at May 6 13:38:13 ...
kernel:last sysfs file: /sys/devices/system/cpu/cpu15/topology/physical_package_id

Message from syslogd@spinoza at May 6 13:38:13 ...
kernel:Stack:

Message from syslogd@spinoza at May 6 13:38:13 ...
kernel:Call Trace:

Message from syslogd@spinoza at May 6 13:38:13 ...
kernel:Code: 48 89 45 a0 4c 89 ff e8 e0 dd 2b 00 41 8b b6 58 03 00 00 4c 89 e7 ff c6 e8 b5 bc ff ff 41 8b 96 5c 03 00 00 48 98 48 39 d0 73 04 <0f> 0b eb fe 48 29 d0 48 89 45 a8 66 41 ff 07 49 8b 94 24 00 01

Many thanks,
Gus Correa

-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-

Eugene Loh wrote:
> Gus Correa wrote:
>> Hi Eugene
>> Thank you for answering one of my original questions. However, there seems to be a problem with the syntax. Is it really "-mca btl btl_sm_num_fifos=some_number"?
>
> No. Try "--mca btl_sm_num_fifos 4". Or,
>
> % setenv OMPI_MCA_btl_sm_num_fifos 4
> % ompi_info -a | grep btl_sm_num_fifos   # check that things were set correctly
> % mpirun -n 4 a.out
>
>> When I grep any component starting with btl_sm I get nothing:
>> ompi_info --all | grep btl_sm
>> (No output)
>
> I'm no guru, but I think the reason has something to do with dynamically loaded somethings. E.g.,
>
> % /home/eugene/ompi/bin/ompi_info --all | grep btl_sm_num_fifos
> (no output)
> % setenv OPAL_PREFIX /home/eugene/ompi
> % set path = ( $OPAL_PREFIX/bin $path )
> % ompi_info --all | grep btl_sm_num_fifos
> MCA btl: parameter "btl_sm_num_fifos" (current value: "1", data source: default value)
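For reference, the hello_c.c test mentioned throughout this thread is essentially the trivial example shipped in the Open MPI examples/ directory; a minimal equivalent that matches the output shown above would be:

#include <stdio.h>
#include <mpi.h>

/* Minimal MPI hello-world: each rank prints its rank and the world size. */
int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello, world, I am %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}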
Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?
On May 6, 2010, at 1:11 PM, Gus Correa wrote:

> Just for the record, I am using:
> Open MPI 1.4.2 (released 2 days ago), gcc 4.4.3 (g++, gfortran).
> All on Fedora Core 12, kernel 2.6.32.11-99.fc12.x86_64 #1 SMP.

Someone asked earlier in this thread -- I've used RHEL4.4 and RHEL5.4 on my Nehalem EP boxen. I used the default gcc on those RHELs for compiling everything (OMPI + apps). I don't remember what it was on RHEL 4.4, but on RHEL 5.4, it's GCC 4.1.2.

> You and Jeff reported that your Nehalems get along with Open MPI.
> I would guess other people have functional Open MPI + Nehalem systems.
> All I can think of is that some mess with the OS/gcc is causing the trouble here.

I don't have much experience with kernels outside of the RHEL kernels, so I don't know if 2.6.32 is problematic or not. :-(

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?
Hi Douglas

Just for the record, I am using: Open MPI 1.4.2 (released 2 days ago), gcc 4.4.3 (g++, gfortran). All on Fedora Core 12, kernel 2.6.32.11-99.fc12.x86_64 #1 SMP.

The machine is a white box with two-way quad-core Intel Xeon (Nehalem) E5540 @ 2.53GHz, 48GB RAM. Hyperthreading is currently turned on.

But please, don't spend more time on this. You already gave a lot of help. I guess this would be fixed if I could reinstall the OS using a more stable Linux distribution, not Fedora.

You and Jeff reported that your Nehalems get along with Open MPI. I would guess other people have functional Open MPI + Nehalem systems. All I can think of is that some mess with the OS/gcc is causing the trouble here.

(Yes, to avoid trouble I always compile MPI and applications with the same compiler set. And keep a bunch of Open MPI builds to match our needs.)

Cheers,
Gus Correa
Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?
Gus Correa wrote:
> Hi Eugene
>
> Thank you for answering one of my original questions. However, there seems to be a problem with the syntax. Is it really "-mca btl btl_sm_num_fifos=some_number"?

No. Try "--mca btl_sm_num_fifos 4". Or,

% setenv OMPI_MCA_btl_sm_num_fifos 4
% ompi_info -a | grep btl_sm_num_fifos   # check that things were set correctly
% mpirun -n 4 a.out

> When I grep any component starting with btl_sm I get nothing:
>
> ompi_info --all | grep btl_sm
> (No output)

I'm no guru, but I think the reason has something to do with dynamically loaded somethings. E.g.,

% /home/eugene/ompi/bin/ompi_info --all | grep btl_sm_num_fifos
(no output)
% setenv OPAL_PREFIX /home/eugene/ompi
% set path = ( $OPAL_PREFIX/bin $path )
% ompi_info --all | grep btl_sm_num_fifos
MCA btl: parameter "btl_sm_num_fifos" (current value: "1", data source: default value)
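A further option, consistent with the "btl = ^sm" entry discussed earlier in this thread, is to set the parameter persistently in an MCA parameter file, i.e. a line like

btl_sm_num_fifos = 4

in ~/.openmpi/mca-params.conf (the value 4 here is illustrative; as suggested elsewhere in the thread, use the number of processes on the node).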
Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?
Hello Gus:

On Thu, May 06, 2010 at 11:26:57AM -0400, Gus Correa wrote:
> Douglas:
> Would you know which gcc you used to build your Open MPI?
> Or did you use Intel icc instead?

Intel ifort and icc. I build OpenMPI with the same compiler, and same options, that I build my application with.

I have been tempted to try and duplicate your problem. Would that be a helpful experiment? gcc, OpenMPI 1.4.1, IIRC?

Regards,
Douglas.
--
Douglas Guptill voice: 902-461-9749
Research Assistant, LSC 4640 email: douglas.gupt...@dal.ca
Oceanography Department fax: 902-494-3877
Dalhousie University
Halifax, NS, B3H 4J1, Canada
Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?
Hi Eugene

Thank you for answering one of my original questions. However, there seems to be a problem with the syntax. Is it really "-mca btl btl_sm_num_fifos=some_number"? (FYI, I am using Open MPI 1.4.2, a tarball from two days ago.)

When I grep any component starting with btl_sm I get nothing:

ompi_info --all | grep btl_sm
(No output)

When I try to run with it, it fails telling me it cannot find the btl_sm_num_fifos component:

mpiexec -mca btl sm,self -mca btl btl_sm_num_fifos=4 -np 4 ./a.out
--
A requested component was not found, or was unable to be opened. This means that this component is either not installed or is unable to be used on your system (e.g., sometimes this means that shared libraries that the component requires are unable to be found/loaded). Note that Open MPI stopped checking at the first component that it did not find.
Host: spinoza.ldeo.columbia.edu
Framework: btl
Component: btl_sm_num_fifos=4
--

Thank you,
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-

Eugene Loh wrote:
> Ralph Castain wrote:
>> Yo Gus
>> Just saw a ticket go by reminding us about continuing hang problems on shared memory when building with gcc 4.4.x - any chance you are in that category? You might have said something earlier in this thread
> Going back to the original e-mail in this thread:
> Gus Correa wrote:
>> Use -mca btl -mca btl_sm_num_fifos=some_number ? (Which number?)
> Another experiment to try would be to keep sm on, but try changing btl_sm_num_fifos as above. The number to use would be the number of processes on the node. E.g., if all processes are running on the same box, just use the same number as processes in the job. The results might help narrow down the possibilities here.
Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?
Hi Ralph, Douglas

Ralph: Yes, I am on the black list of your ticket (gcc 4.4.3):

gcc --version
gcc (GCC) 4.4.3 20100127 (Red Hat 4.4.3-4)
Copyright (C) 2010 Free Software Foundation, Inc.

Is it possible (and not too time consuming) to install an older gcc on this Fedora 12 box, and compile Open MPI with it? (It may be easier just to install another Linux distribution, I would guess. Fedora was not my choice, it is just a PITA.)

Douglas: Thank you so much for telling your Linux distro, version, etc. Now it is really starting to look like a distro/kernel/gcc issue. I would not use Fedora, but I don't administer the box. Would you know which gcc you used to build your Open MPI? Or did you use Intel icc instead?

Cheers,
Gus
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-

Ralph Castain wrote:
> Yo Gus
> Just saw a ticket go by reminding us about continuing hang problems on shared memory when building with gcc 4.4.x - any chance you are in that category? You might have said something earlier in this thread
>
> On May 5, 2010, at 5:54 PM, Douglas Guptill wrote:
>> On Wed, May 05, 2010 at 06:08:57PM -0400, Gus Correa wrote:
>>> If anybody else has Open MPI working with hyperthreading and "sm" on a Nehalem box, I would appreciate any information about the Linux distro and kernel version being used.
>> Debian 5 (lenny), Core i7 920, Asus P6T MoBo, 12GB RAM, OpenMPI 1.2.8 (with a custom-built MPI_recv.c and MPI_Send.c, which cut down on the cpu load caused by the busy wait polling). We have six (6) of these machines. All configured the same.
>> uname -a yields:
>> Linux screm 2.6.26-2-amd64 #1 SMP Thu Feb 11 00:59:32 UTC 2010 x86_64 GNU/Linux
>> HyperThreading is on.
>> Applications are -np 2 only:
>> mpirun --host localhost,localhost --byslot --mca btl sm,self -np 2 ${BIN}
>> We normally run (up to) 4 of these jobs on each machine.
>> Using Intel 11.0.074 and 11.1.0** compilers; have trouble with the 11.1.0** and "-mcmodel=large -shared-intel" builds. Trouble meaning the numerical results vary strangely. Still working on that problem.
>> Hope that helps,
>> Douglas.
>> P.S. Yes, I know OpenMPI 1.2.8 is old. We have been using it for 2 years with no apparent problems. When I saw comments like "machine hung" for 1.4.1, and "data loss" for 1.3.x, I put aside thoughts of upgrading.
Re: [OMPI users] MPI_Bsend vs. MPI_Ibsend
An MPI send (of any kind) is defined by "local completion semantics". When a send is complete, the send buffer may be reused. The only kind of send that gives any indication whether the receive is posted is the synchronous send. Neither standard send nor buffered send tell the sender if the recv was posted.

The difference between blocking and nonblocking is that a return from a blocking send call indicates the send buffer may be reused. A return from a nonblocking send does not allow the send buffer to be reused (but other things can be done). The send buffer becomes available for reuse after a wait or successful test.

Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363

From: Bill Rankin
To: Open MPI Users
Date: 05/06/2010 10:35 AM
Subject: Re: [OMPI users] MPI_Bsend vs. MPI_Ibsend
Sent by: users-boun...@open-mpi.org

Actually the 'B' in MPI_Bsend() specifies that it is a blocking *buffered* send. So if I remember my standards correctly, this call requires: 1) you will have to explicitly manage the send buffers via MPI_Buffer_[attach|detach](), and 2) the send will block until a corresponding receive is posted.

The MPI_Ibsend() is the immediate version of the above and will return w/o the requirement for the corresponding receive. Since it is a buffered send, the local data copy should be completed before it returns, allowing you to change the contents of the local data buffer. But there is no guarantee that the message has been sent, so you should not reuse the send buffer until after verifying the completion of the send via MPI_Wait() or similar.

In your example, since MPI_Test() won't block, you can have a problem. Use MPI_Wait() instead or change your send buffer to one that is not being used.

-bill

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Jovana Knezevic
Sent: Thursday, May 06, 2010 4:44 AM
To: us...@open-mpi.org
Subject: [OMPI users] MPI_Bsend vs. MPI_Ibsend

Dear all,

Could anyone please clarify for me the difference between MPI_Bsend and MPI_Ibsend? Or, in other words, what exactly is "blocking" in MPI_Bsend, when the data is stored in the buffer and we "return"? :-)

Another, but similar, question: What about the data-buffer - when can it be reused in each of the cases - simple examples: for (i=0; i
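[Editor's note: to make the buffer-reuse rules above concrete, here is a minimal C sketch. It is not from the thread; the buffer sizing and the rank 0 -> rank 1 exchange are illustrative assumptions. MPI_Bsend permits reuse of the application buffer as soon as it returns, while MPI_Ibsend requires a wait or successful test first.]

/* Minimal sketch (not from the thread) contrasting the buffer-reuse
 * rules of MPI_Bsend and MPI_Ibsend.  The buffer sizing and the
 * rank 0 -> rank 1 exchange are illustrative assumptions. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, data = 42;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Buffered sends draw on user-attached buffer space; allow
         * room for two pending messages plus per-message overhead. */
        int bufsize = 2 * (MPI_BSEND_OVERHEAD + (int)sizeof(int));
        void *buf = malloc(bufsize);
        MPI_Buffer_attach(buf, bufsize);

        /* Blocking buffered send: on return, 'data' may be reused
         * immediately -- but nothing says the receive was posted. */
        MPI_Bsend(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        data = 0;   /* safe: local completion already happened */

        /* Nonblocking buffered send: 'data' may NOT be reused until
         * a wait (or a successful test) completes the request. */
        MPI_Request req;
        MPI_Ibsend(&data, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        data = 1;   /* safe only after MPI_Wait/successful MPI_Test */

        MPI_Buffer_detach(&buf, &bufsize);
        free(buf);
    } else if (rank == 1) {
        int recv0, recv1;
        MPI_Recv(&recv0, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(&recv1, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}

Run with at least two ranks, e.g. "mpirun -np 2 ./a.out".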
Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?
Ralph Castain wrote:
> Yo Gus
> Just saw a ticket go by reminding us about continuing hang problems on shared memory when building with gcc 4.4.x - any chance you are in that category? You might have said something earlier in this thread

Going back to the original e-mail in this thread:

Gus Correa wrote:
> Use -mca btl -mca btl_sm_num_fifos=some_number ? (Which number?)

Another experiment to try would be to keep sm on, but try changing btl_sm_num_fifos as above. The number to use would be the number of processes on the node. E.g., if all processes are running on the same box, just use the same number as processes in the job. The results might help narrow down the possibilities here.
Re: [OMPI users] MPI_Bsend vs. MPI_Ibsend
Actually the 'B' in MPI_Bsend() specifies that it is a blocking *buffered* send. So if I remember my standards correctly, this call requires: 1) you will have to explicitly manage the send buffers via MPI_Buffer_[attach|detach](), and 2) the send will block until a corresponding receive is posted.

The MPI_Ibsend() is the immediate version of the above and will return w/o the requirement for the corresponding receive. Since it is a buffered send, the local data copy should be completed before it returns, allowing you to change the contents of the local data buffer. But there is no guarantee that the message has been sent, so you should not reuse the send buffer until after verifying the completion of the send via MPI_Wait() or similar.

In your example, since MPI_Test() won't block, you can have a problem. Use MPI_Wait() instead or change your send buffer to one that is not being used.

-bill

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Jovana Knezevic
Sent: Thursday, May 06, 2010 4:44 AM
To: us...@open-mpi.org
Subject: [OMPI users] MPI_Bsend vs. MPI_Ibsend

Dear all,

Could anyone please clarify for me the difference between MPI_Bsend and MPI_Ibsend? Or, in other words, what exactly is "blocking" in MPI_Bsend, when the data is stored in the buffer and we "return"? :-)

Another, but similar, question: What about the data-buffer - when can it be reused in each of the cases - simple examples: for (i=0; i
Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?
Gus,

I'm not using OpenMPI, however OpenSUSE 11.2 with current updates seems to work fine on Nehalem. I'm curious that you say the Nvidia graphics driver does not install - have you tried running the install script manually, rather than downloading an RPM, etc.? I'm using version 195.36.15 and it seems to work fine.
Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?
Yo Gus

Just saw a ticket go by reminding us about continuing hang problems on shared memory when building with gcc 4.4.x - any chance you are in that category? You might have said something earlier in this thread

On May 5, 2010, at 5:54 PM, Douglas Guptill wrote:
> On Wed, May 05, 2010 at 06:08:57PM -0400, Gus Correa wrote:
>> If anybody else has Open MPI working with hyperthreading and "sm" on a Nehalem box, I would appreciate any information about the Linux distro and kernel version being used.
>
> Debian 5 (lenny), Core i7 920, Asus P6T MoBo, 12GB RAM, OpenMPI 1.2.8 (with a custom-built MPI_recv.c and MPI_Send.c, which cut down on the cpu load caused by the busy wait polling). We have six (6) of these machines. All configured the same.
>
> uname -a yields:
> Linux screm 2.6.26-2-amd64 #1 SMP Thu Feb 11 00:59:32 UTC 2010 x86_64 GNU/Linux
>
> HyperThreading is on.
>
> Applications are -np 2 only:
> mpirun --host localhost,localhost --byslot --mca btl sm,self -np 2 ${BIN}
>
> We normally run (up to) 4 of these jobs on each machine.
>
> Using Intel 11.0.074 and 11.1.0** compilers; have trouble with the 11.1.0** and "-mcmodel=large -shared-intel" builds. Trouble meaning the numerical results vary strangely. Still working on that problem.
>
> Hope that helps,
> Douglas.
>
> P.S. Yes, I know OpenMPI 1.2.8 is old. We have been using it for 2 years with no apparent problems. When I saw comments like "machine hung" for 1.4.1, and "data loss" for 1.3.x, I put aside thoughts of upgrading.
>
> --
> Douglas Guptill voice: 902-461-9749
> Research Assistant, LSC 4640 email: douglas.gupt...@dal.ca
> Oceanography Department fax: 902-494-3877
> Dalhousie University
> Halifax, NS, B3H 4J1, Canada
Re: [OMPI users] Fortran derived types
Assume your data is discontiguous in memory and making it contiguous is not practical (e.g. there is no way to make cells of a row and cells of a column both contiguous). You have 3 options:

1) Use many small/contiguous messages
2) Allocate scratch space and pack/unpack
3) Use a derived datatype

If you decide to use option 2, then the time your program spends in the allocate/pack/send/free and the time it spends in allocate/recv/unpack/free needs to be counted in the cost. Just comparing a contiguous vs discontiguous message time does not help make a good decision.

Whether 2 or 3 is faster depends a lot on how the MPI implementation does its datatype processing. If the MPI implementation can move data directly from discontiguous memory into the send-side adapter and from the receive-side adapter to discontiguous memory, datatypes may be faster and will conserve memory. If the MPI implementation just mallocs a scratch buffer and uses the datatype to guide an internal pack/unpack subroutine, there is a pretty good chance your hand-crafted pack or unpack, along with contiguous messaging, will be more efficient.

I mention option 1 for completeness and because if there were a very good put/get available, it might even be the best choice. It is probably not the best choice in any current MPI, but there may be exceptions.

Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363

From: Terry Frankcombe
To: Open MPI Users
Date: 05/06/2010 12:25 AM
Subject: Re: [OMPI users] Fortran derived types
Sent by: users-boun...@open-mpi.org

Hi Derek

On Wed, 2010-05-05 at 13:05 -0400, Cole, Derek E wrote:
> In general, even in your serial fortran code, you're already taking a performance hit using a derived type.

Do you have any numbers to back that up?

Ciao
Terry
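[Editor's note: as a concrete illustration of options 2 and 3 above, here is a minimal C sketch, not from the thread. The matrix shape, the tags, and the helper name send_column are illustrative assumptions; it sends one column of a row-major matrix first with a derived datatype and then with a hand-packed scratch buffer.]

/* Minimal sketch (not from the thread) of options 2 and 3 for sending
 * one column of a row-major NROWS x NCOLS matrix -- a regularly
 * strided, discontiguous region. */
#include <mpi.h>
#include <stdlib.h>

#define NROWS 4
#define NCOLS 5

void send_column(double a[NROWS][NCOLS], int col, int dest, MPI_Comm comm)
{
    /* Option 3: describe the stride once with a derived datatype;
     * no user-visible copy is made. */
    MPI_Datatype column;
    MPI_Type_vector(NROWS, 1, NCOLS, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);
    MPI_Send(&a[0][col], 1, column, dest, 0, comm);
    MPI_Type_free(&column);

    /* Option 2: hand-packed scratch buffer, sent contiguously.
     * The allocate/pack/free time counts against this option. */
    double *scratch = malloc(NROWS * sizeof(double));
    for (int i = 0; i < NROWS; i++)
        scratch[i] = a[i][col];
    MPI_Send(scratch, NROWS, MPI_DOUBLE, dest, 1, comm);
    free(scratch);
}

The receive side would use the same derived type for option 3, or receive NROWS contiguous doubles and unpack for option 2.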
[OMPI users] opal_mutex_lock(): Resource deadlock avoided
Hi!

We have a code that trips on this fairly often. I've seen cases where it works, but mostly it gets stuck here. The actual mpi call is

call mpi_file_open(...)

I'm currently just wondering if there have been other reports on, or anyone has seen, deadlock in the mpi-io parts of the code, or if this is most likely caused by our setup.

openmpi version is 1.4.2 (fails with 1.3.3 too)
Filesystem used is GPFS
openmpi built with mpi-threads but without progress-threads

--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se
Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
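[Editor's note: for context, MPI_File_open is collective over its communicator, so a lock taken inside the I/O layer, or one rank that never reaches the call, can hang every rank. Below is a minimal C sketch of the call pattern in question; the file path and access flags are illustrative assumptions, and the poster's actual code is Fortran.]

/* Minimal sketch of the collective MPI-IO open the report is about
 * (C bindings; the poster's code is Fortran).  The path and flags
 * are illustrative assumptions. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Init(&argc, &argv);

    /* Collective call: every rank of MPI_COMM_WORLD must reach it.
     * The reported "Resource deadlock avoided" fires inside here. */
    MPI_File_open(MPI_COMM_WORLD, "/gpfs/scratch/testfile",
                  MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}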
Re: [OMPI users] Fortran derived types
Hi,

> In general, even in your serial fortran code, you're already taking a performance hit using a derived type.

That is not generally true. The right statement is: "it depends". Yes, sometimes derived data types and object orientation and so on can lead to some performance hit; but current compilers can usually optimise a lot. E.g. consider http://www.terboven.com/download/OAbstractionsLA.pdf (especially p.19).

So, I would not recommend disturbing the working program in order to bring it back to the good old f77 style. And let us not start a flame war about "assembler is faster but OO is easier"! :-)

Best wishes
Paul

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Prentice Bisbal
Sent: Wednesday, May 05, 2010 11:51 AM
To: Open MPI Users
Subject: Re: [OMPI users] Fortran derived types

Vedran Coralic wrote:
> Hello,
> In my Fortran 90 code I use several custom defined derived types. Amongst them is a vector of arrays, i.e. v(:)%f(:,:,:). I am wondering what the proper way of sending this data structure from one processor to another is. Is the best way to just restructure the data by copying it into a vector and sending that, or is there a simpler way possible by defining an MPI derived type that can handle it? I looked into the latter myself but so far, I have only found the solution for a scalar fortran derived type, and the methodology that was suggested in that case did not seem naturally extensible to the vector case.

It depends on how your data is distributed in memory. If the arrays are evenly distributed, like what would happen in a multidimensional array, the derived datatypes will work fine. If you can't guarantee the spacing between the arrays that make up the vector, then using MPI_Pack/MPI_Unpack (or whatever the Fortran equivalents are) is the best way to go.

I'm not an expert MPI programmer, but I wrote a small program earlier this year that created a dynamically created array of dynamically created arrays. After doing some research into this same problem, it looked like packing/unpacking was the only way to go. Using Pack/Unpack is easy, but there is a performance hit since the data needs to be copied into the packed buffer before sending, and then copied out of the buffer after the receive.

--
Prentice

--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915
[OMPI users] MPI_Bsend vs. MPI_Ibsend
Dear all,

Could anyone please clarify for me the difference between MPI_Bsend and MPI_Ibsend? Or, in other words, what exactly is "blocking" in MPI_Bsend, when the data is stored in the buffer and we "return"? :-)

Another, but similar, question: What about the data-buffer - when can it be reused in each of the cases - simple examples: for (i=0; i
Re: [OMPI users] Fortran derived types
Hi Derek

On Wed, 2010-05-05 at 13:05 -0400, Cole, Derek E wrote:
> In general, even in your serial fortran code, you're already taking a performance hit using a derived type.

Do you have any numbers to back that up?

Ciao
Terry