[OMPI users] openmpi 1.4.1

2010-05-06 Thread David Logan
Ooops, found the problem: I hadn't restarted PBS after changing the node 
lists, and the job had been put onto a node with a faulty Myrinet 
connection on the switch.
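
For anyone who lands on this thread with the same symptom, here is a rough 
sketch of the checks involved, assuming a Torque/PBS setup (the service name, 
node name, and job command line are illustrative, not taken from this site):

    # Restart the PBS server so it re-reads the changed node list
    # (the exact service name varies between installs).
    service pbs_server restart

    # Confirm the node list and node states the scheduler now sees.
    pbsnodes -a

    # Take the node with the faulty Myrinet switch port out of service
    # until it is repaired ("hydraNNN" is a placeholder).
    pbsnodes -o hydraNNN

    # As an isolation test, a job can also be forced off the gm BTL and
    # onto TCP using Open MPI's usual MCA mechanism:
    mpirun --mca btl ^gm -np 16 ./a.out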


Regards

Hi All,

I am receiving an error message:

[grid-admin@ng2 ~]$ cat dml_test.err
[hydra010:22914] [btl_gm_proc.c:191] error in converting global to local id
[hydra002:07435] [btl_gm_proc.c:191] error in converting global to local id
[hydra009:31492] [btl_gm_proc.c:191] error in converting global to local id
[hydra008:29253] [btl_gm_proc.c:191] error in converting global to local id
[hydra007:02552] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra005:27967] [btl_gm_proc.c:191] error in converting global to local id
[hydra006:19420] [btl_gm_proc.c:191] error in converting global to local id
[hydra010:22914] [btl_gm.c:489] send completed with unhandled gm error 18
[hydra010:22914] pml_ob1_sendreq.c:211 FATAL
--
mpirun has exited due to process rank 0 with PID 22914 on
node hydra010 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--
[grid-admin@ng2 ~]$

I've searched and Googled but found nothing that points to where this 
problem may lie. I've looked at the source code and can't see anything 
glaringly obvious, so I am wondering whether this might be a GM issue? 
It does appear to start up OK:


GM: Version 2.1.30_Linux build 2.1.30_Linux 
root@hydra115:/usr/local/src/gm-2.1.30_Linux Tue Apr 27 12:29:17 CST 2010
GM: On i686, kernel version: 2.6.18-92.1.10.el5 #1 SMP Tue Aug 5 
07:41:53 EDT 2008

GM: Highmem memory configuration:
GM: PFN_ZERO=0x0, PFN_MAX=0x7fffc, KERNEL_PFN_MAX=0x38000
GM: Memory available for registration: 259456 pages (1013 MBytes)
GM: MCP for unit 0: L9 4K
GM: LANai rate set to 132 MHz (max = 134 MHz)
GM: Board 0 supports 2815 remote nodes.
GM: Board 0 page hash cache has 16384 bins.
GM: Board 0 has 1 packet interfaces.
GM: NOTICE: 
/usr/local/src/gm-2.1.30_Linux/drivers/linux/kbuild/gm_arch_k.c:4828:():kernel

GM: ServerWorks chipset detected: avoiding PIO read.
GM: Allocated IRQ10
GM: 1 Myrinet board(s) found and initialized

Any ideas as to where to look would be most appreciated.

Thanks

--

David Logan
eResearch SA, ARCS Grid Administrator
Level 1, School of Physics and Chemistry
North Terrace, Adelaide, 5005

(W) 08 8303 7301
(M) 0458 631 117



[OMPI users] openmpi 1.4.1

2010-05-06 Thread David Logan

Hi All,

I am receiving an error message:

[grid-admin@ng2 ~]$ cat dml_test.err
[hydra010:22914] [btl_gm_proc.c:191] error in converting global to local id
[hydra002:07435] [btl_gm_proc.c:191] error in converting global to local id
[hydra009:31492] [btl_gm_proc.c:191] error in converting global to local id
[hydra008:29253] [btl_gm_proc.c:191] error in converting global to local id
[hydra007:02552] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra005:27967] [btl_gm_proc.c:191] error in converting global to local id
[hydra006:19420] [btl_gm_proc.c:191] error in converting global to local id
[hydra010:22914] [btl_gm.c:489] send completed with unhandled gm error 18
[hydra010:22914] pml_ob1_sendreq.c:211 FATAL
--
mpirun has exited due to process rank 0 with PID 22914 on
node hydra010 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--
[grid-admin@ng2 ~]$

I've searched and Googled but found nothing that points to where this 
problem may lie. I've looked at the source code and can't see anything 
glaringly obvious, so I am wondering whether this might be a GM issue? 
It does appear to start up OK:


GM: Version 2.1.30_Linux build 2.1.30_Linux 
root@hydra115:/usr/local/src/gm-2.1.30_Linux Tue Apr 27 12:29:17 CST 2010
GM: On i686, kernel version: 2.6.18-92.1.10.el5 #1 SMP Tue Aug 5 
07:41:53 EDT 2008

GM: Highmem memory configuration:
GM: PFN_ZERO=0x0, PFN_MAX=0x7fffc, KERNEL_PFN_MAX=0x38000
GM: Memory available for registration: 259456 pages (1013 MBytes)
GM: MCP for unit 0: L9 4K
GM: LANai rate set to 132 MHz (max = 134 MHz)
GM: Board 0 supports 2815 remote nodes.
GM: Board 0 page hash cache has 16384 bins.
GM: Board 0 has 1 packet interfaces.
GM: NOTICE: 
/usr/local/src/gm-2.1.30_Linux/drivers/linux/kbuild/gm_arch_k.c:4828:():kernel

GM: ServerWorks chipset detected: avoiding PIO read.
GM: Allocated IRQ10
GM: 1 Myrinet board(s) found and initialized

Any ideas as to where to look would be most appreciated.

Thanks

--

David Logan
eResearch SA, ARCS Grid Administrator
Level 1, School of Physics and Chemistry
North Terrace, Adelaide, 5005

(W) 08 8303 7301
(M) 0458 631 117



Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-06 Thread Gus Correa

Hi Jeff

Answers inline.

Jeff Squyres wrote:

On May 6, 2010, at 2:01 PM, Gus Correa wrote:


1) Now I can see and use the btl_sm_num_fifos component:

I had committed already "btl = ^sm" to the openmpi-mca-params.conf
file.  This apparently hides the btl_sm_num_fifos from ompi_info.

After I switched to no options in openmpi-mca-params.conf,
then ompi_info showed the btl_sm_num_fifos component.

ompi_info --all | grep btl_sm_num_fifos
MCA btl: parameter "btl_sm_num_fifos" (current value: "1", data 
source: default value)

A side comment:
This means that the system administrator can
hide some Open MPI options from the users, depending on what
he puts in the openmpi-mca-params.conf file, right?


Correct.

BUT: a user can always override the "btl" MCA param and see them again.  For 
example, you could also have done this:

   echo "btl =" > ~/.openmpi/mca-params.conf
   ompi_info --all | grep btl_sm_num_fifos
   # ...will show the sm params...



Aha!
Can they override my settings?!
Can't anymore.
I'm gonna write a BOFH cron script to run every 10 minutes,
check for and delete any ~/.openmpi directory,
shutdown the recalcitrant account, make a tarball of its ~ ,
and send it to the mass store.  Quarantined. :)
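
Joking aside, for reference, a sketch of the two files involved and their
precedence (the system-wide path depends on the install prefix, which is
site-specific):

    # System-wide file, set by the admin; hides the sm BTL for everyone:
    #   $prefix/etc/openmpi-mca-params.conf
    btl = ^sm

    # Per-user file, which takes precedence over the system-wide one:
    #   ~/.openmpi/mca-params.conf
    btl =

    # With the per-user override in place, the sm parameters reappear:
    ompi_info --all | grep btl_sm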



2) However, running with "sm" still breaks, unfortunately:

Boomer!


Doh!


I get the same errors that I reported in my very
first email, if I increase the number of processes to 16,
to explore the hyperthreading range.

This is using "sm" (i.e. not excluded in the mca config file),
and btl_sm_num_fifos (mpiexec command line)

The machine hangs, requires a hard reboot, etc, etc,
as reported earlier.  See the below, please.


I saw that only some probably-unrelated dmesg messages were emitted.  Was there 
anything else revealing on the console and/or /var/log/* files?  Hard reboots 
absolutely should not be caused by Open MPI.



I don't think the problem is with Open MPI.
So, it may not be easy to find a logical link between the kernel
messages and the MPI hello_c that was running.


So, I guess the conclusion is that I can use sm,
but I have to remain within the range of physical cores (8),
not oversubscribe, not try to explore the HT range.
Should I expect it to work also for np>number of physical cores?


Your prior explanations of when HT is useful 
seemed pretty reasonable to me.  
Meaning: Nehalem HT will help only in some kinds of codes.  
Dense computation codes with few conditional branches may 
not benefit much from HT.




When there aren't frequent requests to change the code,
to include new features, one can think about optimizing for
dense computation, avoid inner loop branches, etc.
That is the situation reported by Doug Reeder on this thread,
where his optimized finite element code shows a 2/3 degraded
speed when HT is used.

However, most of the codes we run here seem to have been optimized at
some point of their early life, but then aggregated so many new
features that the if/elseif/elseif... branches are abundant.
The logic can get so complicated to de-tangle and streamline that
nobody dares to rewrite the code, afraid to produce wrong results,
or to have to face a long code re-development cycle (without support).
It is like fixing the plumbing or wiring of an old house.
OO that goes OOverboard also plays a role, often misses the
point, and can add more overhead.
I would guess that this is not a specific situation of
Earth Science applications (which tend to be big and complex).

So, chances are that hyperthreading may give us a little edge,
harnessing the code imperfections.
Not a big one, maybe 10-20%, I would guess.
I experienced that type of speedup with SMT/HT on an IBM machine
with one of these big codes.

But OMPI applications should always run *correctly*, 
regardless of HT or not-HT -- even if you're oversubscribing. 
The performance may suffer (sometimes dramatically) 
if you oversubscribe physical cores with dense computational code, 
but it should always run *correctly*.




That is what I was seeking in the first place.
Not performance with HT, but correctness with HT.

Whether we would use HT or not was to be decided later,
after testing how the atmospheric model would perform
with and without HT.



I wonder if this would still work with np<=8, but with heavier code.
(I only used hello_c.c so far.)


If hello_c is crashing your computer - 
even if you're running np>8 or np>16 -- 
something is wrong outside of Open MPI.  
I routinely run np=100 hello_c on machines.
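
For reference, the test being discussed is essentially the hello-world
example shipped with Open MPI; a minimal equivalent (a sketch, not the
verbatim examples/hello_c.c) looks like this:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("Hello, world, I am %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }

Built with mpicc and run with the mpiexec commands shown later in this
message.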




I've got hello_c to run correctly with heavy oversubscription on
our cluster nodes (up to 1024 processes on an 8-core node, IIRC).
Heavier programs don't go this far, but still run with light
oversubscription.

But on that Nehalem + Fedora 12 machine it doesn't work.
So, the evidence is clear.
The problem is not with Open MPI.


$ mpiexec -mca btl_sm_num_fifos 16 -np 16 a.out
--
mpiexec noticed that process rank 8 with PID 3659 on 

Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-06 Thread Samuel K. Gutierrez

Hi Gus,

Doh!  I didn't see the kernel-related messages after the segfault  
message.  Definitely some weirdness here that is beyond your  
control... Sorry about that.


--
Samuel K. Gutierrez
Los Alamos National Laboratory

On May 6, 2010, at 3:28 PM, Gus Correa wrote:


Hi Samuel

Samuel K. Gutierrez wrote:

Hi Gus,
This may not help, but it's worth a try.  If it's not too much  
trouble, can you please reconfigure your Open MPI installation with  
--enable-debug and then rebuild?  After that, may we see the stack  
trace from a core file that is produced after the segmentation fault?

Thanks,
--
Samuel K. Gutierrez
Los Alamos National Laboratory


Thank you for the suggestion.

I am a bit reluctant to try this because when it fails,
it *really* fails.
Most of the time the machine doesn't even return the prompt,
and in all cases it freezes and requires a hard reboot.
It is not a segfault that the OS can catch, I guess.
I wonder whether enabling debug mode would do much for us,
and whether it would get to the point of dumping a core, or just die before that.

Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-


On May 6, 2010, at 12:01 PM, Gus Correa wrote:

Hi Eugene

Thanks for the detailed answer.

*

1) Now I can see and use the btl_sm_num_fifos component:

I had committed already "btl = ^sm" to the openmpi-mca-params.conf
file.  This apparently hides the btl_sm_num_fifos from ompi_info.

After I switched to no options in openmpi-mca-params.conf,
then ompi_info showed the btl_sm_num_fifos component.

ompi_info --all | grep btl_sm_num_fifos
   MCA btl: parameter "btl_sm_num_fifos" (current  
value: "1", data source: default value)


A side comment:
This means that the system administrator can
hide some Open MPI options from the users, depending on what
he puts in the openmpi-mca-params.conf file, right?

*

2) However, running with "sm" still breaks, unfortunately:

Boomer!
I get the same errors that I reported in my very
first email, if I increase the number of processes to 16,
to explore the hyperthreading range.

This is using "sm" (i.e. not excluded in the mca config file),
and btl_sm_num_fifos (mpiexec command line)

The machine hangs, requires a hard reboot, etc, etc,
as reported earlier.  See the below, please.

So, I guess the conclusion is that I can use sm,
but I have to remain within the range of physical cores (8),
not oversubscribe, not try to explore the HT range.
Should I expect it to work also for np>number of physical cores?

I wonder if this would still work with np<=8, but with heavier code.
(I only used hello_c.c so far.)
Not sure I'll be able to test this, the user wants to use the  
machine.



$mpiexec -mca btl_sm_num_fifos 4 -np 4 a.out
Hello, world, I am 0 of 4
Hello, world, I am 1 of 4
Hello, world, I am 2 of 4
Hello, world, I am 3 of 4

$ mpiexec -mca btl_sm_num_fifos 8 -np 8 a.out
Hello, world, I am 0 of 8
Hello, world, I am 1 of 8
Hello, world, I am 2 of 8
Hello, world, I am 3 of 8
Hello, world, I am 4 of 8
Hello, world, I am 5 of 8
Hello, world, I am 6 of 8
Hello, world, I am 7 of 8

$ mpiexec -mca btl_sm_num_fifos 16 -np 16 a.out
--
mpiexec noticed that process rank 8 with PID 3659 on node  
spinoza.ldeo.columbia.edu exited on signal 11 (Segmentation fault).

--
$

Message from syslogd@spinoza at May  6 13:38:13 ...
kernel:[ cut here ]

Message from syslogd@spinoza at May  6 13:38:13 ...
kernel:invalid opcode:  [#1] SMP

Message from syslogd@spinoza at May  6 13:38:13 ...
kernel:last sysfs file: /sys/devices/system/cpu/cpu15/topology/ 
physical_package_id


Message from syslogd@spinoza at May  6 13:38:13 ...
kernel:Stack:

Message from syslogd@spinoza at May  6 13:38:13 ...
kernel:Call Trace:

Message from syslogd@spinoza at May  6 13:38:13 ...
kernel:Code: 48 89 45 a0 4c 89 ff e8 e0 dd 2b 00 41 8b b6 58 03 00  
00 4c 89 e7 ff c6 e8 b5 bc ff ff 41 8b 96 5c 03 00 00 48 98 48 39  
d0 73 04 <0f> 0b eb fe 48 29 d0 48 89 45 a8 66 41 ff 07 49 8b 94  
24 00 01


*

Many thanks,
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-


Eugene Loh wrote:

Gus Correa wrote:

Hi Eugene

Thank you for answering one of my original questions.

However, there seems to be a problem with the syntax.
Is it really "-mca btl btl_sm_num_fifos=some_number"?

No.  Try "--mca btl_sm_num_fifos 4".  Or,
% setenv OMPI_MCA_btl_sm_num_fifos 4
% ompi_info -a | grep btl_sm_num_fifos # check that things were set correctly

Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-06 Thread Jeff Squyres
On May 6, 2010, at 2:01 PM, Gus Correa wrote:

> 1) Now I can see and use the btl_sm_num_fifos component:
> 
> I had committed already "btl = ^sm" to the openmpi-mca-params.conf
> file.  This apparently hides the btl_sm_num_fifos from ompi_info.
> 
> After I switched to no options in openmpi-mca-params.conf,
> then ompi_info showed the btl_sm_num_fifos component.
> 
> ompi_info --all | grep btl_sm_num_fifos
> MCA btl: parameter "btl_sm_num_fifos" (current value: "1", 
> data source: default value)
> 
> A side comment:
> This means that the system administrator can
> hide some Open MPI options from the users, depending on what
> he puts in the openmpi-mca-params.conf file, right?

Correct.

BUT: a user can always override the "btl" MCA param and see them again.  For 
example, you could also have done this:

   echo "btl =" > ~/.openmpi/mca-params.conf
   ompi_info --all | grep btl_sm_num_fifos
   # ...will show the sm params...

> 2) However, running with "sm" still breaks, unfortunately:
> 
> Boomer!

Doh!

> I get the same errors that I reported in my very
> first email, if I increase the number of processes to 16,
> to explore the hyperthreading range.
> 
> This is using "sm" (i.e. not excluded in the mca config file),
> and btl_sm_num_fifos (mpiexec command line)
> 
> The machine hangs, requires a hard reboot, etc, etc,
> as reported earlier.  See the below, please.

I saw that only some probably-unrelated dmesg messages were emitted.  Was there 
anything else revealing on the console and/or /var/log/* files?  Hard reboots 
absolutely should not be caused by Open MPI.

> So, I guess the conclusion is that I can use sm,
> but I have to remain within the range of physical cores (8),
> not oversubscribe, not try to explore the HT range.
> Should I expect it to work also for np>number of physical cores?

Your prior explanations of when HT is useful seemed pretty reasonable to me.  
Meaning: Nehalem HT will help only in some kinds of codes.  Dense computation 
codes with few conditional branches may not benefit much from HT.

But OMPI applications should always run *correctly*, regardless of HT or not-HT 
-- even if you're oversubscribing.  The performance may suffer (sometimes 
dramatically) if you oversubscribe physical cores with dense computational 
code, but it should always run *correctly*.

> I wonder if this would still work with np<=8, but with heavier code.
> (I only used hello_c.c so far.)

If hello_c is crashing your computer - even if you're running np>8 or np>16 -- 
something is wrong outside of Open MPI.  I routinely run np=100 hello_c on 
machines.

> $ mpiexec -mca btl_sm_num_fifos 16 -np 16 a.out
> --
> mpiexec noticed that process rank 8 with PID 3659 on node 
> spinoza.ldeo.columbia.edu exited on signal 11 (Segmentation fault).
> --
> $
> 
> Message from syslogd@spinoza at May  6 13:38:13 ...
> kernel:[ cut here ]
> 
> Message from syslogd@spinoza at May  6 13:38:13 ...
> kernel:invalid opcode:  [#1] SMP
> 
> Message from syslogd@spinoza at May  6 13:38:13 ...
> kernel:last sysfs file: 
> /sys/devices/system/cpu/cpu15/topology/physical_package_id
> 
> Message from syslogd@spinoza at May  6 13:38:13 ...
> kernel:Stack:
> 
> Message from syslogd@spinoza at May  6 13:38:13 ...
> kernel:Call Trace:
> 
> Message from syslogd@spinoza at May  6 13:38:13 ...
> kernel:Code: 48 89 45 a0 4c 89 ff e8 e0 dd 2b 00 41 8b b6 58 03 00 00 4c 89 
> e7 ff c6 e8 b5 bc ff ff 41 8b 96 5c 03 00 00 48 98 48 39 d0 73 04 <0f> 0b eb 
> fe 48 29 d0 48 89 45 a8 66 41 ff 07 49 8b 94 24 00 01

I unfortunately don't know what these messages mean...

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-06 Thread Gus Correa

Hi Samuel

Samuel K. Gutierrez wrote:

Hi Gus,

This may not help, but it's worth a try.  If it's not too much trouble, 
can you please reconfigure your Open MPI installation with 
--enable-debug and then rebuild?  After that, may we see the stack trace 
from a core file that is produced after the segmentation fault?


Thanks,

--
Samuel K. Gutierrez
Los Alamos National Laboratory



Thank you for the suggestion.

I am a bit reluctant to try this because when it fails,
it *really* fails.
Most of the time the machine doesn't even return the prompt,
and in all cases it freezes and requires a hard reboot.
It is not a segfault that the OS can catch, I guess.
I wonder whether enabling debug mode would do much for us,
and whether it would get to the point of dumping a core, or just die before that.

Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-


On May 6, 2010, at 12:01 PM, Gus Correa wrote:


Hi Eugene

Thanks for the detailed answer.

*

1) Now I can see and use the btl_sm_num_fifos component:

I had committed already "btl = ^sm" to the openmpi-mca-params.conf
file.  This apparently hides the btl_sm_num_fifos from ompi_info.

After I switched to no options in openmpi-mca-params.conf,
then ompi_info showed the btl_sm_num_fifos component.

ompi_info --all | grep btl_sm_num_fifos
MCA btl: parameter "btl_sm_num_fifos" (current value: 
"1", data source: default value)


A side comment:
This means that the system administrator can
hide some Open MPI options from the users, depending on what
he puts in the openmpi-mca-params.conf file, right?

*

2) However, running with "sm" still breaks, unfortunately:

Boomer!
I get the same errors that I reported in my very
first email, if I increase the number of processes to 16,
to explore the hyperthreading range.

This is using "sm" (i.e. not excluded in the mca config file),
and btl_sm_num_fifos (mpiexec command line)

The machine hangs, requires a hard reboot, etc, etc,
as reported earlier.  See the below, please.

So, I guess the conclusion is that I can use sm,
but I have to remain within the range of physical cores (8),
not oversubscribe, not try to explore the HT range.
Should I expect it to work also for np>number of physical cores?

I wonder if this would still work with np<=8, but with heavier code.
(I only used hello_c.c so far.)
Not sure I'll be able to test this, the user wants to use the machine.


$mpiexec -mca btl_sm_num_fifos 4 -np 4 a.out
Hello, world, I am 0 of 4
Hello, world, I am 1 of 4
Hello, world, I am 2 of 4
Hello, world, I am 3 of 4

$ mpiexec -mca btl_sm_num_fifos 8 -np 8 a.out
Hello, world, I am 0 of 8
Hello, world, I am 1 of 8
Hello, world, I am 2 of 8
Hello, world, I am 3 of 8
Hello, world, I am 4 of 8
Hello, world, I am 5 of 8
Hello, world, I am 6 of 8
Hello, world, I am 7 of 8

$ mpiexec -mca btl_sm_num_fifos 16 -np 16 a.out
-- 

mpiexec noticed that process rank 8 with PID 3659 on node 
spinoza.ldeo.columbia.edu exited on signal 11 (Segmentation fault).
-- 


$

Message from syslogd@spinoza at May  6 13:38:13 ...
kernel:[ cut here ]

Message from syslogd@spinoza at May  6 13:38:13 ...
kernel:invalid opcode:  [#1] SMP

Message from syslogd@spinoza at May  6 13:38:13 ...
kernel:last sysfs file: 
/sys/devices/system/cpu/cpu15/topology/physical_package_id


Message from syslogd@spinoza at May  6 13:38:13 ...
kernel:Stack:

Message from syslogd@spinoza at May  6 13:38:13 ...
kernel:Call Trace:

Message from syslogd@spinoza at May  6 13:38:13 ...
kernel:Code: 48 89 45 a0 4c 89 ff e8 e0 dd 2b 00 41 8b b6 58 03 00 00 
4c 89 e7 ff c6 e8 b5 bc ff ff 41 8b 96 5c 03 00 00 48 98 48 39 d0 73 
04 <0f> 0b eb fe 48 29 d0 48 89 45 a8 66 41 ff 07 49 8b 94 24 00 01


*

Many thanks,
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-


Eugene Loh wrote:

Gus Correa wrote:

Hi Eugene

Thank you for answering one of my original questions.

However, there seems to be a problem with the syntax.
Is it really "-mca btl btl_sm_num_fifos=some_number"?

No.  Try "--mca btl_sm_num_fifos 4".  Or,
% setenv OMPI_MCA_btl_sm_num_fifos 4
% ompi_info -a | grep btl_sm_num_fifos # check that things were 
set correctly

% mpirun -n 4 a.out

When I grep any component starting with btl_sm I get nothing:

ompi_info --all | grep btl_sm
(No output)
I'm no guru, but I think the reason has something to do with 
dynamically loaded somethings.  E.g.,

% 

Re: [OMPI users] MPI_Bsend vs. MPI_Ibsend (2)

2010-05-06 Thread Eugene Loh
First, to minimize ambiguity, it may make sense to distinguish 
explicitly between two buffers:  the send buffer (specified in the 
MPI_Send or MPI_Bsend call) and the attached buffer (specified in some 
MPI_Buffer_attach call).


Jovana Knezevic wrote:


On the other hand,  a slight confusion when Buffered send is concerned remains:
In my understanding, MPI_SEND (standard, blocking) does not return
until the send operation it invoked has completed. Completion can mean
the message was copied into an MPI internal buffer, or it can mean the
sending and receiving processes synchronized on the message.

MPI_Send will return when it is safe to reuse the send buffer.  No 
guarantees about anything having to do with the receiver.



So, if we
decide to use buffered send (Bsend, so blocking), and we say "I want
to allocate a large enough buffer, I want my data to be copied into
the buffer then, because I do not want anyone else to decide if I am
going to synchronize completely my sends and receives on the message -
I know what I'm doing :-)!" then as soon as the data is copied to the
buffer, the call returns and the buffer can be reused.
 

MPI_Bsend will return when it is safe to reuse the send buffer.  The 
message data might simply have been copied to the local attached buffer.



Is the difference in comparison to Ibsend that with Ibsend the data
doesn't even have to be copied to the buffer when the call returns,


right.


or
something like that? Because otherwise, I still do not see the
difference: data copied into buffer-> call returns! Why wouldn't I
reuse my message-buffer then?!
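
To make the distinction between the two buffers concrete, here is a minimal
C sketch (hedged: ranks, tag, and message size are arbitrary; run with at
least two processes):

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, i, attach_size;
        double msg[1000];          /* the send buffer passed to MPI_Bsend     */
        void *attach_buf;          /* the attached buffer, a separate thing   */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* The attached buffer must hold the message payload plus
         * MPI_BSEND_OVERHEAD for each Bsend that may be pending. */
        attach_size = sizeof(msg) + MPI_BSEND_OVERHEAD;
        attach_buf = malloc(attach_size);
        MPI_Buffer_attach(attach_buf, attach_size);

        if (rank == 0) {
            for (i = 0; i < 1000; i++) msg[i] = (double) i;
            /* When MPI_Bsend returns, the data has (typically) been copied
             * into the attached buffer, so msg[] may be reused at once. */
            MPI_Bsend(msg, 1000, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            msg[0] = -1.0;         /* safe: the send buffer is ours again */
        } else if (rank == 1) {
            MPI_Recv(msg, 1000, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        /* Detach blocks until any buffered messages have been delivered. */
        MPI_Buffer_detach(&attach_buf, &attach_size);
        free(attach_buf);
        MPI_Finalize();
        return 0;
    }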



Re: [OMPI users] MPI_Bsend vs. MPI_Ibsend (2)

2010-05-06 Thread Richard Treumann

Bsend does not guarantee that the attached buffer will be used, and return
from MPI_Ibsend does not guarantee you can modify the application send buffer.

Maybe the implementation would try to optimize by scheduling a nonblocking
send from the application buffer that bypasses the copy to the attach
buffer. When you call WAIT, if the message had already left the application
send buffer in the interim, the copy cost is saved.  If it has not, the
WAIT could copy into the attach buffer and let the send go from there once
the recv is posted.

I am not aware of an MPI that does this, but it would be a reasonable
optimization.
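
The user-visible rule this describes, in a short sketch (a fragment only;
msg and the attached buffer are as in the Bsend sketch earlier in the thread,
and whether the copy happens at Ibsend time or at Wait time is up to the
implementation):

    /* Even though this is a *buffered* send, returning from MPI_Ibsend does
     * not make the send buffer reusable; completion of the request does.  */
    MPI_Request req;
    MPI_Ibsend(msg, 1000, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
    /* msg[] must NOT be modified here ...                                  */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    /* ... now it may: by the time MPI_Wait returns, the data has either
     * left via the network or been copied into the attached buffer.       */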

Dick Treumann  -  MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363



From:    Jovana Knezevic
To:      us...@open-mpi.org
Date:    05/06/2010 03:36 PM
Subject: [OMPI users] MPI_Bsend vs. MPI_Ibsend (2)
Sent by: users-boun...@open-mpi.org





Thank you all!

Regarding the posted Recv, I am aware that neither send nor buffered
send tell the sender if it is posted.
Regarding the distinction between blocking and non-blocking calls in
general, everything is clear as well.

On the other hand,  a slight confusion when Buffered send is concerned
remains:
In my understanding, MPI_SEND (standard, blocking) does not return
until the send operation it invoked has completed. Completion can mean
the message was copied into an MPI internal buffer, or it can mean the
sending and receiving processes synchronized on the message. So, if we
decide to use buffered send (Bsend, so blocking), and we say "I want
to allocate a large enough buffer, I want my data to be copied into
the buffer then, because I do not want anyone else to decide if I am
going to synchronize completely my sends and receives on the message -
I know what I'm doing :-)!" then as soon as the data is copied to the
buffer, the call returns and the buffer can be reused.
Is the difference in comparison to Ibsend that with Ibsend the data
doesn't even have to be copied to the buffer when the call returns, or
something like that? Because otherwise, I still do not see the
difference: data copied into buffer-> call returns! Why wouldn't I
reuse my message-buffer then?!

Sorry for bothering you so much, but for the type of applications I am
involved in this is a very important issue; thus, it is crucial that
this becomes completely clear to me. Thank you again!

Cheers,
Jovana


> An MPI send (of any kind), is defined by "local completion semantics".
> When a send is complete, the send buffer may be reused. The only kind of
> send that gives any indication whether the receive is posted is the
> synchronous send. Neither standard send nor buffered send tell the sender
> if the recv was posted.
>
> The difference between 

[OMPI users] MPI_Bsend vs. MPI_Ibsend (2)

2010-05-06 Thread Jovana Knezevic
Thank you all!

Regarding the posted Recv, I am aware that neither send nor buffered
send tell the sender if it is posted.
Regarding the distinction between blocking and non-blocking calls in
general, everything is clear as well.

On the other hand,  a slight confusion when Buffered send is concerned remains:
In my understanding, MPI_SEND (standard, blocking) does not return
until the send operation it invoked has completed. Completion can mean
the message was copied into an MPI internal buffer, or it can mean the
sending and receiving processes synchronized on the message. So, if we
decide to use buffered send (Bsend, so blocking), and we say "I want
to allocate a large enough buffer, I want my data to be copied into
the buffer then, because I do not want anyone else to decide if I am
going to synchronize completely my sends and receives on the message -
I know what I'm doing :-)!" then as soon as the data is copied to the
buffer, the call returns and the buffer can be reused.
Is the difference in comparison to Ibsend that with Ibsend the data
doesn't even have to be copied to the buffer when the call returns, or
something like that? Because otherwise, I still do not see the
difference: data copied into buffer-> call returns! Why wouldn't I
reuse my message-buffer then?!

Sorry for bothering you so much, but for the type of applications I am
involved in this is a very important issue; thus, it is crucial that
this becomes completely clear to me. Thank you again!

Cheers,
Jovana


> An MPI send (of any kind), is defined by "local completion semantics".
> When a send is complete, the send buffer may be reused. The only kind of
> send that gives any indication whether the receive is posted is the
> synchronous send. Neither standard send nor buffered send tell the sender
> if the recv was posted.
>
> The difference between blocking and nonblocking is that a return from a
> blocking send call indicates the send buffer may be reused. A return from a
> nonblocking send does not allow the send buffer to be reused (but other
> things can be done).  The send buffer becomes available to reuse after a
> wait or successful test.
>
> Dick Treumann  -  MPI Team
> IBM Systems & Technology Group
> Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
> Tele (845) 433-7846         Fax (845) 433-8363
>
>
>
> From:    Bill Rankin
> To:      Open MPI Users
> Date:    05/06/2010 10:35 AM
> Subject: Re: [OMPI users] MPI_Bsend vs. MPI_Ibsend
> Sent by: users-boun...@open-mpi.org
>
>
>
>
>
> Actually the 'B' in MPI_Bsend() specifies that it is a blocking *buffered*
> send.  So if I remember my standards correctly, this call requires:
>
> 1) you will have to explicitly manage the send buffers via
> 

Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-06 Thread Eugene Loh

Gus Correa wrote:


2) However, running with "sm" still breaks, unfortunately:

I get the same errors that I reported in my very first email, if I 
increase the number of processes to 16, to explore the hyperthreading 
range.


This is using "sm" (i.e. not excluded in the mca config file), and 
btl_sm_num_fifos (mpiexec command line)


The machine hangs, requires a hard reboot, etc, etc, as reported earlier.


Okay.  I think this is different from trac 2043, then, since that 
involved a race condition that can be worked around by giving each 
sender its own FIFO.


So, I guess the conclusion is that I can use sm, but I have to remain 
within the range of physical cores (8), not oversubscribe, not try to 
explore the HT range.  Should I expect it to work also for np>number 
of physical cores?


Yes, I believe that would be a reasonable expectation (under 
circumstances other than the ones you're facing, in any case).  I just 
ran the examples/connectivity_c.c test with GCC on an 8-core Nehalem 
system with HT turned on and tested up to np=64.
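
For anyone wanting to repeat that check, a sketch of the usual way to build
and run the shipped example (paths are relative to the Open MPI source tree
and purely illustrative):

    mpicc examples/connectivity_c.c -o connectivity_c
    mpirun -np 64 ./connectivity_c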


Re: [OMPI users] Can NWChem be run with OpenMPI over an InfiniBand interconnect ... ??

2010-05-06 Thread Ralph Castain
Yeah, you just need to set the param specified in the warning message. We 
inserted that to ensure that people understand that IB doesn't play well with 
fork'd processes, so you need to be careful when doing so.
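
A short sketch of the two usual ways to set it (the parameter name comes
straight from the warning text; the NWChem invocation is only illustrative):

    # Per job, on the command line:
    mpirun --mca mpi_warn_on_fork 0 -np 16 nwchem input.nw

    # Or once, in the per-user MCA parameter file ~/.openmpi/mca-params.conf:
    mpi_warn_on_fork = 0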


On May 6, 2010, at 12:27 PM, Addepalli, Srirangam V wrote:

> Hello Richard,
> Yes, NWCHEM can be run on IB using 1.4.1, provided you have built openmpi with IB 
> support. 
> Note: If your IB cards are qlogic you need to compile NWCHEM with MPI-SPAWN.
> Rangam
> 
> Settings for my Build with MPI-SPAWN:
> export ARMCI_NETWORK=MPI-SPAWN
> export IB_HOME=/usr
> export IB_INCLUDE=/usr/include
> export IB_LIB=/usr/lib64
> export IB_LIB_NAME="-libverbs -libumad -lpthread "
> export NWCHEM_TOP=/lustre/work/apps/nwchem-5.1.1
> export NWCHEM_MODULES="venus geninterface all"
> export LIBMPI="-lmpi"
> export ARMCI_DEFAULT_SHMMAX=256
> export BLASLIB=goto2_penrynp-r1.00
> export BLASLOC=/lustre/work/apps/goto/
> export BLASOPT="-L/lustre/work/apps/goto/ -l$BLASLIB"
> export CC=icc
> export CFLG="-xP -fPIC"
> export CXX=icpc
> export F77=ifort
> export F90=ifort
> export FC=ifort
> export FL=ifort
> export LARGE_FILES=TRUE
> export LIB_DEFINES=-DDFLT_TOT_MEM=2147483647
> export MPI_LOC=/lustre/work/apps/IB-ICC-IFORT-OPENMPI-IB/
> export MPI_INCLUDE=$MPI_LOC/include
> export MPI_LIB=$MPI_LOC/lib
> export MPI_BIN=$MPI_LOC/bin
> export NWCHEM_TARGET=LINUX64
> export TARGET=LINUX64
> export USE_MPI=y
> 
> Setting with OPENIB
> 
> export ARMCI_NETWORK=OPENIB
> export IB_HOME=/usr
> export IB_INCLUDE=/usr/include
> export IB_LIB=/usr/lib64
> export IBV_FORK_SAFE=1
> export NWCHEM_TOP=/lustre/work/apps/nwchem-5.1.1
> export NWCHEM_MODULES="all qm geninterface"
> #export LIBMPI="-lmpich -libumad -libverbs -lrdmacm -pthread"
> export LIBMPI="-lmpi -pthread -libumad -libverbs -lrdmacm -pthread"
> export ARMCI_DEFAULT_SHMMAX=256
> export BLASLIB=goto2_penrynp-r1.00
> export BLASLOC=/lustre/work/apps/goto/
> export BLASOPT="-L/lustre/work/apps/goto/ -l$BLASLIB"
> export CC=icc
> export CFLG="-xP -fPIC"
> export CXX=icpc
> export F77=ifort
> export F90=ifort
> export FC=ifort
> export FL=ifort
> export GOTO_NUM_THREADS=1
> export LARGE_FILES=TRUE
> export LIB_DEFINES=-DDFLT_TOT_MEM=2147483647
> export MA_USE_ARMCI_MEM=1
> export MPI_LOC=/lustre/work/apps/IB-ICC-IFORT-OPENMPI
> export MPI_INCLUDE=$MPI_LOC/include
> export MPI_LIB=$MPI_LOC/lib
> export MPI_BIN=$MPI_LOC/bin
> export NWCHEM_TARGET=LINUX64
> export OMP_NUM_THREADS=1
> export TARGET=LINUX64
> export USE_MPI=y
> 
> 
> 
> From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of 
> Richard Walsh [richard.wa...@csi.cuny.edu]
> Sent: Thursday, May 06, 2010 1:06 PM
> To: us...@open-mpi.org
> Subject: [OMPI users] Can NWChem be run with OpenMPI over an InfiniBand 
> interconnect ... ??
> 
> All,
> 
> I have built NWChem successfully, and am trying to run it with an
> Intel-built version of OpenMPI 1.4.1.  If I force it to run over the
> 1 GigE maintenance interconnect it works, but when I try it over
> the default InfiniBand communications network it fails with:
> 
> --
> An MPI process has executed an operation involving a call to the
> "fork()" system call to create a child process.  Open MPI is currently
> operating in a condition that could result in memory corruption or
> other system errors; your MPI job may hang, crash, or produce silent
> data corruption.  The use of fork() (or system() or other calls that
> create child processes) is strongly discouraged.
> 
> The process that invoked fork was:
> 
>  Local host:  gpute-2 (PID 15996)
>  MPI_COMM_WORLD rank: 0
> 
> If you are *absolutely sure* that your application will successfully
> and correctly survive a call to fork(), you may disable this warning
> by setting the mpi_warn_on_fork MCA parameter to 0.
> --
> 
> This looks to be a known problem.  Is there a way around it?  I have seen
> it suggested in some places that I need to use Mellanox's version of MPI,
> which is not an option and surprises me as they are a big OFED contributor.
> 
> What are my options ... other than using GigE ... ??
> 
> Thanks,
> 
> rbw
> 
> 
> 
> 
>   Richard Walsh
>   Parallel Applications and Systems Manager
>   CUNY HPC Center, Staten Island, NY
>   718-982-3319
>   612-382-4620
> 
>   Mighty the Wizard
>   Who found me at sunrise
>   Sleeping, and woke me
>   And learn'd me Magic!
> 
> Think green before you print this email.
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-06 Thread Ralph Castain
I know a few national labs that run OMPI w/Fedora 9, but that isn't on Nehalem 
hardware and is using gcc 4.3.x.

However, I think the key issue really is the compiler. I have seen similar 
problems on multiple platforms and OS's whenever I use GCC 4.4.x - I -think- it 
has to do with the automatic vectorization in that compiler, but I can't swear 
to it.

You can always install a personal copy of gcc for your own use on the system 
and see if that solves the problem. Just download a version like 4.3.x from the 
gnu site.

I know 4.3.x doesn't have a problem, though again I haven't tried it on Nehalem.
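
A rough sketch of what that looks like, assuming a private GCC 4.3.x already
installed under $HOME (the paths, version, and prefix are illustrative):

    export PATH=$HOME/gcc-4.3/bin:$PATH
    ./configure CC=$HOME/gcc-4.3/bin/gcc CXX=$HOME/gcc-4.3/bin/g++ \
                F77=$HOME/gcc-4.3/bin/gfortran FC=$HOME/gcc-4.3/bin/gfortran \
                --prefix=$HOME/openmpi-1.4.2-gcc43
    make all install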


On May 6, 2010, at 12:10 PM, Gus Correa wrote:

> Hi Jeff
> 
> Thank you for your testimony.
> 
> So now I have two important data points (you and Douglas Guptill)
> to support the argument here that installing Fedora
> on machines meant to do scientific and parallel computation
> is to ask for trouble.
> 
> I use CentOS in our cluster, but this is a standalone machine
> I don't have control of.
> 
> Anybody out there using Open MPI + Fedora Core + Nehalem ?
> Happy?
> 
> Regards,
> Gus Correa
> 
> Jeff Squyres wrote:
>> On May 6, 2010, at 1:11 PM, Gus Correa wrote:
>>> Just for the record, I am using:
>>> Open MPI 1.4.2 (released 2 days ago), gcc 4.4.3  (g++, gfortran).
>>> All on Fedora Core 12, kernel 2.6.32.11-99.fc12.x86_64 #1 SMP.
>> Someone asked earlier in this thread -- I've used RHEL4.4 and RHEL5.4 on my 
>> Nehalem EP boxen.  I used the default gcc on those RHELs for compiling 
>> everything (OMPI + apps).  I don't remember what it was on RHEL 4.4, but on 
>> RHEL 5.4, it's GCC 4.1.2.
>>> You and Jeff reported that your
>>> Nehalems get along with Open MPI.
>>> I would guess other people have functional Open MPI + Nehalem systems.
>>> All I can think of is that some mess with the OS/gcc is causing
>>> the trouble here.
>> I don't have much experience with kernels outside 
> of the RHEL kernels,
> so I don't know if 2.6.32 is problematic or not.  :-(
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Can NWChem be run with OpenMPI over an InfiniBand interconnect ... ??

2010-05-06 Thread Addepalli, Srirangam V
Hello Richard,
Yes, NWCHEM can be run on IB using 1.4.1, provided you have built openmpi with IB 
support. 
Note: If your IB cards are qlogic you need to compile NWCHEM with MPI-SPAWN.
Rangam

Settings for my Build with MPI-SPAWN:
export ARMCI_NETWORK=MPI-SPAWN
export IB_HOME=/usr
export IB_INCLUDE=/usr/include
export IB_LIB=/usr/lib64
export IB_LIB_NAME="-libverbs -libumad -lpthread "
export NWCHEM_TOP=/lustre/work/apps/nwchem-5.1.1
export NWCHEM_MODULES="venus geninterface all"
export LIBMPI="-lmpi"
export ARMCI_DEFAULT_SHMMAX=256
export BLASLIB=goto2_penrynp-r1.00
export BLASLOC=/lustre/work/apps/goto/
export BLASOPT="-L/lustre/work/apps/goto/ -l$BLASLIB"
export CC=icc
export CFLG="-xP -fPIC"
export CXX=icpc
export F77=ifort
export F90=ifort
export FC=ifort
export FL=ifort
export LARGE_FILES=TRUE
export LIB_DEFINES=-DDFLT_TOT_MEM=2147483647
export MPI_LOC=/lustre/work/apps/IB-ICC-IFORT-OPENMPI-IB/
export MPI_INCLUDE=$MPI_LOC/include
export MPI_LIB=$MPI_LOC/lib
export MPI_BIN=$MPI_LOC/bin
export NWCHEM_TARGET=LINUX64
export TARGET=LINUX64
export USE_MPI=y

Setting with OPENIB

export ARMCI_NETWORK=OPENIB
export IB_HOME=/usr
export IB_INCLUDE=/usr/include
export IB_LIB=/usr/lib64
export IBV_FORK_SAFE=1
export NWCHEM_TOP=/lustre/work/apps/nwchem-5.1.1
export NWCHEM_MODULES="all qm geninterface"
#export LIBMPI="-lmpich -libumad -libverbs -lrdmacm -pthread"
export LIBMPI="-lmpi -pthread -libumad -libverbs -lrdmacm -pthread"
export ARMCI_DEFAULT_SHMMAX=256
export BLASLIB=goto2_penrynp-r1.00
export BLASLOC=/lustre/work/apps/goto/
export BLASOPT="-L/lustre/work/apps/goto/ -l$BLASLIB"
export CC=icc
export CFLG="-xP -fPIC"
export CXX=icpc
export F77=ifort
export F90=ifort
export FC=ifort
export FL=ifort
export GOTO_NUM_THREADS=1
export LARGE_FILES=TRUE
export LIB_DEFINES=-DDFLT_TOT_MEM=2147483647
export MA_USE_ARMCI_MEM=1
export MPI_LOC=/lustre/work/apps/IB-ICC-IFORT-OPENMPI
export MPI_INCLUDE=$MPI_LOC/include
export MPI_LIB=$MPI_LOC/lib
export MPI_BIN=$MPI_LOC/bin
export NWCHEM_TARGET=LINUX64
export OMP_NUM_THREADS=1
export TARGET=LINUX64
export USE_MPI=y



From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of 
Richard Walsh [richard.wa...@csi.cuny.edu]
Sent: Thursday, May 06, 2010 1:06 PM
To: us...@open-mpi.org
Subject: [OMPI users] Can NWChem be run with OpenMPI over an InfiniBand 
interconnect ... ??

All,

I have built NWChem successfully, and am trying to run it with an
Intel-built version of OpenMPI 1.4.1.  If I force it to run over the
1 GigE maintenance interconnect it works, but when I try it over
the default InfiniBand communications network it fails with:

--
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:  gpute-2 (PID 15996)
  MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--

This looks to be a known problem.  Is there a way around it?  I have seen
it suggested in some places that I need to use Mellanox's version of MPI,
which is not an option and surprises me as they are a big OFED contributor.

What are my options ... other than using GigE ... ??

Thanks,

rbw




   Richard Walsh
   Parallel Applications and Systems Manager
   CUNY HPC Center, Staten Island, NY
   718-982-3319
   612-382-4620

   Mighty the Wizard
   Who found me at sunrise
   Sleeping, and woke me
   And learn'd me Magic!

Think green before you print this email.

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-06 Thread Samuel K. Gutierrez

Hi Gus,

This may not help, but it's worth a try.  If it's not too much  
trouble, can you please reconfigure your Open MPI installation with -- 
enable-debug and then rebuild?  After that, may we see the stack trace  
from a core file that is produced after the segmentation fault?


Thanks,

--
Samuel K. Gutierrez
Los Alamos National Laboratory
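
A minimal sketch of the rebuild-and-backtrace workflow being asked for here
(the install prefix, core file name, and gdb invocation are illustrative and
assume the crash actually leaves a core file behind):

    # Reconfigure and rebuild Open MPI with debugging enabled:
    ./configure --enable-debug --prefix=$HOME/openmpi-1.4.2-debug
    make all install

    # Allow core dumps, reproduce the crash, then get a stack trace:
    ulimit -c unlimited
    mpiexec -mca btl_sm_num_fifos 16 -np 16 ./a.out
    gdb ./a.out core.<pid> -ex bt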

On May 6, 2010, at 12:01 PM, Gus Correa wrote:


Hi Eugene

Thanks for the detailed answer.

*

1) Now I can see and use the btl_sm_num_fifos component:

I had committed already "btl = ^sm" to the openmpi-mca-params.conf
file.  This apparently hides the btl_sm_num_fifos from ompi_info.

After I switched to no options in openmpi-mca-params.conf,
then ompi_info showed the btl_sm_num_fifos component.

ompi_info --all | grep btl_sm_num_fifos
MCA btl: parameter "btl_sm_num_fifos" (current  
value: "1", data source: default value)


A side comment:
This means that the system administrator can
hide some Open MPI options from the users, depending on what
he puts in the openmpi-mca-params.conf file, right?

*

2) However, running with "sm" still breaks, unfortunately:

Boomer!
I get the same errors that I reported in my very
first email, if I increase the number of processes to 16,
to explore the hyperthreading range.

This is using "sm" (i.e. not excluded in the mca config file),
and btl_sm_num_fifos (mpiexec command line)

The machine hangs, requires a hard reboot, etc, etc,
as reported earlier.  See the below, please.

So, I guess the conclusion is that I can use sm,
but I have to remain within the range of physical cores (8),
not oversubscribe, not try to explore the HT range.
Should I expect it to work also for np>number of physical cores?

I wonder if this would still work with np<=8, but with heavier code.
(I only used hello_c.c so far.)
Not sure I'll be able to test this, the user wants to use the machine.


$mpiexec -mca btl_sm_num_fifos 4 -np 4 a.out
Hello, world, I am 0 of 4
Hello, world, I am 1 of 4
Hello, world, I am 2 of 4
Hello, world, I am 3 of 4

$ mpiexec -mca btl_sm_num_fifos 8 -np 8 a.out
Hello, world, I am 0 of 8
Hello, world, I am 1 of 8
Hello, world, I am 2 of 8
Hello, world, I am 3 of 8
Hello, world, I am 4 of 8
Hello, world, I am 5 of 8
Hello, world, I am 6 of 8
Hello, world, I am 7 of 8

$ mpiexec -mca btl_sm_num_fifos 16 -np 16 a.out
--
mpiexec noticed that process rank 8 with PID 3659 on node  
spinoza.ldeo.columbia.edu exited on signal 11 (Segmentation fault).

--
$

Message from syslogd@spinoza at May  6 13:38:13 ...
kernel:[ cut here ]

Message from syslogd@spinoza at May  6 13:38:13 ...
kernel:invalid opcode:  [#1] SMP

Message from syslogd@spinoza at May  6 13:38:13 ...
kernel:last sysfs file: /sys/devices/system/cpu/cpu15/topology/ 
physical_package_id


Message from syslogd@spinoza at May  6 13:38:13 ...
kernel:Stack:

Message from syslogd@spinoza at May  6 13:38:13 ...
kernel:Call Trace:

Message from syslogd@spinoza at May  6 13:38:13 ...
kernel:Code: 48 89 45 a0 4c 89 ff e8 e0 dd 2b 00 41 8b b6 58 03 00  
00 4c 89 e7 ff c6 e8 b5 bc ff ff 41 8b 96 5c 03 00 00 48 98 48 39 d0  
73 04 <0f> 0b eb fe 48 29 d0 48 89 45 a8 66 41 ff 07 49 8b 94 24 00 01


*

Many thanks,
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-


Eugene Loh wrote:

Gus Correa wrote:

Hi Eugene

Thank you for answering one of my original questions.

However, there seems to be a problem with the syntax.
Is it really "-mca btl btl_sm_num_fifos=some_number"?

No.  Try "--mca btl_sm_num_fifos 4".  Or,
% setenv OMPI_MCA_btl_sm_num_fifos 4
% ompi_info -a | grep btl_sm_num_fifos # check that things were  
set correctly

% mpirun -n 4 a.out

When I grep any component starting with btl_sm I get nothing:

ompi_info --all | grep btl_sm
(No output)
I'm no guru, but I think the reason has something to do with  
dynamically loaded somethings.  E.g.,

% /home/eugene/ompi/bin/ompi_info --all | grep btl_sm_num_fifos
(no output)
% setenv OPAL_PREFIX /home/eugene/ompi
% set path = ( $OPAL_PREFIX/bin $path )
% ompi_info --all | grep btl_sm_num_fifos
   MCA btl: parameter "btl_sm_num_fifos" (current  
value: "1", data source: default value)

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-06 Thread Gus Correa

Hi Jeff

Thank you for your testimony.

So now I have two important data points (you and Douglas Guptill)
to support the argument here that installing Fedora
on machines meant to do scientific and parallel computation
is to ask for trouble.

I use CentOS in our cluster, but this is a standalone machine
I don't have control of.

Anybody out there using Open MPI + Fedora Core + Nehalem ?
Happy?

Regards,
Gus Correa

Jeff Squyres wrote:

On May 6, 2010, at 1:11 PM, Gus Correa wrote:


Just for the record, I am using:
Open MPI 1.4.2 (released 2 days ago), gcc 4.4.3  (g++, gfortran).
All on Fedora Core 12, kernel 2.6.32.11-99.fc12.x86_64 #1 SMP.


Someone asked earlier in this thread -- I've used RHEL4.4 and RHEL5.4 on my 
Nehalem EP boxen.  I used the default gcc on those RHELs for compiling 
everything (OMPI + apps).  I don't remember what it was on RHEL 4.4, but on 
RHEL 5.4, it's GCC 4.1.2.


You and Jeff reported that your
Nehalems get along with Open MPI.
I would guess other people have functional Open MPI + Nehalem systems.
All I can think of is that some mess with the OS/gcc is causing
the trouble here.


I don't have much experience with kernels outside 

of the RHEL kernels,
so I don't know if 2.6.32 is problematic or not.  :-(






[OMPI users] Can NWChem be run with OpenMPI over an InfiniBand interconnect ... ??

2010-05-06 Thread Richard Walsh

All,

I have built NWChem successfully, and am trying to run it with an
Intel-built version of OpenMPI 1.4.1.  If I force it to run over the
1 GigE maintenance interconnect it works, but when I try it over
the default InfiniBand communications network it fails with:

--
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:  gpute-2 (PID 15996)
  MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--

This looks to be a known problem.  Is there a way around it?  I have seen
it suggested in some places that I need to use Mellanox's version of MPI,
which is not an option and surprises me as they are a big OFED contributor.

What are my options ... other than using GigE ... ??

Thanks,

rbw




   Richard Walsh
   Parallel Applications and Systems Manager
   CUNY HPC Center, Staten Island, NY
   718-982-3319
   612-382-4620

   Mighty the Wizard
   Who found me at sunrise
   Sleeping, and woke me
   And learn'd me Magic!

Think green before you print this email.



Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-06 Thread Gus Correa

Hi Eugene

Thanks for the detailed answer.

*

1) Now I can see and use the btl_sm_num_fifos component:

I had committed already "btl = ^sm" to the openmpi-mca-params.conf
file.  This apparently hides the btl_sm_num_fifos from ompi_info.

After I switched to no options in openmpi-mca-params.conf,
then ompi_info showed the btl_sm_num_fifos component.

ompi_info --all | grep btl_sm_num_fifos
 MCA btl: parameter "btl_sm_num_fifos" (current value: 
"1", data source: default value)


A side comment:
This means that the system administrator can
hide some Open MPI options from the users, depending on what
he puts in the openmpi-mca-params.conf file, right?

*

2) However, running with "sm" still breaks, unfortunately:

Boomer!
I get the same errors that I reported in my very
first email, if I increase the number of processes to 16,
to explore the hyperthreading range.

This is using "sm" (i.e. not excluded in the mca config file),
and btl_sm_num_fifos (mpiexec command line)

The machine hangs, requires a hard reboot, etc, etc,
as reported earlier.  See the below, please.

So, I guess the conclusion is that I can use sm,
but I have to remain within the range of physical cores (8),
not oversubscribe, not try to explore the HT range.
Should I expect it to work also for np>number of physical cores?

I wonder if this would still work with np<=8, but with heavier code.
(I only used hello_c.c so far.)
Not sure I'll be able to test this, the user wants to use the machine.


$mpiexec -mca btl_sm_num_fifos 4 -np 4 a.out
Hello, world, I am 0 of 4
Hello, world, I am 1 of 4
Hello, world, I am 2 of 4
Hello, world, I am 3 of 4

$ mpiexec -mca btl_sm_num_fifos 8 -np 8 a.out
Hello, world, I am 0 of 8
Hello, world, I am 1 of 8
Hello, world, I am 2 of 8
Hello, world, I am 3 of 8
Hello, world, I am 4 of 8
Hello, world, I am 5 of 8
Hello, world, I am 6 of 8
Hello, world, I am 7 of 8

$ mpiexec -mca btl_sm_num_fifos 16 -np 16 a.out
--
mpiexec noticed that process rank 8 with PID 3659 on node 
spinoza.ldeo.columbia.edu exited on signal 11 (Segmentation fault).

--
$

Message from syslogd@spinoza at May  6 13:38:13 ...
 kernel:[ cut here ]

Message from syslogd@spinoza at May  6 13:38:13 ...
 kernel:invalid opcode:  [#1] SMP

Message from syslogd@spinoza at May  6 13:38:13 ...
 kernel:last sysfs file: 
/sys/devices/system/cpu/cpu15/topology/physical_package_id


Message from syslogd@spinoza at May  6 13:38:13 ...
 kernel:Stack:

Message from syslogd@spinoza at May  6 13:38:13 ...
 kernel:Call Trace:

Message from syslogd@spinoza at May  6 13:38:13 ...
 kernel:Code: 48 89 45 a0 4c 89 ff e8 e0 dd 2b 00 41 8b b6 58 03 00 00 
4c 89 e7 ff c6 e8 b5 bc ff ff 41 8b 96 5c 03 00 00 48 98 48 39 d0 73 04 
<0f> 0b eb fe 48 29 d0 48 89 45 a8 66 41 ff 07 49 8b 94 24 00 01


*

Many thanks,
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-


Eugene Loh wrote:

Gus Correa wrote:


Hi Eugene

Thank you for answering one of my original questions.

However, there seems to be a problem with the syntax.
Is it really "-mca btl btl_sm_num_fifos=some_number"?


No.  Try "--mca btl_sm_num_fifos 4".  Or,

% setenv OMPI_MCA_btl_sm_num_fifos 4
% ompi_info -a | grep btl_sm_num_fifos # check that things were set 
correctly

% mpirun -n 4 a.out


When I grep any component starting with btl_sm I get nothing:

ompi_info --all | grep btl_sm
(No output)


I'm no guru, but I think the reason has something to do with dynamically 
loaded somethings.  E.g.,


% /home/eugene/ompi/bin/ompi_info --all | grep btl_sm_num_fifos
(no output)
% setenv OPAL_PREFIX /home/eugene/ompi
% set path = ( $OPAL_PREFIX/bin $path )
% ompi_info --all | grep btl_sm_num_fifos
MCA btl: parameter "btl_sm_num_fifos" (current value: 
"1", data source: default value)

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-06 Thread Jeff Squyres
On May 6, 2010, at 1:11 PM, Gus Correa wrote:

> Just for the record, I am using:
> Open MPI 1.4.2 (released 2 days ago), gcc 4.4.3  (g++, gfortran).
> All on Fedora Core 12, kernel 2.6.32.11-99.fc12.x86_64 #1 SMP.

Someone asked earlier in this thread -- I've used RHEL4.4 and RHEL5.4 on my 
Nehalem EP boxen.  I used the default gcc on those RHELs for compiling 
everything (OMPI + apps).  I don't remember what it was on RHEL 4.4, but on 
RHEL 5.4, it's GCC 4.1.2.

> You and Jeff reported that your
> Nehalems get along with Open MPI.
> I would guess other people have functional Open MPI + Nehalem systems.
> All I can think of is that some mess with the OS/gcc is causing
> the trouble here.

I don't have much experience with kernels outside of the RHEL kernels, so I 
don't know if 2.6.32 is problematic or not.  :-(

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-06 Thread Gus Correa

Hi Douglas

Just for the record, I am using:
Open MPI 1.4.2 (released 2 days ago), gcc 4.4.3  (g++, gfortran).
All on Fedora Core 12, kernel 2.6.32.11-99.fc12.x86_64 #1 SMP.
The machine is a white box with two-way
quad-core Intel Xeon (Nehalem) E5540  @ 2.53GHz, 48GB RAM.
Hyperthreading is currently turned on.

But please, don't spend more time on this.
You already gave a lot of help.

I guess this would be fixed if I could reinstall the OS
using a more stable Linux distribution, not Fedora.
You and Jeff reported that your
Nehalems get along with Open MPI.
I would guess other people have functional Open MPI + Nehalem systems.
All I can think of is that some mess with the OS/gcc is causing
the trouble here.

(Yes, to avoid trouble I always compile MPI
and applications with the same compiler set.
And keep a bunch of Open MPI builds to match our needs.)

Cheers,
Gus Correa


Douglas Guptill wrote:

Hello Gus:

On Thu, May 06, 2010 at 11:26:57AM -0400, Gus Correa wrote:


Douglas:



Would you know which gcc you used to build your Open MPI?
Or did you use Intel icc instead?


Intel ifort and icc.  I build OpenMPI with the same compiler, and same
options, that I build my application with.

I have been tempted to try and duplicate your problem.  Would that be a
helpful experiment?  gcc, OpenMPI 1.4.1, IIRC ?

Regards,
Douglas.




Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-06 Thread Eugene Loh

Gus Correa wrote:


Hi Eugene

Thank you for answering one of my original questions.

However, there seems to be a problem with the syntax.
Is it really "-mca btl btl_sm_num_fifos=some_number"?


No.  Try "--mca btl_sm_num_fifos 4".  Or,

% setenv OMPI_MCA_btl_sm_num_fifos 4
% ompi_info -a | grep btl_sm_num_fifos # check that things were set 
correctly

% mpirun -n 4 a.out


When I grep any component starting with btl_sm I get nothing:

ompi_info --all | grep btl_sm
(No output)


I'm no guru, but I think the reason has something to do with dynamically 
loaded somethings.  E.g.,


% /home/eugene/ompi/bin/ompi_info --all | grep btl_sm_num_fifos
(no output)
% setenv OPAL_PREFIX /home/eugene/ompi
% set path = ( $OPAL_PREFIX/bin $path )
% ompi_info --all | grep btl_sm_num_fifos
MCA btl: parameter "btl_sm_num_fifos" (current value: 
"1", data source: default value)


Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-06 Thread Douglas Guptill
Hello Gus:

On Thu, May 06, 2010 at 11:26:57AM -0400, Gus Correa wrote:

> Douglas:

> Would you know which gcc you used to build your Open MPI?
> Or did you use Intel icc instead?

Intel ifort and icc.  I build OpenMPI with the same compiler, and same
options, that I build my application with.

I have been tempted to try and duplicate your problem.  Would that be a
helpful experiment?  gcc, OpenMPI 1.4.1, IIRC ?

Regards,
Douglas.
-- 
  Douglas Guptill   voice: 902-461-9749
  Research Assistant, LSC 4640  email: douglas.gupt...@dal.ca
  Oceanography Department   fax:   902-494-3877
  Dalhousie University
  Halifax, NS, B3H 4J1, Canada



Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-06 Thread Gus Correa

Hi Eugene

Thank you for answering one of my original questions.

However, there seems to be a problem with the syntax.
Is it really "-mca btl btl_sm_num_fifos=some_number"?
(FYI, I am using Open MPI 1.4.2, a tarball from two days ago.)

When I grep any component starting with btl_sm I get nothing:

ompi_info --all | grep btl_sm
(No output)


When I try to run with it, it fails telling me it cannot
find the btl_sm_num_fifos component:


mpiexec -mca btl sm,self -mca btl btl_sm_num_fifos=4 -np 4 ./a.out
--
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:  spinoza.ldeo.columbia.edu
Framework: btl
Component: btl_sm_num_fifos=4
--

Thank you,
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-


Eugene Loh wrote:

Ralph Castain wrote:


Yo Gus

Just saw a ticket go by reminding us about continuing hang problems on 
shared memory when building with gcc 4.4.x - any  chance you are in 
that category? You might have said something earlier in this thread
 


Going back to the original e-mail in this thread:

Gus Correa wrote:

Use -mca btl -mca btl_sm_num_fifos=some_number ? (Which number?) 


Another experiment to try would be to keep sm on, but try changing 
btl_sm_num_fifos as above.  The number to use would be the number of 
processes on the node.  E.g., if all processes are running on the same 
box, just use the same number as processes in the job.  The results 
might help narrow down the possibilities here.


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-06 Thread Gus Correa

Hi Ralph, Douglas

Ralph:

Yes, I am on the black list of your ticket (gcc 4.4.3):

gcc --version
gcc (GCC) 4.4.3 20100127 (Red Hat 4.4.3-4)
Copyright (C) 2010 Free Software Foundation, Inc.

Is it possible (and not too time consuming) to install an
older gcc on this Fedora 12 box, and compile Open MPI with it?
(It may be easier just to install another Linux distribution,
I would guess.  Fedora was not my choice, it is just a PITA.)

Douglas:

Thank you so much for telling your Linux distro, version, etc.
Now it is really starting to look like a distro/kernel/gcc issue.
I would not use Fedora, but I don't administer the box.

Would you know which gcc you used to build your Open MPI?
Or did you use Intel icc instead?

Cheers,
Gus
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-

Ralph Castain wrote:

Yo Gus

Just saw a ticket go by reminding us about 
continuing hang problems on shared memory when building with 
gcc 4.4.x - any  chance you are in that category? 
You might have said something earlier in this thread


On May 5, 2010, at 5:54 PM, Douglas Guptill wrote:


On Wed, May 05, 2010 at 06:08:57PM -0400, Gus Correa wrote:


If anybody else has Open MPI working with hyperthreading and "sm"
on a Nehalem box, I would appreciate any information about the
Linux distro and kernel version being used.

Debian 5 (lenny), Core i7 920, Asus P6T MoBo, 12GB RAM, OpenMPI 1.2.8
(with a custom-built MPI_recv.c and MPI_Send.c, which cut down on the
cpu load caused by the busy wait polling).  We have six (6) of these
machines.  All configured the same.

uname -a yields:
Linux screm 2.6.26-2-amd64 #1 SMP Thu Feb 11 00:59:32 UTC 2010 x86_64 GNU/Linux

HyperThreading is on.

Applications are -np 2 only:
 mpirun --host localhost,localhost --byslot --mca btl sm,self -np 2 ${BIN}

We normally run (up to) 4 of these jobs on each machine.

Using Intel 11.0.074 and 11.1.0** compilers; have trouble with the
11.1.0** and "-mcmodel=large -shared-intel" builds.  Trouble meaning
the numerical results vary strangely.  Still working on that problem.

Hope that helps,
Douglas.

P.S.  Yes, I know OpenMPI 1.2.8 is old.  We have been using it for 2
years with no apparent problems.  When I saw comments like "machine
hung" for 1.4.1, and "data loss" for 1.3.x, I put aside thoughts of
upgrading.

--
 Douglas Guptill   voice: 902-461-9749
 Research Assistant, LSC 4640  email: douglas.gupt...@dal.ca
 Oceanography Department   fax:   902-494-3877
 Dalhousie University
 Halifax, NS, B3H 4J1, Canada

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] MPI_Bsend vs. MPI_Ibsend

2010-05-06 Thread Richard Treumann

An MPI send (of any kind) is defined by "local completion semantics".
When a send is complete, the send buffer may be reused. The only kind of
send that gives any indication whether the receive is posted is the
synchronous send. Neither standard send nor buffered send tell the sender
if the recv was posted.

The difference between blocking and nonblocking is that a return from a
blocking send call indicates the send buffer may be reused. A return from a
nonblocking send does not allow the send buffer to be reused (but other
things can be done).  The send buffer becomes available to reuse after a
wait or successful test.
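
As a minimal illustration of this buffer-reuse rule (a two-rank job is
assumed; the payload is arbitrary):

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, buf[4] = {1, 2, 3, 4};
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Blocking send: once MPI_Send returns, buf may be reused.
           The return says nothing about whether the receive was posted. */
        MPI_Send(buf, 4, MPI_INT, 1, 0, MPI_COMM_WORLD);
        buf[0] = 99;                          /* safe */

        /* Nonblocking send: buf must not be touched until the request
           completes in MPI_Wait (or a successful MPI_Test). */
        MPI_Isend(buf, 4, MPI_INT, 1, 1, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        buf[0] = 0;                           /* safe only after the wait */
    } else if (rank == 1) {
        MPI_Recv(buf, 4, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(buf, 4, MPI_INT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}

(MPI_STATUS_IGNORE is used for brevity; a real code might keep the status
or poll with MPI_Test in a progress loop.)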

Dick Treumann  -  MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363



From: Bill Rankin
To: Open MPI Users
Date: 05/06/2010 10:35 AM
Subject: Re: [OMPI users] MPI_Bsend vs. MPI_Ibsend
Sent by: users-boun...@open-mpi.org





Actually the 'B' in MPI_Bsend() specifies that it is a blocking *buffered*
send.  So if I remember my standards correctly, this call requires:

1) you will have to explicitly manage the send buffers via
MPI_Buffer_[attach|detach](), and

2) the send will block until a corresponding receive is posted.

The MPI_Ibsend() is the immediate version of the above and will return w/o
the requirement for the corresponding receive.  Since it is a buffered
send the local data copy should be completed before it returns, allowing
you to change the contents of the local data buffer.  But there is no
guarantee that the message has been sent, so you should not reuse the send
buffer until after verifying the completion of the send via MPI_Wait() or
similar.

In your example, since MPI_Test() won't block, you can have a problem.  Use
MPI_Wait() instead or change your send buffer to one that is not being
used.

-bill



-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Jovana Knezevic
Sent: Thursday, May 06, 2010 4:44 AM
To: us...@open-mpi.org
Subject: [OMPI users] MPI_Bsend vs. MPI_Ibsend

Dear all,

Could anyone please clarify for me the difference between MPI_Bsend and
MPI_Ibsend? Or, in other words, what exactly is "blocking" in
MPI_Bsend, when the data is stored in the buffer and we "return"? :-)
Another, but similar, question:

What about the data-buffer - when can it be reused in each of the
cases - simple examples:

for (i=0; i

Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-06 Thread Eugene Loh

Ralph Castain wrote:


Yo Gus

Just saw a ticket go by reminding us about continuing hang problems on shared 
memory when building with gcc 4.4.x - any  chance you are in that category? You 
might have said something earlier in this thread
 


Going back to the original e-mail in this thread:

Gus Correa wrote:

Use -mca btl -mca btl_sm_num_fifos=some_number ? (Which number?) 


Another experiment to try would be to keep sm on, but try changing 
btl_sm_num_fifos as above.  The number to use would be the number of 
processes on the node.  E.g., if all processes are running on the same 
box, just use the same number as processes in the job.  The results 
might help narrow down the possibilities here.




Re: [OMPI users] MPI_Bsend vs. MPI_Ibsend

2010-05-06 Thread Bill Rankin
Actually the 'B' in MPI_Bsend() specifies that it is a blocking *buffered* 
send.  So if I remember my standards correctly, this call requires:

1) you will have to explicitly manage the send buffers via 
MPI_Buffer_[attach|detach](), and

2) the send will block until a corresponding receive is posted.

The MPI_Ibsend() is the immediate version of the above and will return w/o the
requirement for the corresponding receive.  Since it is a buffered send the
local data copy should be completed before it returns, allowing you to change
the contents of the local data buffer.  But there is no guarantee that the
message has been sent, so you should not reuse the send buffer until after
verifying the completion of the send via MPI_Wait() or similar.

In your example, since MPI_Test() won't block, you can have a problem.  Use 
MPI_Wait() instead or change your send buffer to one that is not being used.
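
A minimal sketch of that pattern (two ranks assumed; the buffer size is
illustrative):

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, data = 42, bufsize;
    char *bsend_buf;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Explicitly attach a buffer big enough for the message plus
           MPI's per-message bookkeeping. */
        bufsize = sizeof(int) + MPI_BSEND_OVERHEAD;
        bsend_buf = malloc(bufsize);
        MPI_Buffer_attach(bsend_buf, bufsize);

        /* Immediate buffered send: returns right away; do not reuse
           'data' until the request completes. */
        MPI_Ibsend(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        data = 0;                              /* safe now */

        /* Detach blocks until all buffered messages have drained. */
        MPI_Buffer_detach(&bsend_buf, &bufsize);
        free(bsend_buf);
    } else if (rank == 1) {
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}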

-bill



-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Jovana Knezevic
Sent: Thursday, May 06, 2010 4:44 AM
To: us...@open-mpi.org
Subject: [OMPI users] MPI_Bsend vs. MPI_Ibsend

Dear all,

Could anyone please clarify for me the difference between MPI_Bsend and
MPI_Ibsend? Or, in other words, what exactly is "blocking" in
MPI_Bsend, when the data is stored in the buffer and we "return"? :-)
Another, but similar, question:

What about the data-buffer - when can it be reused in each of the
cases - simple examples:

for (i=0; i

Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-06 Thread John Hearns
Gus,
  I'm not using OpenMPI, however OpenSUSE 11.2 with current updates
seems to work fine on Nehalem.

I'm curious that you say the Nvidia graphics driver does not install -
have you tried running the install script manually, rather than
downloading an RPM etc?
I'm using version 195.36.15 and it seems to work fine.


Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-06 Thread Ralph Castain
Yo Gus

Just saw a ticket go by reminding us about continuing hang problems on shared 
memory when building with gcc 4.4.x - any  chance you are in that category? You 
might have said something earlier in this thread

On May 5, 2010, at 5:54 PM, Douglas Guptill wrote:

> On Wed, May 05, 2010 at 06:08:57PM -0400, Gus Correa wrote:
> 
>> If anybody else has Open MPI working with hyperthreading and "sm"
>> on a Nehalem box, I would appreciate any information about the
>> Linux distro and kernel version being used.
> 
> Debian 5 (lenny), Core i7 920, Asus P6T MoBo, 12GB RAM, OpenMPI 1.2.8
> (with a custom-built MPI_recv.c and MPI_Send.c, which cut down on the
> cpu load caused by the busy wait polling).  We have six (6) of these
> machines.  All configured the same.
> 
> uname -a yields:
> Linux screm 2.6.26-2-amd64 #1 SMP Thu Feb 11 00:59:32 UTC 2010 x86_64 
> GNU/Linux
> 
> HyperThreading is on.
> 
> Applications are -np 2 only:
>  mpirun --host localhost,localhost --byslot --mca btl sm,self -np 2 ${BIN}
> 
> We normally run (up to) 4 of these jobs on each machine.
> 
> Using Intel 11.0.074 and 11.1.0** compilers; have trouble with the
> 11.1.0** and "-mcmodel=large -shared-intel" builds.  Trouble meaning
> the numerical results vary strangely.  Still working on that problem.
> 
> Hope that helps,
> Douglas.
> 
> P.S.  Yes, I know OpenMPI 1.2.8 is old.  We have been using it for 2
> years with no apparent problems.  When I saw comments like "machine
> hung" for 1.4.1, and "data loss" for 1.3.x, I put aside thoughts of
> upgrading.
> 
> -- 
>  Douglas Guptill   voice: 902-461-9749
>  Research Assistant, LSC 4640  email: douglas.gupt...@dal.ca
>  Oceanography Department   fax:   902-494-3877
>  Dalhousie University
>  Halifax, NS, B3H 4J1, Canada
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Fortran derived types

2010-05-06 Thread Richard Treumann

Assume your data is discontiguous in memory and  making it contiguous is
not practical (e.g. there is no way to make cells of a row and cells of a
column both contiguous.)  You have 3 options:

1) Use many small/contiguous messages
2) Allocate scratch space and pack/unpack
3) Use a derived datatype.

If you decide to use option 2, then the time your program spends in the
allocate/pack/send/free and the time it spends in allocate/recv/unpack/free
need to be counted in the cost.  Just comparing contiguous vs.
discontiguous message times does not help make a good decision.

Whether 2 or 3 is faster depends a lot on how the MPI implementation does
its datatype processing.  If the MPI implementation can move data directly
from discontiguous memory into the send-side adapter and from the recv-side
adapter to discontiguous memory, datatypes may be faster and will conserve
memory.  If the MPI implementation just mallocs a scratch buffer and uses
the datatype to guide an internal pack/unpack subroutine, there is a pretty
good chance your hand-crafted pack or unpack, along with contiguous
messaging, will be more efficient.

I mention option 1 for completeness and because if there were a very good
put/get available, it might even be the best choice.  It is probably not
the best choice in any current MPI but there may be exceptions.
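
For concreteness, a minimal sketch of option 3 (sizes and layout are
illustrative): one column of a row-major N x N matrix is sent straight
from its discontiguous locations with a derived type, with no scratch
buffer on the sending side.

#include <mpi.h>

#define N 8

int main(int argc, char **argv)
{
    int rank, i, j;
    double a[N][N], col[N];
    MPI_Datatype column_t;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* N blocks of 1 double, stride N doubles apart: one matrix column. */
    MPI_Type_vector(N, 1, N, MPI_DOUBLE, &column_t);
    MPI_Type_commit(&column_t);

    if (rank == 0) {
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                a[i][j] = i * N + j;
        /* Send column 2 directly from the matrix. */
        MPI_Send(&a[0][2], 1, column_t, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Receive into a contiguous vector of N doubles. */
        MPI_Recv(col, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Type_free(&column_t);
    MPI_Finalize();
    return 0;
}

Option 2 would replace the MPI_Send with a hand-written copy of the column
into a contiguous scratch array, followed by an ordinary send of that array.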


Dick Treumann  -  MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363



From: Terry Frankcombe
To: Open MPI Users
Date: 05/06/2010 12:25 AM
Subject: Re: [OMPI users] Fortran derived types
Sent by: users-boun...@open-mpi.org





Hi Derek

On Wed, 2010-05-05 at 13:05 -0400, Cole, Derek E wrote:
> In general, even in your serial fortran code, you're already taking a
> performance hit using a derived type.

Do you have any numbers to back that up?

Ciao
Terry


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



[OMPI users] opal_mutex_lock(): Resource deadlock avoided

2010-05-06 Thread Ake Sandgren
Hi!

We have a code that trips on this fairly often. I've seen cases where it
works but mostly it gets stuck here.

The actual mpi call is call mpi_file_open(...)

I'm currently just wondering if there has been other reports on/anyone
have seen deadlock in mpi-io parts of the code or if this most likely
caused by our setup.

openmpi version is 1.4.2 (fails with 1.3.3 too)
Filesystem used is GPFS

openmpi built with mpi-threads but without progress-threads

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se



Re: [OMPI users] Fortran derived types

2010-05-06 Thread Paul Kapinos

Hi,

In general, even in your serial fortran code, you're already 
taking a performance hit using a derived type. 


That is not generally true. The right statement is: "it depends".

Yes, sometimes derived data types and object orientation and so on can
lead to some performance hit; but current compilers can usually optimise
a lot.


E.g. consider http://www.terboven.com/download/OAbstractionsLA.pdf 
(especially p.19).



So, I would not recommend disturbing a working program in order to push
it back to good old F77 style. And let us not start a flame war about
"assembler is faster but OO is easier"! :-)


Best wishes
Paul





-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Prentice Bisbal
Sent: Wednesday, May 05, 2010 11:51 AM
To: Open MPI Users
Subject: Re: [OMPI users] Fortran derived types

Vedran Coralic wrote:

Hello,

In my Fortran 90 code I use several custom defined derived types.
Amongst them is a vector of arrays, i.e. v(:)%f(:,:,:). I am wondering 
what the proper way of sending this data structure from one processor 
to another is. Is the best way to just restructure the data by copying 
it into a vector and sending that or is there a simpler way possible 
by defining an MPI derived type that can handle it?


I looked into the latter myself but so far, I have only found the 
solution for a scalar fortran derived type and the methodology that 
was suggested in that case did not seem naturally extensible to the vector case.




It depends on how your data is laid out in memory. If the arrays are evenly
spaced, as they would be in a multidimensional array, derived datatypes will
work fine. If you can't guarantee the spacing between the arrays that make up
the vector, then using MPI_Pack/MPI_Unpack (or whatever the Fortran
equivalents are) is the best way to go.

I'm not an expert MPI programmer, but I wrote a small program earlier this year
that built a dynamically allocated array of dynamically allocated arrays. After
doing some research into this same problem, it looked like packing/unpacking
was the only way to go.

Using Pack/Unpack is easy, but there is a performance hit since the data needs 
to be copied into the packed buffer before sending, and then copied out of the 
buffer after the receive.
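
A minimal pack/unpack sketch along these lines (two ranks and the array
sizes are illustrative):

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, i, pos;
    int n1 = 5, n2 = 3;
    double *x, *y;
    char packbuf[256];              /* comfortably larger than 8 doubles */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    x = malloc(n1 * sizeof(double));
    y = malloc(n2 * sizeof(double));

    if (rank == 0) {
        for (i = 0; i < n1; i++) x[i] = i;
        for (i = 0; i < n2; i++) y[i] = 10 + i;

        /* Copy both separately allocated arrays into one buffer ... */
        pos = 0;
        MPI_Pack(x, n1, MPI_DOUBLE, packbuf, sizeof(packbuf), &pos, MPI_COMM_WORLD);
        MPI_Pack(y, n2, MPI_DOUBLE, packbuf, sizeof(packbuf), &pos, MPI_COMM_WORLD);
        /* ... and send the packed bytes as a single message. */
        MPI_Send(packbuf, pos, MPI_PACKED, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(packbuf, sizeof(packbuf), MPI_PACKED, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        pos = 0;
        MPI_Unpack(packbuf, sizeof(packbuf), &pos, x, n1, MPI_DOUBLE, MPI_COMM_WORLD);
        MPI_Unpack(packbuf, sizeof(packbuf), &pos, y, n2, MPI_DOUBLE, MPI_COMM_WORLD);
    }

    free(x);
    free(y);
    MPI_Finalize();
    return 0;
}

The extra copies into and out of packbuf are exactly the performance hit
mentioned above.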


--
Prentice
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




[OMPI users] MPI_Bsend vs. MPI_Ibsend

2010-05-06 Thread Jovana Knezevic
Dear all,

Could anyone please clarify for me the difference between MPI_Bsend and
MPI_Ibsend? Or, in other words, what exactly is "blocking" in
MPI_Bsend, when the data is stored in the buffer and we "return"? :-)
Another, but similar, question:

What about the data-buffer - when can it be reused in each of the
cases - simple examples:

for (i=0; i

Re: [OMPI users] Fortran derived types

2010-05-06 Thread Terry Frankcombe
Hi Derek

On Wed, 2010-05-05 at 13:05 -0400, Cole, Derek E wrote:
> In general, even in your serial fortran code, you're already taking a
> performance hit using a derived type.

Do you have any numbers to back that up?

Ciao
Terry