Re: [OMPI users] Runtime error with OpenMPI via InfiniBand - [btl_openib_proc.c:157] ompi_modex_recv failed for peer

2017-04-19 Thread Jeff Squyres (jsquyres)
Dong -- I do not see an obvious cause for the error. Are you able to run trivial hello world / ring kinds of MPI jobs? Is the problem localized to a specific set of nodes in the cluster?
> On Apr 14, 2017, at 4:30 PM, Dong Young Yoon wrote:
> Hi everyone, I am a

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-15 Thread Gilles Gouaillardet
Jason, how many nodes are you running on? Since you have an IB network, IB is used for intra-node communication between tasks that are not part of the same OpenMPI job (read: spawn group). I can make a simple patch to use TCP instead of IB for this intra-node communication. Let me know if you

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Jason Maldonis
Thanks, Ralph, for all the help. I will do that until it gets fixed. Nathan, I am very interested in getting this working because we are developing some cool new code for materials science research. I believe this is the last piece of the puzzle for us, although of course I can use TCP for now. While

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Ralph Castain
You don’t want to always use those options, as your performance will take a hit: TCP is not a good substitute for InfiniBand. Sadly, this is something we need someone like Nathan to address, as it is a bug in the code base and in an area I’m not familiar with. For now, just use TCP so you can move
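A minimal sketch of the temporary TCP workaround, assuming a POSIX shell. Open MPI also reads MCA parameters from environment variables named `OMPI_MCA_<param>`, so exporting `OMPI_MCA_btl` should have the same effect as adding "-mca btl tcp,sm,self" to every mpiexec/mpirun started from that shell, without editing each command line. The `./loop_spawn` binary name is an assumption for illustration.

```shell
#!/bin/sh
# Temporary workaround discussed in this thread: keep Open MPI off the openib
# BTL by allowing only TCP (between nodes), sm (shared memory on a node) and
# self (a process talking to itself).
# Equivalent to "-mca btl tcp,sm,self" on the command line, but scoped to
# processes started from this shell session.
OMPI_MCA_btl=tcp,sm,self
export OMPI_MCA_btl

# Print the setting so it is easy to spot when reading job output later.
echo "OMPI_MCA_btl=$OMPI_MCA_btl"

# Launch as usual (hypothetical binary name):
#   mpiexec -np 2 ./loop_spawn
# Drop the workaround once the openib problem is fixed:
#   unset OMPI_MCA_btl
```

Given Ralph's warning about the performance cost of TCP versus InfiniBand, this is better kept per-session and unset afterwards than placed in a login script.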

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Jason Maldonis
Ralph, the problem *does* go away if I add "-mca btl tcp,sm,self" to the mpiexec command line. (By the way, I am using mpiexec rather than mpirun; do you recommend one over the other?) Can you tell me what this means for me? For example, should I always append these arguments to mpiexec for my

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Nathan Hjelm
That message is coming from udcm in the openib BTL. It indicates some sort of failure in the connection mechanism. It can happen if the listening thread no longer exists or is taking too long to process messages.
-Nathan
> On Jun 14, 2016, at 12:20 PM, Ralph Castain wrote:

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Ralph Castain
Hmm… I’m unable to replicate the problem on my machines. What fabric are you using? Does the problem go away if you add “-mca btl tcp,sm,self” to the mpirun command line?
> On Jun 14, 2016, at 11:15 AM, Jason Maldonis wrote:
> Hi Ralph, et al., Great, thank you for the

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Jason Maldonis
Hi Ralph, et al., Great, thank you for the help. I downloaded the MPI loop_spawn test directly from what I think is the master repo on GitHub: https://github.com/open-mpi/ompi/blob/master/orte/test/mpi/loop_spawn.c I am still using the MPI code from 1.10.2, however. Is that test updated with the

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Ralph Castain
I dug into this a bit (with some help from others) and found that the spawn code appears to be working correctly; it is the test in orte/test that is wrong. The test has been correctly updated in the 2.x and master repos, but we failed to backport it to the 1.10 series. I have done so this

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-13 Thread Ralph Castain
No, that PR has nothing to do with loop_spawn. I’ll try to take a look at the problem.
> On Jun 13, 2016, at 3:47 PM, Jason Maldonis wrote:
> Hello, I am using OpenMPI 1.10.2 compiled with Intel. I am trying to get the spawn functionality to work inside a for loop,

[OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-13 Thread Jason Maldonis
Hello, I am using OpenMPI 1.10.2 compiled with Intel. I am trying to get the spawn functionality to work inside a for loop, but continue to get the error "too many retries sending message to , giving up" somewhere down the line in the for loop, seemingly because the processors are not being fully

Re: [OMPI users] runtime error with openmpi-v2.x-dev-958-g7e94425

2016-02-20 Thread Ralph Castain
On https://github.com/open-mpi/ompi/pull/1385 Gilles indicated he would update the patch and commit it on Monday.
> On Feb 20, 2016, at 12:48 AM, Siegmar Gross wrote:
> Hi Gilles, do you know when

Re: [OMPI users] runtime error with openmpi-v2.x-dev-958-g7e94425

2016-02-20 Thread Siegmar Gross
Hi Gilles, do you know when fixes for the problems will be ready? They still exist in the current version.
tyr spawn 136 ompi_info | grep -e "Open MPI repo revision" -e "C compiler absolute"
Open MPI repo revision: v2.x-dev-1108-gaaf15d9
C compiler absolute:

Re: [OMPI users] runtime error with openmpi-v2.x-dev-958-g7e94425

2016-01-15 Thread Gilles Gouaillardet
Siegmar, the fix is now being discussed at https://github.com/open-mpi/ompi/pull/1285 and the other error you reported (MPI_Comm_spawn hanging on a heterogeneous cluster) is being discussed at https://github.com/open-mpi/ompi/pull/1292
Cheers, Gilles
On 1/14/2016 11:06 PM, Siegmar Gross

[OMPI users] runtime error with openmpi-v2.x-dev-958-g7e94425

2016-01-14 Thread Siegmar Gross
Hi, I've successfully built openmpi-v2.x-dev-958-g7e94425 on my machine (SUSE Linux Enterprise Server 12.0 x86_64) with gcc-5.2.0 and Sun C 5.13. Unfortunately, I get a runtime error for all programs if I use the Sun compiler. Most of my small programs work as expected with the GNU compiler. I used the

[OMPI users] runtime error with openmpi-v1.10.1-140-g31ff573

2016-01-14 Thread Siegmar Gross
Hi, I've successfully built openmpi-v1.10.1-140-g31ff573 on my machine (SUSE Linux Enterprise Server 12.0 x86_64) with gcc-5.2.0 and Sun C 5.13. Unfortunately, I get a runtime error for a small program spawning a process. Everything works as expected with my programs "spawn_multiple_master" and

Re: [OMPI users] runtime error

2011-02-14 Thread Jeff Squyres
What happens if you try to mpirun a non-MPI program like "date" or "hostname"?
On Feb 11, 2011, at 6:14 AM, Marcela Castro León wrote:
> Excuse me, I forgot the attachment.
> 2011/2/11 Marcela Castro León
> Hello: I have the same version of Ubuntu 10.04. The original

Re: [OMPI users] runtime error

2011-02-11 Thread Marcela Castro León
Excuse me, I forgot the attachment.
2011/2/11 Marcela Castro León
> Hello: I have the same version of Ubuntu 10.04. The original version was Ubuntu Server 9.1 (64-bit), and I upgraded both of them to 10.04.
> Yesterday I updated and upgraded them to the same level again. But I've

Re: [OMPI users] runtime error

2011-02-11 Thread Marcela Castro León
Hello: I have the same version of Ubuntu 10.04. The original version was Ubuntu Server 9.1 (64-bit), and I upgraded both of them to 10.04. Yesterday I updated and upgraded them to the same level again, but I still get the same error. The machines are exactly the same: HP Compaq with Intel Core i5.

Re: [OMPI users] runtime error

2011-02-10 Thread Jeff Squyres
I typically see these kinds of errors when there's an Open MPI version mismatch between the nodes, and/or if there are slightly different flavors of Linux installed on each node (i.e., you're technically in a heterogeneous situation, but you're trying to run a single application binary). Can
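One way to test the version-mismatch theory is to ask every node which Open MPI it would actually run. This is a minimal sketch, assuming passwordless ssh between the nodes and that `mpirun --version` prints a line of the form `mpirun (Open MPI) X.Y.Z`; the function names `ompi_version` and `check_versions` and the host names are hypothetical. It deliberately goes through non-interactive ssh, because the PATH seen there is usually closer to what a remote launch sees than an interactive login is.

```shell
#!/bin/sh
# Sketch: detect an Open MPI version mismatch between nodes.
# Assumes passwordless ssh and "mpirun (Open MPI) X.Y.Z" style output.
# Function names and host names are hypothetical.

# Read "mpirun --version" output on stdin and print only the version number.
# Prints nothing if no such line is found (e.g. mpirun not in PATH).
ompi_version() {
    sed -n 's/^mpirun (Open MPI) \([0-9][0-9.]*\).*$/\1/p' | head -n 1
}

# Query each host over non-interactive ssh and report whether they agree.
# Returns 0 only if every host reports the same version.
check_versions() {
    ref=""
    status=0
    for host in "$@"; do
        v=$(ssh "$host" mpirun --version 2>&1 | ompi_version)
        echo "$host: ${v:-mpirun not found}"
        if [ -z "$v" ]; then
            status=1            # missing mpirun counts as a problem
        elif [ -z "$ref" ]; then
            ref=$v              # first version seen becomes the reference
        elif [ "$v" != "$ref" ]; then
            status=1            # differs from the reference
        fi
    done
    if [ "$status" -eq 0 ]; then
        echo "versions match"
    else
        echo "MISMATCH or missing mpirun"
    fi
    return "$status"
}

# Example (hypothetical host names):
#   check_versions santacruz node2
```

This only covers the Open MPI half of Jeff's suggestion; differing Linux flavors would still need to be compared separately.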

Re: [OMPI users] runtime error

2011-02-10 Thread Marcela Castro León
Hello
> I have a program that always works fine, but I'm trying it on a new cluster and it fails when I execute it on more than one machine.
> I mean, if I execute it alone on each host, everything works fine.
> radic@santacruz:~/gaps/caso3-i1$ mpirun -np 3 ../test parcorto.txt
> But when I execute

Re: [OMPI users] Runtime error while running mpirun

2009-11-02 Thread Jeff Squyres
On Nov 2, 2009, at 7:43 AM, Shiqing Fan wrote:
> Because you were building Open MPI with libtool support, the problem could be that libtool is not loaded correctly. Could you check that the libtool bin directory is in the PATH environment variable? If Open MPI can't find the correct libtool

Re: [OMPI users] Runtime error while running mpirun

2009-11-02 Thread Shiqing Fan
: Mon 11/2/2009 7:55 PM To: Basant Lakhotiya (WT01 - Computing and Storage IPG) Cc: us...@open-mpi.org Subject: Re: [OMPI users] Runtime error while running mpirun
Hi Basant, could you please also check whether your Open MPI solution contains the mca_paffinity_windows project? And in the prope

Re: [OMPI users] Runtime error while running mpirun

2009-11-02 Thread Shiqing Fan
MPI Users Cc: Basant Lakhotiya (WT01 - Computing and Storage IPG) Subject: Re: [OMPI users] Runtime error while running mpirun
Hi Basant, the mca_paffinity_windowsd.dll is the debug version of mca_paffinity_windows.dll, but orterun.exe should know which one it can use when you build it.

Re: [OMPI users] Runtime error while running mpirun

2009-11-02 Thread Shiqing Fan
: Mon 11/2/2009 6:13 PM To: Open MPI Users Cc: Basant Lakhotiya (WT01 - Computing and Storage IPG) Subject: Re: [OMPI users] Runtime error while running mpirun
Hi Basant, the mca_paffinity_windowsd.dll is the debug version of mca_paffinity_windows.dll, but orterun.exe should know which one it

Re: [OMPI users] Runtime error while running mpirun

2009-11-02 Thread basant.lakhotiya
ine 570
Regards, Basant
From: Shiqing Fan [mailto:f...@hlrs.de] Sent: Mon 11/2/2009 6:13 PM To: Open MPI Users Cc: Basant Lakhotiya (WT01 - Computing and Storage IPG) Subject: Re: [OMPI users] Runtime error while running mpirun
Hi Bas

Re: [OMPI users] Runtime error while running mpirun

2009-11-02 Thread Shiqing Fan
s, Basant
From: Basant [mailto:basant.lakhot...@wipro.com] Sent: Mon 11/2/2009 12:14 PM To: 'Open MPI Users' Subject: RE: [OMPI users] Runtime error while running mpirun
Hi Terry, It's not creating mca_paffinity_w

Re: [OMPI users] Runtime error while running mpirun

2009-11-02 Thread basant.lakhotiya
r. Thanks, Basant
From: Basant [mailto:basant.lakhot...@wipro.com] Sent: Mon 11/2/2009 12:14 PM To: 'Open MPI Users' Subject: RE: [OMPI users] Runtime error while running mpirun
Hi Terry, It's not creating mca_paffinity_windows.dll, but there

Re: [OMPI users] Runtime error while running mpirun

2009-11-02 Thread basant.lakhotiya
] On Behalf Of Terry Dontje Sent: Friday, October 30, 2009 11:05 PM To: us...@open-mpi.org Subject: Re: [OMPI users] Runtime error while running mpirun
Hi Basant, I am not familiar with Windows builds of Open MPI. However, can you check whether your Open MPI build actually created an mca_paffinity_windows DLL? I

Re: [OMPI users] Runtime error while running mpirun

2009-10-30 Thread Terry Dontje
command you can run on the DLL (if it exists)?
--td
Subject: [OMPI users] Runtime error while running mpirun
From: basant.lakhotiya_at_[hidden]
Date: 2009-10

[OMPI users] Runtime error while running mpirun

2009-10-30 Thread basant.lakhotiya
Hi all, I compiled OpenMPI on Windows Server 2003 both through Cygwin and through CMake and Visual Studio. With both methods I compiled successfully. In Cygwin I configured with the following command:
./configure --enable-mca-no-build=timer-windows,memory_mallopt,maffinity,paffinity
without these

Re: [OMPI users] Runtime error only on one node.

2009-03-05 Thread Jeff Squyres
On Mar 5, 2009, at 7:05 PM, Shinta Bonnefoy wrote:
> Thanks, the option --mca btl ^openib works fine! Half of the cluster has InfiniBand/OpenFabrics (node49 to node96) and the other half (nodes 01 to 48) doesn't.
Ah... this explains things. I wonder if we have not

Re: [OMPI users] Runtime error only on one node.

2009-03-05 Thread Shinta Bonnefoy
:34 -0500
> From: Jeff Squyres <jsquy...@cisco.com>
> Subject: Re: [OMPI users] Runtime error only on one node.
> To: "Open MPI Users" <us...@open-mpi.org>
> Message-ID: <70d31c29-b711-419f-9973-73b41feb0...@cisco.com>
> Content-Type: text/plain; charset=U

Re: [OMPI users] Runtime error only on one node.

2009-03-05 Thread Jeff Squyres
Whoops; we shouldn't be seg faulting. :-\ The warning is exactly what it implies -- it found the OpenFabrics network stack but no functioning OpenFabrics-capable hardware. You can get rid of it (and the segv) by preventing the OpenFabrics BTL from running:
mpirun --mca btl ^openib
But what

Re: [OMPI users] Runtime Error

2006-07-26 Thread Michael Kluskens
Summary: you have to properly uninstall OpenMPI 1.0.2 before installing OpenMPI 1.1.
On Jul 26, 2006, at 7:05 AM, wrote:
> Updated to open_mpi-1.1. I get a runtime error on the application as follows: mca:base:component_find:unable to

[OMPI users] Runtime Error

2006-07-26 Thread robertmcbroom
Updated to open_mpi-1.1. I get the following runtime error in the application:
mca:base:component_find:unable to open:/usr/local/lip/openmpi/mca_pml_teg.so:undefined symbol:mca_ptl_base_modules_initialized
Open_mpi is compiled with g95 and gcc 4.0.3. The file is at the location indicated by the

[OMPI users] runtime error

2006-07-04 Thread Manal Helal
Hi, sorry for posting so much. I tried running and got this error; I assume this is the call stack leading up to the error:
Signal:11 info.si_errno:0(Success) si_code:2(SEGV_ACCERR)
Failing at addr:0x8059b73
[0] func:/usr/local/bin/openmpi/lib/libopal.so.0 [0xb7e76ed0]
[1]