Re: [OMPI users] Issues with Get/Put and IRecv

2007-03-20 Thread Mike Houston
Well, I've managed to get a working solution, but I'm not sure how I got 
there.  I built a test case that looked like a nice, simple version of 
what I was trying to do and it worked, so I moved the test code into my 
implementation and, lo and behold, it works.  I must have been doing 
something a little funky in the original pass, likely causing a stack 
smash somewhere or trying to do a get/put out of bounds.


If I have any more problems, I'll let y'all know.  I've tested pretty 
heavy usage up to 128 MPI processes across 16 nodes and things seem to 
be behaving.  I did notice that single-sided transfers seem to be a 
little slower than explicit send/recv, at least on GigE.  Once I do some 
more testing, I'll bring things up on IB and see how things are going.


-Mike

Mike Houston wrote:

Brian Barrett wrote:
  

On Mar 20, 2007, at 3:15 PM, Mike Houston wrote:

  

If I only do gets/puts, things seem to be working correctly with version
1.2.  However, if I have a posted Irecv on the target node and issue an
MPI_Get against that target, MPI_Test on the posted IRecv causes a
segfault:


Anyone have suggestions?  Sadly, I need to have IRecv's posted.  I'll
attempt to find a workaround, but it looks like the posted IRecv is
getting all the data of the MPI_Get from the other node.  It's like the
message tagging is getting ignored.  I've never tried posting two
different IRecv's with different message tags either...

  

Hi Mike -

I've spent some time this afternoon looking at the problem and have  
some ideas on what could be happening.  I don't think it's a data  
mismatch (the data intended for the IRecv getting delivered to the  
Get), but more a problem with the call to MPI_Test perturbing the  
progress flow of the one-sided engine.  I can see one or two places  
where it's possible this could happen, although I'm having trouble  
replicating the problem with any test case I can write.  Is it  
possible for you to share the code causing the problem (or some small  
test case)?  It would make me feel considerably better if I could  
really understand the conditions required to end up in a seg fault  
state.


Thanks,

Brian
  

Well, I can give you a Linux x86 binary if that would do it.  The code 
is huge as it's part of a much larger system, so there is no such thing 
as a simple case at the moment, and the code is in pieces and largely 
unrunnable now with all the hacking...


I basically have one thread spinning on an MPI_Test on a posted IRecv 
while being used as the target of the MPI_Get.  I'll see if I can hack 
together a simple version that breaks late tonight.  I've just played 
with posting a send to that IRecv, issuing the MPI_Get, handshaking, and 
then posting another IRecv, and the MPI_Test continues to eat it, but in 
a memcpy:


#0  0x001c068c in memcpy () from /lib/libc.so.6
#1  0x00e412d9 in ompi_convertor_pack (pConv=0x83c1198, iov=0xa0, 
out_size=0xaffc1fd8, max_data=0xaffc1fdc) at convertor.c:254
#2  0x00ea265d in ompi_osc_pt2pt_replyreq_send (module=0x856e668, 
replyreq=0x83c1180) at osc_pt2pt_data_move.c:411
#3  0x00ea0ebe in ompi_osc_pt2pt_component_fragment_cb 
(pt2pt_buffer=0x8573380) at osc_pt2pt_component.c:582

#4  0x00ea1389 in ompi_osc_pt2pt_progress () at osc_pt2pt_component.c:769
#5  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#6  0x00ea59e5 in ompi_osc_pt2pt_passive_unlock (module=0x856e668, 
origin=1, count=1) at osc_pt2pt_sync.c:60
#7  0x00ea0cd2 in ompi_osc_pt2pt_component_fragment_cb 
(pt2pt_buffer=0x856f300) at osc_pt2pt_component.c:688

#8  0x00ea1389 in ompi_osc_pt2pt_progress () at osc_pt2pt_component.c:769
#9  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#10 0x00e33f05 in ompi_request_test (rptr=0xaffc2430, 
completed=0xaffc2434, status=0xaffc23fc) at request/req_test.c:82
#11 0x00e61770 in PMPI_Test (request=0xaffc2430, completed=0xaffc2434, 
status=0xaffc23fc) at ptest.c:52


-Mike

  


Re: [OMPI users] multithreading support

2007-03-20 Thread Jeff Squyres

On Mar 16, 2007, at 1:35 AM, Chevchenkovic Chevchenkovic wrote:


Could someone let me know about the status of multithread support in
Open MPI and MVAPICH.  I got some details about MVAPICH which say that
it is supported in MVAPICH2, but I am not sure of the same for
Open MPI.


Open MPI's threading support has had "light testing", at best.  In 
reality, it probably will not work.  Threading support was designed 
into the system from the beginning, but we have not really gotten around 
to debugging / testing it yet.  It is possible that we will do so 
over the next few months; a few of the Open MPI member organizations 
have indicated that they will be working on threading support for the v1.3 
series (probably towards the end of this year -- v1.3 is very much in 
the planning/definition stage at this point).
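
(For context, "threading support" here boils down to MPI_THREAD_MULTIPLE.
A minimal, implementation-neutral sketch of how an application requests a
thread level at startup and checks what the library actually granted:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* Request full multi-threaded support; the library reports the
       level it actually provides, which may be lower. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided < MPI_THREAD_MULTIPLE)
        printf("MPI_THREAD_MULTIPLE not provided (got level %d)\n", provided);

    MPI_Finalize();
    return 0;
}
)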


We cannot really comment on MVAPICH2 here; it's an entirely different  
software project.  You'll probably want to post to their mailing list  
to get an answer.


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Signal 13

2007-03-20 Thread Jeff Squyres
FWIW, most LDAP installations I have seen have ended up doing the  
same thing -- if you have a large enough cluster, you have MPI jobs  
starting all the time, and rate control of a single job startup is  
not sufficient to avoid overloading your LDAP server.


The solutions that I have seen typically have a job fired once a day 
via cron that dumps relevant information from LDAP into local 
/etc/passwd, shadow, and group files, and then simply use those for 
authentication across the cluster.


Hope that helps.


On Mar 18, 2007, at 8:34 PM, David Bronke wrote:


That's great to hear! For now we'll just create local users for those
who need access to MPI on this system, but I'll keep an eye on the
list for when you do get a chance to finish that fix. Thanks again!

On 3/18/07, Ralph Castain  wrote:
Excellent! Yes, we use pipe in several places, including in the run-time
during various stages of launch, so that could be a problem.

Also, be aware that other users have reported problems on LDAP-based systems
when attempting to launch large jobs.  The problem is that the OpenMPI launch
system has no rate control in it - and the LDAP slapd servers get
overwhelmed by the launch when we ssh to a large number of nodes.

I promised another user to concoct a fix for this problem, but am taking a
break from the project for a few months, so it may be a little while before a
fix is available.  When I do get it done, it may or may not make it into an
OpenMPI release for some time - I'm not sure how they will decide to
schedule the change (is it a "bug", or a new "feature"?).  So I may do an
interim release as a patch on the OpenRTE site (since that is the run-time
underneath OpenMPI).  I'll let people know via this mailing list either way.


Ralph



On 3/18/07 2:06 PM, "David Bronke"  wrote:


I just received an email from a friend who is helping me work on
resolving this; he was able to trace the problem back to a pipe() call
in OpenMPI apparently:


The problem is with the pipe() system call (which is invoked by the
MPI_Send() as far as I can tell) by an LDAP-authenticated user.  Still
working out where exactly that goes wrong, but the fact is that it isn't
actually a permissions problem - the reason it works as root is because
root is a local user and does normal /etc/passwd authentication.


I had forgotten to mention that we use LDAP for authentication on this
machine; PAM and NSS are set up to use it, but I'm guessing that
either OpenMPI itself or the pipe() system call won't check with them
when needed...  We have made some local users on the machine to get
things going, but I'll probably have to find an LDAP mailing list to
get this issue resolved.

Thanks for all the help so far!

On 3/16/07, Ralph Castain  wrote:
I'm afraid I have zero knowledge or experience with gentoo portage, so I
can't help you there.  I always install our releases from the tarball source
as it is pretty trivial to do and avoids any issues.

I will have to defer to someone who knows that system to help you from here.
It sounds like an installation or configuration issue.

Ralph



On 3/16/07 3:15 PM, "David Bronke"  wrote:


On 3/15/07, Ralph Castain  wrote:
Hmmm...well, a few thoughts to hopefully help with the debugging.  One
initial comment, though - 1.1.2 is quite old.  You might want to upgrade to
1.2 (releasing momentarily - you can use the last release candidate in the
interim as it is identical).


Version 1.2 doesn't seem to be in gentoo portage yet, so I may end up
having to compile from source...  I generally prefer to do everything
from portage if possible, because it makes upgrades and maintenance
much cleaner.

Meantime, looking at this output, there appear to be a couple of common
possibilities.  First, I don't see any of the diagnostic output from after we
do a local fork (we do this prior to actually launching the daemon).  Is it
possible your system doesn't allow you to fork processes (some don't, though
it's unusual)?


I don't see any problems with forking on this system...  I'm able to
start a dbus daemon as a regular user without any problems.

Second, it could be that the "orted" program isn't being found in your path.
People often forget that the path in shells started up by programs isn't
necessarily the same as that in their login shell.  You might try executing a
simple shellscript that outputs the results of "which orted" to verify this
is correct.


'which orted' from a shell script gives me '/usr/bin/orted', which
seems to be correct.

BTW, I should have asked as well: what are you running this on, and how did
you configure openmpi?


I'm running this on two identical machines with 2 dual-core
hyperthreading Xeon processors. (EM64T) I simply installed OpenMPI
using portage, with the USE flags "debug fortran pbs -threads".  
(I've

also tried it 

Re: [OMPI users] Issues with Get/Put and IRecv

2007-03-20 Thread Mike Houston



Brian Barrett wrote:

On Mar 20, 2007, at 3:15 PM, Mike Houston wrote:

  
If I only do gets/puts, things seem to be working correctly with version
1.2.  However, if I have a posted Irecv on the target node and issue an
MPI_Get against that target, MPI_Test on the posted IRecv causes a
segfault:


Anyone have suggestions?  Sadly, I need to have IRecv's posted.  I'll
attempt to find a workaround, but it looks like the posted IRecv is
getting all the data of the MPI_Get from the other node.  It's like the
message tagging is getting ignored.  I've never tried posting two
different IRecv's with different message tags either...



Hi Mike -

I've spent some time this afternoon looking at the problem and have  
some ideas on what could be happening.  I don't think it's a data  
mismatch (the data intended for the IRecv getting delivered to the  
Get), but more a problem with the call to MPI_Test perturbing the  
progress flow of the one-sided engine.  I can see one or two places  
where it's possible this could happen, although I'm having trouble  
replicating the problem with any test case I can write.  Is it  
possible for you to share the code causing the problem (or some small  
test case)?  It would make me feel considerably better if I could  
really understand the conditions required to end up in a seg fault  
state.


Thanks,

Brian
  
Well, I can give you a Linux x86 binary if that would do it.  The code 
is huge as it's part of a much larger system, so there is no such thing 
as a simple case at the moment, and the code is in pieces and largely 
unrunnable now with all the hacking...


I basically have one thread spinning on an MPI_Test on a posted IRecv 
while being used as the target of the MPI_Get.  I'll see if I can hack 
together a simple version that breaks late tonight.  I've just played 
with posting a send to that IRecv, issuing the MPI_Get, handshaking, and 
then posting another IRecv, and the MPI_Test continues to eat it, but in 
a memcpy:


#0  0x001c068c in memcpy () from /lib/libc.so.6
#1  0x00e412d9 in ompi_convertor_pack (pConv=0x83c1198, iov=0xa0, 
out_size=0xaffc1fd8, max_data=0xaffc1fdc) at convertor.c:254
#2  0x00ea265d in ompi_osc_pt2pt_replyreq_send (module=0x856e668, 
replyreq=0x83c1180) at osc_pt2pt_data_move.c:411
#3  0x00ea0ebe in ompi_osc_pt2pt_component_fragment_cb 
(pt2pt_buffer=0x8573380) at osc_pt2pt_component.c:582

#4  0x00ea1389 in ompi_osc_pt2pt_progress () at osc_pt2pt_component.c:769
#5  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#6  0x00ea59e5 in ompi_osc_pt2pt_passive_unlock (module=0x856e668, 
origin=1, count=1) at osc_pt2pt_sync.c:60
#7  0x00ea0cd2 in ompi_osc_pt2pt_component_fragment_cb 
(pt2pt_buffer=0x856f300) at osc_pt2pt_component.c:688

#8  0x00ea1389 in ompi_osc_pt2pt_progress () at osc_pt2pt_component.c:769
#9  0x00aa3019 in opal_progress () at runtime/opal_progress.c:288
#10 0x00e33f05 in ompi_request_test (rptr=0xaffc2430, 
completed=0xaffc2434, status=0xaffc23fc) at request/req_test.c:82
#11 0x00e61770 in PMPI_Test (request=0xaffc2430, completed=0xaffc2434, 
status=0xaffc23fc) at ptest.c:52


-Mike


Re: [OMPI users] Issues with Get/Put and IRecv

2007-03-20 Thread Brian Barrett

On Mar 20, 2007, at 3:15 PM, Mike Houston wrote:

If I only do gets/puts, things seem to be working correctly with version
1.2.  However, if I have a posted Irecv on the target node and issue an
MPI_Get against that target, MPI_Test on the posted IRecv causes a
segfault:


Anyone have suggestions?  Sadly, I need to have IRecv's posted.  I'll
attempt to find a workaround, but it looks like the posted IRecv is
getting all the data of the MPI_Get from the other node.  It's like the
message tagging is getting ignored.  I've never tried posting two
different IRecv's with different message tags either...


Hi Mike -

I've spent some time this afternoon looking at the problem and have  
some ideas on what could be happening.  I don't think it's a data  
mismatch (the data intended for the IRecv getting delivered to the  
Get), but more a problem with the call to MPI_Test perturbing the  
progress flow of the one-sided engine.  I can see one or two places  
where it's possible this could happen, although I'm having trouble  
replicating the problem with any test case I can write.  Is it  
possible for you to share the code causing the problem (or some small  
test case)?  It would make me feel considerably better if I could  
really understand the conditions required to end up in a seg fault  
state.


Thanks,

Brian


[OMPI users] Issues with Get/Put and IRecv

2007-03-20 Thread Mike Houston
If I only do gets/puts, things seem to be working correctly with version 
1.2.  However, if I have a posted Irecv on the target node and issue an 
MPI_Get against that target, MPI_Test on the posted IRecv causes a segfault:


[expose:21249] *** Process received signal ***
[expose:21249] Signal: Segmentation fault (11)
[expose:21249] Signal code: Address not mapped (1)
[expose:21249] Failing at address: 0xa0
[expose:21249] [ 0] [0x96e440]
[expose:21249] [ 1] 
/usr/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_replyreq_send+0xed) 
[0x2c765d]

[expose:21249] [ 2] /usr/lib/openmpi/mca_osc_pt2pt.so [0x2c5ebe]
[expose:21249] [ 3] 
/usr/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_progress+0x119) [0x2c6389]

[expose:21249] [ 4] /usr/lib/libopen-pal.so.0(opal_progress+0x69) [0x67d019]
[expose:21249] [ 5] 
/usr/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_passive_unlock+0xb5) 
[0x2ca9e5]

[expose:21249] [ 6] /usr/lib/openmpi/mca_osc_pt2pt.so [0x2c5cd2]
[expose:21249] [ 7] 
/usr/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_progress+0x119) [0x2c6389]

[expose:21249] [ 8] /usr/lib/libopen-pal.so.0(opal_progress+0x69) [0x67d019]
[expose:21249] [ 9] /usr/lib/libmpi.so.0(ompi_request_test+0x35) [0x3d6f05]
[expose:21249] [10] /usr/lib/libmpi.so.0(PMPI_Test+0x80) [0x404770]

Anyone have suggestions?  Sadly, I need to have IRecv's posted.  I'll 
attempt to find a workaround, but it looks like the posted IRecv is 
getting all the data of the MPI_Get from the other node.  It's like the 
message tagging is getting ignored.  I've never tried posting two 
different IRecv's with different message tags either...
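
For reference, here is a minimal sketch of the access pattern in question.
This is not the real application code: the buffer sizes, the tag, and the
passive-target lock/unlock epoch on the origin side are assumptions,
inferred from the ompi_osc_pt2pt_passive_unlock frame in the trace above.

#include <mpi.h>

#define N 1024

int main(int argc, char **argv)
{
    int rank, flag = 0;
    int win_buf[N], recv_buf[N], get_buf[N];
    MPI_Win win;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Rank 0 exposes a window that rank 1 reads with MPI_Get. */
    MPI_Win_create(win_buf, (MPI_Aint)sizeof(win_buf), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    if (rank == 0) {
        /* Post a receive on an unrelated tag, then spin on MPI_Test
           while the one-sided traffic is in flight. */
        MPI_Irecv(recv_buf, N, MPI_INT, 1, 42, MPI_COMM_WORLD, &req);
        while (!flag)
            MPI_Test(&req, &flag, MPI_STATUS_IGNORE);  /* reported segfault is in here */
    } else if (rank == 1) {
        /* Passive-target get against rank 0's window. */
        MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
        MPI_Get(get_buf, N, MPI_INT, 0, 0, N, MPI_INT, win);
        MPI_Win_unlock(0, win);

        /* Finally satisfy rank 0's outstanding receive so it can exit. */
        MPI_Send(get_buf, N, MPI_INT, 0, 42, MPI_COMM_WORLD);
    }

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}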


-Mike


Re: [OMPI users] v1.2 Bus Error (/tmp usage)

2007-03-20 Thread Ralph Castain
One option would be to amend your mpirun command with -mca btl ^sm. This
turns off the shared memory subsystem, so you'll see some performance loss
in your collectives. However, it will reduce your /tmp usage to almost
nothing.
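
For example (the exact command line is just an illustration; substitute your
own executable and process count):

  mpirun -np 4 -mca btl ^sm ./a.out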

Others may suggest alternative solutions.
Ralph



On 3/20/07 2:32 PM, "Hugh Merz"  wrote:

> Good Day,
> 
>   I'm using Open MPI on a diskless cluster (/tmp is part of a 1m ramdisk), and
> I found that after upgrading from v1.1.4 to v1.2 that jobs using np > 4 would
> fail to start during MPI_Init, due to what appears to be a lack of space in
> /tmp.  The error output is:
> 
> -
> 
> [tpb200:32193] *** Process received signal ***
> [tpb200:32193] Signal: Bus error (7)
> [tpb200:32193] Signal code:  (2)
> [tpb200:32193] Failing at address: 0x2a998f4120
> [tpb200:32193] [ 0] /lib64/tls/libpthread.so.0 [0x2a95f6e430]
> [tpb200:32193] [ 1]
> /opt/openmpi/1.2.gcc3/lib/libmpi.so.0(ompi_free_list_grow+0x138)
> [0x2a9568abc8]
> [tpb200:32193] [ 2]
> /opt/openmpi/1.2.gcc3/lib/libmpi.so.0(ompi_free_list_resize+0x2d)
> [0x2a9568b0dd]
> [tpb200:32193] [ 3]
> /opt/openmpi/1.2.gcc3/lib/openmpi/mca_btl_sm.so(mca_btl_sm_add_procs_same_base
> _addr+0x6bf) [0x2a98ba419f]
> [tpb200:32193] [ 4]
> /opt/openmpi/1.2.gcc3/lib/openmpi/mca_bml_r2.so(mca_bml_r2_add_procs+0x28a)
> [0x2a9899a4fa]
> [tpb200:32193] [ 5]
> /opt/openmpi/1.2.gcc3/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xe8)
> [0x2a98889308]
> [tpb200:32193] [ 6] /opt/openmpi/1.2.gcc3/lib/libmpi.so.0(ompi_mpi_init+0x45d)
> [0x2a956a32ed]
> [tpb200:32193] [ 7] /opt/openmpi/1.2.gcc3/lib/libmpi.so.0(MPI_Init+0x93)
> [0x2a956c5c93]
> [tpb200:32193] [ 8] a.out(main+0x1c) [0x400a44]
> [tpb200:32193] [ 9] /lib64/tls/libc.so.6(__libc_start_main+0xdb)
> [0x2a960933fb]
> [tpb200:32193] [10] a.out [0x40099a]
> [tpb200:32193] *** End of error message ***
> 
> ... lots of the above for each process ...
> 
> mpirun noticed that job rank 0 with PID 32040 on node tpb200 exited on signal
> 7 (Bus error). 
> 
> --/--
> 
>   If I increase the size of my ramdisk or point $TMP to a network filesystem
> then jobs start and complete fine, so it's not a showstopper, but with v1.1.4
> (or LAM v7.1.2) I didn't encounter this issue with my default 1m ramdisk (even
> with np > 100 ).  Is there a way to limit /tmp usage in Open MPI v1.2?
> 
> Hugh 




[OMPI users] CFP: 2007 IEEE International Conference on Cluster Computing (Cluster2007)

2007-03-20 Thread Ron Brightwell

***

Call for Papers

2007 IEEE International Conference on Cluster Computing
  (Cluster2007)

17 - 21 September 2007

   Austin, Texas, USA

   http://www.cluster2007.org/

***

In less than a decade, cluster computing has become the mainstream technology
for High Performance Computing and Information Technology. It has gained
this prominence by providing reliable, robust and cost-effective platforms
for solving many complex computational problems, accessing and visualizing
data, and providing information services. Cluster 2007 is hosted by Texas
Advanced Computing Center (TACC) in the culturally rich, high-tech city of
Austin, Texas. Here you will experience an open forum with fellow cluster
researchers, system designers and installers, and users for presenting and
discussing new directions, opportunities and ideas that will shape Cluster
Computing. Cluster 2007 welcomes paper and poster submissions on innovative
work from researchers in academia, industry and government describing original
research work in cluster computing.

The ability to aggregate the computing power of thousands of processors is a
significant milestone in the scalability of commodity systems. Nevertheless,
the ability to use both small and large systems efficiently is an ongoing
effort in the areas of Networking, Management, Interconnects, and Application
Optimization. Continued vigilance and assessment of R&D efforts is important
to ensure that Cluster Computing will harness the new technological advances
in hardware and software to solve the challenges of our age, and the next
generation.

Topics of interest are (but not limited to):

  - Cluster Software and Middleware
  - Software Environments and Tools
  - Single-System Image Services
  - Parallel File Systems and I/O Libraries
  - Standard Software for Clusters
  - Cluster Networking
  - High-Speed Interconnects
  - High Performance Message Passing Libraries
  - Lightweight Communication Protocols
  - Applications
  - Application Methods and Algorithms
  - Adaptation to Multi-Core
  - Data Distribution, Load Balancing & Scaling
  - MPI/OpenMP Hybrid Computing
  - Visualization
  - Performance Analysis and Evaluation
  - Benchmarking & Profiling Tools
  - Performance Prediction & Modeling
  - Cluster Management
  - Security and Reliability
  - High Availability Solutions
  - Resource and Job Management
  - Administration and Maintenance Tools

Paper Submission:

Paper Format: Since the camera-ready version of accepted papers must be
compliant with the IEEE Xplore format for publication, submitted papers
must conform to the following Xplore layout, page limit, and font size. This
will ensure size consistency and a uniform layout for the reviewers. (With
minimal changes, accepted documents can be styled for publication according
to Xplore requirements explained in the Xplore formatting guide, which is
also in Xplore format).

  - PDF files only.
  - Maximum 10 pages for Technical Papers, maximum 6 pages for Posters.
  - Single-spaced
  - 2-column numbered pages in IEEE Xplore format
  - (8.5x11-inch paper, margins in inches-- top: 0.75, bottom: 1.0,
sides: 0.625, and between columns: 0.25, main text: 10pt).
  - Format instructions are available for: LaTeX, Word document, PDF
files.
  - Margin and placement guides are available in: Word, PDF and
postscript files.
  - Concerning the final camera-ready version: Maximum of 2 extra pages
at $100/page. Camera-ready means PDF file must comply with IEEE
Xplore formatting and style for publication.
  - A conversion tool kit for converting from Word, LaTeX, and
PostScript and checking compliance will be available by April 11.
See the Final Submission section then.
  - Electronic Submission: Only web-based submission is accepted.
The URL will be announced two weeks before the submission deadline,
on the Cluster2007 web page.

In addition to the normal technical paper sessions, we plan to organize vendor
sessions and industrial exhibitions. Companies interested in participating
in the vendor sessions or presenting their exhibits at the meeting or both,
should contact the Exhibits Chair member, Ivan R. Judson (jud...@mcs.anl.gov)
by July 13, 2007.

Important Dates:

  Technical paper submissions:    11 May 2007
  Last minute paper abstracts:    11 May 2007
  Workshop/tutorial proposals:    11 May 2007

  Poster submissions:              8 Jun 2007
  Panel proposals:                 8 Jun 2007
  Workshop/tutorial notification:  8 Jun 2007

  Technical paper notification:   29 Jun 2007

  Poster notification:            13 Jul 2007
  Exhibit proposals:              13 Jul 2007
  Last minute papers:

Re: [OMPI users] mpirun exit status for non-existent executable

2007-03-20 Thread Tim Prins
Well that's not a good thing. I have filed a bug about this
(https://svn.open-mpi.org/trac/ompi/ticket/954) and will try to look into it
soon, but don't know when it will get fixed.


Thanks for bringing this to our attention!

Tim

On Mar 20, 2007, at 1:39 AM, Bill Saphir wrote:



If you ask mpirun to launch an executable that does not exist, it
fails, but returns an exit status of 0.
This makes it difficult to write scripts that invoke mpirun and need
to check for errors.
I'm wondering if a) this is considered a bug and b) whether it might
be fixed in a near term release.

Example:


orterun -np 2 asdflkj
--------------------------------------------------------------------------
Failed to find the following executable:

Host:   build-linux64
Executable: asdflkj

Cannot continue.
--------------------------------------------------------------------------

echo $?
0


I see this behavior for both 1.2 and 1.1.x.

Thanks for your help.

Bill





[OMPI users] mpirun exit status for non-existent executable

2007-03-20 Thread Bill Saphir


If you ask mpirun to launch an executable that does not exist, it  
fails, but returns an exit status of 0.
This makes it difficult to write scripts that invoke mpirun and need  
to check for errors.
I'm wondering if a) this is considered a bug and b) whether it might  
be fixed in a near term release.


Example:

> orterun -np 2 asdflkj
--------------------------------------------------------------------------
Failed to find the following executable:

Host:   build-linux64
Executable: asdflkj

Cannot continue.
--------------------------------------------------------------------------

> echo $?
0
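
Because the exit status is 0, a wrapper-script check along these lines (shown
purely as an illustration) never fires, even though the launch failed:

> orterun -np 2 asdflkj || echo "launch failed"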


I see this behavior for both 1.2 and 1.1.x.

Thanks for your help.

Bill