Re: [OMPI users] MPI_Comm_connect and singleton init

2006-03-14 Thread Edgar Gabriel

you are touching on a difficult area in Open MPI here:

- name publishing across independent jobs unfortunately does not work 
right now. (It does work if all processes were started by the same 
mpirun, or if they were spawned by a parent process using 
MPI_Comm_spawn.) Your approach of passing the port as a command line 
option should work, however.
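For reference, a minimal sketch of that pattern might look like the
following (the single-binary structure, variable names, and argument
handling here are only illustrative, not Rob's actual code): the
process acts as the accepting side when run without arguments, and as
the connecting side when given a port string on the command line.

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm intercomm;

    MPI_Init(&argc, &argv);

    if (argc < 2) {
        /* side A: open a port, print it, then block in accept */
        MPI_Open_port(MPI_INFO_NULL, port);
        printf("%s\n", port);
        fflush(stdout);
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);
        MPI_Close_port(port);
    } else {
        /* side B: take the port string from the command line */
        strncpy(port, argv[1], MPI_MAX_PORT_NAME - 1);
        port[MPI_MAX_PORT_NAME - 1] = '\0';
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);
    }

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}

Note that the port string may contain characters that are special to
the shell, so it is safest to quote it when passing it to B.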


- you do, however, have to start the orted daemon *before* starting 
both jobs, using the flags

  orted --seed --persistent --scope public

These flags are currently only lightly tested, since a brand new 
runtime environment with much better support for these operations is 
currently under development.
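So the order of operations would look roughly like this (a and b
standing in for your two binaries, and the quoted string being whatever
a printed to stdout):

  orted --seed --persistent --scope public
  ./a
  ./b '<port name printed by a>'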


- regarding the 'Pack data mismatch': do both machines you are using 
have the same data representation? I ask because this looks like a 
data type mismatch error, and Open MPI currently has some restrictions 
regarding different data formats and endianness...


Thanks
Edgar

Robert Latham wrote:


Hello
In playing around with process management routines, I found another
issue.  This one might very well be operator error, or something
implementation-specific.

I've got two processes (a and b), linked with openmpi, but started
independently (no mpiexec).  


- A starts up and calls MPI_Init
- A calls MPI_Open_port, prints out the port name to stdout, then
  calls MPI_Comm_accept and blocks.
- B takes the port name printed out by A as a command line
  argument.  It calls MPI_Init and then passes that port name
  to MPI_Comm_connect
- B gets the following error:

[leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Pack data mismatch
in file ../../../orte/dps/dps_unpack.c at line 121
[leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Pack data mismatch
in file ../../../orte/dps/dps_unpack.c at line 95
[leela.mcs.anl.gov:04177] *** An error occurred in MPI_Comm_connect
[leela.mcs.anl.gov:04177] *** on communicator MPI_COMM_WORLD
[leela.mcs.anl.gov:04177] *** MPI_ERR_UNKNOWN: unknown error
[leela.mcs.anl.gov:04177] *** MPI_ERRORS_ARE_FATAL (goodbye)
[leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Not found in file
../../../../../orte/mca/pls/base/pls_base_proxy.c at line 183

- A is still waiting for someone to connect to it.

Did I pass MPI port strings between programs the correct way, or is
MPI_Publish_name/MPI_Lookup_name the preferred way to pass around this
information?

Thanks
==rob



--
Edgar Gabriel
Assistant Professor
Department of Computer Science  email: gabr...@cs.uh.edu
University of Houston   http://www.cs.uh.edu/~gabriel
Philip G. Hoffman Hall, Room 524Tel: +1 (713) 743-3857
Houston, TX-77204, USA  Fax: +1 (713) 743-3335


[OMPI users] MPI_Comm_connect and singleton init

2006-03-14 Thread Robert Latham
Hello
In playing around with process management routines, I found another
issue.  This one might very well be operator error, or something
implementation-specific.

I've got two processes (a and b), linked with openmpi, but started
independently (no mpiexec).  

- A starts up and calls MPI_Init
- A calls MPI_Open_port, prints out the port name to stdout, then
  calls MPI_Comm_accept and blocks.
- B takes the port name printed out by A as a command line
  argument.  It calls MPI_Init and then passes that port name
  to MPI_Comm_connect
- B gets the following error:

[leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Pack data mismatch
in file ../../../orte/dps/dps_unpack.c at line 121
[leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Pack data mismatch
in file ../../../orte/dps/dps_unpack.c at line 95
[leela.mcs.anl.gov:04177] *** An error occurred in MPI_Comm_connect
[leela.mcs.anl.gov:04177] *** on communicator MPI_COMM_WORLD
[leela.mcs.anl.gov:04177] *** MPI_ERR_UNKNOWN: unknown error
[leela.mcs.anl.gov:04177] *** MPI_ERRORS_ARE_FATAL (goodbye)
[leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Not found in file
../../../../../orte/mca/pls/base/pls_base_proxy.c at line 183

- A is still waiting for someone to connect to it.

Did I pass MPI port strings between programs the correct way, or is
MPI_Publish_name/MPI_Lookup_name the preferred way to pass around this
information?

Thanks
==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B