Re: [OMPI users] MPI_Put/Get with many nested derived type

2016-05-23 Thread Akihiro Tabuchi

Hi George,

Thanks for your response.
I confirmed that your commit fixed the issue.

Regards,
Akihiro

On 2016/05/22 5:05, George Bosilca wrote:

16d9f71d01cc should provide a fix for this issue.

   George.


On Sat, May 21, 2016 at 12:08 PM, Akihiro Tabuchi wrote:

Hi Gilles,

Thanks for your quick response and patch.

After applying the patch to 1.10.2, the test code and our program, which
uses a nested hvector type, ran without error.
I hope the patch will be applied to future releases.

Regards,
Akihiro


On 2016/05/21 23:15, Gilles Gouaillardet wrote:

Here are attached two patches (one for master, one for v1.10)

Please consider these experimental:
- they cannot hurt
- they might not always work
- they will likely allocate a bit more memory than necessary
- if something goes wrong, it will hopefully be caught soon enough by a
new assert clause

Cheers,

Gilles

On Sat, May 21, 2016 at 9:19 PM, Gilles Gouaillardet wrote:

Tabuchi-san,

thanks for the report.

This is indeed a bug, which I was able to reproduce on my Linux laptop
(for some unknown reason, there is no crash on OS X).

ompi_datatype_pack_description_length mallocs 88 bytes for the datatype
description, but 96 bytes are required.
This causes memory corruption with undefined side effects (a crash in
MPI_Type_free, or in MPI_Win_unlock).

IIRC, we made some changes to ensure data is always aligned (SPARC
processors require this), and we could have missed some stuff, and hence
malloc fewer bytes than required.
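To illustrate the alignment effect Gilles describes, here is a toy sketch (not the actual Open MPI packing code; the element sizes below are made up): rounding each element of a description up to an 8-byte boundary makes the packed size exceed the naive sum of the element sizes.

```c
#include <stddef.h>

/* Toy model of an aligned packed layout: each element must start on an
 * "align"-byte boundary, so the total packed size can exceed the plain
 * sum of the element sizes. */
size_t packed_size(const size_t *elem_sizes, int n, size_t align)
{
    size_t off = 0;
    for (int i = 0; i < n; i++) {
        off = (off + align - 1) / align * align;  /* round up to boundary */
        off += elem_sizes[i];
    }
    return off;
}
```

With made-up element sizes {4, 4, 80}, the unaligned sum is 88 bytes while the 8-byte-aligned layout needs 96 — the same kind of gap as the 88-vs-96 mismatch described above.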


Cheers,

Gilles

On Sat, May 21, 2016 at 5:50 PM, Akihiro Tabuchi wrote:

Hi,

With Open MPI 1.10.2, MPI_Type_free crashes with a deeply nested derived
type after using MPI_Put/Get with that datatype as target_datatype.
The test code is attached.
In the code, MPI_Type_free crashes if N_NEST >= 4.

This problem occurs with Open MPI 1.8.5 or later.
There is no problem with Open MPI 1.8.4, MPICH 3.2, or MVAPICH 2.1.

Does anyone know about the problem?

Regards,
Akihiro

--
Akihiro Tabuchi
HPCS Lab, Univ. of Tsukuba
tabu...@hpcs.cs.tsukuba.ac.jp 


___
users mailing list
us...@open-mpi.org 
Subscription: 
https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/05/29260.php


___
users mailing list
us...@open-mpi.org 
Subscription: 
https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/05/29262.php



--
Akihiro Tabuchi
HPCS Lab, Univ. of Tsukuba
tabu...@hpcs.cs.tsukuba.ac.jp 
___
users mailing list
us...@open-mpi.org 
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/05/29263.php




___
users mailing list
us...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/05/29266.php




--
Akihiro Tabuchi
HPCS Lab, Univ. of Tsukuba
tabu...@hpcs.cs.tsukuba.ac.jp


[OMPI users] another problem with slot-list and openmpi-v2.x-dev-1441-g402abf9

2016-05-23 Thread Siegmar Gross

Hi,

I installed openmpi-v2.x-dev-1441-g402abf9 on my "SUSE Linux Enterprise
Server 12 (x86_64)" with Sun C 5.14  and gcc-6.1.0. Unfortunately I
don't get the expected output with "--slot-list". It's the same behaviour
for both compilers.


loki hello_2 114 ompi_info | grep -e "OPAL repo revision:" -e "C compiler absolute:"
  OPAL repo revision: v2.x-dev-1441-g402abf9
 C compiler absolute: /usr/local/gcc-6.1.0/bin/gcc



I get the expected output for 1 slave process, although it takes very
long (11 seconds).

loki hello_2 115 time mpiexec --slot-list 0:0-5,1:0-5 --host loki -np 1 hello_2_mpi : --host loki -np 1 hello_2_slave_mpi

Process 0 of 2 running on loki
Process 1 of 2 running on loki


Now 1 slave tasks are sending greetings.

Greetings from task 1:
  message type:3
  msg length:  132 characters
  message:
hostname:  loki
operating system:  Linux
release:   3.12.55-52.42-default
processor: x86_64

11.680u 1.416s 0:13.07 100.1%   0+0k 0+824io 4pf+0w




I don't get the expected output for two slave processes.

loki hello_2 116 time mpiexec --slot-list 0:0-5,1:0-5 --host loki -np 1 hello_2_mpi : --host loki -np 2 hello_2_slave_mpi

Process 0 of 2 running on loki
Process 1 of 2 running on loki


Now 1 slave tasks are sending greetings.

Greetings from task 1:
  message type:3
  msg length:  132 characters
  message:
hostname:  loki
operating system:  Linux
release:   3.12.55-52.42-default
processor: x86_64

21.744u 2.348s 0:24.07 100.0%   0+0k 0+728io 4pf+0w




I get no output and the program doesn't terminate for three slave processes.

loki hello_2 117 time mpiexec --slot-list 0:0-5,1:0-5 --host loki -np 1 hello_2_mpi : --host loki -np 3 hello_2_slave_mpi

^C968.460u 51.124s 5:42.13 298.0%   0+0k 0+984io 5pf+0w
loki hello_2 118



I would be grateful if somebody could fix the problem. Thank you
very much in advance for any help.


Kind regards

Siegmar
/* Another MPI-version of the "hello world" program, which delivers
 * some information about its machine and operating system. In this
 * version the functions "master" and "slave" from "hello_1_mpi.c"
 * are implemented as independent processes. This is the file for the
 * "master".
 *
 *
 * Compiling:
 *   Store executable(s) into local directory.
 * mpicc -o <program> <source file(s)>
 *
 *   Store executable(s) into predefined directories.
 * make
 *
 *   Make program(s) automatically on all specified hosts. You must
 *   edit the file "make_compile" and specify your host names before
 *   you execute it.
 * make_compile
 *
 * Running:
 *   LAM-MPI:
 * mpiexec -boot -np <number of processes> <program>
 * or
 * mpiexec -boot \
 *	 -host <hostname> -np <number of processes> <program> : \
 *	 -host <hostname> -np <number of processes> <program>
 * or
 * mpiexec -boot [-v] -configfile <configfile>
 * or
 * lamboot [-v] [<hostfile>]
 *   mpiexec -np <number of processes> <program>
 *	 or
 *	 mpiexec [-v] -configfile <configfile>
 * lamhalt
 *
 *   OpenMPI:
 * "host1", "host2", and so on can all have the same name,
 * if you want to start a virtual computer with some virtual
 * cpu's on the local host. The name "localhost" is allowed
 * as well.
 *
 * mpiexec -np <number of processes> <program>
 * or
 * mpiexec --host <host1>,<host2>,... \
 *	 -np <number of processes> <program>
 * or
 * mpiexec -hostfile <hostfile> \
 *	 -np <number of processes> <program>
 * or
 * mpiexec -app <application schema file>
 *
 * Cleaning:
 *   local computer:
 * rm <program name(s)>
 * or
 * make clean_all
 *   on all specified computers (you must edit the file "make_clean_all"
 *   and specify your host names before you execute it.
 * make_clean_all
 *
 *
 * File: hello_2_mpi.c		   	Author: S. Gross
 * Date: 01.10.2012
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "mpi.h"

#define	BUF_SIZE	255		/* message buffer size		*/
#define	MAX_TASKS	12		/* max. number of tasks		*/
#define	SENDTAG		1		/* send message command		*/
#define	EXITTAG		2		/* termination command		*/
#define	MSGTAG		3		/* normal message token		*/

#define ENTASKS		-1		/* error: too many tasks	*/

int main (int argc, char *argv[])
{
  int  mytid,				/* my task id			*/
       ntasks,				/* number of parallel tasks	*/
       namelen,				/* length of processor name	*/
       num,				/* number of chars in buffer	*/
       i;				/* loop variable		*/
  char processor_name[MPI_MAX_PROCESSOR_NAME],
       buf[BUF_SIZE + 1];		/* message buffer (+1 for '\0')	*/
  MPI_Status	stat;			/* message details		*/

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
  MPI_Get_processor_name (processor_name, &namelen);
  /* With the next statement every process executing this code will
   * print one line on the display. It may happen that the lines will
   * get mixed up because the display is a critical section. In general
   * only one process (mostly the process with rank 0) will print on
   * the display and all other processes will send their messages to
   * this process. Nevertheless for debugging purposes (or to
   * demonstrate that it is possible) it may be useful if every
   * 

[OMPI users] problem with slot-list and openmpi-v2.x-dev-1441-g402abf9

2016-05-23 Thread Siegmar Gross

Hi,

I installed openmpi-v2.x-dev-1441-g402abf9 on my "SUSE Linux Enterprise
Server 12 (x86_64)" with Sun C 5.14  and gcc-6.1.0. Unfortunately I get
a timeout error for "--slot-list". It's the same behaviour for both
compilers.


loki spawn 143 mpiexec -np 1 --host loki,loki,loki,nfs1,nfs1 spawn_master

Parent process 0 running on loki
  I create 4 slave processes

Parent process 0: tasks in MPI_COMM_WORLD:1
  tasks in COMM_CHILD_PROCESSES local group:  1
  tasks in COMM_CHILD_PROCESSES remote group: 4

Slave process 0 of 4 running on loki
Slave process 1 of 4 running on loki
spawn_slave 1: argv[0]: spawn_slave
spawn_slave 0: argv[0]: spawn_slave
Slave process 2 of 4 running on nfs1
Slave process 3 of 4 running on nfs1
spawn_slave 2: argv[0]: spawn_slave
spawn_slave 3: argv[0]: spawn_slave


loki spawn 144 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master

Parent process 0 running on loki
  I create 4 slave processes

[loki:15594] OPAL ERROR: Timeout in file ../../../../openmpi-v2.x-dev-1441-g402abf9/opal/mca/pmix/base/pmix_base_fns.c at line 195

[loki:15594] *** An error occurred in MPI_Comm_spawn
[loki:15594] *** reported by process [2740518913,0]
[loki:15594] *** on communicator MPI_COMM_WORLD
[loki:15594] *** MPI_ERR_UNKNOWN: unknown error
[loki:15594] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[loki:15594] ***and potentially your MPI job)
loki spawn 145



I would be grateful if somebody could fix the problem. Thank you
very much in advance for any help.


Kind regards

Siegmar
/* The program demonstrates how to spawn some dynamic MPI processes.
 * This version uses one master process which creates some slave
 * processes.
 *
 * A process or a group of processes can create another group of
 * processes with "MPI_Comm_spawn ()" or "MPI_Comm_spawn_multiple ()".
 * In general it is best (better performance) to start all processes
 * statically with "mpiexec" via the command line. If you want to use
 * dynamic processes you will normally have one master process which
 * starts a lot of slave processes. In some cases it may be useful to
 * enlarge a group of processes, e.g., if the MPI universe provides
 * more virtual cpu's than the current number of processes and the
 * program may benefit from additional processes. You will use
 * "MPI_Comm_spwan_multiple ()" if you must start different
 * programs or if you want to start the same program with different
 * parameters.
 *
 * There are some reasons to prefer "MPI_Comm_spawn_multiple ()"
 * instead of calling "MPI_Comm_spawn ()" multiple times. If you
 * spawn new (child) processes they start up like any MPI application,
 * i.e., they call "MPI_Init ()" and can use the communicator
 * MPI_COMM_WORLD afterwards. This communicator contains only the
 * child processes which have been created with the same call of
 * "MPI_Comm_spawn ()" and which is distinct from MPI_COMM_WORLD
 * of the parent process or processes created in other calls of
 * "MPI_Comm_spawn ()". The natural communication mechanism between
 * the groups of parent and child processes is via an
 * inter-communicator which will be returned from the above
 * MPI functions to spawn new processes. The local group of the
 * inter-communicator contains the parent processes and the remote
 * group contains the child processes. The child processes can get
 * the same inter-communicator calling "MPI_Comm_get_parent ()".
 * Now it is obvious that calling "MPI_Comm_spawn ()" multiple
 * times will create many sets of children with different
 * communicators MPI_COMM_WORLD whereas "MPI_Comm_spawn_multiple ()"
 * creates child processes with a single MPI_COMM_WORLD. Furthermore
 * spawning several processes in one call may be faster than spawning
 * them sequentially and perhaps even the communication between
 * processes spawned at the same time may be faster than communication
 * between sequentially spawned processes.
 *
 * For collective operations it is sometimes easier if all processes
 * belong to the same intra-communicator. You can use the function
 * "MPI_Intercomm_merge ()" to merge the local and remote group of
 * an inter-communicator into an intra-communicator.
 * 
 *
 * Compiling:
 *   Store executable(s) into local directory.
 * mpicc -o <program> <source file(s)>
 *
 *   Store executable(s) into predefined directories.
 * make
 *
 *   Make program(s) automatically on all specified hosts. You must
 *   edit the file "make_compile" and specify your host names before
 *   you execute it.
 * make_compile
 *
 * Running:
 *   LAM-MPI:
 * mpiexec -boot -np <number of processes> <program>
 * or
 * mpiexec -boot \
 *	 -host <hostname> -np <number of processes> <program> : \
 *	 -host <hostname> -np <number of processes> <program>
 * or
 * mpiexec -boot [-v] -configfile <configfile>
 * or
 * lamboot [-v] [<hostfile>]
 *   mpiexec -np <number of processes> <program>
 *	 or
 *	 mpiexec [-v] -configfile <configfile>
 * lamhalt
 *
 *   OpenMPI:
 * "host1", "host2", and so on can all have the same name,
 * if you want to start a virtual 

[OMPI users] Open MPI does not work when MPICH or intel MPI are installed

2016-05-23 Thread Megdich Islem
Hi,
I am using two software packages: one is called OpenFOAM and the other is
called EMPIRE, and they need to run together at the same time. OpenFOAM
uses the Open MPI implementation, and EMPIRE uses either MPICH or Intel
MPI. The version of Open MPI that comes with OpenFOAM is 1.6.5. I am using
the Intel(R) MPI Library for Linux* OS, version 5.1.3, and MPICH 3.0.4.

My problem is that when I have the environment variables of either MPICH
or Intel MPI sourced in bashrc, I fail to run an OpenFOAM case with
parallel processing (you find attached a picture of the error I got).
This is an example of a command line I use to run OpenFOAM:

mpirun -np 4 interFoam -parallel

Once I keep the environment variables of OpenFOAM only, the parallel
processing works without any problem, but then I won't be able to run
EMPIRE.

I am sourcing the environment variables in this way:

For OpenFOAM:
source /opt/openfoam30/etc/bashrc

For MPICH 3.0.4:
export PATH=/home/islem/Desktop/mpich/bin:$PATH
export LD_LIBRARY_PATH="/home/islem/Desktop/mpich/lib/:$LD_LIBRARY_PATH"
export MPICH_F90=gfortran
export MPICH_CC=/opt/intel/bin/icc
export MPICH_CXX=/opt/intel/bin/icpc
export MPICH-LINK_CXX="-L/home/islem/Desktop/mpich/lib/ -Wl,-rpath
-Wl,/home/islem/Desktop/mpich/lib -lmpichcxx -lmpich -lopa -lmpl -lrt -lpthread"

For Intel:
export PATH=$PATH:/opt/intel/bin/
LD_LIBRARY_PATH="/opt/intel/lib/intel64:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH
source /opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpivars.sh intel64

If only OpenFOAM is sourced, mpirun --version gives Open MPI (1.6.5).
If OpenFOAM and MPICH are sourced, mpirun --version gives mpich 3.0.1.
If OpenFOAM and Intel MPI are sourced, mpirun --version gives the Intel(R)
MPI Library for Linux, version 5.1.3.

My question is why I can't have two MPI implementations installed and
sourced together. How can I solve the problem?

Regards,
Islem Megdiche





Re: [OMPI users] Open MPI does not work when MPICH or intel MPI are installed

2016-05-23 Thread Andy Riebs

  
  
Hi,

The short answer: Environment module files are probably the best
solution for your problem.

The long answer: See
,
which pretty much addresses your question.

Andy
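
For reference, a minimal environment module file for the MPICH install described in the quoted message below might look like this. This is a sketch only: the paths are taken from the poster's setup, and the module names in the `conflict` line are assumptions.

```tcl
#%Module1.0
## Hypothetical modulefile for the user's MPICH 3.0.4 install.
## Loading it prepends MPICH to the search paths; the conflict line
## prevents it from being loaded alongside another MPI module.
conflict        openmpi intel-mpi
prepend-path    PATH            /home/islem/Desktop/mpich/bin
prepend-path    LD_LIBRARY_PATH /home/islem/Desktop/mpich/lib
setenv          MPICH_CC        /opt/intel/bin/icc
setenv          MPICH_CXX       /opt/intel/bin/icpc
```

With `module load mpich` / `module unload mpich`, the environment can be switched per shell instead of being hard-wired in .bashrc.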

On 05/23/2016 07:40 AM, Megdich Islem wrote:


  
  
  ___
users mailing list
us...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29279.php


  



[OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-23 Thread Siegmar Gross

Hi,

I installed openmpi-1.10.3rc2 on my "SUSE Linux Enterprise Server
12 (x86_64)" with Sun C 5.13  and gcc-6.1.0. Unfortunately I get
a segmentation fault for "--slot-list" for one of my small programs.


loki spawn 119 ompi_info | grep -e "OPAL repo revision:" -e "C compiler absolute:"
  OPAL repo revision: v1.10.2-201-gd23dda8
 C compiler absolute: /usr/local/gcc-6.1.0/bin/gcc


loki spawn 120 mpiexec -np 1 --host loki,loki,loki,loki,loki spawn_master

Parent process 0 running on loki
  I create 4 slave processes

Parent process 0: tasks in MPI_COMM_WORLD:1
  tasks in COMM_CHILD_PROCESSES local group:  1
  tasks in COMM_CHILD_PROCESSES remote group: 4

Slave process 0 of 4 running on loki
Slave process 1 of 4 running on loki
Slave process 2 of 4 running on loki
spawn_slave 2: argv[0]: spawn_slave
Slave process 3 of 4 running on loki
spawn_slave 0: argv[0]: spawn_slave
spawn_slave 1: argv[0]: spawn_slave
spawn_slave 3: argv[0]: spawn_slave




loki spawn 121 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master

Parent process 0 running on loki
  I create 4 slave processes

[loki:17326] *** Process received signal ***
[loki:17326] Signal: Segmentation fault (11)
[loki:17326] Signal code: Address not mapped (1)
[loki:17326] Failing at address: 0x8
[loki:17326] [ 0] /lib64/libpthread.so.0(+0xf870)[0x7f4e469b3870]
[loki:17326] [ 1] *** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
[loki:17324] Local abort before MPI_INIT completed successfully; not able to 
aggregate error messages, and not able to guarantee that all other processes 
were killed!

/usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_proc_self+0x35)[0x7f4e46c165b0]
[loki:17326] [ 2] 
/usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_comm_init+0x68b)[0x7f4e46bf5b08]

[loki:17326] [ 3] *** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
[loki:17325] Local abort before MPI_INIT completed successfully; not able to 
aggregate error messages, and not able to guarantee that all other processes 
were killed!

/usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_mpi_init+0xa90)[0x7f4e46c1be8a]
[loki:17326] [ 4] 
/usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(MPI_Init+0x180)[0x7f4e46c5828e]

[loki:17326] [ 5] spawn_slave[0x40097e]
[loki:17326] [ 6] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f4e4661db05]
[loki:17326] [ 7] spawn_slave[0x400a54]
[loki:17326] *** End of error message ***
---
Child job 2 terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
---
--
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[56340,2],0]
  Exit code:1
--
loki spawn 122




I would be grateful if somebody could fix the problem. Thank you
very much in advance for any help.


Kind regards

Siegmar


Re: [OMPI users] problem about mpirun on two nodes

2016-05-23 Thread Jeff Squyres (jsquyres)
On May 21, 2016, at 11:31 PM, dour...@aol.com wrote:
> 
> I encountered a problem about mpirun and SSH when using OMPI 1.10.0 compiled 
> with gcc, running on centos7.2.
> When I execute mpirun on my 2 node cluster, I get the following errors pasted 
> below.
> 
> [douraku@master home]$ mpirun -np 12 a.out
> Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

This is the key right here: you got a permission-denied error when you
(presumably) tried to execute on the remote server.

Triple-check your ssh settings to ensure that you can run on the remote
server(s) without a password or interactive passphrase entry.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] Open MPI does not work when MPICH or intel MPI are installed

2016-05-23 Thread Gilles Gouaillardet
Modules are way more friendly than manually setting and exporting your
environment.
The issue here is that you are setting your environment in your .bashrc, and
that cannot work if your account is used with various MPI implementations
(unless your .bashrc checks a third-party variable to select the
appropriate MPI; in that case, simply extend the logic to select Open MPI).

If you configured with --enable-mpirun-prefix-by-default, you should not
need anything in your environment.

Cheers,

Gilles
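
A sketch of the .bashrc selector logic Gilles suggests: pick one MPI implementation per shell via a variable (here called MPI_FLAVOR, a made-up name) instead of sourcing every implementation unconditionally. The bin directories are taken from the poster's setup except the OpenFOAM one, which is an assumed placeholder.

```shell
# Hypothetical ~/.bashrc fragment: choose one MPI per shell session.
select_mpi() {
  case "${MPI_FLAVOR:-openmpi}" in
    openmpi) echo "/opt/openfoam30/platforms/openmpi-1.6.5/bin" ;;  # assumed path
    mpich)   echo "$HOME/Desktop/mpich/bin" ;;
    intel)   echo "/opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin" ;;
  esac
}
# Prepend only the selected implementation to PATH.
PATH="$(select_mpi):$PATH"
export PATH
```

Starting a shell with `MPI_FLAVOR=mpich` then puts MPICH's mpirun first on PATH without touching the other installs.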

On Monday, May 23, 2016, Andy Riebs wrote:

> Hi,
>
> The short answer: Environment module files are probably the best solution
> for your problem.
>
> The long answer: See
> 
> , which
> pretty much addresses your question.
>
> Andy
>
> On 05/23/2016 07:40 AM, Megdich Islem wrote:
>
> Hi,
>
> I am using 2 software, one is called Open Foam and the other called EMPIRE
> that need to run together at the same time.
> Open Foam uses  Open MPI implementation and EMPIRE uses either MPICH or
> intel mpi.
> The version of Open MPI that comes with Open Foam is 1.6.5.
> I am using Intel (R) MPI Library for linux * OS, version 5.1.3 and MPICH
> 3.0.4.
>
> My problem is when I have the environment variables of  either mpich or
> Intel MPI  sourced to bashrc, I fail to run a case of Open Foam with
> parallel processing ( You find attached a picture of the error I got )
> This is an example of a command line I use to run Open Foam
> mpirun -np 4 interFoam -parallel
>
> Once I keep the environment variable of OpenFoam only, the parallel
> processing works without any problem, so I won't be able to run EMPIRE.
>
> I am sourcing the environment variables in this way:
>
> For Open Foam:
> source /opt/openfoam30/etc/bashrc
>
> For MPICH 3.0.4
>
> export PATH=/home/islem/Desktop/mpich/bin:$PATH
> export LD_LIBRARY_PATH="/home/islem/Desktop/mpich/lib/:$LD_LIBRARY_PATH"
> export MPICH_F90=gfortran
> export MPICH_CC=/opt/intel/bin/icc
> export MPICH_CXX=/opt/intel/bin/icpc
> export MPICH-LINK_CXX="-L/home/islem/Desktop/mpich/lib/ -Wl,-rpath
> -Wl,/home/islem/Desktop/mpich/lib -lmpichcxx -lmpich -lopa -lmpl -lrt
> -lpthread"
>
> For intel
>
> export PATH=$PATH:/opt/intel/bin/
> LD_LIBRARY_PATH="/opt/intel/lib/intel64:$LD_LIBRARY_PATH"
> export LD_LIBRARY_PATH
> source
> /opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpivars.sh
> intel64
>
> If Only Open Foam is sourced, mpirun --version gives OPEN MPI (1.6.5)
> If Open Foam and MPICH are sourced, mpirun --version gives mpich 3.0.1
> If Open Foam and intel MPI are sourced, mpirun --version gives intel (R)
> MPI libarary for linux, version 5.1.3
>
> My question is why I can't have two MPI implementation installed and
> sourced together. How can I solve the problem ?
>
> Regards,
> Islem Megdiche
>
>
>
>
>
>
> ___
> users mailing listus...@open-mpi.org 
> 
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/05/29279.php
>
>
>


Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-23 Thread Ralph Castain
I cannot replicate the problem - both scenarios work fine for me. I'm not
convinced your test code is correct, however, as you call Comm_free on the
inter-communicator but didn't call Comm_disconnect. Check out the attached
for a correct version and see if it works for you.

FWIW: I don't know how many cores you have on your sockets, but if you have
6 cores/socket, then your slot-list is equivalent to "--bind-to none", as
the slot-list applies to every process being launched.



simple_spawn.c
Description: Binary data


> On May 23, 2016, at 6:26 AM, Siegmar Gross wrote:
> 
> Hi,
> 
> I installed openmpi-1.10.3rc2 on my "SUSE Linux Enterprise Server
> 12 (x86_64)" with Sun C 5.13  and gcc-6.1.0. Unfortunately I get
> a segmentation fault for "--slot-list" for one of my small programs.
> 
> 
> loki spawn 119 ompi_info | grep -e "OPAL repo revision:" -e "C compiler 
> absolute:"
>  OPAL repo revision: v1.10.2-201-gd23dda8
> C compiler absolute: /usr/local/gcc-6.1.0/bin/gcc
> 
> 
> loki spawn 120 mpiexec -np 1 --host loki,loki,loki,loki,loki spawn_master
> 
> Parent process 0 running on loki
>  I create 4 slave processes
> 
> Parent process 0: tasks in MPI_COMM_WORLD:1
>  tasks in COMM_CHILD_PROCESSES local group:  1
>  tasks in COMM_CHILD_PROCESSES remote group: 4
> 
> Slave process 0 of 4 running on loki
> Slave process 1 of 4 running on loki
> Slave process 2 of 4 running on loki
> spawn_slave 2: argv[0]: spawn_slave
> Slave process 3 of 4 running on loki
> spawn_slave 0: argv[0]: spawn_slave
> spawn_slave 1: argv[0]: spawn_slave
> spawn_slave 3: argv[0]: spawn_slave
> 
> 
> 
> 
> loki spawn 121 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master
> 
> Parent process 0 running on loki
>  I create 4 slave processes
> 
> [loki:17326] *** Process received signal ***
> [loki:17326] Signal: Segmentation fault (11)
> [loki:17326] Signal code: Address not mapped (1)
> [loki:17326] Failing at address: 0x8
> [loki:17326] [ 0] /lib64/libpthread.so.0(+0xf870)[0x7f4e469b3870]
> [loki:17326] [ 1] *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***and potentially your MPI job)
> [loki:17324] Local abort before MPI_INIT completed successfully; not able to 
> aggregate error messages, and not able to guarantee that all other processes 
> were killed!
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_proc_self+0x35)[0x7f4e46c165b0]
> [loki:17326] [ 2] 
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_comm_init+0x68b)[0x7f4e46bf5b08]
> [loki:17326] [ 3] *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***and potentially your MPI job)
> [loki:17325] Local abort before MPI_INIT completed successfully; not able to 
> aggregate error messages, and not able to guarantee that all other processes 
> were killed!
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_mpi_init+0xa90)[0x7f4e46c1be8a]
> [loki:17326] [ 4] 
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(MPI_Init+0x180)[0x7f4e46c5828e]
> [loki:17326] [ 5] spawn_slave[0x40097e]
> [loki:17326] [ 6] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f4e4661db05]
> [loki:17326] [ 7] spawn_slave[0x400a54]
> [loki:17326] *** End of error message ***
> ---
> Child job 2 terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> ---
> --
> mpiexec detected that one or more processes exited with non-zero status, thus 
> causing
> the job to be terminated. The first process to do so was:
> 
>  Process name: [[56340,2],0]
>  Exit code:1
> --
> loki spawn 122
> 
> 
> 
> 
> I would be grateful, if somebody can fix the problem. Thank you
> very much for any help in advance.
> 
> 
> Kind regards
> 
> Siegmar
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/05/29281.php



[OMPI users] mpirun java

2016-05-23 Thread Claudio Stamile
Dear all,

I'm using Open MPI for Java.
I have a problem when I try to use multiple option parameters in my java
command. More in detail, I run mpirun as follows:

mpirun -n 5 java -cp path1:path2 -Djava.library.path=pathLibs
classification.MyClass

It seems that the option "-Djava.library.path" is ignored when I execute
the command.

Is it normal ?

Do you know how to solve this problem ?

Thank you.

Best,
Claudio

-- 
C.


[OMPI users] wtime implementation in 1.10

2016-05-23 Thread Dave Love
I thought the 1.10 branch had been fixed to use clock_gettime for
MPI_Wtime where it's available, a la
https://www.open-mpi.org/community/lists/users/2016/04/28899.php -- and
have been telling people so!  However, I realize it hasn't, and it looks
as if 1.10 is still being maintained.

Is there a good reason for that, or could it be fixed?
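
The fix in question replaces the default gettimeofday-based timer with one built on clock_gettime. A minimal sketch of such a timer, assuming a POSIX system with CLOCK_MONOTONIC (this is an illustration, not the exact Open MPI patch):

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* MPI_Wtime-style timer on top of clock_gettime: CLOCK_MONOTONIC never
 * jumps backwards (e.g. under NTP adjustments) and typically offers
 * nanosecond resolution, unlike gettimeofday's microseconds. */
double my_wtime(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double) ts.tv_sec + 1.0e-9 * (double) ts.tv_nsec;
}
```

Successive calls are guaranteed non-decreasing, which is the property MPI_Wtime users generally rely on when timing code sections.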


Re: [OMPI users] wtime implementation in 1.10

2016-05-23 Thread Ralph Castain
Nobody ever filed a PR to update the branch with the patch - looks like you 
never responded to confirm that George’s proposed patch was acceptable. I’ll 
create the PR and copy you for review


> On May 23, 2016, at 9:17 AM, Dave Love  wrote:
> 
> I thought the 1.10 branch had been fixed to use clock_gettime for
> MPI_Wtime where it's available, a la
> https://www.open-mpi.org/community/lists/users/2016/04/28899.php -- and
> have been telling people so!  However, I realize it hasn't, and it looks
> as if 1.10 is still being maintained.
> 
> Is there a good reason for that, or could it be fixed?
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/05/29286.php



Re: [OMPI users] mpirun java

2016-05-23 Thread Howard Pritchard
Hello Claudio,

mpirun should be combining your java.library.path option with the one
needed to add Open MPI's Java bindings as well.

Which version of Open MPI are you using?

Could you first try to compile the Ring.java code in ompi/examples and run
it with the
following additional mpirun parameter?

mpirun -np 1 --mca odls_base_verbose 100 java Ring

then try your application with the same "odls_base_verbose" mpirun option

and post the output from the two runs to the mail list?

I suspect there may be a bug with building the combined java.library.path
in the Open MPI code.

Howard


2016-05-23 9:47 GMT-06:00 Claudio Stamile :

> Dear all,
>
> I'm using openmpi for Java.
> I've a problem when I try to use more option parameters in my java
> command. More in detail I run mpirun as follow:
>
> mpirun -n 5 java -cp path1:path2 -Djava.library.path=pathLibs
> classification.MyClass
>
> It seems that the option "-Djava.library.path" is ignored when i execute
> the command.
>
> Is it normal ?
>
> Do you know how to solve this problem ?
>
> Thank you.
>
> Best,
> Claudio
>
> --
> C.
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/05/29285.php
>


Re: [OMPI users] mpirun java

2016-05-23 Thread Saliya Ekanayake
I tested with OpenMPI 1.10.1 and it works.

See this example, which prints java.library.path

mpijavac LibPath.java
mpirun -np 2 java -Djava.library.path=path LibPath

On Mon, May 23, 2016 at 1:38 PM, Howard Pritchard 
wrote:

> Hello Claudio,
>
> mpirun should be combining your java.library.path option with the one
> needed to add
> the Open MPI's java bindings as well.
>
> Which version of Open MPI are you using?
>
> Could you first try to compile the Ring.java code in ompi/examples and run
> it with the
> following additional mpirun parameter?
>
> mpirun -np 1 --mca odls_base_verbose 100 java Ring
>
> then try your application with the same "odls_base_verbose" mpirun option
>
> and post the output from the two runs to the mail list?
>
> I suspect there may be a bug with building the combined java.library.path
> in the Open MPI code.
>
> Howard
>
>
> 2016-05-23 9:47 GMT-06:00 Claudio Stamile :
>
>> Dear all,
>>
>> I'm using openmpi for Java.
>> I've a problem when I try to use more option parameters in my java
>> command. More in detail I run mpirun as follow:
>>
>> mpirun -n 5 java -cp path1:path2 -Djava.library.path=pathLibs
>> classification.MyClass
>>
>> It seems that the option "-Djava.library.path" is ignored when i execute
>> the command.
>>
>> Is it normal ?
>>
>> Do you know how to solve this problem ?
>>
>> Thank you.
>>
>> Best,
>> Claudio
>>
>> --
>> C.
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2016/05/29285.php
>>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/05/29288.php
>



-- 
Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington


LibPath.java
Description: Binary data


Re: [OMPI users] problem about mpirun on two nodes

2016-05-23 Thread douraku
Jeff, Thank you for your advice.

My bad, I took the wrong screenshot because I tested so many different settings. 
After I came back to the original network settings, the "permission denied" error 
of course disappeared, but the other messages were still there. The master node 
has two NICs, one for WAN (via another server) with zone=external, and the other 
for the slave node with zone=internal. The NICs on the master are in different 
subnets.
The NIC on the slave node is set to 'internal'. Their status was confirmed by 
firewall-cmd --get-active-zones. 

I temporarily stopped firewalld and the error messages disappeared. I saw six 
processes running on each node, but now all the processes keep running 
forever with 100% CPU usage.
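Rather than leaving firewalld disabled, one common approach is to pin Open MPI's TCP traffic to a fixed port range and open only that range on the internal zone. This is a sketch only: the port numbers are arbitrary placeholders, and the MCA parameter names (as in the 1.8/1.10 series) should be verified against your installation with `ompi_info --param all all`:

```shell
# Pin Open MPI's out-of-band (run-time) and BTL/TCP (MPI traffic) ports
# to a known range instead of letting them be chosen dynamically.
mpirun -np 12 \
    --mca oob_tcp_static_ipv4_ports 10000-10100 \
    --mca btl_tcp_port_min_v4 10101 \
    --mca btl_tcp_port_range_v4 100 \
    a.out

# Then, on both nodes, open just that range in the internal zone.
firewall-cmd --zone=internal --add-port=10000-10200/tcp
firewall-cmd --zone=internal --add-port=10000-10200/tcp --permanent
```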


-Original Message-
From: Jeff Squyres (jsquyres) 
To: Open MPI User's List 
Sent: Mon, May 23, 2016 9:13 am
Subject: Re: [OMPI users] problem about mpirun on two nodes

On May 21, 2016, at 11:31 PM, dour...@aol.com wrote:
> 
> I encountered a problem about mpirun and SSH when using OMPI 1.10.0 compiled 
> with gcc, running on centos7.2.
> When I execute mpirun on my 2 node cluster, I get the following errors pasted 
> below.
> 
> [douraku@master home]$ mpirun -np 12 a.out
> Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

This is the key right here: you got a permission denied error when you 
(assumedly) tried to execute on the remote server.

Triple check your ssh settings to ensure that you can run on the remote 
server(s) without a password or interactive passphrase entry.
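A minimal passwordless-SSH setup sketch (the user and host names are placeholders for your cluster):

```shell
# On the node you launch mpirun from: create a key pair (no passphrase,
# or use ssh-agent) and install the public key on each remote node.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
ssh-copy-id douraku@slave-node

# Verify: this must print the remote hostname without any prompt.
ssh douraku@slave-node hostname
```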

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

___
users mailing list
us...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/05/29282.php


Re: [OMPI users] problem about mpirun on two nodes

2016-05-23 Thread Jeff Squyres (jsquyres)
You might want to test with some known-good MPI applications first.  Try 
following the steps in this FAQ item:

https://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems


> On May 23, 2016, at 2:31 PM, dour...@aol.com wrote:
> 
> Jeff, Thank you for your advice.
> 
> My bad, I took the wrong screenshot because I tested so many different settings. 
> After I came back to the original network settings, the "permission denied" error 
> of course disappeared, but the other messages were still there. The master node 
> has two NICs, one for WAN (via another server) with zone=external, and the 
> other for the slave node with zone=internal. The NICs on the master are in 
> different subnets.
> The NIC on the slave node is set to 'internal'. Their status was confirmed by 
> firewall-cmd --get-active-zones. 
> 
> I temporarily stopped firewalld and the error messages disappeared. I saw six 
> processes running on each node, but now all the processes keep running 
> forever with 100% CPU usage.
> 
> 
> -Original Message-
> From: Jeff Squyres (jsquyres) 
> To: Open MPI User's List 
> Sent: Mon, May 23, 2016 9:13 am
> Subject: Re: [OMPI users] problem about mpirun on two nodes
> 
> On May 21, 2016, at 11:31 PM, dour...@aol.com wrote:
>> 
>> I encountered a problem about mpirun and SSH when using OMPI 1.10.0 compiled 
>> with gcc, running on centos7.2.
>> When I execute mpirun on my 2 node cluster, I get the following errors 
>> pasted below.
>> 
>> [douraku@master home]$ mpirun -np 12 a.out
>> Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
> 
> This is the key right here: you got a permission denied error when you 
> (assumedly) tried to execute on the remote server.
> 
> Triple check your ssh settings to ensure that you can run on the remote 
> server(s) without a password or interactive passphrase entry.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/05/29282.php
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/05/29290.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] mpirun java

2016-05-23 Thread Claudio Stamile
Hi Howard.

Thank you for your reply.

I'm using version 1.10.2

I executed the following command:

mpirun -np 2 --mca odls_base_verbose 100 java -cp alot:of:jarfile
-Djava.library.path=/Users/stamile/Applications/IBM/ILOG/CPLEX_Studio1263/cplex/bin/x86-64_osx
clustering.TensorClusterinCplexMPI


the output is:

 Num procs: 2 FirstRank: 0 Recovery: DEFAULT Max Restarts: 0
  Argv[0]: java
  Argv[1]: -cp
  Argv[2]: /Applications/Eclipse.app/Contents/MacOS:/Users/stamile/Documents/workspace_newJava/TensorFactorization/bin:/Users/stamile/Applications/IBM/ILOG/CPLEX_Studio1263/cplex/lib/cplex.jar:/Users/stamile/Downloads/commons-lang3-3.4/commons-lang3-3.4.jar:/Users/stamile/Downloads/Jama-1.0.3.jar:/Users/stamile/Downloads/hyperdrive-master/hyperdrive.jar:/usr/local/lib:/usr/local/lib/mpi.jar
  Argv[3]: /Users/stamile/Applications/IBM/ILOG/CPLEX_Studio1263/cplex/bin/x86-64_osx
  Argv[4]: -Djava.library.path=-Djava.library.path=/Users/stamile/Applications/IBM/ILOG/CPLEX_Studio1263/cplex/bin/x86-64_osx:/usr/local/lib
  Argv[5]: clustering.TensorClusterinCplexMPI
  Env[0]: OMPI_MCA_odls_base_verbose=100
  Env[1]: OMPI_COMMAND=clustering.TensorClusterinCplexMPI
  Env[2]: OMPI_MCA_orte_precondition_transports=e6a8891c458c267b-c079810b4abe7ebf
  Env[3]: OMPI_MCA_orte_peer_modex_id=0
  Env[4]: OMPI_MCA_orte_peer_init_barrier_id=1
  Env[5]: OMPI_MCA_orte_peer_fini_barrier_id=2
  Env[6]: TMPDIR=/var/folders/5t/6tqp003x4fn09fzgtx46tjdhgn/T/


Argv[4] looks strange. Indeed if I execute:

mpirun -np 2 --mca odls_base_verbose 100 java -cp alot:of:jarfile
clustering.TensorClusterinCplexMPI
The same command as before, but without
-Djava.library.path=/Users/stamile/Applications/IBM/ILOG/CPLEX_Studio1263/cplex/bin/x86-64_osx
I obtain:

 Argv[0]: java
  Argv[1]: -Djava.library.path=/usr/local/lib
  Argv[2]: -cp
  Argv[3]: /Applications/Eclipse.app/Contents/MacOS:/Users/stamile/Documents/workspace_newJava/TensorFactorization/bin:/Users/stamile/Applications/IBM/ILOG/CPLEX_Studio1263/cplex/lib/cplex.jar:/Users/stamile/Downloads/commons-lang3-3.4/commons-lang3-3.4.jar:/Users/stamile/Downloads/Jama-1.0.3.jar:/Users/stamile/Downloads/hyperdrive-master/hyperdrive.jar:/usr/local/lib:/usr/local/lib/mpi.jar
  Argv[4]: clustering.TensorClusterinCplexMPI
  Env[0]: OMPI_MCA_odls_base_verbose=100
  Env[1]: OMPI_COMMAND=clustering.TensorClusterinCplexMPI
  Env[2]: OMPI_MCA_orte_precondition_transports=92248561306f2b2e-601ae65dc34a347c
  Env[3]: OMPI_MCA_orte_peer_modex_id=0
  Env[4]: OMPI_MCA_orte_peer_init_barrier_id=1
  Env[5]: OMPI_MCA_orte_peer_fini_barrier_id=2
  Env[6]: TMPDIR=/var/folders/5t/6tqp003x4fn09fzgtx46tjdhgn/T/
  Env[7]: __CF_USER_TEXT_ENCODING=0x1F5:0x0:0x4


What do you think ?

Best,

Claudio

2016-05-23 19:38 GMT+02:00 Howard Pritchard :

> Hello Claudio,
>
> mpirun should be combining your java.library.path option with the one
> needed to add
> the Open MPI's java bindings as well.
>
> Which version of Open MPI are you using?
>
> Could you first try to compile the Ring.java code in ompi/examples and run
> it with the
> following additional mpirun parameter?
>
> mpirun -np 1 --mca odls_base_verbose 100 java Ring
>
> then try your application with the same "odls_base_verbose" mpirun option
>
> and post the output from the two runs to the mail list?
>
> I suspect there may be a bug with building the combined java.library.path
> in the Open MPI code.
>
> Howard
>
>
> 2016-05-23 9:47 GMT-06:00 Claudio Stamile :
>
>> Dear all,
>>
>> I'm using openmpi for Java.
>> I've a problem when I try to use more option parameters in my java
>> command. More in detail I run mpirun as follow:
>>
>> mpirun -n 5 java -cp path1:path2 -Djava.library.path=pathLibs
>> classification.MyClass
>>
>> It seems that the option "-Djava.library.path" is ignored when i execute
>> the command.
>>
>> Is it normal ?
>>
>> Do you know how to solve this problem ?
>>
>> Thank you.
>>
>> Best,
>> Claudio
>>
>> --
>> C.
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2016/05/29285.php
>>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/05/29288.php
>



-- 
C.
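For reference, the argument that mpirun should have produced in the strange Argv[4] above would merge the user's path with Open MPI's library directory exactly once. A sketch of the expected merge (paths are placeholders):

```shell
# What a correct merge of the user's -Djava.library.path with Open MPI's
# own Java-bindings directory should look like:
USER_PATH=/Users/me/cplex/bin/x86-64_osx   # path the user passed on the command line
OMPI_LIBDIR=/usr/local/lib                 # where Open MPI's Java bindings live
MERGED="-Djava.library.path=${USER_PATH}:${OMPI_LIBDIR}"
echo "$MERGED"
# The output above instead shows the parser re-prepending the whole
# "-Djava.library.path=" prefix, yielding a doubled option string.
```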


Re: [OMPI users] mpirun java

2016-05-23 Thread Howard Pritchard
Hi Ralph,

Yep, if you could handle this that would be great. I guess we'd like a fix
in both 1.10.x and 2.0.1.

Howard


2016-05-23 14:59 GMT-06:00 Ralph Castain :

> Looks to me like there is a bug in the orterun parser that is trying to
> add java library paths - I can take a look at it
>
> On May 23, 2016, at 1:05 PM, Claudio Stamile 
> wrote:
>
> Hi Howard.
>
> Thank you for your reply.
>
> I'm using version 1.10.2
>
> I executed the following command:
>
> mpirun -np 2 --mca odls_base_verbose 100 java -cp alot:of:jarfile
> -Djava.library.path=/Users/stamile/Applications/IBM/ILOG/CPLEX_Studio1263/cplex/bin/x86-64_osx
> clustering.TensorClusterinCplexMPI
>
>
> the output is:
>
> * Num procs: 2 FirstRank: 0 Recovery: DEFAULT Max Restarts: 0*
>
> *  Argv[0]: java*
>
> *  Argv[1]: -cp*
>
> *  Argv[2]:
> /Applications/Eclipse.app/Contents/MacOS:/Users/stamile/Documents/workspace_newJava/TensorFactorization/bin:/Users/stamile/Applications/IBM/ILOG/CPLEX_Studio1263/cplex/lib/cplex.jar:/Users/stamile/Downloads/commons-lang3-3.4/commons-lang3-3.4.jar:/Users/stamile/Downloads/Jama-1.0.3.jar:/Users/stamile/Downloads/hyperdrive-master/hyperdrive.jar:/usr/local/lib:/usr/local/lib/mpi.jar*
>
> *  Argv[3]:
> /Users/stamile/Applications/IBM/ILOG/CPLEX_Studio1263/cplex/bin/x86-64_osx*
>
> *  Argv[4]:
> -Djava.library.path=-Djava.library.path=/Users/stamile/Applications/IBM/ILOG/CPLEX_Studio1263/cplex/bin/x86-64_osx:/usr/local/lib*
>
> *  Argv[5]: clustering.TensorClusterinCplexMPI*
>
> *  Env[0]: OMPI_MCA_odls_base_verbose=100*
>
> *  Env[1]: OMPI_COMMAND=clustering.TensorClusterinCplexMPI*
>
> *  Env[2]:
> OMPI_MCA_orte_precondition_transports=e6a8891c458c267b-c079810b4abe7ebf*
>
> *  Env[3]: OMPI_MCA_orte_peer_modex_id=0*
>
> *  Env[4]: OMPI_MCA_orte_peer_init_barrier_id=1*
>
> *  Env[5]: OMPI_MCA_orte_peer_fini_barrier_id=2*
>
> *  Env[6]: TMPDIR=/var/folders/5t/6tqp003x4fn09fzgtx46tjdhgn/T/*
>
>
> Argv[4] looks strange. Indeed if I execute:
>
> mpirun -np 2 --mca odls_base_verbose 100 java -cp alot:of:jarfile
> clustering.TensorClusterinCplexMPI
> The same command as before, but without
> -Djava.library.path=/Users/stamile/Applications/IBM/ILOG/CPLEX_Studio1263/cplex/bin/x86-64_osx
> I obtain:
>
> *Argv[0]: java*
>
> *  Argv[1]: -Djava.library.path=/usr/local/lib*
>
> *  Argv[2]: -cp*
>
> *  Argv[3]:
> /Applications/Eclipse.app/Contents/MacOS:/Users/stamile/Documents/workspace_newJava/TensorFactorization/bin:/Users/stamile/Applications/IBM/ILOG/CPLEX_Studio1263/cplex/lib/cplex.jar:/Users/stamile/Downloads/commons-lang3-3.4/commons-lang3-3.4.jar:/Users/stamile/Downloads/Jama-1.0.3.jar:/Users/stamile/Downloads/hyperdrive-master/hyperdrive.jar:/usr/local/lib:/usr/local/lib/mpi.jar*
>
> *  Argv[4]: clustering.TensorClusterinCplexMPI*
>
> *  Env[0]: OMPI_MCA_odls_base_verbose=100*
>
> *  Env[1]: OMPI_COMMAND=clustering.TensorClusterinCplexMPI*
>
> *  Env[2]:
> OMPI_MCA_orte_precondition_transports=92248561306f2b2e-601ae65dc34a347c*
>
> *  Env[3]: OMPI_MCA_orte_peer_modex_id=0*
>
> *  Env[4]: OMPI_MCA_orte_peer_init_barrier_id=1*
>
> *  Env[5]: OMPI_MCA_orte_peer_fini_barrier_id=2*
>
> *  Env[6]: TMPDIR=/var/folders/5t/6tqp003x4fn09fzgtx46tjdhgn/T/*
>
> *  Env[7]: __CF_USER_TEXT_ENCODING=0x1F5:0x0:0x4*
>
>
> What do you think ?
>
> Best,
>
> Claudio
>
> 2016-05-23 19:38 GMT+02:00 Howard Pritchard :
>
>> Hello Claudio,
>>
>> mpirun should be combining your java.library.path option with the one
>> needed to add
>> the Open MPI's java bindings as well.
>>
>> Which version of Open MPI are you using?
>>
>> Could you first try to compile the Ring.java code in ompi/examples and
>> run it with the
>> following additional mpirun parameter?
>>
>> mpirun -np 1 --mca odls_base_verbose 100 java Ring
>>
>> then try your application with the same "odls_base_verbose" mpirun option
>>
>> and post the output from the two runs to the mail list?
>>
>> I suspect there may be a bug with building the combined java.library.path
>> in the Open MPI code.
>>
>> Howard
>>
>>
>> 2016-05-23 9:47 GMT-06:00 Claudio Stamile :
>>
>>> Dear all,
>>>
>>> I'm using openmpi for Java.
>>> I've a problem when I try to use more option parameters in my java
>>> command. More in detail I run mpirun as follow:
>>>
>>> mpirun -n 5 java -cp path1:path2 -Djava.library.path=pathLibs
>>> classification.MyClass
>>>
>>> It seems that the option "-Djava.library.path" is ignored when i execute
>>> the command.
>>>
>>> Is it normal ?
>>>
>>> Do you know how to solve this problem ?
>>>
>>> Thank you.
>>>
>>> Best,
>>> Claudio
>>>
>>> --
>>> C.
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2016/05/29285.php
>>>
>>
>>
>> 

Re: [OMPI users] users Digest, Vol 3510, Issue 2

2016-05-23 Thread Megdich Islem
"Open MPI does not work when MPICH or intel MPI are installed"
Thank you for your suggestion, but I need to run OpenFoam and Empire at the 
same time. In fact, Empire couples OpenFoam with another piece of software.
Is there any solution for this case?

Regards,
Islem

On Monday, 23 May 2016 at 17:00, "users-requ...@open-mpi.org" 
 wrote:
 

 Send users mailing list submissions to
    us...@open-mpi.org

To subscribe or unsubscribe via the World Wide Web, visit
    https://www.open-mpi.org/mailman/listinfo.cgi/users
or, via email, send a message with subject or body 'help' to
    users-requ...@open-mpi.org

You can reach the person managing the list at
    users-ow...@open-mpi.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of users digest..."


Today's Topics:

  1. Re: Open MPI does not work when MPICH or intel MPI are
      installed (Andy Riebs)
  2. segmentation fault for slot-list and openmpi-1.10.3rc2
      (Siegmar Gross)
  3. Re: problem about mpirun on two nodes (Jeff Squyres (jsquyres))
  4. Re: Open MPI does not work when MPICH or intel MPI are
      installed (Gilles Gouaillardet)
  5. Re: segmentation fault for slot-list and    openmpi-1.10.3rc2
      (Ralph Castain)
  6. mpirun java (Claudio Stamile)


--

[Message discarded by content filter]
--

Message: 2
List-Post: users@lists.open-mpi.org
Date: Mon, 23 May 2016 15:26:52 +0200
From: Siegmar Gross 
To: Open MPI Users 
Subject: [OMPI users] segmentation fault for slot-list and
    openmpi-1.10.3rc2
Message-ID:
    <241613b1-ada6-292f-eeb9-722fc8fa2...@informatik.hs-fulda.de>
Content-Type: text/plain; charset=utf-8; format=flowed

Hi,

I installed openmpi-1.10.3rc2 on my "SUSE Linux Enterprise Server
12 (x86_64)" with Sun C 5.13  and gcc-6.1.0. Unfortunately I get
a segmentation fault for "--slot-list" for one of my small programs.


loki spawn 119 ompi_info | grep -e "OPAL repo revision:" -e "C compiler 
absolute:"
      OPAL repo revision: v1.10.2-201-gd23dda8
      C compiler absolute: /usr/local/gcc-6.1.0/bin/gcc


loki spawn 120 mpiexec -np 1 --host loki,loki,loki,loki,loki spawn_master

Parent process 0 running on loki
  I create 4 slave processes

Parent process 0: tasks in MPI_COMM_WORLD:                    1
                  tasks in COMM_CHILD_PROCESSES local group:  1
                  tasks in COMM_CHILD_PROCESSES remote group: 4

Slave process 0 of 4 running on loki
Slave process 1 of 4 running on loki
Slave process 2 of 4 running on loki
spawn_slave 2: argv[0]: spawn_slave
Slave process 3 of 4 running on loki
spawn_slave 0: argv[0]: spawn_slave
spawn_slave 1: argv[0]: spawn_slave
spawn_slave 3: argv[0]: spawn_slave




loki spawn 121 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master

Parent process 0 running on loki
  I create 4 slave processes

[loki:17326] *** Process received signal ***
[loki:17326] Signal: Segmentation fault (11)
[loki:17326] Signal code: Address not mapped (1)
[loki:17326] Failing at address: 0x8
[loki:17326] [ 0] /lib64/libpthread.so.0(+0xf870)[0x7f4e469b3870]
[loki:17326] [ 1] *** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[loki:17324] Local abort before MPI_INIT completed successfully; not able to 
aggregate error messages, and not able to guarantee that all other processes 
were killed!
/usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_proc_self+0x35)[0x7f4e46c165b0]
[loki:17326] [ 2] 
/usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_comm_init+0x68b)[0x7f4e46bf5b08]
[loki:17326] [ 3] *** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[loki:17325] Local abort before MPI_INIT completed successfully; not able to 
aggregate error messages, and not able to guarantee that all other processes 
were killed!
/usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_mpi_init+0xa90)[0x7f4e46c1be8a]
[loki:17326] [ 4] 
/usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(MPI_Init+0x180)[0x7f4e46c5828e]
[loki:17326] [ 5] spawn_slave[0x40097e]
[loki:17326] [ 6] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f4e4661db05]
[loki:17326] [ 7] spawn_slave[0x400a54]
[loki:17326] *** End of error message ***
---
Child job 2 terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
---
--
mpiexec detected that one or more processes exited with non-zero status, thus 
causing
the job to be terminated. The first process 

Re: [OMPI users] users Digest, Vol 3510, Issue 2

2016-05-23 Thread Gilles Gouaillardet

What do you mean by "coupling"?

Do Empire and OpenFoam communicate via MPI?

Wouldn't it be much easier to rebuild OpenFoam with MPICH or Intel MPI?


Cheers,


Gilles


On 5/24/2016 8:44 AM, Megdich Islem wrote:

"Open MPI does not work when MPICH or intel MPI are installed"

Thank you for your suggestion. But I need to run OpenFoam and Empire 
at the same time. In fact, Empire couples OpenFoam with another software.


Is there any solution for this case ?


Regards,
Islem

