[OMPI users] Problem with Mellanox ConnectX3 (FDR) and openmpi 4

2022-08-19 Thread Boyrie Fabrice via users
Hi, I had to reinstall a cluster with AlmaLinux 8.6 and I am unable to make openmpi 4 work with infiniband. I get the following message in a trivial pingpong test: mpirun --hostfile hostfile -np 2 pingpong -- WARNING: Ther

Re: [OMPI users] Problem with OpenMPI as Third party library

2022-08-10 Thread Benson Muite via users
10, 2022 3:26 AM *To:* Jeff Squyres (jsquyres) *Subject:* Re: [OMPI users] Problem with OpenMPI as Third party library Hello, I tried what is explained there (changed OPAL_PREFIX to the new location and changed the rpath of my executable), I even added the flags --with-hwloc and the other one becaus

Re: [OMPI users] Problem with OpenMPI as Third party library

2022-08-09 Thread Jeff Squyres (jsquyres) via users
sers Sent: Tuesday, August 9, 2022 9:52 AM To: users@lists.open-mpi.org Cc: Sebastian Gutierrez Subject: [OMPI users] Problem with OpenMPI as Third party library Good morning Open-MPI organization, I have been trying to distribute your program as a third-party library in my CMake project. Because I d

[OMPI users] Problem with OpenMPI as Third party library

2022-08-09 Thread Sebastian Gutierrez via users
Good morning Open-MPI organization, I have been trying to distribute your program as a third-party library in my CMake project, because I do not want my Linux users to have to install OpenMPI on their own. I just want them to use my final product, which uses OpenMPI dependencies. When I execute m

Re: [OMPI users] Problem in starting openmpi job - no output just hangs - SOLVED

2020-09-01 Thread Tony Ladd via users
Jeff, I found the solution - RDMA needs significant memory, so the limits on the shell have to be increased. I needed to add the lines * soft memlock unlimited * hard memlock unlimited to the end of the file /etc/security/limits.conf. After that the openib driver loads and everything is fine -
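For reference, the fix described above amounts to appending these two lines to /etc/security/limits.conf (syntax exactly as quoted; it takes effect on the next login, and has to be applied on every node):

    * soft memlock unlimited
    * hard memlock unlimited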

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-25 Thread Jeff Squyres (jsquyres) via users
On Aug 24, 2020, at 9:44 PM, Tony Ladd wrote: > > I appreciate your help (and John's as well). At this point I don't think this is > an OMPI problem - my mistake. I think the communication with RDMA is somehow > disabled (perhaps it's the verbs layer - I am not very knowledgeable about > this). It us

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-25 Thread John Hearns via users
I apologise. That was an Omnipath issue https://www.beowulf.org/pipermail/beowulf/2017-March/034214.html On Tue, 25 Aug 2020 at 08:17, John Hearns wrote: > Aha. I dimly remember a problem with the ibverbs /dev device - maybe the > permissions, > or more likely the owner account for that device.

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-25 Thread John Hearns via users
Aha. I dimly remember a problem with the ibverbs /dev device - maybe the permissions, or more likely the owner account for that device. On Tue, 25 Aug 2020 at 02:44, Tony Ladd wrote: > Hi Jeff > > I appreciate your help (and John's as well). At this point I don't think > this is an OMPI problem - m

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-24 Thread Tony Ladd via users
Hi Jeff, I appreciate your help (and John's as well). At this point I don't think this is an OMPI problem - my mistake. I think the communication with RDMA is somehow disabled (perhaps it's the verbs layer - I am not very knowledgeable about this). It used to work like a dream but Mellanox has appare

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-24 Thread Jeff Squyres (jsquyres) via users
I'm afraid I don't have many better answers for you. I can't quite tell from your machines, but are you running IMB-MPI1 Sendrecv *on a single node* with `--mca btl openib,self`? I don't remember offhand, but I didn't think that openib was supposed to do loopback communication. E.g., if both M

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-23 Thread Tony Ladd via users
Hi John Thanks for the response. I have run all those diagnostics, and as best I can tell the IB fabric is OK. I have a cluster of 49 nodes (48 clients + server) and the fabric passes all the tests. There is 1 warning: I- Subnet: IPv4 PKey:0x7fff QKey:0x0b1b MTU:2048Byte rate:10Gbps SL:0x

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-23 Thread John Hearns via users
Tony, start at a low level. Is the Infiniband fabric healthy? Run: ibstatus on every node, sminfo on one node, ibdiagnet on one node. On Sun, 23 Aug 2020 at 05:02, Tony Ladd via users wrote: > Hi Jeff > > I installed ucx as you suggested. But I can't get even the simplest code > (ucp_client_server

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-22 Thread Tony Ladd via users
Hi Jeff, I installed ucx as you suggested. But I can't get even the simplest code (ucp_client_server) to work across the network. I can compile openMPI with UCX but it has the same problem - MPI codes will not execute and there are no messages. Really, UCX is not helping. It is adding another

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-19 Thread Jeff Squyres (jsquyres) via users
Tony -- Have you tried compiling Open MPI with UCX support? This is Mellanox (NVIDIA's) preferred mechanism for InfiniBand support these days -- the openib BTL is legacy. You can run: mpirun --mca pml ucx ... > On Aug 19, 2020, at 12:46 PM, Tony Ladd via users > wrote: > > One other updat

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-19 Thread Tony Ladd via users
One other update. I compiled OpenMPI-4.0.4. The outcome was the same but there is no mention of ibv_obj this time. Tony -- Tony Ladd Chemical Engineering Department University of Florida Gainesville, Florida 32611-6005 USA Email: tladd-"(AT)"-che.ufl.edu Web: http://ladd.che.ufl.edu Tel:

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-17 Thread Tony Ladd via users
My apologies - I did not read the FAQs carefully enough - with regard to 14: 1. openib 2. Ubuntu supplied drivers etc. 3. Ubuntu 18.04 4.15.0-112-generic 4. opensm-3.3.5_mlnx-0.1.g6b18e73 5. Attached 6. Attached 7. unlimited on foam and 16384 on f34 I changed the ulimit to unlimited on

[OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-17 Thread Tony Ladd via users
I would very much appreciate some advice in how to debug this problem. I am trying to get OpenMPI to work on my reconfigured cluster - upgrading from Centos 5 to Ubuntu 18. The problem is that a simple job using Intel's IMB message passing test code will not run on any of the new clients (4 so

Re: [OMPI users] Problem with open-mpi installation

2020-06-05 Thread Jeff Squyres (jsquyres) via users
Are you actually running into a problem? A successful install may still end with "Nothing to be done..." messages. On Jun 5, 2020, at 10:48 AM, Edris Tajfirouzeh via users mailto:users@lists.open-mpi.org>> wrote: Dear Operator I'm trying to install open-mpi package on my mac catalina version

[OMPI users] Problem with open-mpi installation

2020-06-05 Thread Edris Tajfirouzeh via users
Dear Operator I'm trying to install open-mpi package on my mac catalina version 10.15.4. I have already installed gnu compilers and I tested them. I downloaded the package from the following link: https://www.open-mpi.org/software/ompi/v4.0/ I typed the following command in the command line tool:

[OMPI users] Problem with MPI_Spawn

2020-04-20 Thread Martín Morales via users
Hello All. I'm using OMPI 4.0.1. I run MPI_Spawn() as a singleton. I need to run, in the same instance of my app, different spawn configurations. In this case I run first using my hostfile (i.e. setting a MPI_Info object with MPI_Info_create() function, and then setting the attributes with MPI_I
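A minimal sketch of the pattern described here - spawning from a singleton with a hostfile supplied through an MPI_Info object. The "hostfile" info key is Open MPI-specific, and the hostfile path and worker binary below are placeholders, not names from the thread:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm intercomm;
        MPI_Info info;

        MPI_Init(&argc, &argv);
        MPI_Info_create(&info);
        /* Open MPI reads the hosts to spawn on from this file */
        MPI_Info_set(info, "hostfile", "my_hostfile");
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, info, 0,
                       MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
        MPI_Info_free(&info);
        MPI_Comm_disconnect(&intercomm);
        MPI_Finalize();
        return 0;
    }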

Re: [OMPI users] problem with cancelling Send-Request

2019-10-02 Thread Jeff Hammond via users
e=g> > Edif. PRBB > 08003 Barcelona, Spain > Phone Ext: #1098 > > -- > *From:* users on behalf of Christian > Von Kutzleben via users > *Sent:* 02 October 2019 16:14:24 > *To:* users@lists.open-mpi.org > *Cc:* Christian Von Kutzleben > *Subjec

Re: [OMPI users] problem with cancelling Send-Request

2019-10-02 Thread Jeff Hammond via users
Don’t try to cancel sends. https://github.com/mpi-forum/mpi-issues/issues/27 has some useful info. Jeff On Wed, Oct 2, 2019 at 7:17 AM Christian Von Kutzleben via users < users@lists.open-mpi.org> wrote: > Hi, > > I’m currently evaluating to use openmpi (4.0.1) in our application. > > We are us

Re: [OMPI users] problem with cancelling Send-Request

2019-10-02 Thread Emyr James via users
Regulation C/ Dr. Aiguader, 88 Edif. PRBB 08003 Barcelona, Spain Phone Ext: #1098 From: users on behalf of Christian Von Kutzleben via users Sent: 02 October 2019 16:14:24 To: users@lists.open-mpi.org Cc: Christian Von Kutzleben Subject: [OMPI users] problem with

[OMPI users] problem with cancelling Send-Request

2019-10-02 Thread Christian Von Kutzleben via users
Hi, I'm currently evaluating openmpi (4.0.1) for use in our application. We are using a construct like this for some cleanup functionality, to cancel some Send requests: if (*req != MPI_REQUEST_NULL) { MPI_Cancel(req); MPI_Wait(req, MPI_STATUS_IGNORE); assert(*req == MPI_REQUEST_NULL); } Howeve
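The quoted construct, completed into a self-contained fragment. Note the replies in this thread: cancelling sends is effectively unsupported, since an implementation may be unable to complete the cancel, so MPI_Wait can block here:

    #include <mpi.h>
    #include <assert.h>

    void cancel_send(MPI_Request *req)
    {
        if (*req != MPI_REQUEST_NULL) {
            MPI_Cancel(req);
            MPI_Wait(req, MPI_STATUS_IGNORE);
            /* MPI_Wait resets a completed request to MPI_REQUEST_NULL */
            assert(*req == MPI_REQUEST_NULL);
        }
    }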

[OMPI users] Problem with OpenMPI v 4.0.1 with UCX on IB network with hca_id mlx4_0

2019-04-25 Thread Bertini, Denis Dr. via users
Hi, I tried to install OpenMPI v 4.0.1 on our Debian cluster using an Infiniband network with the following dev_info: hca_id: mlx4_0 1) I first tried to install openMPI without the UCX framework and it runs perfectly as before; I just need to add >export OMPI_MCA_btl_openib_allow_ib=1 to remove the warning l

Re: [OMPI users] Problem running with UCX/oshmem on single node?

2018-05-14 Thread Michael Di Domenico
On Wed, May 9, 2018 at 9:45 PM, Howard Pritchard wrote: > > You either need to go and buy a connectx4/5 HCA from mellanox (and maybe a > switch), and install that > on your system, or else install xpmem (https://github.com/hjelmn/xpmem). > Note there is a bug right now > in UCX that you may hit if

Re: [OMPI users] problem

2018-05-10 Thread dpchoudh
What Jeff is suggesting is probably valgrind. However, in my experience, which is much less than most OpenMPI developers, a simple code inspection often is adequate. Here are the steps: 1. If you don't already have it, build a debug version of your code. If you are using gcc, you'd use a -g to CFL
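In command form, the first steps described here look something like the following (program name illustrative; "a -g to CFLAGS" means a debug build):

    mpicc -g -O0 -o myprog myprog.c    # debug build: -g in CFLAGS, optimization off
    mpirun -np 2 valgrind ./myprog     # run each rank under valgrind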

Re: [OMPI users] problem

2018-05-10 Thread Ankita m
Ok... thank you so much, sir. On Wed, May 9, 2018 at 11:13 PM, Jeff Squyres (jsquyres) wrote: > It looks like you're getting a segv when calling MPI_Comm_rank(). > > This is quite unusual -- MPI_Comm_rank() is just a local lookup / return > of an integer. If MPI_Comm_rank() is seg faulting, it usu

Re: [OMPI users] Problem running with UCX/oshmem on single node?

2018-05-09 Thread Howard Pritchard
Hi Craig, You are experiencing problems because you don't have a transport installed that UCX can use for oshmem. You either need to go and buy a connectx4/5 HCA from mellanox (and maybe a switch), and install that on your system, or else install xpmem (https://github.com/hjelmn/xpmem). Note ther

[OMPI users] Problem running with UCX/oshmem on single node?

2018-05-09 Thread Craig Reese
I'm trying to play with oshmem on a single node (just to have a way to do some simple experimentation and playing around) and having spectacular problems: CentOS 6.9 (gcc 4.4.7) built and installed ucx 1.3.0 built and installed openmpi-3.1.0 [cfreese]$ cat oshmem.c #include int

Re: [OMPI users] problem

2018-05-09 Thread Jeff Squyres (jsquyres)
It looks like you're getting a segv when calling MPI_Comm_rank(). This is quite unusual -- MPI_Comm_rank() is just a local lookup / return of an integer. If MPI_Comm_rank() is seg faulting, it usually indicates that there's some other kind of memory error in the application, and this seg fault

Re: [OMPI users] problem

2018-05-09 Thread Ankita m
Yes, because previously I was using Intel MPI, and at that time the program ran perfectly. Now when I use openmpi it produces these error files... though I am not quite sure. I just thought that if the issue is with OpenMPI then I could get some help here. On Wed, May 9, 2018 at 6:47 PM, Gilles Gouai

Re: [OMPI users] problem

2018-05-09 Thread Gilles Gouaillardet
Ankita, Do you have any reason to suspect the root cause of the crash is Open MPI ? Cheers, Gilles On Wednesday, May 9, 2018, Ankita m wrote: > MPI "Hello World" program is also working > > please see this error file attached below. its of a different program > > On Wed, May 9, 2018 at 4:10 P

Re: [OMPI users] problem

2018-05-09 Thread Ankita m
MPI "Hello World" program is also working please see this error file attached below. its of a different program On Wed, May 9, 2018 at 4:10 PM, John Hearns via users < users@lists.open-mpi.org> wrote: > Ankita, looks like your program is not launching correctly. > I would try the following: > de

Re: [OMPI users] problem

2018-05-09 Thread John Hearns via users
Ankita, it looks like your program is not launching correctly. I would try the following: define two hosts in a machinefile. Use mpirun -np 2 machinefile date, i.e. can you use mpirun just to run the command 'date'? Secondly, compile up and try to run an MPI 'Hello World' program. On 9 May 2018 at 12:
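A minimal MPI 'Hello World' of the kind suggested here (a generic sketch, not code from the thread). If even this fails to launch, the problem is in the runtime or network setup rather than the application:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("Hello from rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }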

[OMPI users] problem

2018-05-09 Thread Ankita m
I am using ompi-3.1.0 in my program and the compiler is mpicc. It's a parallel program which uses multiple nodes with 16 cores in each node, but it's not working and generates an error file, which I have attached below. Can anyone please tell me what the issue actually is? bicgstab_Test

Re: [OMPI users] problem related ORTE

2018-04-06 Thread Jeff Squyres (jsquyres)
Can you please send all the information listed here: https://www.open-mpi.org/community/help/ Thanks! > On Apr 6, 2018, at 8:27 AM, Ankita m wrote: > > Hello Sir/Madam > > I am Ankita Maity, a PhD scholar from Mechanical Dept., IIT Roorkee, India > > I am facing a problem while submitti

[OMPI users] problem related ORTE

2018-04-06 Thread Ankita m
Hello Sir/Madam, I am Ankita Maity, a PhD scholar from the Mechanical Dept., IIT Roorkee, India. I am facing a problem while submitting a parallel program to the HPC cluster available in our dept. I have attached the error file it produces at run time. Can you please help me with the issue

[OMPI users] Problem with Mellanox device selection

2017-12-18 Thread Götz Waschk
Hi everyone, I have a cluster of 32 nodes with Infiniband, four of them additionally have a 10G Mellanox Ethernet card for faster I/O. If my job based on openmpi 1.10.6 ends up on one of these nodes, it will crash: No OpenFabrics connection schemes reported that they were able to be used on a spe

Re: [OMPI users] Problem related to openmpi cart create command

2017-12-03 Thread Gilles Gouaillardet
Hi, There is not enough information to help. Can you build a minimal example that evidences the issue, and state how many MPI tasks are needed to reproduce it? Cheers, Gilles On Sun, Dec 3, 2017 at 6:00 PM, Muhammad Umar wrote: > Hello, hope everyone is fine. > > > I have been given

[OMPI users] Problem related to openmpi cart create command

2017-12-03 Thread Muhammad Umar
Hello, hope everyone is fine. I have been given a code of openmpi by a senior which includes mpi_cart_create. I have been trying to run the program. The program compiles correctly but on execution gives an error in the built-in function mpi_cart_create. The operating system I am using is ubuntu 64
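For reference, a minimal MPI_Cart_create call looks like this (a generic sketch, not the code from the thread). A common cause of runtime errors with this function is launching fewer ranks than the product of the dims array:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int size, rank, coords[2];
        int dims[2] = {2, 2};     /* 2x2 grid: needs at least 4 ranks */
        int periods[2] = {0, 0};  /* non-periodic in both dimensions */
        MPI_Comm cart;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size < dims[0] * dims[1])   /* too few ranks is an error */
            MPI_Abort(MPI_COMM_WORLD, 1);
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);
        if (cart != MPI_COMM_NULL) {    /* extra ranks get MPI_COMM_NULL */
            MPI_Cart_coords(cart, (MPI_Comm_rank(cart, &rank), rank), 2, coords);
            printf("rank %d at (%d,%d)\n", rank, coords[0], coords[1]);
        }
        MPI_Finalize();
        return 0;
    }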

Re: [OMPI users] Problem with MPI jobs terminating when using OMPI 3.0.x

2017-10-31 Thread Andy Riebs
As always, thanks for your help Ralph! Cutting over to PMIx 1.2.4 solved the problem for me. (Slurm wasn't happy building with PMIx v2.) And yes, I had ssh access to node04. (And Gilles, thanks for your note, as well.) Andy On 10/27/2017 04:31 PM, r...@open-mpi.org wrote: Two questions: 1

Re: [OMPI users] Problem with MPI jobs terminating when using OMPI 3.0.x

2017-10-29 Thread Gilles Gouaillardet
Andy, The crash occurs in the orted daemon and not in the mpi_hello MPI app, so you will not see anything useful in gdb. you can use the attached launch agent script in order to get a stack trace of orted. your mpirun command line should be updated like this mpirun --mca orte_launch_agent

Re: [OMPI users] Problem with MPI jobs terminating when using OMPI 3.0.x

2017-10-27 Thread r...@open-mpi.org
Two questions: 1. are you running this on node04? Or do you have ssh access to node04? 2. I note you are building this against an old version of PMIx for some reason. Does it work okay if you build it with the embedded PMIx (which is 2.0)? Does it work okay if you use PMIx v1.2.4, the latest re

[OMPI users] Problem with MPI jobs terminating when using OMPI 3.0.x

2017-10-27 Thread Andy Riebs
We have built a version of Open MPI 3.0.x that works with Slurm (our primary use case), but it fails when executed without Slurm. If I srun an MPI "hello world" program, it works just fine. Likewise, if I salloc a couple of nodes and use mpirun from there, life is good. But if I just try to mp

Re: [OMPI users] Problem with MPI_FILE_WRITE_AT

2017-09-15 Thread Edgar Gabriel
thank you for the report and the code, I will look into this. What file system is that occurring on? Until I find the problem, note that you could switch to back to the previous parallel I/O implementation (romio) by providing that as a parameter to your mpirun command, e.g. mpirun --mca io
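The suggested fallback to the older ROMIO implementation would look something like the line below. The component name romio314 is what the Open MPI 2.x series ships, but this is an assumption here since the snippet is truncated; the exact name varies between releases, and ompi_info lists the io components actually available:

    mpirun --mca io romio314 -np 4 ./a.out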

[OMPI users] Problem with MPI_FILE_WRITE_AT

2017-09-15 Thread McGrattan, Kevin B. Dr. (Fed)
I am using MPI_FILE_WRITE_AT to print out the timings of subroutines in a big Fortran code. I have noticed since upgrading to Open MPI 2.1.1 that sometimes the file to be written is corrupted. Each MPI process is supposed to write out a character string that is 159 characters in length, plus a l
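The access pattern described - each rank writing a fixed-length record at its own offset - looks roughly like this in C (a generic sketch; the original code is Fortran, and the 160-byte record here, 159 characters plus a newline, is taken from the description above):

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    #define REC_LEN 160   /* 159 characters plus a trailing newline */

    int main(int argc, char **argv)
    {
        int rank;
        char rec[REC_LEN];
        MPI_File fh;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        memset(rec, ' ', REC_LEN);   /* space-pad the record */
        snprintf(rec, REC_LEN, "rank %6d: subroutine timings ...", rank);
        rec[strlen(rec)] = ' ';      /* drop snprintf's NUL terminator */
        rec[REC_LEN - 1] = '\n';

        MPI_File_open(MPI_COMM_WORLD, "timings.out",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        /* each rank writes its fixed-length record at a disjoint offset */
        MPI_File_write_at(fh, (MPI_Offset)rank * REC_LEN, rec, REC_LEN,
                          MPI_CHAR, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }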

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-31 Thread Gilles Gouaillardet
Thanks Siegmar, I was finally able to reproduce it. The error is triggered by the VM topology, and I was able to reproduce it by manually removing the "NUMA" objects from the topology. As a workaround, you can mpirun --map-by socket ... I will follow up on the devel ML with Ralph. Bes

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-31 Thread Siegmar Gross
Hi Gilles, Am 31.05.2017 um 08:38 schrieb Gilles Gouaillardet: Siegmar, the "big ORTE update" is a bunch of backports from master to v3.x btw, does the same error occurs with master ? Yes, it does, but the error occurs only if I use a real machine with my virtual machine "exin". I get the e

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread Gilles Gouaillardet
Siegmar, the "big ORTE update" is a bunch of backports from master to v3.x btw, does the same error occurs with master ? i noted mpirun simply does ssh exin orted ... can you double check the right orted (e.g. /usr/local/openmpi-3.0.0_64_cc/bin/orted) or you can try to mpirun --mca orte

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread Siegmar Gross
Hi Gilles, I configured Open MPI with the following command. ../openmpi-v3.x-201705250239-d5200ea/configure \ --prefix=/usr/local/openmpi-3.0.0_64_cc \ --libdir=/usr/local/openmpi-3.0.0_64_cc/lib64 \ --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \ --with-jdk-headers=/usr/local/jdk1.8.0_66

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread r...@open-mpi.org
Until the fixes pending in the big ORTE update PR are committed, I suggest not wasting time chasing this down. I tested the “patched” version of the 3.x branch, and it works just fine. > On May 30, 2017, at 7:43 PM, Gilles Gouaillardet wrote: > > Ralph, > > > the issue Siegmar initially rep

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread Gilles Gouaillardet
Ralph, the issue Siegmar initially reported was loki hello_1 111 mpiexec -np 3 --host loki:2,exin hello_1_mpi per what you wrote, this should be equivalent to loki hello_1 111 mpiexec -np 3 --host loki:2,exin:1 hello_1_mpi and this is what i initially wanted to double check (but i made a ty

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread r...@open-mpi.org
This behavior is as-expected. When you specify "-host foo,bar", you have told us to assign one slot to each of those nodes. Thus, running 3 procs exceeds the number of slots you assigned. You can tell it to set the #slots to the #cores it discovers on the node by using "-host foo:*,bar:*" I ca
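Concretely, per this explanation (command lines reconstructed from the examples quoted in this thread):

    mpiexec --host loki,exin     -np 2 ./hello_1_mpi   # one slot per host: at most 2 procs
    mpiexec --host loki:2,exin:1 -np 3 ./hello_1_mpi   # explicit slot counts: 3 procs fit
    mpiexec --host loki:*,exin:* -np 3 ./hello_1_mpi   # slots = cores discovered on each node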

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread gilles
Hi Siegmar, my bad, there was a typo in my reply. I really meant > > what if you ? > > mpiexec --host loki:2,exin:1 -np 3 hello_1_mpi but you also tried that and it did not help. I could not find anything in your logs that suggests mpiexec tries to start 5 MPI tasks, did I miss something? I w

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread Siegmar Gross
Hi Gilles, what if you ? mpiexec --host loki:1,exin:1 -np 3 hello_1_mpi I need as many slots as processes so that I use "-np 2". "mpiexec --host loki,exin -np 2 hello_1_mpi" works as well. The command breaks, if I use at least "-np 3" and distribute the processes across at least two machines.

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread gilles
Hi Siegmar, what if you ? mpiexec --host loki:1,exin:1 -np 3 hello_1_mpi are loki and exin different ? (os, sockets, core) Cheers, Gilles - Original Message - > Hi, > > I have installed openmpi-v3.x-201705250239-d5200ea on my "SUSE Linux > Enterprise Server 12.2 (x86_64)" with Sun C

[OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread Siegmar Gross
Hi, I have installed openmpi-v3.x-201705250239-d5200ea on my "SUSE Linux Enterprise Server 12.2 (x86_64)" with Sun C 5.14 and gcc-7.1.0. Depending on the machine that I use to start my processes, I have a problem with "--host" for versions "v3.x" and "master", while everything works as expected w

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-21 Thread Jing Gong
Hi, The email is intended to follow the thread about "Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch". https://mail-archive.com/users@lists.open-mpi.org/msg30650.html We have installed the latest version v2.0.2 on the cluster that

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread Anastasia Kruchinina
Ok, thanks for your answers! I was not aware that it is a known issue. I guess I will just try to find a machine with OpenMPI/2.0.2 and try there. On 16 February 2017 at 00:01, r...@open-mpi.org wrote: > Yes, 2.0.1 has a spawn issue. We believe that 2.0.2 is okay if you want to > give it a try

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread r...@open-mpi.org
Yes, 2.0.1 has a spawn issue. We believe that 2.0.2 is okay if you want to give it a try Sent from my iPad > On Feb 15, 2017, at 1:14 PM, Jason Maldonis wrote: > > Just to throw this out there -- to me, that doesn't seem to be just a problem > with SLURM. I'm guessing the exact same error wo

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread Jason Maldonis
Just to throw this out there -- to me, that doesn't seem to be just a problem with SLURM. I'm guessing the exact same error would be thrown interactively (unless I didn't read the above messages carefully enough). I had a lot of problems running spawned jobs on 2.0.x a few months ago, so I switched

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread Anastasia Kruchinina
Hi! I am doing it like this: sbatch -N 2 -n 5 ./job.sh, where job.sh is:

    #!/bin/bash -l
    module load openmpi/2.0.1-icc
    mpirun -np 1 ./manager 4

On 15 February 2017 at 17:58, r...@open-mpi.org wrote: > The cmd line looks fine - when you do your "sbatch" request, what is in > the shell scrip

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread r...@open-mpi.org
The cmd line looks fine - when you do your “sbatch” request, what is in the shell script you give it? Or are you saying you just “sbatch” the mpirun cmd directly? > On Feb 15, 2017, at 8:07 AM, Anastasia Kruchinina > wrote: > > Hi, > > I am running like this: > mpirun -np 1 ./manager > >

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread Anastasia Kruchinina
Hi, I am running like this: mpirun -np 1 ./manager Should I do it differently? I also thought that all sbatch does is create an allocation and then run my script in it. But it seems it is not since I am getting these results... I would like to upgrade to OpenMPI, but no clusters near me have it

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread Howard Pritchard
Hi Anastasia, Definitely check the mpirun when in batch environment but you may also want to upgrade to Open MPI 2.0.2. Howard r...@open-mpi.org schrieb am Mi. 15. Feb. 2017 um 07:49: > Nothing immediate comes to mind - all sbatch does is create an allocation > and then run your script in it.

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread r...@open-mpi.org
Nothing immediate comes to mind - all sbatch does is create an allocation and then run your script in it. Perhaps your script is using a different “mpirun” command than when you type it interactively? > On Feb 14, 2017, at 5:11 AM, Anastasia Kruchinina > wrote: > > Hi, > > I am trying to us

[OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-14 Thread Anastasia Kruchinina
Hi, I am trying to use MPI_Comm_spawn function in my code. I am having trouble with openmpi 2.0.x + sbatch (batch system Slurm). My test program is located here: http://user.it.uu.se/~anakr367/files/MPI_test/ When I am running my code I am getting an error: OPAL ERROR: Timeout in file ../../../.

Re: [OMPI users] problem with opal_list_remove_item for openmpi-v2.x-201702010255-8b16747 on Linux

2017-02-03 Thread Jeff Squyres (jsquyres)
I've filed this as https://github.com/open-mpi/ompi/issues/2920. Ralph is just heading out for about a week or so; it may not get fixed until he comes back. > On Feb 3, 2017, at 2:03 AM, Siegmar Gross > wrote: > > Hi, > > I have installed openmpi-v2.x-201702010255-8b16747 on my "SUSE Linux

[OMPI users] problem with opal_list_remove_item for openmpi-v2.x-201702010255-8b16747 on Linux

2017-02-02 Thread Siegmar Gross
Hi, I have installed openmpi-v2.x-201702010255-8b16747 on my "SUSE Linux Enterprise Server 12.2 (x86_64)" with Sun C 5.14 and gcc-6.3.0. Unfortunately, I get a warning from "opal_list_remove_item" about a missing item when I run one of my programs. loki spawn 115 mpiexec -np 1 --host loki,loki,n

Re: [OMPI users] Problem with double shared library

2016-10-28 Thread Sean Ahern
Gilles, You described the problem exactly. I think we were able to nail down a solution to this one through judicious use of the -rpath $MPI_DIR/lib linker flag, allowing the runtime linker to properly find OpenMPI symbols at runtime. We're operational. Thanks for your help. -Sean -- Sean Ahern

Re: [OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-24 Thread Gilles Gouaillardet
ens Sent: Tuesday, October 18, 2016 9:53 AM To: users@lists.open-mpi.org Subject: [OMPI users] Problem building OpenMPI with CUDA 8.0 I have the release version of CUDA 8.0 installed and am trying to build OpenMPI. Here is my configure and build line: ./configure --prefix=$PREFIXPATH --w

Re: [OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-24 Thread Brice Goglin
: >>> ${CUDA_HOME}/lib64/stubs >>> For 8.0 I’d suggest updating the configure/make scripts to look >>> for nvml there and link in the stubs. This way the build is not >>> dependent on the driver being installed and only the toolkit. >>> Thanks, >>> Jus

Re: [OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-23 Thread Gilles Gouaillardet
e driver being installed and only the toolkit. Thanks, Justin From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Justin Luitjens Sent: Tuesday, October 18, 2016 9:53 AM To: users@lists.open-mpi.org Subject: [OMPI users] Problem building OpenMPI with CUDA 8.0 I have the

Re: [OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-19 Thread Jeff Squyres (jsquyres)
there and link in the stubs. This way the build is not dependent on the > driver being installed and only the toolkit. > > Thanks, > Justin > > From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Justin > Luitjens > Sent: Tuesday, October 18, 2016 9:53 AM &g

Re: [OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-18 Thread Justin Luitjens
esday, October 18, 2016 9:53 AM To: users@lists.open-mpi.org Subject: [OMPI users] Problem building OpenMPI with CUDA 8.0 I have the release version of CUDA 8.0 installed and am trying to build OpenMPI. Here is my configure and build line: ./configure --prefix=$PREFIXPATH --with-cuda=$CUDA_HOME --wi

[OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-18 Thread Justin Luitjens
I have the release version of CUDA 8.0 installed and am trying to build OpenMPI. Here is my configure and build line: ./configure --prefix=$PREFIXPATH --with-cuda=$CUDA_HOME --with-tm= --with-openib= && make && sudo make install Where CUDA_HOME points to the cuda install path. When I run the a

Re: [OMPI users] Problem with double shared library

2016-10-17 Thread Gilles Gouaillardet
Sean, if I understand correctly, you built a libtransport_mpi.so library that depends on Open MPI, and your main program dlopens libtransport_mpi.so. In this case, and at least for the time being, you need to use RTLD_GLOBAL in your dlopen flags. Cheers, Gilles On 10/18/2016 4:53 AM,
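A minimal sketch of the workaround Gilles describes (the library name is the one from this thread; the rest is generic dlopen usage, link with -ldl):

    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        /* RTLD_GLOBAL makes the MPI symbols pulled in by the transport
         * visible to the components Open MPI itself dlopens later;
         * with the default RTLD_LOCAL those lookups fail at runtime. */
        void *h = dlopen("libtransport_mpi.so", RTLD_NOW | RTLD_GLOBAL);
        if (!h) {
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
            return 1;
        }
        /* ... look up the transport's entry points with dlsym ... */
        dlclose(h);
        return 0;
    }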

[OMPI users] Problem with double shared library

2016-10-17 Thread Sean Ahern
Folks, For our code, we have a communication layer that abstracts the code that does the actual transfer of data. We call these "transports", and we link them as shared libraries. We have created an MPI transport that compiles/links against OpenMPI 2.0.1 using the compiler wrappers. When I compile

Re: [OMPI users] Problem running an MPI program through the PBS manager

2016-09-26 Thread Mahmood Naderan
OK thank you very much. It is now running... Regards, Mahmood On Mon, Sep 26, 2016 at 2:04 PM, Gilles Gouaillardet < gilles.gouaillar...@gmail.com> wrote: > Mahmood, > > The node is defined in the PBS config, however it is not part of the > allocation (e.g. job) so it cannot be used, and hence

Re: [OMPI users] Problem running an MPI program through the PBS manager

2016-09-26 Thread Gilles Gouaillardet
Mahmood, The node is defined in the PBS config, however it is not part of the allocation (e.g. job) so it cannot be used, and hence the error message. In your PBS script, you do not need -np nor -host parameters to your mpirun command. Open MPI mpirun will automatically detect it is launched from
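A minimal PBS script following this advice would look something like the sketch below (the resource request and program name are illustrative, not taken from the thread):

    #!/bin/bash
    #PBS -l nodes=2:ppn=4
    cd $PBS_O_WORKDIR
    # no -np and no -host: mpirun picks both up from the PBS allocation
    mpirun ./a.out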

[OMPI users] Problem running an MPI program through the PBS manager

2016-09-26 Thread Mahmood Naderan
Hi, When I run an MPI command through the terminal, the program runs fine on the compute node specified in hosts.txt. However, when I put that command in a PBS script, it says that the compute node is not defined in the job manager's list. However, that node is actually defined in the job manager.

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Mahmood Naderan
OK. Running "module unload rocks-openmpi" and putting that in ~/.bashrc will remove /opt/openmpi/lib from LD_LIBRARY_PATH. Thanks Gilles for your help. Regards, Mahmood On Mon, Sep 12, 2016 at 1:25 PM, Mahmood Naderan wrote: > It seems that it is part of rocks-openmpi. I will find out how to

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Mahmood Naderan
It seems that it is part of rocks-openmpi. I will find out how to remove it and will come back. Regards, Mahmood On Mon, Sep 12, 2016 at 1:06 PM, Gilles Gouaillardet wrote: > Mahmood, > > you need to manually remove /opt/openmpi/lib from your LD_LIBRARY_PATH > (or have your sysadmin do it if

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Gilles Gouaillardet
Mahmood, you need to manually remove /opt/openmpi/lib from your LD_LIBRARY_PATH (or have your sysadmin do it if this is somehow done automatically) the point of configuring with --enable-mpirun-prefix-by-default is you do *not* need to add /export/apps/siesta/openmpi-1.8.8/lib in your LD_LIBRA

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Mahmood Naderan
Is the following output OK? ... Making install in util make[2]: Entering directory `/export/apps/siesta/openmpi-1.8.8/test/util' make[3]: Entering directory `/export/apps/siesta/openmpi-1.8.8/test/util' make[3]: Nothing to be done for `install-exec-am'. make[3]: Nothing to be done for `install-da

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Gilles Gouaillardet
Mahmood, I was suggesting you (re)configure (i assume you did it) the Open MPI 1.8.8 installed in /export/apps/siesta/openmpi-1.8.8 with --enable-mpirun-prefix-by-default Cheers, Gilles On 9/12/2016 4:51 PM, Mahmood Naderan wrote: > --enable-mpirun-prefix-by-default What is that? Does t

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Gilles Gouaillardet
Basically, it means libs will be linked with -Wl,-rpath,/export/apps/siesta/openmpi-1.8.8/lib so if you run a.out with an empty $LD_LIBRARY_PATH, then it will look for the MPI libraries in /export/apps/siesta/openmpi-1.8.8/lib Cheers, Gilles On 9/12/2016 4:50 PM, Mahmood Naderan wrote:
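In practice, per this explanation, the option amounts to the wrapper adding that flag to the link line, roughly like this (illustrative; the path is the one from this thread):

    mpifort -o a.out prog.f90 -Wl,-rpath,/export/apps/siesta/openmpi-1.8.8/lib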

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Mahmood Naderan
> --enable-mpirun-prefix-by-default What is that? Does that mean "configure 1.8.8 with the default one installed on the system"? Then that is not good I think because # /opt/openmpi/bin/ompi_info Package: Open MPI root@centos-6-3.localdomain Distribution O

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Mahmood Naderan
> --enable-mpirun-prefix-by-default What is that? Does that mean "configure 1.8.8 with the default one installed on the system"? Then that is not good I think because Regards, Mahmood

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Gilles Gouaillardet
That sounds good to me! Just to make it crystal clear: assuming you configure'd your Open MPI 1.8.8 with --enable-mpirun-prefix-by-default (and if you did not, i do encourage you to do so), then all you need is to remove /opt/openmpi/lib from your LD_LIBRARY_PATH (e.g. you do *not* ha

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Mahmood Naderan
>(I'd like to make sure you are not using Intel MPI libmpi.so.1 with Open MPI libmpi_mpifh.so.2; that can happen if Intel MPI >appears first in your LD_LIBRARY_PATH) # echo $LD_LIBRARY_PATH /opt/gridengine/lib/linux-x64:/opt/openmpi/lib # ls /opt/openmpi/lib libmpi.a libompitrace.a

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Gilles Gouaillardet
Hi, this is the relevant part of your config.log configure:1594: checking whether the Fortran compiler works configure:1600: ./a.out ./a.out: symbol lookup error: /export/apps/siesta/openmpi-1.8.8/lib/libmpi_mpifh.so.2: undefined symbol: mpi_fortran_weights_empty configure:1603: $? = 127 c

[OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Mahmood Naderan
Hi, Following the suggestion by Gilles Gouaillardet ( https://mail-archive.com/users@lists.open-mpi.org/msg29688.html), I ran a configure command for a program like this: # ../Src/configure FC=/export/apps/siesta/openmpi-1.8.8/bin/mpifort --with-blas=libopenblas.a --with-lapack=liblapack.a --with-

Re: [OMPI users] problem with exceptions in Java interface

2016-08-29 Thread Graham, Nathaniel Richard
, August 29, 2016 6:16 AM To: Open MPI Users Subject: Re: [OMPI users] problem with exceptions in Java interface Hi Siegmar, I will review PR 1698 and wait some more feedback from the developers, they might have different views than mine. assuming PR 1698 does what you expect, it does not catch all

Re: [OMPI users] problem with exceptions in Java interface

2016-08-29 Thread Gilles Gouaillardet
Hi Siegmar, I will review PR 1698 and wait for some more feedback from the developers; they might have different views than mine. Assuming PR 1698 does what you expect, it does not catch all user errors. For example, if you MPI_Send a buffer that is too short, the exception might be thrown at any time

Re: [OMPI users] problem with exceptions in Java interface

2016-08-29 Thread Siegmar Gross
Hi Gilles, isn't it possible to pass all exceptions from the Java interface to the calling method? I can live with the current handling of exceptions as well, although some exceptions can be handled within my program and some will break my program even if I want to handle exceptions myself. I und

Re: [OMPI users] problem with exceptions in Java interface

2016-08-29 Thread Gilles Gouaillardet
Siegmar and all, I am puzzled by this error. On one hand, it is caused by an invalid buffer (e.g. the buffer size is 1, but the user suggests size is 2), so I am fine with the current behavior (e.g. java.lang.ArrayIndexOutOfBoundsException is thrown) /* if that was a C program, it would very likely S

[OMPI users] problem with exceptions in Java interface

2016-08-28 Thread Siegmar Gross
Hi, I have installed v1.10.3-31-g35ba6a1, openmpi-v2.0.0-233-gb5f0a4f, and openmpi-dev-4691-g277c319 on my "SUSE Linux Enterprise Server 12 (x86_64)" with Sun C 5.14 beta and gcc-6.1.0. In May I had reported a problem with Java exceptions (PR 1698) which had been solved in June (PR 1803). https

Re: [OMPI users] Problem when installing Rmpi package in HPC cluster

2016-07-11 Thread Bennet Fauber
We have found that virtually all Rmpi jobs need to be started with $ mpirun -np 1 R CMD BATCH This is, as I understand it, because the first R will initialize the MPI environment and then when you create the cluster, it wants to be able to start the rest of the processes. When you initialize
