[OMPI devel] Open MPI v4.0.1: Process is hanging inside MPI_Init() when debugged with TotalView

2019-11-08 Thread John DelSignore via devel
Hi, An LLNL TotalView user on a Mac reported that their MPI job was hanging inside MPI_Init() when started under the control of TotalView. They were using Open MPI 4.0.1, and TotalView was using the MPIR Interface (sorry, we don't support the PMIx debugging hooks yet). I was able to reproduce

Re: [OMPI devel] Open MPI v4.0.1: Process is hanging inside MPI_Init() when debugged with TotalView

2019-11-13 Thread John DelSignore via devel
Again, John, I'm not convinced your last statement is true. However, I think it is "good enough" for now as it seems to work for you and it isn't seen outside of a debugger scenario. On Nov 12, 2019, at 3:13 PM, John DelSignore via devel <devel@lists.open-mpi.org> wrote: Hi

Re: [OMPI devel] Open MPI v4.0.1: Process is hanging inside MPI_Init() when debugged with TotalView

2019-11-12 Thread John DelSignore via devel
n as the OMPI >> community doesn't actively test MPIR support. I haven't seen any reports of >> hangs during MPI_Init from any release series, including 4.x. My guess is >> that it may have something to do with the debugger interactions as opposed >> to being a true r

Re: [OMPI devel] Open MPI v4.0.1: Process is hanging inside MPI_Init() when debugged with TotalView

2019-11-12 Thread John DelSignore via devel
> > On 11/12/2019 9:27 AM, Ralph Castain via devel wrote: >> Hi John >> >> Sorry to say, but there is no way to really answer your question as the OMPI >> community doesn't actively test MPIR support. I haven't seen any reports of >> hangs during MPI_Init from

Re: [OMPI devel] Fix your MTT scripts!

2020-02-10 Thread John DelSignore via devel
I just got snagged by this myself. A prte I had that was only a few weeks old accepted the following: prte -daemonize -hostfile myhostfile Then I pulled this morning to pickup some stop-on-exec support that Ralph is working on, and the above command stopped working. I needed the following

[OMPI devel] Today's OMPI master is failing with "ompi_mpi_init: ompi_rte_init failed"

2020-03-04 Thread John DelSignore via devel
Hi, I've been working with Ralph to try to get the PMIx debugging interfaces working with OMPI v5 master. I've been periodically pulling new versions to try to pickup the changes Ralph has been pushing into PRRTE/OpenPMIx. After pulling this morning, I'm getting the following error. This all

Re: [OMPI devel] OMPI master fatal error in pml_ob1_sendreq.c

2020-05-04 Thread John DelSignore via devel
ICT, host name order didn't matter. Cheers, John D.   On May 4, 2020, at 7:34 AM, John DelSignore via devel <devel@lists.open-mpi.org> wrote: Hi folks, I cloned a fresh copy of OMPI master this morning at ~8:30am EDT and rebuilt. I'm running a very simple test code on three Centos 7.[56]

[OMPI devel] OMPI master fatal error in pml_ob1_sendreq.c

2020-05-04 Thread John DelSignore via devel
Hi folks, I cloned a fresh copy of OMPI master this morning at ~8:30am EDT and rebuilt. I'm running a very simple test code on three Centos 7.[56] nodes named microway[123] over TCP. I'm seeing a fatal error similar to the following: [microway3.totalviewtech.com:227713]

Re: [OMPI devel] OMPI master fatal error in pml_ob1_sendreq.c

2020-05-04 Thread John DelSignore via devel
or from connect while trying to connect to 10.71.2.58 on port 1024. Who is 10.71.2.58 ? Is the firewall open ? Is the port 1024 allowed to connect to ?   George. On Mon, May 4, 2020 at 11:36 AM John DelSignore via devel <devel@lists.open-mpi.org> wrote: Inline below... On 2020-05-

Re: [OMPI devel] OMPI master fatal error in pml_ob1_sendreq.c

2020-05-04 Thread John DelSignore via devel
mean there is something wrong microway2? If that were the case, then why would it ever work? On 2020-05-04 12:08, Ralph Castain via devel wrote: What happens if you run your "3 procs on two nodes" case using just microway1 and 3 (i.e., omit microway2)? On May 4, 2020, at 9:05 AM

Re: [OMPI devel] OMPI master fatal error in pml_ob1_sendreq.c

2020-05-04 Thread John DelSignore via devel
On 2020-05-04 11:42, George Bosilca wrote: John, The common denominator across all these errors is an error from connect while trying to connect to 10.71.2.58 on port 1024. Who is 10.71.2.58 ? Is the firewall open ? Is the port 1024 allowed to connect to ?   George. On Mon, May 4, 2020 a

Re: [OMPI devel] OMPI master fatal error in pml_ob1_sendreq.c

2020-05-04 Thread John DelSignore via devel
r work? On 2020-05-04 12:08, Ralph Castain via devel wrote: What happens if you run your "3 procs on two nodes" case using just microway1 and 3 (i.e., omit microway2)? On May 4, 2020, at 9:05 AM, John DelSignore via devel <devel@lists.open-mpi.org> wrote: Hi George, 10.71.2.5