Hi,
An LLNL TotalView user on a Mac reported that their MPI job was hanging inside
MPI_Init() when started under the control of TotalView. They were using Open
MPI 4.0.1, and TotalView was using the MPIR Interface (sorry, we don't support
the PMIx debugging hooks yet).
I was able to reproduce the hang.
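For background, the MPIR process-acquisition interface works by having the debugger read a handful of well-known symbols out of the launcher (mpirun/mpiexec/prte) process. The fragment below is an illustrative sketch based on the MPIR specification, not Open MPI's actual implementation; the names are the standard MPIR symbols:

```c
/* Sketch of the MPIR process-acquisition symbols a debugger such as
 * TotalView looks for in the launcher process. Illustrative only. */
#include <stddef.h>

typedef struct {
    char *host_name;        /* node the MPI process is running on */
    char *executable_name;  /* image the debugger should attach to */
    int   pid;              /* pid of the MPI process on that node */
} MPIR_PROCDESC;

MPIR_PROCDESC *MPIR_proctable      = NULL; /* filled in by the launcher */
int            MPIR_proctable_size = 0;
volatile int   MPIR_being_debugged = 0;    /* set by the debugger */
int            MPIR_debug_state    = 0;

/* The debugger plants a breakpoint here; the launcher calls it once the
 * proctable is valid, and the debugger then attaches to each process. */
void MPIR_Breakpoint(void)
{
    /* intentionally empty */
}
```

If the launcher never reaches MPIR_Breakpoint(), or the MPI processes are never released after attach, the job sits in MPI_Init() exactly as described above.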
Again, John, I'm not convinced your last statement is true. However, I think it
is "good enough" for now as it seems to work for you and it isn't seen outside
of a debugger scenario.
On Nov 12, 2019, at 3:13 PM, John DelSignore via devel <devel@lists.open-mpi.org> wrote:
Hi,

> On 11/12/2019 9:27 AM, Ralph Castain via devel wrote:
>> Hi John
>>
>> Sorry to say, but there is no way to really answer your question as the OMPI
>> community doesn't actively test MPIR support. I haven't seen any reports of
>> hangs during MPI_Init from any release series, including 4.x. My guess is
>> that it may have something to do with the debugger interactions as opposed
>> to being a true r…
I just got snagged by this myself. A prte I had that was only a few weeks old
accepted the following:
prte -daemonize -hostfile myhostfile
Then I pulled this morning to pick up some stop-on-exec support that Ralph is
working on, and the above command stopped working. I needed the following
Hi,
I've been working with Ralph to try to get the PMIx debugging interfaces
working with OMPI v5 master. I've been periodically pulling new versions to
pick up the changes Ralph has been pushing into PRRTE/OpenPMIx. After pulling
this morning, I'm getting the following error.
ICT, host name order didn't matter.
Cheers, John D.
On May 4, 2020, at 7:34 AM, John DelSignore via devel <devel@lists.open-mpi.org> wrote:
Hi folks,
I cloned a fresh copy of OMPI master this morning at ~8:30am EDT and rebuilt. I'm running a very simple test code on three Centos 7.[56] nodes named microway[123] over TCP. I'm seeing a fatal error similar to the following:
[microway3.totalviewtech.com:227713]
On Mon, May 4, 2020 at 11:36 AM John DelSignore via devel <devel@lists.open-mpi.org> wrote:
Inline below...
Does that mean there is something wrong with microway2? If that were the case, then why would it ever work?
On 2020-05-04 12:08, Ralph Castain via devel wrote:
What happens if you run your "3 procs on two nodes" case using just microway1 and 3 (i.e., omit microway2)?
On May 4, 2020, at 9:05 AM, John DelSignore via devel <devel@lists.open-mpi.org> wrote:
On 2020-05-04 11:42, George Bosilca wrote:
John,
The common denominator across all these errors is an error from connect while trying to connect to 10.71.2.58 on port 1024. Who is 10.71.2.58? Is the firewall open? Is port 1024 allowed to connect to?
George.
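One way to check George's questions independently of OMPI is a bare connect() probe against the address and port from the error message. The sketch below is illustrative (the address 10.71.2.58 and port 1024 are simply taken from the error above); it returns 0 if a TCP connection succeeds, otherwise the errno from connect(), which distinguishes a refused port from an unreachable or firewalled host:

```c
/* Minimal TCP reachability probe, to separate firewall/routing problems
 * from OMPI itself. Returns 0 if connect() succeeds, otherwise the errno
 * from the failed call (e.g. ECONNREFUSED, EHOSTUNREACH, ETIMEDOUT). */
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int probe(const char *ip, unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port   = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1) {
        close(fd);
        return EINVAL;
    }

    int rc    = connect(fd, (struct sockaddr *)&sa, sizeof sa);
    int saved = (rc == 0) ? 0 : errno;
    close(fd);
    return saved;
}
```

Calling `probe("10.71.2.58", 1024)` from each of the microway nodes would show whether the port is reachable at all, and whether the answer differs per node.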
On May 4, 2020, at 9:05 AM, John DelSignore via devel <devel@lists.open-mpi.org> wrote:
Hi George,
10.71.2.5