Sam,
At first you mentioned Open MPI 1.7.3.
Though this is now a legacy version, you posted to the right place.
Then you ran:
# python setup.py build --mpicc=/usr/lib64/mpich/bin/mpicc
This is MPICH, which is a very reputable MPI implementation, but not
Open MPI.
So I do invite you to use Open MPI
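A quick way to check which implementation a given wrapper compiler belongs to is to inspect its version output. The helper below is an illustrative sketch, not an official tool: Open MPI's mpicc accepts `--showme:version`, while MPICH's mpicc identifies itself with `-v`; the sketch tries both and classifies whatever comes back.

```python
import shutil
import subprocess

def classify_mpi(version_text: str) -> str:
    """Guess the MPI implementation from wrapper version output."""
    text = version_text.lower()
    if "open mpi" in text or "open-mpi" in text:
        return "Open MPI"
    if "mpich" in text:
        return "MPICH"
    return "unknown"

def probe_mpicc(mpicc: str = "mpicc"):
    """Run the wrapper's version queries and classify the pooled output.
    Returns None when the wrapper is not on PATH."""
    path = shutil.which(mpicc)
    if path is None:
        return None
    output = ""
    for flag in ("--showme:version", "-v"):
        try:
            result = subprocess.run([path, flag], capture_output=True,
                                    text=True, timeout=10)
            output += result.stdout + result.stderr
        except OSError:
            pass  # wrapper rejected the flag or could not be executed
    return classify_mpi(output)
```

Running `probe_mpicc("/usr/lib64/mpich/bin/mpicc")` against the path above should report "MPICH", confirming the wrong wrapper was used for an Open MPI build.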
Thank you! The patch fixed the problem. I did multiple tests with your program
and another application. No more process hangs!
Cheers,
Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400
From: users on behalf of r...@open-mpi.org
Yes, I can definitely help to test the patch.
Jingchao
From: users on behalf of r...@open-mpi.org
Sent: Tuesday, August 30, 2016 2:23:12 PM
To: Open MPI Users
Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
Hello everyone,
I am using openmpi-1.10.2 and I am using the `spawn_multiple` MPI function
inside a for-loop. My program spawns N workers within each iteration of the
for-loop, makes some changes to the input for the next iteration, and then
proceeds to the next iteration.
After a few iterations
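To make the loop structure concrete, here is an MPI-free analogue of the pattern using the standard library's subprocess module (all names here are hypothetical, and plain child processes stand in for `spawn_multiple` workers): each pass spawns N children, waits for all of them, and feeds the combined result into the next pass. In the real MPI version, the analogous hygiene step is disconnecting the intercommunicator returned by the spawn call before looping again.

```python
import subprocess
import sys

def run_iteration(value: int, n_workers: int) -> int:
    """Spawn n_workers children that each square the input, then
    combine their results. A stand-in for one pass of the loop."""
    workers = [
        subprocess.Popen(
            [sys.executable, "-c", f"print({value} * {value})"],
            stdout=subprocess.PIPE, text=True)
        for _ in range(n_workers)
    ]
    # Wait for and reap every worker before starting the next pass;
    # leaving children unreaped is a classic way loops like this hang.
    results = [int(w.communicate()[0]) for w in workers]
    return sum(results)

value = 2
for _ in range(3):
    value = run_iteration(value, n_workers=2)  # value grows each pass
```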
Oh my - that indeed illustrated the problem!! It is indeed a race condition on
the backend orted. I’ll try to fix it - probably have to send you a patch to
test?
> On Aug 30, 2016, at 1:04 PM, Jingchao Zhang wrote:
>
> $mpirun -mca state_base_verbose 5 ./a.out < test.in
>
> Please see attached
Well, that helped a bit. For some reason, your system is skipping a step in the
launch state machine, and so we never hit the step where we setup the IO
forwarding system.
Sorry to keep poking, but I haven’t seen this behavior anywhere else, and so I
have no way to replicate it. Must be a subtle
Yes, all procs were launched properly. I added “-mca plm_base_verbose 5” to the
mpirun command. Please see attached for the results.
$mpirun -mca plm_base_verbose 5 ./a.out < test.in
I mentioned in my initial post that the test job can run properly for the 1st
time. But if I kill the job and
Hi everyone,
I am using Linux Fedora. I downloaded/installed
openmpi-1.7.3-1.fc20 (64-bit) and openmpi-devel-1.7.3-1.fc20 (64-bit), as
well as pypar-openmpi-2.1.5_108-3.fc20 (64-bit) and
python3-mpi4py-openmpi-1.3.1-1.fc20 (64-bit). The problem I am having is
building mpi4py using the mpicc wrapper.
Hmmm...well, the problem appears to be that we aren’t setting up the input
channel to read stdin. This happens immediately after the application is
launched - there is no “if” clause or anything else in front of it. The only
way it wouldn’t get called is if all the procs weren’t launched, but th
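The stdin-forwarding step being discussed can be illustrated with a toy model: the parent pushes its input down a pipe to one designated child, much as mpirun forwards stdin to rank 0. This sketch uses only the standard library, and the child program here is purely illustrative.

```python
import subprocess
import sys

def forward_stdin(data: str) -> str:
    """Launch one child and push the parent's input down to it --
    a toy model of the stdin forwarding mpirun sets up for rank 0."""
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
    # communicate() writes the data, closes the child's stdin so its
    # read() can finish, and collects whatever the child printed.
    out, _ = child.communicate(data)
    return out
```

If the forwarding channel were never wired up (the bug under discussion), the child's blocking read on stdin would never complete, and the job would hang exactly as reported.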
I checked again and as far as I can tell, everything was setup correctly. I
added "HCC debug" to the output message to make sure it's the correct plugin.
The updated outputs:
$ mpirun ./a.out < test.in
[c1725.crane.hcc.unl.edu:218844] HCC debug: [[26513,0],0] iof:hnp pushing fd 35
for process [
In the absence of a clear error message, the btl_tcp_frag-related error
messages can suggest a process was killed by the oom-killer.
This is not your case, since rank 0 died because of an illegal instruction.
Are you running under a batch manager?
On which architecture?
Do your compute nodes have the
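When an oom-kill is suspected, the kernel log on the node where the rank died is the place to look. The helper below is an illustrative sketch for scanning `dmesg` output; the phrases it matches are typical Linux wording and may need adjusting for a given kernel.

```python
def find_oom_kills(dmesg_text: str):
    """Pick out kernel log lines suggesting the OOM killer fired.
    Feed this the captured output of `dmesg` from the compute node."""
    needles = ("out of memory", "oom-killer", "killed process")
    return [line for line in dmesg_text.splitlines()
            if any(needle in line.lower() for needle in needles)]
```

Run it against `dmesg` captured on each compute node of the job; any hits naming your MPI binary point to memory exhaustion rather than a network fault, which is why the btl_tcp_frag message alone is not conclusive.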
Hi,
An MPI job is running on two nodes and everything seems to be fine.
However, in the middle of the run, the program aborts with the following
error
[compute-0-1.local][[47664,1],14][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
[c