"Jeff Squyres (jsquyres)" writes:
> This is, unfortunately, an undefined area of the MPI specification. I
> do believe that our previous behavior was *correct* -- it just
> deadlocks with PETSC because PETSC is relying on undefined behavior.
Jeff, can you clarify where in the standard this is left undefined?
On Mon, Feb 23, 2015 at 4:37 PM, Joshua Ladd wrote:
Nathan,
I do, but the hang comes later on. It looks like it's a situation where the
root is way, way faster than the children and it's inducing an overrun in
the unexpected message queue. I think the queue is set to just keep growing,
and it eventually blows up the memory?
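The pattern described above can be sketched in a few lines. This is a hypothetical reproducer, not the poster's actual program: the root loops over MPI_Bcast while the other ranks are artificially slowed, so broadcast fragments from iterations the children have not reached yet accumulate in their unexpected-message queues (whether the root can actually run unboundedly ahead depends on the implementation, since MPI_Bcast makes no synchronization guarantee).

```c
/* Hypothetical sketch: fast root, slow children, many back-to-back
 * broadcasts. The usleep() makes non-root ranks lag far behind. */
#include <mpi.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int buf = 0;
    for (int i = 0; i < 100000; i++) {
        if (rank != 0)
            usleep(100); /* children are much slower than the root */
        buf = i;
        MPI_Bcast(&buf, 1, MPI_INT, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```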
Josh, do you see a hang when using vader? It is preferred over the old
sm btl.
-Nathan
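To test Nathan's suggestion, the shared-memory transport can be selected explicitly via MCA parameters on the mpirun command line (a sketch; the exact component names available depend on the Open MPI version installed, and `./a.out` is a placeholder for the test binary):

```shell
# Run with the vader shared-memory BTL (plus self for loopback)
mpirun --mca btl vader,self -np 4 ./a.out

# For comparison, the older sm BTL
mpirun --mca btl sm,self -np 4 ./a.out
```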
On Mon, Feb 23, 2015 at 03:48:17PM -0500, Joshua Ladd wrote:
Sachin,
I am able to reproduce something funny. Looks like your issue. When I run
on a single host with two ranks, the test works fine. However, when I try
three or more, it looks like only the root, rank 0, is making any progress
after the first iteration.
What version of OMPI are you using, and how was it configured? How was the job
started?
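The version and build-configuration details being asked for here can be gathered with standard Open MPI tools, for example (a sketch; output format varies across releases):

```shell
# Report the Open MPI version
mpirun --version

# Version, configure command line, and component list
ompi_info | head -n 40

# Parameters in effect for the BTL framework
ompi_info --param btl all
```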
On Feb 23, 2015, at 8:26 AM, Aleix Gimeno Vives wrote:
Hello Ralph,
The job is still running though and I used the default options. Would you
recommend me to run the job again? (the job will take several days, so I'd
rather not run it again if possible).
Regards,
Aleix
2015-02-23 17:20 GMT+01:00 Ralph Castain:
I would have expected the job to automatically abort if any processes were
located on the slave that shut down - that is the default behavior.
On Feb 23, 2015, at 8:07 AM, Aleix Gimeno Vives wrote:
Dear Open MPI support team,
I am running a program using 1 master machine and 4 slaves, but one of the
slaves was shut down. Will this have any influence on the output? Should I
restart the job?
I know it is a simple question, but I couldn't find the answer in the "Open
MPI FAQ" or the mailing list archives.
Hello list,
we have several questions regarding calls to collectives using
intercommunicators. The man page for MPI_Bcast has a notice for the
inter-communicator case, reproduced below our questions.
Suppose I is an intercommunicator for communicators C1={p1,p2,p3} and
C2={p4,p5,p6}
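For the inter-communicator case the man page refers to, the root-group/remote-group asymmetry of MPI_Bcast can be illustrated with a minimal sketch (assumptions: an even number of ranks, split into two halves that stand in for C1 and C2 and are joined with MPI_Intercomm_create; tag 42 is arbitrary):

```c
/* Sketch: broadcast from rank 0 of group C1 to every process in C2
 * over an intercommunicator. In the root group, the root passes
 * MPI_ROOT and the other members pass MPI_PROC_NULL; in the receiving
 * group, everyone passes the root's rank within the remote group. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Split MPI_COMM_WORLD into two halves standing in for C1 and C2 */
    int in_c1 = world_rank < world_size / 2;
    MPI_Comm local;
    MPI_Comm_split(MPI_COMM_WORLD, in_c1, world_rank, &local);

    /* Join the halves into an intercommunicator; the other group's
     * leader, in MPI_COMM_WORLD ranks, is 0 or world_size/2 */
    int remote_leader = in_c1 ? world_size / 2 : 0;
    MPI_Comm inter;
    MPI_Intercomm_create(local, 0, MPI_COMM_WORLD, remote_leader, 42, &inter);

    int local_rank, value = 0;
    MPI_Comm_rank(local, &local_rank);

    if (in_c1) {
        value = 123;
        MPI_Bcast(&value, 1, MPI_INT,
                  local_rank == 0 ? MPI_ROOT : MPI_PROC_NULL, inter);
    } else {
        MPI_Bcast(&value, 1, MPI_INT, 0, inter); /* root is rank 0 of C1 */
        printf("world rank %d received %d\n", world_rank, value);
    }

    MPI_Comm_free(&inter);
    MPI_Comm_free(&local);
    MPI_Finalize();
    return 0;
}
```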