George,
Thanks for the review and commit!
I've confirmed your modification.
Takahiro Kawashima,
MPI development team,
Fujitsu
> Takahiro,
>
> Indeed we were way too lax on canceling the requests. I modified your patch to
> correctly deal with the MEMCHECK macro (remove the call from the branch that
Sorry for delay - local network was down and I couldn't commit the one-line fix
:-(
Turns out that there was a bug in the rsh launcher (the daemons *always*
declared a failed launch) that was previously being ignored and was now
exposed, resulting in a possible race condition. Fixed now.
Thanks,
It is a long-standing problem that, due to a bug in Sun GridEngine
(it sets the stack size limit equal to the address space limit),
using qrsh from within Open MPI fails if a large amount of memory is requested
but the stack size is not explicitly set to a reasonably small value.
The best solution would be if SGE jus
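In the meantime, a common workaround is to reset the stack limit inside the job script itself before launching. The sketch below is hypothetical: the parallel environment name and resource names (`orte`, `h_vmem`) vary by site, and the 10 MB stack value is only an example.

```shell
#!/bin/sh
# Hypothetical SGE job script: request a large address space but
# explicitly shrink the stack limit before mpirun, so the qrsh-launched
# daemons do not inherit an enormous stack size from the vmem limit.
#$ -pe orte 16
#$ -l h_vmem=8G

ulimit -s 10240      # cap the stack at 10 MB (example value)
mpirun ./a.out
```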
Interestingly enough it worked for me for a while and then after many
runs I started seeing the below too.
--td
On 7/26/2012 11:07 AM, Ralph Castain wrote:
Hmmm...it was working for me, but I'll recheck. Thanks!
On Jul 26, 2012, at 8:04 AM, George Bosilca wrote:
> r26868 seems to have some issues. It works well as long as all processes are
> started on the same node (aka. there is a single daemon), but it breaks with
> the error message attached below if there are more than two daemons.
r26868 seems to have some issues. It works well as long as all processes are
started on the same node (aka. there is a single daemon), but it breaks with
the error message attached below if there are more than two daemons.
$ mpirun -np 2 --bynode ./runme
[node01:07767] [[21341,0],1] ORTE_ERROR_L
OK, so this is only for receive, and not for send, I take it. Should have
looked closer.
Rich
-----Original Message-----
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf
Of George Bosilca
Sent: Thursday, July 26, 2012 10:47 AM
To: Open MPI Developers
Subject: Re:
Rich,
There is no matching in this case. Canceling a receive operation is possible
only up to the moment the request has been matched. Up to this point the
sequence numbers of the peers are not used, so removing a non-matched request
has no impact on the sequence number.
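The semantics described above can be illustrated with a minimal, hypothetical sketch (not code from the thread): a receive that can never be matched is posted, canceled, and then completed with MPI_Wait, after which MPI_Test_cancelled reports whether the cancellation took effect.

```c
/* Sketch: canceling an unmatched receive. Because the request has not
 * been matched, no peer sequence number has been consumed, so removing
 * it leaves no gap in the sequence. Hypothetical example. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int buf;
    MPI_Request req;
    MPI_Status status;
    int cancelled;

    MPI_Init(&argc, &argv);

    /* Post a receive that no sender will ever match... */
    MPI_Irecv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, 42, MPI_COMM_WORLD, &req);

    /* ...then cancel it. Cancellation can only succeed while the
     * request is still unmatched. */
    MPI_Cancel(&req);
    MPI_Wait(&req, &status);          /* completes the request either way */
    MPI_Test_cancelled(&status, &cancelled);

    printf("cancelled = %d\n", cancelled);

    MPI_Finalize();
    return 0;
}
```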
george.
On Jul 26,
I do not see any resetting of sequence numbers. It has been a long time since
I last looked at the matching code, so I don't know whether the out-of-order
handling has been taken out. If not, the sequence numbers have to be dealt with
in some manner, or else there will be a gap in the arriving sequence numbers.
Takahiro,
Indeed we were way too lax on canceling the requests. I modified your patch to
correctly deal with the MEMCHECK macro (removing the call from the branch that
requires a completion function). The modified patch is attached below. I
will commit asap.
Thanks,
george.
Index: om
Hi Open MPI developers,
I found a small bug in Open MPI.
See attached program cancelled.c.
In this program, rank 1 tries to cancel an MPI_Irecv and calls MPI_Recv
instead if the cancellation succeeds. The program should terminate whether
the cancellation succeeds or not, but it leads to a deadlock.
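The attached cancelled.c is not reproduced in the archive; a hypothetical reconstruction of the described scenario, based only on the description above, might look like this (run with 2 ranks):

```c
/* Hypothetical reconstruction of the cancelled.c scenario: rank 1
 * cancels an MPI_Irecv and, if the cancellation succeeded, posts a
 * matching MPI_Recv instead. Either path should terminate; the
 * reported bug was a deadlock here. */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, buf = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Request req;
        MPI_Status status;
        int cancelled;

        MPI_Irecv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Cancel(&req);
        MPI_Wait(&req, &status);
        MPI_Test_cancelled(&status, &cancelled);
        if (cancelled) {
            /* The Irecv was withdrawn; receive the message for real. */
            MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
        /* Whether or not the cancel succeeded, rank 0's message has
         * now been received, so MPI_Finalize should be reached. */
    }

    MPI_Finalize();
    return 0;
}
```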