Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

2012-07-26 Thread Kawashima, Takahiro
George, Thanks for the review and commit! I've confirmed your modification. Takahiro Kawashima, MPI development team, Fujitsu > Takahiro, > > Indeed we were way too lax on canceling the requests. I modified your patch to > correctly deal with the MEMCHECK macro (remove the call from the branch that

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26868 - in trunk/orte/mca/plm: base rsh

2012-07-26 Thread Ralph Castain
Sorry for the delay - the local network was down and I couldn't commit the one-line fix :-( Turns out that there was a bug in the rsh launcher (the daemons *always* declared a failed launch) that was previously being ignored and was now exposed, resulting in a possible race condition. Fixed now. Thank

[OMPI devel] OpenMPI and SGE integration made more stable

2012-07-26 Thread Christoph van Wüllen
It is a long-standing problem that, due to a bug in Sun GridEngine (setting the stack size limit equal to the address space limit), using qrsh from within OpenMPI fails if a large amount of memory is requested but the stack size is not explicitly set to a reasonably small value. The best solution were if SGE jus

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26868 - in trunk/orte/mca/plm: base rsh

2012-07-26 Thread TERRY DONTJE
Interestingly enough it worked for me for a while and then after many runs I started seeing the below too. --td On 7/26/2012 11:07 AM, Ralph Castain wrote: Hmmm...it was working for me, but I'll recheck. Thanks! On Jul 26, 2012, at 8:04 AM, George Bosilca wrote: r26868 seems to have some is

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26868 - in trunk/orte/mca/plm: base rsh

2012-07-26 Thread Ralph Castain
Hmmm...it was working for me, but I'll recheck. Thanks! On Jul 26, 2012, at 8:04 AM, George Bosilca wrote: > r26868 seems to have some issues. It works well as long as all processes are > started on the same node (aka. there is a single daemon), but it breaks with > the error message attached b

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26868 - in trunk/orte/mca/plm: base rsh

2012-07-26 Thread George Bosilca
r26868 seems to have some issues. It works well as long as all processes are started on the same node (i.e., there is a single daemon), but it breaks with the error message attached below if there are more than two daemons. $ mpirun -np 2 --bynode ./runme [node01:07767] [[21341,0],1] ORTE_ERROR_L

Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

2012-07-26 Thread Richard Graham
OK, so this is only for receive, and not for send, I take it. Should have looked closer. Rich -----Original Message----- From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca Sent: Thursday, July 26, 2012 10:47 AM To: Open MPI Developers Subject: Re:

Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

2012-07-26 Thread George Bosilca
Rich, There is no matching in this case. Canceling a receive operation is possible only up to the moment the request has been matched. Up to this point the sequence numbers of the peers are not used, so removing a non-matched request has no impact on the sequence number. george. On Jul 26,
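The rule George describes can be sketched as a toy state machine. This is only an illustration under stated assumptions: the types and functions below (`toy_recv_req_t`, `toy_peer_t`, `toy_match_frag`, `toy_cancel_recv`) are hypothetical names invented for this sketch and are not Open MPI's data structures or its matching code. It shows that cancelling a receive that has not been matched leaves the peer's expected sequence number untouched, while a matched receive can no longer be cancelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request states, for illustration only (not Open MPI types). */
typedef enum { REQ_PENDING, REQ_MATCHED, REQ_CANCELLED } toy_req_state_t;

typedef struct {
    toy_req_state_t state;
} toy_recv_req_t;

/* Hypothetical per-peer bookkeeping: the next fragment sequence expected. */
typedef struct {
    unsigned next_expected_seq;
} toy_peer_t;

/* Post a receive: it waits in the pending state until a fragment matches. */
void toy_post_recv(toy_recv_req_t *req)
{
    req->state = REQ_PENDING;
}

/* An incoming fragment matches a pending request. Only here is the
 * peer's sequence number consumed. Returns true if the match happened. */
bool toy_match_frag(toy_peer_t *peer, toy_recv_req_t *req, unsigned frag_seq)
{
    if (req->state != REQ_PENDING || frag_seq != peer->next_expected_seq) {
        return false;
    }
    req->state = REQ_MATCHED;
    peer->next_expected_seq++;
    return true;
}

/* Cancel succeeds only while the request is still unmatched. It never
 * touches the sequence number, so no gap can appear in the sequence. */
bool toy_cancel_recv(toy_recv_req_t *req)
{
    if (req->state != REQ_PENDING) {
        return false;
    }
    req->state = REQ_CANCELLED;
    return true;
}
```

In this model, a cancelled unmatched request simply disappears from the pending set; the next arriving fragment is still expected to carry the same sequence number as before the cancel.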

Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

2012-07-26 Thread Richard Graham
I do not see any resetting of sequence numbers. It has been a long time since I have looked at the matching code, so don't know if the out-of-order handling has been taken out. If not, the sequence number has to be dealt with in some manner, or else there will be a gap in the arriving sequence

Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

2012-07-26 Thread George Bosilca
Takahiro, Indeed we were way too lax on canceling the requests. I modified your patch to correctly deal with the MEMCHECK macro (remove the call from the branch that will require a completion function). The modified patch is attached below. I will commit asap. Thanks, george. Index: om

[OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

2012-07-26 Thread Kawashima, Takahiro
Hi Open MPI developers, I found a small bug in Open MPI. See attached program cancelled.c. In this program, rank 1 tries to cancel an MPI_Irecv and calls an MPI_Recv instead if the cancellation succeeds. This program should terminate whether the cancellation succeeds or not. But it leads to a deadlock