Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

2012-07-26 Thread Kawashima, Takahiro
George,

Thanks for the review and commit!
I've confirmed your modification.

Takahiro Kawashima,
MPI development team,
Fujitsu

> Takahiro,
> 
> Indeed we were way too lax on canceling the requests. I modified your patch to 
> correctly deal with the MEMCHECKER macro (removing the call from the branch that 
> requires a completion function). The modified patch is attached below. I 
> will commit asap.
> 
>   Thanks,
> george.
> 
> 
> Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
> ===================================================================
> --- ompi/mca/pml/ob1/pml_ob1_recvreq.c	(revision 26870)
> +++ ompi/mca/pml/ob1/pml_ob1_recvreq.c	(working copy)
> @@ -3,7 +3,7 @@
>   * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
>   *                         University Research and Technology
>   *                         Corporation.  All rights reserved.
> - * Copyright (c) 2004-2009 The University of Tennessee and The University
> + * Copyright (c) 2004-2012 The University of Tennessee and The University
>   *                         of Tennessee Research Foundation.  All rights
>   *                         reserved.
>   * Copyright (c) 2004-2008 High Performance Computing Center Stuttgart,
> @@ -15,6 +15,7 @@
>   * Copyright (c) 2012      NVIDIA Corporation.  All rights reserved.
>   * Copyright (c) 2011-2012 Los Alamos National Security, LLC. All rights
>   *                         reserved.
> + * Copyright (c) 2012      FUJITSU LIMITED.  All rights reserved.
>   * $COPYRIGHT$
>   * 
>   * Additional copyrights may follow
> @@ -97,36 +98,26 @@
>      mca_pml_ob1_recv_request_t* request = (mca_pml_ob1_recv_request_t*)ompi_request;
>      mca_pml_ob1_comm_t* comm = request->req_recv.req_base.req_comm->c_pml_comm;
>  
> -    if( true == ompi_request->req_complete ) { /* way to late to cancel this one */
> -        /*
> -         * Receive request completed, make user buffer accessable.
> -         */
> -        MEMCHECKER(
> -            memchecker_call(&opal_memchecker_base_mem_defined,
> -                            request->req_recv.req_base.req_addr,
> -                            request->req_recv.req_base.req_count,
> -                            request->req_recv.req_base.req_datatype);
> -        );
> +    if( true == request->req_match_received ) { /* way to late to cancel this one */
> +        assert( OMPI_ANY_TAG != ompi_request->req_status.MPI_TAG ); /* not matched isn't it */
>          return OMPI_SUCCESS;
>      }
>  
>      /* The rest should be protected behind the match logic lock */
>      OPAL_THREAD_LOCK(&comm->matching_lock);
> -    if( OMPI_ANY_TAG == ompi_request->req_status.MPI_TAG ) { /* the match has not been already done */
> -       if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
> -          opal_list_remove_item( &comm->wild_receives, (opal_list_item_t*)request );
> -       } else {
> -          mca_pml_ob1_comm_proc_t* proc = comm->procs + request->req_recv.req_base.req_peer;
> -          opal_list_remove_item(&proc->specific_receives, (opal_list_item_t*)request);
> -       }
> -       PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
> -                                &(request->req_recv.req_base), PERUSE_RECV );
> -       /**
> -        * As now the PML is done with this request we have to force the pml_complete
> -        * to true. Otherwise, the request will never be freed.
> -        */
> -       request->req_recv.req_base.req_pml_complete = true;
> +    if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
> +        opal_list_remove_item( &comm->wild_receives, (opal_list_item_t*)request );
> +    } else {
> +        mca_pml_ob1_comm_proc_t* proc = comm->procs + request->req_recv.req_base.req_peer;
> +        opal_list_remove_item(&proc->specific_receives, (opal_list_item_t*)request);
>      }
> +    PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
> +                             &(request->req_recv.req_base), PERUSE_RECV );
> +    /**
> +     * As now the PML is done with this request we have to force the pml_complete
> +     * to true. Otherwise, the request will never be freed.
> +     */
> +    request->req_recv.req_base.req_pml_complete = true;
>      OPAL_THREAD_UNLOCK(&comm->matching_lock);
>  
>      OPAL_THREAD_LOCK(&ompi_request_lock);
> @@ -138,7 +129,7 @@
>      MCA_PML_OB1_RECV_REQUEST_MPI_COMPLETE(request);
>      OPAL_THREAD_UNLOCK(&ompi_request_lock);
>      /*
> -     * Receive request cancelled, make user buffer accessable.
> +     * Receive request cancelled, make user buffer accessible.
>       */
>      MEMCHECKER(
>          memchecker_call(&opal_memchecker_base_mem_defined,
> 
> On Jul 26, 2012, at 13:41 , Kawashima, Takahiro wrote:
> 
> > Hi Open MPI developers,
> > 
> > I found a small bug in Open MPI.
> > 
> > See attached program cancelled.c.
> > In this program, rank 1 tries to cancel an MPI_Irecv and calls MPI_Recv
> > instead if the cancellation succeeds. 

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26868 - in trunk/orte/mca/plm: base rsh

2012-07-26 Thread Ralph Castain
Sorry for the delay - the local network was down and I couldn't commit the 
one-line fix :-(

Turns out that there was a bug in the rsh launcher (the daemons *always* 
declared a failed launch) that was previously being ignored and was now 
exposed, resulting in a possible race condition. Fixed now.

Thanks
Ralph

On Jul 26, 2012, at 8:39 AM, TERRY DONTJE wrote:

> Interestingly enough it worked for me for a while and then after many runs I 
> started seeing the below too.
> 
> --td
> 
> On 7/26/2012 11:07 AM, Ralph Castain wrote:
>> 
>> Hmmm...it was working for me, but I'll recheck. Thanks!
>> 
>> On Jul 26, 2012, at 8:04 AM, George Bosilca wrote:
>> 
>>> r26868 seems to have some issues. It works well as long as all processes 
>>> are started on the same node (i.e., there is a single daemon), but it breaks 
>>> with the error message attached below if there are more than two daemons.
>>> 
>>> $ mpirun -np 2 --bynode ./runme
>>> [node01:07767] [[21341,0],1] ORTE_ERROR_LOG: A message is attempting to be 
>>> sent to a process whose contact information is unknown in file 
>>> ../../../../../ompi/orte/mca/rml/oob/rml_oob_send.c at line 362
>>> [node01:07767] [[21341,0],1] attempted to send to [[21341,0],2]: tag 15
>>> [node01:07767] [[21341,0],1] ORTE_ERROR_LOG: A message is attempting to be 
>>> sent to a process whose contact information is unknown in file 
>>> ../../../../ompi/orte/mca/grpcomm/base/grpcomm_base_xcast.c at line 157
>>> 
>>> I confirm that reverting the commit brings the trunk back to a normal 
>>> state.
>>> 
>>> Please - a tad more care in what gets committed??
>>> 
>>>  george.
>>> 
>>> 
>>> On Jul 25, 2012, at 23:46 , svn-commit-mai...@open-mpi.org wrote:
>>> 
 Author: rhc (Ralph Castain)
 Date: 2012-07-25 17:46:45 EDT (Wed, 25 Jul 2012)
 New Revision: 26868
 URL: https://svn.open-mpi.org/trac/ompi/changeset/26868
 
 Log:
 Reconnect the rsh/ssh error reporting code for remote spawns to report 
 failure to launch. Ensure the HNP correctly reports non-zero exit status 
 when ssh encounters a problem.
 
 Thanks to Terry for spotting it!
 
Text files modified: 
   trunk/orte/mca/plm/base/plm_base_launch_support.c |    44
   trunk/orte/mca/plm/base/plm_base_receive.c        |     6 +
   trunk/orte/mca/plm/base/plm_private.h             |     4 +++
   trunk/orte/mca/plm/rsh/plm_rsh_module.c           |    18 +++-
   4 files changed, 62 insertions(+), 10 deletions(-)
 
Modified: trunk/orte/mca/plm/base/plm_base_launch_support.c
==============================================================================
--- trunk/orte/mca/plm/base/plm_base_launch_support.c	Wed Jul 25 12:32:51 2012	(r26867)
+++ trunk/orte/mca/plm/base/plm_base_launch_support.c	2012-07-25 17:46:45 EDT (Wed, 25 Jul 2012)	(r26868)
@@ -741,6 +741,50 @@
 
 }
 
+void orte_plm_base_daemon_failed(int st, orte_process_name_t* sender,
+                                 opal_buffer_t *buffer,
+                                 orte_rml_tag_t tag, void *cbdata)
+{
+    int status, rc;
+    int32_t n;
+    orte_vpid_t vpid;
+    orte_proc_t *daemon;
+
+    /* get the daemon job, if necessary */
+    if (NULL == jdatorted) {
+        jdatorted = orte_get_job_data_object(ORTE_PROC_MY_NAME->jobid);
+    }
+
+    /* unpack the daemon that failed */
+    n=1;
+    if (OPAL_SUCCESS != (rc = opal_dss.unpack(buffer, &vpid, &n, ORTE_VPID))) {
+        ORTE_ERROR_LOG(rc);
+        ORTE_UPDATE_EXIT_STATUS(ORTE_ERROR_DEFAULT_EXIT_CODE);
+        goto finish;
+    }
+
+    /* unpack the exit status */
+    n=1;
+    if (OPAL_SUCCESS != (rc = opal_dss.unpack(buffer, &status, &n, OPAL_INT))) {
+        ORTE_ERROR_LOG(rc);
+        status = ORTE_ERROR_DEFAULT_EXIT_CODE;
+        ORTE_UPDATE_EXIT_STATUS(ORTE_ERROR_DEFAULT_EXIT_CODE);
+    } else {
+        ORTE_UPDATE_EXIT_STATUS(WEXITSTATUS(status));
+    }
+
+    /* find the daemon and update its state/status */
+    if (NULL == (daemon = (orte_proc_t*)opal_pointer_array_get_item(jdatorted->procs, vpid))) {
+        ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
+        goto finish;
+    }
+    daemon->state = ORTE_PROC_STATE_FAILED_TO_START;
+    daemon->exit_code = status;
+
+ finish:
+    ORTE_ACTIVATE_PROC_STATE(&daemon->name, ORTE_PROC_STATE_FAILED_TO_START);
+}
+
 int orte_plm_base_setup_orted_cmd(int *argc, char ***argv)
 {
     int i, loc;
 
Modified: trunk/orte/mca/plm/base/plm_base_receive.c

[OMPI devel] OpenMPI and SGE integration made more stable

2012-07-26 Thread Christoph van Wüllen
It is a long-standing problem that, due to a bug in Sun GridEngine
(it sets the stack size limit equal to the address space limit),
using qrsh from within OpenMPI fails if a large amount of memory is
requested but the stack size is not explicitly set to a reasonably
small value.

The best solution would be for SGE simply not to touch the stack
size limit and to leave it at INFINITY.

However, I have tested that just reducing the stack size limit in
file orte/mca/plm/rsh/plm_rsh_module.c, function ssh_child(), before
execv'ing qrsh circumvents the problem. So just after exec_path is set
by strdup(...) I inserted the lines:

   {
       struct rlimit rlim;
       int l;

       l = strlen(exec_path);
       /* only clamp the limit when we are about to exec qrsh */
       if (l > 5 && !strcmp("/qrsh", exec_path + (l-5))) {
           getrlimit(RLIMIT_STACK, &rlim);
           if (rlim.rlim_max > 1000L) rlim.rlim_max = 1000L;
           if (rlim.rlim_cur > 1000L) rlim.rlim_cur = 1000L;
           setrlimit(RLIMIT_STACK, &rlim);
       }
   }


It looks quick-and-dirty, and it certainly is, but it solves a severe
problem many users have with OpenMPI and SGE. Feel free to use this
information as you like. Note that MPI worker processes that may be
spawned on "distant" nodes do not suffer from the reduced stack
size limit; only the qrsh command does.
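
For anyone who wants to observe the effect outside of Open MPI, here is a
minimal standalone sketch of the same rlimit dance (my illustration, not
OMPI code; the 10 MB cap is an arbitrary example value, not the value used
above):

#include <stdio.h>
#include <sys/resource.h>

#define CAP (10L*1024*1024)  /* example cap: 10 MB */

int main(void)
{
    struct rlimit rlim;

    getrlimit(RLIMIT_STACK, &rlim);
    printf("stack limit before: cur=%llu max=%llu\n",
           (unsigned long long)rlim.rlim_cur,
           (unsigned long long)rlim.rlim_max);

    /* cap both the soft and the hard limit, as the qrsh workaround does */
    if (rlim.rlim_max > CAP) rlim.rlim_max = CAP;
    if (rlim.rlim_cur > CAP) rlim.rlim_cur = CAP;
    if (setrlimit(RLIMIT_STACK, &rlim) != 0) {
        perror("setrlimit");
        return 1;
    }

    getrlimit(RLIMIT_STACK, &rlim);
    printf("stack limit after:  cur=%llu max=%llu\n",
           (unsigned long long)rlim.rlim_cur,
           (unsigned long long)rlim.rlim_max);
    return 0;
}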

Is this (still) of interest?

+---------------------------------+----------------------------------+
| Prof. Christoph van Wüllen      | Tele-Phone (+49) (0)631 205 2749 |
| TU Kaiserslautern, FB Chemie    | Tele-Fax   (+49) (0)631 205 2750 |
| Erwin-Schrödinger-Str.          |                                  |
| D-67663 Kaiserslautern, Germany | vanwul...@chemie.uni-kl.de       |
|                                                                    |
| HomePage:  http://www.chemie.uni-kl.de/vanwullen                   |
+---------------------------------+----------------------------------+




Re: [OMPI devel] [OMPI svn] svn:open-mpi r26868 - in trunk/orte/mca/plm: base rsh

2012-07-26 Thread TERRY DONTJE
Interestingly enough it worked for me for a while and then after many 
runs I started seeing the below too.


--td

On 7/26/2012 11:07 AM, Ralph Castain wrote:

Hmmm...it was working for me, but I'll recheck. Thanks!

On Jul 26, 2012, at 8:04 AM, George Bosilca wrote:


r26868 seems to have some issues. It works well as long as all processes are 
started on the same node (i.e., there is a single daemon), but it breaks with 
the error message attached below if there are more than two daemons.

$ mpirun -np 2 --bynode ./runme
[node01:07767] [[21341,0],1] ORTE_ERROR_LOG: A message is attempting to be sent 
to a process whose contact information is unknown in file 
../../../../../ompi/orte/mca/rml/oob/rml_oob_send.c at line 362
[node01:07767] [[21341,0],1] attempted to send to [[21341,0],2]: tag 15
[node01:07767] [[21341,0],1] ORTE_ERROR_LOG: A message is attempting to be sent 
to a process whose contact information is unknown in file 
../../../../ompi/orte/mca/grpcomm/base/grpcomm_base_xcast.c at line 157

I confirm that reverting the commit brings the trunk back to a normal state.

Please - a tad more care in what gets committed??

  george.


On Jul 25, 2012, at 23:46 , svn-commit-mai...@open-mpi.org wrote:


Author: rhc (Ralph Castain)
Date: 2012-07-25 17:46:45 EDT (Wed, 25 Jul 2012)
New Revision: 26868
URL: https://svn.open-mpi.org/trac/ompi/changeset/26868

Log:
Reconnect the rsh/ssh error reporting code for remote spawns to report failure 
to launch. Ensure the HNP correctly reports non-zero exit status when ssh 
encounters a problem.

Thanks to Terry for spotting it!

Text files modified:
   trunk/orte/mca/plm/base/plm_base_launch_support.c |    44
   trunk/orte/mca/plm/base/plm_base_receive.c        |     6 +
   trunk/orte/mca/plm/base/plm_private.h             |     4 +++
   trunk/orte/mca/plm/rsh/plm_rsh_module.c           |    18 +++-
   4 files changed, 62 insertions(+), 10 deletions(-)

Modified: trunk/orte/mca/plm/base/plm_base_launch_support.c
==============================================================================
--- trunk/orte/mca/plm/base/plm_base_launch_support.c	Wed Jul 25 12:32:51 2012	(r26867)
+++ trunk/orte/mca/plm/base/plm_base_launch_support.c	2012-07-25 17:46:45 EDT (Wed, 25 Jul 2012)	(r26868)
@@ -741,6 +741,50 @@
 
 }
 
+void orte_plm_base_daemon_failed(int st, orte_process_name_t* sender,
+                                 opal_buffer_t *buffer,
+                                 orte_rml_tag_t tag, void *cbdata)
+{
+    int status, rc;
+    int32_t n;
+    orte_vpid_t vpid;
+    orte_proc_t *daemon;
+
+    /* get the daemon job, if necessary */
+    if (NULL == jdatorted) {
+        jdatorted = orte_get_job_data_object(ORTE_PROC_MY_NAME->jobid);
+    }
+
+    /* unpack the daemon that failed */
+    n=1;
+    if (OPAL_SUCCESS != (rc = opal_dss.unpack(buffer, &vpid, &n, ORTE_VPID))) {
+        ORTE_ERROR_LOG(rc);
+        ORTE_UPDATE_EXIT_STATUS(ORTE_ERROR_DEFAULT_EXIT_CODE);
+        goto finish;
+    }
+
+    /* unpack the exit status */
+    n=1;
+    if (OPAL_SUCCESS != (rc = opal_dss.unpack(buffer, &status, &n, OPAL_INT))) {
+        ORTE_ERROR_LOG(rc);
+        status = ORTE_ERROR_DEFAULT_EXIT_CODE;
+        ORTE_UPDATE_EXIT_STATUS(ORTE_ERROR_DEFAULT_EXIT_CODE);
+    } else {
+        ORTE_UPDATE_EXIT_STATUS(WEXITSTATUS(status));
+    }
+
+    /* find the daemon and update its state/status */
+    if (NULL == (daemon = (orte_proc_t*)opal_pointer_array_get_item(jdatorted->procs, vpid))) {
+        ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
+        goto finish;
+    }
+    daemon->state = ORTE_PROC_STATE_FAILED_TO_START;
+    daemon->exit_code = status;
+
+ finish:
+    ORTE_ACTIVATE_PROC_STATE(&daemon->name, ORTE_PROC_STATE_FAILED_TO_START);
+}
+
 int orte_plm_base_setup_orted_cmd(int *argc, char ***argv)
 {
     int i, loc;
 
Modified: trunk/orte/mca/plm/base/plm_base_receive.c
==============================================================================
--- trunk/orte/mca/plm/base/plm_base_receive.c	Wed Jul 25 12:32:51 2012	(r26867)
+++ trunk/orte/mca/plm/base/plm_base_receive.c	2012-07-25 17:46:45 EDT (Wed, 25 Jul 2012)	(r26868)
@@ -87,6 +87,12 @@
                                                           orte_plm_base_daemon_callback, NULL))) {
             ORTE_ERROR_LOG(rc);
         }
+        if (ORTE_SUCCESS != (rc = orte_rml.recv_buffer_nb(ORTE_NAME_WILDCARD,
+                                                          ORTE_RML_TAG_REPORT_REMOTE_LAUNCH,
+                                                          ORTE_RML_PERSISTENT,
+                                                          orte_plm_base_daemon_failed, NULL))) {
+            ORTE_ERROR_LOG(rc);
+        }
     }
     recv_issued = true;
 
Modified: trunk/orte/mca/plm/base/plm_private.h
==============================================================================
--- trunk/orte/mca/plm/base/plm_private.h	Wed Jul 25 12:32:51 2012

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26868 - in trunk/orte/mca/plm: base rsh

2012-07-26 Thread George Bosilca
r26868 seems to have some issues. It works well as long as all processes are 
started on the same node (i.e., there is a single daemon), but it breaks with 
the error message attached below if there are more than two daemons.

$ mpirun -np 2 --bynode ./runme
[node01:07767] [[21341,0],1] ORTE_ERROR_LOG: A message is attempting to be sent 
to a process whose contact information is unknown in file 
../../../../../ompi/orte/mca/rml/oob/rml_oob_send.c at line 362
[node01:07767] [[21341,0],1] attempted to send to [[21341,0],2]: tag 15
[node01:07767] [[21341,0],1] ORTE_ERROR_LOG: A message is attempting to be sent 
to a process whose contact information is unknown in file 
../../../../ompi/orte/mca/grpcomm/base/grpcomm_base_xcast.c at line 157

I confirm that reverting the commit brings the trunk back to a normal state.

Please - a tad more care in what gets committed??

  george.


On Jul 25, 2012, at 23:46 , svn-commit-mai...@open-mpi.org wrote:

> Author: rhc (Ralph Castain)
> Date: 2012-07-25 17:46:45 EDT (Wed, 25 Jul 2012)
> New Revision: 26868
> URL: https://svn.open-mpi.org/trac/ompi/changeset/26868
> 
> Log:
> Reconnect the rsh/ssh error reporting code for remote spawns to report 
> failure to launch. Ensure the HNP correctly reports non-zero exit status when 
> ssh encounters a problem.
> 
> Thanks to Terry for spotting it!
> 
> Text files modified: 
>   trunk/orte/mca/plm/base/plm_base_launch_support.c |    44
>   trunk/orte/mca/plm/base/plm_base_receive.c        |     6 +
>   trunk/orte/mca/plm/base/plm_private.h             |     4 +++
>   trunk/orte/mca/plm/rsh/plm_rsh_module.c           |    18 +++-
>   4 files changed, 62 insertions(+), 10 deletions(-)
> 
> Modified: trunk/orte/mca/plm/base/plm_base_launch_support.c
> ==============================================================================
> --- trunk/orte/mca/plm/base/plm_base_launch_support.c	Wed Jul 25 12:32:51 2012	(r26867)
> +++ trunk/orte/mca/plm/base/plm_base_launch_support.c	2012-07-25 17:46:45 EDT (Wed, 25 Jul 2012)	(r26868)
> @@ -741,6 +741,50 @@
>  
>  }
>  
> +void orte_plm_base_daemon_failed(int st, orte_process_name_t* sender,
> +                                 opal_buffer_t *buffer,
> +                                 orte_rml_tag_t tag, void *cbdata)
> +{
> +    int status, rc;
> +    int32_t n;
> +    orte_vpid_t vpid;
> +    orte_proc_t *daemon;
> +
> +    /* get the daemon job, if necessary */
> +    if (NULL == jdatorted) {
> +        jdatorted = orte_get_job_data_object(ORTE_PROC_MY_NAME->jobid);
> +    }
> +
> +    /* unpack the daemon that failed */
> +    n=1;
> +    if (OPAL_SUCCESS != (rc = opal_dss.unpack(buffer, &vpid, &n, ORTE_VPID))) {
> +        ORTE_ERROR_LOG(rc);
> +        ORTE_UPDATE_EXIT_STATUS(ORTE_ERROR_DEFAULT_EXIT_CODE);
> +        goto finish;
> +    }
> +
> +    /* unpack the exit status */
> +    n=1;
> +    if (OPAL_SUCCESS != (rc = opal_dss.unpack(buffer, &status, &n, OPAL_INT))) {
> +        ORTE_ERROR_LOG(rc);
> +        status = ORTE_ERROR_DEFAULT_EXIT_CODE;
> +        ORTE_UPDATE_EXIT_STATUS(ORTE_ERROR_DEFAULT_EXIT_CODE);
> +    } else {
> +        ORTE_UPDATE_EXIT_STATUS(WEXITSTATUS(status));
> +    }
> +
> +    /* find the daemon and update its state/status */
> +    if (NULL == (daemon = (orte_proc_t*)opal_pointer_array_get_item(jdatorted->procs, vpid))) {
> +        ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
> +        goto finish;
> +    }
> +    daemon->state = ORTE_PROC_STATE_FAILED_TO_START;
> +    daemon->exit_code = status;
> +
> + finish:
> +    ORTE_ACTIVATE_PROC_STATE(&daemon->name, ORTE_PROC_STATE_FAILED_TO_START);
> +}
> +
>  int orte_plm_base_setup_orted_cmd(int *argc, char ***argv)
>  {
>      int i, loc;
>  
> Modified: trunk/orte/mca/plm/base/plm_base_receive.c
> ==============================================================================
> --- trunk/orte/mca/plm/base/plm_base_receive.c	Wed Jul 25 12:32:51 2012	(r26867)
> +++ trunk/orte/mca/plm/base/plm_base_receive.c	2012-07-25 17:46:45 EDT (Wed, 25 Jul 2012)	(r26868)
> @@ -87,6 +87,12 @@
>                                                            orte_plm_base_daemon_callback, NULL))) {
>              ORTE_ERROR_LOG(rc);
>          }
> +        if (ORTE_SUCCESS != (rc = orte_rml.recv_buffer_nb(ORTE_NAME_WILDCARD,
> +                                                          ORTE_RML_TAG_REPORT_REMOTE_LAUNCH,
> +                                                          ORTE_RML_PERSISTENT,
> +                                                          orte_plm_base_daemon_failed, NULL))) {
> +            ORTE_ERROR_LOG(rc);
> +        }
>      }
>      recv_issued = true;
>  
> Modified: trunk/orte/mca/plm/base/plm_private.h
> ==============================================================================
> --- 

Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

2012-07-26 Thread Richard Graham
OK, so this is only for receive, and not for send, I take it.  Should have 
looked closer.

Rich

-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf 
Of George Bosilca
Sent: Thursday, July 26, 2012 10:47 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it 
has a matched recv frag

Rich,

There is no matching in this case. Canceling a receive operation is possible 
only up to the moment the request has been matched. Up to this point the 
sequence numbers of the peers are not used, so removing a non-matched request 
has no impact on the sequence number.

  george.

On Jul 26, 2012, at 16:31 , Richard Graham wrote:

> I do not see any resetting of sequence numbers.  It has been a long time 
> since I have looked at the matching code, so I don't know if the out-of-order 
> handling has been taken out.  If not, the sequence number has to be dealt 
> with in some manner, or else there will be a gap in the arriving sequence 
> numbers, and the matching logic will prevent any further progress.
> 
> Rich
> 
> -Original Message-
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] 
> On Behalf Of George Bosilca
> Sent: Thursday, July 26, 2012 10:06 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] [patch] MPI_Cancel should not cancel a 
> request if it has a matched recv frag
> 
> Takahiro,
> 
> Indeed we were way too lax on canceling the requests. I modified your patch to 
> correctly deal with the MEMCHECKER macro (removing the call from the branch that 
> requires a completion function). The modified patch is attached below. I 
> will commit asap.
> 
>   Thanks,
>     george.
> 
> 
> Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
> ===================================================================
> --- ompi/mca/pml/ob1/pml_ob1_recvreq.c	(revision 26870)
> +++ ompi/mca/pml/ob1/pml_ob1_recvreq.c	(working copy)
> @@ -3,7 +3,7 @@
>   * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
>   *                         University Research and Technology
>   *                         Corporation.  All rights reserved.
> - * Copyright (c) 2004-2009 The University of Tennessee and The University
> + * Copyright (c) 2004-2012 The University of Tennessee and The University
>   *                         of Tennessee Research Foundation.  All rights
>   *                         reserved.
>   * Copyright (c) 2004-2008 High Performance Computing Center Stuttgart,
> @@ -15,6 +15,7 @@
>   * Copyright (c) 2012      NVIDIA Corporation.  All rights reserved.
>   * Copyright (c) 2011-2012 Los Alamos National Security, LLC. All rights
>   *                         reserved.
> + * Copyright (c) 2012      FUJITSU LIMITED.  All rights reserved.
>   * $COPYRIGHT$
>   *
>   * Additional copyrights may follow
> @@ -97,36 +98,26 @@
>      mca_pml_ob1_recv_request_t* request = (mca_pml_ob1_recv_request_t*)ompi_request;
>      mca_pml_ob1_comm_t* comm = request->req_recv.req_base.req_comm->c_pml_comm;
>  
> -    if( true == ompi_request->req_complete ) { /* way to late to cancel this one */
> -        /*
> -         * Receive request completed, make user buffer accessable.
> -         */
> -        MEMCHECKER(
> -            memchecker_call(&opal_memchecker_base_mem_defined,
> -                            request->req_recv.req_base.req_addr,
> -                            request->req_recv.req_base.req_count,
> -                            request->req_recv.req_base.req_datatype);
> -        );
> +    if( true == request->req_match_received ) { /* way to late to cancel this one */
> +        assert( OMPI_ANY_TAG != ompi_request->req_status.MPI_TAG ); /* not matched isn't it */
>          return OMPI_SUCCESS;
>      }
>  
>      /* The rest should be protected behind the match logic lock */
>      OPAL_THREAD_LOCK(&comm->matching_lock);
> -    if( OMPI_ANY_TAG == ompi_request->req_status.MPI_TAG ) { /* the match has not been already done */
> -       if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
> -          opal_list_remove_item( &comm->wild_receives, (opal_list_item_t*)request );
> -       } else {
> -          mca_pml_ob1_comm_proc_t* proc = comm->procs + request->req_recv.req_base.req_peer;
> -          opal_list_remove_item(&proc->specific_receives, (opal_list_item_t*)request);
> -       }
> -       PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
> -                                &(request->req_recv.req_base), PERUSE_RECV );
> -       /**
> -        * As now the PML is done with this request we have to force the pml_complete
> -        * to true. Otherwise, the request will never be freed.
> -        */
> -       request->req_recv.req_base.req_pml_complete = true;
> +    if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
> +        opal_list_remove_item( &comm->wild_receives, (opal_list_item_t*)request );
> +    } else {
> +

Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

2012-07-26 Thread George Bosilca
Rich,

There is no matching in this case. Canceling a receive operation is possible 
only up to the moment the request has been matched. Up to this point the 
sequence numbers of the peers are not used, so removing a non-matched request 
has no impact on the sequence number.
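
A toy model of that invariant (an illustration only -- these are not Open
MPI's actual data structures or function names):

/* The per-peer counter that orders matching belongs to the incoming
 * fragment stream and is advanced only when a fragment is matched,
 * never when a posted receive is removed.  Hence canceling a
 * still-unmatched receive cannot create a gap in the sequence. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct toy_peer { uint16_t next_to_match; };  /* per-sender counter */
struct toy_frag { uint16_t seq; };            /* incoming fragment  */

static bool toy_try_match(struct toy_peer *peer, struct toy_frag frag)
{
    if (frag.seq != peer->next_to_match)
        return false;       /* out of order: fragment stays pending */
    peer->next_to_match++;  /* one sender sequence number consumed  */
    return true;            /* now search the posted-receive queues */
}

int main(void)
{
    struct toy_peer peer = { 0 };

    /* "Canceling" a posted receive only unlinks it from the posted
     * queue; peer.next_to_match is untouched, so fragments still
     * match strictly in arrival order afterwards. */
    printf("frag 0 matches: %d\n", toy_try_match(&peer, (struct toy_frag){ 0 }));
    printf("frag 1 matches: %d\n", toy_try_match(&peer, (struct toy_frag){ 1 }));
    return 0;
}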

  george.

On Jul 26, 2012, at 16:31 , Richard Graham wrote:

> I do not see any resetting of sequence numbers.  It has been a long time 
> since I have looked at the matching code, so I don't know if the out-of-order 
> handling has been taken out.  If not, the sequence number has to be dealt 
> with in some manner, or else there will be a gap in the arriving sequence 
> numbers, and the matching logic will prevent any further progress.
> 
> Rich
> 
> -Original Message-
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On 
> Behalf Of George Bosilca
> Sent: Thursday, July 26, 2012 10:06 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if 
> it has a matched recv frag
> 
> Takahiro,
> 
> Indeed we were way too lax on canceling the requests. I modified your patch to 
> correctly deal with the MEMCHECKER macro (removing the call from the branch that 
> requires a completion function). The modified patch is attached below. I 
> will commit asap.
> 
>   Thanks,
>     george.
> 
> 
> Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
> ===================================================================
> --- ompi/mca/pml/ob1/pml_ob1_recvreq.c	(revision 26870)
> +++ ompi/mca/pml/ob1/pml_ob1_recvreq.c	(working copy)
> @@ -3,7 +3,7 @@
>   * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
>   *                         University Research and Technology
>   *                         Corporation.  All rights reserved.
> - * Copyright (c) 2004-2009 The University of Tennessee and The University
> + * Copyright (c) 2004-2012 The University of Tennessee and The University
>   *                         of Tennessee Research Foundation.  All rights
>   *                         reserved.
>   * Copyright (c) 2004-2008 High Performance Computing Center Stuttgart,
> @@ -15,6 +15,7 @@
>   * Copyright (c) 2012      NVIDIA Corporation.  All rights reserved.
>   * Copyright (c) 2011-2012 Los Alamos National Security, LLC. All rights
>   *                         reserved.
> + * Copyright (c) 2012      FUJITSU LIMITED.  All rights reserved.
>   * $COPYRIGHT$
>   *
>   * Additional copyrights may follow
> @@ -97,36 +98,26 @@
>      mca_pml_ob1_recv_request_t* request = (mca_pml_ob1_recv_request_t*)ompi_request;
>      mca_pml_ob1_comm_t* comm = request->req_recv.req_base.req_comm->c_pml_comm;
>  
> -    if( true == ompi_request->req_complete ) { /* way to late to cancel this one */
> -        /*
> -         * Receive request completed, make user buffer accessable.
> -         */
> -        MEMCHECKER(
> -            memchecker_call(&opal_memchecker_base_mem_defined,
> -                            request->req_recv.req_base.req_addr,
> -                            request->req_recv.req_base.req_count,
> -                            request->req_recv.req_base.req_datatype);
> -        );
> +    if( true == request->req_match_received ) { /* way to late to cancel this one */
> +        assert( OMPI_ANY_TAG != ompi_request->req_status.MPI_TAG ); /* not matched isn't it */
>          return OMPI_SUCCESS;
>      }
>  
>      /* The rest should be protected behind the match logic lock */
>      OPAL_THREAD_LOCK(&comm->matching_lock);
> -    if( OMPI_ANY_TAG == ompi_request->req_status.MPI_TAG ) { /* the match has not been already done */
> -       if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
> -          opal_list_remove_item( &comm->wild_receives, (opal_list_item_t*)request );
> -       } else {
> -          mca_pml_ob1_comm_proc_t* proc = comm->procs + request->req_recv.req_base.req_peer;
> -          opal_list_remove_item(&proc->specific_receives, (opal_list_item_t*)request);
> -       }
> -       PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
> -                                &(request->req_recv.req_base), PERUSE_RECV );
> -       /**
> -        * As now the PML is done with this request we have to force the pml_complete
> -        * to true. Otherwise, the request will never be freed.
> -        */
> -       request->req_recv.req_base.req_pml_complete = true;
> +    if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
> +        opal_list_remove_item( &comm->wild_receives, (opal_list_item_t*)request );
> +    } else {
> +        mca_pml_ob1_comm_proc_t* proc = comm->procs + request->req_recv.req_base.req_peer;
> +        opal_list_remove_item(&proc->specific_receives, (opal_list_item_t*)request);
>      }
> +    PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
> +                             &(request->req_recv.req_base), PERUSE_RECV );
> +    /**
> +     * As now the PML is done with this request we 

Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

2012-07-26 Thread Richard Graham
I do not see any resetting of sequence numbers.  It has been a long time since 
I have looked at the matching code, so I don't know if the out-of-order handling 
has been taken out.  If not, the sequence number has to be dealt with in some 
manner, or else there will be a gap in the arriving sequence numbers, and the 
matching logic will prevent any further progress.

Rich

-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf 
Of George Bosilca
Sent: Thursday, July 26, 2012 10:06 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it 
has a matched recv frag

Takahiro,

Indeed we were way too lax on canceling the requests. I modified your patch to 
correctly deal with the MEMCHECKER macro (removing the call from the branch that 
requires a completion function). The modified patch is attached below. I 
will commit asap.

  Thanks,
george.


Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
===================================================================
--- ompi/mca/pml/ob1/pml_ob1_recvreq.c	(revision 26870)
+++ ompi/mca/pml/ob1/pml_ob1_recvreq.c	(working copy)
@@ -3,7 +3,7 @@
  * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
  *                         University Research and Technology
  *                         Corporation.  All rights reserved.
- * Copyright (c) 2004-2009 The University of Tennessee and The University
+ * Copyright (c) 2004-2012 The University of Tennessee and The University
  *                         of Tennessee Research Foundation.  All rights
  *                         reserved.
  * Copyright (c) 2004-2008 High Performance Computing Center Stuttgart,
@@ -15,6 +15,7 @@
  * Copyright (c) 2012      NVIDIA Corporation.  All rights reserved.
  * Copyright (c) 2011-2012 Los Alamos National Security, LLC. All rights
  *                         reserved.
+ * Copyright (c) 2012      FUJITSU LIMITED.  All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -97,36 +98,26 @@
     mca_pml_ob1_recv_request_t* request = (mca_pml_ob1_recv_request_t*)ompi_request;
     mca_pml_ob1_comm_t* comm = request->req_recv.req_base.req_comm->c_pml_comm;
 
-    if( true == ompi_request->req_complete ) { /* way to late to cancel this one */
-        /*
-         * Receive request completed, make user buffer accessable.
-         */
-        MEMCHECKER(
-            memchecker_call(&opal_memchecker_base_mem_defined,
-                            request->req_recv.req_base.req_addr,
-                            request->req_recv.req_base.req_count,
-                            request->req_recv.req_base.req_datatype);
-        );
+    if( true == request->req_match_received ) { /* way to late to cancel this one */
+        assert( OMPI_ANY_TAG != ompi_request->req_status.MPI_TAG ); /* not matched isn't it */
         return OMPI_SUCCESS;
     }
 
     /* The rest should be protected behind the match logic lock */
     OPAL_THREAD_LOCK(&comm->matching_lock);
-    if( OMPI_ANY_TAG == ompi_request->req_status.MPI_TAG ) { /* the match has not been already done */
-       if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
-          opal_list_remove_item( &comm->wild_receives, (opal_list_item_t*)request );
-       } else {
-          mca_pml_ob1_comm_proc_t* proc = comm->procs + request->req_recv.req_base.req_peer;
-          opal_list_remove_item(&proc->specific_receives, (opal_list_item_t*)request);
-       }
-       PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
-                                &(request->req_recv.req_base), PERUSE_RECV );
-       /**
-        * As now the PML is done with this request we have to force the pml_complete
-        * to true. Otherwise, the request will never be freed.
-        */
-       request->req_recv.req_base.req_pml_complete = true;
+    if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
+        opal_list_remove_item( &comm->wild_receives, (opal_list_item_t*)request );
+    } else {
+        mca_pml_ob1_comm_proc_t* proc = comm->procs + request->req_recv.req_base.req_peer;
+        opal_list_remove_item(&proc->specific_receives, (opal_list_item_t*)request);
     }
+    PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
+                             &(request->req_recv.req_base), PERUSE_RECV );
+    /**
+     * As now the PML is done with this request we have to force the pml_complete
+     * to true. Otherwise, the request will never be freed.
+     */
+    request->req_recv.req_base.req_pml_complete = true;
     OPAL_THREAD_UNLOCK(&comm->matching_lock);
 
     OPAL_THREAD_LOCK(&ompi_request_lock);
@@ -138,7 +129,7 @@
     MCA_PML_OB1_RECV_REQUEST_MPI_COMPLETE(request);
     OPAL_THREAD_UNLOCK(&ompi_request_lock);
     /*
-     * Receive request cancelled, make user buffer accessable.
+     * Receive request cancelled, make user buffer accessible.
      */
     MEMCHECKER(

Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

2012-07-26 Thread George Bosilca
Takahiro,

Indeed we were way too lax on canceling the requests. I modified your patch to 
correctly deal with the MEMCHECKER macro (removing the call from the branch that 
requires a completion function). The modified patch is attached below. I 
will commit asap.

  Thanks,
george.


Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
===================================================================
--- ompi/mca/pml/ob1/pml_ob1_recvreq.c	(revision 26870)
+++ ompi/mca/pml/ob1/pml_ob1_recvreq.c	(working copy)
@@ -3,7 +3,7 @@
  * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
  *                         University Research and Technology
  *                         Corporation.  All rights reserved.
- * Copyright (c) 2004-2009 The University of Tennessee and The University
+ * Copyright (c) 2004-2012 The University of Tennessee and The University
  *                         of Tennessee Research Foundation.  All rights
  *                         reserved.
  * Copyright (c) 2004-2008 High Performance Computing Center Stuttgart,
@@ -15,6 +15,7 @@
  * Copyright (c) 2012      NVIDIA Corporation.  All rights reserved.
  * Copyright (c) 2011-2012 Los Alamos National Security, LLC. All rights
  *                         reserved.
+ * Copyright (c) 2012      FUJITSU LIMITED.  All rights reserved.
  * $COPYRIGHT$
  * 
  * Additional copyrights may follow
@@ -97,36 +98,26 @@
     mca_pml_ob1_recv_request_t* request = (mca_pml_ob1_recv_request_t*)ompi_request;
     mca_pml_ob1_comm_t* comm = request->req_recv.req_base.req_comm->c_pml_comm;
 
-    if( true == ompi_request->req_complete ) { /* way to late to cancel this one */
-        /*
-         * Receive request completed, make user buffer accessable.
-         */
-        MEMCHECKER(
-            memchecker_call(&opal_memchecker_base_mem_defined,
-                            request->req_recv.req_base.req_addr,
-                            request->req_recv.req_base.req_count,
-                            request->req_recv.req_base.req_datatype);
-        );
+    if( true == request->req_match_received ) { /* way to late to cancel this one */
+        assert( OMPI_ANY_TAG != ompi_request->req_status.MPI_TAG ); /* not matched isn't it */
         return OMPI_SUCCESS;
     }
 
     /* The rest should be protected behind the match logic lock */
     OPAL_THREAD_LOCK(&comm->matching_lock);
-    if( OMPI_ANY_TAG == ompi_request->req_status.MPI_TAG ) { /* the match has not been already done */
-       if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
-          opal_list_remove_item( &comm->wild_receives, (opal_list_item_t*)request );
-       } else {
-          mca_pml_ob1_comm_proc_t* proc = comm->procs + request->req_recv.req_base.req_peer;
-          opal_list_remove_item(&proc->specific_receives, (opal_list_item_t*)request);
-       }
-       PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
-                                &(request->req_recv.req_base), PERUSE_RECV );
-       /**
-        * As now the PML is done with this request we have to force the pml_complete
-        * to true. Otherwise, the request will never be freed.
-        */
-       request->req_recv.req_base.req_pml_complete = true;
+    if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
+        opal_list_remove_item( &comm->wild_receives, (opal_list_item_t*)request );
+    } else {
+        mca_pml_ob1_comm_proc_t* proc = comm->procs + request->req_recv.req_base.req_peer;
+        opal_list_remove_item(&proc->specific_receives, (opal_list_item_t*)request);
     }
+    PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
+                             &(request->req_recv.req_base), PERUSE_RECV );
+    /**
+     * As now the PML is done with this request we have to force the pml_complete
+     * to true. Otherwise, the request will never be freed.
+     */
+    request->req_recv.req_base.req_pml_complete = true;
    OPAL_THREAD_UNLOCK(&comm->matching_lock);
 
     OPAL_THREAD_LOCK(&ompi_request_lock);
@@ -138,7 +129,7 @@
     MCA_PML_OB1_RECV_REQUEST_MPI_COMPLETE(request);
     OPAL_THREAD_UNLOCK(&ompi_request_lock);
     /*
-     * Receive request cancelled, make user buffer accessable.
+     * Receive request cancelled, make user buffer accessible.
      */
     MEMCHECKER(
         memchecker_call(&opal_memchecker_base_mem_defined,

On Jul 26, 2012, at 13:41 , Kawashima, Takahiro wrote:

> Hi Open MPI developers,
> 
> I found a small bug in Open MPI.
> 
> See attached program cancelled.c.
> In this program, rank 1 tries to cancel an MPI_Irecv and calls MPI_Recv
> instead if the cancellation succeeds. This program should terminate whether
> the cancellation succeeds or not, but it deadlocks in MPI_Recv after
> printing "MPI_Test_cancelled: 1".
> I confirmed it works fine with MPICH2.
> 
> The problem is in the mca_pml_ob1_recv_request_cancel function in
> ompi/mca/pml/ob1/pml_ob1_recvreq.c. It accepts the cancellation unless
> the request has been completed. I think it 

[OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

2012-07-26 Thread Kawashima, Takahiro
Hi Open MPI developers,

I found a small bug in Open MPI.

See attached program cancelled.c.
In this program, rank 1 tries to cancel an MPI_Irecv and calls MPI_Recv
instead if the cancellation succeeds. This program should terminate whether
the cancellation succeeds or not, but it deadlocks in MPI_Recv after
printing "MPI_Test_cancelled: 1".
I confirmed it works fine with MPICH2.

The problem is in the mca_pml_ob1_recv_request_cancel function in
ompi/mca/pml/ob1/pml_ob1_recvreq.c. It accepts the cancellation unless
the request has been completed. I think it should not accept the
cancellation once the request has been matched. If it wanted to accept the
cancellation, it would have to push the recv frag back onto the unexpected
message queue and redo the matching. Furthermore, the receive buffer would
have to be reverted if the received message had already been partially
written to it by the pipeline protocol.
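
To illustrate the pipeline point (a toy sketch for illustration only, not
OMPI code): once a matched rendezvous transfer is in flight, chunks land
directly in the user buffer, so accepting a cancel mid-transfer would
require undoing partial writes.

#include <stdio.h>
#include <string.h>

#define CHUNK 4
#define TOTAL 16

int main(void)
{
    char user_buf[TOTAL + 1];
    memset(user_buf, '.', TOTAL);   /* '.' = byte not yet delivered */
    user_buf[TOTAL] = '\0';

    /* deliver only the first two of the four pipeline chunks */
    for (int chunk = 0; chunk < 2; chunk++)
        memset(user_buf + chunk * CHUNK, 'X', CHUNK);

    /* a cancel accepted at this point leaves the buffer half-written */
    printf("%s\n", user_buf);       /* prints: XXXXXXXX........ */
    return 0;
}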

The attached patch cancel-recv.patch is a sample fix for this bug against
the Open MPI trunk. Though the patch is 65 lines, the main modifications
are adding one if-statement and deleting one if-statement; the other lines
are just indentation changes.
I cannot confirm that the MEMCHECKER part is correct. Could anyone review it
before committing?

Regards,

Takahiro Kawashima,
MPI development team,
Fujitsu
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

/* rendezvous */
#define BUFSIZE1 (1024*1024)
/* eager */
#define BUFSIZE2 (8)

int main(int argc, char *argv[])
{
    int myrank, cancelled;
    void *buf1, *buf2;
    MPI_Request request;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    buf1 = malloc(BUFSIZE1);
    buf2 = malloc(BUFSIZE2);

    if (myrank == 0) {

        MPI_Isend(buf1, BUFSIZE1, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &request);
        MPI_Send(buf2, BUFSIZE2, MPI_BYTE, 1, 1, MPI_COMM_WORLD);
        MPI_Wait(&request, &status);

    } else if (myrank == 1) {

        MPI_Irecv(buf1, BUFSIZE1, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &request);
        MPI_Recv(buf2, BUFSIZE2, MPI_BYTE, 0, 1, MPI_COMM_WORLD, &status);
        MPI_Cancel(&request);
        MPI_Wait(&request, &status);

        MPI_Test_cancelled(&status, &cancelled);
        printf("MPI_Test_cancelled: %d\n", cancelled);
        fflush(stdout);

        if (cancelled) {
            MPI_Recv(buf1, BUFSIZE1, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     &status);
        }

    }

    MPI_Finalize();
    free(buf1);
    free(buf2);

    return 0;
}
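
To reproduce (assuming a standard Open MPI build with mpicc and mpirun on
the PATH; the commands are illustrative):

$ mpicc cancelled.c -o cancelled
$ mpirun -np 2 ./cancelled

With the unpatched trunk, rank 1 prints "MPI_Test_cancelled: 1" and then
deadlocks in the final MPI_Recv; with the patch below (or with MPICH2) the
program terminates whether or not the cancellation succeeds.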
Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
===================================================================
--- ompi/mca/pml/ob1/pml_ob1_recvreq.c	(revision 26836)
+++ ompi/mca/pml/ob1/pml_ob1_recvreq.c	(working copy)
@@ -97,36 +97,36 @@
     mca_pml_ob1_recv_request_t* request = (mca_pml_ob1_recv_request_t*)ompi_request;
     mca_pml_ob1_comm_t* comm = request->req_recv.req_base.req_comm->c_pml_comm;
 
-    if( true == ompi_request->req_complete ) { /* way to late to cancel this one */
-        /*
-         * Receive request completed, make user buffer accessable.
-         */
-        MEMCHECKER(
-            memchecker_call(&opal_memchecker_base_mem_defined,
-                            request->req_recv.req_base.req_addr,
-                            request->req_recv.req_base.req_count,
-                            request->req_recv.req_base.req_datatype);
-        );
+    if( true == request->req_match_received ) { /* way to late to cancel this one */
+        if( true == ompi_request->req_complete ) {
+            /*
+             * Receive request completed, make user buffer accessable.
+             */
+            MEMCHECKER(
+                memchecker_call(&opal_memchecker_base_mem_defined,
+                                request->req_recv.req_base.req_addr,
+                                request->req_recv.req_base.req_count,
+                                request->req_recv.req_base.req_datatype);
+            );
+        }
         return OMPI_SUCCESS;
     }
 
     /* The rest should be protected behind the match logic lock */
     OPAL_THREAD_LOCK(&comm->matching_lock);
-    if( OMPI_ANY_TAG == ompi_request->req_status.MPI_TAG ) { /* the match has not been already done */
-       if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
-          opal_list_remove_item( &comm->wild_receives, (opal_list_item_t*)request );
-       } else {
-          mca_pml_ob1_comm_proc_t* proc = comm->procs + request->req_recv.req_base.req_peer;
-          opal_list_remove_item(&proc->specific_receives, (opal_list_item_t*)request);
-       }
-       PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
-                                &(request->req_recv.req_base), PERUSE_RECV );
-       /**
-        * As now the PML is done with this request we have to force the pml_complete
-        * to true. Otherwise, the request will never be freed.
-        */
-       request->req_recv.req_base.req_pml_complete = true;
+    if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
+       opal_list_remove_item( &comm->wild_receives, (opal_list_item_t*)request );
+    } else {
+