Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

2012-07-26 Thread Kawashima, Takahiro
George,

Thanks for review and commit!
I've confirmed your modification.

Takahiro Kawashima,
MPI development team,
Fujitsu

> Takahiro,
> 
> Indeed we were way to lax on canceling the requests. I modified your patch to 
> correctly deal with the MEMCHECK macro (remove the call from the branch that 
> will requires a completion function). The modified patch is attached below. I 
> will commit asap.
> 
>   Thanks,
> george.
> 
> 
> Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
> ===
> --- ompi/mca/pml/ob1/pml_ob1_recvreq.c(revision 26870)
> +++ ompi/mca/pml/ob1/pml_ob1_recvreq.c(working copy)
> @@ -3,7 +3,7 @@
>   * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
>   * University Research and Technology
>   * Corporation.  All rights reserved.
> - * Copyright (c) 2004-2009 The University of Tennessee and The University
> + * Copyright (c) 2004-2012 The University of Tennessee and The University
>   * of Tennessee Research Foundation.  All rights
>   * reserved.
>   * Copyright (c) 2004-2008 High Performance Computing Center Stuttgart, 
> @@ -15,6 +15,7 @@
>   * Copyright (c) 2012  NVIDIA Corporation.  All rights reserved.
>   * Copyright (c) 2011-2012 Los Alamos National Security, LLC. All rights
>   * reserved.
> + * Copyright (c) 2012  FUJITSU LIMITED.  All rights reserved.
>   * $COPYRIGHT$
>   * 
>   * Additional copyrights may follow
> @@ -97,36 +98,26 @@
>  mca_pml_ob1_recv_request_t* request = 
> (mca_pml_ob1_recv_request_t*)ompi_request;
>  mca_pml_ob1_comm_t* comm = 
> request->req_recv.req_base.req_comm->c_pml_comm;
>  
> -if( true == ompi_request->req_complete ) { /* way to late to cancel this 
> one */
> -/*
> - * Receive request completed, make user buffer accessable.
> - */
> -MEMCHECKER(
> -memchecker_call(_memchecker_base_mem_defined,
> -request->req_recv.req_base.req_addr,
> -request->req_recv.req_base.req_count,
> -request->req_recv.req_base.req_datatype);
> -);
> +if( true == request->req_match_received ) { /* way to late to cancel 
> this one */
> +assert( OMPI_ANY_TAG != ompi_request->req_status.MPI_TAG ); /* not 
> matched isn't it */
>  return OMPI_SUCCESS;
>  }
>  
>  /* The rest should be protected behind the match logic lock */
>  OPAL_THREAD_LOCK(>matching_lock);
> -if( OMPI_ANY_TAG == ompi_request->req_status.MPI_TAG ) { /* the match 
> has not been already done */
> -   if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
> -  opal_list_remove_item( >wild_receives, 
> (opal_list_item_t*)request );
> -   } else {
> -  mca_pml_ob1_comm_proc_t* proc = comm->procs + 
> request->req_recv.req_base.req_peer;
> -  opal_list_remove_item(>specific_receives, 
> (opal_list_item_t*)request);
> -   }
> -   PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
> -&(request->req_recv.req_base), PERUSE_RECV );
> -   /**
> -* As now the PML is done with this request we have to force the 
> pml_complete
> -* to true. Otherwise, the request will never be freed.
> -*/
> -   request->req_recv.req_base.req_pml_complete = true;
> +if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
> +opal_list_remove_item( >wild_receives, 
> (opal_list_item_t*)request );
> +} else {
> +mca_pml_ob1_comm_proc_t* proc = comm->procs + 
> request->req_recv.req_base.req_peer;
> +opal_list_remove_item(>specific_receives, 
> (opal_list_item_t*)request);
>  }
> +PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
> + &(request->req_recv.req_base), PERUSE_RECV );
> +/**
> + * As now the PML is done with this request we have to force the 
> pml_complete
> + * to true. Otherwise, the request will never be freed.
> + */
> +request->req_recv.req_base.req_pml_complete = true;
>  OPAL_THREAD_UNLOCK(>matching_lock);
>  
>  OPAL_THREAD_LOCK(_request_lock);
> @@ -138,7 +129,7 @@
>  MCA_PML_OB1_RECV_REQUEST_MPI_COMPLETE(request);
>  OPAL_THREAD_UNLOCK(_request_lock);
>  /*
> - * Receive request cancelled, make user buffer accessable.
> + * Receive request cancelled, make user buffer accessible.
>   */
>  MEMCHECKER(
>  memchecker_call(_memchecker_base_mem_defined,
> 
> On Jul 26, 2012, at 13:41 , Kawashima, Takahiro wrote:
> 
> > Hi Open MPI developers,
> > 
> > I found a small bug in Open MPI.
> > 
> > See attached program cancelled.c.
> > In this program, rank 1 tries to cancel a MPI_Irecv and calls a MPI_Recv
> > instead if the cancellation succeeds. 

Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

2012-07-26 Thread Richard Graham
OK, so this is only for receive, and not for send, I take it.  Should have 
looked closer.

Rich

-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf 
Of George Bosilca
Sent: Thursday, July 26, 2012 10:47 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it 
has a matched recv frag

Rich,

There is no matching in this case. Canceling a receive operation is possible 
only up to the moment the request has been matched. Up to this point the 
sequence numbers of the peers are not used, so removing a non-matched request 
has no impact on the sequence number.

  george.

On Jul 26, 2012, at 16:31 , Richard Graham wrote:

> I do not see any resetting of sequence numbers.  It has been a long time 
> since I have looked at the matching code, so don't know if the out-of-order 
> handling has been taken out.  If not, the sequence number has to be dealt 
> with in some manner, or else there will be a gap in the arriving sequence 
> numbers, and the matching logic will prevent any further progress.
> 
> Rich
> 
> -Original Message-
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] 
> On Behalf Of George Bosilca
> Sent: Thursday, July 26, 2012 10:06 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] [patch] MPI_Cancel should not cancel a 
> request if it has a matched recv frag
> 
> Takahiro,
> 
> Indeed we were way to lax on canceling the requests. I modified your patch to 
> correctly deal with the MEMCHECK macro (remove the call from the branch that 
> will requires a completion function). The modified patch is attached below. I 
> will commit asap.
> 
>  Thanks,
>george.
> 
> 
> Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
> ===
> --- ompi/mca/pml/ob1/pml_ob1_recvreq.c(revision 26870)
> +++ ompi/mca/pml/ob1/pml_ob1_recvreq.c(working copy)
> @@ -3,7 +3,7 @@
>  * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
>  * University Research and Technology
>  * Corporation.  All rights reserved.
> - * Copyright (c) 2004-2009 The University of Tennessee and The 
> University
> + * Copyright (c) 2004-2012 The University of Tennessee and The 
> + University
>  * of Tennessee Research Foundation.  All rights
>  * reserved.
>  * Copyright (c) 2004-2008 High Performance Computing Center Stuttgart, @@ 
> -15,6 +15,7 @@
>  * Copyright (c) 2012  NVIDIA Corporation.  All rights reserved.
>  * Copyright (c) 2011-2012 Los Alamos National Security, LLC. All rights
>  * reserved.
> + * Copyright (c) 2012  FUJITSU LIMITED.  All rights reserved.
>  * $COPYRIGHT$
>  *
>  * Additional copyrights may follow
> @@ -97,36 +98,26 @@
> mca_pml_ob1_recv_request_t* request = 
> (mca_pml_ob1_recv_request_t*)ompi_request;
> mca_pml_ob1_comm_t* comm = 
> request->req_recv.req_base.req_comm->c_pml_comm;
> 
> -if( true == ompi_request->req_complete ) { /* way to late to cancel this 
> one */
> -/*
> - * Receive request completed, make user buffer accessable.
> - */
> -MEMCHECKER(
> -memchecker_call(_memchecker_base_mem_defined,
> -request->req_recv.req_base.req_addr,
> -request->req_recv.req_base.req_count,
> -request->req_recv.req_base.req_datatype);
> -);
> +if( true == request->req_match_received ) { /* way to late to cancel 
> this one */
> +assert( OMPI_ANY_TAG != ompi_request->req_status.MPI_TAG ); 
> + /* not matched isn't it */
> return OMPI_SUCCESS;
> }
> 
> /* The rest should be protected behind the match logic lock */
> OPAL_THREAD_LOCK(>matching_lock);
> -if( OMPI_ANY_TAG == ompi_request->req_status.MPI_TAG ) { /* the match 
> has not been already done */
> -   if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
> -  opal_list_remove_item( >wild_receives, 
> (opal_list_item_t*)request );
> -   } else {
> -  mca_pml_ob1_comm_proc_t* proc = comm->procs + 
> request->req_recv.req_base.req_peer;
> -  opal_list_remove_item(>specific_receives, 
> (opal_list_item_t*)request);
> -   }
> -   PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
> -&(request->req_recv.req_base), PERUSE_RECV );
> -   /**
> -* As now the PML is done with this request we have to force the 
> pml_complete
> -* to true. Otherwise, the request will never be freed.
> -*/
> -   request->req_recv.req_base.req_pml_complete = true;
> +if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
> +opal_list_remove_item( >wild_receives, 
> (opal_list_item_t*)request );
> +} else {
> +

Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

2012-07-26 Thread George Bosilca
Rich,

There is no matching in this case. Canceling a receive operation is possible 
only up to the moment the request has been matched. Up to this point the 
sequence numbers of the peers are not used, so removing a non-matched request 
has no impact on the sequence number.

  george.

On Jul 26, 2012, at 16:31 , Richard Graham wrote:

> I do not see any resetting of sequence numbers.  It has been a long time 
> since I have looked at the matching code, so don't know if the out-of-order 
> handling has been taken out.  If not, the sequence number has to be dealt 
> with in some manner, or else there will be a gap in the arriving sequence 
> numbers, and the matching logic will prevent any further progress.
> 
> Rich
> 
> -Original Message-
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On 
> Behalf Of George Bosilca
> Sent: Thursday, July 26, 2012 10:06 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if 
> it has a matched recv frag
> 
> Takahiro,
> 
> Indeed we were way to lax on canceling the requests. I modified your patch to 
> correctly deal with the MEMCHECK macro (remove the call from the branch that 
> will requires a completion function). The modified patch is attached below. I 
> will commit asap.
> 
>  Thanks,
>george.
> 
> 
> Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
> ===
> --- ompi/mca/pml/ob1/pml_ob1_recvreq.c(revision 26870)
> +++ ompi/mca/pml/ob1/pml_ob1_recvreq.c(working copy)
> @@ -3,7 +3,7 @@
>  * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
>  * University Research and Technology
>  * Corporation.  All rights reserved.
> - * Copyright (c) 2004-2009 The University of Tennessee and The University
> + * Copyright (c) 2004-2012 The University of Tennessee and The 
> + University
>  * of Tennessee Research Foundation.  All rights
>  * reserved.
>  * Copyright (c) 2004-2008 High Performance Computing Center Stuttgart, @@ 
> -15,6 +15,7 @@
>  * Copyright (c) 2012  NVIDIA Corporation.  All rights reserved.
>  * Copyright (c) 2011-2012 Los Alamos National Security, LLC. All rights
>  * reserved.
> + * Copyright (c) 2012  FUJITSU LIMITED.  All rights reserved.
>  * $COPYRIGHT$
>  *
>  * Additional copyrights may follow
> @@ -97,36 +98,26 @@
> mca_pml_ob1_recv_request_t* request = 
> (mca_pml_ob1_recv_request_t*)ompi_request;
> mca_pml_ob1_comm_t* comm = 
> request->req_recv.req_base.req_comm->c_pml_comm;
> 
> -if( true == ompi_request->req_complete ) { /* way to late to cancel this 
> one */
> -/*
> - * Receive request completed, make user buffer accessable.
> - */
> -MEMCHECKER(
> -memchecker_call(_memchecker_base_mem_defined,
> -request->req_recv.req_base.req_addr,
> -request->req_recv.req_base.req_count,
> -request->req_recv.req_base.req_datatype);
> -);
> +if( true == request->req_match_received ) { /* way to late to cancel 
> this one */
> +assert( OMPI_ANY_TAG != ompi_request->req_status.MPI_TAG ); /* 
> + not matched isn't it */
> return OMPI_SUCCESS;
> }
> 
> /* The rest should be protected behind the match logic lock */
> OPAL_THREAD_LOCK(>matching_lock);
> -if( OMPI_ANY_TAG == ompi_request->req_status.MPI_TAG ) { /* the match 
> has not been already done */
> -   if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
> -  opal_list_remove_item( >wild_receives, 
> (opal_list_item_t*)request );
> -   } else {
> -  mca_pml_ob1_comm_proc_t* proc = comm->procs + 
> request->req_recv.req_base.req_peer;
> -  opal_list_remove_item(>specific_receives, 
> (opal_list_item_t*)request);
> -   }
> -   PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
> -&(request->req_recv.req_base), PERUSE_RECV );
> -   /**
> -* As now the PML is done with this request we have to force the 
> pml_complete
> -* to true. Otherwise, the request will never be freed.
> -*/
> -   request->req_recv.req_base.req_pml_complete = true;
> +if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
> +opal_list_remove_item( >wild_receives, 
> (opal_list_item_t*)request );
> +} else {
> +mca_pml_ob1_comm_proc_t* proc = comm->procs + 
> request->req_recv.req_base.req_peer;
> +opal_list_remove_item(>specific_receives, 
> + (opal_list_item_t*)request);
> }
> +PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
> + &(request->req_recv.req_base), PERUSE_RECV );
> +/**
> + * As now the PML is done with this request we 

Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

2012-07-26 Thread Richard Graham
I do not see any resetting of sequence numbers.  It has been a long time since 
I have looked at the matching code, so don't know if the out-of-order handling 
has been taken out.  If not, the sequence number has to be dealt with in some 
manner, or else there will be a gap in the arriving sequence numbers, and the 
matching logic will prevent any further progress.

Rich

-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf 
Of George Bosilca
Sent: Thursday, July 26, 2012 10:06 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it 
has a matched recv frag

Takahiro,

Indeed we were way to lax on canceling the requests. I modified your patch to 
correctly deal with the MEMCHECK macro (remove the call from the branch that 
will requires a completion function). The modified patch is attached below. I 
will commit asap.

  Thanks,
george.


Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
===
--- ompi/mca/pml/ob1/pml_ob1_recvreq.c  (revision 26870)
+++ ompi/mca/pml/ob1/pml_ob1_recvreq.c  (working copy)
@@ -3,7 +3,7 @@
  * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
  * University Research and Technology
  * Corporation.  All rights reserved.
- * Copyright (c) 2004-2009 The University of Tennessee and The University
+ * Copyright (c) 2004-2012 The University of Tennessee and The 
+ University
  * of Tennessee Research Foundation.  All rights
  * reserved.
  * Copyright (c) 2004-2008 High Performance Computing Center Stuttgart, @@ 
-15,6 +15,7 @@
  * Copyright (c) 2012  NVIDIA Corporation.  All rights reserved.
  * Copyright (c) 2011-2012 Los Alamos National Security, LLC. All rights
  * reserved.
+ * Copyright (c) 2012  FUJITSU LIMITED.  All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -97,36 +98,26 @@
 mca_pml_ob1_recv_request_t* request = 
(mca_pml_ob1_recv_request_t*)ompi_request;
 mca_pml_ob1_comm_t* comm = request->req_recv.req_base.req_comm->c_pml_comm;

-if( true == ompi_request->req_complete ) { /* way to late to cancel this 
one */
-/*
- * Receive request completed, make user buffer accessable.
- */
-MEMCHECKER(
-memchecker_call(_memchecker_base_mem_defined,
-request->req_recv.req_base.req_addr,
-request->req_recv.req_base.req_count,
-request->req_recv.req_base.req_datatype);
-);
+if( true == request->req_match_received ) { /* way to late to cancel this 
one */
+assert( OMPI_ANY_TAG != ompi_request->req_status.MPI_TAG ); /* 
+ not matched isn't it */
 return OMPI_SUCCESS;
 }

 /* The rest should be protected behind the match logic lock */
 OPAL_THREAD_LOCK(>matching_lock);
-if( OMPI_ANY_TAG == ompi_request->req_status.MPI_TAG ) { /* the match has 
not been already done */
-   if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
-  opal_list_remove_item( >wild_receives, 
(opal_list_item_t*)request );
-   } else {
-  mca_pml_ob1_comm_proc_t* proc = comm->procs + 
request->req_recv.req_base.req_peer;
-  opal_list_remove_item(>specific_receives, 
(opal_list_item_t*)request);
-   }
-   PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
-&(request->req_recv.req_base), PERUSE_RECV );
-   /**
-* As now the PML is done with this request we have to force the 
pml_complete
-* to true. Otherwise, the request will never be freed.
-*/
-   request->req_recv.req_base.req_pml_complete = true;
+if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
+opal_list_remove_item( >wild_receives, 
(opal_list_item_t*)request );
+} else {
+mca_pml_ob1_comm_proc_t* proc = comm->procs + 
request->req_recv.req_base.req_peer;
+opal_list_remove_item(>specific_receives, 
+ (opal_list_item_t*)request);
 }
+PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
+ &(request->req_recv.req_base), PERUSE_RECV );
+/**
+ * As now the PML is done with this request we have to force the 
pml_complete
+ * to true. Otherwise, the request will never be freed.
+ */
+request->req_recv.req_base.req_pml_complete = true;
 OPAL_THREAD_UNLOCK(>matching_lock);

 OPAL_THREAD_LOCK(_request_lock);
@@ -138,7 +129,7 @@
 MCA_PML_OB1_RECV_REQUEST_MPI_COMPLETE(request);
 OPAL_THREAD_UNLOCK(_request_lock);
 /*
- * Receive request cancelled, make user buffer accessable.
+ * Receive request cancelled, make user buffer accessible.
  */
 MEMCHECKER(
 

Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

2012-07-26 Thread George Bosilca
Takahiro,

Indeed we were way to lax on canceling the requests. I modified your patch to 
correctly deal with the MEMCHECK macro (remove the call from the branch that 
will requires a completion function). The modified patch is attached below. I 
will commit asap.

  Thanks,
george.


Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
===
--- ompi/mca/pml/ob1/pml_ob1_recvreq.c  (revision 26870)
+++ ompi/mca/pml/ob1/pml_ob1_recvreq.c  (working copy)
@@ -3,7 +3,7 @@
  * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
  * University Research and Technology
  * Corporation.  All rights reserved.
- * Copyright (c) 2004-2009 The University of Tennessee and The University
+ * Copyright (c) 2004-2012 The University of Tennessee and The University
  * of Tennessee Research Foundation.  All rights
  * reserved.
  * Copyright (c) 2004-2008 High Performance Computing Center Stuttgart, 
@@ -15,6 +15,7 @@
  * Copyright (c) 2012  NVIDIA Corporation.  All rights reserved.
  * Copyright (c) 2011-2012 Los Alamos National Security, LLC. All rights
  * reserved.
+ * Copyright (c) 2012  FUJITSU LIMITED.  All rights reserved.
  * $COPYRIGHT$
  * 
  * Additional copyrights may follow
@@ -97,36 +98,26 @@
 mca_pml_ob1_recv_request_t* request = 
(mca_pml_ob1_recv_request_t*)ompi_request;
 mca_pml_ob1_comm_t* comm = request->req_recv.req_base.req_comm->c_pml_comm;

-if( true == ompi_request->req_complete ) { /* way to late to cancel this 
one */
-/*
- * Receive request completed, make user buffer accessable.
- */
-MEMCHECKER(
-memchecker_call(_memchecker_base_mem_defined,
-request->req_recv.req_base.req_addr,
-request->req_recv.req_base.req_count,
-request->req_recv.req_base.req_datatype);
-);
+if( true == request->req_match_received ) { /* way to late to cancel this 
one */
+assert( OMPI_ANY_TAG != ompi_request->req_status.MPI_TAG ); /* not 
matched isn't it */
 return OMPI_SUCCESS;
 }

 /* The rest should be protected behind the match logic lock */
 OPAL_THREAD_LOCK(>matching_lock);
-if( OMPI_ANY_TAG == ompi_request->req_status.MPI_TAG ) { /* the match has 
not been already done */
-   if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
-  opal_list_remove_item( >wild_receives, 
(opal_list_item_t*)request );
-   } else {
-  mca_pml_ob1_comm_proc_t* proc = comm->procs + 
request->req_recv.req_base.req_peer;
-  opal_list_remove_item(>specific_receives, 
(opal_list_item_t*)request);
-   }
-   PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
-&(request->req_recv.req_base), PERUSE_RECV );
-   /**
-* As now the PML is done with this request we have to force the 
pml_complete
-* to true. Otherwise, the request will never be freed.
-*/
-   request->req_recv.req_base.req_pml_complete = true;
+if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
+opal_list_remove_item( >wild_receives, 
(opal_list_item_t*)request );
+} else {
+mca_pml_ob1_comm_proc_t* proc = comm->procs + 
request->req_recv.req_base.req_peer;
+opal_list_remove_item(>specific_receives, 
(opal_list_item_t*)request);
 }
+PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
+ &(request->req_recv.req_base), PERUSE_RECV );
+/**
+ * As now the PML is done with this request we have to force the 
pml_complete
+ * to true. Otherwise, the request will never be freed.
+ */
+request->req_recv.req_base.req_pml_complete = true;
 OPAL_THREAD_UNLOCK(>matching_lock);

 OPAL_THREAD_LOCK(_request_lock);
@@ -138,7 +129,7 @@
 MCA_PML_OB1_RECV_REQUEST_MPI_COMPLETE(request);
 OPAL_THREAD_UNLOCK(_request_lock);
 /*
- * Receive request cancelled, make user buffer accessable.
+ * Receive request cancelled, make user buffer accessible.
  */
 MEMCHECKER(
 memchecker_call(_memchecker_base_mem_defined,

On Jul 26, 2012, at 13:41 , Kawashima, Takahiro wrote:

> Hi Open MPI developers,
> 
> I found a small bug in Open MPI.
> 
> See attached program cancelled.c.
> In this program, rank 1 tries to cancel a MPI_Irecv and calls a MPI_Recv
> instead if the cancellation succeeds. This program should terminate whether
> the cancellation succeeds or not. But it leads a deadlock in MPI_Recv after
> printing "MPI_Test_cancelled: 1".
> I confirmed it works fine with MPICH2.
> 
> The problem is in mca_pml_ob1_recv_request_cancel function in
> ompi/mca/pml/ob1/pml_ob1_recvreq.c. It accepts the cancellation unless
> the request has been completed. I think it