Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag
George,

Thanks for the review and commit! I've confirmed your modification.

Takahiro Kawashima,
MPI development team, Fujitsu

> Takahiro,
> 
> Indeed we were way too lax on canceling the requests. I modified your patch to
> correctly deal with the MEMCHECKER macro (removed the call from the branch that
> requires a completion function). The modified patch is attached below. I
> will commit asap.
> 
>   Thanks,
>     george.
> 
> 
> Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
> ===================================================================
> --- ompi/mca/pml/ob1/pml_ob1_recvreq.c  (revision 26870)
> +++ ompi/mca/pml/ob1/pml_ob1_recvreq.c  (working copy)
> @@ -3,7 +3,7 @@
>   * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
>   *                         University Research and Technology
>   *                         Corporation.  All rights reserved.
> - * Copyright (c) 2004-2009 The University of Tennessee and The University
> + * Copyright (c) 2004-2012 The University of Tennessee and The University
>   *                         of Tennessee Research Foundation.  All rights
>   *                         reserved.
>   * Copyright (c) 2004-2008 High Performance Computing Center Stuttgart,
> @@ -15,6 +15,7 @@
>   * Copyright (c) 2012      NVIDIA Corporation.  All rights reserved.
>   * Copyright (c) 2011-2012 Los Alamos National Security, LLC.  All rights
>   *                         reserved.
> + * Copyright (c) 2012      FUJITSU LIMITED.  All rights reserved.
>   * $COPYRIGHT$
>   *
>   * Additional copyrights may follow
> @@ -97,36 +98,26 @@
>      mca_pml_ob1_recv_request_t* request = (mca_pml_ob1_recv_request_t*)ompi_request;
>      mca_pml_ob1_comm_t* comm = request->req_recv.req_base.req_comm->c_pml_comm;
> 
> -    if( true == ompi_request->req_complete ) { /* way to late to cancel this one */
> -        /*
> -         * Receive request completed, make user buffer accessable.
> -         */
> -        MEMCHECKER(
> -            memchecker_call(&opal_memchecker_base_mem_defined,
> -                            request->req_recv.req_base.req_addr,
> -                            request->req_recv.req_base.req_count,
> -                            request->req_recv.req_base.req_datatype);
> -        );
> +    if( true == request->req_match_received ) { /* way to late to cancel this one */
> +        assert( OMPI_ANY_TAG != ompi_request->req_status.MPI_TAG );  /* not matched isn't it */
>          return OMPI_SUCCESS;
>      }
> 
>      /* The rest should be protected behind the match logic lock */
>      OPAL_THREAD_LOCK(&comm->matching_lock);
> -    if( OMPI_ANY_TAG == ompi_request->req_status.MPI_TAG ) { /* the match has not been already done */
> -       if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
> -          opal_list_remove_item( &comm->wild_receives, (opal_list_item_t*)request );
> -       } else {
> -          mca_pml_ob1_comm_proc_t* proc = comm->procs + request->req_recv.req_base.req_peer;
> -          opal_list_remove_item(&proc->specific_receives, (opal_list_item_t*)request);
> -       }
> -       PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
> -                                &(request->req_recv.req_base), PERUSE_RECV );
> -       /**
> -        * As now the PML is done with this request we have to force the pml_complete
> -        * to true. Otherwise, the request will never be freed.
> -        */
> -       request->req_recv.req_base.req_pml_complete = true;
> +    if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
> +        opal_list_remove_item( &comm->wild_receives, (opal_list_item_t*)request );
> +    } else {
> +        mca_pml_ob1_comm_proc_t* proc = comm->procs + request->req_recv.req_base.req_peer;
> +        opal_list_remove_item(&proc->specific_receives, (opal_list_item_t*)request);
>      }
> +    PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
> +                             &(request->req_recv.req_base), PERUSE_RECV );
> +    /**
> +     * As now the PML is done with this request we have to force the pml_complete
> +     * to true. Otherwise, the request will never be freed.
> +     */
> +    request->req_recv.req_base.req_pml_complete = true;
>      OPAL_THREAD_UNLOCK(&comm->matching_lock);
> 
>      OPAL_THREAD_LOCK(&ompi_request_lock);
> @@ -138,7 +129,7 @@
>      MCA_PML_OB1_RECV_REQUEST_MPI_COMPLETE(request);
>      OPAL_THREAD_UNLOCK(&ompi_request_lock);
>      /*
> -     * Receive request cancelled, make user buffer accessable.
> +     * Receive request cancelled, make user buffer accessible.
>       */
>      MEMCHECKER(
>          memchecker_call(&opal_memchecker_base_mem_defined,
> 
> On Jul 26, 2012, at 13:41 , Kawashima, Takahiro wrote:
> 
>> Hi Open MPI developers,
>>
>> I found a small bug in Open MPI.
>>
>> See attached program cancelled.c.
>> In this program, rank 1 tries to cancel a MPI_Irecv and calls a MPI_Recv
>> instead if the cancellation succeeds.
Re: [OMPI devel] [OMPI svn] svn:open-mpi r26868 - in trunk/orte/mca/plm: base rsh
Sorry for the delay - the local network was down and I couldn't commit the one-line fix :-(

Turns out that there was a bug in the rsh launcher (the daemons *always* declared a failed launch) that was previously being ignored and was now exposed, resulting in a possible race condition. Fixed now.

Thanks
Ralph

On Jul 26, 2012, at 8:39 AM, TERRY DONTJE wrote:

> Interestingly enough it worked for me for a while and then after many runs I
> started seeing the below too.
> 
> --td
> 
> On 7/26/2012 11:07 AM, Ralph Castain wrote:
>> 
>> Hmmm...it was working for me, but I'll recheck. Thanks!
>> 
>> On Jul 26, 2012, at 8:04 AM, George Bosilca wrote:
>> 
>>> r26868 seems to have some issues. It works well as long as all processes
>>> are started on the same node (aka. there is a single daemon), but it breaks
>>> with the error message attached below if there are more than two daemons.
>>> 
>>> $ mpirun -np 2 --bynode ./runme
>>> [node01:07767] [[21341,0],1] ORTE_ERROR_LOG: A message is attempting to be
>>> sent to a process whose contact information is unknown in file
>>> ../../../../../ompi/orte/mca/rml/oob/rml_oob_send.c at line 362
>>> [node01:07767] [[21341,0],1] attempted to send to [[21341,0],2]: tag 15
>>> [node01:07767] [[21341,0],1] ORTE_ERROR_LOG: A message is attempting to be
>>> sent to a process whose contact information is unknown in file
>>> ../../../../ompi/orte/mca/grpcomm/base/grpcomm_base_xcast.c at line 157
>>> 
>>> I confirm that reverting the commit brings the trunk back to a normal
>>> state.
>>> 
>>> Please - a tad more care in what gets committed??
>>> 
>>>   george.
>>> 
>>> On Jul 25, 2012, at 23:46 , svn-commit-mai...@open-mpi.org wrote:
>>> 
>>>> Author: rhc (Ralph Castain)
>>>> Date: 2012-07-25 17:46:45 EDT (Wed, 25 Jul 2012)
>>>> New Revision: 26868
>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/26868
>>>> 
>>>> Log:
>>>> Reconnect the rsh/ssh error reporting code for remote spawns to report
>>>> failure to launch. Ensure the HNP correctly reports non-zero exit status
>>>> when ssh encounters a problem.
>>>> 
>>>> Thanks to Terry for spotting it!
>>>> 
>>>> [...]
[OMPI devel] OpenMPI and SGE integration made more stable
It is a long-standing problem that, due to a bug in Sun GridEngine (setting the stack size limit equal to the address space limit), using qrsh from within OpenMPI fails if a large amount of memory is requested but the stack size is not explicitly set to a reasonably small value.

The best solution would be if SGE simply did not touch the stack size limit and left it at INFINITY. However, I have tested that just reducing the stack size limit in file orte/mca/plm/rsh/plm_rsh_module.c, function ssh_child(), before execv'ing qrsh circumvents the problem. So just after exec_path is set by strdup(...) I inserted the lines:

{
    struct rlimit rlim;
    int l;

    l = strlen(exec_path);
    if (l > 5 && !strcmp("/qrsh", exec_path + (l-5))) {
        getrlimit(RLIMIT_STACK, &rlim);
        if (rlim.rlim_max > 1000L)
            rlim.rlim_max = 1000L;
        if (rlim.rlim_cur > 1000L)
            rlim.rlim_cur = 1000L;
        setrlimit(RLIMIT_STACK, &rlim);
    }
}

It looks quick-and-dirty and it certainly is, but it solves a severe problem many users have with OpenMPI and SGE. Feel free to use this information as you like. Note that MPI worker jobs eventually spawned off on "distant" nodes do not suffer from the reduced stack size limit; it is only the qrsh command.

Is this (still) of interest?

+---------------------------------+-----------------------------------+
| Prof. Christoph van Wüllen      | Tele-Phone (+49) (0)631 205 2749  |
| TU Kaiserslautern, FB Chemie    | Tele-Fax   (+49) (0)631 205 2750  |
| Erwin-Schrödinger-Str.          |                                   |
| D-67663 Kaiserslautern, Germany | vanwul...@chemie.uni-kl.de        |
|                                 |                                   |
| HomePage: http://www.chemie.uni-kl.de/vanwullen                     |
+---------------------------------+-----------------------------------+
Re: [OMPI devel] [OMPI svn] svn:open-mpi r26868 - in trunk/orte/mca/plm: base rsh
Interestingly enough it worked for me for a while and then after many runs I started seeing the below too.

--td

On 7/26/2012 11:07 AM, Ralph Castain wrote:
> Hmmm...it was working for me, but I'll recheck. Thanks!
>
> On Jul 26, 2012, at 8:04 AM, George Bosilca wrote:
>
>> r26868 seems to have some issues. It works well as long as all processes
>> are started on the same node (aka. there is a single daemon), but it breaks
>> with the error message attached below if there are more than two daemons.
>>
>> $ mpirun -np 2 --bynode ./runme
>> [node01:07767] [[21341,0],1] ORTE_ERROR_LOG: A message is attempting to be
>> sent to a process whose contact information is unknown in file
>> ../../../../../ompi/orte/mca/rml/oob/rml_oob_send.c at line 362
>> [node01:07767] [[21341,0],1] attempted to send to [[21341,0],2]: tag 15
>> [node01:07767] [[21341,0],1] ORTE_ERROR_LOG: A message is attempting to be
>> sent to a process whose contact information is unknown in file
>> ../../../../ompi/orte/mca/grpcomm/base/grpcomm_base_xcast.c at line 157
>>
>> I confirm that reverting the commit brings the trunk back to a normal state.
>>
>> Please - a tad more care in what gets committed??
>>
>>   george.
>>
>> On Jul 25, 2012, at 23:46 , svn-commit-mai...@open-mpi.org wrote:
>>
>>> Author: rhc (Ralph Castain)
>>> Date: 2012-07-25 17:46:45 EDT (Wed, 25 Jul 2012)
>>> New Revision: 26868
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/26868
>>>
>>> Log:
>>> Reconnect the rsh/ssh error reporting code for remote spawns to report
>>> failure to launch. Ensure the HNP correctly reports non-zero exit status
>>> when ssh encounters a problem.
>>>
>>> Thanks to Terry for spotting it!
>>>
>>> [...]
Re: [OMPI devel] [OMPI svn] svn:open-mpi r26868 - in trunk/orte/mca/plm: base rsh
r26868 seems to have some issues. It works well as long as all processes are started on the same node (aka. there is a single daemon), but it breaks with the error message attached below if there are more than two daemons.

$ mpirun -np 2 --bynode ./runme
[node01:07767] [[21341,0],1] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../../../../ompi/orte/mca/rml/oob/rml_oob_send.c at line 362
[node01:07767] [[21341,0],1] attempted to send to [[21341,0],2]: tag 15
[node01:07767] [[21341,0],1] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../../../ompi/orte/mca/grpcomm/base/grpcomm_base_xcast.c at line 157

I confirm that reverting the commit brings the trunk back to a normal state.

Please - a tad more care in what gets committed??

  george.

On Jul 25, 2012, at 23:46 , svn-commit-mai...@open-mpi.org wrote:

> Author: rhc (Ralph Castain)
> Date: 2012-07-25 17:46:45 EDT (Wed, 25 Jul 2012)
> New Revision: 26868
> URL: https://svn.open-mpi.org/trac/ompi/changeset/26868
> 
> Log:
> Reconnect the rsh/ssh error reporting code for remote spawns to report
> failure to launch. Ensure the HNP correctly reports non-zero exit status when
> ssh encounters a problem.
> 
> Thanks to Terry for spotting it!
> 
> Text files modified:
>   trunk/orte/mca/plm/base/plm_base_launch_support.c |    44
>   trunk/orte/mca/plm/base/plm_base_receive.c        |     6 +
>   trunk/orte/mca/plm/base/plm_private.h             |     4 +++
>   trunk/orte/mca/plm/rsh/plm_rsh_module.c           |    18 +++-
>   4 files changed, 62 insertions(+), 10 deletions(-)
> 
> Modified: trunk/orte/mca/plm/base/plm_base_launch_support.c
> ==============================================================================
> --- trunk/orte/mca/plm/base/plm_base_launch_support.c   Wed Jul 25 12:32:51 2012        (r26867)
> +++ trunk/orte/mca/plm/base/plm_base_launch_support.c   2012-07-25 17:46:45 EDT (Wed, 25 Jul 2012)      (r26868)
> @@ -741,6 +741,50 @@
> 
>  }
> 
> +void orte_plm_base_daemon_failed(int st, orte_process_name_t* sender,
> +                                 opal_buffer_t *buffer,
> +                                 orte_rml_tag_t tag, void *cbdata)
> +{
> +    int status, rc;
> +    int32_t n;
> +    orte_vpid_t vpid;
> +    orte_proc_t *daemon;
> +
> +    /* get the daemon job, if necessary */
> +    if (NULL == jdatorted) {
> +        jdatorted = orte_get_job_data_object(ORTE_PROC_MY_NAME->jobid);
> +    }
> +
> +    /* unpack the daemon that failed */
> +    n = 1;
> +    if (OPAL_SUCCESS != (rc = opal_dss.unpack(buffer, &vpid, &n, ORTE_VPID))) {
> +        ORTE_ERROR_LOG(rc);
> +        ORTE_UPDATE_EXIT_STATUS(ORTE_ERROR_DEFAULT_EXIT_CODE);
> +        goto finish;
> +    }
> +
> +    /* unpack the exit status */
> +    n = 1;
> +    if (OPAL_SUCCESS != (rc = opal_dss.unpack(buffer, &status, &n, OPAL_INT))) {
> +        ORTE_ERROR_LOG(rc);
> +        status = ORTE_ERROR_DEFAULT_EXIT_CODE;
> +        ORTE_UPDATE_EXIT_STATUS(ORTE_ERROR_DEFAULT_EXIT_CODE);
> +    } else {
> +        ORTE_UPDATE_EXIT_STATUS(WEXITSTATUS(status));
> +    }
> +
> +    /* find the daemon and update its state/status */
> +    if (NULL == (daemon = (orte_proc_t*)opal_pointer_array_get_item(jdatorted->procs, vpid))) {
> +        ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
> +        goto finish;
> +    }
> +    daemon->state = ORTE_PROC_STATE_FAILED_TO_START;
> +    daemon->exit_code = status;
> +
> + finish:
> +    ORTE_ACTIVATE_PROC_STATE(&daemon->name, ORTE_PROC_STATE_FAILED_TO_START);
> +}
> +
>  int orte_plm_base_setup_orted_cmd(int *argc, char ***argv)
>  {
>      int i, loc;
> 
> Modified: trunk/orte/mca/plm/base/plm_base_receive.c
> ==============================================================================
> --- trunk/orte/mca/plm/base/plm_base_receive.c  Wed Jul 25 12:32:51 2012        (r26867)
> +++ trunk/orte/mca/plm/base/plm_base_receive.c  2012-07-25 17:46:45 EDT (Wed, 25 Jul 2012)      (r26868)
> @@ -87,6 +87,12 @@
>                                                            orte_plm_base_daemon_callback, NULL))) {
>              ORTE_ERROR_LOG(rc);
>          }
> +        if (ORTE_SUCCESS != (rc = orte_rml.recv_buffer_nb(ORTE_NAME_WILDCARD,
> +                                                          ORTE_RML_TAG_REPORT_REMOTE_LAUNCH,
> +                                                          ORTE_RML_PERSISTENT,
> +                                                          orte_plm_base_daemon_failed, NULL))) {
> +            ORTE_ERROR_LOG(rc);
> +        }
>      }
>      recv_issued = true;
> 
> Modified: trunk/orte/mca/plm/base/plm_private.h
> ==============================================================================
> ---
Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag
OK, so this is only for receive, and not for send, I take it. Should have looked closer.

Rich

-----Original Message-----
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
Sent: Thursday, July 26, 2012 10:47 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

Rich,

There is no matching in this case. Canceling a receive operation is possible only up to the moment the request has been matched. Up to this point the sequence numbers of the peers are not used, so removing a non-matched request has no impact on the sequence number.

  george.

On Jul 26, 2012, at 16:31 , Richard Graham wrote:

> I do not see any resetting of sequence numbers. It has been a long time
> since I have looked at the matching code, so I don't know if the out-of-order
> handling has been taken out. If not, the sequence number has to be dealt
> with in some manner, or else there will be a gap in the arriving sequence
> numbers, and the matching logic will prevent any further progress.
>
> Rich
>
> -----Original Message-----
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
> On Behalf Of George Bosilca
> Sent: Thursday, July 26, 2012 10:06 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] [patch] MPI_Cancel should not cancel a
> request if it has a matched recv frag
>
> Takahiro,
>
> Indeed we were way too lax on canceling the requests. I modified your patch to
> correctly deal with the MEMCHECKER macro (removed the call from the branch that
> requires a completion function). The modified patch is attached below. I
> will commit asap.
>
>   Thanks,
>     george.
>
> [...]
Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag
Rich,

There is no matching in this case. Canceling a receive operation is possible only up to the moment the request has been matched. Up to this point the sequence numbers of the peers are not used, so removing a non-matched request has no impact on the sequence number.

  george.

On Jul 26, 2012, at 16:31 , Richard Graham wrote:

> I do not see any resetting of sequence numbers. It has been a long time
> since I have looked at the matching code, so I don't know if the out-of-order
> handling has been taken out. If not, the sequence number has to be dealt
> with in some manner, or else there will be a gap in the arriving sequence
> numbers, and the matching logic will prevent any further progress.
>
> Rich
>
> -----Original Message-----
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
> Behalf Of George Bosilca
> Sent: Thursday, July 26, 2012 10:06 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if
> it has a matched recv frag
>
> Takahiro,
>
> Indeed we were way too lax on canceling the requests. I modified your patch to
> correctly deal with the MEMCHECKER macro (removed the call from the branch that
> requires a completion function). The modified patch is attached below. I
> will commit asap.
>
>   Thanks,
>     george.
>
> [...]
Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag
I do not see any resetting of sequence numbers. It has been a long time since I have looked at the matching code, so I don't know if the out-of-order handling has been taken out. If not, the sequence number has to be dealt with in some manner, or else there will be a gap in the arriving sequence numbers, and the matching logic will prevent any further progress.

Rich

-----Original Message-----
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
Sent: Thursday, July 26, 2012 10:06 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

Takahiro,

Indeed we were way too lax on canceling the requests. I modified your patch to correctly deal with the MEMCHECKER macro (removed the call from the branch that requires a completion function). The modified patch is attached below. I will commit asap.

  Thanks,
    george.

[...]
Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag
Takahiro,

Indeed we were way too lax on canceling the requests. I modified your patch
to correctly deal with the MEMCHECK macro (remove the call from the branch
that will require a completion function). The modified patch is attached
below. I will commit asap.

  Thanks,
    george.

Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
===================================================================
--- ompi/mca/pml/ob1/pml_ob1_recvreq.c	(revision 26870)
+++ ompi/mca/pml/ob1/pml_ob1_recvreq.c	(working copy)
@@ -3,7 +3,7 @@
  * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
  *                         University Research and Technology
  *                         Corporation.  All rights reserved.
- * Copyright (c) 2004-2009 The University of Tennessee and The University
+ * Copyright (c) 2004-2012 The University of Tennessee and The University
  *                         of Tennessee Research Foundation.  All rights
  *                         reserved.
  * Copyright (c) 2004-2008 High Performance Computing Center Stuttgart,
@@ -15,6 +15,7 @@
  * Copyright (c) 2012      NVIDIA Corporation.  All rights reserved.
  * Copyright (c) 2011-2012 Los Alamos National Security, LLC.  All rights
  *                         reserved.
+ * Copyright (c) 2012      FUJITSU LIMITED.  All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -97,36 +98,26 @@
     mca_pml_ob1_recv_request_t* request = (mca_pml_ob1_recv_request_t*)ompi_request;
     mca_pml_ob1_comm_t* comm = request->req_recv.req_base.req_comm->c_pml_comm;
 
-    if( true == ompi_request->req_complete ) { /* way to late to cancel this one */
-        /*
-         * Receive request completed, make user buffer accessable.
-         */
-        MEMCHECKER(
-            memchecker_call(&opal_memchecker_base_mem_defined,
-                            request->req_recv.req_base.req_addr,
-                            request->req_recv.req_base.req_count,
-                            request->req_recv.req_base.req_datatype);
-        );
+    if( true == request->req_match_received ) { /* way to late to cancel this one */
+        assert( OMPI_ANY_TAG != ompi_request->req_status.MPI_TAG ); /* not matched isn't it */
         return OMPI_SUCCESS;
     }
 
     /* The rest should be protected behind the match logic lock */
     OPAL_THREAD_LOCK(&comm->matching_lock);
-    if( OMPI_ANY_TAG == ompi_request->req_status.MPI_TAG ) { /* the match has not been already done */
-       if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
-          opal_list_remove_item( &comm->wild_receives, (opal_list_item_t*)request );
-       } else {
-          mca_pml_ob1_comm_proc_t* proc = comm->procs + request->req_recv.req_base.req_peer;
-          opal_list_remove_item(&proc->specific_receives, (opal_list_item_t*)request);
-       }
-       PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
-                                &(request->req_recv.req_base), PERUSE_RECV );
-       /**
-        * As now the PML is done with this request we have to force the pml_complete
-        * to true. Otherwise, the request will never be freed.
-        */
-       request->req_recv.req_base.req_pml_complete = true;
+    if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
+        opal_list_remove_item( &comm->wild_receives, (opal_list_item_t*)request );
+    } else {
+        mca_pml_ob1_comm_proc_t* proc = comm->procs + request->req_recv.req_base.req_peer;
+        opal_list_remove_item(&proc->specific_receives, (opal_list_item_t*)request);
     }
+    PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
+                             &(request->req_recv.req_base), PERUSE_RECV );
+    /**
+     * As now the PML is done with this request we have to force the pml_complete
+     * to true. Otherwise, the request will never be freed.
+     */
+    request->req_recv.req_base.req_pml_complete = true;
     OPAL_THREAD_UNLOCK(&comm->matching_lock);
 
     OPAL_THREAD_LOCK(&ompi_request_lock);
@@ -138,7 +129,7 @@
     MCA_PML_OB1_RECV_REQUEST_MPI_COMPLETE(request);
     OPAL_THREAD_UNLOCK(&ompi_request_lock);
     /*
-     * Receive request cancelled, make user buffer accessable.
+     * Receive request cancelled, make user buffer accessible.
      */
     MEMCHECKER(
         memchecker_call(&opal_memchecker_base_mem_defined,

On Jul 26, 2012, at 13:41 , Kawashima, Takahiro wrote:

> Hi Open MPI developers,
> 
> I found a small bug in Open MPI.
> 
> See attached program cancelled.c. In this program, rank 1 tries to cancel
> an MPI_Irecv and calls MPI_Recv instead if the cancellation succeeds.
> This program should terminate whether the cancellation succeeds or not,
> but it leads to a deadlock in MPI_Recv after printing
> "MPI_Test_cancelled: 1". I confirmed it works fine with MPICH2.
> 
> The problem is in the mca_pml_ob1_recv_request_cancel function in
> ompi/mca/pml/ob1/pml_ob1_recvreq.c. It accepts the cancellation unless
> the request has been completed. I think it
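The decision the committed patch introduces can be sketched outside of Open MPI as a small model. This is an illustrative reduction, not the real ob1 code: the struct and function names are invented, and only the `req_match_received` check is modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of the patched cancel logic: a receive request may only
 * be cancelled while it still sits in a posted queue, i.e. before any
 * incoming fragment has matched it. */
typedef struct {
    bool match_received; /* stands in for request->req_match_received */
    bool cancelled;      /* what MPI_Test_cancelled would later report */
    bool posted;         /* still linked in the wild/specific receives */
} model_recv_request_t;

/* Mirrors the shape of the patched mca_pml_ob1_recv_request_cancel:
 * returning success without cancelling is legal, because MPI_Cancel is
 * only a request, not a guarantee. */
static int model_recv_request_cancel(model_recv_request_t *req)
{
    if (req->match_received) {
        /* Way too late: a fragment already matched this request, so the
         * incoming data has nowhere else to go; let the receive finish. */
        return 0;
    }
    /* Not matched yet: unlink from the posted queue and mark cancelled. */
    req->posted = false;
    req->cancelled = true;
    return 0;
}
```

The caller then learns the outcome the standard way, by completing the request and querying MPI_Test_cancelled, rather than from the return value of the cancel call.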
[OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag
Hi Open MPI developers,

I found a small bug in Open MPI.

See attached program cancelled.c. In this program, rank 1 tries to cancel
an MPI_Irecv and calls MPI_Recv instead if the cancellation succeeds. This
program should terminate whether the cancellation succeeds or not, but it
leads to a deadlock in MPI_Recv after printing "MPI_Test_cancelled: 1".
I confirmed it works fine with MPICH2.

The problem is in the mca_pml_ob1_recv_request_cancel function in
ompi/mca/pml/ob1/pml_ob1_recvreq.c. It accepts the cancellation unless the
request has been completed. I think it should not accept the cancellation
if the request has been matched. If it wants to accept the cancellation, it
must push the recv frag back to the unexpected message queue and redo the
matching. Furthermore, the receive buffer must be reverted if the received
message has been written to the receive buffer partially in a pipeline
protocol.

Attached patch cancel-recv.patch is a sample fix for this bug for Open MPI
trunk. Though this patch has 65 lines, the main modifications are adding
one if-statement and deleting one if-statement; the other lines are just
for indent alignment. I cannot confirm the MEMCHECKER part is correct.
Could anyone review it before committing?
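The deadlock described above can be summarized with a toy message count. The helper below is hypothetical (no MPI calls); it only encodes the assumption, true for this test case, that rank 0 sends the large payload exactly once.

```c
#include <assert.h>
#include <stdbool.h>

/* Once rank 0's single payload has matched the posted MPI_Irecv,
 * honouring the cancel strands that data: the fallback MPI_Recv would
 * then wait for a second message that will never be sent. */
static bool fallback_recv_can_complete(bool cancel_honoured_after_match)
{
    int messages_sent    = 1; /* rank 0 sends buf1 exactly once */
    int messages_matched = 1; /* the Irecv already matched it   */

    if (!cancel_honoured_after_match) {
        /* Cancel refused: the original Irecv completes normally and
         * the fallback MPI_Recv is never posted. */
        return true;
    }
    /* Cancel "succeeded": the fallback Recv needs a fresh, unmatched
     * message from rank 0, but none remains -- deadlock. */
    return (messages_sent - messages_matched) > 0;
}
```

This is why refusing the cancellation after the match (as the patch does) is the conservative, deadlock-free choice.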
Regards,
Takahiro Kawashima,
MPI development team, Fujitsu

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

/* rendezvous */
#define BUFSIZE1 (1024*1024)
/* eager */
#define BUFSIZE2 (8)

int main(int argc, char *argv[])
{
    int myrank, cancelled;
    void *buf1, *buf2;
    MPI_Request request;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    buf1 = malloc(BUFSIZE1);
    buf2 = malloc(BUFSIZE2);

    if (myrank == 0) {
        MPI_Isend(buf1, BUFSIZE1, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &request);
        MPI_Send(buf2, BUFSIZE2, MPI_BYTE, 1, 1, MPI_COMM_WORLD);
        MPI_Wait(&request, &status);
    } else if (myrank == 1) {
        MPI_Irecv(buf1, BUFSIZE1, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &request);
        MPI_Recv(buf2, BUFSIZE2, MPI_BYTE, 0, 1, MPI_COMM_WORLD, &status);
        MPI_Cancel(&request);
        MPI_Wait(&request, &status);
        MPI_Test_cancelled(&status, &cancelled);
        printf("MPI_Test_cancelled: %d\n", cancelled);
        fflush(stdout);
        if (cancelled) {
            MPI_Recv(buf1, BUFSIZE1, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &status);
        }
    }

    MPI_Finalize();
    free(buf1);
    free(buf2);
    return 0;
}

Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
===================================================================
--- ompi/mca/pml/ob1/pml_ob1_recvreq.c	(revision 26836)
+++ ompi/mca/pml/ob1/pml_ob1_recvreq.c	(working copy)
@@ -97,36 +97,36 @@
     mca_pml_ob1_recv_request_t* request = (mca_pml_ob1_recv_request_t*)ompi_request;
     mca_pml_ob1_comm_t* comm = request->req_recv.req_base.req_comm->c_pml_comm;
 
-    if( true == ompi_request->req_complete ) { /* way to late to cancel this one */
-        /*
-         * Receive request completed, make user buffer accessable.
-         */
-        MEMCHECKER(
-            memchecker_call(&opal_memchecker_base_mem_defined,
-                            request->req_recv.req_base.req_addr,
-                            request->req_recv.req_base.req_count,
-                            request->req_recv.req_base.req_datatype);
-        );
+    if( true == request->req_match_received ) { /* way to late to cancel this one */
+        if( true == ompi_request->req_complete ) {
+            /*
+             * Receive request completed, make user buffer accessable.
+             */
+            MEMCHECKER(
+                memchecker_call(&opal_memchecker_base_mem_defined,
+                                request->req_recv.req_base.req_addr,
+                                request->req_recv.req_base.req_count,
+                                request->req_recv.req_base.req_datatype);
+            );
+        }
         return OMPI_SUCCESS;
     }
 
     /* The rest should be protected behind the match logic lock */
     OPAL_THREAD_LOCK(&comm->matching_lock);
-    if( OMPI_ANY_TAG == ompi_request->req_status.MPI_TAG ) { /* the match has not been already done */
-       if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
-          opal_list_remove_item( &comm->wild_receives, (opal_list_item_t*)request );
-       } else {
-          mca_pml_ob1_comm_proc_t* proc = comm->procs + request->req_recv.req_base.req_peer;
-          opal_list_remove_item(&proc->specific_receives, (opal_list_item_t*)request);
-       }
-       PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
-                                &(request->req_recv.req_base), PERUSE_RECV );
-       /**
-        * As now the PML is done with this request we have to force the pml_complete
-        * to true. Otherwise, the request will never be freed.
-        */
-       request->req_recv.req_base.req_pml_complete = true;
+    if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
+        opal_list_remove_item( &comm->wild_receives, (opal_list_item_t*)request );
+    } else {