Summary:cpsv: prevented multiple write request with same checkpointHandle at cpnd [#1467] Review request for Trac Ticket(s): #1467 Peer Reviewer(s): Nagu Pull request to: <<LIST THE PERSON WITH PUSH ACCESS HERE>> Affected branch(es): all Development branch: default
-------------------------------- Impacted area Impact y/n -------------------------------- Docs n Build system n RPM/packaging n Configuration files n Startup scripts n SAF services n OpenSAF services y Core libraries n Samples n Tests n Other n Comments (indicate scope for each "y" above): --------------------------------------------- changeset 5a4673f68b029b9e067faaf16080e0533115d541 Author: A V Mahesh <[email protected]> Date: Mon, 31 Aug 2015 09:48:55 +0530 cpsv: prevented multiple write request with same checkpointHandle at cpnd [#1467] Bug Analysis : checkpointHandle - is A pointer to the checkpoint handle, allocated in the address space of the invoking process (CPA/application) . CPA stores into the memory area of CPA/application/process uses to access the checkpoint in subsequent invocations of the functions of the Checkpoint Service API. In the case of saCkptCheckpointOpenAsync() , saCkptCheckpointWrite() , this handle is returned in the corresponding response message. Eevn though saCkptCheckpointWrite() is Sync request checkpointHandle is used by CPND for tracking CPND<--->CPND messaging invoking activity on request of saCkptCheckpointWrite() . If the ckpoint is SA_CKPT_CHECKPOINT_COLLOCATED & SA_CKPT_WR_ALL_REPLICAS checkpoint , and the checkpoint is opened on multiple nodes , and saCkptCheckpointWrite() are beeing requested by multiple CPAs from same Node , then the local CPND to update the all other CPNDS , whic has opened the same Ckpt, to track the pending invocations/event a teperory cpnd_evt entry added with key checkpointHandle and after a successful write responserevived from all the CPND's , using this cpnd_evt entry the local CPND will response back to CPA/application with result, and then deleted cpnd_evt entry from local CPND ( temporary only for tracking response message of peer CPND ) In current code CPA is using malloc() return value as checkpointHandle for unique reference key where malloc() returns virtual memory specific to processes, so malloc() can return the same pointer value in separate CPA processes . If we are running two ckpt app (2 processes) try to write data into two difference/same checkpoints , it is possible that the checkpointHandle is being passed as same from both checkpoint application processes , as same checkpointHandle is being shared to CPND by both CPA as key reference, the CPND can miss behave because of ambiguous same reference key. Solution : As malloc() is standred call and the Checkpoint Service API Specification says checkpointHandle should be pointer allocated in the address space of the invoking process/CPA , CPND will return try again before the saCkptCheckpointWrite() call , if multiple saCkptCheckpointWrite() request come to CPND with SAME checkpointHandle ( same virtual memory address) from different CPA's at the same time. Complete diffstat: ------------------ osaf/services/saf/cpsv/cpnd/cpnd_evt.c | 18 +++++++++++++++--- osaf/services/saf/cpsv/cpnd/cpnd_proc.c | 3 ++- 2 files changed, 17 insertions(+), 4 deletions(-) Testing Commands: ----------------- Steps to reproduce and Vitrification of patch : ------------------------------------------------------------ Step -1 : - Apply patch - Add sleep(30) in cpnd_evt_proc_nd2nd_ckpt_active_data_access_req() function of file `osaf/services/saf/cpsv/cpnd/cpnd_evt.c` at line no :3037 And build, install and bringup Opensaf on all 4 nodes SC-1, SC-2, PL-3, SC-4 ). ==================================================================================================================== static uint32_t cpnd_evt_proc_nd2nd_ckpt_active_data_access_req(CPND_CB *cb, CPND_EVT *evt, CPSV_SEND_INFO *sinfo) { sleep(30); uint32_t rc = NCSCC_RC_SUCCESS; ==================================================================================================================== Step -2 Build Cpsv `test_3opens_app_A.c & test_3opens_app_B.c` application on all 4 nodes SC-1, SC-2, PL-3, SC-4 ( attached to Ticket) ==================================================== SC-1:# gcc test_3opens_app_A.c -o node_A -lSaCkpt; SC-1:# gcc test_3opens_app_B.c -o node_B -lSaCkpt; ==================================================== Step -3 : Bring up Opensaf on all 4 nodes SC-1, SC-2, PL-3, SC-4 Step -4 : Run checkpoint application ./node_A In all 4 nodes SC-1, SC-2, PL-3, PL-4 and don`t Press <Enter> key. ==================================================== SC-1:# ./node_A 0 saCkptCheckpointOpen returned checkpointHandle 626e60 1 saCkptCheckpointOpen returned checkpointHandle 626fe0 2 saCkptCheckpointOpen returned checkpointHandle 627270 3 saCkptCheckpointOpen returned checkpointHandle 6273f0 4 saCkptCheckpointOpen returned checkpointHandle 627570 CPSV:CPA:ONsaCkptCheckpointWrite Waiting to Read from Checkpoint .... saCkptCheckpointWrite Press <Enter> key to continue... ==================================================== Step -5 :Run checkpoint application ./node_B only on 2 nodes SC-1 & SC-2 and don`t Press <Enter> key. ==================================================== SC-1:# ./node_B 0 saCkptCheckpointOpen returned checkpointHandle 626e60 1 saCkptCheckpointOpen returned checkpointHandle 626fe0 2 saCkptCheckpointOpen returned checkpointHandle 627270 3 saCkptCheckpointOpen returned checkpointHandle 6273f0 4 saCkptCheckpointOpen returned checkpointHandle 627570 CPSV:CPA:ONsaCkptCheckpointWrite Waiting to Read from Checkpoint .... saCkptCheckpointWrite Press <Enter> key to continue... ==================================================== Step -6 : Press <Enter> key for ./node_A ./node_B application quickly to write simultaneously on SC-1 only, then for `node_B` checkpoint application you will get try-again for saCkptCheckpointWrite(). (Note : With out patch you will see SR reported issue) ==================================================== SC-1: # ./node_A 0 saCkptCheckpointOpen returned checkpointHandle 626e60 1 saCkptCheckpointOpen returned checkpointHandle 626fe0 2 saCkptCheckpointOpen returned checkpointHandle 627270 3 saCkptCheckpointOpen returned checkpointHandle 6273f0 4 saCkptCheckpointOpen returned checkpointHandle 627570 CPSV:CPA:ONsaCkptCheckpointWrite Waiting to Read from Checkpoint .... saCkptCheckpointWrite Press <Enter> key to continue... 1 saCkptCheckpointWrite checkpointHandle 626e60 2 saCkptCheckpointWrite checkpointHandle 626e60 3 saCkptCheckpointWrite checkpointHandle 626e60 4 saCkptCheckpointWrite checkpointHandle 626e60 222 saCkptCheckpointWrite checkpointHandle 626e60 saCkptCheckpointRead Waiting to Read from Checkpoint .... saCkptCheckpointRead Press <Enter> key to continue... SC-1:# ./node_B 0 saCkptCheckpointOpen returned checkpointHandle 626e60 1 saCkptCheckpointOpen returned checkpointHandle 626fe0 2 saCkptCheckpointOpen returned checkpointHandle 627270 3 saCkptCheckpointOpen returned checkpointHandle 6273f0 4 saCkptCheckpointOpen returned checkpointHandle 627570 CPSV:CPA:ONsaCkptCheckpointWrite Waiting to Read from Checkpoint .... saCkptCheckpointWrite Press <Enter> key to continue... 1 saCkptCheckpointWrite checkpointHandle 626e60 Attempt 1-0: saCkptCheckpointWrite returned 6. ==================================================== Testing, Expected Results: -------------------------- Conditions of Submission: ------------------------- <<HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC>> Arch Built Started Linux distro ------------------------------------------- mips n n mips64 n n x86 n n x86_64 y y powerpc n n powerpc64 n n Reviewer Checklist: ------------------- [Submitters: make sure that your review doesn't trigger any checkmarks!] Your checkin has not passed review because (see checked entries): ___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in. ___ You have failed to nominate the proper persons for review and push. ___ Your patches do not have proper short+long header ___ You have grammar/spelling in your header that is unacceptable. ___ You have exceeded a sensible line length in your headers/comments/text. ___ You have failed to put in a proper Trac Ticket # into your commits. ___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc) ___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing. ___ You have ^M present in some of your files. These have to be removed. ___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs. ___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits. ___ You need to refactor your submission into logical chunks; there is too much content into a single commit. ___ You have extraneous garbage in your review (merge commits etc) ___ You have giant attachments which should never have been sent; Instead you should place your content in a public tree to be pulled. ___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull. ___ You have resent this content multiple times without a clear indication of what has changed between each re-send. ___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review. ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc) ___ Your computer have a badly configured date and time; confusing the the threaded patch review. ___ Your changes affect IPC mechanism, and you don't present any results for in-service upgradability test. ___ Your changes affect user manual and documentation, your patch series do not contain the patch that updates the Doxygen manual. ------------------------------------------------------------------------------ _______________________________________________ Opensaf-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/opensaf-devel
