Hi Alex,

On 10/1/2015 10:34 PM, Alex Jones wrote:
> When a collocated checkpoint replica is opened, and the active replica has
> large numbers of sections (~200k),

Can you please share the size of each section?

-AVM

On 10/5/2015 9:31 AM, A V Mahesh wrote:
> Hi Alex,
>
> If you have a ready-to-use test application, can you please attach it?
>
> -AVM
>
> On 10/1/2015 10:34 PM, Alex Jones wrote:
>> Summary: CKPT: fix crash in cpnd when opening replica times out [#1510]
>> Review request for Trac Ticket(s): 1510
>> Peer Reviewer(s): AVM
>> Pull request to: AVM
>> Affected branch(es): default, 4.7, 4.6, 4.5
>> Development branch: <<IF ANY GIVE THE REPO URL>>
>>
>> --------------------------------
>> Impacted area       Impact y/n
>> --------------------------------
>>  Docs                    n
>>  Build system            n
>>  RPM/packaging           n
>>  Configuration files     n
>>  Startup scripts         n
>>  SAF services            y
>>  OpenSAF services        n
>>  Core libraries          n
>>  Samples                 n
>>  Tests                   n
>>  Other                   n
>>
>>
>> Comments (indicate scope for each "y" above):
>> ---------------------------------------------
>> <<EXPLAIN/COMMENT THE PATCH SERIES HERE>>
>>
>> changeset 923566e6c96312c15330b4e8ed0c81a80a2701f0
>> Author: Alex Jones <[email protected]>
>> Date:   Thu, 01 Oct 2015 12:56:53 -0400
>>
>> ckptnd: fix crash when checkpoint open sync to active times out [#1510]
>>
>> ckptnd core dumps with many different stack traces
>>
>> When a collocated checkpoint replica is opened, and the active replica has
>> large numbers of sections (~200k), the sync from the active to the replica
>> can time out. If the MDS sync succeeds, but the error code in the out_evt
>> is not SA_AIS_OK, the current code jumps to the ckpt_shm_node_free_error
>> label. The code under this label assumes that the node was not
>> successfully created in the database, so it doesn't remove it. But in this
>> case it was created. The node memory is freed, but the node is not removed
>> from the database. The next time this checkpoint is accessed, cpnd will
>> access freed memory and crash.
>>
>> Set a flag after the node has been added to the database.
>> And in the ckpt_node_free_error label, remove the node from the database
>> if it was added.
>>
>>
>> Complete diffstat:
>> ------------------
>>  osaf/services/saf/cpsv/cpnd/cpnd_evt.c |  10 ++++++++++
>>  1 files changed, 10 insertions(+), 0 deletions(-)
>>
>>
>> Testing Commands:
>> -----------------
>> 1) create a collocated checkpoint with 200k sections, and continue
>>    updating the sections
>> 2) open the same checkpoint on another node (this creates a replica)
>>
>>
>> Testing, Expected Results:
>> --------------------------
>> 1) cpnd on the replica node should not crash, and sync should succeed
>>
>>
>> Conditions of Submission:
>> -------------------------
>> <<HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC>>
>>
>>
>> Arch        Built   Started   Linux distro
>> -------------------------------------------
>> mips          n        n
>> mips64        n        n
>> x86           n        n
>> x86_64        y        y
>> powerpc       n        n
>> powerpc64     n        n
>>
>>
>> Reviewer Checklist:
>> -------------------
>> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>>
>>
>> Your checkin has not passed review because (see checked entries):
>>
>> ___ Your RR template is generally incomplete; it has too many blank
>>     entries that need proper data filled in.
>>
>> ___ You have failed to nominate the proper persons for review and push.
>>
>> ___ Your patches do not have proper short+long header
>>
>> ___ You have grammar/spelling in your header that is unacceptable.
>>
>> ___ You have exceeded a sensible line length in your
>>     headers/comments/text.
>>
>> ___ You have failed to put in a proper Trac Ticket # into your commits.
>>
>> ___ You have incorrectly put/left internal data in your comments/files
>>     (i.e. internal bug tracking tool IDs, product names etc)
>>
>> ___ You have not given any evidence of testing beyond basic build tests.
>>     Demonstrate some level of runtime or other sanity testing.
>>
>> ___ You have ^M present in some of your files. These have to be removed.
>>
>> ___ You have needlessly changed whitespace or added whitespace crimes
>>     like trailing spaces, or spaces before tabs.
>>
>> ___ You have mixed real technical changes with whitespace and other
>>     cosmetic code cleanup changes. These have to be separate commits.
>>
>> ___ You need to refactor your submission into logical chunks; there is
>>     too much content in a single commit.
>>
>> ___ You have extraneous garbage in your review (merge commits etc)
>>
>> ___ You have giant attachments which should never have been sent;
>>     Instead you should place your content in a public tree to be pulled.
>>
>> ___ You have too many commits attached to an e-mail; resend as threaded
>>     commits, or place in a public tree for a pull.
>>
>> ___ You have resent this content multiple times without a clear
>>     indication of what has changed between each re-send.
>>
>> ___ You have failed to adequately and individually address all of the
>>     comments and change requests that were proposed in the initial
>>     review.
>>
>> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
>>
>> ___ Your computer has a badly configured date and time, confusing the
>>     threaded patch review.
>>
>> ___ Your changes affect IPC mechanism, and you don't present any results
>>     for in-service upgradability test.
>>
>> ___ Your changes affect user manual and documentation, your patch series
>>     does not contain the patch that updates the Doxygen manual.
>>
>

------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
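
For anyone following the thread, the fix described in the changeset text (set a flag once the node has been added to the database, then remove the node in the error path if that flag is set) can be sketched roughly as follows. This is only an illustration of the pattern and not the actual patch. The names ckpt_db_t, db_add, db_remove, db_contains and open_replica, and the shm_rc/sync_rc parameters, are hypothetical stand-ins. A singly linked list stands in for cpnd's real node database.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from cpnd_evt.c. */
typedef struct ckpt_node {
    unsigned id;
    struct ckpt_node *next;
} ckpt_node_t;

typedef struct {
    ckpt_node_t *head; /* a linked list plays the role of the node database */
} ckpt_db_t;

void db_add(ckpt_db_t *db, ckpt_node_t *n)
{
    n->next = db->head;
    db->head = n;
}

void db_remove(ckpt_db_t *db, ckpt_node_t *n)
{
    for (ckpt_node_t **pp = &db->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == n) {
            *pp = n->next;
            n->next = NULL;
            return;
        }
    }
}

bool db_contains(const ckpt_db_t *db, unsigned id)
{
    for (const ckpt_node_t *n = db->head; n != NULL; n = n->next) {
        if (n->id == id)
            return true;
    }
    return false;
}

/*
 * shm_rc models a failure that happens BEFORE the node is in the database;
 * sync_rc models the error code in the reply from the active replica,
 * which arrives AFTER the node has been added. 0 means success for both.
 */
int open_replica(ckpt_db_t *db, unsigned id, int shm_rc, int sync_rc)
{
    int rc = 0;
    bool node_added = false;
    ckpt_node_t *node = calloc(1, sizeof(*node));

    if (node == NULL)
        return -1;
    node->id = id;

    if (shm_rc != 0) {
        rc = shm_rc;
        goto node_free_error; /* node was never added */
    }

    db_add(db, node);
    node_added = true; /* remember that cleanup must undo the add */

    if (sync_rc != 0) {
        rc = sync_rc;
        goto node_free_error; /* node IS in the database here */
    }
    return 0;

node_free_error:
    /* Without this check the database would keep a pointer to freed memory. */
    if (node_added)
        db_remove(db, node);
    free(node);
    return rc;
}
```

With the node_added check in place, a failed sync leaves no dangling entry behind, so a later lookup of the same checkpoint cannot touch freed memory.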
