[devel] [PATCH 1 of 1] amfd: disallow delete of CtCs object if Ct maps to comp [#2428]
 osaf/services/saf/amf/amfd/ctcstype.cc |  60 -
 1 files changed, 58 insertions(+), 2 deletions(-)


diff --git a/osaf/services/saf/amf/amfd/ctcstype.cc b/osaf/services/saf/amf/amfd/ctcstype.cc
--- a/osaf/services/saf/amf/amfd/ctcstype.cc
+++ b/osaf/services/saf/amf/amfd/ctcstype.cc
@@ -26,7 +26,7 @@
 #include
 
 AmfDb *ctcstype_db = nullptr;
-
+static void find_ct_name_from_association(const SaNameT *haystack, SaNameT *dn, const char *needle);
 static void ctcstype_db_add(AVD_CTCS_TYPE *ctcstype)
 {
 	unsigned int rc = ctcstype_db->insert(ctcstype->name,ctcstype);
@@ -185,17 +185,73 @@ static SaAisErrorT ctcstype_ccb_complete
 		report_ccb_validation_error(opdata, "Modification of SaAmfCtCsType not supported");
 		break;
 	case CCBUTIL_DELETE:
+		AVD_CTCS_TYPE *ctcstype;
+		SaNameT ct_name;
+		AVD_COMP_TYPE *comp_type;
+		AVD_COMP *comp;
+		CcbUtilOperationData_t *t_opData;
+
+		ctcstype = ctcstype_db->find(Amf::to_string(&opdata->objectName));
+		if (ctcstype != nullptr) {
+			find_ct_name_from_association(&opdata->objectName, &ct_name, ",safVersion");
+			TRACE("'%s'", ct_name.value);
+			comp_type = comptype_db->find(Amf::to_string(&ct_name));
+			if ((comp_type) && (nullptr != comp_type->list_of_comp)) {
+				/* check whether there exists a delete operation for
+				 * each of the Comp in the comp_type list in the current CCB
+				 */
+				bool comp_exist = false;
+				TRACE("SaAmfCompType '%s' has components", comp_type->name.value);
+				comp = comp_type->list_of_comp;
+				while (comp != nullptr) {
+					TRACE("%s", comp->comp_info.name.value);
+					t_opData = ccbutil_getCcbOpDataByDN(opdata->ccbId, &comp->comp_info.name);
+					TRACE("%p", t_opData);
+					if ((t_opData == nullptr) || (t_opData->operationType != CCBUTIL_DELETE)) {
+						TRACE("Here %p", t_opData);
+						comp_exist = true;
+						break;
+					}
+					comp = comp->comp_type_list_comp_next;
+				}
+				if (comp_exist == true) {
+					rc = SA_AIS_ERR_BAD_OPERATION;
+					report_ccb_validation_error(opdata, "SaAmfCompType '%s' is in use", comp_type->name.value);
+					goto done;
+				}
+			} else
+				TRACE("SaAmfCompType '%p'. SaAmfCompType '%s' has no components", comp_type, ct_name.value);
+		}
 		rc = SA_AIS_OK;
 		break;
 	default:
 		osafassert(0);
 		break;
 	}
-
+done:
 	TRACE_LEAVE2("%u", rc);
 	return rc;
 }
 
+/**
+ * Initialize a DN by searching for needle in haystack where two times safVersion comes.
+ * @param haystack
+ * @param dn
+ * @param needle
+ * @note: "safSupportedCsType=safVersion=1\,safCSType=AmfDemo1,safVersion=1,safCompType=AmfDemo1"
+ */
+static void find_ct_name_from_association(const SaNameT *haystack, SaNameT *dn, const char *needle)
+{
+	char *p;
+
+	memset(dn, 0, sizeof(SaNameT));
+	p = strstr((char*)haystack->value, needle);
+	osafassert(p);
+	p++; /* Increament after comma (,) */
+	dn->length = strlen(p);
+	memcpy(dn->value, p, dn->length);
+}
+
 static void ctcstype_ccb_apply_cb(CcbUtilOperationData_t *opdata)
 {
 	AVD_CTCS_TYPE *ctcstype;

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
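Editorial sketch: the helper in the patch above derives the component type DN from a CtCs association DN by locating the first ",safVersion" and copying everything after the comma. The same idea with std::string is shown below. The function name ct_dn_from_association is hypothetical and not part of the patch, and unlike the patch (which asserts) it returns an empty string when the needle is missing.

```cpp
#include <string>

// Hypothetical helper (not from the patch): return the part of an
// association DN that follows the comma of the first `needle` match.
// The escaped "\," inside the association RDN is not followed by
// "safVersion", so the search lands on the unescaped separator.
std::string ct_dn_from_association(const std::string& assoc_dn,
                                   const std::string& needle = ",safVersion") {
  const std::string::size_type pos = assoc_dn.find(needle);
  if (pos == std::string::npos) {
    return "";  // assumption: caller reports the malformed DN
  }
  return assoc_dn.substr(pos + 1);  // skip the comma itself
}
```

With the DN quoted in the patch comment, the function yields "safVersion=1,safCompType=AmfDemo1".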
[devel] [PATCH 0 of 1] Review Request for amfd: disallow delete of CtCs object if Ct maps to comp [#2428]
Summary: amfd: disallow delete of CtCs object if Ct maps to comp [#2428]
Review request for Trac Ticket(s): #2428
Peer Reviewer(s): Amf Dev
Pull request to: <>
Affected branch(es): All
Development branch: 5.1

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
---------------------------------------------
 <>

changeset 6f1d4e05b4cdbd45dcc9c8cb461ed22ee5e52fc2
Author:	Nagendra Kumar
Date:	Fri, 14 Apr 2017 18:49:49 +0530

	amfd: disallow delete of CtCs object if Ct maps to comp [#2428]


Complete diffstat:
------------------
 osaf/services/saf/amf/amfd/ctcstype.cc |  60 ++--
 1 files changed, 58 insertions(+), 2 deletions(-)


Testing Commands:
-----------------
As per ticket.


Testing, Expected Results:
--------------------------
Amfd should not crash.


Conditions of Submission:
-------------------------
Ack


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
    (i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
    too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
    Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)

___ Your computer has a badly configured date and time; confusing
    the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
    for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
    do not contain the patch that updates the Doxygen manual.
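Editorial sketch of the rule this change introduces: a CtCs association may be deleted only if every component of the mapped component type is deleted in the same CCB. The model below uses plain standard containers. The names ctcs_delete_allowed and DeletedDns are hypothetical stand-ins and not amfd APIs; the real code walks comp_type->list_of_comp and queries ccbutil_getCcbOpDataByDN.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for "DNs that the current CCB deletes".
using DeletedDns = std::set<std::string>;

// Returns true when deleting the CtCs association is safe: every
// component that still uses the component type is deleted in the
// same CCB. A type with no components is trivially safe.
bool ctcs_delete_allowed(const std::vector<std::string>& comps_of_type,
                         const DeletedDns& deleted_in_ccb) {
  for (const std::string& comp_dn : comps_of_type) {
    if (deleted_in_ccb.count(comp_dn) == 0) {
      return false;  // component survives the CCB, so the type is in use
    }
  }
  return true;
}
```

Rejecting at the first surviving component mirrors the early break in the patch's while loop.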
[devel] [PATCH 0/1] Review Request for "amf: support amf tool command to know AMF cluster/nodes status [#2354]"
Summary: amf: support amf tool command to know AMF cluster/nodes status [#2354]
Review request for Ticket(s): 2354
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2354
Private repository: git://git.code.sf.net/u/praveenmalviya/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   y


Comments (indicate scope for each "y" above):
---------------------------------------------

revision 2a8ff5aaa7dc8c964de5e99343a6ab6b35562925
Author:	praveenmalviya
Date:	Fri, 14 Apr 2017 14:36:37 +0530

amf: support amf tool command to know AMF cluster/nodes status [#2354]

With this command, a user can know the status of nodes in AMF cluster.
For example:
1)normal cluster
NODE(AMF)  STATUS
SC-1       UP
SC-2       DOWN
2)During SCs absence:
NODE(AMF)  STATUS
PL-3       SAM
SC-1       DOWN
SC-2       DOWN


Added Files:
------------
 src/amf/tools/amf_cluster_status.cc


Complete diffstat:
------------------
 opensaf.spec.in                     |   1 +
 src/amf/Makefile.am                 |  16 +-
 src/amf/tools/Makefile              |   2 +-
 src/amf/tools/amf_cluster_status.cc | 325
 4 files changed, 342 insertions(+), 2 deletions(-)


Testing Commands:
-----------------
run command: amfclusterstatus


Testing, Expected Results:
--------------------------
1)normal cluster
NODE(AMF)  STATUS
SC-1       UP
SC-2       DOWN
2)During SCs absence:
NODE(AMF)  STATUS
PL-3       SAM
SC-1       DOWN
SC-2       DOWN


Conditions of Submission:
-------------------------
Ack from any reviewer


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
    (i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
    too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
    Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time; confusing
    the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
    for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
    do not contain the patch that updates the Doxygen manual.
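Editorial sketch of how the two-column output shown in this review request could be rendered. It assumes the first column is padded to the longer of the header "NODE(AMF)" and the longest node name; the patch hints at this with its `width` variable, but its actual printing code is not visible here. The function name render_status_table is hypothetical.

```cpp
#include <algorithm>
#include <iomanip>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Render (node name, status) rows as a left-aligned two-column table.
// Assumption: the name column is as wide as the longest name or the header.
std::string render_status_table(
    const std::vector<std::pair<std::string, std::string>>& rows) {
  const std::string header = "NODE(AMF)";
  std::size_t width = header.size();
  for (const auto& row : rows) {
    width = std::max(width, row.first.size());
  }

  std::ostringstream out;
  out << std::left << std::setw(static_cast<int>(width)) << header
      << "  STATUS\n";
  for (const auto& row : rows) {
    // setw applies to the next insertion only, so it is set per row.
    out << std::left << std::setw(static_cast<int>(width)) << row.first
        << "  " << row.second << "\n";
  }
  return out.str();
}
```

Computing the width from the data keeps the STATUS column aligned even when a node name is longer than the header.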
[devel] [PATCH 1/1] amf: support amf tool command to know AMF cluster/nodes status [#2354]
With this command, a user can know the status of nodes in AMF cluster.
For example:
1)normal cluster
NODE(AMF)  STATUS
SC-1       UP
SC-2       DOWN
2)During SCs absence:
NODE(AMF)  STATUS
PL-3       SAM
SC-1       DOWN
SC-2       DOWN
---
 opensaf.spec.in                     |   1 +
 src/amf/Makefile.am                 |  16 +-
 src/amf/tools/Makefile              |   2 +-
 src/amf/tools/amf_cluster_status.cc | 325
 4 files changed, 342 insertions(+), 2 deletions(-)
 create mode 100644 src/amf/tools/amf_cluster_status.cc

diff --git a/opensaf.spec.in b/opensaf.spec.in
index 672a0f4..0078fc2 100644
--- a/opensaf.spec.in
+++ b/opensaf.spec.in
@@ -994,6 +994,7 @@ fi
 %files amf-integration
 %defattr(-,root,root)
 %{_sbindir}/amfpm
+%{_sbindir}/amfclusterstatus
 
 %if %is_ais_ckpt
 
diff --git a/src/amf/Makefile.am b/src/amf/Makefile.am
index 8c175c2..f91b5ee 100644
--- a/src/amf/Makefile.am
+++ b/src/amf/Makefile.am
@@ -161,7 +161,7 @@ noinst_HEADERS += \
 	src/amf/common/amf_si_assign.h \
 	src/amf/common/amf_util.h
 
-sbin_PROGRAMS += bin/amfpm
+sbin_PROGRAMS += bin/amfpm bin/amfclusterstatus
 osaf_execbin_PROGRAMS += bin/osafamfd bin/osafamfnd bin/osafamfwd
 CORE_INCLUDES += -I$(top_srcdir)/src/amf/saf
 TESTS += bin/testamfd
@@ -281,6 +281,20 @@ bin_amfpm_SOURCES = \
 bin_amfpm_LDADD = \
 	lib/libSaAmf.la
 
+bin_amfclusterstatus_CXXFLAGS = $(AM_CXXFLAGS)
+
+bin_amfclusterstatus_CPPFLAGS = \
+	-DSA_EXTENDED_NAME_SOURCE \
+	$(AM_CPPFLAGS)
+
+bin_amfclusterstatus_SOURCES = \
+	src/amf/tools/amf_cluster_status.cc
+
+bin_amfclusterstatus_LDADD = \
+	lib/libopensaf_core.la \
+	lib/libosaf_common.la \
+	lib/libSaImmOm.la
+
 bin_osafamfd_CXXFLAGS = -fno-strict-aliasing $(AM_CXXFLAGS)
 
 bin_osafamfd_CPPFLAGS = \
diff --git a/src/amf/tools/Makefile b/src/amf/tools/Makefile
index 0b47822..f5b2fc3 100644
--- a/src/amf/tools/Makefile
+++ b/src/amf/tools/Makefile
@@ -15,4 +15,4 @@
 #
 
 all:
-	$(MAKE) -C ../../.. bin/amfpm
+	$(MAKE) -C ../../.. bin/amfpm bin/amfclusterstatus
diff --git a/src/amf/tools/amf_cluster_status.cc b/src/amf/tools/amf_cluster_status.cc
new file mode 100644
index 000..f67a090
--- /dev/null
+++ b/src/amf/tools/amf_cluster_status.cc
@@ -0,0 +1,325 @@
+/*      -*- OpenSAF  -*-
+ *
+ * Copyright (C) 2017, Oracle and/or its affiliates. All rights reserved.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
+ * under the GNU Lesser General Public License Version 2.1, February 1999.
+ * The complete license can be accessed from the following location:
+ * http://opensource.org/licenses/lgpl-license.php
+ * See the Copying file included with the OpenSAF distribution for full
+ * licensing terms.
+ *
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "mds/mds_papi.h"
+#include "base/osaf_time.h"
+#include "base/saf_error.h"
+#include "base/osaf_extended_name.h"
+#include "base/ncs_main_papi.h"
+#include "imm/saf/saImmOm.h"
+#include "osaf/immutil/immutil.h"
+
+enum {
+  AMF_CLUSTER_MDS_VERSION = 1,
+  AMF_CLUSTER_MDS_SVC_ID = 501
+};
+
+class CLM_NODE {
+ public:
+  std::string clm_rdn;
+  uint32_t saClmNodeIsMember;
+  uint32_t saClmNodeID;
+  CLM_NODE(): clm_rdn(""), saClmNodeIsMember(0), saClmNodeID(0) { }
+};
+
+class AMF_NODE {
+ public:
+  std::string amf_rdn;
+  std::string clm_rdn;
+  AMF_NODE(): amf_rdn(""), clm_rdn("") { }
+};
+static SaVersionT immVersion = { 'A', 2, 1 };
+static MDS_HDL mds_hdl;
+static bool is_avd_up = false;
+static std::vector avnd_up_db;
+static std::map clm_db;
+static std::map amf_db;
+static uint32_t width = strlen("NODE(AMF)");
+
+uint32_t mds_callback(struct ncsmds_callback_info *info) {
+  uint32_t rc = NCSCC_RC_SUCCESS;
+  if (info->i_op != MDS_CALLBACK_SVC_EVENT) {
+    rc = NCSCC_RC_FAILURE;
+    goto done;
+  }
+  if (info->info.svc_evt.i_your_id != AMF_CLUSTER_MDS_SVC_ID) {
+    std::cout<<"Not my service id :"info.svc_evt.i_svc_id == NCSMDS_SVC_ID_AVD) {
+      if (m_MDS_DEST_IS_AN_ADEST(info->info.svc_evt.i_dest))
+        is_avd_up = true;
+    } else if (info->info.svc_evt.i_svc_id == NCSMDS_SVC_ID_AVND) {
+      avnd_up_db.push_back(info->info.svc_evt.i_node_id);
+    }
+  }
+done:
+  return rc;
+}
+
+static uint32_t mds_get_handle() {
+  NCSADA_INFO arg;
+
+  memset(&arg, 0, sizeof(NCSADA_INFO));
+  arg.req = NCSADA_GET_HDLS;
+  uint32_t rc = ncsada_api(&arg);
+
+  if (rc != NCSCC_RC_SUCCESS) {
+    std::cout<<"MDS registration failed with :"<("SaImmAttrClassName");
+  searchParam.searchOneAttr.attrValueType = SA_IMM_ATTR_SASTRINGT;
+  searchParam.searchOneAttr.attrValue = &cl
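Editorial sketch: the patch keeps three tables (AMF nodes, CLM nodes, and the node ids for which an AVND "up" event arrived), but the part that combines them is cut off in this archive. One plausible way to join them is shown below. It is an assumption, not the tool's actual logic: the names ClmNode and amf_node_status are hypothetical, and the "SAM" status from the example output is deliberately not modelled because the source does not define it.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified view of a CLM node entry.
struct ClmNode {
  std::uint32_t node_id;
};

// amf_to_clm: AMF node name -> name of the CLM node it is hosted on
// clm_nodes:  CLM node name -> node data
// avnd_up:    node ids for which an AVND "up" event was received
// Assumption: an AMF node is UP when its CLM node's id is in avnd_up.
std::map<std::string, std::string> amf_node_status(
    const std::map<std::string, std::string>& amf_to_clm,
    const std::map<std::string, ClmNode>& clm_nodes,
    const std::vector<std::uint32_t>& avnd_up) {
  std::map<std::string, std::string> status;
  for (const auto& entry : amf_to_clm) {
    bool up = false;
    const auto clm = clm_nodes.find(entry.second);
    if (clm != clm_nodes.end()) {
      up = std::find(avnd_up.begin(), avnd_up.end(),
                     clm->second.node_id) != avnd_up.end();
    }
    status[entry.first] = up ? "UP" : "DOWN";
  }
  return status;
}
```

An AMF node whose CLM node is unknown is reported as DOWN here; the real tool may treat that case differently.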
Re: [devel] [PATCH 1 of 1] cpd: to correct failover behavior of cpsv [#1765] V5
Hi Hoang,

ACK, you can push.

>>So I will continue checking it in separate ticket.

Please create a ticket for tracking.

-AVM

On 4/14/2017 2:14 PM, Vo Minh Hoang wrote:
> Dear Mahesh,
>
> Thank you for your comments.
> I add 2 of my ideas inline, please find [Hoang] tags.
>
> Dear Zoran,
>
> Do you have any extra comment about this patch?
> If not, I will request pushing it at start of next week.
>
> Sincerely,
> Hoang
>
> -----Original Message-----
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: Thursday, April 13, 2017 5:47 PM
> To: Vo Minh Hoang ; zoran.milinko...@ericsson.com
> Cc: opensaf-devel@lists.sourceforge.net; Ramesh Babu Betham
> Subject: Re: [PATCH 1 of 1] cpd: to correct failover behavior of cpsv [#1765] V5
>
> Hi Hoang,
>
> ACK with following : ( tested basic ND restarts)
>
> - The below errors are not related to this patch, those are test case related
>
> - It looks like there is an existing issue ( not related to this patch ): on
>   Cpnd down the STANDBY Cpd is also starting
>   `cpd_tmr_start(&node_info->cpnd_ret_timer,..);` please check that flow once
>   (after cpnd restart keep some sleep in Active CPD and do a switch over )
> [Hoang]: I also can reproduce this behavior but could not find error.
> So I will continue checking it in separate ticket.
> It is a little bit weird that standby cpd trigger something. Honestly I
> think standby should do data sync only. Btw, that is too soon to talk about
> this case.
>
> - You introduced cpd_tmr_stop(&cpnd_info->cpnd_ret_timer); in
>   cpnd_down_process() but cpnd_up_process() do call
>   `cpd_tmr_stop(&cpnd_info->cpnd_ret_timer);`
>   do check that it may be redundant call .
> [Hoang]: I thought we should keep this call even it is redundant in this
> case. We are detecting more and more unexpected error cases in system and
> cannot tell for sure it is redundant or not.
>
> -AVM
>
> On 4/12/2017 2:19 PM, A V Mahesh wrote:
>> Hi Hoang,
>>
>> On 2/10/2017 3:09 PM, Vo Minh Hoang wrote:
>>> If cpnd is temporary down only, we don't need clean up anything.
>>> If cpnd is permanently down, the bad effect of this proposal is that
>>> replica is not clean up. But if cpnd permanently down, we have to
>>> reboot node for recovering so I think this cleanup is not really
>>> necessary.
>>>
>>> I also checked this implementation with possible test cases and have
>>> not seen any side effect.
>>> Please consider it
>> We are observing new node_user_info databases mismatch Errors, while
>> testing multiple CPND restart with this patch, I will do more debugging
>> and update the root cause.
>>
>> ==
>> =
>>
>>
>> Apr 12 14:06:57 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500cf0, arg=0x7f86f0501ef0
>> *Apr 12 14:06:58 SC-1 osafckptd[27594]: ER cpd_proc_decrease_node_user_info failed - no user on node id 0x2020F*
>> Apr 12 14:06:58 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0501750, arg=0x7f86f0501ef0
>> *Apr 12 14:06:59 SC-1 osafckptd[27594]: ER cpd_proc_decrease_node_user_info failed - no user on node id 0x2020F*
>> Apr 12 14:06:59 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0503ab0, arg=0x7f86f0501ef0
>> Apr 12 14:07:00 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500c70, arg=0x7f86f0501ef0
>> Apr 12 14:07:01 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500930, arg=0x7f86f0501ef0
>> *Apr 12 14:07:03 SC-1 osafckptd[27594]: ER cpd_proc_decrease_node_user_info failed - no user on node id 0x2020F*
>> Apr 12 14:07:03 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f04fe3a0, arg=0x7f86f0501ef0
>> Apr 12 14:07:04 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500cf0, arg=0x7f86f0501ef0
>>
>> ==
>> =
>>
>>
>> -AVM
>>
>>
>> On 4/12/2017 11:08 AM, A V Mahesh wrote:
>>> Hi Hoang,
>>>
>>> On 2/10/2017 3:09 PM, Vo Minh Hoang wrote:
>>>> Dear Mahesh,
>>>>
>>>> Based on what I saw, in this case, retention time cannot detect CPND
>>>> temporarily down because its pid changed.
>>> I will check that , I have some test cases based this retention time
>>> , not sure how were they working.
>>>
>>> Can you please provide reproducible steps, I did look at ticket , but
>>> looks complex , if you have any application that reproduces the case
>>> please share.
>>>
>>> -AVM
>>>>
>>>> If cpnd is temporary down only, we don't need clean up anything.
>>>> If cpnd is permanently down, the bad effect of this proposal is that
>>>> replica is not clean up. But if cpnd permanently down, we have to
>>>> reboot node for recovering so I t
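Editorial sketch of the timer point discussed in this thread: if stopping the retention timer is idempotent, the extra cpd_tmr_stop() call that the review flags as possibly redundant is harmless. The model below is hypothetical and does not use the cpd timer API; RetentionTimer, on_cpnd_down and on_cpnd_up are illustrative names, and the stop-then-start order on "down" is an assumption about the patch.

```cpp
// Hypothetical model of a per-node retention timer (not the cpd API).
struct RetentionTimer {
  bool running = false;
  int starts = 0;  // number of effective starts
  int stops = 0;   // number of effective stops

  void start() {
    if (!running) {
      running = true;
      ++starts;
    }
  }

  // Stopping an idle timer is a no-op, so a redundant call is safe.
  void stop() {
    if (running) {
      running = false;
      ++stops;
    }
  }
};

// Assumption: on CPND down, any previous timer is stopped and a fresh
// retention period begins.
void on_cpnd_down(RetentionTimer& t) {
  t.stop();
  t.start();
}

// On CPND up, the retention timer is no longer needed.
void on_cpnd_up(RetentionTimer& t) { t.stop(); }
```

With this shape, repeated down or up events never leave two timers armed, which is the property the extra stop call is meant to protect.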
Re: [devel] [PATCH 1 of 1] cpd: to correct failover behavior of cpsv [#1765] V5
Dear Mahesh,

Thank you for your comments.
I add 2 of my ideas inline, please find [Hoang] tags.

Dear Zoran,

Do you have any extra comment about this patch?
If not, I will request pushing it at start of next week.

Sincerely,
Hoang

-----Original Message-----
From: A V Mahesh [mailto:mahesh.va...@oracle.com]
Sent: Thursday, April 13, 2017 5:47 PM
To: Vo Minh Hoang ; zoran.milinko...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net; Ramesh Babu Betham
Subject: Re: [PATCH 1 of 1] cpd: to correct failover behavior of cpsv [#1765] V5

Hi Hoang,

ACK with following : ( tested basic ND restarts)

- The below errors are not related to this patch, those are test case related

- It looks like there is an existing issue ( not related to this patch ): on
  Cpnd down the STANDBY Cpd is also starting
  `cpd_tmr_start(&node_info->cpnd_ret_timer,..);` please check that flow once
  (after cpnd restart keep some sleep in Active CPD and do a switch over )
[Hoang]: I also can reproduce this behavior but could not find error.
So I will continue checking it in separate ticket.
It is a little bit weird that standby cpd trigger something. Honestly I
think standby should do data sync only. Btw, that is too soon to talk about
this case.

- You introduced cpd_tmr_stop(&cpnd_info->cpnd_ret_timer); in
  cpnd_down_process() but cpnd_up_process() do call
  `cpd_tmr_stop(&cpnd_info->cpnd_ret_timer);`
  do check that it may be redundant call .
[Hoang]: I thought we should keep this call even it is redundant in this
case. We are detecting more and more unexpected error cases in system and
cannot tell for sure it is redundant or not.

-AVM

On 4/12/2017 2:19 PM, A V Mahesh wrote:
> Hi Hoang,
>
> On 2/10/2017 3:09 PM, Vo Minh Hoang wrote:
>> If cpnd is temporary down only, we don't need clean up anything.
>> If cpnd is permanently down, the bad effect of this proposal is that
>> replica is not clean up. But if cpnd permanently down, we have to
>> reboot node for recovering so I think this cleanup is not really
>> necessary.
>>
>> I also checked this implementation with possible test cases and have
>> not seen any side effect.
>> Please consider it
> We are observing new node_user_info databases mismatch Errors, while
> testing multiple CPND restart with this patch, I will do more debugging
> and update the root cause.
>
> ==
> =
>
>
> Apr 12 14:06:57 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500cf0, arg=0x7f86f0501ef0
> *Apr 12 14:06:58 SC-1 osafckptd[27594]: ER cpd_proc_decrease_node_user_info failed - no user on node id 0x2020F*
> Apr 12 14:06:58 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0501750, arg=0x7f86f0501ef0
> *Apr 12 14:06:59 SC-1 osafckptd[27594]: ER cpd_proc_decrease_node_user_info failed - no user on node id 0x2020F*
> Apr 12 14:06:59 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0503ab0, arg=0x7f86f0501ef0
> Apr 12 14:07:00 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500c70, arg=0x7f86f0501ef0
> Apr 12 14:07:01 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500930, arg=0x7f86f0501ef0
> *Apr 12 14:07:03 SC-1 osafckptd[27594]: ER cpd_proc_decrease_node_user_info failed - no user on node id 0x2020F*
> Apr 12 14:07:03 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f04fe3a0, arg=0x7f86f0501ef0
> Apr 12 14:07:04 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500cf0, arg=0x7f86f0501ef0
>
> ==
> =
>
>
> -AVM
>
>
> On 4/12/2017 11:08 AM, A V Mahesh wrote:
>> Hi Hoang,
>>
>> On 2/10/2017 3:09 PM, Vo Minh Hoang wrote:
>>> Dear Mahesh,
>>>
>>> Based on what I saw, in this case, retention time cannot detect CPND
>>> temporarily down because its pid changed.
>> I will check that , I have some test cases based this retention time
>> , not sure how were they working.
>>
>> Can you please provide reproducible steps, I did look at ticket , but
>> looks complex , if you have any application that reproduces the case
>> please share.
>>
>> -AVM
>>>
>>> If cpnd is temporary down only, we don't need clean up anything.
>>> If cpnd is permanently down, the bad effect of this proposal is that
>>> replica is not clean up. But if cpnd permanently down, we have to
>>> reboot node for recovering so I think this cleanup is not really
>>> necessary.
>>>
>>> I also checked this implementation with possible test cases and have
>>> not seen any side effect.
>>> Please consider it.
>>>
>>> Thank you and best regards,
>>> Hoang
>>>
>>> -----Original Message-----
>>> From: A V Mahesh [mailto:mahesh.va..