Re: [devel] [PATCH 0/2] Review Request for fmd: improve failover response time [#3008]

2019-02-18 Thread minh . chau
Hi Gary,

Ack (code review only). A few other places that call
opensaf_quick_reboot remain; they can be revisited later.

Thanks,
Minh
> Summary: fmd: improve failover response time V2 [#3008]
> Review request for Ticket(s): 3008
> Peer Reviewer(s): Hans, Minh
> Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
> Affected branch(es): develop
> Development branch: ticket-3008
> Base revision: 5766361568498f8a496d87d8daafe9bffbd75ed9
> Personal repository: git://git.code.sf.net/u/userid-2226215/review
>
> 
> Impacted area           Impact y/n
>
>  Docs                    n
>  Build system            n
>  RPM/packaging           n
>  Configuration files     n
>  Startup scripts         n
>  SAF services            y
>  OpenSAF services        n
>  Core libraries          n
>  Samples                 n
>  Tests                   n
>  Other                   n
>
>
> Comments (indicate scope for each "y" above):
> -
>
> revision 8ccffc2cd9cd117578227e9cd49421e5c578fec6
> Author:   Gary Lee 
> Date: Tue, 19 Feb 2019 14:57:53 +1100
>
> rded: do not send SUCCESS to main thread [#3008]
>
> do not send RDE_MSG_ACTIVE_PROMOTION_SUCCESS to
> main thread if lock cannot be obtained
>
>
>
> revision 28e17d107f4a079155e03d9f875a3c0262ea19f5
> Author:   Gary Lee 
> Date: Tue, 19 Feb 2019 14:57:53 +1100
>
> fmd: improve failover response time [#3008]
>
> Improve failover response time if split brain prevention is enabled
> but FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is set to 0.
>
> Also, return immediately if node promotion fails to avoid
> sending active role to RDA.
>
>
>
> Complete diffstat:
> --
>  src/fm/fmd/fm_rda.cc | 14 +++++++++-----
>  src/rde/rded/role.cc |  2 ++
>  2 files changed, 11 insertions(+), 5 deletions(-)
>
>
> Testing Commands:
> -
> *** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***
>
>
> Testing, Expected Results:
> --
> *** PASTE COMMAND OUTPUTS / TEST RESULTS ***
>
>
> Conditions of Submission:
> -
> *** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***
>
>
> Arch        Built   Started   Linux distro
> ---
> mips        n       n
> mips64      n       n
> x86         n       n
> x86_64      y       y
> powerpc     n       n
> powerpc64   n       n
>
>
> Reviewer Checklist:
> ---
> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomplete; it has too many blank
> entries that need proper data filled in.
>
> ___ You have failed to nominate the proper persons for review and push.
>
> ___ Your patches do not have proper short+long header
>
> ___ You have grammar/spelling in your header that is unacceptable.
>
> ___ You have exceeded a sensible line length in your
> headers/comments/text.
>
> ___ You have failed to put in a proper Trac Ticket # into your commits.
>
> ___ You have incorrectly put/left internal data in your comments/files
> (i.e. internal bug tracking tool IDs, product names etc)
>
> ___ You have not given any evidence of testing beyond basic build tests.
> Demonstrate some level of runtime or other sanity testing.
>
> ___ You have ^M present in some of your files. These have to be removed.
>
> ___ You have needlessly changed whitespace or added whitespace crimes
> like trailing spaces, or spaces before tabs.
>
> ___ You have mixed real technical changes with whitespace and other
> cosmetic code cleanup changes. These have to be separate commits.
>
> ___ You need to refactor your submission into logical chunks; there is
> too much content into a single commit.
>
> ___ You have extraneous garbage in your review (merge commits etc)
>
> ___ You have giant attachments which should never have been sent;
> Instead you should place your content in a public tree to be pulled.
>
> ___ You have too many commits attached to an e-mail; resend as threaded
> commits, or place in a public tree for a pull.
>
> ___ You have resent this content multiple times without a clear indication
> of what has changed between each re-send.
>
> ___ You have failed to adequately and individually address all of the
> comments and change requests that were proposed in the initial review.
>
> ___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email
> etc)
>
> ___ Your computer has a badly configured date and time, confusing
> the threaded patch review.
>
> ___ Your changes affect IPC mechanism, and you don't present any results
> for in-service upgradability test.
>
> ___ Your changes affect user manual and documentation, your patch series
> do not contain the patch that updates the Doxygen manual.
>
>





[devel] SA_AIS_ERR_NOT_EXIST error when calling the function saAmfComponentNameGet

2019-02-18 Thread shiva

Hello all,

I have an AMF application that runs as the High Availability Framework
(HAFW) on a 2N redundant system, managing a ConfD active/standby
configuration. When I call
saAmfComponentNameGet(confd_ha_cb.info.amfHandle,
&confd_ha_cb.info.compName), it fails with SA_AIS_ERR_NOT_EXIST (12).
What am I doing wrong, and why am I getting this error? I have attached
the AppConfig file and my application files for reference.


Thanks in advance,

Regards.
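
One thing worth checking before looking at the configuration, offered as
a guess rather than a diagnosis: AMF hands the component name to the
processes it instantiates through the SA_AMF_COMPONENT_NAME environment
variable. If the binary is started by hand instead of through the
instantiate script, that variable is normally absent and
saAmfComponentNameGet has no name to return. The helper names below are
hypothetical and only read the environment; they do not call any AMF API.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical diagnostic helper, not part of OpenSAF.
// Returns the component name exported by AMF to this process, or an
// empty string when SA_AMF_COMPONENT_NAME is not set.
inline std::string AmfComponentNameFromEnv() {
  const char* name = std::getenv("SA_AMF_COMPONENT_NAME");
  return name != nullptr ? std::string(name) : std::string();
}

// True when the process appears to have been instantiated by AMF
// (assumption: a non-empty SA_AMF_COMPONENT_NAME implies that).
inline bool StartedByAmf() {
  return !AmfComponentNameFromEnv().empty();
}
```

Logging the result of StartedByAmf() at startup, before saAmfInitialize,
shows quickly whether the process was launched by AMF or from a shell.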




http://www.saforum.org/IMMSchema; xsi:noNamespaceSchemaLocation="SAI-AIS-IMM-XSD-A.01.01.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;>
	
		safAppType=CONFD_HA
	
	
		safSgType=CONFD_HA
	
	
		safSuType=CONFD_HA
	
	
		safCompType=CONFD_HA
	
	
	safSvcType=CONFD_HA
	
	
	safCSType=CONFD_HA
	
	
	safVersion=1,safSvcType=CONFD_HA
	
	
		safVersion=1,safAppType=CONFD_HA
		
			saAmfApptSGTypes
			safVersion=1,safSgType=CONFD_HA
		
	
	
		safVersion=1,safSgType=CONFD_HA
		
			saAmfSgtRedundancyModel
			1
		
		
			saAmfSgtValidSuTypes
			safVersion=1,safSuType=CONFD_HA
		
		
			saAmfSgtDefAutoAdjustProb
			100
		
		
			saAmfSgtDefCompRestartProb
			40
		
		
			saAmfSgtDefCompRestartMax
			10
		
		
			saAmfSgtDefSuRestartProb
			40
		
		
			saAmfSgtDefSuRestartMax
			10
		
	
	
		safVersion=1,safSuType=CONFD_HA
		
			saAmfSutIsExternal
			0
		
		
			saAmfSutDefSUFailover
			1
		
		
			saAmfSutProvidesSvcTypes
			safVersion=1,safSvcType=CONFD_HA
		
	
	
		safVersion=1,safCompType=CONFD_HA
		
			saAmfCtCompCategory
			1
		
		
			saAmfCtSwBundle
			safSmfBundle=CONFD_HA
		
		
			saAmfCtDefClcCliTimeout
			600
		
		
			saAmfCtDefCallbackTimeout
			100
		
		
			saAmfCtRelPathInstantiateCmd
			confd_ha_inst.sh
		
		
			saAmfCtDefInstantiateCmdArgv
			
		
		
			saAmfCtRelPathCleanupCmd
			confd_ha_cleanup.sh
		
		
			saAmfCtDefCleanupCmdArgv
			
		
		
			saAmfCtDefQuiescingCompleteTimeout
			500
		
		
			saAmfCtDefRecoveryOnError
			2
		
		
			saAmfCtDefDisableRestart
			0
		
		
			saAmfCtDefCmdEnv
			AMF_DEMO_VAR1=CT_VALUE1
			AMF_DEMO_VAR2=CT_VALUE2
		
	
	
		safVersion=1,safCSType=CONFD_HA
	
	
		safMemberCompType=safVersion=1\,safCompType=CONFD_HA,safVersion=1,safSuType=CONFD_HA
	
	
	  safMemberCSType=safVersion=1\,safCSType=CONFD_HA,safVersion=1,safSvcType=CONFD_HA
	
	
		safSupportedCsType=safVersion=1\,safCSType=CONFD_HA,safVersion=1,safCompType=CONFD_HA
		
			saAmfCtCompCapability
			1
		
	
	
		safHealthcheckKey=HC_CONFD_HA,safVersion=1,safCompType=CONFD_HA
		
			saAmfHctDefPeriod
			150
		
		
			saAmfHctDefMaxDuration
			600
		
	

	
		safApp=CONFD_HA
		
			saAmfAppType
			safVersion=1,safAppType=CONFD_HA
		
	
	
		safSg=SG_CONFD_HA,safApp=CONFD_HA
		
			saAmfSGType
			safVersion=1,safSgType=CONFD_HA
		
		
			saAmfSGSuHostNodeGroup
			safAmfNodeGroup=SCs,safAmfCluster=myAmfCluster
		
		
			saAmfSGAutoRepair
			0
		
		
			saAmfSGAutoAdjust
			0
		
		
			saAmfSGNumPrefInserviceSUs
			10
		
		
			saAmfSGNumPrefAssignedSUs
			10
		
	
	
		safSi=Si_CONFD_HA,safApp=CONFD_HA
		
			saAmfSvcType
			safVersion=1,safSvcType=CONFD_HA
		
		
			saAmfSIProtectedbySG
			safSg=SG_CONFD_HA,safApp=CONFD_HA
		 
	
	
	safCsi=Csi_CONFD_HA,safSi=Si_CONFD_HA,safApp=CONFD_HA
	
		saAmfCSType
		safVersion=1,safCSType=CONFD_HA
	
	
	
			safSmfBundle=CONFD_HA
	
	
		safInstalledSwBundle=safSmfBundle=CONFD_HA,safAmfNode=SC-1,safAmfCluster=myAmfCluster
		
			saAmfNodeSwBundlePathPrefix
			/etc/opt/opensaf
		
	
	
		safSu=SuT_CONFD_HA_1,safSg=SG_CONFD_HA,safApp=CONFD_HA
		
			saAmfSUType
			safVersion=1,safSuType=CONFD_HA
		
		
			saAmfSURank
			1
		
		
			saAmfSUAdminState
			3
		
		
			saAmfSUHostNodeOrNodeGroup
			safAmfNode=SC-1,safAmfCluster=myAmfCluster

	
	
safComp=CompT_CONFD_HA,safSu=SuT_CONFD_HA_1,safSg=SG_CONFD_HA,safApp=CONFD_HA
	
		saAmfCompType
		safVersion=1,safCompType=CONFD_HA
	
	
	
	safSupportedCsType=safVersion=1\,safCSType=CONFD_HA,safComp=CompT_CONFD_HA,safSu=SuT_CONFD_HA_1,safSg=SG_CONFD_HA,safApp=CONFD_HA
	
	
		safInstalledSwBundle=safSmfBundle=CONFD_HA,safAmfNode=SC-2,safAmfCluster=myAmfCluster
		
			saAmfNodeSwBundlePathPrefix
			/etc/opt/opensaf
		
	
	
		safSu=SuT_CONFD_HA_2,safSg=SG_CONFD_HA,safApp=CONFD_HA
		

			saAmfSUType
			safVersion=1,safSuType=CONFD_HA
		
		
			saAmfSURank
			2
		
		
			saAmfSUAdminState
			3
		

saAmfSUHostNodeOrNodeGroup
safAmfNode=SC-2,safAmfCluster=myAmfCluster

	
	
	safComp=CompT_CONFD_HA,safSu=SuT_CONFD_HA_2,safSg=SG_CONFD_HA,safApp=CONFD_HA
	
		saAmfCompType
		safVersion=1,safCompType=CONFD_HA
	
	
	
	safSupportedCsType=safVersion=1\,safCSType=CONFD_HA,safComp=CompT_CONFD_HA,safSu=SuT_CONFD_HA_2,safSg=SG_CONFD_HA,safApp=CONFD_HA
	

/*
Integration glue between ConfD and OpenSAF AMF

Copyright (C) 2009 

[devel] [PATCH 0/2] Review Request for fmd: improve failover response time [#3008]

2019-02-18 Thread Gary Lee
Summary: fmd: improve failover response time V2 [#3008]
Review request for Ticket(s): 3008
Peer Reviewer(s): Hans, Minh
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3008
Base revision: 5766361568498f8a496d87d8daafe9bffbd75ed9
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 8ccffc2cd9cd117578227e9cd49421e5c578fec6
Author: Gary Lee 
Date:   Tue, 19 Feb 2019 14:57:53 +1100

rded: do not send SUCCESS to main thread [#3008]

do not send RDE_MSG_ACTIVE_PROMOTION_SUCCESS to
main thread if lock cannot be obtained



revision 28e17d107f4a079155e03d9f875a3c0262ea19f5
Author: Gary Lee 
Date:   Tue, 19 Feb 2019 14:57:53 +1100

fmd: improve failover response time [#3008]

Improve failover response time if split brain prevention is enabled
but FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is set to 0.

Also, return immediately if node promotion fails to avoid
sending active role to RDA.



Complete diffstat:
--
 src/fm/fmd/fm_rda.cc | 14 +++++++++-----
 src/rde/rded/role.cc |  2 ++
 2 files changed, 11 insertions(+), 5 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built   Started   Linux distro
---
mips        n       n
mips64      n       n
x86         n       n
x86_64      y       y
powerpc     n       n
powerpc64   n       n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 2/2] rded: do not send SUCCESS to main thread [#3008]

2019-02-18 Thread Gary Lee
do not send RDE_MSG_ACTIVE_PROMOTION_SUCCESS to
main thread if lock cannot be obtained
---
 src/rde/rded/role.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
index 06e93c6..3effc25 100644
--- a/src/rde/rded/role.cc
+++ b/src/rde/rded/role.cc
@@ -114,6 +114,7 @@ void Role::PromoteNode(const uint64_t cluster_size,
 LOG_ER("Unable to set active controller in consensus service");
 opensaf_quick_reboot("Unable to set active controller "
 "in consensus service");
+return;
   }
 
   RDE_CONTROL_BLOCK* cb = rde_get_control_block();
@@ -135,6 +136,7 @@ void Role::PromoteNode(const uint64_t cluster_size,
 LOG_ER("Unable to set active controller in consensus service");
 opensaf_quick_reboot("Unable to set active controller in "
 "consensus service");
+return;
   }
   std::this_thread::sleep_for(std::chrono::seconds(1));
 }
-- 
2.7.4
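
The intent of the two added return statements can be sketched in
isolation. This is a simplified model, not the rded code: LockResult,
MainThreadMsg, PromoteNodeSketch and the callbacks are hypothetical
stand-ins, and the sketch assumes the reboot request may return to its
caller before the node actually goes down.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-ins; these are not the real rded types.
enum class LockResult { kAcquired, kFailed };
enum class MainThreadMsg { kActivePromotionSuccess };

// Tries to take the active-controller lock and notifies the main thread
// only when the lock was really obtained.
inline void PromoteNodeSketch(const std::function<LockResult()>& take_lock,
                              const std::function<void()>& request_reboot,
                              std::vector<MainThreadMsg>* mailbox) {
  if (take_lock() != LockResult::kAcquired) {
    request_reboot();  // assumed to possibly return to the caller
    return;            // without this, the success message would be sent
  }
  mailbox->push_back(MainThreadMsg::kActivePromotionSuccess);
}
```

With the early return in place, a failed lock attempt leaves the mailbox
empty, so the main thread never sees a promotion success it should not
act on.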





[devel] [PATCH 1/2] fmd: improve failover response time [#3008]

2019-02-18 Thread Gary Lee
Improve failover response time if split brain prevention is enabled
but FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is set to 0.

Also, return immediately if node promotion fails to avoid
sending active role to RDA.
---
 src/fm/fmd/fm_rda.cc | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/src/fm/fmd/fm_rda.cc b/src/fm/fmd/fm_rda.cc
index 504757c..d3063ba 100644
--- a/src/fm/fmd/fm_rda.cc
+++ b/src/fm/fmd/fm_rda.cc
@@ -88,17 +88,20 @@ uint32_t fm_rda_set_role(FM_CB *fm_cb, PCS_RDA_ROLE role) {
 
   Consensus consensus_service;
   if (consensus_service.IsEnabled() == true) {
-// Allow topology events to be processed first. The MDS thread may
-// be processing MDS down events and updating cluster_size concurrently.
-// We need cluster_size to be as accurate as possible, without waiting
-// too long for node down events.
-std::this_thread::sleep_for(std::chrono::seconds(4));
+if (consensus_service.PrioritisePartitionSize() == true) {
+  // Allow topology events to be processed first. The MDS thread may
+  // be processing MDS down events and updating cluster_size concurrently.
+  // We need cluster_size to be as accurate as possible, without waiting
+  // too long for node down events.
+  std::this_thread::sleep_for(std::chrono::seconds(4));
+}
 
 rc = consensus_service.PromoteThisNode(true, fm_cb->cluster_size);
 if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) {
   LOG_ER("Unable to set active controller in consensus service");
   opensaf_quick_reboot("Unable to set active controller "
   "in consensus service");
+  return NCSCC_RC_FAILURE;
 } else if (rc == SA_AIS_ERR_EXIST) {
   // @todo if we don't reboot, we don't seem to recover from this. Can we
   // improve?
@@ -107,6 +110,7 @@ uint32_t fm_rda_set_role(FM_CB *fm_cb, PCS_RDA_ROLE role) {
   "cluster?");
   opensaf_quick_reboot("A controller is already active. We were separated "
"from the cluster?");
+  return NCSCC_RC_FAILURE;
 }
   }
 
-- 
2.7.4
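
The decision logic this patch introduces can be modelled on its own. The
sketch below is a simplified model, not the fmd code: ConsensusConfig,
PromoteResult, TopologySettleDelay and MayForwardActiveRole are
hypothetical names, and the 4-second value is taken from the diff above.

```cpp
#include <chrono>

// Hypothetical stand-ins; these are not the real fmd types.
struct ConsensusConfig {
  bool enabled;                    // split brain prevention in use
  bool prioritise_partition_size;  // FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE != 0
};
enum class PromoteResult { kOk, kExist, kError };

// How long to wait for topology events before promoting this node.
// The wait only matters when cluster_size influences the decision.
inline std::chrono::seconds TopologySettleDelay(const ConsensusConfig& cfg) {
  if (cfg.enabled && cfg.prioritise_partition_size) {
    return std::chrono::seconds(4);
  }
  return std::chrono::seconds(0);
}

// True only when the active role may be passed on to RDA.
inline bool MayForwardActiveRole(const ConsensusConfig& cfg,
                                 PromoteResult rc) {
  if (!cfg.enabled) return true;    // consensus service not consulted
  return rc == PromoteResult::kOk;  // kExist and kError both stop here
}
```

The second function mirrors the added return statements: once promotion
has failed, the caller gets a failure instead of falling through to the
code that sets the active role.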


