Re: [users] EXTERNAL: RE: IMM "Try Again" for Admin Commands - help - clarification

nagendra Sun, 08 Jul 2018 07:49:07 -0700

Hi Jim,
 
>>Can you clarify your suggestion
Admin operations on a node group are an extension feature on top of the Amf
specifications. This feature was implemented in the OpenSAF 4.6 release, so since
you are using OpenSAF 5.2.0, you have this feature in your deployed
systems. It was implemented to cater to cloud scale-in/scale-out scenarios,
where it is desirable to shut down/start multiple nodes in one shot based
on fluctuations in resource demand.


Please download "OpenSAF_AMF_PR.odt" from
https://sourceforge.net/p/opensaf/documentation/ci/default/tree/
and refer to Section 2.2.10 of the AMF Programmer's Reference document, as pointed out by Gary.

How to perform admin operations on a node group?
Steps to create a node group and perform admin operations on it:
Step #1: Create a node group object (mygroup) containing two nodes, PL-3 and
PL-4, with the following command (in your case there will be 4 nodes: PL-3, PL-4,
PL-5, PL-6):
immcfg -c SaAmfNodeGroup \
    -a saAmfNGNodeList="safAmfNode=PL-3,safAmfCluster=myAmfCluster" \
    -a saAmfNGNodeList="safAmfNode=PL-4,safAmfCluster=myAmfCluster" \
    safAmfNodeGroup=mygroup,safAmfCluster=myAmfCluster
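For your four payload nodes, the Step #1 command can also be generated by a small script. The sketch below is a hypothetical POSIX shell helper (`create_node_group` is an illustrative name, not an OpenSAF tool); it assumes the `safAmfCluster=myAmfCluster` cluster DN used above and simply runs the same `immcfg -c SaAmfNodeGroup` call, with one `-a saAmfNGNodeList=...` pair per node.

```shell
#!/bin/sh
# Cluster DN taken from the example above (assumption: adjust to your setup).
CLUSTER="safAmfCluster=myAmfCluster"

# Hypothetical helper: create_node_group <group-name> <node-name>...
# Runs the same immcfg command as Step #1, with one
# "-a saAmfNGNodeList=<node DN>" pair per node name.
create_node_group() {
    group="$1"
    shift
    # The for-list is expanded once, before the loop starts, so "$@" can be
    # rebuilt inside the loop: append this node's two arguments, then drop
    # the node name from the front.
    for node in "$@"; do
        set -- "$@" -a "saAmfNGNodeList=safAmfNode=${node},${CLUSTER}"
        shift
    done
    immcfg -c SaAmfNodeGroup "$@" "safAmfNodeGroup=${group},${CLUSTER}"
}
```

For example, `create_node_group mygroup PL-3 PL-4 PL-5 PL-6` would create the group for PL-3 to PL-6.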

The node group object creation and its contents can be validated by the 
following command:
immlist safAmfNodeGroup=mygroup,safAmfCluster=myAmfCluster

Name                                               Type         Value(s)
========================================================================
safAmfNodeGroup                                    SA_STRING_T  safAmfNodeGroup=mygroup
saAmfNGNodeList                                    SA_NAME_T    safAmfNode=PL-3,safAmfCluster=myAmfCluster (42) safAmfNode=PL-4,safAmfCluster=myAmfCluster (42)
saAmfNGAdminState                                  SA_UINT32_T  1 (0x1)
SaImmAttrImplementerName                           SA_STRING_T  safAmfService
SaImmAttrClassName                                 SA_STRING_T  SaAmfNodeGroup
SaImmAttrAdminOwnerName                            SA_STRING_T  <Empty>


Step #2: Perform the admin operation. Perform a lock operation with the following
command:
amf-adm lock safAmfNodeGroup=mygroup,safAmfCluster=myAmfCluster

The operation's success can be validated by checking saAmfNGAdminState; it
should be in the locked (2) state:
immlist safAmfNodeGroup=mygroup,safAmfCluster=myAmfCluster

Name                                               Type         Value(s)
========================================================================
safAmfNodeGroup                                    SA_STRING_T  safAmfNodeGroup=mygroup
saAmfNGNodeList                                    SA_NAME_T    safAmfNode=PL-3,safAmfCluster=myAmfCluster (42) safAmfNode=PL-4,safAmfCluster=myAmfCluster (42)
saAmfNGAdminState                                  SA_UINT32_T  2 (0x2)
SaImmAttrImplementerName                           SA_STRING_T  safAmfService
SaImmAttrClassName                                 SA_STRING_T  SaAmfNodeGroup
SaImmAttrAdminOwnerName                            SA_STRING_T  <Empty>
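The same check can be scripted. This is a minimal sketch, assuming the default `immlist` table layout shown above (attribute name in the first column, numeric value in the third) and the AMF administrative state values 1 = UNLOCKED, 2 = LOCKED, 3 = LOCKED_INSTANTIATION, 4 = SHUTTING_DOWN; both helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric saAmfNGAdminState of a node group,
# parsed from the default immlist output (Name / Type / Value columns).
ng_admin_state() {
    immlist "$1" | awk '$1 == "saAmfNGAdminState" { print $3 }'
}

# Map a numeric AMF administrative state to its name.
admin_state_name() {
    case "$1" in
        1) echo UNLOCKED ;;
        2) echo LOCKED ;;
        3) echo LOCKED_INSTANTIATION ;;
        4) echo SHUTTING_DOWN ;;
        *) echo "UNKNOWN($1)" ;;
    esac
}
```

For example, `admin_state_name "$(ng_admin_state safAmfNodeGroup=mygroup,safAmfCluster=myAmfCluster)"` should print LOCKED after Step #2.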


The lock-in and other admin operations can then be performed as below:
amf-adm lock-in safAmfNodeGroup=mygroup,safAmfCluster=myAmfCluster
amf-adm unlock-in safAmfNodeGroup=mygroup,safAmfCluster=myAmfCluster
amf-adm unlock safAmfNodeGroup=mygroup,safAmfCluster=myAmfCluster
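To run the whole sequence and confirm each transition, a hypothetical helper such as `ng_op_expect` can issue one operation and then compare saAmfNGAdminState against the expected value. This sketch assumes that `amf-adm` exits non-zero on failure and returns only once the operation has completed, and it reuses the immlist table layout shown above; the expected values (lock 2, lock-in 3, unlock-in 2, unlock 1) follow from the AMF administrative states.

```shell
#!/bin/sh
# Command used to issue admin operations; overridable (e.g. for testing).
AMF_ADM=${AMF_ADM:-amf-adm}

# Hypothetical helper: ng_op_expect <operation> <expected-state> <node-group DN>
# Runs one amf-adm operation, then checks saAmfNGAdminState (parsed from the
# immlist table) against the expected numeric state.
ng_op_expect() {
    op="$1"; expected="$2"; dn="$3"
    "$AMF_ADM" "$op" "$dn" || return 1
    state=$(immlist "$dn" | awk '$1 == "saAmfNGAdminState" { print $3 }')
    if [ "$state" != "$expected" ]; then
        echo "$op: expected state $expected, got ${state:-none}" >&2
        return 1
    fi
}

# Example sequence for the group created in Step #1:
#   NG=safAmfNodeGroup=mygroup,safAmfCluster=myAmfCluster
#   ng_op_expect lock 2 "$NG" && ng_op_expect lock-in 3 "$NG" &&
#   ng_op_expect unlock-in 2 "$NG" && ng_op_expect unlock 1 "$NG"
```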

Step #3: Delete the node group once the admin operations are complete (or keep it
for further admin operations):
immcfg -d safAmfNodeGroup=mygroup,safAmfCluster=myAmfCluster

Note: The admin state of the nodes in the node group follows the admin state of
the node group. After Step #1 the node group is unlocked (saAmfNGAdminState is
1 in the output above). After Step #2 the node group and its nodes are in the
locked state, and after the lock-in command they are in the locked-in state.

Admin operations can also be performed on an individual node to change the admin
state of that node only, as below:
amf-adm unlock-in safAmfNode=PL-3,safAmfCluster=myAmfCluster
amf-adm unlock safAmfNode=PL-3,safAmfCluster=myAmfCluster
amf-adm unlock-in safAmfNode=PL-4,safAmfCluster=myAmfCluster
amf-adm unlock safAmfNode=PL-4,safAmfCluster=myAmfCluster
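For more than two nodes, the per-node commands above can be looped. A minimal sketch, assuming the same cluster DN and that `amf-adm` exits non-zero on failure; `unlock_nodes` is a hypothetical name and `AMF_ADM` is only there so the command can be swapped out.

```shell
#!/bin/sh
CLUSTER="safAmfCluster=myAmfCluster"
# Command used to issue admin operations; overridable (e.g. for testing).
AMF_ADM=${AMF_ADM:-amf-adm}

# Hypothetical helper: unlock_nodes <node-name>...
# Brings each node back into service one at a time with the same two
# commands shown above (unlock-in, then unlock). Stops at the first
# failing command and reports which node it was.
unlock_nodes() {
    for node in "$@"; do
        dn="safAmfNode=${node},${CLUSTER}"
        "$AMF_ADM" unlock-in "$dn" && "$AMF_ADM" unlock "$dn" || {
            echo "failed to unlock $dn" >&2
            return 1
        }
    done
}
```

For example, `unlock_nodes PL-3 PL-4 PL-5 PL-6` covers all four of your payload nodes.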

>>Can you clarify the OpenSAF behavior for the following scenario
Based on the information provided, I think each of nodes B, C, D and E hosts at
least one SU of a common SG.
I have tried to explain the 'TRY_AGAIN return scenario' with an animation in the
attached PPT.

Hope this helps. Let me know if you have any follow up questions.
 
Thanks,
Nagendra, 91-9866424860
www.hasolutions.in
https://www.linkedin.com/company/hasolutions/
High Availability Solutions Pvt. Ltd.
- We provide OpenSAF support and services
 
 
 
 
--------- Original Message --------- Subject: RE: EXTERNAL: RE: [users] IMM 
"Try Again" for Admin Commands - help - clarification
From: "Carroll, James R" <[email protected]>
Date: 7/7/18 12:31 am
To: "[email protected]" <[email protected]>, 
"[email protected]" <[email protected]>

  Hi Nagendra,
  
 Thank you so much for your informative response.  I do have some follow up 
questions, however.
  
  Can you clarify your suggestion: “You can also think about performing node
group operation”. According to the AMF specification B.04.01, Section 8.7, “No
administrative operations are defined for a node group”. I am not sure how
this can be used to resolve sending admin commands to the individual nodes. Can
you clarify the OpenSAF behavior for the following scenario: Controller Node A
needs to send the Admin Command “Shutdown” to the Payload Nodes in the following
order:

   i.   Payload Node B - Admin Shutdown
   ii.  Payload Node C - Admin Shutdown
   iii. Payload Node D - Admin Shutdown
   iv.  Payload Node E - Admin Shutdown

  Our EXPECTATION of the above scenario: sending an ADMIN command to Node B is
independent of Nodes C, D and E. Therefore, all 4 commands should be executed
by OpenSAF in parallel, with no dependency between nodes, and none of the
Shutdown commands should be answered with TRY_AGAIN.

  Our OBSERVATION appears to show the following: OpenSAF issues the Admin
Command to Node B, and the commands to Nodes C, D and E do not execute until
Node B has completed. In other words, the commands appear to be sequentially
dependent. Nodes C, D and E get TRY_AGAIN. Once Node B is done, Node C begins
shutting down, and Nodes D and E get TRY_AGAIN. And so on for the remaining
nodes.



  
 Thanks.
  
 Jim
  
  
  
 From: [email protected] <[email protected]> 
 Sent: Friday, July 06, 2018 1:25 AM
 To: Carroll, James R (US) <[email protected]>; 
[email protected]
 Subject: EXTERNAL: RE: [users] IMM "Try Again" for Admin Commands - help
 
  Hi Jim,
 
 The following are the most probable reasons for getting TRY_AGAIN for node
admin operations (node lock/shutdown). I assume the components/applications are
SA-Aware (if not, the equivalent actions from Amf can be correlated).
 
 For node lock/shutdown admin operations:
 - The components/applications receiving quiescing/quiesced/removed callbacks are
taking time to respond to Amf.
 - The components/applications receiving Active callbacks at another node 
(because of lock issued on the current node and there was Standby Service unit 
at another node) are taking time to respond to Amf.
 
 Until the components/applications respond to the Amf callbacks, Amf will
return TRY_AGAIN for further admin operations on the node.
 
 This is expected behavior: until one admin operation on the entities has
completed successfully, another admin operation on them can't be accepted
(so the admin operations get TRY_AGAIN).
 
 Suggestion:
 Step 1: Debug the applications' response times.
 Step 2: If an application takes time to respond for genuine reasons,
then you can have a script perform the admin operations; the script should
handle TRY_AGAIN.
 Step 3: You can also think about performing node group operation.
 
 Please find some point-to-point responses:
 >>need to understand why the IMM is busy.
 In my understanding, Imm is not busy; rather, Amf is not getting callback
responses from the applications, so Amf returns TRY_AGAIN to Imm, which in
turn returns TRY_AGAIN to the application issuing the admin operation.
 >>how long to wait until the operations can be performed.
 Until all the callbacks have been responded to, the admin operations will
return TRY_AGAIN.
 >>Is this a known and documented issue?
 The specifications define that a service returns TRY_AGAIN if the
operation can't be accepted at that time.
 >>Is it possible that this issue has been addressed in a later release that we 
 >>can capture?
 This behavior of returning TRY_AGAIN is the same in all the releases.
 >>Are there any accepted practices or guidelines on how to deal with this 
 >>condition?
 As suggested in Step 2, you can sleep for a few milliseconds (or microseconds)
after getting TRY_AGAIN and then invoke the admin operation again from your
script or from the applications issuing admin operations.
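A minimal sketch of such a retry loop in POSIX shell. It assumes that a rejected operation makes `amf-adm` exit non-zero with `SA_AIS_ERR_TRY_AGAIN` somewhere in its output, which you should confirm on your release; `admin_op_retry`, `MAX_TRIES` and `DELAY` are hypothetical names with illustrative defaults. Only TRY_AGAIN is retried, so other errors surface immediately instead of being masked by the loop.

```shell
#!/bin/sh
# Command used to issue admin operations; overridable (e.g. for testing).
AMF_ADM=${AMF_ADM:-amf-adm}

# Hypothetical retry wrapper: admin_op_retry <operation> <DN>
# Retries while the command fails with output mentioning TRY_AGAIN
# (assumed to appear as SA_AIS_ERR_TRY_AGAIN); any other failure is
# returned immediately. MAX_TRIES and DELAY (seconds) are illustrative.
admin_op_retry() {
    max_tries=${MAX_TRIES:-30}
    delay=${DELAY:-1}
    try=1
    while :; do
        # Exit status of the assignment is the command's exit status.
        out=$("$AMF_ADM" "$@" 2>&1) && return 0
        case "$out" in
            *TRY_AGAIN*) ;;   # Amf still busy with the previous operation
            *) printf '%s\n' "$out" >&2; return 1 ;;
        esac
        if [ "$try" -ge "$max_tries" ]; then
            echo "still TRY_AGAIN after $try attempts: $*" >&2
            return 1
        fi
        try=$((try + 1))
        sleep "$delay"
    done
}
```

For example, with `DELAY=0.2` set, `admin_op_retry shutdown safAmfNode=PL-3,safAmfCluster=myAmfCluster` polls five times per second; a fractional delay needs a `sleep` that accepts one, such as GNU sleep. A bounded `MAX_TRIES` keeps a script from hanging if an application never answers its callbacks.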
 
 Hope that helps.
 
 

 
Thanks,
 
Nagendra, 91-9866424860
 
www.hasolutions.in
 
High Availability Solutions Pvt. Ltd.
 
 - High Availability Solutions Provider.
 --------- Original Message ---------
  Subject: [users] IMM "Try Again" for Admin Commands - help
 From: "Carroll, James R" <[email protected]>
 Date: 7/5/18 9:40 pm
 To: "[email protected]" <[email protected]>
 
 All,
 
 We are using OpenSAF 5.2.0 and are experiencing issues with Admin commands to
perform NODE operations. We are getting multiple TRY_AGAIN responses, and
need to understand why the IMM is busy, and how long to wait until the
operations can be performed.
 
 Some background for the Admin commands being performed: we have a single
controller node and 4 payload nodes. In our current configuration, the OpenSAF
controller node houses only OpenSAF daemons; there are no user-developed
applications running on the controller node. In addition, all 4 payload
nodes are up and running, essentially idle, with minimal load. We issue an ADMIN
command to shut down each of the Payload nodes (the controller node is
unaffected). Each of the admin commands responds with TRY_AGAIN. We then
have to wait for arbitrary periods and try again until the IMM accepts the
command, for each node. In our view, these are near-perfect
conditions for OpenSAF: the controller has its own node, and the system is
fully idle. Yet we keep re-issuing the ADMIN command and getting a busy
(try again) response. Eventually each command is accepted (one for each
payload node), and then we can issue the Lock Instantiation. Note: we have
also tried scenarios using the LOCK and LOCK_Instantiate sequence instead of
SHUTDOWN, and see similar behavior.
 
 Is this a known and documented issue? Is it possible that this issue has been 
addressed in a later release that we can capture?
 Are there any accepted practices or guidelines on how to deal with this 
condition?
 
 Thank you.
 
 Jim
 
 ------------------------------------------------------------------------------
 Check out the vibrant tech community on one of the world's most
 engaging tech sites, Slashdot.org! http://sdm.link/slashdot
 _______________________________________________
 Opensaf-users mailing list
 [email protected]
 https://lists.sourceforge.net/lists/listinfo/opensaf-users