Hi,

Srikanth: Thanks for the information.

I have analyzed the situation. The two issues are the same (in one case, AMF 
application components are running on the locked payloads). The message "NO 
Pending Response sent for CLM track callback::OK '7'" appears because AMF 
responds twice for the same invocation ID. For the case in the ticket 
description this message is not observed; having applications installed on the 
locked nodes makes the difference. CLMS correctly maintains the invocation ID 
for every client per callback. So, to understand the problem, I considered a 
different case.
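
To illustrate why a second response for the same invocation ID produces a "no 
pending response" message, here is a minimal sketch in C. It is not the CLMS 
code: the table layout, MAX_PENDING, and the names pending_add and 
pending_respond are hypothetical, and the only assumption taken from the 
analysis above is that the server keeps one pending entry per client per 
callback and removes it on the first response.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 16 /* illustrative capacity, not an OpenSAF constant */

/* One outstanding track callback awaiting a response (hypothetical layout). */
typedef struct {
    uint32_t client_id;
    uint64_t invocation;
    bool in_use;
} pending_entry;

typedef struct {
    pending_entry slots[MAX_PENDING];
} pending_table;

/* Record that a callback with this invocation ID was sent to the client. */
static bool pending_add(pending_table *t, uint32_t client_id, uint64_t invocation)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i].client_id = client_id;
            t->slots[i].invocation = invocation;
            t->slots[i].in_use = true;
            return true;
        }
    }
    return false; /* table full */
}

/* Consume a response. Returns false when nothing matching is pending,
 * which is the situation a server would report as "no pending response". */
static bool pending_respond(pending_table *t, uint32_t client_id, uint64_t invocation)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_PENDING; i++) {
        pending_entry *e = &t->slots[i];
        if (e->in_use && e->client_id == client_id && e->invocation == invocation) {
            e->in_use = false; /* the first response clears the entry */
            return true;
        }
    }
    return false;
}
```

With this bookkeeping, the first pending_respond call for a given client and 
invocation ID succeeds and the second one fails, because the entry is already 
gone.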

Suppose one payload node, PL-4, is locked and an application has still not 
responded to the track callbacks, and then another payload, PL-3, is stopped 
(OpenSAF stop). The application is hosted on PL-5 and its track flags are the 
same as AMFD's: 
(SA_TRACK_CURRENT | SA_TRACK_CHANGES_ONLY | SA_TRACK_VALIDATE_STEP | 
SA_TRACK_START_STEP).
In this case, when PL-4 is locked, both AMF and the application get a track 
callback for CHANGE_START. AMF responds to the callback but the application 
does not. Now PL-3 is stopped. CLM delivers a track callback for the COMPLETED 
step, but it contains numberOfItems=2, covering both PL-3 and PL-4. The 
application receives the same callback. 
The application never responds to the PL-4 callback, so the node lock timer 
expires at CLMD, which again sends a COMPLETED callback to both AMFD and the 
application. Since both AMFD and the application have registered for 
SA_TRACK_CHANGES_ONLY, I doubt that CLM should send the callback with both PL-3 
and PL-4. In the ticket description I pointed out this problem for the 
CHANGE_START case. The CLM spec, section 3.5.2 SaClmClusterTrackCallbackT_4, 
page 51, says:

The value of the numberOfItems attribute in the structure to which the
notificationBuffer parameter points might be greater than the value of the
numberOfMembers parameter if either the SA_TRACK_CHANGES flag or the
SA_TRACK_CHANGES_ONLY flags is set, and one or more member nodes have left
the cluster membership. In this case, the structure to which the
notificationBuffer parameter points might contain information about the current
members of the cluster and also about nodes that have recently left the cluster
membership.
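
As a rough illustration of the quoted paragraph, the sketch below counts 
current members and changed entries in a notification buffer, so that the item 
count can be compared against the member count. The types track_item and 
track_buffer are simplified, hypothetical stand-ins for the spec's 
notification structures, not the real SaClm types, and the node IDs used with 
it are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the CLM notification types (hypothetical). */
typedef enum { NODE_NO_CHANGE, NODE_JOINED, NODE_LEFT } node_change;

typedef struct {
    uint32_t node_id;
    bool member;        /* true if the node is currently a cluster member */
    node_change change; /* what happened to this node */
} track_item;

typedef struct {
    size_t number_of_items;
    const track_item *items;
} track_buffer;

/* Count the entries that describe current cluster members. */
static size_t count_members(const track_buffer *buf)
{
    assert(buf != NULL);
    size_t n = 0;
    for (size_t i = 0; i < buf->number_of_items; i++) {
        if (buf->items[i].member)
            n++;
    }
    return n;
}

/* Count the entries that report an actual change. */
static size_t count_changed(const track_buffer *buf)
{
    assert(buf != NULL);
    size_t n = 0;
    for (size_t i = 0; i < buf->number_of_items; i++) {
        if (buf->items[i].change != NODE_NO_CHANGE)
            n++;
    }
    return n;
}
```

A buffer holding two unchanged members plus one node that has left has three 
items but only two members, which is the "numberOfItems greater than 
numberOfMembers" situation the spec describes.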

I am going through the ticket list and the spec for more information regarding this.
Thanks,
Praveen


Attachments:

- 
[node_lock_and_stop.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/6b54e875/538b/attachment/node_lock_and_stop.tgz)
 (382.7 kB; application/x-compressed)
- 
[two_nodes_lock.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/6b54e875/538b/attachment/two_nodes_lock.tgz)
 (335.0 kB; application/x-compressed)


---

** [tickets:#2372] amf/clm: CLM lock of two more nodes returns REPAIR_PENDING 
for first node.**

**Status:** accepted
**Milestone:** 5.0.2
**Created:** Tue Mar 14, 2017 09:29 AM UTC by Praveen
**Last Updated:** Wed Mar 15, 2017 06:27 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[osafamfd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafamfd) 
(3.4 MB; application/octet-stream)
- 
[osafclmd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafclmd) 
(860.9 kB; application/octet-stream)


Steps to reproduce:
1) Bring up a 4-node cluster.
2) Deploy the AMF demo on PL-3 and PL-4.
3) Lock the AMF nodes PL-3 and PL-4.
4) Arrange for termination of amf_demo on PL-3 to take more time than on 
PL-4.
5) From one terminal, issue a CLM lock of PL-3 first and, in no time, a CLM 
lock of PL-4.

CLM and AMF traces are attached.  
Analysis:
When AMFD gets the CLM track callback for PL-3, it starts terminating amf_demo 
on PL-3. While that termination is still in progress, AMF gets another track 
callback with rootCauseEntity set to PL-4. However, this callback also contains 
information about PL-3. AMFD starts terminating amf_demo on PL-4, but at the 
same time it responds for PL-3 using the invocation ID of the PL-4 callback. 
CLM assumes that the PL-4 CHANGE_START step has completed and sends the 
completion callback for PL-4. In this callback, AMF clears the internal flags 
that monitor the graceful removal of nodes. Since AMF never responded to the 
PL-3 callback, the callback timer expires in CLMD and it sends the COMPLETED 
callback to AMF. AMF treats this as a node failover and tries to fail over 
PL-3.

Note: In all these stages, CLM sends the track callback with information about 
all the nodes. The track flags AMF registers with are:
 
SA_TRACK_CURRENT|SA_TRACK_CHANGES_ONLY|SA_TRACK_VALIDATE_STEP|SA_TRACK_START_STEP.
I am still evaluating whether the issue is in CLM or AMF. Since AMF registers 
for **SA_TRACK_CHANGES_ONLY**, should CLM give information about all the nodes 
in all subsequent callbacks?
Also, AMF should respond to a callback only when it has completed termination 
of the components.
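
One possible way to picture the last point is sketched below in C: the 
invocation ID of each START-step callback is stored against the node named as 
its root cause, and a response is produced only when termination on that node 
has finished. This is not AMFD code. The names node_table, remember_callback 
and termination_done are hypothetical, and the sketch assumes each callback's 
invocation ID belongs to exactly one node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 8 /* illustrative capacity */

/* Invocation ID waiting on component termination for one node (hypothetical). */
typedef struct {
    uint32_t node_id;
    uint64_t invocation;
    bool awaiting;
} node_pending;

typedef struct {
    node_pending nodes[MAX_NODES];
} node_table;

/* Store the invocation ID against the callback's root-cause node. */
static bool remember_callback(node_table *t, uint32_t node_id, uint64_t invocation)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_NODES; i++) {
        if (!t->nodes[i].awaiting) {
            t->nodes[i].node_id = node_id;
            t->nodes[i].invocation = invocation;
            t->nodes[i].awaiting = true;
            return true;
        }
    }
    return false; /* table full */
}

/* Called when termination on node_id has completed. On success, writes the
 * invocation ID to respond with and clears the entry, so the same node
 * cannot be answered twice or with another node's invocation ID. */
static bool termination_done(node_table *t, uint32_t node_id, uint64_t *invocation_out)
{
    assert(t != NULL && invocation_out != NULL);
    for (size_t i = 0; i < MAX_NODES; i++) {
        node_pending *n = &t->nodes[i];
        if (n->awaiting && n->node_id == node_id) {
            *invocation_out = n->invocation;
            n->awaiting = false;
            return true;
        }
    }
    return false; /* nothing pending for this node */
}
```

In the scenario above, the PL-4 callback arriving while PL-3 is still 
terminating would add a second entry rather than trigger a response for PL-3, 
and each node would be answered with its own invocation ID once its components 
are down.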

