Hi,
Srikanth: Thanks for the information.
I have analyzed the situation. The two issues are the same; the difference is that in
one case AMF application components are running on the locked payloads. The message
"NO Pending Response sent for CLM track callback::OK '7'" appears because AMF responds
twice for the same invocation id. For the case mentioned in the ticket description, this
message is not observed, because the applications installed on the locked nodes make
the difference. CLMS properly maintains the invocation id for all clients per
callback. So, to understand the problem, I considered a different case.
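To make the double-response case concrete, here is a minimal C sketch of per-client bookkeeping for pending track-callback responses. It is not CLMD's actual code: the types and functions (`client_pending_t`, `pending_add`, `pending_consume`) are hypothetical and assume only what is stated above, namely that each callback carries an invocation id and a response is matched against it. A second response with the same invocation finds nothing pending, which is the condition the "NO Pending Response" log reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-client table of track callbacks still awaiting a
 * response. Names and sizes are illustrative, not taken from CLMD. */
#define MAX_PENDING 8

typedef struct {
    uint64_t invocation; /* invocation id handed out with the callback */
    uint32_t node_id;    /* node whose change the callback was about */
    bool in_use;
} pending_resp_t;

typedef struct {
    pending_resp_t slots[MAX_PENDING];
} client_pending_t;

/* Record that a callback with this invocation was delivered. */
static bool pending_add(client_pending_t *c, uint64_t inv, uint32_t node_id)
{
    assert(c != NULL);
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!c->slots[i].in_use) {
            c->slots[i] = (pending_resp_t){ inv, node_id, true };
            return true;
        }
    }
    return false; /* table full */
}

/* Consume a response. Returns false when no callback with this invocation
 * is pending, e.g. when a client answers the same invocation twice. */
static bool pending_consume(client_pending_t *c, uint64_t inv, uint32_t *node_id)
{
    assert(c != NULL);
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (c->slots[i].in_use && c->slots[i].invocation == inv) {
            c->slots[i].in_use = false;
            if (node_id != NULL)
                *node_id = c->slots[i].node_id;
            return true;
        }
    }
    return false;
}
```

For example, after `pending_add(&c, 7, 4)` the first `pending_consume(&c, 7, ...)` succeeds, and a repeated one returns false.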
Suppose one payload node, PL-4, is locked while an application has still not
responded to the track callbacks, and another payload, PL-3, is stopped (OpenSAF
stop). The application is hosted on PL-5 and its track flags are the same as AMFD's:
(SA_TRACK_CURRENT | SA_TRACK_CHANGES_ONLY | SA_TRACK_VALIDATE_STEP |
SA_TRACK_START_STEP).
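The flag combination above is a plain bitmask. The sketch below uses local stand-ins for the SA Forum `SA_TRACK_*` constants; their numeric values are an assumption meant to mirror `saAis.h` and should be checked against the real header. It also encodes the rule, as I understand the spec, that `SA_TRACK_CHANGES` and `SA_TRACK_CHANGES_ONLY` must not be requested together.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the SA Forum track flags. ASSUMPTION: values mirror
 * saAis.h; include the real header in production code. */
#define TRACK_CURRENT        0x01u
#define TRACK_CHANGES        0x02u
#define TRACK_CHANGES_ONLY   0x04u
#define TRACK_START_STEP     0x10u
#define TRACK_VALIDATE_STEP  0x20u

/* Flags used by both AMFD and the test application in this scenario. */
static const uint8_t amfd_track_flags =
    TRACK_CURRENT | TRACK_CHANGES_ONLY | TRACK_VALIDATE_STEP | TRACK_START_STEP;

/* ASSUMPTION (per my reading of the spec): CHANGES and CHANGES_ONLY are
 * mutually exclusive, so a valid request sets at most one of them. */
static bool track_flags_valid(uint8_t flags)
{
    bool changes = (flags & TRACK_CHANGES) != 0;
    bool only = (flags & TRACK_CHANGES_ONLY) != 0;
    return !(changes && only);
}

/* True when the subscriber asked for changed nodes only. */
static bool wants_changes_only(uint8_t flags)
{
    assert(track_flags_valid(flags));
    return (flags & TRACK_CHANGES_ONLY) != 0;
}
```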
In this case, what is observed is that when PL-4 is locked, both AMF and the application get
a track callback for CHANGE_START. AMF responds to the callback but the
application does not. Now PL-3 is stopped. CLM delivers a track callback
for the COMPLETED step, but it contains numberOfItems=2, covering both PL-3
and PL-4. The same happens for the application.
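One defensive pattern a subscriber could use while this question is open is to act only on the notification item that matches the callback's `rootCauseEntity`, even when the buffer also lists other nodes. This is a hypothetical sketch, not what AMFD or CLM currently does; `notif_item_t` and `node_change_t` are simplified stand-ins for `SaClmClusterNotificationT_4` and `SaClmClusterChangesT`, and node names are shortened to "PL-3" and so on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins; only the fields used here are modelled. */
typedef enum { NODE_NO_CHANGE, NODE_JOINED, NODE_LEFT } node_change_t;

typedef struct {
    const char *node_name; /* shortened node name, e.g. "PL-3" */
    node_change_t change;
} notif_item_t;

/* Return the index of the item describing the callback's root cause
 * entity, or -1 if the buffer does not contain it. A subscriber can use
 * this to act only on the node the callback is about, even when the
 * buffer also carries other nodes with pending changes. */
static int find_root_cause_item(const notif_item_t *items, size_t n,
                                const char *root_cause)
{
    assert(items != NULL || n == 0);
    assert(root_cause != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(items[i].node_name, root_cause) == 0)
            return (int)i;
    }
    return -1;
}
```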
The application never responds to the PL-4 callback, so the node lock timer expires in
CLMD and it again sends the COMPLETED callback to both AMFD and the application. Since
both AMFD and the application have registered for SA_TRACK_CHANGES_ONLY, I really
doubt that CLM should send callbacks for both PL-3 and PL-4. In the ticket
description I pointed out this problem for the CHANGE_START case. The CLM spec,
section 3.5.2 SaClmClusterTrackCallbackT_4 (page 51), says:
The value of the numberOfItems attribute in the structure to which the
notificationBuffer parameter points might be greater than the value of the
numberOfMembers parameter if either the SA_TRACK_CHANGES flag or the
SA_TRACK_CHANGES_ONLY flags is set, and one or more member nodes have left
the cluster membership. In this case, the structure to which the
notificationBuffer parameter points might contain information about the current
members of the cluster and also about nodes that have recently left the cluster
membership.
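The quoted text can be checked with simple arithmetic. Assuming a buffer delivered with `SA_TRACK_CHANGES` holds every current member plus the nodes that recently left, numberOfMembers equals numberOfItems minus the number of items whose change is "left". The enum below is a simplified stand-in for `SaClmClusterChangesT`, not its real definition.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for SaClmClusterChangesT (not the real values). */
typedef enum { NODE_NO_CHANGE, NODE_JOINED, NODE_LEFT } node_change_t;

/* Count the items describing nodes that have left the membership. These
 * are the items that can make numberOfItems exceed numberOfMembers. */
static size_t count_left_items(const node_change_t *changes,
                               size_t number_of_items)
{
    assert(changes != NULL || number_of_items == 0);
    size_t left = 0;
    for (size_t i = 0; i < number_of_items; i++) {
        if (changes[i] == NODE_LEFT)
            left++;
    }
    return left;
}
```

For a hypothetical buffer with three unchanged members and two left nodes, numberOfItems is 5, the helper returns 2, and numberOfMembers is 3.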
I am going through the ticket list and the spec for more information regarding this.
Thanks,
Praveen
Attachments:
-
[node_lock_and_stop.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/6b54e875/538b/attachment/node_lock_and_stop.tgz)
(382.7 kB; application/x-compressed)
-
[two_nodes_lock.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/6b54e875/538b/attachment/two_nodes_lock.tgz)
(335.0 kB; application/x-compressed)
---
**[tickets:#2372] amf/clm: CLM lock of two more nodes returns REPAIR_PENDING
for first node.**
**Status:** accepted
**Milestone:** 5.0.2
**Created:** Tue Mar 14, 2017 09:29 AM UTC by Praveen
**Last Updated:** Wed Mar 15, 2017 06:27 AM UTC
**Owner:** Praveen
**Attachments:**
-
[osafamfd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafamfd)
(3.4 MB; application/octet-stream)
-
[osafclmd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafclmd)
(860.9 kB; application/octet-stream)
Steps to reproduce:
1) Bring up a 4-node cluster.
2) Deploy the AMF demo on PL-3 and PL-4.
3) Lock the AMF nodes PL-3 and PL-4.
4) Arrange for termination of amf_demo on PL-3 to take more time
than on PL-4.
5) From one terminal, issue a CLM lock of PL-3 first and, immediately after it, a CLM lock
of PL-4.
CLM and AMF traces are attached.
Analysis:
When AMFD gets the CLM track callback for PL-3, it starts terminating amf_demo on
PL-3. While termination of amf_demo is still going on, AMF gets another track
callback with rootCauseEntity set to PL-4. However, this callback also contains information
about PL-3. AMFD starts terminating amf_demo on PL-4, but at the same time it
responds for PL-3 with the invocation id of the PL-4 callback. CLM assumes that the
CHANGE_START step for PL-4 has completed and sends the COMPLETED callback for PL-4. On
receiving this callback, AMF clears the internal flags that monitor the graceful removal of
nodes. Since AMF never responded to the PL-3 callback, the callback timer expires in
CLMD and it sends the COMPLETED callback to AMF. AMF treats this as a node failover
case and tries to fail over PL-3.
Note: In all these stages, CLM sends the track callback with information about all the
nodes. AMF registers with these track flags:
SA_TRACK_CURRENT|SA_TRACK_CHANGES_ONLY|SA_TRACK_VALIDATE_STEP|SA_TRACK_START_STEP.
I am still evaluating whether the issue is in CLM or AMF. Since AMF registers for
**SA_TRACK_CHANGES_ONLY**, should CLM give information about all the nodes in all
subsequent callbacks?
Also, AMF should respond to the callback once it has completed termination of the components.
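The last point suggests keeping the START-step invocation per node and answering only when that node's components are gone. Below is a minimal sketch of such bookkeeping; `amfd_clm_state_t`, `on_start_callback` and `on_termination_done` are hypothetical names, not AMFD code. The invocation it returns would then be passed to `saClmResponse_4(clmHandle, invocation, SA_CLM_CALLBACK_RESPONSE_OK)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical AMFD-side bookkeeping: one outstanding START-step
 * invocation per node being gracefully removed. */
#define MAX_NODES 16

typedef struct {
    char node[32];       /* root cause node of the START callback */
    uint64_t invocation; /* invocation id that must be answered */
    bool pending;
} start_req_t;

typedef struct {
    start_req_t reqs[MAX_NODES];
} amfd_clm_state_t;

/* Store the invocation of a START callback against its root cause node. */
static bool on_start_callback(amfd_clm_state_t *s, const char *node, uint64_t inv)
{
    assert(s != NULL && node != NULL);
    for (size_t i = 0; i < MAX_NODES; i++) {
        if (!s->reqs[i].pending) {
            strncpy(s->reqs[i].node, node, sizeof s->reqs[i].node - 1);
            s->reqs[i].node[sizeof s->reqs[i].node - 1] = '\0';
            s->reqs[i].invocation = inv;
            s->reqs[i].pending = true;
            return true;
        }
    }
    return false; /* table full */
}

/* Called when all components on 'node' are terminated. Yields the
 * invocation that belongs to this node, never the most recent one. */
static bool on_termination_done(amfd_clm_state_t *s, const char *node,
                                uint64_t *inv_out)
{
    assert(s != NULL && node != NULL && inv_out != NULL);
    for (size_t i = 0; i < MAX_NODES; i++) {
        if (s->reqs[i].pending && strcmp(s->reqs[i].node, node) == 0) {
            s->reqs[i].pending = false;
            *inv_out = s->reqs[i].invocation;
            return true;
        }
    }
    return false; /* nothing pending for this node */
}
```

With this pairing, finishing PL-4 first yields PL-4's invocation, and PL-3 keeps its own invocation until its termination completes, so CLM never sees PL-3 answered with PL-4's invocation id.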
---
Sent from sourceforge.net because [email protected] is
subscribed to https://sourceforge.net/p/opensaf/tickets/
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets