Hi again,
I am not sure how you reproduce this scenario. It looks to me like you
deliberately reboot PL-3 before amfd processes the synced assignments.
The node_up sequence has many branches, and you are reproducing one of them.
What I mean is, at the time amfd calls avd_process_state_info_queue(), PL
Hi Minh,
I checked your concern about the case where the PL goes down before sending
node up but after the assignments have been synced (recovery after headless).
The current code still works correctly, because the active AMFD doesn't
recreate the assignments.
So my point of view still stands: just reset the 3 fields, as the current patch does.
Jul 10
Hi,
You are right about the behavior according to the SMF spec. I misinterpreted
the code.
I have no further comments regarding assert(). You can wait for comments
from Lennart and Gary.
Thanks,
Nguyen
On 7/9/2018 8:43 AM, Tran Thuan wrote:
Hi Nguyen,
The SMF spec says:
"Also, if a registered
Hi,
What about @nodes_exit_cnt? The active amfd checkpoints
@nodes_exit_cnt to the standby amfd when handling the MDS down event. You could say
it's not used, but that is what happens to this counter.
By "AMFND not yet up", do you mean node_state AVD_AVND_STATE_ABSENT?
You reset the counter if
There is an abnormal state in which the AMFND on a remote node keeps sending
messages to the active AMFD, but the active AMFD sees that the node has already left.
The expected msg_id does not match, and the remote node stays stuck,
out of the active AMFD's control.
In this case, the active AMFD can trigger remote fencing for that node
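To illustrate the msg_id mismatch described above, here is a minimal sketch (not the actual amfd code). The struct NodeRecord, its fields rcv_msg_id/snd_msg_id/up, and the functions accept_msg() and reset_on_node_left() are all hypothetical; it also assumes that a freshly started amfnd numbers its messages from 1. Under those assumptions, a stale counter on the active side rejects every message from the restarted node until the counters are reset.

```cpp
#include <cstdint>

// Hypothetical, simplified per-node record. The real amfd node
// structure has different names and many more fields.
struct NodeRecord {
  uint32_t rcv_msg_id = 0;  // last msg_id accepted from amfnd (hypothetical)
  uint32_t snd_msg_id = 0;  // last msg_id sent to amfnd (hypothetical)
  bool up = false;          // whether the node is considered up (hypothetical)
};

// Accept a message only if it is the next one in sequence.
// Returns false (message dropped) on a mismatch, which is how a
// restarted amfnd ends up "stuck" against a stale counter.
bool accept_msg(NodeRecord& node, uint32_t msg_id) {
  if (msg_id != node.rcv_msg_id + 1) {
    return false;
  }
  node.rcv_msg_id = msg_id;
  return true;
}

// Hypothetical cleanup when the node is seen to have left: reset the
// per-node counters so that a fresh amfnd starting again at msg_id 1
// is accepted.
void reset_on_node_left(NodeRecord& node) {
  node.rcv_msg_id = 0;
  node.snd_msg_id = 0;
  node.up = false;
}
```

Without the reset, the restarted node's first message (msg_id 1) is compared against the old counter and dropped every time; after reset_on_node_left() it is accepted again.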
Summary: amf: Recover node that disconnects from active AMFD [#2880]
Review request for Ticket(s): 2880
Peer Reviewer(s): Hans, Gary
Pull request to: Hans, Gary
Affected branch(es): develop
Development branch: ticket-2880
Base revision: ca6067f31038b9d7076b5836d691591e992302ee
Personal repository:
Hi Minh,
Since AMFND was not up before, the other fields have not been updated yet.
I think only 3 fields change in this scenario; that's why my patch resets
only those.
Do you agree?
Best Regards,
Thuan
-----Original Message-----
From: Minh Hon Chau
Sent: Monday, July 9, 2018 1:22 PM
To: Tran
Hi,
OK, so we can ignore the "AMFND already up" case.
"- AMFND not up yet (node reboot/first up): this is our issue.
+ The current code only deletes the node without taking care of the msg_id counter."
Are you able to reproduce this problem?
You only reset the msg_id counters of a node; what about the others: