On 2017/10/18 7:21, Andrew Morton wrote:
> On Thu, 21 Sep 2017 02:09:33 +0000 Zhangyang <zhang.ya...@h3c.com> wrote:
> 
>> In our tests, we found that when the network goes down, qs->qs_holds may
>> never drop to zero, which prevents the node from fencing.
>>
>>
>>
>> o2net_idle_timer -> o2quo_conn_err -> qs->qs_holds++. After
>> O2NET_QUORUM_DELAY_MS, if qs_holds has been decremented to zero, the node
>> can run o2quo_make_decision.
>>
>> But when there are many nodes and one node's network goes down, its o2net
>> connections may not all run o2net_idle_timer at the same time.
>>
>> So when one o2net_node has run nn->nn_still_up, qs_holds is still not
>> zero, because the other o2net_nodes have not yet run nn->nn_still_up.
>>
>> The first o2net_node then runs o2net_idle_timer again, and qs_holds is
>> incremented again. Because qs_holds is a global variable, this forms a
>> loop: the node can never run o2quo_make_decision, because qs_holds never
>> reaches zero.
>>
>>
>>
>> I changed o2quo_conn_err so that o2quo_set_hold is only called under
>> control of the bitmap qs_conn_bm.
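
To make the sequence above easier to follow, here is a minimal userspace
sketch of it. This is not the kernel code and not the patch itself: the
struct and every model_* name are hypothetical, the two-node event order is
an assumption based on the description above, and only the field names
(conn_bm, hold_bm, holds) are borrowed from the thread. It compares taking a
hold on every connection error with taking a hold only when the node's bit
was still set in the connection bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the quorum state; not the kernel's struct layout. */
struct quorum_model {
    uint32_t conn_bm;  /* bit set: node is still counted as connected */
    uint32_t hold_bm;  /* bit set: node is holding off the decision */
    int holds;         /* number of bits currently set in hold_bm */
};

/* Take a hold for a node unless it already has one. */
static void model_set_hold(struct quorum_model *qs, unsigned node)
{
    assert(node < 32);
    uint32_t bit = UINT32_C(1) << node;
    if (!(qs->hold_bm & bit)) {
        qs->hold_bm |= bit;
        qs->holds++;
    }
}

/* Drop a node's hold, as happens once its "still up" delay has expired. */
static void model_clear_hold(struct quorum_model *qs, unsigned node)
{
    assert(node < 32);
    uint32_t bit = UINT32_C(1) << node;
    if (qs->hold_bm & bit) {
        qs->hold_bm &= ~bit;
        qs->holds--;
    }
}

/*
 * Report a connection error for a node. With guarded == true, a hold is
 * taken only if the node was still marked connected, so a repeated error
 * report for the same node cannot take a second hold.
 */
static void model_conn_err(struct quorum_model *qs, unsigned node, bool guarded)
{
    assert(node < 32);
    uint32_t bit = UINT32_C(1) << node;
    bool was_connected = (qs->conn_bm & bit) != 0;

    qs->conn_bm &= ~bit;
    if (!guarded || was_connected)
        model_set_hold(qs, node);
}

/* Replay the assumed two-node sequence and return the remaining holds. */
int model_run(bool guarded)
{
    struct quorum_model qs = { .conn_bm = 0x3, .hold_bm = 0, .holds = 0 };

    model_conn_err(&qs, 0, guarded);  /* idle timer fires for node 0 */
    model_conn_err(&qs, 1, guarded);  /* idle timer fires later for node 1 */
    model_clear_hold(&qs, 0);         /* node 0's delay expires */
    model_conn_err(&qs, 0, guarded);  /* idle timer fires again for node 0 */
    model_clear_hold(&qs, 1);         /* node 1's delay expires */
    return qs.holds;
}
```

In this model, model_run(false) ends with one hold still outstanding, so a
decision gated on holds == 0 would not run, while model_run(true) ends with
zero holds.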
> 
> I merged this, subject to review by the ocfs2 maintainers.
> 
> The changelog and the comment are really hard to understand.  Perhaps
> one of the ocfs2 developers could suggest some clearer wording?

OK, I will help Yang Zhang re-send this patch with a proper, clear
changelog.

Thanks,
Changwei

> 
> Thanks.
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 

