Hi Joseph,

> On Jan 20, 2016, at 5:18 PM, Joseph Qi <joseph...@huawei.com> wrote:
>
> Hi Junxiao,
> Thanks for the patch set.
> In case only one node's storage link goes down, if that node doesn't
> fence itself, the other nodes will still check and mark it dead, which
> will cause a cluster membership inconsistency.
> In your patch set, I cannot see any logic to handle this. Am I missing
> something?
No, there is no logic for this. But why wouldn't the node fence itself
when its storage is down? What would keep a softirq timer from running;
another bug?
Thanks,
Junxiao.

> On 2016/1/20 11:13, Junxiao Bi wrote:
>> Hi,
>>
>> This series of patches fixes the issue that when storage goes down,
>> all nodes fence themselves due to a write timeout.
>> With this patch set, all nodes keep going until storage comes back
>> online, except when one of the following happens, in which case all
>> nodes fence themselves as before:
>> 1. an I/O error is returned
>> 2. the network between nodes goes down
>> 3. a node panics
>>
>> Junxiao Bi (6):
>>   ocfs2: o2hb: add negotiate timer
>>   ocfs2: o2hb: add NEGO_TIMEOUT message
>>   ocfs2: o2hb: add NEGOTIATE_APPROVE message
>>   ocfs2: o2hb: add some user/debug log
>>   ocfs2: o2hb: don't negotiate if last hb fail
>>   ocfs2: o2hb: fix hb hung time
>>
>>  fs/ocfs2/cluster/heartbeat.c | 181 ++++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 175 insertions(+), 6 deletions(-)
>>
>> Thanks,
>> Junxiao.
>>
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel@oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel