On Mon, Aug 7, 2017 at 2:13 AM, Changwei Ge <ge.chang...@h3c.com> wrote:
> Hi,
>
> In current code, while flushing AST, we don't handle an exception that
> sending AST or BAST is failed.
> But it is indeed possible that AST or BAST is lost due to some kind of
> networks fault.
>
> If above exception happens, the requesting node will never obtain an AST
> back, hence, it will never acquire the lock or abort current locking.
>
> With this patch, I'd like to fix this issue by re-queuing the AST or
> BAST if sending is failed due to networks fault.
>
> And the re-queuing AST or BAST will be dropped if the requesting node is
> dead!
>
> It will improve the reliability a lot.

Can you detail your testing? Code-wise this looks fine to me but as you
note, this is a pretty hard to hit corner case so it'd be nice to hear
that you were able to exercise it.

Thanks,
	--Mark

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel