An I/O error is sometimes returned when storage is down for a while.  For
example, with an iSCSI device, the storage is taken offline when the session
times out, which makes all I/O return -EIO.  In this case, nodes should not
negotiate the timeout but should fence themselves.  So let nodes fence
themselves when o2hb_do_disk_heartbeat() returns an error; this is the same
behavior as o2hb without the negotiate timer.

Signed-off-by: Junxiao Bi <junxiao...@oracle.com>
Reviewed-by: Ryan Ding <ryan.d...@oracle.com>
Cc: Gang He <g...@suse.com>
Cc: rwxybh <rwx...@126.com>
Cc: Mark Fasheh <mfas...@suse.de>
Cc: Joel Becker <jl...@evilplan.org>
Cc: Joseph Qi <joseph...@huawei.com>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
---
 fs/ocfs2/cluster/heartbeat.c |   10 ++++++++++
 1 file changed, 10 insertions(+)
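
For readers who want the control flow in isolation, here is a minimal
userspace sketch of the logic this patch adds. It is not the kernel code:
struct hb_region, hb_record_status() and hb_should_negotiate() are
hypothetical stand-ins for struct o2hb_region, the assignment added to
o2hb_thread(), and the early return added to o2hb_nego_timeout().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct o2hb_region; only the new field is modeled. */
struct hb_region {
	int last_hb_status;	/* 0 for success, any other value for error */
};

/* Models the line added to o2hb_thread(): remember the latest heartbeat result. */
static void hb_record_status(struct hb_region *reg, int ret)
{
	assert(reg != NULL);
	reg->last_hb_status = ret;
}

/*
 * Models the check added to o2hb_nego_timeout(): negotiation is only
 * attempted when the last heartbeat succeeded. On failure the caller
 * returns early and the write timeout is left to fence the node.
 */
static bool hb_should_negotiate(const struct hb_region *reg)
{
	assert(reg != NULL);
	return reg->last_hb_status == 0;
}
```

In this sketch a heartbeat that returns -EIO makes hb_should_negotiate()
return false, and a later successful heartbeat makes it return true again,
because the status is overwritten on every iteration.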

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 9f4a02ed85fd..c040fc3dd605 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -284,6 +284,9 @@ struct o2hb_region {
        /* Message key for negotiate timeout message. */
        unsigned int            hr_key;
        struct list_head        hr_handler_list;
+
+       /* last hb status, 0 for success, other value for error. */
+       int                     hr_last_hb_status;
 };
 
 struct o2hb_bio_wait_ctxt {
@@ -396,6 +399,12 @@ static void o2hb_nego_timeout(struct work_struct *work)
        struct o2hb_region *reg;
 
        reg = container_of(work, struct o2hb_region, hr_nego_timeout_work.work);
+       /* Don't negotiate the timeout if the last hb failed, since I/O has
+        * very likely failed. Let the write timeout fence this node instead.
+        */
+       if (reg->hr_last_hb_status)
+               return;
+
        o2hb_fill_node_map(live_node_bitmap, sizeof(live_node_bitmap));
        /* lowest node as master node to make negotiate decision. */
        master_node = find_next_bit(live_node_bitmap, O2NM_MAX_NODES, 0);
@@ -1229,6 +1238,7 @@ static int o2hb_thread(void *data)
                before_hb = ktime_get_real();
 
                ret = o2hb_do_disk_heartbeat(reg);
+               reg->hr_last_hb_status = ret;
 
                after_hb = ktime_get_real();
 
-- 
1.7.9.5

