[Cluster-devel] [GFS2 PATCH] Revert "gfs2: read journal in large chunks to locate the head"

2019-02-13 Thread Abhi Das
This reverts commit 2a5f14f279f59143139bcd1606903f2f80a34241. This patch causes xfstests generic/311 to fail. Reverting this for now until we have a proper fix. Signed-off-by: Abhi Das --- fs/gfs2/glops.c | 1 - fs/gfs2/log.c | 4 +- fs/gfs2/lops.c | 190

[Cluster-devel] [PATCH] Revert "gfs2: read journal in large chunks to locate the head"

2019-02-13 Thread Bob Peterson
This reverts commit 2a5f14f279f59143139bcd1606903f2f80a34241. This patch causes xfstests generic/311 to fail. Reverting this for now until we have a proper fix. Signed-off-by: Abhi Das Signed-off-by: Bob Peterson --- fs/gfs2/glops.c | 1 - fs/gfs2/log.c | 4 +- fs/gfs2/lops.c

[Cluster-devel] [GFS2 PATCH 6/9] gfs2: Make secondary withdrawers wait for first withdrawer

2019-02-13 Thread Bob Peterson
Before this patch, if a process encountered an error and decided to withdraw while another process was already withdrawing, the secondary withdraw would be silently ignored, which left it free to proceed with its processing, unlock any locks, etc. That's correct behavior if the

[Cluster-devel] [GFS2 PATCH 9/9] dlm: recover slot regardless of whether we still have a connection

2019-02-13 Thread Bob Peterson
Before this patch, dlm would skip the recover_slot phase of recovery if it still had a valid comm connection to the failed node. However, gfs2 still needs to perform journal replay; otherwise we run the risk of journal replay that happens at reboot time overwriting metadata we've since modified

[Cluster-devel] [GFS2 PATCH 4/9] gfs2: Force withdraw to replay journals and wait for it to finish

2019-02-13 Thread Bob Peterson
When a node withdraws from a file system, it often leaves its journal in an incomplete state. This is especially true when the withdraw is caused by I/O errors writing to the journal. Before this patch, a withdraw would try to write a "shutdown" record to the journal, tell dlm it's done with the

[Cluster-devel] [GFS2 PATCH 5/9] gfs2: Keep transactions on ail1 list until after issuing revokes

2019-02-13 Thread Bob Peterson
Before this patch, function gfs2_write_revokes would call function gfs2_ail1_empty, then run the ail1 list, issuing revokes. But gfs2_ail1_empty can move transactions to the ail2 list, and thus, their revokes were never issued. This patch adds a new parameter to gfs2_ail1_empty that allows the
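The ordering problem described in this message can be illustrated with a toy model. This is a minimal sketch in Python, not kernel code: the `Transaction` and `Log` classes, the `written_back` field, and the `move_to_ail2` flag are all hypothetical names invented for illustration, and the flag is only an assumption about what such a parameter might control, since the message is cut off before describing it.

```python
# Toy model only: function names echo the message, but nothing here is gfs2 code.
from dataclasses import dataclass, field


@dataclass
class Transaction:
    name: str
    written_back: bool           # simplification: all buffers already written in place
    revoke_issued: bool = False


@dataclass
class Log:
    ail1: list = field(default_factory=list)
    ail2: list = field(default_factory=list)


def ail1_empty(log, move_to_ail2=True):
    """Retire finished transactions from ail1 to ail2.

    `move_to_ail2` is a hypothetical flag; when False, nothing is moved.
    """
    if not move_to_ail2:
        return
    finished = [t for t in log.ail1 if t.written_back]
    log.ail1 = [t for t in log.ail1 if not t.written_back]
    log.ail2.extend(finished)


def write_revokes(log, move_first):
    """Walk ail1 and issue revokes for finished transactions."""
    ail1_empty(log, move_to_ail2=move_first)
    for t in log.ail1:
        if t.written_back:
            t.revoke_issued = True
    # Retire finished transactions only after the revokes have been issued.
    ail1_empty(log)


def run(move_first):
    log = Log(ail1=[Transaction("t1", True), Transaction("t2", False)])
    write_revokes(log, move_first)
    return {t.name: t.revoke_issued for t in log.ail1 + log.ail2}
```

In this model, `run(True)` moves `t1` to ail2 before the list is walked, so its revoke is never issued, while `run(False)` keeps `t1` on ail1 until its revoke has been issued.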

[Cluster-devel] [GFS2 PATCH 8/9] gfs2: Do log_flush in gfs2_ail_empty_gl even if ail list is empty

2019-02-13 Thread Bob Peterson
Before this patch, if gfs2_ail_empty_gl saw there was nothing on the ail list, it would return and not flush the log. The problem is that there could still be a revoke for the rgrp sitting on the sd_log_le_revoke list that's been recently taken off the ail list. But that revoke still needs to be

[Cluster-devel] [GFS2 PATCH 3/9] gfs2: Empty the ail for the glock when rgrps are invalidated

2019-02-13 Thread Bob Peterson
Before this patch, function rgrp_go_inval would not invalidate the ail list, which meant that there might still be buffers outstanding on the ail that had revokes still pending. If the revokes had still not been written when the glock was given to another node, and that node (with outstanding

[Cluster-devel] [GFS2 PATCH 0/9] GFS2: Withdraw corruption patches

2019-02-13 Thread Bob Peterson
I consider this more of a preliminary "collection" of patches rather than a "patch set" per se. In other words, most of these do not rely upon the previous patches, although some do. Some of them may be removed without a lot of difficulty if they are found to be problematic. I thought about

[Cluster-devel] [GFS2 PATCH 2/9] gfs2: Ignore recovery attempts if gfs2 has io error or is withdrawn

2019-02-13 Thread Bob Peterson
This patch addresses various problems with gfs2/dlm recovery. For example, suppose a node with a bunch of gfs2 mounts suddenly reboots due to kernel panic, and dlm determines it should perform recovery. DLM does so from a pseudo-state machine calling various callbacks into lock_dlm to perform a

[Cluster-devel] [GFS2 PATCH 7/9] gfs2: Check for log write errors and withdraw in rgrp_go_inval

2019-02-13 Thread Bob Peterson
Before this patch, function rgrp_go_inval just assumed all the writes submitted to the journal were finished and successful. But if they're not, and a revoke fails to make its way to the journal, a journal replay on another node will cause corruption if we let the go_inval function continue and

[Cluster-devel] [GFS2 PATCH 1/9] gfs2: Introduce concept of a pending withdraw

2019-02-13 Thread Bob Peterson
File system withdraws can be delayed when inconsistencies are discovered at a point where we cannot withdraw immediately, for example, when critical spin_locks are held. But delaying the withdraw can cause gfs2 to ignore the error and keep running for a short period of time. For example, an rgrp glock may be