[Cluster-devel] BUG() in iput() during unmount

2020-06-08 Thread Ross Lagerwall
Hi, During some testing, I hit an issue during unmount that seems quite similar to the problem I reported a few weeks ago (BUG during umount() after withdrawal). I am using 5.7.1 which includes the patches for the earlier issue. Here is the log: [ 304.212698] connection1:0: ping timeout of

Re: [Cluster-devel] [PATCH] dlm: Switch to using wait_event()

2020-05-11 Thread Ross Lagerwall
Ping? On 4/29/20 1:15 PM, Ross Lagerwall wrote: > We saw an issue in a production server on a customer deployment where > DLM 4.0.7 gets "stuck" and unable to join new lockspaces. > > See - https://lists.clusterlabs.org/pipermail/users/2019-January/016054.html > >

Re: [Cluster-devel] [GFS2 PATCH 0/4] gfs2: misc withdraw patch fixes

2020-04-28 Thread Ross Lagerwall
> fs/gfs2/util.c| 10 ++ > 4 files changed, 13 insertions(+), 6 deletions(-) > This patch series fixes the issue I reported last week regarding the BUG during unmount and my testcase now passes. Thanks! -- Ross Lagerwall

[Cluster-devel] BUG during umount() after withdrawal

2020-04-24 Thread Ross Lagerwall
Hi, I'm doing some testing on 5.7-rc2 which includes Bob's recovery patches. I used a new xfstest (see the end of this mail) which injects some IO errors to force the filesystem to be withdrawn and then checks that it can be remounted successfully. However, it hits a BUG() during umount() after

[Cluster-devel] Follow up to "kernel BUG at fs/gfs2/inode.h:64"

2019-12-13 Thread Ross Lagerwall
is some form of corruption that would be fixed by Bob's recovery patch series? [1] https://www.redhat.com/archives/cluster-devel/2019-January/msg7.html Thanks, -- Ross Lagerwall

Re: [Cluster-devel] [GFS2 PATCH 11/12] gfs2: Fix iomap write page reclaim deadlock

2019-06-11 Thread Ross Lagerwall
On 6/8/19 1:16 PM, Andreas Gruenbacher wrote: Hi Ross, On Fri, 7 Jun 2019 at 18:21, Ross Lagerwall wrote: On 5/7/19 9:32 PM, Andreas Gruenbacher wrote: Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is doing buffered writes by starting a transaction in i

Re: [Cluster-devel] [GFS2 PATCH 11/12] gfs2: Fix iomap write page reclaim deadlock

2019-06-07 Thread Ross Lagerwall
1/0 Revoke 0/0 Reverting this commit fixes the issue. Tested with git master as of today (16d72dd4891fe). Thanks, -- Ross Lagerwall

Re: [Cluster-devel] [PATCH 1/2] gfs2: Fix occasional glock use-after-free

2019-04-09 Thread Ross Lagerwall
-after-free. Thanks for working on this! -- Ross Lagerwall

Re: [Cluster-devel] [GFS2 PATCH V2] gfs2: clean_journal was setting sd_log_flush_head replaying other journals

2019-03-28 Thread Ross Lagerwall
On 3/27/19 5:35 PM, Bob Peterson wrote: Hi, Yesterday I posted a patch for this problem, but it was grossly inadequate. This patch, version 2, is another attempt to fix it. Many thanks to Ross Lagerwall for helping us identify, fix, and test the problem. Regards, Bob Peterson --- gfs2

[Cluster-devel] [PATCH v2] gfs2: Fix lru_count going negative

2019-03-27 Thread Ross Lagerwall
so that the LRU flag is added/removed precisely when the glock is added/removed from lru_list. Signed-off-by: Ross Lagerwall --- I've detached this from "gfs2: Fix occasional glock use-after-free" since this can go in separately while that is still under discussion. Changed in
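The invariant described in this message (the LRU flag changes exactly when the glock enters or leaves lru_list) can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the patch itself: `struct glock_model`, `struct lru_model`, `lru_add()` and `lru_remove()` are hypothetical names, a plain counter stands in for the list, and the locking a real implementation would need is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: not the kernel's struct gfs2_glock. */
struct glock_model {
    bool on_lru;    /* stands in for the glock's LRU flag */
};

/* Hypothetical model of the LRU: only the element count is tracked. */
struct lru_model {
    long count;     /* stands in for lru_count */
};

/* Set the flag and bump the count together, and only on a real insertion. */
static void lru_add(struct lru_model *lru, struct glock_model *gl)
{
    if (!gl->on_lru) {
        gl->on_lru = true;
        lru->count++;
    }
}

/* Clear the flag and drop the count together, and only on a real removal. */
static void lru_remove(struct lru_model *lru, struct glock_model *gl)
{
    if (gl->on_lru) {
        gl->on_lru = false;
        lru->count--;
    }
}
```

Because the flag and the counter are only ever changed as a pair, a repeated add or a repeated remove is a no-op, so the counter in this model cannot drop below zero.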

Re: [Cluster-devel] gfs2 iomap dealock, IOMAP_F_UNBALANCED

2019-03-27 Thread Ross Lagerwall
ce impact. Tested-by: Ross Lagerwall

Re: [Cluster-devel] Assertion failure: sdp->sd_log_blks_free <= sdp->sd_jdesc->jd_blocks

2019-03-27 Thread Ross Lagerwall
+++ b/include/uapi/linux/gfs2_ondisk.h @@ -431,6 +431,7 @@ struct gfs2_ea_header { #define GFS2_LFC_TRANS_END 0x0100 #define GFS2_LFC_LOGD_JFLUSH_REQD 0x0200 #define GFS2_LFC_LOGD_AIL_FLUSH_REQD 0x0400 +#define GFS2_LFC_LOGD_TEST 0x0800 #define LH_V1_SIZE (offsetofend(struct gfs2_log_header, lh_hash)) Thanks, -- Ross Lagerwall

Re: [Cluster-devel] [PATCH 1/2] gfs2: Fix occasional glock use-after-free

2019-03-26 Thread Ross Lagerwall
On 1/31/19 5:18 PM, Andreas Gruenbacher wrote: Hi Ross, On Thu, 31 Jan 2019 at 11:56, Ross Lagerwall wrote: Each gfs2_bufdata stores a reference to a glock but the reference count isn't incremented. This causes an occasional use-after-free of the glock. Fix by taking a reference on the glock

Re: [Cluster-devel] Assertion failure: sdp->sd_log_blks_free <= sdp->sd_jdesc->jd_blocks

2019-03-22 Thread Ross Lagerwall
p->sd_jdesc->jd_blocks" failed [ 1104.061245] function = log_pull_tail, file = fs/gfs2/log.c, line = 510 It always seems to happen shortly after journal recovery. I added some debug logging at the point of the assertion failure and got the following: (snip) Any ideas about this? Thanks,

[Cluster-devel] Assertion failure: sdp->sd_log_blks_free <= sdp->sd_jdesc->jd_blocks

2019-03-19 Thread Ross Lagerwall
_log_head_flush during recovery which causes a subsequent call to gfs2_log_flush() to hit the assertion. Any ideas about this? Thanks, -- Ross Lagerwall

Re: [Cluster-devel] [PATCH] gfs2: Prevent writeback in gfs2_file_write_iter

2019-03-14 Thread Ross Lagerwall
WRITE_INODE, + wbc->sync_mode == WB_SYNC_ALL); + if (ret) + return ret; + } if (bdi->wb.dirty_exceeded) gfs2_ail1_flush(sdp, wbc); else --- Regards, -- Ross Lagerwall

Re: [Cluster-devel] [PATCH 1/2] gfs2: Fix occasional glock use-after-free

2019-02-01 Thread Ross Lagerwall
On 1/31/19 5:18 PM, Andreas Gruenbacher wrote: Hi Ross, On Thu, 31 Jan 2019 at 11:56, Ross Lagerwall wrote: Each gfs2_bufdata stores a reference to a glock but the reference count isn't incremented. This causes an occasional use-after-free of the glock. Fix by taking a reference on the glock
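The general pattern behind this fix, where a structure that stores a pointer to a refcounted object also holds a counted reference for as long as it keeps the pointer, can be sketched in user-space C. Everything here is an illustrative assumption rather than kernel code: `struct glock`, `struct bufdata`, `glock_hold()`, `glock_put()`, `bufdata_attach()` and `bufdata_detach()` are hypothetical stand-ins, and a `freed` marker replaces actually freeing memory so the state can be inspected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in; not the kernel's struct gfs2_glock. */
struct glock {
    int refcount;   /* number of live references */
    int freed;      /* set where real code would free the object */
};

/* Hypothetical stand-in; not the kernel's struct gfs2_bufdata. */
struct bufdata {
    struct glock *gl;   /* stored pointer, backed by a counted reference */
};

static void glock_hold(struct glock *gl)
{
    gl->refcount++;
}

static void glock_put(struct glock *gl)
{
    assert(gl->refcount > 0);
    if (--gl->refcount == 0)
        gl->freed = 1;
}

/* Take a reference before storing the pointer. */
static void bufdata_attach(struct bufdata *bd, struct glock *gl)
{
    glock_hold(gl);
    bd->gl = gl;
}

/* Drop the reference when the pointer is no longer kept. */
static void bufdata_detach(struct bufdata *bd)
{
    glock_put(bd->gl);
    bd->gl = NULL;
}
```

In this model, if the original owner calls `glock_put()` while a `bufdata` is still attached, the count stays above zero and the object survives until `bufdata_detach()` runs. Without the `glock_hold()` in `bufdata_attach()`, the stored pointer would outlive the object, which is the use-after-free shape the message describes.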

Re: [Cluster-devel] [PATCH 2/2] gfs2: Fix lru_count going negative

2019-01-31 Thread Ross Lagerwall
-by: Ross Lagerwall --- fs/gfs2/glock.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index b92740edc416..53e6c7e0c1b3 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -185,13 +185,14 @@ void gfs2_glock_add_to_lru(struct

[Cluster-devel] [PATCH 1/2] gfs2: Fix occasional glock use-after-free

2019-01-31 Thread Ross Lagerwall
, 88801aff6270) ... Signed-off-by: Ross Lagerwall --- fs/gfs2/aops.c| 3 +-- fs/gfs2/lops.c| 2 +- fs/gfs2/meta_io.c | 2 +- fs/gfs2/trans.c | 9 - fs/gfs2/trans.h | 2 ++ 5 files changed, 13 insertions(+), 5 deletions(-) diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c index 05dd78f4b2b3

[Cluster-devel] [PATCH 0/2] GFS2 counting fixes

2019-01-31 Thread Ross Lagerwall
Here are a couple of fixes for GFS2 (ref-)counting going wrong. Ross Lagerwall (2): gfs2: Fix occasional glock use-after-free gfs2: Fix lru_count going negative fs/gfs2/aops.c| 3 +-- fs/gfs2/glock.c | 16 +--- fs/gfs2/lops.c| 2 +- fs/gfs2/meta_io.c | 2 +- fs/gfs2

[Cluster-devel] [PATCH 2/2] gfs2: Fix lru_count going negative

2019-01-31 Thread Ross Lagerwall
so that the LRU flag is added/removed precisely when the glock is added/removed from lru_list. Signed-off-by: Ross Lagerwall --- fs/gfs2/glock.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index b92740edc416..53e6c7e0c1b3