[Cluster-devel] [PATCH 3/6] GFS2: Fix slab memory leak in gfs2_bufdata
From: Bob Peterson rpete...@redhat.com This patch fixes a slab memory leak that sometimes can occur for files with a very short lifespan. The problem occurs when a dinode is deleted before it has gotten to the journal properly. In the leak scenario, the bd object is pinned for journal committment (queued to the metadata buffers queue: sd_log_le_buf) but is subsequently unpinned and dequeued before it finds its way to the ail or the revoke queue. In this rare circumstance, the bd object needs to be freed from slab memory, or it is forgotten. We have to be very careful how we do it, though, because multiple processes can call gfs2_remove_from_journal. In order to avoid double-frees, only the process that does the unpinning is allowed to free the bd. Signed-off-by: Bob Peterson rpete...@redhat.com Signed-off-by: Steven Whitehouse swhit...@redhat.com diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c index 9324150..52f177b 100644 --- a/fs/gfs2/meta_io.c +++ b/fs/gfs2/meta_io.c @@ -258,6 +258,7 @@ void gfs2_remove_from_journal(struct buffer_head *bh, struct gfs2_trans *tr, int struct address_space *mapping = bh-b_page-mapping; struct gfs2_sbd *sdp = gfs2_mapping2sbd(mapping); struct gfs2_bufdata *bd = bh-b_private; + int was_pinned = 0; if (test_clear_buffer_pinned(bh)) { trace_gfs2_pin(bd, 0); @@ -273,12 +274,16 @@ void gfs2_remove_from_journal(struct buffer_head *bh, struct gfs2_trans *tr, int tr-tr_num_databuf_rm++; } tr-tr_touched = 1; + was_pinned = 1; brelse(bh); } if (bd) { spin_lock(sdp-sd_ail_lock); if (bd-bd_tr) { gfs2_trans_add_revoke(sdp, bd); + } else if (was_pinned) { + bh-b_private = NULL; + kmem_cache_free(gfs2_bufdata_cachep, bd); } spin_unlock(sdp-sd_ail_lock); } -- 1.8.3.1
[Cluster-devel] [PATCH 1/6] GFS2: don't hold s_umount over blkdev_put
This is a GFS2 version of Tejun's patch: 4f331f01b9c43bf001d3ffee578a97a1e0633eac vfs: don't hold s_umount over close_bdev_exclusive() call In this case its blkdev_put itself that is the issue and this patch uses the same solution of dropping and retaking s_umount. Reported-by: Tejun Heo t...@kernel.org Reported-by: Al Viro v...@zeniv.linux.org.uk Signed-off-by: Steven Whitehouse swhit...@redhat.com diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index 82303b4..52fa883 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -1366,8 +1366,18 @@ static struct dentry *gfs2_mount(struct file_system_type *fs_type, int flags, if (IS_ERR(s)) goto error_bdev; - if (s-s_root) + if (s-s_root) { + /* +* s_umount nests inside bd_mutex during +* __invalidate_device(). blkdev_put() acquires +* bd_mutex and can't be called under s_umount. Drop +* s_umount temporarily. This is safe as we're +* holding an active reference. +*/ + up_write(s-s_umount); blkdev_put(bdev, mode); + down_write(s-s_umount); + } memset(args, 0, sizeof(args)); args.ar_quota = GFS2_QUOTA_DEFAULT; -- 1.8.3.1
[Cluster-devel] [PATCH 2/6] GFS2: Fix use-after-free race when calling gfs2_remove_from_ail
From: Bob Peterson rpete...@redhat.com Function gfs2_remove_from_ail drops the reference on the bh via brelse. This patch fixes a race condition whereby bh is deferenced after the brelse when setting bd-bd_blkno = bh-b_blocknr; Under certain rare circumstances, bh might be gone or reused, and bd-bd_blkno is set to whatever that memory happens to be, which is often 0. Later, in gfs2_trans_add_unrevoke, that bd fails the test bd-bd_blkno = blkno which causes it to never be freed. The end result is that the bd is never freed from the bufdata cache, which results in this error: slab error in kmem_cache_destroy(): cache `gfs2_bufdata': Can't free all objects Signed-off-by: Bob Peterson rpete...@redhat.com Signed-off-by: Steven Whitehouse swhit...@redhat.com diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c index 610613f..9dcb977 100644 --- a/fs/gfs2/log.c +++ b/fs/gfs2/log.c @@ -551,10 +551,10 @@ void gfs2_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd) struct buffer_head *bh = bd-bd_bh; struct gfs2_glock *gl = bd-bd_gl; - gfs2_remove_from_ail(bd); - bd-bd_bh = NULL; bh-b_private = NULL; bd-bd_blkno = bh-b_blocknr; + gfs2_remove_from_ail(bd); /* drops ref on bh */ + bd-bd_bh = NULL; bd-bd_ops = gfs2_revoke_lops; sdp-sd_log_num_revoke++; atomic_inc(gl-gl_revokes); -- 1.8.3.1
[Cluster-devel] [PATCH 4/6] GFS2: Fix incorrect invalidation for DIO/buffered I/O
In patch 209806aba9d540dde3db0a5ce72307f85f33468f we allowed local deferred locks to be granted against a cached exclusive lock. That opened up a corner case which this patch now fixes. The solution to the problem is to check whether we have cached pages each time we do direct I/O and if so to unmap, flush and invalidate those pages. Since the glock state machine normally does that for us, mostly the code will be a no-op. Signed-off-by: Steven Whitehouse swhit...@redhat.com diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c index b7fc035..73f3e4e 100644 --- a/fs/gfs2/aops.c +++ b/fs/gfs2/aops.c @@ -986,6 +986,7 @@ static ssize_t gfs2_direct_IO(int rw, struct kiocb *iocb, { struct file *file = iocb-ki_filp; struct inode *inode = file-f_mapping-host; + struct address_space *mapping = inode-i_mapping; struct gfs2_inode *ip = GFS2_I(inode); struct gfs2_holder gh; int rv; @@ -1006,6 +1007,35 @@ static ssize_t gfs2_direct_IO(int rw, struct kiocb *iocb, if (rv != 1) goto out; /* dio not valid, fall back to buffered i/o */ + /* +* Now since we are holding a deferred (CW) lock at this point, you +* might be wondering why this is ever needed. There is a case however +* where we've granted a deferred local lock against a cached exclusive +* glock. That is ok provided all granted local locks are deferred, but +* it also means that it is possible to encounter pages which are +* cached and possibly also mapped. So here we check for that and sort +* them out ahead of the dio. The glock state machine will take care of +* everything else. +* +* If in fact the cached glock state (gl-gl_state) is deferred (CW) in +* the first place, mapping-nr_pages will always be zero. +*/ + if (mapping-nrpages) { + loff_t lstart = offset (PAGE_CACHE_SIZE - 1); + loff_t len = iov_length(iov, nr_segs); + loff_t end = PAGE_ALIGN(offset + len) - 1; + + rv = 0; + if (len == 0) + goto out; + if (test_and_clear_bit(GIF_SW_PAGED, ip-i_flags)) + unmap_shared_mapping_range(ip-i_inode.i_mapping, offset, len); + rv = filemap_write_and_wait_range(mapping, lstart, end); + if (rv) + return rv; + truncate_inode_pages_range(mapping, lstart, end); + } + rv = __blockdev_direct_IO(rw, iocb, inode, inode-i_sb-s_bdev, iov, offset, nr_segs, gfs2_get_block_direct, NULL, NULL, 0); -- 1.8.3.1
[Cluster-devel] [PATCH 6/6] GFS2: Fix unsafe dereference in dump_holder()
From: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp GLOCK_BUG_ON() might call this function without RCU read lock. Make sure that RCU read lock is held when using task_struct returned from pid_task(). Signed-off-by: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp Signed-off-by: Steven Whitehouse swhit...@redhat.com diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index c8420f7..6f7a47c 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -1655,6 +1655,7 @@ static int dump_holder(struct seq_file *seq, const struct gfs2_holder *gh) struct task_struct *gh_owner = NULL; char flags_buf[32]; + rcu_read_lock(); if (gh-gh_owner_pid) gh_owner = pid_task(gh-gh_owner_pid, PIDTYPE_PID); gfs2_print_dbg(seq, H: s:%s f:%s e:%d p:%ld [%s] %pS\n, @@ -1664,6 +1665,7 @@ static int dump_holder(struct seq_file *seq, const struct gfs2_holder *gh) gh-gh_owner_pid ? (long)pid_nr(gh-gh_owner_pid) : -1, gh_owner ? gh_owner-comm : (ended), (void *)gh-gh_ip); + rcu_read_unlock(); return 0; } -- 1.8.3.1
[Cluster-devel] GFS2: Pre-pull patch posting (fixes)
Hi, Here is a set of small fixes for GFS2. There is a fix to drop s_umount which is copied in from the core vfs, two patches relate to a hard to hit use after free and memory leak. Two patches related to using DIO and buffered I/O on the same file to ensure correct operation in relation to glock state changes. The final patch adds an RCU read lock to ensure correct locking on an error path, Steve.
[Cluster-devel] [PATCH 5/6] GFS2: Wait for async DIO in glock state changes
We need to wait for any outstanding DIO to complete in a couple of situations. Firstly, in case we are changing out of deferred mode (in inode_go_sync) where GLF_DIRTY will not be set. That call could be prefixed with a test for gl_state == LM_ST_DEFERRED but it doesn't seem worth it bearing in mind that the test for outstanding DIO is very quick anyway, in the usual case that there is none. The second case is in inode_go_lock which will catch the cases where we have a cached EX lock, but where we grant deferred locks against it so that there is no glock state transistion. We only need to wait if the state is not deferred, since DIO is valid anyway in that state. Signed-off-by: Steven Whitehouse swhit...@redhat.com diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c index db908f6..f88dcd9 100644 --- a/fs/gfs2/glops.c +++ b/fs/gfs2/glops.c @@ -192,8 +192,11 @@ static void inode_go_sync(struct gfs2_glock *gl) if (ip !S_ISREG(ip-i_inode.i_mode)) ip = NULL; - if (ip test_and_clear_bit(GIF_SW_PAGED, ip-i_flags)) - unmap_shared_mapping_range(ip-i_inode.i_mapping, 0, 0); + if (ip) { + if (test_and_clear_bit(GIF_SW_PAGED, ip-i_flags)) + unmap_shared_mapping_range(ip-i_inode.i_mapping, 0, 0); + inode_dio_wait(ip-i_inode); + } if (!test_and_clear_bit(GLF_DIRTY, gl-gl_flags)) return; @@ -410,6 +413,9 @@ static int inode_go_lock(struct gfs2_holder *gh) return error; } + if (gh-gh_state != LM_ST_DEFERRED) + inode_dio_wait(ip-i_inode); + if ((ip-i_diskflags GFS2_DIF_TRUNC_IN_PROG) (gl-gl_state == LM_ST_EXCLUSIVE) (gh-gh_state == LM_ST_EXCLUSIVE)) { -- 1.8.3.1
[Cluster-devel] GFS2: Pull request (fixes)
Hi, Please consider pulling the following bug fixes, Steve. - The following changes since commit 7e3528c3660a2e8602abc7858b0994d611f74bc3: slab.h: remove duplicate kmalloc declaration and fix kernel-doc warnings (2013-11-24 11:01:16 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes.git tags/gfs2-fixes for you to fetch changes up to 0b3a2c9968d453d5827e635a6f3d69129f70af66: GFS2: Fix unsafe dereference in dump_holder() (2014-01-02 12:18:04 +) Here is a set of small fixes for GFS2. There is a fix to drop s_umount which is copied in from the core vfs, two patches relate to a hard to hit use after free and memory leak. Two patches related to using DIO and buffered I/O on the same file to ensure correct operation in relation to glock state changes. The final patch adds an RCU read lock to ensure correct locking on an error path. Bob Peterson (2): GFS2: Fix use-after-free race when calling gfs2_remove_from_ail GFS2: Fix slab memory leak in gfs2_bufdata Steven Whitehouse (3): GFS2: don't hold s_umount over blkdev_put GFS2: Fix incorrect invalidation for DIO/buffered I/O GFS2: Wait for async DIO in glock state changes Tetsuo Handa (1): GFS2: Fix unsafe dereference in dump_holder() fs/gfs2/aops.c | 30 ++ fs/gfs2/glock.c | 2 ++ fs/gfs2/glops.c | 10 -- fs/gfs2/log.c| 4 ++-- fs/gfs2/meta_io.c| 5 + fs/gfs2/ops_fstype.c | 12 +++- 6 files changed, 58 insertions(+), 5 deletions(-) signature.asc Description: This is a digitally signed message part