[Cluster-devel] [PATCH 3/6] GFS2: Fix slab memory leak in gfs2_bufdata

2014-01-02 Thread Steven Whitehouse
From: Bob Peterson rpete...@redhat.com

This patch fixes a slab memory leak that sometimes can occur
for files with a very short lifespan. The problem occurs when
a dinode is deleted before it has gotten to the journal properly.
In the leak scenario, the bd object is pinned for journal
committment (queued to the metadata buffers queue: sd_log_le_buf)
but is subsequently unpinned and dequeued before it finds its way
to the ail or the revoke queue. In this rare circumstance, the bd
object needs to be freed from slab memory, or it is forgotten.
We have to be very careful how we do it, though, because
multiple processes can call gfs2_remove_from_journal. In order to
avoid double-frees, only the process that does the unpinning is
allowed to free the bd.

Signed-off-by: Bob Peterson rpete...@redhat.com
Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index 9324150..52f177b 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -258,6 +258,7 @@ void gfs2_remove_from_journal(struct buffer_head *bh, 
struct gfs2_trans *tr, int
struct address_space *mapping = bh-b_page-mapping;
struct gfs2_sbd *sdp = gfs2_mapping2sbd(mapping);
struct gfs2_bufdata *bd = bh-b_private;
+   int was_pinned = 0;
 
if (test_clear_buffer_pinned(bh)) {
trace_gfs2_pin(bd, 0);
@@ -273,12 +274,16 @@ void gfs2_remove_from_journal(struct buffer_head *bh, 
struct gfs2_trans *tr, int
tr-tr_num_databuf_rm++;
}
tr-tr_touched = 1;
+   was_pinned = 1;
brelse(bh);
}
if (bd) {
spin_lock(sdp-sd_ail_lock);
if (bd-bd_tr) {
gfs2_trans_add_revoke(sdp, bd);
+   } else if (was_pinned) {
+   bh-b_private = NULL;
+   kmem_cache_free(gfs2_bufdata_cachep, bd);
}
spin_unlock(sdp-sd_ail_lock);
}
-- 
1.8.3.1



[Cluster-devel] [PATCH 1/6] GFS2: don't hold s_umount over blkdev_put

2014-01-02 Thread Steven Whitehouse
This is a GFS2 version of Tejun's patch:
4f331f01b9c43bf001d3ffee578a97a1e0633eac
vfs: don't hold s_umount over close_bdev_exclusive() call

In this case its blkdev_put itself that is the issue and this
patch uses the same solution of dropping and retaking s_umount.

Reported-by: Tejun Heo t...@kernel.org
Reported-by: Al Viro v...@zeniv.linux.org.uk
Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 82303b4..52fa883 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1366,8 +1366,18 @@ static struct dentry *gfs2_mount(struct file_system_type 
*fs_type, int flags,
if (IS_ERR(s))
goto error_bdev;
 
-   if (s-s_root)
+   if (s-s_root) {
+   /*
+* s_umount nests inside bd_mutex during
+* __invalidate_device().  blkdev_put() acquires
+* bd_mutex and can't be called under s_umount.  Drop
+* s_umount temporarily.  This is safe as we're
+* holding an active reference.
+*/
+   up_write(s-s_umount);
blkdev_put(bdev, mode);
+   down_write(s-s_umount);
+   }
 
memset(args, 0, sizeof(args));
args.ar_quota = GFS2_QUOTA_DEFAULT;
-- 
1.8.3.1



[Cluster-devel] [PATCH 2/6] GFS2: Fix use-after-free race when calling gfs2_remove_from_ail

2014-01-02 Thread Steven Whitehouse
From: Bob Peterson rpete...@redhat.com

Function gfs2_remove_from_ail drops the reference on the bh via
brelse. This patch fixes a race condition whereby bh is deferenced
after the brelse when setting bd-bd_blkno = bh-b_blocknr;
Under certain rare circumstances, bh might be gone or reused,
and bd-bd_blkno is set to whatever that memory happens to be,
which is often 0. Later, in gfs2_trans_add_unrevoke, that bd fails
the test bd-bd_blkno = blkno which causes it to never be freed.
The end result is that the bd is never freed from the bufdata cache,
which results in this error:
slab error in kmem_cache_destroy(): cache `gfs2_bufdata': Can't free all objects

Signed-off-by: Bob Peterson rpete...@redhat.com
Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 610613f..9dcb977 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -551,10 +551,10 @@ void gfs2_add_revoke(struct gfs2_sbd *sdp, struct 
gfs2_bufdata *bd)
struct buffer_head *bh = bd-bd_bh;
struct gfs2_glock *gl = bd-bd_gl;
 
-   gfs2_remove_from_ail(bd);
-   bd-bd_bh = NULL;
bh-b_private = NULL;
bd-bd_blkno = bh-b_blocknr;
+   gfs2_remove_from_ail(bd); /* drops ref on bh */
+   bd-bd_bh = NULL;
bd-bd_ops = gfs2_revoke_lops;
sdp-sd_log_num_revoke++;
atomic_inc(gl-gl_revokes);
-- 
1.8.3.1



[Cluster-devel] [PATCH 4/6] GFS2: Fix incorrect invalidation for DIO/buffered I/O

2014-01-02 Thread Steven Whitehouse
In patch 209806aba9d540dde3db0a5ce72307f85f33468f we allowed
local deferred locks to be granted against a cached exclusive
lock. That opened up a corner case which this patch now
fixes.

The solution to the problem is to check whether we have cached
pages each time we do direct I/O and if so to unmap, flush
and invalidate those pages. Since the glock state machine
normally does that for us, mostly the code will be a no-op.

Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index b7fc035..73f3e4e 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -986,6 +986,7 @@ static ssize_t gfs2_direct_IO(int rw, struct kiocb *iocb,
 {
struct file *file = iocb-ki_filp;
struct inode *inode = file-f_mapping-host;
+   struct address_space *mapping = inode-i_mapping;
struct gfs2_inode *ip = GFS2_I(inode);
struct gfs2_holder gh;
int rv;
@@ -1006,6 +1007,35 @@ static ssize_t gfs2_direct_IO(int rw, struct kiocb *iocb,
if (rv != 1)
goto out; /* dio not valid, fall back to buffered i/o */
 
+   /*
+* Now since we are holding a deferred (CW) lock at this point, you
+* might be wondering why this is ever needed. There is a case however
+* where we've granted a deferred local lock against a cached exclusive
+* glock. That is ok provided all granted local locks are deferred, but
+* it also means that it is possible to encounter pages which are
+* cached and possibly also mapped. So here we check for that and sort
+* them out ahead of the dio. The glock state machine will take care of
+* everything else.
+*
+* If in fact the cached glock state (gl-gl_state) is deferred (CW) in
+* the first place, mapping-nr_pages will always be zero.
+*/
+   if (mapping-nrpages) {
+   loff_t lstart = offset  (PAGE_CACHE_SIZE - 1);
+   loff_t len = iov_length(iov, nr_segs);
+   loff_t end = PAGE_ALIGN(offset + len) - 1;
+
+   rv = 0;
+   if (len == 0)
+   goto out;
+   if (test_and_clear_bit(GIF_SW_PAGED, ip-i_flags))
+   unmap_shared_mapping_range(ip-i_inode.i_mapping, 
offset, len);
+   rv = filemap_write_and_wait_range(mapping, lstart, end);
+   if (rv)
+   return rv;
+   truncate_inode_pages_range(mapping, lstart, end);
+   }
+
rv = __blockdev_direct_IO(rw, iocb, inode, inode-i_sb-s_bdev, iov,
  offset, nr_segs, gfs2_get_block_direct,
  NULL, NULL, 0);
-- 
1.8.3.1



[Cluster-devel] [PATCH 6/6] GFS2: Fix unsafe dereference in dump_holder()

2014-01-02 Thread Steven Whitehouse
From: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp

GLOCK_BUG_ON() might call this function without RCU read lock. Make sure that
RCU read lock is held when using task_struct returned from pid_task().

Signed-off-by: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp
Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index c8420f7..6f7a47c 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1655,6 +1655,7 @@ static int dump_holder(struct seq_file *seq, const struct 
gfs2_holder *gh)
struct task_struct *gh_owner = NULL;
char flags_buf[32];
 
+   rcu_read_lock();
if (gh-gh_owner_pid)
gh_owner = pid_task(gh-gh_owner_pid, PIDTYPE_PID);
gfs2_print_dbg(seq,  H: s:%s f:%s e:%d p:%ld [%s] %pS\n,
@@ -1664,6 +1665,7 @@ static int dump_holder(struct seq_file *seq, const struct 
gfs2_holder *gh)
   gh-gh_owner_pid ? (long)pid_nr(gh-gh_owner_pid) : -1,
   gh_owner ? gh_owner-comm : (ended),
   (void *)gh-gh_ip);
+   rcu_read_unlock();
return 0;
 }
 
-- 
1.8.3.1



[Cluster-devel] GFS2: Pre-pull patch posting (fixes)

2014-01-02 Thread Steven Whitehouse
Hi,

Here is a set of small fixes for GFS2. There is a fix to drop
s_umount which is copied in from the core vfs, two patches
relate to a hard to hit use after free and memory leak.
Two patches related to using DIO and buffered I/O on the same
file to ensure correct operation in relation to glock state
changes. The final patch adds an RCU read lock to ensure
correct locking on an error path,

Steve.




[Cluster-devel] [PATCH 5/6] GFS2: Wait for async DIO in glock state changes

2014-01-02 Thread Steven Whitehouse
We need to wait for any outstanding DIO to complete in a couple
of situations. Firstly, in case we are changing out of deferred
mode (in inode_go_sync) where GLF_DIRTY will not be set. That
call could be prefixed with a test for gl_state == LM_ST_DEFERRED
but it doesn't seem worth it bearing in mind that the test for
outstanding DIO is very quick anyway, in the usual case that there
is none.

The second case is in inode_go_lock which will catch the cases
where we have a cached EX lock, but where we grant deferred locks
against it so that there is no glock state transistion. We only
need to wait if the state is not deferred, since DIO is valid
anyway in that state.

Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index db908f6..f88dcd9 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -192,8 +192,11 @@ static void inode_go_sync(struct gfs2_glock *gl)
 
if (ip  !S_ISREG(ip-i_inode.i_mode))
ip = NULL;
-   if (ip  test_and_clear_bit(GIF_SW_PAGED, ip-i_flags))
-   unmap_shared_mapping_range(ip-i_inode.i_mapping, 0, 0);
+   if (ip) {
+   if (test_and_clear_bit(GIF_SW_PAGED, ip-i_flags))
+   unmap_shared_mapping_range(ip-i_inode.i_mapping, 0, 0);
+   inode_dio_wait(ip-i_inode);
+   }
if (!test_and_clear_bit(GLF_DIRTY, gl-gl_flags))
return;
 
@@ -410,6 +413,9 @@ static int inode_go_lock(struct gfs2_holder *gh)
return error;
}
 
+   if (gh-gh_state != LM_ST_DEFERRED)
+   inode_dio_wait(ip-i_inode);
+
if ((ip-i_diskflags  GFS2_DIF_TRUNC_IN_PROG) 
(gl-gl_state == LM_ST_EXCLUSIVE) 
(gh-gh_state == LM_ST_EXCLUSIVE)) {
-- 
1.8.3.1



[Cluster-devel] GFS2: Pull request (fixes)

2014-01-02 Thread Steven Whitehouse
Hi,

Please consider pulling the following bug fixes,

Steve.

-

The following changes since commit 7e3528c3660a2e8602abc7858b0994d611f74bc3:

  slab.h: remove duplicate kmalloc declaration and fix kernel-doc warnings 
(2013-11-24 11:01:16 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes.git 
tags/gfs2-fixes

for you to fetch changes up to 0b3a2c9968d453d5827e635a6f3d69129f70af66:

  GFS2: Fix unsafe dereference in dump_holder() (2014-01-02 12:18:04 +)


Here is a set of small fixes for GFS2. There is a fix to drop
s_umount which is copied in from the core vfs, two patches
relate to a hard to hit use after free and memory leak.
Two patches related to using DIO and buffered I/O on the same
file to ensure correct operation in relation to glock state
changes. The final patch adds an RCU read lock to ensure
correct locking on an error path.


Bob Peterson (2):
  GFS2: Fix use-after-free race when calling gfs2_remove_from_ail
  GFS2: Fix slab memory leak in gfs2_bufdata

Steven Whitehouse (3):
  GFS2: don't hold s_umount over blkdev_put
  GFS2: Fix incorrect invalidation for DIO/buffered I/O
  GFS2: Wait for async DIO in glock state changes

Tetsuo Handa (1):
  GFS2: Fix unsafe dereference in dump_holder()

 fs/gfs2/aops.c   | 30 ++
 fs/gfs2/glock.c  |  2 ++
 fs/gfs2/glops.c  | 10 --
 fs/gfs2/log.c|  4 ++--
 fs/gfs2/meta_io.c|  5 +
 fs/gfs2/ops_fstype.c | 12 +++-
 6 files changed, 58 insertions(+), 5 deletions(-)



signature.asc
Description: This is a digitally signed message part