[Cluster-devel] [PATCH 01/15] GFS2: directly return gfs2_dir_check()

2014-12-08 Thread Steven Whitehouse
From: Fabian Frederick f...@skynet.be

No need to store gfs2_dir_check result and test it before returning.

Signed-off-by: Fabian Frederick f...@skynet.be
Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index c4ed823..b41b5c7 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1045,11 +1045,7 @@ static int gfs2_unlink_ok(struct gfs2_inode *dip, const struct qstr *name,
 	if (error)
 		return error;
 
-	error = gfs2_dir_check(&dip->i_inode, name, ip);
-	if (error)
-		return error;
-
-	return 0;
+	return gfs2_dir_check(&dip->i_inode, name, ip);
 }
 
 /**
-- 
1.8.3.1



[Cluster-devel] [PATCH 07/15] GFS2: Update timestamps on fallocate

2014-12-08 Thread Steven Whitehouse
From: Andrew Price anpr...@redhat.com

gfs2_fallocate() wasn't updating ctime and mtime when modifying the
inode. Add a call to file_update_time() to do that.

Signed-off-by: Andrew Price anpr...@redhat.com
Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 1182164..6e600ab 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -874,7 +874,8 @@ retry:
 
 	if (!(mode & FALLOC_FL_KEEP_SIZE) && (pos + count) > inode->i_size) {
 		i_size_write(inode, pos + count);
-		mark_inode_dirty(inode);
+		/* Marks the inode as dirty */
+		file_update_time(file);
 	}
 
 	return generic_write_sync(file, pos, count);
-- 
1.8.3.1



[Cluster-devel] GFS2: Pre-pull patch posting (merge window)

2014-12-08 Thread Steven Whitehouse
Hi,

In contrast to recent merge windows, there are a number of interesting features
this time. There is a set of patches to improve performance in relation to
block reservations, some correctness fixes for fallocate, and an update
to the freeze/thaw code which greatly simplifies this code path. In
addition there is a set of clean ups from Al Viro too.

Steve.



[Cluster-devel] [PATCH 10/15] GFS2: Deletion of unnecessary checks before two function calls

2014-12-08 Thread Steven Whitehouse
From: Markus Elfring elfr...@users.sourceforge.net

The functions iput() and put_pid() test whether their argument is NULL
and then return immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring elfr...@users.sourceforge.net
Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 8f0c19d..a23524a 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -836,8 +836,7 @@ void gfs2_holder_reinit(unsigned int state, unsigned flags, struct gfs2_holder *
 	gh->gh_flags = flags;
 	gh->gh_iflags = 0;
 	gh->gh_ip = _RET_IP_;
-	if (gh->gh_owner_pid)
-		put_pid(gh->gh_owner_pid);
+	put_pid(gh->gh_owner_pid);
 	gh->gh_owner_pid = get_pid(task_pid(current));
 }
 
-- 
1.8.3.1
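
For readers outside the kernel tree, the reasoning in this patch can be illustrated with a small userspace sketch. The names ref_obj, ref_put, release_checked and release_unchecked are hypothetical stand-ins, not kernel APIs; the only property borrowed from the commit message is that the release function returns immediately on a NULL argument.

```c
#include <stddef.h>

/* Hypothetical refcounted object, standing in for struct pid. */
struct ref_obj {
	int count;
};

/* Like put_pid() as described above: a NULL argument is a no-op. */
static void ref_put(struct ref_obj *obj)
{
	if (!obj)
		return;
	obj->count--;
}

/* Old style: the caller repeats the NULL test that ref_put() already does. */
static void release_checked(struct ref_obj *obj)
{
	if (obj)
		ref_put(obj);
}

/* New style: rely on the callee's NULL test. */
static void release_unchecked(struct ref_obj *obj)
{
	ref_put(obj);
}
```

Both helpers behave identically for NULL and non-NULL arguments, which is why the outer test can be dropped.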



[Cluster-devel] [PATCH 04/15] GFS2: If we use up our block reservation, request more next time

2014-12-08 Thread Steven Whitehouse
From: Bob Peterson rpete...@redhat.com

If we run out of blocks for a given multi-block allocation, we obviously
did not reserve enough. We should reserve more blocks for the next
reservation to reduce fragmentation. This patch increases the size hint
for reservations when they run out.

Signed-off-by: Bob Peterson rpete...@redhat.com
Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index f4e4a0c..9150207 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -2251,6 +2251,9 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
 			trace_gfs2_rs(rs, TRACE_RS_CLAIM);
 			if (rs->rs_free && !ret)
 				goto out;
+			/* We used up our block reservation, so we should
+			   reserve more blocks next time. */
+			atomic_add(RGRP_RSRV_ADDBLKS, &rs->rs_sizehint);
 		}
 		__rs_deltree(rs);
 	}
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index 5d8f085..b104f4a 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -20,6 +20,7 @@
  */
 #define RGRP_RSRV_MINBYTES 8
 #define RGRP_RSRV_MINBLKS ((u32)(RGRP_RSRV_MINBYTES * GFS2_NBBY))
+#define RGRP_RSRV_ADDBLKS 64
 
 struct gfs2_rgrpd;
 struct gfs2_sbd;
-- 
1.8.3.1
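
As a rough userspace model of the idea in this patch (not the kernel code itself), the sketch below bumps a size hint whenever a claim empties the reservation. The struct reservation type and reservation_claim() are hypothetical and heavily simplified; only the constant 64 is taken from the patch, and C11 atomics stand in for the kernel's atomic_t.

```c
#include <stdatomic.h>

#define RSRV_ADDBLKS 64	/* same value as RGRP_RSRV_ADDBLKS in the patch */

/* Hypothetical, simplified reservation: blocks left plus a size hint. */
struct reservation {
	unsigned int free;
	atomic_int sizehint;
};

/*
 * Claim up to len blocks. If that uses up the reservation, raise the hint
 * so that the next reservation asks for more. Returns 1 when exhausted.
 */
static int reservation_claim(struct reservation *rs, unsigned int len)
{
	rs->free -= (len < rs->free) ? len : rs->free;
	if (rs->free)
		return 0;
	atomic_fetch_add(&rs->sizehint, RSRV_ADDBLKS);
	return 1;
}
```

The real function also considers whether the next block could be located; that condition is left out here.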



[Cluster-devel] [PATCH 06/15] GFS2: Update i_size properly on fallocate

2014-12-08 Thread Steven Whitehouse
From: Andrew Price anpr...@redhat.com

This addresses an issue caught by fsx where the inode size was not being
updated to the expected value after fallocate(2) with mode 0.

The problem was caused by the offset and len parameters being converted
to multiples of the file system's block size, so i_size would be rounded
up to the nearest block size multiple instead of the requested size.

This replaces the per-chunk i_size updates with a single i_size_write on
successful completion of the operation.  With this patch gfs2 gets
through a complete run of fsx.

For clarity, the check for (error == 0) following the loop is removed as
all failures before that point jump to out_* labels or return.

Signed-off-by: Andrew Price anpr...@redhat.com
Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 3786579..1182164 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -729,7 +729,6 @@ static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct buffer_head *dibh;
 	int error;
-	loff_t size = len;
 	unsigned int nr_blks;
 	sector_t lblock = offset >> inode->i_blkbits;
 
@@ -763,11 +762,6 @@ static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
 			goto out;
 		}
 	}
-	if (offset + size > inode->i_size && !(mode & FALLOC_FL_KEEP_SIZE))
-		i_size_write(inode, offset + size);
-
-	mark_inode_dirty(inode);
-
 out:
 	brelse(dibh);
 	return error;
@@ -878,9 +872,12 @@ retry:
 		gfs2_quota_unlock(ip);
 	}
 
-	if (error == 0)
-		error = generic_write_sync(file, pos, count);
-	return error;
+	if (!(mode & FALLOC_FL_KEEP_SIZE) && (pos + count) > inode->i_size) {
+		i_size_write(inode, pos + count);
+		mark_inode_dirty(inode);
+	}
+
+	return generic_write_sync(file, pos, count);
 
 out_trans_fail:
 	gfs2_inplace_release(ip);
-- 
1.8.3.1
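
The rounding problem described in this commit message can be reproduced with plain arithmetic. The following is a userspace sketch under the assumption of a power-of-two block size; aligned_end() and new_size() are hypothetical helper names, and the shift-based rounding mirrors the way the fallocate code computes the end of the block-aligned range.

```c
#include <stdint.h>

/*
 * End of the range after rounding up to a block boundary. Using this value
 * as the file size is what produced the too-large i_size.
 */
static int64_t aligned_end(int64_t offset, int64_t len, unsigned int bsize_shift)
{
	int64_t next = (offset + len - 1) >> bsize_shift;

	return (next + 1) << bsize_shift;
}

/*
 * Size after the fix: the requested end (pos + count), applied only when
 * the file grows and KEEP_SIZE was not requested.
 */
static int64_t new_size(int64_t cur_size, int64_t pos, int64_t count, int keep_size)
{
	if (!keep_size && pos + count > cur_size)
		return pos + count;
	return cur_size;
}
```

With 4096-byte blocks (shift 12), a request for 5000 bytes at offset 0 has an aligned end of 8192, whereas the size a caller expects after fallocate(2) with mode 0 is 5000.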



[Cluster-devel] [PATCH 05/15] GFS2: Use inode_newsize_ok and get_write_access in fallocate

2014-12-08 Thread Steven Whitehouse
From: Andrew Price anpr...@redhat.com

gfs2_fallocate was calling neither inode_newsize_ok nor get_write_access.
Split out the context setup and inode locking pieces into a separate
function to make it clearer and add these missing calls.

inode_newsize_ok is called only when FALLOC_FL_KEEP_SIZE is not set, as
there is no need to enforce a file size limit if the size isn't going to
change.

Signed-off-by: Andrew Price anpr...@redhat.com
Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 5ebe568..3786579 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -797,8 +797,7 @@ static void calc_max_reserv(struct gfs2_inode *ip, loff_t max, loff_t *len,
 	}
 }
 
-static long gfs2_fallocate(struct file *file, int mode, loff_t offset,
-			   loff_t len)
+static long __gfs2_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 {
 	struct inode *inode = file_inode(file);
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
@@ -812,14 +811,9 @@ static long gfs2_fallocate(struct file *file, int mode, loff_t offset,
 	loff_t bsize_mask = ~((loff_t)sdp->sd_sb.sb_bsize - 1);
 	loff_t next = (offset + len - 1) >> sdp->sd_sb.sb_bsize_shift;
 	loff_t max_chunk_size = UINT_MAX & bsize_mask;
-	struct gfs2_holder gh;
 
 	next = (next + 1) << sdp->sd_sb.sb_bsize_shift;
 
-	/* We only support the FALLOC_FL_KEEP_SIZE mode */
-	if (mode & ~FALLOC_FL_KEEP_SIZE)
-		return -EOPNOTSUPP;
-
 	offset &= bsize_mask;
 
 	len = next - offset;
@@ -830,17 +824,6 @@ static long gfs2_fallocate(struct file *file, int mode, loff_t offset,
 	if (bytes == 0)
 		bytes = sdp->sd_sb.sb_bsize;
 
-	error = gfs2_rs_alloc(ip);
-	if (error)
-		return error;
-
-	mutex_lock(&inode->i_mutex);
-
-	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
-	error = gfs2_glock_nq(&gh);
-	if (unlikely(error))
-		goto out_uninit;
-
 	gfs2_size_hint(file, offset, len);
 
 	while (len > 0) {
@@ -853,8 +836,7 @@ static long gfs2_fallocate(struct file *file, int mode, loff_t offset,
 		}
 		error = gfs2_quota_lock_check(ip);
 		if (error)
-			goto out_unlock;
-
+			return error;
 retry:
 		gfs2_write_calc_reserv(ip, bytes, &data_blocks, &ind_blocks);
 
@@ -898,18 +880,58 @@ retry:
 
 	if (error == 0)
 		error = generic_write_sync(file, pos, count);
-	goto out_unlock;
+	return error;
 
 out_trans_fail:
 	gfs2_inplace_release(ip);
 out_qunlock:
 	gfs2_quota_unlock(ip);
+	return error;
+}
+
+static long gfs2_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
+{
+	struct inode *inode = file_inode(file);
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_holder gh;
+	int ret;
+
+	if (mode & ~FALLOC_FL_KEEP_SIZE)
+		return -EOPNOTSUPP;
+
+	mutex_lock(&inode->i_mutex);
+
+	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
+	ret = gfs2_glock_nq(&gh);
+	if (ret)
+		goto out_uninit;
+
+	if (!(mode & FALLOC_FL_KEEP_SIZE) &&
+	    (offset + len) > inode->i_size) {
+		ret = inode_newsize_ok(inode, offset + len);
+		if (ret)
+			goto out_unlock;
+	}
+
+	ret = get_write_access(inode);
+	if (ret)
+		goto out_unlock;
+
+	ret = gfs2_rs_alloc(ip);
+	if (ret)
+		goto out_putw;
+
+	ret = __gfs2_fallocate(file, mode, offset, len);
+	if (ret)
+		gfs2_rs_deltree(ip->i_res);
+out_putw:
+	put_write_access(inode);
 out_unlock:
 	gfs2_glock_dq(&gh);
 out_uninit:
 	gfs2_holder_uninit(&gh);
 	mutex_unlock(&inode->i_mutex);
-	return error;
+	return ret;
 }
 
 #ifdef CONFIG_GFS2_FS_LOCKING_DLM
-- 
1.8.3.1
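
The new wrapper relies on the usual goto-based unwinding, where each failure jumps to the label that undoes only the steps already taken. The sketch below is a userspace illustration of that control flow, not the gfs2 code: the step enum, the trace buffer and fallocate_outline() are all hypothetical, and each kernel call is replaced by a note in a trace string so the unwind order can be inspected.

```c
#include <string.h>

/* Hypothetical failure points, one per fallible step in the wrapper. */
enum step { STEP_LOCK, STEP_SIZE_OK, STEP_WRITE_ACCESS, STEP_RS_ALLOC, STEP_NONE };

static char trace[128];

/* Append one word to the trace. */
static void note(const char *s)
{
	strcat(trace, s);
	strcat(trace, " ");
}

/* A step fails only if it is the one selected by fail_at. */
static int do_step(enum step s, enum step fail_at)
{
	return s == fail_at ? -1 : 0;
}

static int fallocate_outline(enum step fail_at)
{
	int ret;

	trace[0] = '\0';
	note("mutex_lock");
	ret = do_step(STEP_LOCK, fail_at);		/* cluster lock */
	if (ret)
		goto out_uninit;
	note("glock_nq");
	ret = do_step(STEP_SIZE_OK, fail_at);		/* size limit check */
	if (ret)
		goto out_unlock;
	ret = do_step(STEP_WRITE_ACCESS, fail_at);	/* write access */
	if (ret)
		goto out_unlock;
	note("get_write");
	ret = do_step(STEP_RS_ALLOC, fail_at);		/* reservation setup */
	if (ret)
		goto out_putw;
	note("work");
out_putw:
	note("put_write");
out_unlock:
	note("glock_dq");
out_uninit:
	note("mutex_unlock");
	return ret;
}
```

Failing at the write-access step, for example, leaves a trace that drops the cluster lock and the mutex but never records put_write, since write access was never obtained.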



[Cluster-devel] [PATCH 03/15] GFS2: Only increase rs_sizehint

2014-12-08 Thread Steven Whitehouse
From: Bob Peterson rpete...@redhat.com

If an application does a sequence of (1) big write, (2) little write
we don't necessarily want to reset the size hint based on the smaller
size. The fact that they did any big writes implies they may do more,
and therefore we should try to allocate bigger block reservations, even
if the last few were small writes. Therefore this patch changes function
gfs2_size_hint so that the size hint can only grow; it cannot shrink.
This is especially important where there are multiple writers.

Signed-off-by: Bob Peterson rpete...@redhat.com
Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 80dd44d..5ebe568 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -337,7 +337,8 @@ static void gfs2_size_hint(struct file *filep, loff_t offset, size_t size)
 	size_t blks = (size + sdp->sd_sb.sb_bsize - 1) >> sdp->sd_sb.sb_bsize_shift;
 	int hint = min_t(size_t, INT_MAX, blks);
 
-	atomic_set(&ip->i_res->rs_sizehint, hint);
+	if (hint > atomic_read(&ip->i_res->rs_sizehint))
+		atomic_set(&ip->i_res->rs_sizehint, hint);
 }
 
 /**
-- 
1.8.3.1
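
A minimal userspace sketch of the grow-only behaviour, assuming C11 atomics in place of the kernel's atomic_t. The variable size_hint and the function size_hint_update() are hypothetical names used only for illustration.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the rs_sizehint field. */
static atomic_int size_hint;

/* Same shape as the patched logic: raise the hint, never lower it. */
static void size_hint_update(int blocks)
{
	if (blocks > atomic_load(&size_hint))
		atomic_store(&size_hint, blocks);
}
```

As in the patch, the read and the store are two separate atomic operations rather than one compare-and-swap, so two racing writers can still overwrite each other; for a hint that is presumably acceptable.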



[Cluster-devel] [PATCH 11/15] GFS2: bugger off early if O_CREAT open finds a directory

2014-12-08 Thread Steven Whitehouse
From: Al Viro v...@zeniv.linux.org.uk

Signed-off-by: Al Viro v...@zeniv.linux.org.uk
Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 04065e5..f41b2fd 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -624,6 +624,11 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	inode = gfs2_dir_search(dir, &dentry->d_name, !S_ISREG(mode) || excl);
 	error = PTR_ERR(inode);
 	if (!IS_ERR(inode)) {
+		if (S_ISDIR(inode->i_mode)) {
+			iput(inode);
+			inode = ERR_PTR(-EISDIR);
+			goto fail_gunlock;
+		}
 		d = d_splice_alias(inode, dentry);
 		error = PTR_ERR(d);
 		if (IS_ERR(d)) {
-- 
1.8.3.1



[Cluster-devel] [PATCH 02/15] GFS2: Set of distributed preferences for rgrps

2014-12-08 Thread Steven Whitehouse
From: Bob Peterson rpete...@redhat.com

This patch tries to use the journal numbers to evenly distribute
which node prefers which resource group for block allocations. This
is to help performance.

Signed-off-by: Bob Peterson rpete...@redhat.com
Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 39e7e99..1b89918 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -97,6 +97,7 @@ struct gfs2_rgrpd {
 #define GFS2_RDF_CHECK		0x10000000 /* check for unlinked inodes */
 #define GFS2_RDF_UPTODATE	0x20000000 /* rg is up to date */
 #define GFS2_RDF_ERROR		0x40000000 /* error in rg */
+#define GFS2_RDF_PREFERRED	0x80000000 /* This rgrp is preferred */
 #define GFS2_RDF_MASK		0xf0000000 /* mask for internal flags */
 	spinlock_t rd_rsspin;           /* protects reservation related vars */
 	struct rb_root rd_rstree;       /* multi-block reservation tree */
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 7474c41..f4e4a0c 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -936,7 +936,7 @@ static int read_rindex_entry(struct gfs2_inode *ip)
 	rgd->rd_gl->gl_vm.start = rgd->rd_addr * bsize;
 	rgd->rd_gl->gl_vm.end = rgd->rd_gl->gl_vm.start + (rgd->rd_length * bsize) - 1;
 	rgd->rd_rgl = (struct gfs2_rgrp_lvb *)rgd->rd_gl->gl_lksb.sb_lvbptr;
-	rgd->rd_flags &= ~GFS2_RDF_UPTODATE;
+	rgd->rd_flags &= ~(GFS2_RDF_UPTODATE | GFS2_RDF_PREFERRED);
 	if (rgd->rd_data > sdp->sd_max_rg_data)
 		sdp->sd_max_rg_data = rgd->rd_data;
 	spin_lock(&sdp->sd_rindex_spin);
@@ -955,6 +955,36 @@ fail:
 }
 
 /**
+ * set_rgrp_preferences - Run all the rgrps, selecting some we prefer to use
+ * @sdp: the GFS2 superblock
+ *
+ * The purpose of this function is to select a subset of the resource groups
+ * and mark them as PREFERRED. We do it in such a way that each node prefers
+ * to use a unique set of rgrps to minimize glock contention.
+ */
+static void set_rgrp_preferences(struct gfs2_sbd *sdp)
+{
+	struct gfs2_rgrpd *rgd, *first;
+	int i;
+
+	/* Skip an initial number of rgrps, based on this node's journal ID.
+	   That should start each node out on its own set. */
+	rgd = gfs2_rgrpd_get_first(sdp);
+	for (i = 0; i < sdp->sd_lockstruct.ls_jid; i++)
+		rgd = gfs2_rgrpd_get_next(rgd);
+	first = rgd;
+
+	do {
+		rgd->rd_flags |= GFS2_RDF_PREFERRED;
+		for (i = 0; i < sdp->sd_journals; i++) {
+			rgd = gfs2_rgrpd_get_next(rgd);
+			if (rgd == first)
+				break;
+		}
+	} while (rgd != first);
+}
+
+/**
  * gfs2_ri_update - Pull in a new resource index from the disk
  * @ip: pointer to the rindex inode
  *
@@ -973,6 +1003,8 @@ static int gfs2_ri_update(struct gfs2_inode *ip)
 	if (error < 0)
 		return error;
 
+	set_rgrp_preferences(sdp);
+
 	sdp->sd_rindex_uptodate = 1;
 	return 0;
 }
@@ -1891,6 +1923,25 @@ static bool gfs2_select_rgrp(struct gfs2_rgrpd **pos, const struct gfs2_rgrpd *b
 }
 
 /**
+ * fast_to_acquire - determine if a resource group will be fast to acquire
+ *
+ * If this is one of our preferred rgrps, it should be quicker to acquire,
+ * because we tried to set ourselves up as dlm lock master.
+ */
+static inline int fast_to_acquire(struct gfs2_rgrpd *rgd)
+{
+	struct gfs2_glock *gl = rgd->rd_gl;
+
+	if (gl->gl_state != LM_ST_UNLOCKED && list_empty(&gl->gl_holders) &&
+	    !test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) &&
+	    !test_bit(GLF_DEMOTE, &gl->gl_flags))
+		return 1;
+	if (rgd->rd_flags & GFS2_RDF_PREFERRED)
+		return 1;
+	return 0;
+}
+
+/**
  * gfs2_inplace_reserve - Reserve space in the filesystem
  * @ip: the inode to reserve space for
  * @ap: the allocation parameters
@@ -1932,10 +1983,15 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, const struct gfs2_alloc_parms *a
 			rg_locked = 0;
 			if (skip && skip--)
 				goto next_rgrp;
-			if (!gfs2_rs_active(rs) && (loops < 2) &&
-			     gfs2_rgrp_used_recently(rs, 1000) &&
-			     gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
-				goto next_rgrp;
+			if (!gfs2_rs_active(rs)) {
+				if (loops == 0 &&
+				    !fast_to_acquire(rs->rs_rbm.rgd))
+					goto next_rgrp;
+				if ((loops < 2) &&
+				    gfs2_rgrp_used_recently(rs, 1000) &&
+				    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
+					goto next_rgrp;
+			}
 			error = gfs2_glock_nq_init(rs->rs_rbm.rgd->rd_gl,
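
The selection loop in set_rgrp_preferences() can be modelled in userspace to see which resource groups each node ends up preferring. This is only a sketch: the kernel walks a list of rgrps that wraps around, whereas the hypothetical set_preferences() below uses an array with modular indexing, and the parameter names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Mark the resource groups that node `jid` prefers when the cluster has
 * `journals` journals: start at index jid, then take every journals-th
 * group until the walk wraps around to the starting point.
 */
static void set_preferences(bool *preferred, size_t nr_rgrps,
			    unsigned int jid, unsigned int journals)
{
	size_t first, pos;
	unsigned int i;

	if (nr_rgrps == 0 || journals == 0)
		return;
	for (pos = 0; pos < nr_rgrps; pos++)
		preferred[pos] = false;

	first = jid % nr_rgrps;
	pos = first;
	do {
		preferred[pos] = true;
		for (i = 0; i < journals; i++) {
			pos = (pos + 1) % nr_rgrps;
			if (pos == first)
				break;
		}
	} while (pos != first);
}
```

With 6 resource groups and 3 journals, node 0 gets groups 0 and 3, node 1 gets 1 and 4, and node 2 gets 2 and 5. In this model the sets can overlap after the wraparound when the number of groups is not a multiple of the number of journals.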
   

[Cluster-devel] [PATCH 14/15] GFS2: gfs2_dir_get_hash_table(): avoiding deferred vfree() is easy here...

2014-12-08 Thread Steven Whitehouse
From: Al Viro v...@zeniv.linux.org.uk

vfree() is allowed under spinlock these days, but it's cheaper when
it doesn't step into deferred case and here it's very easy to avoid.

Signed-off-by: Al Viro v...@zeniv.linux.org.uk
Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index c247fed..c5a34f0 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -370,11 +370,12 @@ static __be64 *gfs2_dir_get_hash_table(struct gfs2_inode *ip)
 	}
 
 	spin_lock(&inode->i_lock);
-	if (ip->i_hash_cache)
-		kvfree(hc);
-	else
+	if (likely(!ip->i_hash_cache)) {
 		ip->i_hash_cache = hc;
+		hc = NULL;
+	}
 	spin_unlock(&inode->i_lock);
+	kvfree(hc);
 
 	return ip->i_hash_cache;
 }
-- 
1.8.3.1



[Cluster-devel] [PATCH 08/15] fs: add freeze_super/thaw_super fs hooks

2014-12-08 Thread Steven Whitehouse
From: Benjamin Marzinski bmarz...@redhat.com

Currently, freezing a filesystem involves calling freeze_super, which locks
sb->s_umount and then calls the fs-specific freeze_fs hook. This makes it
hard for gfs2 (and potentially other cluster filesystems) to use the vfs
freezing code to do freezes on all the cluster nodes.

In order to communicate that a freeze has been requested, and to make sure
that only one node is trying to freeze at a time, gfs2 uses a glock
(sd_freeze_gl). The problem is that there is no hook for gfs2 to acquire
this lock before calling freeze_super. This means that two nodes can
attempt to freeze the filesystem by both calling freeze_super, acquiring
the sb->s_umount lock, and then attempting to grab the cluster glock
sd_freeze_gl. Only one will succeed, and the other will be stuck in
freeze_super, making it impossible to finish freezing the node.

To solve this problem, this patch adds the freeze_super and thaw_super
hooks.  If a filesystem implements these hooks, they are called instead of
the vfs freeze_super and thaw_super functions. This means that every
filesystem that implements these hooks must call the vfs freeze_super and
thaw_super functions itself within the hook function to make use of the vfs
freezing code.

Reviewed-by: Jan Kara j...@suse.cz
Signed-off-by: Benjamin Marzinski bmarz...@redhat.com
Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 1d9c9f3..b48c41b 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -235,7 +235,10 @@ struct super_block *freeze_bdev(struct block_device *bdev)
 	sb = get_active_super(bdev);
 	if (!sb)
 		goto out;
-	error = freeze_super(sb);
+	if (sb->s_op->freeze_super)
+		error = sb->s_op->freeze_super(sb);
+	else
+		error = freeze_super(sb);
 	if (error) {
 		deactivate_super(sb);
 		bdev->bd_fsfreeze_count--;
@@ -272,7 +275,10 @@ int thaw_bdev(struct block_device *bdev, struct super_block *sb)
 	if (!sb)
 		goto out;
 
-	error = thaw_super(sb);
+	if (sb->s_op->thaw_super)
+		error = sb->s_op->thaw_super(sb);
+	else
+		error = thaw_super(sb);
 	if (error) {
 		bdev->bd_fsfreeze_count++;
 		mutex_unlock(&bdev->bd_fsfreeze_mutex);
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 8ac3fad..77c9a78 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -518,10 +518,12 @@ static int ioctl_fsfreeze(struct file *filp)
 		return -EPERM;
 
 	/* If filesystem doesn't support freeze feature, return. */
-	if (sb->s_op->freeze_fs == NULL)
+	if (sb->s_op->freeze_fs == NULL && sb->s_op->freeze_super == NULL)
 		return -EOPNOTSUPP;
 
 	/* Freeze */
+	if (sb->s_op->freeze_super)
+		return sb->s_op->freeze_super(sb);
 	return freeze_super(sb);
 }
 
@@ -533,6 +535,8 @@ static int ioctl_fsthaw(struct file *filp)
 		return -EPERM;
 
 	/* Thaw */
+	if (sb->s_op->thaw_super)
+		return sb->s_op->thaw_super(sb);
 	return thaw_super(sb);
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9ab779e..b4a1d73c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1577,7 +1577,9 @@ struct super_operations {
 	void (*evict_inode) (struct inode *);
 	void (*put_super) (struct super_block *);
 	int (*sync_fs)(struct super_block *sb, int wait);
+	int (*freeze_super) (struct super_block *);
 	int (*freeze_fs) (struct super_block *);
+	int (*thaw_super) (struct super_block *);
 	int (*unfreeze_fs) (struct super_block *);
 	int (*statfs) (struct dentry *, struct kstatfs *);
 	int (*remount_fs) (struct super_block *, int *, char *);
-- 
1.8.3.1
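
The dispatch rule this patch adds (use the filesystem's hook if it has one, otherwise fall back to the generic routine) can be sketched in userspace as follows. All types and functions here are hypothetical miniatures: struct sb, struct sb_ops, generic_freeze(), do_freeze() and cluster_freeze() merely stand in for the super block, its operations table, the vfs freeze_super(), the ioctl path and a cluster filesystem's hook.

```c
#include <errno.h>
#include <stddef.h>

struct sb;

/* Miniature operations table with an optional freeze_super override. */
struct sb_ops {
	int (*freeze_super)(struct sb *);
	int (*freeze_fs)(struct sb *);
};

struct sb {
	const struct sb_ops *ops;
	int generic_calls;	/* times the generic routine ran */
	int hook_calls;		/* times the fs hook ran */
};

/* Stand-in for the vfs freeze_super(). */
static int generic_freeze(struct sb *sb)
{
	sb->generic_calls++;
	return 0;
}

/* Dispatch in the same order as the patched ioctl_fsfreeze(). */
static int do_freeze(struct sb *sb)
{
	if (sb->ops->freeze_fs == NULL && sb->ops->freeze_super == NULL)
		return -EOPNOTSUPP;
	if (sb->ops->freeze_super)
		return sb->ops->freeze_super(sb);
	return generic_freeze(sb);
}

/* A hook does its own work first (e.g. taking a cluster lock, elided
 * here) and must then call the generic routine itself. */
static int cluster_freeze(struct sb *sb)
{
	sb->hook_calls++;
	return generic_freeze(sb);
}

static int noop_freeze_fs(struct sb *sb)
{
	(void)sb;
	return 0;
}
```

The point of the design is ordering: because the hook runs instead of the generic code, the filesystem decides what happens before the generic freeze takes its locks.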



[Cluster-devel] [PATCH 15/15] GFS2: gfs2_atomic_open(): simplify the use of finish_no_open()

2014-12-08 Thread Steven Whitehouse
From: Al Viro v...@zeniv.linux.org.uk

In ->atomic_open(inode, dentry, file, opened) calling finish_no_open(file, NULL)
is equivalent to dget(dentry); return finish_no_open(file, dentry);

No need to open-code that...

Signed-off-by: Al Viro v...@zeniv.linux.org.uk
Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 9e8545b..9054002 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1245,11 +1245,8 @@ static int gfs2_atomic_open(struct inode *dir, struct dentry *dentry,
 	if (d != NULL)
 		dentry = d;
 	if (dentry->d_inode) {
-		if (!(*opened & FILE_OPENED)) {
-			if (d == NULL)
-				dget(dentry);
-			return finish_no_open(file, dentry);
-		}
+		if (!(*opened & FILE_OPENED))
+			return finish_no_open(file, d);
 		dput(d);
 		return 0;
 	}
-- 
1.8.3.1



[Cluster-devel] [PATCH 12/15] GFS2: gfs2_create_inode(): don't bother with d_splice_alias()

2014-12-08 Thread Steven Whitehouse
From: Al Viro v...@zeniv.linux.org.uk

dentry is always hashed and negative, inode - non-error, non-NULL and
non-directory.  In such conditions d_splice_alias() is equivalent to
"d_instantiate(dentry, inode) and return NULL", which simplifies the
downstream code and is consistent with the "have to create a new object"
case.

Signed-off-by: Al Viro v...@zeniv.linux.org.uk
Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index f41b2fd..9e8545b 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -596,7 +596,6 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	struct gfs2_inode *dip = GFS2_I(dir), *ip;
 	struct gfs2_sbd *sdp = GFS2_SB(&dip->i_inode);
 	struct gfs2_glock *io_gl;
-	struct dentry *d;
 	int error, free_vfs_inode = 0;
 	u32 aflags = 0;
 	unsigned blocks = 1;
@@ -629,22 +628,13 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 			inode = ERR_PTR(-EISDIR);
 			goto fail_gunlock;
 		}
-		d = d_splice_alias(inode, dentry);
-		error = PTR_ERR(d);
-		if (IS_ERR(d)) {
-			inode = ERR_CAST(d);
-			goto fail_gunlock;
-		}
+		d_instantiate(dentry, inode);
 		error = 0;
 		if (file) {
-			if (S_ISREG(inode->i_mode)) {
-				WARN_ON(d != NULL);
+			if (S_ISREG(inode->i_mode))
 				error = finish_open(file, dentry, gfs2_open_common, opened);
-			} else {
-				error = finish_no_open(file, d);
-			}
-		} else {
-			dput(d);
+			else
+				error = finish_no_open(file, NULL);
 		}
 		gfs2_glock_dq_uninit(ghs);
 		return error;
-- 
1.8.3.1



[Cluster-devel] [PATCH 13/15] GFS2: use kvfree() instead of open-coding it

2014-12-08 Thread Steven Whitehouse
From: Al Viro v...@zeniv.linux.org.uk

Signed-off-by: Al Viro v...@zeniv.linux.org.uk
Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index 5d4261f..c247fed 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -365,22 +365,15 @@ static __be64 *gfs2_dir_get_hash_table(struct gfs2_inode *ip)
 
 	ret = gfs2_dir_read_data(ip, hc, hsize);
 	if (ret < 0) {
-		if (is_vmalloc_addr(hc))
-			vfree(hc);
-		else
-			kfree(hc);
+		kvfree(hc);
 		return ERR_PTR(ret);
 	}
 
 	spin_lock(&inode->i_lock);
-	if (ip->i_hash_cache) {
-		if (is_vmalloc_addr(hc))
-			vfree(hc);
-		else
-			kfree(hc);
-	} else {
+	if (ip->i_hash_cache)
+		kvfree(hc);
+	else
 		ip->i_hash_cache = hc;
-	}
 	spin_unlock(&inode->i_lock);
 
 	return ip->i_hash_cache;
@@ -396,10 +389,7 @@ void gfs2_dir_hash_inval(struct gfs2_inode *ip)
 {
 	__be64 *hc = ip->i_hash_cache;
 	ip->i_hash_cache = NULL;
-	if (is_vmalloc_addr(hc))
-		vfree(hc);
-	else
-		kfree(hc);
+	kvfree(hc);
 }
 
 static inline int gfs2_dirent_sentinel(const struct gfs2_dirent *dent)
@@ -1168,10 +1158,7 @@ fail:
 	gfs2_dinode_out(dip, dibh->b_data);
 	brelse(dibh);
 out_kfree:
-	if (is_vmalloc_addr(hc2))
-		vfree(hc2);
-	else
-		kfree(hc2);
+	kvfree(hc2);
 	return error;
 }
 
@@ -1302,14 +1289,6 @@ static void *gfs2_alloc_sort_buffer(unsigned size)
 	return ptr;
 }
 
-static void gfs2_free_sort_buffer(void *ptr)
-{
-	if (is_vmalloc_addr(ptr))
-		vfree(ptr);
-	else
-		kfree(ptr);
-}
-
 static int gfs2_dir_read_leaf(struct inode *inode, struct dir_context *ctx,
 			      int *copied, unsigned *depth,
 			      u64 leaf_no)
@@ -1393,7 +1372,7 @@ static int gfs2_dir_read_leaf(struct inode *inode, struct dir_context *ctx,
 out_free:
 	for(i = 0; i < leaf; i++)
 		brelse(larr[i]);
-	gfs2_free_sort_buffer(larr);
+	kvfree(larr);
 out:
 	return error;
 }
@@ -2004,10 +1983,7 @@ out_rlist:
 	gfs2_rlist_free(rlist);
 	gfs2_quota_unhold(dip);
 out:
-	if (is_vmalloc_addr(ht))
-		vfree(ht);
-	else
-		kfree(ht);
+	kvfree(ht);
 	return error;
 }
 
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 64b29f7..c8b148b 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1360,13 +1360,8 @@ void gfs2_quota_cleanup(struct gfs2_sbd *sdp)
 
 	gfs2_assert_warn(sdp, !atomic_read(&sdp->sd_quota_count));
 
-	if (sdp->sd_quota_bitmap) {
-		if (is_vmalloc_addr(sdp->sd_quota_bitmap))
-			vfree(sdp->sd_quota_bitmap);
-		else
-			kfree(sdp->sd_quota_bitmap);
-		sdp->sd_quota_bitmap = NULL;
-	}
+	kvfree(sdp->sd_quota_bitmap);
+	sdp->sd_quota_bitmap = NULL;
 }
 
 static void quotad_error(struct gfs2_sbd *sdp, const char *msg, int error)
-- 
1.8.3.1



[Cluster-devel] [PATCH 09/15] GFS2: update freeze code to use freeze/thaw_super on all nodes

2014-12-08 Thread Steven Whitehouse
From: Benjamin Marzinski bmarz...@redhat.com

The current gfs2 freezing code is considerably more complicated than it
should be because it doesn't use the vfs freezing code on any node except
the one that begins the freeze.  This is because it needs to acquire a
cluster glock before calling the vfs code to prevent a deadlock, and
without the new freeze_super and thaw_super hooks, that was impossible. To
deal with the issue, gfs2 had to do some hacky locking tricks to make sure
that a frozen node couldn't be holding on to a lock it needed to do the
unfreeze ioctl.

This patch makes use of the new hooks to simplify the gfs2 locking code. Now,
all the nodes in the cluster freeze and thaw in exactly the same way. Every
node in the cluster caches the freeze glock in the shared state.  The new
freeze_super hook allows the freezing node to grab this freeze glock in
the exclusive state without first calling the vfs freeze_super function.
All the nodes in the cluster see this lock change, and call the vfs
freeze_super function. The vfs locking code guarantees that the nodes can't
get stuck holding the glocks necessary to unfreeze the system.  To
unfreeze, the freezing node uses the new thaw_super hook to drop the freeze
glock. Again, all the nodes notice this, reacquire the glock in shared mode
and call the vfs thaw_super function.

Signed-off-by: Benjamin Marzinski bmarz...@redhat.com
Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 1cc0bba..fe91951 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -28,6 +28,8 @@
 #include "trans.h"
 #include "dir.h"
 
+struct workqueue_struct *gfs2_freeze_wq;
+
 static void gfs2_ail_error(struct gfs2_glock *gl, const struct buffer_head *bh)
 {
 	fs_err(gl->gl_sbd, "AIL buffer %p: blocknr %llu state 0x%08lx mapping %p page state 0x%lx\n",
@@ -94,11 +96,8 @@ static void gfs2_ail_empty_gl(struct gfs2_glock *gl)
 	 * on the stack */
 	tr.tr_reserved = 1 + gfs2_struct2blk(sdp, tr.tr_revokes, sizeof(u64));
 	tr.tr_ip = _RET_IP_;
-	sb_start_intwrite(sdp->sd_vfs);
-	if (gfs2_log_reserve(sdp, tr.tr_reserved) < 0) {
-		sb_end_intwrite(sdp->sd_vfs);
+	if (gfs2_log_reserve(sdp, tr.tr_reserved) < 0)
 		return;
-	}
 	WARN_ON_ONCE(current->journal_info);
 	current->journal_info = &tr;
 
@@ -469,20 +468,19 @@ static void inode_go_dump(struct seq_file *seq, const struct gfs2_glock *gl)
 
 static void freeze_go_sync(struct gfs2_glock *gl)
 {
+	int error = 0;
 	struct gfs2_sbd *sdp = gl->gl_sbd;
-	DEFINE_WAIT(wait);
 
 	if (gl->gl_state == LM_ST_SHARED &&
 	    test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags)) {
-		atomic_set(&sdp->sd_log_freeze, 1);
-		wake_up(&sdp->sd_logd_waitq);
-		do {
-			prepare_to_wait(&sdp->sd_log_frozen_wait, &wait,
-					TASK_UNINTERRUPTIBLE);
-			if (atomic_read(&sdp->sd_log_freeze))
-				io_schedule();
-		} while(atomic_read(&sdp->sd_log_freeze));
-		finish_wait(&sdp->sd_log_frozen_wait, &wait);
+		atomic_set(&sdp->sd_freeze_state, SFS_STARTING_FREEZE);
+		error = freeze_super(sdp->sd_vfs);
+		if (error) {
+			printk(KERN_INFO "GFS2: couldn't freeze filesystem: %d\n", error);
+			gfs2_assert_withdraw(sdp, 0);
+		}
+		queue_work(gfs2_freeze_wq, &sdp->sd_freeze_work);
+		gfs2_log_flush(sdp, NULL, FREEZE_FLUSH);
 	}
 }
 
diff --git a/fs/gfs2/glops.h b/fs/gfs2/glops.h
index 7455d26..8ed1857 100644
--- a/fs/gfs2/glops.h
+++ b/fs/gfs2/glops.h
@@ -12,6 +12,8 @@
 
 #include "incore.h"
 
+extern struct workqueue_struct *gfs2_freeze_wq;
+
 extern const struct gfs2_glock_operations gfs2_meta_glops;
 extern const struct gfs2_glock_operations gfs2_inode_glops;
 extern const struct gfs2_glock_operations gfs2_rgrp_glops;
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 1b89918..7a2dbbc 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -588,6 +588,12 @@ enum {
SDF_SKIP_DLM_UNLOCK = 8,
 };
 
+enum gfs2_freeze_state {
+	SFS_UNFROZEN		= 0,
+	SFS_STARTING_FREEZE	= 1,
+	SFS_FROZEN		= 2,
+};
+
 #define GFS2_FSNAME_LEN	256
 
 struct gfs2_inum_host {
@@ -685,6 +691,7 @@ struct gfs2_sbd {
struct gfs2_holder sd_live_gh;
struct gfs2_glock *sd_rename_gl;
struct gfs2_glock *sd_freeze_gl;
+   struct work_struct sd_freeze_work;
wait_queue_head_t sd_glock_wait;
atomic_t sd_glock_disposal;
struct completion sd_locking_init;
@@ -789,6 +796,9 @@ struct gfs2_sbd {
wait_queue_head_t sd_log_flush_wait;
int sd_log_error;
 
+   atomic_t sd_reserving_log;
+   wait_queue_head_t sd_reserving_log_wait;
+
unsigned int sd_log_flush_head;
  

Re: [Cluster-devel] [Pacemaker] [RFC] Organizing HA Summit 2015

2014-12-08 Thread Jan Pokorný
Hello,

it occurred to me that if you want to use the opportunity and double
as a tourist while in Brno, now is about the right time to consider
reservations and ticket purchases.
At least in some cases it is a must, e.g., Villa Tugendhat:

http://rezervace.spilberk.cz/langchange.aspx?mrsname=languageId=2returnUrl=%2Flist

On 08/09/14 12:30 +0200, Fabio M. Di Nitto wrote:
> DevConf will start Friday the 6th of Feb 2015 in Red Hat Brno offices.
>
> My suggestion would be to have a 2 days dedicated HA summit the 4th and
> the 5th of February.

-- 
Jan




[Cluster-devel] GFS2: Pull request (merge window)

2014-12-08 Thread Steven Whitehouse
Hi,

Please consider pulling the following changes,

Steve.

--

The following changes since commit 0df1f2487d2f0d04703f142813d53615d62a1da4:

  Linux 3.18-rc3 (2014-11-02 15:01:51 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw.git tags/gfs2-merge-window

for you to fetch changes up to ec7d879c457611e540cb465c25f3040facbd1185:

  GFS2: gfs2_atomic_open(): simplify the use of finish_no_open() (2014-11-20 11:18:08 +0000)


In contrast to recent merge windows, there are a number of interesting features
this time. There is a set of patches to improve performance in relation to
block reservations, some correctness fixes for fallocate, and an update
to the freeze/thaw code which greatly simplifies this code path. In
addition there is a set of clean ups from Al Viro too.


Al Viro (5):
  GFS2: bugger off early if O_CREAT open finds a directory
  GFS2: gfs2_create_inode(): don't bother with d_splice_alias()
  GFS2: use kvfree() instead of open-coding it
  GFS2: gfs2_dir_get_hash_table(): avoiding deferred vfree() is easy here...
  GFS2: gfs2_atomic_open(): simplify the use of finish_no_open()

Andrew Price (3):
  GFS2: Use inode_newsize_ok and get_write_access in fallocate
  GFS2: Update i_size properly on fallocate
  GFS2: Update timestamps on fallocate

Benjamin Marzinski (2):
  fs: add freeze_super/thaw_super fs hooks
  GFS2: update freeze code to use freeze/thaw_super on all nodes

Bob Peterson (3):
  GFS2: Set of distributed preferences for rgrps
  GFS2: Only increase rs_sizehint
  GFS2: If we use up our block reservation, request more next time

Fabian Frederick (1):
  GFS2: directly return gfs2_dir_check()

Markus Elfring (1):
  GFS2: Deletion of unnecessary checks before two function calls

 fs/block_dev.c       |  10 -
 fs/gfs2/dir.c        |  39 --
 fs/gfs2/file.c       |  83 --
 fs/gfs2/glock.c      |   3 +-
 fs/gfs2/glops.c      |  26 ++--
 fs/gfs2/glops.h      |   2 +
 fs/gfs2/incore.h     |  19 ++---
 fs/gfs2/inode.c      |  72 +
 fs/gfs2/log.c        |  42 +--
 fs/gfs2/main.c       |  11 -
 fs/gfs2/ops_fstype.c |  18 +++--
 fs/gfs2/quota.c      |   9 +
 fs/gfs2/rgrp.c       |  69 ---
 fs/gfs2/rgrp.h       |   1 +
 fs/gfs2/super.c      | 112 ++-
 fs/gfs2/super.h      |   1 +
 fs/gfs2/trans.c      |  17 ++--
 fs/ioctl.c           |   6 ++-
 include/linux/fs.h   |   2 +
 19 files changed, 315 insertions(+), 227 deletions(-)


