[Cluster-devel] [PATCH 01/15] GFS2: directly return gfs2_dir_check()
From: Fabian Frederick <f...@skynet.be>

No need to store gfs2_dir_check result and test it before returning.

Signed-off-by: Fabian Frederick <f...@skynet.be>
Signed-off-by: Steven Whitehouse <swhit...@redhat.com>

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index c4ed823..b41b5c7 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1045,11 +1045,7 @@ static int gfs2_unlink_ok(struct gfs2_inode *dip, const struct qstr *name,
 	if (error)
 		return error;
 
-	error = gfs2_dir_check(&dip->i_inode, name, ip);
-	if (error)
-		return error;
-
-	return 0;
+	return gfs2_dir_check(&dip->i_inode, name, ip);
 }
 
 /**
-- 
1.8.3.1
[Cluster-devel] [PATCH 07/15] GFS2: Update timestamps on fallocate
From: Andrew Price <anpr...@redhat.com>

gfs2_fallocate() wasn't updating ctime and mtime when modifying the
inode. Add a call to file_update_time() to do that.

Signed-off-by: Andrew Price <anpr...@redhat.com>
Signed-off-by: Steven Whitehouse <swhit...@redhat.com>

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 1182164..6e600ab 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -874,7 +874,8 @@ retry:
 
 	if (!(mode & FALLOC_FL_KEEP_SIZE) && (pos + count) > inode->i_size) {
 		i_size_write(inode, pos + count);
-		mark_inode_dirty(inode);
+		/* Marks the inode as dirty */
+		file_update_time(file);
 	}
 
 	return generic_write_sync(file, pos, count);
-- 
1.8.3.1
[Cluster-devel] GFS2: Pre-pull patch posting (merge window)
Hi,

In contrast to recent merge windows, there are a number of interesting
features this time. There is a set of patches to improve performance in
relation to block reservations, some correctness fixes for fallocate, and
an update to the freeze/thaw code which greatly simplifies this code path.
In addition there is a set of clean ups from Al Viro too.

Steve.
[Cluster-devel] [PATCH 10/15] GFS2: Deletion of unnecessary checks before two function calls
From: Markus Elfring <elfr...@users.sourceforge.net>

The functions iput() and put_pid() test whether their argument is NULL
and then return immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfr...@users.sourceforge.net>
Signed-off-by: Steven Whitehouse <swhit...@redhat.com>

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 8f0c19d..a23524a 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -836,8 +836,7 @@ void gfs2_holder_reinit(unsigned int state, unsigned flags, struct gfs2_holder *
 	gh->gh_flags = flags;
 	gh->gh_iflags = 0;
 	gh->gh_ip = _RET_IP_;
-	if (gh->gh_owner_pid)
-		put_pid(gh->gh_owner_pid);
+	put_pid(gh->gh_owner_pid);
 	gh->gh_owner_pid = get_pid(task_pid(current));
 }
 
-- 
1.8.3.1
[Cluster-devel] [PATCH 04/15] GFS2: If we use up our block reservation, request more next time
From: Bob Peterson <rpete...@redhat.com>

If we run out of blocks for a given multi-block allocation, we obviously
did not reserve enough. We should reserve more blocks for the next
reservation to reduce fragmentation. This patch increases the size hint
for reservations when they run out.

Signed-off-by: Bob Peterson <rpete...@redhat.com>
Signed-off-by: Steven Whitehouse <swhit...@redhat.com>

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index f4e4a0c..9150207 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -2251,6 +2251,9 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
 			trace_gfs2_rs(rs, TRACE_RS_CLAIM);
 			if (rs->rs_free && !ret)
 				goto out;
+			/* We used up our block reservation, so we should
+			   reserve more blocks next time. */
+			atomic_add(RGRP_RSRV_ADDBLKS, &rs->rs_sizehint);
 		}
 		__rs_deltree(rs);
 	}
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index 5d8f085..b104f4a 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -20,6 +20,7 @@
  */
 #define RGRP_RSRV_MINBYTES 8
 #define RGRP_RSRV_MINBLKS ((u32)(RGRP_RSRV_MINBYTES * GFS2_NBBY))
+#define RGRP_RSRV_ADDBLKS 64
 
 struct gfs2_rgrpd;
 struct gfs2_sbd;
-- 
1.8.3.1
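The effect of this change on the size hint can be sketched in user space.
This is only an illustration under stated assumptions: struct rsv,
claim_blocks() and RSRV_ADDBLKS are hypothetical stand-ins (they are not
the kernel's reservation structure, gfs2_adjust_reservation() or
RGRP_RSRV_ADDBLKS), and C11 atomics take the place of the kernel's atomic_t.

```c
#include <stdatomic.h>

/* Hypothetical increment, chosen to match the value 64 used in the patch. */
#define RSRV_ADDBLKS 64

/* Hypothetical stand-in for a block reservation. */
struct rsv {
	unsigned int free;	/* blocks still available in this reservation */
	atomic_int sizehint;	/* how many blocks to ask for next time */
};

/*
 * Claim up to nblocks from the reservation and return how many were taken.
 * If the claim empties the reservation, grow the hint so that the next
 * reservation is larger.
 */
static unsigned int claim_blocks(struct rsv *rs, unsigned int nblocks)
{
	unsigned int taken = nblocks < rs->free ? nblocks : rs->free;

	rs->free -= taken;
	if (rs->free == 0)
		atomic_fetch_add(&rs->sizehint, RSRV_ADDBLKS);
	return taken;
}
```

A reservation that still has free blocks after a claim leaves the hint
alone; only running dry raises it.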
[Cluster-devel] [PATCH 06/15] GFS2: Update i_size properly on fallocate
From: Andrew Price <anpr...@redhat.com>

This addresses an issue caught by fsx where the inode size was not being
updated to the expected value after fallocate(2) with mode 0.

The problem was caused by the offset and len parameters being converted
to multiples of the file system's block size, so i_size would be rounded
up to the nearest block size multiple instead of the requested size.

This replaces the per-chunk i_size updates with a single i_size_write on
successful completion of the operation. With this patch gfs2 gets
through a complete run of fsx.

For clarity, the check for (error == 0) following the loop is removed as
all failures before that point jump to out_* labels or return.

Signed-off-by: Andrew Price <anpr...@redhat.com>
Signed-off-by: Steven Whitehouse <swhit...@redhat.com>

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 3786579..1182164 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -729,7 +729,6 @@ static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct buffer_head *dibh;
 	int error;
-	loff_t size = len;
 	unsigned int nr_blks;
 	sector_t lblock = offset >> inode->i_blkbits;
 
@@ -763,11 +762,6 @@ static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
 			goto out;
 		}
 	}
-	if (offset + size > inode->i_size && !(mode & FALLOC_FL_KEEP_SIZE))
-		i_size_write(inode, offset + size);
-
-	mark_inode_dirty(inode);
-
 out:
 	brelse(dibh);
 	return error;
@@ -878,9 +872,12 @@ retry:
 		gfs2_quota_unlock(ip);
 	}
 
-	if (error == 0)
-		error = generic_write_sync(file, pos, count);
-	return error;
+	if (!(mode & FALLOC_FL_KEEP_SIZE) && (pos + count) > inode->i_size) {
+		i_size_write(inode, pos + count);
+		mark_inode_dirty(inode);
+	}
+
+	return generic_write_sync(file, pos, count);
 
 out_trans_fail:
 	gfs2_inplace_release(ip);
-- 
1.8.3.1
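The rounding problem described in the commit message can be reproduced
with plain arithmetic. The sketch assumes a 4096-byte block size (shift of
12) in its example; align_to_blocks() and size_after_fallocate() are
hypothetical helpers written for this illustration, modelled on the
bsize_mask/next computation in gfs2_fallocate() rather than copied from it.

```c
#include <stdint.h>

/* Hypothetical result type: a byte range expanded to whole blocks. */
struct aligned_range {
	int64_t offset;
	int64_t len;
};

/* Expand [offset, offset + len) outwards to block boundaries. */
static struct aligned_range align_to_blocks(int64_t offset, int64_t len,
					    unsigned int bsize_shift)
{
	int64_t bsize_mask = ~(((int64_t)1 << bsize_shift) - 1);
	int64_t next = (offset + len - 1) >> bsize_shift;
	struct aligned_range r;

	next = (next + 1) << bsize_shift;	/* first byte past the last block */
	r.offset = offset & bsize_mask;		/* round the start down */
	r.len = next - r.offset;
	return r;
}

/*
 * File size after a successful fallocate of [offset, offset + len):
 * it only grows, and never when keep_size is set.
 */
static int64_t size_after_fallocate(int64_t cur_size, int64_t offset,
				    int64_t len, int keep_size)
{
	if (!keep_size && offset + len > cur_size)
		return offset + len;
	return cur_size;
}
```

For a request of offset 1000 and length 5000 on an empty file, feeding the
block-aligned range into the size update gives 8192, while feeding the
requested range gives the expected 6000. That difference is the one fsx
caught.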
[Cluster-devel] [PATCH 05/15] GFS2: Use inode_newsize_ok and get_write_access in fallocate
From: Andrew Price <anpr...@redhat.com>

gfs2_fallocate wasn't checking inode_newsize_ok nor get_write_access.
Split out the context setup and inode locking pieces into a separate
function to make it more clear and add these missing calls.

inode_newsize_ok is called conditional on FALLOC_FL_KEEP_SIZE as there
is no need to enforce a file size limit if it isn't going to change.

Signed-off-by: Andrew Price <anpr...@redhat.com>
Signed-off-by: Steven Whitehouse <swhit...@redhat.com>

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 5ebe568..3786579 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -797,8 +797,7 @@ static void calc_max_reserv(struct gfs2_inode *ip, loff_t max, loff_t *len,
 	}
 }
 
-static long gfs2_fallocate(struct file *file, int mode, loff_t offset,
-			   loff_t len)
+static long __gfs2_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 {
 	struct inode *inode = file_inode(file);
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
@@ -812,14 +811,9 @@ static long gfs2_fallocate(struct file *file, int mode, loff_t offset,
 	loff_t bsize_mask = ~((loff_t)sdp->sd_sb.sb_bsize - 1);
 	loff_t next = (offset + len - 1) >> sdp->sd_sb.sb_bsize_shift;
 	loff_t max_chunk_size = UINT_MAX & bsize_mask;
-	struct gfs2_holder gh;
 
 	next = (next + 1) << sdp->sd_sb.sb_bsize_shift;
 
-	/* We only support the FALLOC_FL_KEEP_SIZE mode */
-	if (mode & ~FALLOC_FL_KEEP_SIZE)
-		return -EOPNOTSUPP;
-
 	offset &= bsize_mask;
 
 	len = next - offset;
@@ -830,17 +824,6 @@ static long gfs2_fallocate(struct file *file, int mode, loff_t offset,
 	if (bytes == 0)
 		bytes = sdp->sd_sb.sb_bsize;
 
-	error = gfs2_rs_alloc(ip);
-	if (error)
-		return error;
-
-	mutex_lock(&inode->i_mutex);
-
-	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
-	error = gfs2_glock_nq(&gh);
-	if (unlikely(error))
-		goto out_uninit;
-
 	gfs2_size_hint(file, offset, len);
 
 	while (len > 0) {
@@ -853,8 +836,7 @@ static long gfs2_fallocate(struct file *file, int mode, loff_t offset,
 		}
 		error = gfs2_quota_lock_check(ip);
 		if (error)
-			goto out_unlock;
-
+			return error;
 retry:
 		gfs2_write_calc_reserv(ip, bytes, &data_blocks, &ind_blocks);
 
@@ -898,18 +880,58 @@ retry:
 
 	if (error == 0)
 		error = generic_write_sync(file, pos, count);
-	goto out_unlock;
+	return error;
 
 out_trans_fail:
 	gfs2_inplace_release(ip);
 out_qunlock:
 	gfs2_quota_unlock(ip);
+	return error;
+}
+
+static long gfs2_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
+{
+	struct inode *inode = file_inode(file);
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_holder gh;
+	int ret;
+
+	if (mode & ~FALLOC_FL_KEEP_SIZE)
+		return -EOPNOTSUPP;
+
+	mutex_lock(&inode->i_mutex);
+
+	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
+	ret = gfs2_glock_nq(&gh);
+	if (ret)
+		goto out_uninit;
+
+	if (!(mode & FALLOC_FL_KEEP_SIZE) &&
+	    (offset + len) > inode->i_size) {
+		ret = inode_newsize_ok(inode, offset + len);
+		if (ret)
+			goto out_unlock;
+	}
+
+	ret = get_write_access(inode);
+	if (ret)
+		goto out_unlock;
+
+	ret = gfs2_rs_alloc(ip);
+	if (ret)
+		goto out_putw;
+
+	ret = __gfs2_fallocate(file, mode, offset, len);
+	if (ret)
+		gfs2_rs_deltree(ip->i_res);
+out_putw:
+	put_write_access(inode);
 out_unlock:
 	gfs2_glock_dq(&gh);
 out_uninit:
 	gfs2_holder_uninit(&gh);
 	mutex_unlock(&inode->i_mutex);
-	return error;
+	return ret;
 }
 
 #ifdef CONFIG_GFS2_FS_LOCKING_DLM
-- 
1.8.3.1
[Cluster-devel] [PATCH 03/15] GFS2: Only increase rs_sizehint
From: Bob Peterson <rpete...@redhat.com>

If an application does a sequence of (1) big write, (2) little write
we don't necessarily want to reset the size hint based on the smaller
size. The fact that they did any big writes implies they may do more,
and therefore we should try to allocate bigger block reservations, even
if the last few were small writes. Therefore this patch changes function
gfs2_size_hint so that the size hint can only grow; it cannot shrink.
This is especially important where there are multiple writers.

Signed-off-by: Bob Peterson <rpete...@redhat.com>
Signed-off-by: Steven Whitehouse <swhit...@redhat.com>

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 80dd44d..5ebe568 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -337,7 +337,8 @@ static void gfs2_size_hint(struct file *filep, loff_t offset, size_t size)
 	size_t blks = (size + sdp->sd_sb.sb_bsize - 1) >> sdp->sd_sb.sb_bsize_shift;
 	int hint = min_t(size_t, INT_MAX, blks);
 
-	atomic_set(&ip->i_res->rs_sizehint, hint);
+	if (hint > atomic_read(&ip->i_res->rs_sizehint))
+		atomic_set(&ip->i_res->rs_sizehint, hint);
 }
 
 /**
-- 
1.8.3.1
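A grow-only hint can be sketched in user space as follows. Note the
assumptions: struct reservation, bytes_to_hint() and size_hint_grow() are
hypothetical names, and the update uses a C11 compare-and-exchange loop,
which is a swapped-in technique. The patch itself uses a plain
atomic_read() followed by atomic_set().

```c
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the structure holding rs_sizehint. */
struct reservation {
	atomic_int sizehint;
};

/* Convert a write size in bytes to a block count, capped at INT_MAX. */
static int bytes_to_hint(size_t size, unsigned int bsize_shift)
{
	size_t bsize = (size_t)1 << bsize_shift;
	size_t blks = (size + bsize - 1) >> bsize_shift;

	return blks > INT_MAX ? INT_MAX : (int)blks;
}

/* Raise the hint to 'hint' if that is larger; never lower it. */
static void size_hint_grow(struct reservation *rs, int hint)
{
	int cur = atomic_load(&rs->sizehint);

	while (hint > cur &&
	       !atomic_compare_exchange_weak(&rs->sizehint, &cur, hint))
		;	/* a failed exchange reloads cur; re-check and retry */
}
```

With a 4096-byte block size, a 1 MiB write sets the hint to 256 blocks and
a following 100-byte write leaves it at 256.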
[Cluster-devel] [PATCH 11/15] GFS2: bugger off early if O_CREAT open finds a directory
From: Al Viro <v...@zeniv.linux.org.uk>

Signed-off-by: Al Viro <v...@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhit...@redhat.com>

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 04065e5..f41b2fd 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -624,6 +624,11 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	inode = gfs2_dir_search(dir, &dentry->d_name, !S_ISREG(mode) || excl);
 	error = PTR_ERR(inode);
 	if (!IS_ERR(inode)) {
+		if (S_ISDIR(inode->i_mode)) {
+			iput(inode);
+			inode = ERR_PTR(-EISDIR);
+			goto fail_gunlock;
+		}
 		d = d_splice_alias(inode, dentry);
 		error = PTR_ERR(d);
 		if (IS_ERR(d)) {
-- 
1.8.3.1
[Cluster-devel] [PATCH 02/15] GFS2: Set of distributed preferences for rgrps
From: Bob Peterson <rpete...@redhat.com>

This patch tries to use the journal numbers to evenly distribute
which node prefers which resource group for block allocations. This
is to help performance.

Signed-off-by: Bob Peterson <rpete...@redhat.com>
Signed-off-by: Steven Whitehouse <swhit...@redhat.com>

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 39e7e99..1b89918 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -97,6 +97,7 @@ struct gfs2_rgrpd {
 #define GFS2_RDF_CHECK 0x1000 /* check for unlinked inodes */
 #define GFS2_RDF_UPTODATE 0x2000 /* rg is up to date */
 #define GFS2_RDF_ERROR 0x4000 /* error in rg */
+#define GFS2_RDF_PREFERRED 0x8000 /* This rgrp is preferred */
 #define GFS2_RDF_MASK 0xf000 /* mask for internal flags */
 	spinlock_t rd_rsspin; /* protects reservation related vars */
 	struct rb_root rd_rstree; /* multi-block reservation tree */
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 7474c41..f4e4a0c 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -936,7 +936,7 @@ static int read_rindex_entry(struct gfs2_inode *ip)
 	rgd->rd_gl->gl_vm.start = rgd->rd_addr * bsize;
 	rgd->rd_gl->gl_vm.end = rgd->rd_gl->gl_vm.start + (rgd->rd_length * bsize) - 1;
 	rgd->rd_rgl = (struct gfs2_rgrp_lvb *)rgd->rd_gl->gl_lksb.sb_lvbptr;
-	rgd->rd_flags &= ~GFS2_RDF_UPTODATE;
+	rgd->rd_flags &= ~(GFS2_RDF_UPTODATE | GFS2_RDF_PREFERRED);
 	if (rgd->rd_data > sdp->sd_max_rg_data)
 		sdp->sd_max_rg_data = rgd->rd_data;
 	spin_lock(&sdp->sd_rindex_spin);
@@ -955,6 +955,36 @@ fail:
 }
 
 /**
+ * set_rgrp_preferences - Run all the rgrps, selecting some we prefer to use
+ * @sdp: the GFS2 superblock
+ *
+ * The purpose of this function is to select a subset of the resource groups
+ * and mark them as PREFERRED. We do it in such a way that each node prefers
+ * to use a unique set of rgrps to minimize glock contention.
+ */
+static void set_rgrp_preferences(struct gfs2_sbd *sdp)
+{
+	struct gfs2_rgrpd *rgd, *first;
+	int i;
+
+	/* Skip an initial number of rgrps, based on this node's journal ID.
+	   That should start each node out on its own set. */
+	rgd = gfs2_rgrpd_get_first(sdp);
+	for (i = 0; i < sdp->sd_lockstruct.ls_jid; i++)
+		rgd = gfs2_rgrpd_get_next(rgd);
+	first = rgd;
+
+	do {
+		rgd->rd_flags |= GFS2_RDF_PREFERRED;
+		for (i = 0; i < sdp->sd_journals; i++) {
+			rgd = gfs2_rgrpd_get_next(rgd);
+			if (rgd == first)
+				break;
+		}
+	} while (rgd != first);
+}
+
+/**
  * gfs2_ri_update - Pull in a new resource index from the disk
  * @ip: pointer to the rindex inode
  *
@@ -973,6 +1003,8 @@ static int gfs2_ri_update(struct gfs2_inode *ip)
 	if (error < 0)
 		return error;
 
+	set_rgrp_preferences(sdp);
+
 	sdp->sd_rindex_uptodate = 1;
 	return 0;
 }
@@ -1891,6 +1923,25 @@ static bool gfs2_select_rgrp(struct gfs2_rgrpd **pos, const struct gfs2_rgrpd *b
 }
 
 /**
+ * fast_to_acquire - determine if a resource group will be fast to acquire
+ *
+ * If this is one of our preferred rgrps, it should be quicker to acquire,
+ * because we tried to set ourselves up as dlm lock master.
+ */
+static inline int fast_to_acquire(struct gfs2_rgrpd *rgd)
+{
+	struct gfs2_glock *gl = rgd->rd_gl;
+
+	if (gl->gl_state != LM_ST_UNLOCKED && list_empty(&gl->gl_holders) &&
+	    !test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) &&
+	    !test_bit(GLF_DEMOTE, &gl->gl_flags))
+		return 1;
+	if (rgd->rd_flags & GFS2_RDF_PREFERRED)
+		return 1;
+	return 0;
+}
+
+/**
  * gfs2_inplace_reserve - Reserve space in the filesystem
  * @ip: the inode to reserve space for
  * @ap: the allocation parameters
@@ -1932,10 +1983,15 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, const struct gfs2_alloc_parms *a
 			rg_locked = 0;
 			if (skip && skip--)
 				goto next_rgrp;
-			if (!gfs2_rs_active(rs) && (loops < 2) &&
-			     gfs2_rgrp_used_recently(rs, 1000) &&
-			     gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
-				goto next_rgrp;
+			if (!gfs2_rs_active(rs)) {
+				if (loops == 0 &&
+				    !fast_to_acquire(rs->rs_rbm.rgd))
+					goto next_rgrp;
+				if ((loops < 2) &&
+				    gfs2_rgrp_used_recently(rs, 1000) &&
+				    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
+					goto next_rgrp;
+			}
 			error = gfs2_glock_nq_init(rs->rs_rbm.rgd->rd_gl,
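The selection loop in set_rgrp_preferences() can be modelled over a plain
array to see which resource groups each node ends up preferring. This is
a sketch under assumptions: set_preferences() is a hypothetical helper,
array indices with wrap-around stand in for the gfs2_rgrpd_get_first() /
gfs2_rgrpd_get_next() traversal, and get_next is assumed to wrap from the
last rgrp back to the first.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Mark the rgrps preferred by the node with journal ID 'jid' when the
 * filesystem has 'journals' journals and 'nr_rgrps' resource groups.
 * preferred[] must have room for nr_rgrps entries.
 */
static void set_preferences(bool *preferred, size_t nr_rgrps,
			    size_t jid, size_t journals)
{
	size_t first, cur, i;

	if (nr_rgrps == 0)
		return;
	for (i = 0; i < nr_rgrps; i++)
		preferred[i] = false;

	/* Skip 'jid' rgrps so each node starts at a different position. */
	first = jid % nr_rgrps;
	cur = first;

	do {
		preferred[cur] = true;
		/* Step over one rgrp per journal, stopping on wrap-around. */
		for (i = 0; i < journals; i++) {
			cur = (cur + 1) % nr_rgrps;
			if (cur == first)
				break;
		}
	} while (cur != first);
}
```

In this model, 9 rgrps and 3 journals give disjoint sets: {0, 3, 6},
{1, 4, 7} and {2, 5, 8}. When the rgrp count is not a multiple of the
journal count, the sets can overlap at the wrap point; with 8 rgrps,
journal 2 selects {2, 5, 0}.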
[Cluster-devel] [PATCH 14/15] GFS2: gfs2_dir_get_hash_table(): avoiding deferred vfree() is easy here...
From: Al Viro <v...@zeniv.linux.org.uk>

vfree() is allowed under spinlock these days, but it's cheaper when
it doesn't step into deferred case and here it's very easy to avoid.

Signed-off-by: Al Viro <v...@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhit...@redhat.com>

diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index c247fed..c5a34f0 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -370,11 +370,12 @@ static __be64 *gfs2_dir_get_hash_table(struct gfs2_inode *ip)
 	}
 
 	spin_lock(&inode->i_lock);
-	if (ip->i_hash_cache)
-		kvfree(hc);
-	else
+	if (likely(!ip->i_hash_cache)) {
 		ip->i_hash_cache = hc;
+		hc = NULL;
+	}
 	spin_unlock(&inode->i_lock);
+	kvfree(hc);
 
 	return ip->i_hash_cache;
 }
-- 
1.8.3.1
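The same "hand over ownership under the lock, free after unlocking"
pattern can be sketched in user space. The names here are assumptions
made for the illustration: struct cache_owner and install_cache() are
hypothetical, a C11 atomic_flag spinlock stands in for inode->i_lock, and
free() stands in for kvfree(). Like kvfree(), free() accepts NULL.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical owner of a lazily installed cache buffer. */
struct cache_owner {
	atomic_flag lock;	/* minimal spinlock */
	void *cache;		/* installed buffer, or NULL */
};

static void owner_lock(struct cache_owner *o)
{
	while (atomic_flag_test_and_set_explicit(&o->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void owner_unlock(struct cache_owner *o)
{
	atomic_flag_clear_explicit(&o->lock, memory_order_release);
}

/*
 * Install 'candidate' as the cache unless one is already present.
 * Takes ownership of 'candidate' either way and returns the buffer
 * that is installed afterwards.
 */
static void *install_cache(struct cache_owner *o, void *candidate)
{
	void *installed;

	owner_lock(o);
	if (o->cache == NULL) {
		o->cache = candidate;
		candidate = NULL;	/* ownership moved to the owner */
	}
	installed = o->cache;
	owner_unlock(o);

	free(candidate);	/* outside the lock; no-op when NULL */
	return installed;
}
```

The losing buffer in a race is released only after the lock is dropped, so
the critical section contains nothing but pointer assignments.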
[Cluster-devel] [PATCH 08/15] fs: add freeze_super/thaw_super fs hooks
From: Benjamin Marzinski <bmarz...@redhat.com>

Currently, freezing a filesystem involves calling freeze_super, which
locks sb->s_umount and then calls the fs-specific freeze_fs hook. This
makes it hard for gfs2 (and potentially other cluster filesystems) to
use the vfs freezing code to do freezes on all the cluster nodes.

In order to communicate that a freeze has been requested, and to make
sure that only one node is trying to freeze at a time, gfs2 uses a glock
(sd_freeze_gl). The problem is that there is no hook for gfs2 to acquire
this lock before calling freeze_super. This means that two nodes can
attempt to freeze the filesystem by both calling freeze_super, acquiring
the sb->s_umount lock, and then attempting to grab the cluster glock
sd_freeze_gl. Only one will succeed, and the other will be stuck in
freeze_super, making it impossible to finish freezing the node.

To solve this problem, this patch adds the freeze_super and thaw_super
hooks. If a filesystem implements these hooks, they are called instead
of the vfs freeze_super and thaw_super functions. This means that every
filesystem that implements these hooks must call the vfs freeze_super
and thaw_super functions itself within the hook function to make use of
the vfs freezing code.

Reviewed-by: Jan Kara <j...@suse.cz>
Signed-off-by: Benjamin Marzinski <bmarz...@redhat.com>
Signed-off-by: Steven Whitehouse <swhit...@redhat.com>

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 1d9c9f3..b48c41b 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -235,7 +235,10 @@ struct super_block *freeze_bdev(struct block_device *bdev)
 	sb = get_active_super(bdev);
 	if (!sb)
 		goto out;
-	error = freeze_super(sb);
+	if (sb->s_op->freeze_super)
+		error = sb->s_op->freeze_super(sb);
+	else
+		error = freeze_super(sb);
 	if (error) {
 		deactivate_super(sb);
 		bdev->bd_fsfreeze_count--;
@@ -272,7 +275,10 @@ int thaw_bdev(struct block_device *bdev, struct super_block *sb)
 	if (!sb)
 		goto out;
 
-	error = thaw_super(sb);
+	if (sb->s_op->thaw_super)
+		error = sb->s_op->thaw_super(sb);
+	else
+		error = thaw_super(sb);
 	if (error) {
 		bdev->bd_fsfreeze_count++;
 		mutex_unlock(&bdev->bd_fsfreeze_mutex);
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 8ac3fad..77c9a78 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -518,10 +518,12 @@ static int ioctl_fsfreeze(struct file *filp)
 		return -EPERM;
 
 	/* If filesystem doesn't support freeze feature, return. */
-	if (sb->s_op->freeze_fs == NULL)
+	if (sb->s_op->freeze_fs == NULL && sb->s_op->freeze_super == NULL)
 		return -EOPNOTSUPP;
 
 	/* Freeze */
+	if (sb->s_op->freeze_super)
+		return sb->s_op->freeze_super(sb);
 	return freeze_super(sb);
 }
 
@@ -533,6 +535,8 @@ static int ioctl_fsthaw(struct file *filp)
 		return -EPERM;
 
 	/* Thaw */
+	if (sb->s_op->thaw_super)
+		return sb->s_op->thaw_super(sb);
 	return thaw_super(sb);
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9ab779e..b4a1d73c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1577,7 +1577,9 @@ struct super_operations {
 	void (*evict_inode) (struct inode *);
 	void (*put_super) (struct super_block *);
 	int (*sync_fs)(struct super_block *sb, int wait);
+	int (*freeze_super) (struct super_block *);
 	int (*freeze_fs) (struct super_block *);
+	int (*thaw_super) (struct super_block *);
 	int (*unfreeze_fs) (struct super_block *);
 	int (*statfs) (struct dentry *, struct kstatfs *);
 	int (*remount_fs) (struct super_block *, int *, char *);
-- 
1.8.3.1
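The dispatch rule this patch introduces (prefer the filesystem's own
freeze_super hook, otherwise fall back to the generic path) can be shown
with a small function-pointer table. Everything in the sketch is a
simplified assumption: struct sb, struct sb_ops, generic_freeze_super(),
do_freeze() and the three example ops tables are hypothetical and only
mirror the shape of super_operations and ioctl_fsfreeze(); no locking is
modelled.

```c
#include <errno.h>
#include <stddef.h>

struct sb;

/* Hypothetical, cut-down operations table. */
struct sb_ops {
	int (*freeze_super)(struct sb *);	/* optional whole-freeze hook */
	int (*freeze_fs)(struct sb *);		/* optional fs-specific step */
};

struct sb {
	const struct sb_ops *ops;
	int frozen;
	int cluster_lock_held;
};

/* Stand-in for the generic freeze path: run freeze_fs, then mark frozen. */
static int generic_freeze_super(struct sb *sb)
{
	int err = sb->ops->freeze_fs ? sb->ops->freeze_fs(sb) : 0;

	if (!err)
		sb->frozen = 1;
	return err;
}

/* Dispatch in the same order as the patched ioctl_fsfreeze(). */
static int do_freeze(struct sb *sb)
{
	if (sb->ops->freeze_fs == NULL && sb->ops->freeze_super == NULL)
		return -EOPNOTSUPP;
	if (sb->ops->freeze_super)
		return sb->ops->freeze_super(sb);
	return generic_freeze_super(sb);
}

/* A cluster filesystem takes its cluster-wide lock first, then freezes. */
static int cluster_freeze_super(struct sb *sb)
{
	sb->cluster_lock_held = 1;	/* stand-in for acquiring the lock */
	return generic_freeze_super(sb);
}

static int plain_freeze_fs(struct sb *sb)
{
	(void)sb;
	return 0;
}

static const struct sb_ops cluster_ops = { cluster_freeze_super, NULL };
static const struct sb_ops plain_ops = { NULL, plain_freeze_fs };
static const struct sb_ops no_freeze_ops = { NULL, NULL };
```

The hook runs before anything in the generic path, which is the ordering
the commit message says gfs2 needs, and the hook must call the generic
function itself.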
[Cluster-devel] [PATCH 15/15] GFS2: gfs2_atomic_open(): simplify the use of finish_no_open()
From: Al Viro <v...@zeniv.linux.org.uk>

In ->atomic_open(inode, dentry, file, opened) calling
finish_no_open(file, NULL) is equivalent to
	dget(dentry);
	return finish_no_open(file, dentry);

No need to open-code that...

Signed-off-by: Al Viro <v...@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhit...@redhat.com>

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 9e8545b..9054002 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1245,11 +1245,8 @@ static int gfs2_atomic_open(struct inode *dir, struct dentry *dentry,
 	if (d != NULL)
 		dentry = d;
 	if (dentry->d_inode) {
-		if (!(*opened & FILE_OPENED)) {
-			if (d == NULL)
-				dget(dentry);
-			return finish_no_open(file, dentry);
-		}
+		if (!(*opened & FILE_OPENED))
+			return finish_no_open(file, d);
 		dput(d);
 		return 0;
 	}
-- 
1.8.3.1
[Cluster-devel] [PATCH 12/15] GFS2: gfs2_create_inode(): don't bother with d_splice_alias()
From: Al Viro <v...@zeniv.linux.org.uk>

dentry is always hashed and negative, inode - non-error, non-NULL and
non-directory. In such conditions d_splice_alias() is equivalent to
"d_instantiate(dentry, inode) and return NULL", which simplifies the
downstream code and is consistent with the "have to create a new object"
case.

Signed-off-by: Al Viro <v...@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhit...@redhat.com>

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index f41b2fd..9e8545b 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -596,7 +596,6 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	struct gfs2_inode *dip = GFS2_I(dir), *ip;
 	struct gfs2_sbd *sdp = GFS2_SB(&dip->i_inode);
 	struct gfs2_glock *io_gl;
-	struct dentry *d;
 	int error, free_vfs_inode = 0;
 	u32 aflags = 0;
 	unsigned blocks = 1;
@@ -629,22 +628,13 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 			inode = ERR_PTR(-EISDIR);
 			goto fail_gunlock;
 		}
-		d = d_splice_alias(inode, dentry);
-		error = PTR_ERR(d);
-		if (IS_ERR(d)) {
-			inode = ERR_CAST(d);
-			goto fail_gunlock;
-		}
+		d_instantiate(dentry, inode);
 		error = 0;
 		if (file) {
-			if (S_ISREG(inode->i_mode)) {
-				WARN_ON(d != NULL);
+			if (S_ISREG(inode->i_mode))
 				error = finish_open(file, dentry, gfs2_open_common, opened);
-			} else {
-				error = finish_no_open(file, d);
-			}
-		} else {
-			dput(d);
+			else
+				error = finish_no_open(file, NULL);
 		}
 		gfs2_glock_dq_uninit(ghs);
 		return error;
-- 
1.8.3.1
[Cluster-devel] [PATCH 13/15] GFS2: use kvfree() instead of open-coding it
From: Al Viro <v...@zeniv.linux.org.uk>

Signed-off-by: Al Viro <v...@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhit...@redhat.com>

diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index 5d4261f..c247fed 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -365,22 +365,15 @@ static __be64 *gfs2_dir_get_hash_table(struct gfs2_inode *ip)
 
 	ret = gfs2_dir_read_data(ip, hc, hsize);
 	if (ret < 0) {
-		if (is_vmalloc_addr(hc))
-			vfree(hc);
-		else
-			kfree(hc);
+		kvfree(hc);
 		return ERR_PTR(ret);
 	}
 
 	spin_lock(&inode->i_lock);
-	if (ip->i_hash_cache) {
-		if (is_vmalloc_addr(hc))
-			vfree(hc);
-		else
-			kfree(hc);
-	} else {
+	if (ip->i_hash_cache)
+		kvfree(hc);
+	else
 		ip->i_hash_cache = hc;
-	}
 	spin_unlock(&inode->i_lock);
 
 	return ip->i_hash_cache;
@@ -396,10 +389,7 @@ void gfs2_dir_hash_inval(struct gfs2_inode *ip)
 {
 	__be64 *hc = ip->i_hash_cache;
 	ip->i_hash_cache = NULL;
-	if (is_vmalloc_addr(hc))
-		vfree(hc);
-	else
-		kfree(hc);
+	kvfree(hc);
 }
 
 static inline int gfs2_dirent_sentinel(const struct gfs2_dirent *dent)
@@ -1168,10 +1158,7 @@ fail:
 	gfs2_dinode_out(dip, dibh->b_data);
 	brelse(dibh);
 out_kfree:
-	if (is_vmalloc_addr(hc2))
-		vfree(hc2);
-	else
-		kfree(hc2);
+	kvfree(hc2);
 	return error;
 }
 
@@ -1302,14 +1289,6 @@ static void *gfs2_alloc_sort_buffer(unsigned size)
 	return ptr;
 }
 
-static void gfs2_free_sort_buffer(void *ptr)
-{
-	if (is_vmalloc_addr(ptr))
-		vfree(ptr);
-	else
-		kfree(ptr);
-}
-
 static int gfs2_dir_read_leaf(struct inode *inode, struct dir_context *ctx,
 			      int *copied, unsigned *depth,
 			      u64 leaf_no)
@@ -1393,7 +1372,7 @@ static int gfs2_dir_read_leaf(struct inode *inode, struct dir_context *ctx,
 out_free:
 	for(i = 0; i < leaf; i++)
 		brelse(larr[i]);
-	gfs2_free_sort_buffer(larr);
+	kvfree(larr);
 out:
 	return error;
 }
@@ -2004,10 +1983,7 @@ out_rlist:
 	gfs2_rlist_free(&rlist);
 	gfs2_quota_unhold(dip);
 out:
-	if (is_vmalloc_addr(ht))
-		vfree(ht);
-	else
-		kfree(ht);
+	kvfree(ht);
 	return error;
 }
 
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 64b29f7..c8b148b 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1360,13 +1360,8 @@ void gfs2_quota_cleanup(struct gfs2_sbd *sdp)
 
 	gfs2_assert_warn(sdp, !atomic_read(&sdp->sd_quota_count));
 
-	if (sdp->sd_quota_bitmap) {
-		if (is_vmalloc_addr(sdp->sd_quota_bitmap))
-			vfree(sdp->sd_quota_bitmap);
-		else
-			kfree(sdp->sd_quota_bitmap);
-		sdp->sd_quota_bitmap = NULL;
-	}
+	kvfree(sdp->sd_quota_bitmap);
+	sdp->sd_quota_bitmap = NULL;
 }
 
 static void quotad_error(struct gfs2_sbd *sdp, const char *msg, int error)
-- 
1.8.3.1
[Cluster-devel] [PATCH 09/15] GFS2: update freeze code to use freeze/thaw_super on all nodes
From: Benjamin Marzinski <bmarz...@redhat.com>

The current gfs2 freezing code is considerably more complicated than it
should be because it doesn't use the vfs freezing code on any node except
the one that begins the freeze. This is because it needs to acquire a
cluster glock before calling the vfs code to prevent a deadlock, and
without the new freeze_super and thaw_super hooks, that was impossible. To
deal with the issue, gfs2 had to do some hacky locking tricks to make sure
that a frozen node couldn't be holding on to a lock it needed to do the
unfreeze ioctl.

This patch makes use of the new hooks to simplify the gfs2 locking code.
Now, all the nodes in the cluster freeze and thaw in exactly the same way.
Every node in the cluster caches the freeze glock in the shared state. The
new freeze_super hook allows the freezing node to grab this freeze glock in
the exclusive state without first calling the vfs freeze_super function.
All the nodes in the cluster see this lock change, and call the vfs
freeze_super function. The vfs locking code guarantees that the nodes can't
get stuck holding the glocks necessary to unfreeze the system. To
unfreeze, the freezing node uses the new thaw_super hook to drop the freeze
glock. Again, all the nodes notice this, reacquire the glock in shared mode
and call the vfs thaw_super function.

Signed-off-by: Benjamin Marzinski <bmarz...@redhat.com>
Signed-off-by: Steven Whitehouse <swhit...@redhat.com>

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 1cc0bba..fe91951 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -28,6 +28,8 @@
 #include "trans.h"
 #include "dir.h"
 
+struct workqueue_struct *gfs2_freeze_wq;
+
 static void gfs2_ail_error(struct gfs2_glock *gl, const struct buffer_head *bh)
 {
 	fs_err(gl->gl_sbd, "AIL buffer %p: blocknr %llu state 0x%08lx mapping %p page state 0x%lx\n",
@@ -94,11 +96,8 @@ static void gfs2_ail_empty_gl(struct gfs2_glock *gl)
 	 * on the stack */
 	tr.tr_reserved = 1 + gfs2_struct2blk(sdp, tr.tr_revokes, sizeof(u64));
 	tr.tr_ip = _RET_IP_;
-	sb_start_intwrite(sdp->sd_vfs);
-	if (gfs2_log_reserve(sdp, tr.tr_reserved) < 0) {
-		sb_end_intwrite(sdp->sd_vfs);
+	if (gfs2_log_reserve(sdp, tr.tr_reserved) < 0)
 		return;
-	}
 	WARN_ON_ONCE(current->journal_info);
 	current->journal_info = &tr;
 
@@ -469,20 +468,19 @@ static void inode_go_dump(struct seq_file *seq, const struct gfs2_glock *gl)
 
 static void freeze_go_sync(struct gfs2_glock *gl)
 {
+	int error = 0;
 	struct gfs2_sbd *sdp = gl->gl_sbd;
-	DEFINE_WAIT(wait);
 
 	if (gl->gl_state == LM_ST_SHARED &&
 	    test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags)) {
-		atomic_set(&sdp->sd_log_freeze, 1);
-		wake_up(&sdp->sd_logd_waitq);
-		do {
-			prepare_to_wait(&sdp->sd_log_frozen_wait, &wait,
-					TASK_UNINTERRUPTIBLE);
-			if (atomic_read(&sdp->sd_log_freeze))
-				io_schedule();
-		} while(atomic_read(&sdp->sd_log_freeze));
-		finish_wait(&sdp->sd_log_frozen_wait, &wait);
+		atomic_set(&sdp->sd_freeze_state, SFS_STARTING_FREEZE);
+		error = freeze_super(sdp->sd_vfs);
+		if (error) {
+			printk(KERN_INFO "GFS2: couldn't freeze filesystem: %d\n", error);
+			gfs2_assert_withdraw(sdp, 0);
+		}
+		queue_work(gfs2_freeze_wq, &sdp->sd_freeze_work);
+		gfs2_log_flush(sdp, NULL, FREEZE_FLUSH);
 	}
 }
 
diff --git a/fs/gfs2/glops.h b/fs/gfs2/glops.h
index 7455d26..8ed1857 100644
--- a/fs/gfs2/glops.h
+++ b/fs/gfs2/glops.h
@@ -12,6 +12,8 @@
 
 #include "incore.h"
 
+extern struct workqueue_struct *gfs2_freeze_wq;
+
 extern const struct gfs2_glock_operations gfs2_meta_glops;
 extern const struct gfs2_glock_operations gfs2_inode_glops;
 extern const struct gfs2_glock_operations gfs2_rgrp_glops;
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 1b89918..7a2dbbc 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -588,6 +588,12 @@ enum {
 	SDF_SKIP_DLM_UNLOCK = 8,
 };
 
+enum gfs2_freeze_state {
+	SFS_UNFROZEN		= 0,
+	SFS_STARTING_FREEZE	= 1,
+	SFS_FROZEN		= 2,
+};
+
 #define GFS2_FSNAME_LEN	256
 
 struct gfs2_inum_host {
@@ -685,6 +691,7 @@ struct gfs2_sbd {
 	struct gfs2_holder sd_live_gh;
 	struct gfs2_glock *sd_rename_gl;
 	struct gfs2_glock *sd_freeze_gl;
+	struct work_struct sd_freeze_work;
 	wait_queue_head_t sd_glock_wait;
 	atomic_t sd_glock_disposal;
 	struct completion sd_locking_init;
@@ -789,6 +796,9 @@ struct gfs2_sbd {
 	wait_queue_head_t sd_log_flush_wait;
 	int sd_log_error;
 
+	atomic_t sd_reserving_log;
+	wait_queue_head_t sd_reserving_log_wait;
+
 	unsigned int sd_log_flush_head;
Re: [Cluster-devel] [Pacemaker] [RFC] Organizing HA Summit 2015
Hello,

it occurred to me that if you want to use the opportunity and double as
a tourist while in Brno, now is about the right time to consider
reservations and ticket purchases. In at least some cases booking this
early is a must, e.g., Villa Tugendhat:

http://rezervace.spilberk.cz/langchange.aspx?mrsname=languageId=2returnUrl=%2Flist

On 08/09/14 12:30 +0200, Fabio M. Di Nitto wrote:
> DevConf will start Friday the 6th of Feb 2015 in Red Hat Brno offices.
>
> My suggestion would be to have a 2 days dedicated HA summit the 4th and
> the 5th of February.

-- 
Jan
[Cluster-devel] GFS2: Pull request (merge window)
Hi,

Please consider pulling the following changes,

Steve.

--
The following changes since commit 0df1f2487d2f0d04703f142813d53615d62a1da4:

  Linux 3.18-rc3 (2014-11-02 15:01:51 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw.git tags/gfs2-merge-window

for you to fetch changes up to ec7d879c457611e540cb465c25f3040facbd1185:

  GFS2: gfs2_atomic_open(): simplify the use of finish_no_open() (2014-11-20 11:18:08 +)

In contrast to recent merge windows, there are a number of interesting
features this time. There is a set of patches to improve performance in
relation to block reservations, some correctness fixes for fallocate, and
an update to the freeze/thaw code which greatly simplifies this code path.
In addition there is a set of clean ups from Al Viro too.

Al Viro (5):
      GFS2: bugger off early if O_CREAT open finds a directory
      GFS2: gfs2_create_inode(): don't bother with d_splice_alias()
      GFS2: use kvfree() instead of open-coding it
      GFS2: gfs2_dir_get_hash_table(): avoiding deferred vfree() is easy here...
      GFS2: gfs2_atomic_open(): simplify the use of finish_no_open()

Andrew Price (3):
      GFS2: Use inode_newsize_ok and get_write_access in fallocate
      GFS2: Update i_size properly on fallocate
      GFS2: Update timestamps on fallocate

Benjamin Marzinski (2):
      fs: add freeze_super/thaw_super fs hooks
      GFS2: update freeze code to use freeze/thaw_super on all nodes

Bob Peterson (3):
      GFS2: Set of distributed preferences for rgrps
      GFS2: Only increase rs_sizehint
      GFS2: If we use up our block reservation, request more next time

Fabian Frederick (1):
      GFS2: directly return gfs2_dir_check()

Markus Elfring (1):
      GFS2: Deletion of unnecessary checks before two function calls

 fs/block_dev.c       |  10
 fs/gfs2/dir.c        |  39
 fs/gfs2/file.c       |  83
 fs/gfs2/glock.c      |   3
 fs/gfs2/glops.c      |  26
 fs/gfs2/glops.h      |   2
 fs/gfs2/incore.h     |  19
 fs/gfs2/inode.c      |  72
 fs/gfs2/log.c        |  42
 fs/gfs2/main.c       |  11
 fs/gfs2/ops_fstype.c |  18
 fs/gfs2/quota.c      |   9
 fs/gfs2/rgrp.c       |  69
 fs/gfs2/rgrp.h       |   1
 fs/gfs2/super.c      | 112
 fs/gfs2/super.h      |   1
 fs/gfs2/trans.c      |  17
 fs/ioctl.c           |   6
 include/linux/fs.h   |   2
 19 files changed, 315 insertions(+), 227 deletions(-)