Re: [Cluster-devel] Re: gfs2-utils: master - gfs_controld: Remove three unused functions
Hi,

On Wed, 2009-10-14 at 12:53 -0500, David Teigland wrote:
On Wed, Oct 14, 2009 at 02:55:04PM +, Steven Whitehouse wrote:

gfs_controld: Remove three unused functions

These functions are not called from anywhere and appear to be left over from earlier times.

They were just added, but in translating the dlm_controld patch to gfs_controld I missed the bits that called them (both in cluster.git/STABLE3 and gfs2-utils.git) I'll reapply this bit with the bits that are missing.

Dave

I'm not sure I understand the purpose of this code. Is there more to come yet? The function find_mg_id() still seems to be unused. So far as I can figure out the purpose of the new code seems to be to maintain two timestamps: cluster_add_time whose sole purpose seems to be to check against cg->create_time but I'm not quite sure why, and cluster_remove_time which seems to not do anything at all at the moment. I can't get any clues from dlm_controld because cluster_remove_time seems to be unused there as well,

Steve.
Re: [Cluster-devel] Re: gfs2-utils: master - gfs_controld: Remove three unused functions
Hi,

On Fri, 2009-10-16 at 10:59 -0500, David Teigland wrote:
On Fri, Oct 16, 2009 at 03:56:05PM +0100, Steven Whitehouse wrote:

Hi,

On Wed, 2009-10-14 at 12:53 -0500, David Teigland wrote:
On Wed, Oct 14, 2009 at 02:55:04PM +, Steven Whitehouse wrote:

gfs_controld: Remove three unused functions

These functions are not called from anywhere and appear to be left over from earlier times.

They were just added, but in translating the dlm_controld patch to gfs_controld I missed the bits that called them (both in cluster.git/STABLE3 and gfs2-utils.git) I'll reapply this bit with the bits that are missing.

Dave

I'm not sure I understand the purpose of this code. Is there more to come yet? The function find_mg_id() still seems to be unused. So far as I can figure out the purpose of the new code seems to be to maintain two timestamps: cluster_add_time whose sole purpose seems to be to check against cg->create_time but I'm not quite sure why, and cluster_remove_time which seems to not do anything at all at the moment. I can't get any clues from dlm_controld because cluster_remove_time seems to be unused there as well,

Right, cluster_add_time is used, but cluster_remove_time isn't, although it can be very useful to know for debugging.

Dave

Yes, but used for what exactly? What is the purpose of this bit of code?

Steve.
Re: [Cluster-devel] Re: gfs2-utils: master - gfs_controld: Remove three unused functions
Hi,

On Fri, 2009-10-16 at 11:33 -0500, David Teigland wrote:
On Fri, Oct 16, 2009 at 05:01:18PM +0100, Steven Whitehouse wrote:

Hi,

On Fri, 2009-10-16 at 10:59 -0500, David Teigland wrote:
On Fri, Oct 16, 2009 at 03:56:05PM +0100, Steven Whitehouse wrote:

Hi,

On Wed, 2009-10-14 at 12:53 -0500, David Teigland wrote:
On Wed, Oct 14, 2009 at 02:55:04PM +, Steven Whitehouse wrote:

gfs_controld: Remove three unused functions

These functions are not called from anywhere and appear to be left over from earlier times.

They were just added, but in translating the dlm_controld patch to gfs_controld I missed the bits that called them (both in cluster.git/STABLE3 and gfs2-utils.git) I'll reapply this bit with the bits that are missing.

Dave

I'm not sure I understand the purpose of this code. Is there more to come yet? The function find_mg_id() still seems to be unused. So far as I can figure out the purpose of the new code seems to be to maintain two timestamps: cluster_add_time whose sole purpose seems to be to check against cg->create_time but I'm not quite sure why, and cluster_remove_time which seems to not do anything at all at the moment. I can't get any clues from dlm_controld because cluster_remove_time seems to be unused there as well,

Right, cluster_add_time is used, but cluster_remove_time isn't, although it can be very useful to know for debugging.

Dave

Yes, but used for what exactly? What is the purpose of this bit of code?

This bit about cluster_add_time?

+	/* a node's start can't match a change if the node joined the cluster
+	   more recently than the change was created */
+
+	node = get_node_history(mg, hd->nodeid);
+	if (!node) {
+		log_group(mg, "match_change %d:%u skip cg %u no node history",
+			  hd->nodeid, seq, cg->seq);
+		return 0;
+	}
+
+	if (node->cluster_add_time > cg->create_time) {
+		log_group(mg, "match_change %d:%u skip cg %u created %llu "
+			  "cluster add %llu", hd->nodeid, seq, cg->seq,
+			  (unsigned long long)cg->create_time,
+			  (unsigned long long)node->cluster_add_time);
+		return 0;
+	}

Yes, and the other bits that were recently added too. It all seems to be related.

The commit gave a brief summary and pointed to this other commit for the long description of the problems with sorting out events after partitions and merges:

http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=bcc5fdef8473d99399c624a7bc15423a2af645c1

That is assuming that you trace the commit log back from gfs2-utils into dlm and thence into cluster.git.

The question really is why we have all these (apparently) different ideas of cluster membership. Looking at gfs_controld itself, it uses two CPGs (one for all gfs_controlds which seems to only be used in negotiating the protocol, of which there seems to be only one anyway, and the other on a per mount group basis), each of which has its own idea of which cluster members exist.

So I guess one question I have is, can we be certain that the per mount group CPG will always have a membership which is a subset of the all gfs_controlds CPG? Will the sequencing of delivery of membership events be synchronised between the two CPGs with respect to other events (i.e. message delivery)? I guess that question might be more in Steve Dake's line, so I've cc'd him too.

Given all that, what is the relationship which the membership events reported in this new quorum_callback() have with the above?
From my earlier investigations, it appeared that dlm_controld was in charge of ensuring quorum was attained before fencing took place, so I'm not quite sure why that should affect gfs_controld directly. The thing that I've not quite figured out yet is why we need to record the times at all. My understanding of corosync is that it gives us a guaranteed ordering of events, so that I'd expect to see a sequence number rather than an actual timestamp. That is always assuming that the timestamp isn't just being used as a monotonic sequence number, of course. Steve.
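
For readers following the thread, the cluster_add_time check in the quoted diff can be sketched in isolation as below. This is only an illustration: the struct layouts and the name start_may_match() are simplified stand-ins made up for this note, not the gfs_controld definitions; only the comparison itself comes from the quoted hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the daemon's structures. */
struct node_history {
	int nodeid;
	uint64_t cluster_add_time;	/* when the node joined the cluster */
};

struct change {
	uint32_t seq;
	uint64_t create_time;		/* when this change was created */
};

/*
 * Returns 1 if a start message from this node may match the change,
 * 0 if it cannot (same rule as the comment in the quoted diff).
 */
static int start_may_match(const struct node_history *node,
			   const struct change *cg)
{
	if (!node)
		return 0;	/* no history recorded for this node */
	if (node->cluster_add_time > cg->create_time)
		return 0;	/* node joined after the change was created */
	return 1;
}
```

Note that the rule only needs the two values to be comparable, which is why Steve's question about a monotonic sequence number versus a wall-clock timestamp is a fair one.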
Re: [Cluster-devel] [PATCH] GFS2: Improve statfs and quota usability, try 3
Hi,

Now in the -nmw tree. Thanks,

Steve.

On Tue, 2009-10-20 at 02:39 -0500, Benjamin Marzinski wrote:

GFS2 now has three new mount options, statfs_quantum, quota_quantum and statfs_percent. statfs_quantum and quota_quantum simply allow you to set the tunables of the same name. Setting statfs_quantum to 0 will also turn on the statfs_slow tunable. statfs_percent accepts an integer between 0 and 100. Numbers between 1 and 100 will cause GFS2 to do an early sync when the local number of blocks free changes by at least statfs_percent from the total number of blocks free. Setting statfs_percent to 0 disables this.

Signed-off-by: Benjamin Marzinski bmarz...@redhat.com
---
 fs/gfs2/incore.h     |    4 ++
 fs/gfs2/ops_fstype.c |   14 --
 fs/gfs2/quota.c      |   21 +--
 fs/gfs2/quota.h      |    2 +
 fs/gfs2/super.c      |   69 ---
 5 files changed, 100 insertions(+), 10 deletions(-)

Index: gfs2-2.6-nmw/fs/gfs2/incore.h
===
--- gfs2-2.6-nmw.orig/fs/gfs2/incore.h
+++ gfs2-2.6-nmw/fs/gfs2/incore.h
@@ -430,6 +430,9 @@ struct gfs2_args {
 	unsigned int ar_discard:1; /* discard requests */
 	unsigned int ar_errors:2; /* errors=withdraw | panic */
 	int ar_commit; /* Commit interval */
+	int ar_statfs_quantum; /* The fast statfs interval */
+	int ar_quota_quantum; /* The quota interval */
+	int ar_statfs_percent; /* The % change to force sync */
 };
 
 struct gfs2_tune {
@@ -558,6 +561,7 @@ struct gfs2_sbd {
 	spinlock_t sd_statfs_spin;
 	struct gfs2_statfs_change_host sd_statfs_master;
 	struct gfs2_statfs_change_host sd_statfs_local;
+	int sd_statfs_force_sync;
 
 	/* Resource group stuff */
 
Index: gfs2-2.6-nmw/fs/gfs2/ops_fstype.c
===
--- gfs2-2.6-nmw.orig/fs/gfs2/ops_fstype.c
+++ gfs2-2.6-nmw/fs/gfs2/ops_fstype.c
@@ -63,13 +63,10 @@ static void gfs2_tune_init(struct gfs2_t
 	gt->gt_quota_warn_period = 10;
 	gt->gt_quota_scale_num = 1;
 	gt->gt_quota_scale_den = 1;
-	gt->gt_quota_quantum = 60;
 	gt->gt_new_files_jdata = 0;
 	gt->gt_max_readahead = 1 << 18;
 	gt->gt_stall_secs = 600;
 	gt->gt_complain_secs = 10;
-	gt->gt_statfs_quantum = 30;
-	gt->gt_statfs_slow = 0;
 }
 
 static struct gfs2_sbd *init_sbd(struct super_block *sb)
@@ -1153,6 +1150,15 @@ static int fill_super(struct super_block
 	sdp->sd_fsb2bb = 1 << sdp->sd_fsb2bb_shift;
 
 	sdp->sd_tune.gt_log_flush_secs = sdp->sd_args.ar_commit;
+	sdp->sd_tune.gt_quota_quantum = sdp->sd_args.ar_quota_quantum;
+	if (sdp->sd_args.ar_statfs_quantum) {
+		sdp->sd_tune.gt_statfs_slow = 0;
+		sdp->sd_tune.gt_statfs_quantum = sdp->sd_args.ar_statfs_quantum;
+	}
+	else {
+		sdp->sd_tune.gt_statfs_slow = 1;
+		sdp->sd_tune.gt_statfs_quantum = 30;
+	}
 
 	error = init_names(sdp, silent);
 	if (error)
@@ -1308,6 +1314,8 @@ static int gfs2_get_sb(struct file_syste
 	args.ar_quota = GFS2_QUOTA_DEFAULT;
 	args.ar_data = GFS2_DATA_DEFAULT;
 	args.ar_commit = 60;
+	args.ar_statfs_quantum = 30;
+	args.ar_quota_quantum = 60;
 	args.ar_errors = GFS2_ERRORS_DEFAULT;
 
 	error = gfs2_mount_args(&args, data);
Index: gfs2-2.6-nmw/fs/gfs2/super.c
===
--- gfs2-2.6-nmw.orig/fs/gfs2/super.c
+++ gfs2-2.6-nmw/fs/gfs2/super.c
@@ -70,6 +70,9 @@ enum {
 	Opt_commit,
 	Opt_err_withdraw,
 	Opt_err_panic,
+	Opt_statfs_quantum,
+	Opt_statfs_percent,
+	Opt_quota_quantum,
 	Opt_error,
 };
 
@@ -101,6 +104,9 @@ static const match_table_t tokens = {
 	{Opt_commit, "commit=%d"},
 	{Opt_err_withdraw, "errors=withdraw"},
 	{Opt_err_panic, "errors=panic"},
+	{Opt_statfs_quantum, "statfs_quantum=%d"},
+	{Opt_statfs_percent, "statfs_percent=%d"},
+	{Opt_quota_quantum, "quota_quantum=%d"},
 	{Opt_error, NULL}
 };
 
@@ -214,6 +220,28 @@ int gfs2_mount_args(struct gfs2_args *ar
 				return rv ? rv : -EINVAL;
 			}
 			break;
+		case Opt_statfs_quantum:
+			rv = match_int(&tmp[0], &args->ar_statfs_quantum);
+			if (rv || args->ar_statfs_quantum < 0) {
+				printk(KERN_WARNING "GFS2: statfs_quantum mount option requires a non-negative numeric argument\n");
+				return rv ? rv : -EINVAL;
+			}
+			break;
+		case Opt_quota_quantum:
+			rv = match_int(&tmp[0], &args->ar_quota_quantum);
+
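
The way the fill_super() hunk maps the new mount arguments onto the tunables (with statfs_quantum=0 switching on statfs_slow) can be sketched as a small standalone function. The struct and function names here are hypothetical stand-ins for this note, not the kernel's gfs2_args/gfs2_tune; the fallback value of 30 is the one used in the hunk.

```c
#include <assert.h>

/* Hypothetical, cut-down versions of the mount arguments and tunables. */
struct mount_args {
	int statfs_quantum;	/* 0 means "use slow statfs" */
	int quota_quantum;
};

struct tunables {
	int statfs_slow;
	int statfs_quantum;
	int quota_quantum;
};

/* Copy the mount arguments into the tunables, as the patch does at mount. */
static void apply_mount_args(struct tunables *gt, const struct mount_args *ar)
{
	gt->quota_quantum = ar->quota_quantum;
	if (ar->statfs_quantum) {
		gt->statfs_slow = 0;
		gt->statfs_quantum = ar->statfs_quantum;
	} else {
		/* statfs_quantum=0: turn on statfs_slow, keep a sane interval */
		gt->statfs_slow = 1;
		gt->statfs_quantum = 30;
	}
}
```

So a mount with statfs_quantum=0 ends up with statfs_slow set and the interval left at 30, while any positive value is used directly.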
[Cluster-devel] Re: linux-next: Tree for October 26 (gfs2)
Hi,

On Mon, 2009-10-26 at 08:43 -0700, Randy Dunlap wrote:
On Mon, 26 Oct 2009 17:21:04 +1100 Stephen Rothwell wrote:

Hi all,

Changes since 20091016:

on i386:

(.text+0x723a8b): undefined reference to `__divdi3'

super.c::gfs2_statfs_change():
	percent = (100 * l_sc->sc_free) / m_sc->sc_free;

I guess it needs to use div64() etc.

---
~Randy

Yes, it looks like it,

Steve.
[Cluster-devel] Re: [PATCH] GFS2: remove division from new statfs code
Hi,

Now in the -nmw git tree. Thanks,

Steve.

On Mon, 2009-10-26 at 13:29 -0500, Benjamin Marzinski wrote:

It's not necessary to do any 64bit division for the statfs sync code, so remove it.

Signed-off-by: Benjamin Marzinski bmarz...@redhat.com
---
 fs/gfs2/super.c |   17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

Index: gfs2-2.6-nmw/fs/gfs2/super.c
===
--- gfs2-2.6-nmw.orig/fs/gfs2/super.c
+++ gfs2-2.6-nmw/fs/gfs2/super.c
@@ -472,7 +472,8 @@ void gfs2_statfs_change(struct gfs2_sbd
 	struct gfs2_statfs_change_host *l_sc = &sdp->sd_statfs_local;
 	struct gfs2_statfs_change_host *m_sc = &sdp->sd_statfs_master;
 	struct buffer_head *l_bh;
-	int percent, sync_percent;
+	s64 x, y;
+	int need_sync = 0;
 	int error;
 
 	error = gfs2_meta_inode_buffer(l_ip, &l_bh);
@@ -486,16 +487,16 @@ void gfs2_statfs_change(struct gfs2_sbd
 	l_sc->sc_free += free;
 	l_sc->sc_dinodes += dinodes;
 	gfs2_statfs_change_out(l_sc, l_bh->b_data + sizeof(struct gfs2_dinode));
-	if (m_sc->sc_free)
-		percent = (100 * l_sc->sc_free) / m_sc->sc_free;
-	else
-		percent = 100;
+	if (sdp->sd_args.ar_statfs_percent) {
+		x = 100 * l_sc->sc_free;
+		y = m_sc->sc_free * sdp->sd_args.ar_statfs_percent;
+		if (x >= y || x <= -y)
+			need_sync = 1;
+	}
 	spin_unlock(&sdp->sd_statfs_spin);
 
 	brelse(l_bh);
-	sync_percent = sdp->sd_args.ar_statfs_percent;
-	if (sync_percent && (percent >= sync_percent ||
-			     percent <= -sync_percent))
+	if (need_sync)
 		gfs2_wake_up_statfs(sdp);
 }
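
The trick in this patch is ordinary cross-multiplication: for a positive master count m, the test 100 * l / m >= p is equivalent to 100 * l >= m * p, so the 64-bit division (and with it the __divdi3 reference on i386) can be dropped. A minimal userspace sketch of the same comparison follows; the function name statfs_needs_sync() is made up for this note, and it assumes the master count is non-negative and that the products fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 when the local change in free blocks is at least `percent`
 * percent of the master free count, in either direction.
 * percent == 0 disables the check, as in the patch.
 * Assumption for this sketch: master_free >= 0 and no 64-bit overflow.
 */
static int statfs_needs_sync(int64_t local_free, int64_t master_free,
			     int percent)
{
	int64_t x, y;

	assert(percent >= 0 && percent <= 100);
	if (percent == 0)
		return 0;

	x = 100 * local_free;		/* left side, scaled by 100 */
	y = master_free * percent;	/* right side, scaled by master_free */
	return x >= y || x <= -y;	/* no division needed */
}
```

For example, with 1000 free blocks on the master and statfs_percent=1, a local change of 10 or -10 blocks triggers a sync, while a change of 9 does not.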
[Cluster-devel] Re: [PATCH] gfs2: add barrier/nobarrier mount options
Hi,

Thanks for the patch. I've pushed it to the -nmw tree now. I've also added a two-liner of my own to display the nobarrier option in /proc/mounts,

Steve.

On Fri, 2009-10-30 at 08:03 +0100, Christoph Hellwig wrote:

Currently gfs2 issues barriers unconditionally. There are various reasons to disable them, be that just for testing or for stupid devices flushing large battery backed caches. Add a nobarrier option that matches xfs and btrfs for this. Also add a symmetric barrier option to turn it back on at remount time.

Signed-off-by: Christoph Hellwig h...@lst.de

Index: linux-2.6/fs/gfs2/incore.h
===
--- linux-2.6.orig/fs/gfs2/incore.h	2009-10-30 07:43:42.246023792 +0100
+++ linux-2.6/fs/gfs2/incore.h	2009-10-30 07:44:11.173255988 +0100
@@ -429,6 +429,7 @@ struct gfs2_args {
 	unsigned int ar_meta:1; /* mount metafs */
 	unsigned int ar_discard:1; /* discard requests */
 	unsigned int ar_errors:2; /* errors=withdraw | panic */
+	unsigned int ar_nobarrier:1; /* do not send barriers */
 	int ar_commit; /* Commit interval */
 };
 
Index: linux-2.6/fs/gfs2/super.c
===
--- linux-2.6.orig/fs/gfs2/super.c	2009-10-30 07:44:29.832024397 +0100
+++ linux-2.6/fs/gfs2/super.c	2009-10-30 07:53:24.117033618 +0100
@@ -70,6 +70,8 @@ enum {
 	Opt_commit,
 	Opt_err_withdraw,
 	Opt_err_panic,
+	Opt_barrier,
+	Opt_nobarrier,
 	Opt_error,
 };
 
@@ -98,6 +100,8 @@ static const match_table_t tokens = {
 	{Opt_meta, "meta"},
 	{Opt_discard, "discard"},
 	{Opt_nodiscard, "nodiscard"},
+	{Opt_barrier, "barrier"},
+	{Opt_nobarrier, "nobarrier"},
 	{Opt_commit, "commit=%d"},
 	{Opt_err_withdraw, "errors=withdraw"},
 	{Opt_err_panic, "errors=panic"},
@@ -207,6 +211,12 @@ int gfs2_mount_args(struct gfs2_sbd *sdp
 		case Opt_nodiscard:
 			args->ar_discard = 0;
 			break;
+		case Opt_barrier:
+			args->ar_nobarrier = 0;
+			break;
+		case Opt_nobarrier:
+			args->ar_nobarrier = 1;
+			break;
 		case Opt_commit:
 			rv = match_int(&tmp[0], &args->ar_commit);
 			if (rv || args->ar_commit <= 0) {
@@ -1097,6 +1107,10 @@ static int gfs2_remount_fs(struct super_
 		sb->s_flags |= MS_POSIXACL;
 	else
 		sb->s_flags &= ~MS_POSIXACL;
+	if (sdp->sd_args.ar_nobarrier)
+		set_bit(SDF_NOBARRIERS, &sdp->sd_flags);
+	else
+		clear_bit(SDF_NOBARRIERS, &sdp->sd_flags);
 	spin_lock(&gt->gt_spin);
 	gt->gt_log_flush_secs = args.ar_commit;
 	spin_unlock(&gt->gt_spin);
Index: linux-2.6/fs/gfs2/ops_fstype.c
===
--- linux-2.6.orig/fs/gfs2/ops_fstype.c	2009-10-30 07:52:11.050003877 +0100
+++ linux-2.6/fs/gfs2/ops_fstype.c	2009-10-30 07:52:53.053005337 +0100
@@ -1143,6 +1143,8 @@ static int fill_super(struct super_block
 	}
 	if (sdp->sd_args.ar_posix_acl)
 		sb->s_flags |= MS_POSIXACL;
+	if (sdp->sd_args.ar_nobarrier)
+		set_bit(SDF_NOBARRIERS, &sdp->sd_flags);
 
 	sb->s_magic = GFS2_MAGIC;
 	sb->s_op = &gfs2_super_ops;
Re: [Cluster-devel] Re: [PATCH] misc: use a proper range for minor number dynamic allocation
Hi,

On Mon, 2009-11-09 at 17:03 -0600, David Teigland wrote:
On Mon, Nov 09, 2009 at 01:28:36PM -0800, Andrew Morton wrote:
On Fri, 23 Oct 2009 21:28:17 -0200 Thadeu Lima de Souza Cascardo casca...@holoscopio.com wrote:

The current dynamic allocation of minor numbers for misc devices has some drawbacks. First of all, the range for dynamic numbers includes some statically allocated numbers. It goes from 63 to 0, and we have numbers in the range from 1 to 15 already allocated. Although it gives priority to the higher and not allocated numbers, we may end up in a situation where we must reject registering a driver which got a static number because a driver got its number with dynamic allocation. Considering fs/dlm/user.c allocates as many misc devices as lockspaces are created, and that we have more than 50 users around, it's not unreasonable to reach that situation.

What is this DLM behaviour of which you speak? It sounds broken.

One for each userland lockspace. I know of three userland apps using dlm:

1. rgmanager, which is at the end of its life
2. clvmd, which is switching to a different lock manager
3. ocfs2 tools, where the userland portion is transient; it only exists while the tool executes.

That said, it shouldn't be a problem to switch to a single device in the next version of the interface.

Dave

As well as the per-userland-lockspace misc devices, there are also misc devices of which there is only one instance, shared between all lockspaces:

dlm_lock - Used for userland communication with posix locks
dlm-monitor - Used only to check that dlm_controld is running (so far as I can tell)
dlm-control - Used to create/remove userland dlm lockspaces

I also had a look at other methods used by the dlm to communicate with userspace, and this is what I've come up with so far:

configfs - Used to set up lockspaces
debugfs - Used to get lock state information for debugging
netlink - Used only to notify lock timeouts to dlm_controld
sysfs - Used to implement a wait for a userland event (wait for write to a sysfs file)
uevents - Used to trigger dlm_controld into performing an action which results in the write to sysfs mentioned above. This is netlink again, but with a layer over the top of it.

If a change to the misc devices is planned, I'm wondering if it would be possible to merge some of the other functions into a single interface to simplify things a bit. In particular the netlink interface looks dubious to me, since I think it should be doing a broadcast rather than its current, rather strange, arrangement (which is possibly a security issue, with any process able to send messages to it and set its own pid, so far as I can see). I have to say that I didn't test that, but there is no obvious check for privs that I can see in the dlm netlink code.

Steve.
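
To make the collision Thadeu describes concrete, here is a toy model of a "highest free number first" allocator over the 63..0 range. It is not the misc driver code; the names alloc_dynamic_minor() and is_static_minor() are invented for this note, and the 1..15 static range is simply the one quoted in the thread.

```c
#include <assert.h>

#define DYNAMIC_MINORS 64	/* dynamic range runs from 63 down to 0 */

/* One flag per minor in the dynamic range; non-zero means taken. */
static unsigned char minor_in_use[DYNAMIC_MINORS];

/* Hand out the highest free minor, or -1 if the range is exhausted. */
static int alloc_dynamic_minor(void)
{
	int i;

	for (i = DYNAMIC_MINORS - 1; i >= 0; i--) {
		if (!minor_in_use[i]) {
			minor_in_use[i] = 1;
			return i;
		}
	}
	return -1;
}

/* Minors 1..15 are described in the thread as statically assigned. */
static int is_static_minor(int minor)
{
	return minor >= 1 && minor <= 15;
}
```

In this model the first 48 dynamic registrations get 63 down to 16, and the 49th lands on 15, which is inside the static range; with one misc device per userland lockspace that is not hard to reach.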
[Cluster-devel] GFS2: Clean up recovery code
The following patch cleans up the recovery code and fixes a few bugs along the way. The bugs are:

o An incorrect assumption about the size of the journal
o An issue where the superblock was being used to store variables local to the recovery process which would cause a problem if multiple journals were recovered at once.
o Can report incorrect counts of blocks read recovered in some cases (this is harmless, it's just a logging issue)

Features:

o Moves the recovery code from lops.c into recovery.c which allows making a number of functions static and removing other bits of code.
o Removes the before scan functions as they are not needed (partly merged into the scan functions)
o Removes the after scan functions. These have also been merged into the scan functions
o We no longer call any functions which may in turn call withdraw from the recovery code. If there is an issue with recovery, we report it to the caller (and userspace).
o New uevent env variable is documented
o Superblock shrinks by 32 bytes on 64 bit arches.
o Code shrinks by about 100 lines (probably more since there are more comments now)

TODO:

o Report where error has occurred in log, as well as what the error is
o Check code for finding journal headers (maybe remove gfs2_log_header_host?)
o Testing :-)

For the moment, this is just a heads up on what I'm working on. I hope it won't be too long before I have a final version of this patch,

Steve.

diff --git a/Documentation/filesystems/gfs2-uevents.txt b/Documentation/filesystems/gfs2-uevents.txt
index fd966dc..c029596 100644
--- a/Documentation/filesystems/gfs2-uevents.txt
+++ b/Documentation/filesystems/gfs2-uevents.txt
@@ -44,6 +44,10 @@ for every journal recovered, whether it is during the initial mount
 process or as the result of gfs_controld requesting a specific journal
 recovery via the /sys/fs/gfs2/<fsname>/lock_module/recovery file.
 
+If the recovery has failed, then on recent versions of GFS2 the
+ERROR= variable will also be included. This returns a kernel
+error code indicating what went wrong during recovery.
+
 Because the CHANGE uevent was used (in early versions of gfs_controld)
 without checking the environment variables to discover the state, we
 cannot add any more functions to it without running the risk of
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 4792200..e497aaf 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -50,12 +50,6 @@ struct gfs2_log_operations {
 	void (*lo_add) (struct gfs2_sbd *sdp, struct gfs2_log_element *le);
 	void (*lo_before_commit) (struct gfs2_sbd *sdp);
 	void (*lo_after_commit) (struct gfs2_sbd *sdp, struct gfs2_ail *ai);
-	void (*lo_before_scan) (struct gfs2_jdesc *jd,
-				struct gfs2_log_header_host *head, int pass);
-	int (*lo_scan_elements) (struct gfs2_jdesc *jd, unsigned int start,
-				 struct gfs2_log_descriptor *ld, __be64 *ptr,
-				 int pass);
-	void (*lo_after_scan) (struct gfs2_jdesc *jd, int error, int pass);
 	const char *lo_name;
 };
 
@@ -648,15 +642,6 @@ struct gfs2_sbd {
 	struct list_head sd_ail2_list;
 	u64 sd_ail_sync_gen;
 
-	/* Replay stuff */
-
-	struct list_head sd_revoke_list;
-	unsigned int sd_replay_tail;
-
-	unsigned int sd_found_blocks;
-	unsigned int sd_found_revokes;
-	unsigned int sd_replayed_blocks;
-
 	/* For quiescing the filesystem */
 
 	struct gfs2_holder sd_freeze_gh;
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index de97632..4d301af 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -136,6 +136,12 @@ static void buf_lo_add(struct gfs2_sbd *sdp, struct gfs2_log_element *le)
 	struct gfs2_trans *tr;
 
 	lock_buffer(bd->bd_bh);
+	mh = (struct gfs2_meta_header *)bd->bd_bh->b_data;
+	if (unlikely(mh->mh_magic != cpu_to_be32(GFS2_MAGIC))) {
+		printk(KERN_ERR "GFS2: %s mh error: buf_lo_add block %llu\n",
+		       sdp->sd_fsname, (unsigned long long)bd->bd_bh->b_blocknr);
+		BUG();
+	}
 	gfs2_log_lock(sdp);
 	if (!list_empty(&bd->bd_list_tr))
 		goto out;
@@ -147,9 +153,7 @@ static void buf_lo_add(struct gfs2_sbd *sdp, struct gfs2_log_element *le)
 		goto out;
 	set_bit(GLF_LFLUSH, &bd->bd_gl->gl_flags);
 	set_bit(GLF_DIRTY, &bd->bd_gl->gl_flags);
-	gfs2_meta_check(sdp, bd->bd_bh);
 	gfs2_pin(sdp, bd->bd_bh);
-	mh = (struct gfs2_meta_header *)bd->bd_bh->b_data;
 	mh->__pad0 = cpu_to_be64(0);
 	mh->mh_jid = cpu_to_be32(sdp->sd_jdesc->jd_jid);
 	sdp->sd_log_num_buf++;
@@ -235,84 +239,6 @@ static void buf_lo_after_commit(struct gfs2_sbd *sdp, struct gfs2_ail *ai)
 	gfs2_assert_warn(sdp, !sdp->sd_log_num_buf);
 }
 
-static void buf_lo_before_scan(struct gfs2_jdesc *jd,
-			       struct gfs2_log_header_host *head, int
[Cluster-devel] Re: [PATCH 2/2] dlm: Add down/up_write_non_owner to keep lockdep happy
Hi,

On Thu, 2009-11-12 at 17:45 +0100, Peter Zijlstra wrote:
On Thu, 2009-11-12 at 11:14 -0600, David Teigland wrote:

up_write_non_owner() addresses this trace, which as you say, is from doing the down and up from different threads (which is the intention):

That's really something I cannot advise doing. Aside from losing lock-dependency validation (not a good thing), asymmetric locking like that is generally very hard to analyze since it's not clear who 'owns' what data when. There are a few places in the kernel that use the non_owner things, and we should generally strive to remove them, not add more. Please consider solving your problem without adding things like this.

The code that does this already exists - it is not being added by the patch. It's just that in recent kernels lockdep has started noticing the problem. I did seriously consider changing the locking rather than just silencing the messages, but it looks rather complicated and not easily replaced with other primitives. Any suggestions as to a better solution are welcome,

Steve.
[Cluster-devel] Re: [PATCH 2/2] dlm: Add down/up_write_non_owner to keep lockdep happy
Hi,

On Thu, 2009-11-12 at 12:34 -0600, David Teigland wrote:
On Thu, Nov 12, 2009 at 05:24:12PM +, Steven Whitehouse wrote:

Nov 12 15:10:01 chywoon kernel: [ INFO: possible recursive locking detected ]

That recursive locking trace is something different. up_write_non_owner() addresses this trace, which as you say, is from doing the down and up from different threads (which is the intention):

I don't think it is different, the traces differ due to the ordering of running of dlm_recoverd and mount.gfs2,

I explained the recursive locking warning back in Sep:

I've not looked at how to remove this recursive message. What happens is that mount calls dlm_new_lockspace() which returns with in_recovery locked. mount then makes a lock request which blocks on in_recovery (as expected) until the dlm_recoverd thread completes recovery and releases the in_recovery lock (triggering the unlock balance) to allow locking activity.

It doesn't appear to me that up_write_non_owner() would suppress that.

Dave

It is simply down to the ordering of the running of the threads as to which message you get at mount time. There are two possible scenarios:

Scenario 1:

1. mount.gfs2 calls (via the mount syscall and gfs2) dlm_new_lockspace(), which takes the ls_in_recovery rwsem with a down_write()
2. mount.gfs2 goes on to try to take out a lock on the filesystem, and calls dlm_lock(), which tries to do a down_read() on the rwsem. Since this is from the same thread as the down_write(), you get the recursive locking message reported in the dmesg which I attached to my earlier email.

Scenario 2:

dlm_recoverd runs between steps 1 and 2 above. This results in the trace which you reported, since ls_in_recovery has then been unlocked from a different thread, which creates the unlock balance trace which you posted.

In both cases the cause is the same; it's just the running order of the threads which results in it being reported in a different way. The patch should fix both of these reports, since it annotates the up/down write side of the rwsem,

Steve.
[Cluster-devel] GFS2: Move glock ref count drop out of finish_xmote
There have been a couple of instances reported recently where the glock ref count has hit zero too soon. Since the only time when this can happen is on the demote path (in other cases the ref count is held elevated by the callers, as well as in the lock operation itself) there is a good chance that the culprit is at the end of finish_xmote.

This patch removes the ref count drop from the end of finish_xmote and moves it into the callers of that function. This will ensure that in future the ref count cannot be dropped too early.

Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index a3f90ad..3bc7d98 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -513,7 +513,6 @@ retry:
 			GLOCK_BUG_ON(gl, 1);
 		}
 		spin_unlock(&gl->gl_spin);
-		gfs2_glock_put(gl);
 		return;
 	}
 
@@ -524,8 +523,6 @@ retry:
 		if (glops->go_xmote_bh) {
 			spin_unlock(&gl->gl_spin);
 			rv = glops->go_xmote_bh(gl, gh);
-			if (rv == -EAGAIN)
-				return;
 			spin_lock(&gl->gl_spin);
 			if (rv) {
 				do_error(gl, rv);
@@ -540,7 +537,6 @@ out:
 	clear_bit(GLF_LOCK, &gl->gl_flags);
 out_locked:
 	spin_unlock(&gl->gl_spin);
-	gfs2_glock_put(gl);
 }
 
 static unsigned int gfs2_lm_lock(struct gfs2_sbd *sdp, void *lock,
@@ -600,7 +596,6 @@ __acquires(&gl->gl_spin)
 
 	if (!(ret & LM_OUT_ASYNC)) {
 		finish_xmote(gl, ret);
-		gfs2_glock_hold(gl);
 		if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
 			gfs2_glock_put(gl);
 	} else {
@@ -712,9 +707,12 @@ static void glock_work_func(struct work_struct *work)
 {
 	unsigned long delay = 0;
 	struct gfs2_glock *gl = container_of(work, struct gfs2_glock, gl_work.work);
+	int drop_ref = 0;
 
-	if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags))
+	if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags)) {
 		finish_xmote(gl, gl->gl_reply);
+		drop_ref = 1;
+	}
 	down_read(&gfs2_umount_flush_sem);
 	spin_lock(&gl->gl_spin);
 	if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
@@ -732,6 +730,8 @@ static void glock_work_func(struct work_struct *work)
 	if (!delay ||
 	    queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
 		gfs2_glock_put(gl);
+	if (drop_ref)
+		gfs2_glock_put(gl);
 }
 
 /**
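
The pattern in the glock_work_func() hunk (remember that a reference is owed, keep using the object, and drop the reference only at the very end) can be shown with a toy reference count. Everything below is a made-up illustration for this note: struct obj, obj_put(), finish_reply() and work_func() are not GFS2 functions, and the real code of course does this under the glock spinlock.

```c
#include <assert.h>

/* Hypothetical stand-in for a reference-counted object such as a glock. */
struct obj {
	int ref;	/* number of references currently held */
	int finished;	/* how many replies have been processed */
};

/* Drop one reference; returns 1 if that was the last one. */
static int obj_put(struct obj *o)
{
	assert(o->ref > 0);
	return --o->ref == 0;
}

/* Processes a reply. It does not drop a reference itself. */
static void finish_reply(struct obj *o)
{
	o->finished++;
}

/*
 * Work function: the caller of finish_reply() notes that a reference is
 * owed and drops it only after all other work on the object is done.
 * Returns 1 if the object's last reference was dropped here.
 */
static int work_func(struct obj *o, int reply_pending)
{
	int drop_ref = 0;
	int last;

	if (reply_pending) {
		finish_reply(o);
		drop_ref = 1;
	}

	/* ... further work that still touches o would go here ... */
	assert(o->ref > 0);

	last = obj_put(o);		/* reference owned by the work item */
	if (drop_ref)
		last = obj_put(o);	/* reference held for the pending reply */
	return last;
}
```

The point of the design is that the count cannot reach zero in the middle of the function, because the reference associated with the reply is still held while the rest of the work runs.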
Re: [Cluster-devel] [RFC] Proposal to align autotool versions
Hi,

On Tue, 2009-11-24 at 14:33 +0100, Fabio M. Di Nitto wrote:

Hi guys,

I just completed testing of autotools in F13/rawhide and they seem to fulfill perfectly what we need so far.

Fedora13 has:

libtool 2.2.6 (doesn't carry the bug for which we were forcing 2.2.7)
autoconf 2.64 (higher than what we require now)
automake 1.11/m4/pkg-config in more than recent enough versions.

corosync/openais will eventually get libtool support.
cluster-stable3 autotool implementation is eventually on the schedule.
cluster/master trees are already ported.

My suggestion is simply to use those versions across the projects so that developers will no longer need to build autotools manually and can easily start testing master trees again.

Please ACK/NACK.

Fabio

The sooner the better as far as I'm concerned,

Steve.
[Cluster-devel] GFS2: Extra early pre-pull patch posting
Due to the larger than usual content of new items in this patch set I'm posting it a bit earlier than normal so that there is more time for review. There are a few bug fixes in this set, but most of the content is new code relating to xattrs and quotas.

The ACL support is cleaned up and support for caching of ACLs has been added. At the same time the xattr support has been fixed and cleaned up too.

There is a series of patches which add support for the XFS-style quota interface to GFS2. There has always been support for quotas in GFS2, but the interface was via a userland tool which manipulated the quota file directly. Due to the way in which the GFS2 quotas were implemented, they were a better fit for the XFS-style interface than the dquot interface, so that was the one which we chose to use. We do not support all features of the XFS quotas though (we don't have project quotas) and quotas are also turned on and off only via mount options as we still do not support the set_xstate function (but we do allow querying of the current quota state via get_xstate). Aside from that, it does cover most of the XFS feature set, and everything that is needed to manipulate all supported GFS2 quota types.

The userland tools for generic quota manipulation do not yet understand how to talk to GFS2's quota interface as they assume that only XFS uses the XFS-style quota interface. That is a future project.

In addition to that, the quota netlink notification interface is made into a generic feature so that GFS2 can use it as well as dquot based systems.

Other features:

o Added a barrier/nobarrier option in common with other filesystems (N.B. this defaults to on if it isn't specified)
o A spare field in our common-to-many-objects metadata header is now used to write the journal id of the last node to modify that bit of metadata. This is ignored by the filesystem, but useful for debugging purposes.

I have spotted that one of the patches starts FS2: instead of GFS2: and I'll try and fix that before the merge. It's a pain as it is part way down the patch series and I don't think I can fix it without rebasing the tree. Let me know if you spot anything else that's wrong,

Steve.
[Cluster-devel] [PATCH 02/30] GFS2: Fix -o meta mounts for subsequent mounts (i.e. all but the first one)
We have a long term plan to use the -o meta flag to GFS2 mounts to access the alternate root which is used to store metadata for a GFS2 filesystem. This will allow us to eventually remove support for the gfs2meta filesystem type (which is in any case just a front end to the gfs2 filesystem type with the meta/master root). Currently the -o meta option is only taken into account on the initial mount of the filesystem. Subsequent mounts of the same filesystem (i.e. on the same device) result in basically the same as bind mounting the root of the original mount. This patch changes that by using what is more or less a copy of get_sb_bdev() and extending it so that it will take into account the alternate root in all cases. The main difference is that we have to parse the mount options a bit earlier. We can then use them to select the appropriate root towards the end of the function. In addition this also fixes a bug where it was possible (but certainly not desirable) to set different ro/rw options for the meta root when mounted via the gfs2meta fs compared with the original mount. 
Signed-off-by: Steven Whitehouse swhit...@redhat.com Cc: Alexander Viro av...@redhat.com --- fs/gfs2/ops_fstype.c | 135 +++-- fs/gfs2/super.c | 16 +++--- fs/gfs2/super.h |2 +- 3 files changed, 127 insertions(+), 26 deletions(-) diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index 52fb6c0..e5ee062 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -1114,7 +1114,7 @@ void gfs2_online_uevent(struct gfs2_sbd *sdp) * Returns: errno */ -static int fill_super(struct super_block *sb, void *data, int silent) +static int fill_super(struct super_block *sb, struct gfs2_args *args, int silent) { struct gfs2_sbd *sdp; struct gfs2_holder mount_gh; @@ -1125,17 +1125,7 @@ static int fill_super(struct super_block *sb, void *data, int silent) printk(KERN_WARNING GFS2: can't alloc struct gfs2_sbd\n); return -ENOMEM; } - - sdp-sd_args.ar_quota = GFS2_QUOTA_DEFAULT; - sdp-sd_args.ar_data = GFS2_DATA_DEFAULT; - sdp-sd_args.ar_commit = 60; - sdp-sd_args.ar_errors = GFS2_ERRORS_DEFAULT; - - error = gfs2_mount_args(sdp, sdp-sd_args, data); - if (error) { - printk(KERN_WARNING GFS2: can't parse mount arguments\n); - goto fail; - } + sdp-sd_args = *args; if (sdp-sd_args.ar_spectator) { sb-s_flags |= MS_RDONLY; @@ -1243,18 +1233,125 @@ fail: return error; } -static int gfs2_get_sb(struct file_system_type *fs_type, int flags, - const char *dev_name, void *data, struct vfsmount *mnt) +static int set_gfs2_super(struct super_block *s, void *data) { - return get_sb_bdev(fs_type, flags, dev_name, data, fill_super, mnt); + s-s_bdev = data; + s-s_dev = s-s_bdev-bd_dev; + + /* +* We set the bdi here to the queue backing, file systems can +* overwrite this in -fill_super() +*/ + s-s_bdi = bdev_get_queue(s-s_bdev)-backing_dev_info; + return 0; } -static int test_meta_super(struct super_block *s, void *ptr) +static int test_gfs2_super(struct super_block *s, void *ptr) { struct block_device *bdev = ptr; return (bdev == s-s_bdev); } +/** + * gfs2_get_sb - Get the GFS2 superblock + * 
@fs_type: The GFS2 filesystem type + * @flags: Mount flags + * @dev_name: The name of the device + * @data: The mount arguments + * @mnt: The vfsmnt for this mount + * + * Q. Why not use get_sb_bdev() ? + * A. We need to select one of two root directories to mount, independent + *of whether this is the initial, or subsequent, mount of this sb + * + * Returns: 0 or -ve on error + */ + +static int gfs2_get_sb(struct file_system_type *fs_type, int flags, + const char *dev_name, void *data, struct vfsmount *mnt) +{ + struct block_device *bdev; + struct super_block *s; + fmode_t mode = FMODE_READ; + int error; + struct gfs2_args args; + struct gfs2_sbd *sdp; + + if (!(flags MS_RDONLY)) + mode |= FMODE_WRITE; + + bdev = open_bdev_exclusive(dev_name, mode, fs_type); + if (IS_ERR(bdev)) + return PTR_ERR(bdev); + + /* +* once the super is inserted into the list by sget, s_umount +* will protect the lockfs code from trying to start a snapshot +* while we are mounting +*/ + mutex_lock(bdev-bd_fsfreeze_mutex); + if (bdev-bd_fsfreeze_count 0) { + mutex_unlock(bdev-bd_fsfreeze_mutex); + error = -EBUSY; + goto error_bdev; + } + s = sget(fs_type, test_gfs2_super, set_gfs2_super, bdev); + mutex_unlock(bdev-bd_fsfreeze_mutex); + error = PTR_ERR(s); + if (IS_ERR(s)) + goto error_bdev; + + memset(args
[Cluster-devel] [PATCH 03/30] GFS2: Fix up system xattrs
This code has been shamelessly stolen from XFS at the suggestion of Christoph Hellwig. I've not added support for cached ACLs so far... watch for that in a later patch, although this is designed in such a way that they should be easy to add. Signed-off-by: Steven Whitehouse swhit...@redhat.com Cc: Christoph Hellwig h...@infradead.org --- fs/gfs2/acl.c | 170 +-- fs/gfs2/acl.h | 24 ++-- fs/gfs2/xattr.c | 18 -- 3 files changed, 120 insertions(+), 92 deletions(-) diff --git a/fs/gfs2/acl.c b/fs/gfs2/acl.c index 3fc4e3a..2168da1 100644 --- a/fs/gfs2/acl.c +++ b/fs/gfs2/acl.c @@ -12,6 +12,7 @@ #include linux/spinlock.h #include linux/completion.h #include linux/buffer_head.h +#include linux/xattr.h #include linux/posix_acl.h #include linux/posix_acl_xattr.h #include linux/gfs2_ondisk.h @@ -26,61 +27,6 @@ #include trans.h #include util.h -#define ACL_ACCESS 1 -#define ACL_DEFAULT 0 - -int gfs2_acl_validate_set(struct gfs2_inode *ip, int access, - struct gfs2_ea_request *er, int *remove, mode_t *mode) -{ - struct posix_acl *acl; - int error; - - error = gfs2_acl_validate_remove(ip, access); - if (error) - return error; - - if (!er-er_data) - return -EINVAL; - - acl = posix_acl_from_xattr(er-er_data, er-er_data_len); - if (IS_ERR(acl)) - return PTR_ERR(acl); - if (!acl) { - *remove = 1; - return 0; - } - - error = posix_acl_valid(acl); - if (error) - goto out; - - if (access) { - error = posix_acl_equiv_mode(acl, mode); - if (!error) - *remove = 1; - else if (error 0) - error = 0; - } - -out: - posix_acl_release(acl); - return error; -} - -int gfs2_acl_validate_remove(struct gfs2_inode *ip, int access) -{ - if (!GFS2_SB(ip-i_inode)-sd_args.ar_posix_acl) - return -EOPNOTSUPP; - if (!is_owner_or_cap(ip-i_inode)) - return -EPERM; - if (S_ISLNK(ip-i_inode.i_mode)) - return -EOPNOTSUPP; - if (!access !S_ISDIR(ip-i_inode.i_mode)) - return -EACCES; - - return 0; -} - static int acl_get(struct gfs2_inode *ip, const char *name, struct posix_acl **acl, struct gfs2_ea_location *el, 
char **datap, unsigned int *lenp) @@ -277,3 +223,117 @@ out_brelse: return error; } +static int gfs2_acl_type(const char *name) +{ + if (strcmp(name, GFS2_POSIX_ACL_ACCESS) == 0) + return ACL_TYPE_ACCESS; + if (strcmp(name, GFS2_POSIX_ACL_DEFAULT) == 0) + return ACL_TYPE_DEFAULT; + return -EINVAL; +} + +static int gfs2_xattr_system_get(struct inode *inode, const char *name, +void *buffer, size_t size) +{ + int type; + + type = gfs2_acl_type(name); + if (type 0) + return type; + + return gfs2_xattr_get(inode, GFS2_EATYPE_SYS, name, buffer, size); +} + +static int gfs2_set_mode(struct inode *inode, mode_t mode) +{ + int error = 0; + + if (mode != inode-i_mode) { + struct iattr iattr; + + iattr.ia_valid = ATTR_MODE; + iattr.ia_mode = mode; + + error = gfs2_setattr_simple(GFS2_I(inode), iattr); + } + + return error; +} + +static int gfs2_xattr_system_set(struct inode *inode, const char *name, +const void *value, size_t size, int flags) +{ + struct gfs2_sbd *sdp = GFS2_SB(inode); + struct posix_acl *acl = NULL; + int error = 0, type; + + if (!sdp-sd_args.ar_posix_acl) + return -EOPNOTSUPP; + + type = gfs2_acl_type(name); + if (type 0) + return type; + if (flags XATTR_CREATE) + return -EINVAL; + if (type == ACL_TYPE_DEFAULT !S_ISDIR(inode-i_mode)) + return value ? -EACCES : 0; + if ((current_fsuid() != inode-i_uid) !capable(CAP_FOWNER)) + return -EPERM; + if (S_ISLNK(inode-i_mode)) + return -EOPNOTSUPP; + + if (!value) + goto set_acl; + + acl = posix_acl_from_xattr(value, size); + if (!acl) { + /* +* acl_set_file(3) may request that we set default ACLs with +* zero length -- defend (gracefully) against that here. +*/ + goto out; + } + if (IS_ERR(acl)) { + error = PTR_ERR(acl); + goto out; + } + + error = posix_acl_valid(acl); + if (error) + goto out_release; + + error = -EINVAL; + if (acl-a_count GFS2_ACL_MAX_ENTRIES) + goto out_release; + + if (type == ACL_TYPE_ACCESS
[Cluster-devel] [PATCH 04/30] VFS: Add forget_all_cached_acls()
This is required for cluster filesystems which want to use cached ACLs so that they can invalidate the cache when required.

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
Cc: Alexander Viro <av...@redhat.com>
Cc: Christoph Hellwig <h...@infradead.org>
---
 include/linux/posix_acl.h |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/include/linux/posix_acl.h b/include/linux/posix_acl.h
index 065a365..6760816 100644
--- a/include/linux/posix_acl.h
+++ b/include/linux/posix_acl.h
@@ -147,6 +147,20 @@ static inline void forget_cached_acl(struct inode *inode, int type)
 	if (old != ACL_NOT_CACHED)
 		posix_acl_release(old);
 }
+
+static inline void forget_all_cached_acls(struct inode *inode)
+{
+	struct posix_acl *old_access, *old_default;
+	spin_lock(&inode->i_lock);
+	old_access = inode->i_acl;
+	old_default = inode->i_default_acl;
+	inode->i_acl = inode->i_default_acl = ACL_NOT_CACHED;
+	spin_unlock(&inode->i_lock);
+	if (old_access != ACL_NOT_CACHED)
+		posix_acl_release(old_access);
+	if (old_default != ACL_NOT_CACHED)
+		posix_acl_release(old_default);
+}
 #endif
 
 static inline void cache_no_acl(struct inode *inode)
-- 
1.6.2.5
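For readers who want to experiment with the idea outside the kernel, the pattern used by forget_all_cached_acls() (swap both cached pointers for a sentinel while holding the lock, then drop the old references after unlocking) can be sketched in userland C. This is only a rough model under stated assumptions: a C11 atomic_flag spin loop stands in for inode->i_lock, a plain counter stands in for the kernel's atomic reference count, and every toy_ name and TOY_NOT_CACHED is hypothetical rather than a kernel API.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Toy stand-in for struct posix_acl: only a (non-atomic) reference count. */
struct toy_acl {
	int refcount;
};

/* Sentinel meaning "nothing cached"; distinct from NULL ("no ACL exists"). */
#define TOY_NOT_CACHED ((struct toy_acl *)-1)

/* Toy stand-in for struct inode with its two cached ACL slots. */
struct toy_inode {
	atomic_flag lock;		/* models inode->i_lock */
	struct toy_acl *acl;		/* models inode->i_acl */
	struct toy_acl *default_acl;	/* models inode->i_default_acl */
};

static void toy_inode_init(struct toy_inode *inode)
{
	atomic_flag_clear(&inode->lock);
	inode->acl = TOY_NOT_CACHED;
	inode->default_acl = TOY_NOT_CACHED;
}

/* Drop one reference; NULL is accepted, as with posix_acl_release(). */
static void toy_acl_release(struct toy_acl *acl)
{
	if (acl)
		acl->refcount--;
}

/* Invalidate both cache slots, releasing references outside the lock. */
static void toy_forget_all(struct toy_inode *inode)
{
	struct toy_acl *old_access, *old_default;

	while (atomic_flag_test_and_set(&inode->lock))
		;	/* spin until the lock is free */
	old_access = inode->acl;
	old_default = inode->default_acl;
	inode->acl = inode->default_acl = TOY_NOT_CACHED;
	atomic_flag_clear(&inode->lock);

	if (old_access != TOY_NOT_CACHED)
		toy_acl_release(old_access);
	if (old_default != TOY_NOT_CACHED)
		toy_acl_release(old_default);
}
```

Keeping the release calls outside the locked region keeps the critical section short, and the sentinel lets a cached "no ACL" result (NULL) be told apart from "not cached at all".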
[Cluster-devel] [PATCH 05/30] GFS2: Use forget_all_cached_acls()
Invalidate all the cached ACLs when we drop the glock.

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/glops.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 6985eef..78554ac 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -13,6 +13,7 @@
 #include <linux/buffer_head.h>
 #include <linux/gfs2_ondisk.h>
 #include <linux/bio.h>
+#include <linux/posix_acl.h>
 
 #include "gfs2.h"
 #include "incore.h"
@@ -184,8 +185,10 @@ static void inode_go_inval(struct gfs2_glock *gl, int flags)
 	if (flags & DIO_METADATA) {
 		struct address_space *mapping = gl->gl_aspace->i_mapping;
 		truncate_inode_pages(mapping, 0);
-		if (ip)
+		if (ip) {
 			set_bit(GIF_INVALID, &ip->i_flags);
+			forget_all_cached_acls(&ip->i_inode);
+		}
 	}
 
 	if (ip == GFS2_I(gl->gl_sbd->sd_rindex))
-- 
1.6.2.5
[Cluster-devel] [PATCH 06/30] GFS2: Use gfs2_set_mode() instead of munge_mode()
These two functions do the same thing, so let's only use one of them.

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/acl.c |   46 +++++++++++-----------------------------------
 1 files changed, 11 insertions(+), 35 deletions(-)

diff --git a/fs/gfs2/acl.c b/fs/gfs2/acl.c
index 2168da1..1be3148 100644
--- a/fs/gfs2/acl.c
+++ b/fs/gfs2/acl.c
@@ -104,29 +104,20 @@ int gfs2_check_acl(struct inode *inode, int mask)
 	return -EAGAIN;
 }
 
-static int munge_mode(struct gfs2_inode *ip, mode_t mode)
+static int gfs2_set_mode(struct inode *inode, mode_t mode)
 {
-	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
-	struct buffer_head *dibh;
-	int error;
+	int error = 0;
 
-	error = gfs2_trans_begin(sdp, RES_DINODE, 0);
-	if (error)
-		return error;
+	if (mode != inode->i_mode) {
+		struct iattr iattr;
 
-	error = gfs2_meta_inode_buffer(ip, &dibh);
-	if (!error) {
-		gfs2_assert_withdraw(sdp,
-				(ip->i_inode.i_mode & S_IFMT) == (mode & S_IFMT));
-		ip->i_inode.i_mode = mode;
-		gfs2_trans_add_bh(ip->i_gl, dibh, 1);
-		gfs2_dinode_out(ip, dibh->b_data);
-		brelse(dibh);
-	}
+		iattr.ia_valid = ATTR_MODE;
+		iattr.ia_mode = mode;
 
-	gfs2_trans_end(sdp);
+		error = gfs2_setattr_simple(GFS2_I(inode), &iattr);
+	}
 
-	return 0;
+	return error;
 }
 
 int gfs2_acl_create(struct gfs2_inode *dip, struct gfs2_inode *ip)
@@ -151,7 +142,7 @@ int gfs2_acl_create(struct gfs2_inode *dip, struct gfs2_inode *ip)
 	if (!acl) {
 		mode &= ~current_umask();
 		if (mode != ip->i_inode.i_mode)
-			error = munge_mode(ip, mode);
+			error = gfs2_set_mode(&ip->i_inode, mode);
 		return error;
 	}
 
@@ -181,7 +172,7 @@ int gfs2_acl_create(struct gfs2_inode *dip, struct gfs2_inode *ip)
 	if (error)
 		goto out;
 munge:
-	error = munge_mode(ip, mode);
+	error = gfs2_set_mode(&ip->i_inode, mode);
 out:
 	posix_acl_release(acl);
 	kfree(data);
@@ -244,21 +235,6 @@ static int gfs2_xattr_system_get(struct inode *inode, const char *name,
 	return gfs2_xattr_get(inode, GFS2_EATYPE_SYS, name, buffer, size);
 }
 
-static int gfs2_set_mode(struct inode *inode, mode_t mode)
-{
-	int error = 0;
-
-	if (mode != inode->i_mode) {
-		struct iattr iattr;
-
-		iattr.ia_valid = ATTR_MODE;
-		iattr.ia_mode = mode;
-
-		error = gfs2_setattr_simple(GFS2_I(inode), &iattr);
-	}
-
-	return error;
-}
 
 static int gfs2_xattr_system_set(struct inode *inode, const char *name,
 				 const void *value, size_t size, int flags)
-- 
1.6.2.5
[Cluster-devel] [PATCH 07/30] GFS2: Clean up ACLs
To prepare for support for caching of ACLs, this cleans up the GFS2 ACL support by pushing the xattr code back into xattr.c and changing the acl_get function into one which only returns ACLs so that we can drop the caching function into it shortly. Signed-off-by: Steven Whitehouse swhit...@redhat.com --- fs/gfs2/acl.c | 164 --- fs/gfs2/acl.h |2 +- fs/gfs2/inode.c |2 +- fs/gfs2/xattr.c | 56 +++ fs/gfs2/xattr.h |8 +-- 5 files changed, 132 insertions(+), 100 deletions(-) diff --git a/fs/gfs2/acl.c b/fs/gfs2/acl.c index 1be3148..bd0fce9 100644 --- a/fs/gfs2/acl.c +++ b/fs/gfs2/acl.c @@ -27,53 +27,40 @@ #include trans.h #include util.h -static int acl_get(struct gfs2_inode *ip, const char *name, - struct posix_acl **acl, struct gfs2_ea_location *el, - char **datap, unsigned int *lenp) +static const char *gfs2_acl_name(int type) { - char *data; - unsigned int len; - int error; + switch (type) { + case ACL_TYPE_ACCESS: + return GFS2_POSIX_ACL_ACCESS; + case ACL_TYPE_DEFAULT: + return GFS2_POSIX_ACL_DEFAULT; + } + return NULL; +} - el-el_bh = NULL; +static struct posix_acl *gfs2_acl_get(struct gfs2_inode *ip, int type) +{ + struct posix_acl *acl; + const char *name; + char *data; + int len; if (!ip-i_eattr) - return 0; - - error = gfs2_ea_find(ip, GFS2_EATYPE_SYS, name, el); - if (error) - return error; - if (!el-el_ea) - return 0; - if (!GFS2_EA_DATA_LEN(el-el_ea)) - goto out; - - len = GFS2_EA_DATA_LEN(el-el_ea); - data = kmalloc(len, GFP_NOFS); - error = -ENOMEM; - if (!data) - goto out; + return NULL; - error = gfs2_ea_get_copy(ip, el, data, len); - if (error 0) - goto out_kfree; - error = 0; + name = gfs2_acl_name(type); + if (name == NULL) + return ERR_PTR(-EINVAL); - if (acl) { - *acl = posix_acl_from_xattr(data, len); - if (IS_ERR(*acl)) - error = PTR_ERR(*acl); - } + len = gfs2_xattr_acl_get(ip, name, data); + if (len 0) + return ERR_PTR(len); + if (len == 0) + return NULL; -out_kfree: - if (error || !datap) { - kfree(data); - } else { - *datap = data; - *lenp = 
len; - } -out: - return error; + acl = posix_acl_from_xattr(data, len); + kfree(data); + return acl; } /** @@ -86,14 +73,12 @@ out: int gfs2_check_acl(struct inode *inode, int mask) { - struct gfs2_ea_location el; - struct posix_acl *acl = NULL; + struct posix_acl *acl; int error; - error = acl_get(GFS2_I(inode), GFS2_POSIX_ACL_ACCESS, acl, el, NULL, NULL); - brelse(el.el_bh); - if (error) - return error; + acl = gfs2_acl_get(GFS2_I(inode), ACL_TYPE_ACCESS); + if (IS_ERR(acl)) + return PTR_ERR(acl); if (acl) { error = posix_acl_permission(inode, acl, mask); @@ -120,32 +105,57 @@ static int gfs2_set_mode(struct inode *inode, mode_t mode) return error; } -int gfs2_acl_create(struct gfs2_inode *dip, struct gfs2_inode *ip) +static int gfs2_acl_set(struct inode *inode, int type, struct posix_acl *acl) { - struct gfs2_ea_location el; - struct gfs2_sbd *sdp = GFS2_SB(dip-i_inode); - struct posix_acl *acl = NULL, *clone; - mode_t mode = ip-i_inode.i_mode; - char *data = NULL; - unsigned int len; int error; + int len; + char *data; + const char *name = gfs2_acl_name(type); + + BUG_ON(name == NULL); + len = posix_acl_to_xattr(acl, NULL, 0); + if (len == 0) + return 0; + data = kmalloc(len, GFP_NOFS); + if (data == NULL) + return -ENOMEM; + error = posix_acl_to_xattr(acl, data, len); + if (error 0) + goto out; + error = gfs2_xattr_set(inode, GFS2_EATYPE_SYS, name, data, len, 0); +out: + kfree(data); + return error; +} + +int gfs2_acl_create(struct gfs2_inode *dip, struct inode *inode) +{ + struct gfs2_sbd *sdp = GFS2_SB(dip-i_inode); + struct posix_acl *acl, *clone; + mode_t mode = inode-i_mode; + int error = 0; if (!sdp-sd_args.ar_posix_acl) return 0; - if (S_ISLNK(ip-i_inode.i_mode)) + if (S_ISLNK(inode-i_mode)) return 0; - error = acl_get(dip, GFS2_POSIX_ACL_DEFAULT, acl, el, data, len); - brelse(el.el_bh); - if (error) - return error; + acl = gfs2_acl_get(dip, ACL_TYPE_DEFAULT); + if (IS_ERR(acl
[Cluster-devel] [PATCH 08/30] GFS2: Add cached ACLs support
The other patches in this series have been building towards being able to support cached ACLs like other filesystems. The only real difference with GFS2 is that we have to invalidate the cache when we drop a glock, but that is dealt with in earlier patches. Signed-off-by: Steven Whitehouse swhit...@redhat.com --- fs/gfs2/acl.c | 27 +-- 1 files changed, 25 insertions(+), 2 deletions(-) diff --git a/fs/gfs2/acl.c b/fs/gfs2/acl.c index bd0fce9..3eb1ea8 100644 --- a/fs/gfs2/acl.c +++ b/fs/gfs2/acl.c @@ -48,6 +48,10 @@ static struct posix_acl *gfs2_acl_get(struct gfs2_inode *ip, int type) if (!ip-i_eattr) return NULL; + acl = get_cached_acl(ip-i_inode, type); + if (acl != ACL_NOT_CACHED) + return acl; + name = gfs2_acl_name(type); if (name == NULL) return ERR_PTR(-EINVAL); @@ -123,6 +127,8 @@ static int gfs2_acl_set(struct inode *inode, int type, struct posix_acl *acl) if (error 0) goto out; error = gfs2_xattr_set(inode, GFS2_EATYPE_SYS, name, data, len, 0); + if (!error) + set_cached_acl(inode, type, acl); out: kfree(data); return error; @@ -209,6 +215,7 @@ int gfs2_acl_chmod(struct gfs2_inode *ip, struct iattr *attr) posix_acl_to_xattr(acl, data, len); error = gfs2_xattr_acl_chmod(ip, attr, data); kfree(data); + set_cached_acl(ip-i_inode, ACL_TYPE_ACCESS, acl); } out: @@ -228,15 +235,25 @@ static int gfs2_acl_type(const char *name) static int gfs2_xattr_system_get(struct inode *inode, const char *name, void *buffer, size_t size) { + struct posix_acl *acl; int type; + int error; type = gfs2_acl_type(name); if (type 0) return type; - return gfs2_xattr_get(inode, GFS2_EATYPE_SYS, name, buffer, size); -} + acl = gfs2_acl_get(GFS2_I(inode), type); + if (IS_ERR(acl)) + return PTR_ERR(acl); + if (acl == NULL) + return -ENODATA; + error = posix_acl_to_xattr(acl, buffer, size); + posix_acl_release(acl); + + return error; +} static int gfs2_xattr_system_set(struct inode *inode, const char *name, const void *value, size_t size, int flags) @@ -303,6 +320,12 @@ static int 
gfs2_xattr_system_set(struct inode *inode, const char *name, set_acl: error = gfs2_xattr_set(inode, GFS2_EATYPE_SYS, name, value, size, 0); + if (!error) { + if (acl) + set_cached_acl(inode, type, acl); + else + forget_cached_acl(inode, type); + } out_release: posix_acl_release(acl); out: -- 1.6.2.5
[Cluster-devel] [PATCH 09/30] VFS: Use GFP_NOFS in posix_acl_from_xattr()
GFS2 needs to call this from under a glock, so we need GFP_NOFS and I suspect that other filesystems might require this too.

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/xattr_acl.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/xattr_acl.c b/fs/xattr_acl.c
index c6ad7c7..05ac0fe 100644
--- a/fs/xattr_acl.c
+++ b/fs/xattr_acl.c
@@ -36,7 +36,7 @@ posix_acl_from_xattr(const void *value, size_t size)
 	if (count == 0)
 		return NULL;
 
-	acl = posix_acl_alloc(count, GFP_KERNEL);
+	acl = posix_acl_alloc(count, GFP_NOFS);
 	if (!acl)
 		return ERR_PTR(-ENOMEM);
 	acl_e = acl->a_entries;
-- 
1.6.2.5
[Cluster-devel] [PATCH 10/30] GFS2: Alter arguments of gfs2_quota/statfs_sync
These two functions are altered so that gfs2_quota_sync may in future be called directly from the VFS. The GFS2 superblock changes to a VFS super block and there is an addition of an int argument which is currently ignored. Signed-off-by: Steven Whitehouse swhit...@redhat.com --- fs/gfs2/quota.c |7 --- fs/gfs2/quota.h |2 +- fs/gfs2/super.c |7 --- fs/gfs2/super.h |2 +- fs/gfs2/sys.c |4 ++-- 5 files changed, 12 insertions(+), 10 deletions(-) diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c index 2e9b932..ed9e197 100644 --- a/fs/gfs2/quota.c +++ b/fs/gfs2/quota.c @@ -1069,8 +1069,9 @@ void gfs2_quota_change(struct gfs2_inode *ip, s64 change, } } -int gfs2_quota_sync(struct gfs2_sbd *sdp) +int gfs2_quota_sync(struct super_block *sb, int type) { + struct gfs2_sbd *sdp = sb-s_fs_info; struct gfs2_quota_data **qda; unsigned int max_qd = gfs2_tune_get(sdp, gt_quota_simul_sync); unsigned int num_qd; @@ -1298,12 +1299,12 @@ static void quotad_error(struct gfs2_sbd *sdp, const char *msg, int error) } static void quotad_check_timeo(struct gfs2_sbd *sdp, const char *msg, - int (*fxn)(struct gfs2_sbd *sdp), + int (*fxn)(struct super_block *sb, int type), unsigned long t, unsigned long *timeo, unsigned int *new_timeo) { if (t = *timeo) { - int error = fxn(sdp); + int error = fxn(sdp-sd_vfs, 0); quotad_error(sdp, msg, error); *timeo = gfs2_tune_get_i(sdp-sd_tune, new_timeo) * HZ; } else { diff --git a/fs/gfs2/quota.h b/fs/gfs2/quota.h index 0fa5fa6..437afa7 100644 --- a/fs/gfs2/quota.h +++ b/fs/gfs2/quota.h @@ -25,7 +25,7 @@ extern int gfs2_quota_check(struct gfs2_inode *ip, u32 uid, u32 gid); extern void gfs2_quota_change(struct gfs2_inode *ip, s64 change, u32 uid, u32 gid); -extern int gfs2_quota_sync(struct gfs2_sbd *sdp); +extern int gfs2_quota_sync(struct super_block *sb, int type); extern int gfs2_quota_refresh(struct gfs2_sbd *sdp, int user, u32 id); extern int gfs2_quota_init(struct gfs2_sbd *sdp); diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index 42e5458..e7b24d5 100644 
--- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -484,8 +484,9 @@ void update_statfs(struct gfs2_sbd *sdp, struct buffer_head *m_bh, gfs2_statfs_change_out(m_sc, m_bh-b_data + sizeof(struct gfs2_dinode)); } -int gfs2_statfs_sync(struct gfs2_sbd *sdp) +int gfs2_statfs_sync(struct super_block *sb, int type) { + struct gfs2_sbd *sdp = sb-s_fs_info; struct gfs2_inode *m_ip = GFS2_I(sdp-sd_statfs_inode); struct gfs2_inode *l_ip = GFS2_I(sdp-sd_sc_inode); struct gfs2_statfs_change_host *m_sc = sdp-sd_statfs_master; @@ -712,8 +713,8 @@ static int gfs2_make_fs_ro(struct gfs2_sbd *sdp) int error; flush_workqueue(gfs2_delete_workqueue); - gfs2_quota_sync(sdp); - gfs2_statfs_sync(sdp); + gfs2_quota_sync(sdp-sd_vfs, 0); + gfs2_statfs_sync(sdp-sd_vfs, 0); error = gfs2_glock_nq_init(sdp-sd_trans_gl, LM_ST_SHARED, GL_NOCACHE, t_gh); diff --git a/fs/gfs2/super.h b/fs/gfs2/super.h index ed962ea..3df60f2 100644 --- a/fs/gfs2/super.h +++ b/fs/gfs2/super.h @@ -44,7 +44,7 @@ extern void gfs2_statfs_change_in(struct gfs2_statfs_change_host *sc, const void *buf); extern void update_statfs(struct gfs2_sbd *sdp, struct buffer_head *m_bh, struct buffer_head *l_bh); -extern int gfs2_statfs_sync(struct gfs2_sbd *sdp); +extern int gfs2_statfs_sync(struct super_block *sb, int type); extern int gfs2_freeze_fs(struct gfs2_sbd *sdp); extern void gfs2_unfreeze_fs(struct gfs2_sbd *sdp); diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c index 4463297..be1b8ac 100644 --- a/fs/gfs2/sys.c +++ b/fs/gfs2/sys.c @@ -158,7 +158,7 @@ static ssize_t statfs_sync_store(struct gfs2_sbd *sdp, const char *buf, if (simple_strtol(buf, NULL, 0) != 1) return -EINVAL; - gfs2_statfs_sync(sdp); + gfs2_statfs_sync(sdp-sd_vfs, 0); return len; } @@ -171,7 +171,7 @@ static ssize_t quota_sync_store(struct gfs2_sbd *sdp, const char *buf, if (simple_strtol(buf, NULL, 0) != 1) return -EINVAL; - gfs2_quota_sync(sdp); + gfs2_quota_sync(sdp-sd_vfs, 0); return len; } -- 1.6.2.5
[Cluster-devel] [PATCH 11/30] GFS2: Hook gfs2_quota_sync into VFS via gfs2_quotactl_ops
The plan is to add further operations to the gfs2_quotactl_ops in future patches. The sync operation is easy, so we start with that one. We plan to use the XFS quota control functions because they more closely match the GFS2 ones.

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/Kconfig      |    2 ++
 fs/gfs2/ops_fstype.c |    3 +++
 fs/gfs2/quota.c      |    4 ++++
 fs/gfs2/quota.h      |    1 +
 4 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig
index 5971359..4dcddf8 100644
--- a/fs/gfs2/Kconfig
+++ b/fs/gfs2/Kconfig
@@ -8,6 +8,8 @@ config GFS2_FS
 	select FS_POSIX_ACL
 	select CRC32
 	select SLOW_WORK
+	select QUOTA
+	select QUOTACTL
 	help
 	  A cluster filesystem.
 
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index e5ee062..36b11cb 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -18,6 +18,7 @@
 #include <linux/mount.h>
 #include <linux/gfs2_ondisk.h>
 #include <linux/slow-work.h>
+#include <linux/quotaops.h>
 
 #include "gfs2.h"
 #include "incore.h"
@@ -1138,6 +1139,8 @@ static int fill_super(struct super_block *sb, struct gfs2_args *args, int silent
 	sb->s_op = &gfs2_super_ops;
 	sb->s_export_op = &gfs2_export_ops;
 	sb->s_xattr = gfs2_xattr_handlers;
+	sb->s_qcop = &gfs2_quotactl_ops;
+	sb_dqopt(sb)->flags |= DQUOT_QUOTA_SYS_FILE;
 	sb->s_time_gran = 1;
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
 
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index ed9e197..73a43ce 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1378,3 +1378,7 @@ int gfs2_quotad(void *data)
 	return 0;
 }
 
+const struct quotactl_ops gfs2_quotactl_ops = {
+	.quota_sync     = gfs2_quota_sync,
+};
+
diff --git a/fs/gfs2/quota.h b/fs/gfs2/quota.h
index 437afa7..025d15b 100644
--- a/fs/gfs2/quota.h
+++ b/fs/gfs2/quota.h
@@ -50,5 +50,6 @@ static inline int gfs2_quota_lock_check(struct gfs2_inode *ip)
 }
 
 extern int gfs2_shrink_qd_memory(int nr, gfp_t gfp_mask);
+extern const struct quotactl_ops gfs2_quotactl_ops;
 
 #endif /* __QUOTA_DOT_H__ */
-- 
1.6.2.5
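Patches 10 and 11 together follow a common kernel idiom: a filesystem-internal function is given the signature the generic layer expects, recovers its private data from the generic object, and is then published through an operations table. A rough userland model of that idiom is sketched below. It is not the kernel code: toy_super_block, toy_sbd, toy_quotactl_ops and the other toy_ names are hypothetical stand-ins, and the -1 return merely models "operation not supported".

```c
#include <stddef.h>

struct toy_super_block;

/* Generic-layer operations table (models struct quotactl_ops). */
struct toy_quotactl_ops {
	int (*quota_sync)(struct toy_super_block *sb, int type);
};

/* Generic superblock: carries an opaque pointer to fs-private data. */
struct toy_super_block {
	void *s_fs_info;
	const struct toy_quotactl_ops *s_qcop;
};

/* Filesystem-private superblock data (models struct gfs2_sbd). */
struct toy_sbd {
	struct toy_super_block *sd_vfs;	/* back pointer to the generic sb */
	int sync_count;			/* counts how often sync ran */
};

/* Takes the generic signature and recovers the private data itself. */
static int toy_quota_sync(struct toy_super_block *sb, int type)
{
	struct toy_sbd *sdp = sb->s_fs_info;

	(void)type;	/* accepted for interface compatibility, ignored */
	sdp->sync_count++;
	return 0;
}

static const struct toy_quotactl_ops toy_quotactl_ops = {
	.quota_sync = toy_quota_sync,
};

/* Models the generic layer dispatching through the table. */
static int toy_vfs_quota_sync(struct toy_super_block *sb, int type)
{
	if (!sb->s_qcop || !sb->s_qcop->quota_sync)
		return -1;
	return sb->s_qcop->quota_sync(sb, type);
}
```

Internal callers that only hold the private structure can still reach the function through the back pointer, which is what the gfs2_quota_sync(sdp->sd_vfs, 0) calls in patch 10 do.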
[Cluster-devel] [PATCH 12/30] GFS2: Remove obsolete code in quota.c
There is no point in testing for GLF_DEMOTE here, we might as well always release the glock at that point. Signed-off-by: Steven Whitehouse swhit...@redhat.com --- fs/gfs2/glock.h |9 - fs/gfs2/quota.c | 13 + 2 files changed, 5 insertions(+), 17 deletions(-) diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h index c609894..13f0bd2 100644 --- a/fs/gfs2/glock.h +++ b/fs/gfs2/glock.h @@ -180,15 +180,6 @@ static inline int gfs2_glock_is_held_shrd(struct gfs2_glock *gl) return gl-gl_state == LM_ST_SHARED; } -static inline int gfs2_glock_is_blocking(struct gfs2_glock *gl) -{ - int ret; - spin_lock(gl-gl_spin); - ret = test_bit(GLF_DEMOTE, gl-gl_flags); - spin_unlock(gl-gl_spin); - return ret; -} - int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number, const struct gfs2_glock_operations *glops, int create, struct gfs2_glock **glp); diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c index 73a43ce..6aaa6c5 100644 --- a/fs/gfs2/quota.c +++ b/fs/gfs2/quota.c @@ -843,9 +843,8 @@ restart: if (force_refresh || qd-qd_qb.qb_magic != cpu_to_be32(GFS2_MAGIC)) { loff_t pos; gfs2_glock_dq_uninit(q_gh); - error = gfs2_glock_nq_init(qd-qd_gl, - LM_ST_EXCLUSIVE, GL_NOCACHE, - q_gh); + error = gfs2_glock_nq_init(qd-qd_gl, LM_ST_EXCLUSIVE, + GL_NOCACHE, q_gh); if (error) return error; @@ -871,11 +870,9 @@ restart: qlvb-qb_value = cpu_to_be64(q.qu_value); qd-qd_qb = *qlvb; - if (gfs2_glock_is_blocking(qd-qd_gl)) { - gfs2_glock_dq_uninit(q_gh); - force_refresh = 0; - goto restart; - } + gfs2_glock_dq_uninit(q_gh); + force_refresh = 0; + goto restart; } return 0; -- 1.6.2.5
[Cluster-devel] [PATCH 13/30] GFS2: Add get_xstate quota function
This allows querying of the quota state via the XFS quota API.

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/quota.c |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 6aaa6c5..e7114be 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -47,6 +47,7 @@
 #include <linux/gfs2_ondisk.h>
 #include <linux/kthread.h>
 #include <linux/freezer.h>
+#include <linux/dqblk_xfs.h>
 
 #include "gfs2.h"
 #include "incore.h"
@@ -1375,7 +1376,29 @@ int gfs2_quotad(void *data)
 	return 0;
 }
 
+static int gfs2_quota_get_xstate(struct super_block *sb,
+				 struct fs_quota_stat *fqs)
+{
+	struct gfs2_sbd *sdp = sb->s_fs_info;
+
+	memset(fqs, 0, sizeof(struct fs_quota_stat));
+	fqs->qs_version = FS_QSTAT_VERSION;
+	if (sdp->sd_args.ar_quota == GFS2_QUOTA_ON)
+		fqs->qs_flags = (XFS_QUOTA_UDQ_ENFD | XFS_QUOTA_GDQ_ENFD);
+	else if (sdp->sd_args.ar_quota == GFS2_QUOTA_ACCOUNT)
+		fqs->qs_flags = (XFS_QUOTA_UDQ_ACCT | XFS_QUOTA_GDQ_ACCT);
+	if (sdp->sd_quota_inode) {
+		fqs->qs_uquota.qfs_ino = GFS2_I(sdp->sd_quota_inode)->i_no_addr;
+		fqs->qs_uquota.qfs_nblks = sdp->sd_quota_inode->i_blocks;
+	}
+	fqs->qs_uquota.qfs_nextents = 1; /* unsupported */
+	fqs->qs_gquota = fqs->qs_uquota; /* its the same inode in both cases */
+	fqs->qs_incoredqs = atomic_read(&qd_lru_count);
+	return 0;
+}
+
 const struct quotactl_ops gfs2_quotactl_ops = {
 	.quota_sync     = gfs2_quota_sync,
+	.get_xstate	= gfs2_quota_get_xstate,
 };
 
-- 
1.6.2.5
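The mapping from the GFS2 quota mount option to the reported state flags is the core of this patch, and it can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: the toy_ enum and flag names are hypothetical, and the bit values are placeholders (real code should take the flag definitions from linux/dqblk_xfs.h). The sketch mirrors the patch in reporting only the enforcement bits for quota=on and only the accounting bits for quota=account.

```c
#include <stdint.h>

/* Hypothetical mirror of the three GFS2 quota mount modes. */
enum toy_quota_mode { TOY_QUOTA_OFF, TOY_QUOTA_ACCOUNT, TOY_QUOTA_ON };

/* Placeholder flag bits for user/group accounting and enforcement. */
#define TOY_UDQ_ACCT (1u << 0)
#define TOY_UDQ_ENFD (1u << 1)
#define TOY_GDQ_ACCT (1u << 2)
#define TOY_GDQ_ENFD (1u << 3)

/* Translate the mount mode into the flag word a state query would report. */
static uint16_t toy_quota_state_flags(enum toy_quota_mode mode)
{
	switch (mode) {
	case TOY_QUOTA_ON:
		return TOY_UDQ_ENFD | TOY_GDQ_ENFD;
	case TOY_QUOTA_ACCOUNT:
		return TOY_UDQ_ACCT | TOY_GDQ_ACCT;
	default:
		return 0;	/* quotas off: no flags set */
	}
}

/* Turn a flag word back into a short description for a userland tool. */
static const char *toy_describe_flags(uint16_t flags)
{
	if (flags & (TOY_UDQ_ENFD | TOY_GDQ_ENFD))
		return "enforcing";
	if (flags & (TOY_UDQ_ACCT | TOY_GDQ_ACCT))
		return "accounting only";
	return "off";
}
```

A userland tool would obtain the real flag word from the kernel rather than compute it, so only the decoding half of this sketch corresponds to anything a tool would do.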
[Cluster-devel] [PATCH 14/30] GFS2: Add proper error reporting to quota sync via sysfs
For some reason, the errors were not making it to userspace.

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/sys.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index be1b8ac..c5dad1e 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -178,6 +178,7 @@ static ssize_t quota_sync_store(struct gfs2_sbd *sdp, const char *buf,
 static ssize_t quota_refresh_user_store(struct gfs2_sbd *sdp, const char *buf,
 					size_t len)
 {
+	int error;
 	u32 id;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -185,13 +186,14 @@ static ssize_t quota_refresh_user_store(struct gfs2_sbd *sdp, const char *buf,
 
 	id = simple_strtoul(buf, NULL, 0);
 
-	gfs2_quota_refresh(sdp, 1, id);
-	return len;
+	error = gfs2_quota_refresh(sdp, 1, id);
+	return error ? error : len;
 }
 
 static ssize_t quota_refresh_group_store(struct gfs2_sbd *sdp, const char *buf,
 					 size_t len)
 {
+	int error;
 	u32 id;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -199,8 +201,8 @@ static ssize_t quota_refresh_group_store(struct gfs2_sbd *sdp, const char *buf,
 
 	id = simple_strtoul(buf, NULL, 0);
 
-	gfs2_quota_refresh(sdp, 0, id);
-	return len;
+	error = gfs2_quota_refresh(sdp, 0, id);
+	return error ? error : len;
 }
 
 static ssize_t demote_rq_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
-- 
1.6.2.5
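The convention the fix relies on is that a store handler returns the number of bytes consumed on success and a negative errno on failure, so discarding the callee's return value hides failures from the writer. A minimal userland sketch of that convention follows. The toy_ functions are hypothetical, and the rule that ids of 100 and above do not exist is invented purely so that the error path can be exercised.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical backend: pretend only ids below 100 have a quota entry. */
static int toy_quota_refresh(unsigned long id)
{
	return id < 100 ? 0 : -ENOENT;
}

/*
 * Models a sysfs-style store handler: parse the id from the written
 * buffer, call the backend, and propagate its error if there is one.
 * On success the full length is reported as consumed.
 */
static ssize_t toy_refresh_store(const char *buf, size_t len)
{
	unsigned long id = strtoul(buf, NULL, 0);
	int error = toy_quota_refresh(id);

	return error ? error : (ssize_t)len;
}
```

With the pre-patch behaviour (always returning len), a write of an unknown id would have appeared to succeed; here the caller sees -ENOENT instead.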
[Cluster-devel] [PATCH 15/30] GFS2: Remove constant argument from qdsb_get()
The create argument to qdsb_get() was only ever set to true, so this patch removes that argument. Signed-off-by: Steven Whitehouse swhit...@redhat.com --- fs/gfs2/quota.c | 12 ++-- 1 files changed, 6 insertions(+), 6 deletions(-) diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c index e7114be..f790f5a 100644 --- a/fs/gfs2/quota.c +++ b/fs/gfs2/quota.c @@ -462,12 +462,12 @@ static void qd_unlock(struct gfs2_quota_data *qd) qd_put(qd); } -static int qdsb_get(struct gfs2_sbd *sdp, int user, u32 id, int create, +static int qdsb_get(struct gfs2_sbd *sdp, int user, u32 id, struct gfs2_quota_data **qdp) { int error; - error = qd_get(sdp, user, id, create, qdp); + error = qd_get(sdp, user, id, CREATE, qdp); if (error) return error; @@ -509,20 +509,20 @@ int gfs2_quota_hold(struct gfs2_inode *ip, u32 uid, u32 gid) if (sdp-sd_args.ar_quota == GFS2_QUOTA_OFF) return 0; - error = qdsb_get(sdp, QUOTA_USER, ip-i_inode.i_uid, CREATE, qd); + error = qdsb_get(sdp, QUOTA_USER, ip-i_inode.i_uid, qd); if (error) goto out; al-al_qd_num++; qd++; - error = qdsb_get(sdp, QUOTA_GROUP, ip-i_inode.i_gid, CREATE, qd); + error = qdsb_get(sdp, QUOTA_GROUP, ip-i_inode.i_gid, qd); if (error) goto out; al-al_qd_num++; qd++; if (uid != NO_QUOTA_CHANGE uid != ip-i_inode.i_uid) { - error = qdsb_get(sdp, QUOTA_USER, uid, CREATE, qd); + error = qdsb_get(sdp, QUOTA_USER, uid, qd); if (error) goto out; al-al_qd_num++; @@ -530,7 +530,7 @@ int gfs2_quota_hold(struct gfs2_inode *ip, u32 uid, u32 gid) } if (gid != NO_QUOTA_CHANGE gid != ip-i_inode.i_gid) { - error = qdsb_get(sdp, QUOTA_GROUP, gid, CREATE, qd); + error = qdsb_get(sdp, QUOTA_GROUP, gid, qd); if (error) goto out; al-al_qd_num++; -- 1.6.2.5
[Cluster-devel] [PATCH 16/30] GFS2: Remove constant argument from qd_get()
This function was only ever called with the create argument set to true, so we can remove it. Signed-off-by: Steven Whitehouse swhit...@redhat.com --- fs/gfs2/quota.c |8 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c index f790f5a..db124af 100644 --- a/fs/gfs2/quota.c +++ b/fs/gfs2/quota.c @@ -165,7 +165,7 @@ fail: return error; } -static int qd_get(struct gfs2_sbd *sdp, int user, u32 id, int create, +static int qd_get(struct gfs2_sbd *sdp, int user, u32 id, struct gfs2_quota_data **qdp) { struct gfs2_quota_data *qd = NULL, *new_qd = NULL; @@ -203,7 +203,7 @@ static int qd_get(struct gfs2_sbd *sdp, int user, u32 id, int create, spin_unlock(qd_lru_lock); - if (qd || !create) { + if (qd) { if (new_qd) { gfs2_glock_put(new_qd-qd_gl); kmem_cache_free(gfs2_quotad_cachep, new_qd); @@ -467,7 +467,7 @@ static int qdsb_get(struct gfs2_sbd *sdp, int user, u32 id, { int error; - error = qd_get(sdp, user, id, CREATE, qdp); + error = qd_get(sdp, user, id, qdp); if (error) return error; @@ -1117,7 +1117,7 @@ int gfs2_quota_refresh(struct gfs2_sbd *sdp, int user, u32 id) struct gfs2_holder q_gh; int error; - error = qd_get(sdp, user, id, CREATE, qd); + error = qd_get(sdp, user, id, qd); if (error) return error; -- 1.6.2.5
[Cluster-devel] [PATCH 17/30] GFS2: Clean up gfs2_adjust_quota() and do_glock()
Both of these functions contained confusing and in one case duplicate code. This patch adds a new check in do_glock() so that we report -ENOENT if we are asked to sync a quota entry which doesn't exist. Due to the previous patch this is now reported correctly to userspace. Also there are a few new comments, and I hope that the code is easier to understand now. Signed-off-by: Steven Whitehouse swhit...@redhat.com --- fs/gfs2/quota.c | 82 +- 1 files changed, 26 insertions(+), 56 deletions(-) diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c index db124af..33e369f 100644 --- a/fs/gfs2/quota.c +++ b/fs/gfs2/quota.c @@ -15,7 +15,7 @@ * fuzziness in the current usage value of IDs that are being used on different * nodes in the cluster simultaneously. So, it is possible for a user on * multiple nodes to overrun their quota, but that overrun is controlable. - * Since quota tags are part of transactions, there is no need to a quota check + * Since quota tags are part of transactions, there is no need for a quota check * program to be run on node crashes or anything like that. * * There are couple of knobs that let the administrator manage the quota @@ -66,13 +66,6 @@ #define QUOTA_USER 1 #define QUOTA_GROUP 0 -struct gfs2_quota_host { - u64 qu_limit; - u64 qu_warn; - s64 qu_value; - u32 qu_ll_next; -}; - struct gfs2_quota_change_host { u64 qc_change; u32 qc_flags; /* GFS2_QCF_... 
*/ @@ -618,33 +611,19 @@ static void do_qc(struct gfs2_quota_data *qd, s64 change) mutex_unlock(sdp-sd_quota_mutex); } -static void gfs2_quota_in(struct gfs2_quota_host *qu, const void *buf) -{ - const struct gfs2_quota *str = buf; - - qu-qu_limit = be64_to_cpu(str-qu_limit); - qu-qu_warn = be64_to_cpu(str-qu_warn); - qu-qu_value = be64_to_cpu(str-qu_value); - qu-qu_ll_next = be32_to_cpu(str-qu_ll_next); -} - -static void gfs2_quota_out(const struct gfs2_quota_host *qu, void *buf) -{ - struct gfs2_quota *str = buf; - - str-qu_limit = cpu_to_be64(qu-qu_limit); - str-qu_warn = cpu_to_be64(qu-qu_warn); - str-qu_value = cpu_to_be64(qu-qu_value); - str-qu_ll_next = cpu_to_be32(qu-qu_ll_next); - memset(str-qu_reserved, 0, sizeof(str-qu_reserved)); -} - /** - * gfs2_adjust_quota + * gfs2_adjust_quota - adjust record of current block usage + * @ip: The quota inode + * @loc: Offset of the entry in the quota file + * @change: The amount of change to record + * @qd: The quota data * * This function was mostly borrowed from gfs2_block_truncate_page which was * in turn mostly borrowed from ext3 + * + * Returns: 0 or -ve on error */ + static int gfs2_adjust_quota(struct gfs2_inode *ip, loff_t loc, s64 change, struct gfs2_quota_data *qd) { @@ -656,8 +635,7 @@ static int gfs2_adjust_quota(struct gfs2_inode *ip, loff_t loc, struct buffer_head *bh; struct page *page; void *kaddr; - char *ptr; - struct gfs2_quota_host qp; + struct gfs2_quota *qp; s64 value; int err = -EIO; @@ -701,18 +679,13 @@ static int gfs2_adjust_quota(struct gfs2_inode *ip, loff_t loc, gfs2_trans_add_bh(ip-i_gl, bh, 0); kaddr = kmap_atomic(page, KM_USER0); - ptr = kaddr + offset; - gfs2_quota_in(qp, ptr); - qp.qu_value += change; - value = qp.qu_value; - gfs2_quota_out(qp, ptr); + qp = kaddr + offset; + value = (s64)be64_to_cpu(qp-qu_value) + change; + qp-qu_value = cpu_to_be64(value); + qd-qd_qb.qb_value = qp-qu_value; flush_dcache_page(page); kunmap_atomic(kaddr, KM_USER0); err = 0; - qd-qd_qb.qb_magic = 
cpu_to_be32(GFS2_MAGIC); - qd-qd_qb.qb_value = cpu_to_be64(value); - ((struct gfs2_quota_lvb*)(qd-qd_gl-gl_lvb))-qb_magic = cpu_to_be32(GFS2_MAGIC); - ((struct gfs2_quota_lvb*)(qd-qd_gl-gl_lvb))-qb_value = cpu_to_be64(value); unlock: unlock_page(page); page_cache_release(page); @@ -741,8 +714,7 @@ static int do_sync(unsigned int num_qd, struct gfs2_quota_data **qda) sort(qda, num_qd, sizeof(struct gfs2_quota_data *), sort_qd, NULL); for (qx = 0; qx num_qd; qx++) { - error = gfs2_glock_nq_init(qda[qx]-qd_gl, - LM_ST_EXCLUSIVE, + error = gfs2_glock_nq_init(qda[qx]-qd_gl, LM_ST_EXCLUSIVE, GL_NOCACHE, ghs[qx]); if (error) goto out; @@ -797,8 +769,7 @@ static int do_sync(unsigned int num_qd, struct gfs2_quota_data **qda) qd = qda[x]; offset = qd2offset(qd); error = gfs2_adjust_quota(ip, offset, qd-qd_change_sync, - (struct gfs2_quota_data *) - qd
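
As a rough illustration of the simplified update in gfs2_adjust_quota(), where the on-disk big-endian value is read, adjusted and written back in place rather than going through a host-order copy of the whole structure, here is a minimal userspace sketch. The helpers load_be64(), store_be64() and adjust_be64_value() are hypothetical stand-ins written for this example; they are not kernel functions, and the kernel code uses be64_to_cpu()/cpu_to_be64() on the mapped page instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for be64_to_cpu(): decode 8 big-endian bytes. */
static uint64_t load_be64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Hypothetical stand-in for cpu_to_be64(): encode into 8 big-endian bytes. */
static void store_be64(unsigned char *p, uint64_t v)
{
	int i;

	for (i = 7; i >= 0; i--) {
		p[i] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}

/*
 * Apply a signed change to a big-endian 64-bit usage counter in place and
 * return the new value in host byte order. Assumes two's complement
 * conversion between uint64_t and int64_t, as on all common platforms.
 */
static int64_t adjust_be64_value(unsigned char *field, int64_t change)
{
	int64_t value = (int64_t)load_be64(field) + change;

	store_be64(field, (uint64_t)value);
	return value;
}
```

Only the one field that changes is touched, which is the point of dropping the gfs2_quota_in()/gfs2_quota_out() round trip in the patch.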
[Cluster-devel] [PATCH 18/30] GFS2: Add get_xquota support
This adds support for viewing the current GFS2 quota settings
via the XFS quota API. The setting of quotas will be addressed
in a later patch.

Fields which are not supported here are left set to zero.

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
Reviewed-by: Bob Peterson <rpete...@redhat.com>
---
 fs/gfs2/quota.c |   43 +++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 33e369f..6c5d6aa 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1367,8 +1367,51 @@ static int gfs2_quota_get_xstate(struct super_block *sb,
 	return 0;
 }
 
+static int gfs2_xquota_get(struct super_block *sb, int type, qid_t id,
+			   struct fs_disk_quota *fdq)
+{
+	struct gfs2_sbd *sdp = sb->s_fs_info;
+	struct gfs2_quota_lvb *qlvb;
+	struct gfs2_quota_data *qd;
+	struct gfs2_holder q_gh;
+	int error;
+
+	memset(fdq, 0, sizeof(struct fs_disk_quota));
+
+	if (sdp->sd_args.ar_quota == GFS2_QUOTA_OFF)
+		return -ESRCH; /* Crazy XFS error code */
+
+	if (type == USRQUOTA)
+		type = QUOTA_USER;
+	else if (type == GRPQUOTA)
+		type = QUOTA_GROUP;
+	else
+		return -EINVAL;
+
+	error = qd_get(sdp, type, id, &qd);
+	if (error)
+		return error;
+	error = do_glock(qd, FORCE, &q_gh);
+	if (error)
+		goto out;
+
+	qlvb = (struct gfs2_quota_lvb *)qd->qd_gl->gl_lvb;
+	fdq->d_version = FS_DQUOT_VERSION;
+	fdq->d_flags = (type == QUOTA_USER) ? XFS_USER_QUOTA : XFS_GROUP_QUOTA;
+	fdq->d_id = id;
+	fdq->d_blk_hardlimit = be64_to_cpu(qlvb->qb_limit);
+	fdq->d_blk_softlimit = be64_to_cpu(qlvb->qb_warn);
+	fdq->d_bcount = be64_to_cpu(qlvb->qb_value);
+
+	gfs2_glock_dq_uninit(&q_gh);
+out:
+	qd_put(qd);
+	return error;
+}
+
 const struct quotactl_ops gfs2_quotactl_ops = {
 	.quota_sync = gfs2_quota_sync,
 	.get_xstate = gfs2_quota_get_xstate,
+	.get_xquota = gfs2_xquota_get,
 };
 
-- 
1.6.2.5
[Cluster-devel] [PATCH 19/30] GFS2: Add set_xquota support
This patch adds the ability to set GFS2 quota limit and warning levels via the XFS quota API. Signed-off-by: Steven Whitehouse swhit...@redhat.com --- fs/gfs2/quota.c | 198 +++--- 1 files changed, 172 insertions(+), 26 deletions(-) diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c index 6c5d6aa..e8db534 100644 --- a/fs/gfs2/quota.c +++ b/fs/gfs2/quota.c @@ -615,8 +615,9 @@ static void do_qc(struct gfs2_quota_data *qd, s64 change) * gfs2_adjust_quota - adjust record of current block usage * @ip: The quota inode * @loc: Offset of the entry in the quota file - * @change: The amount of change to record + * @change: The amount of usage change to record * @qd: The quota data + * @fdq: The updated limits to record * * This function was mostly borrowed from gfs2_block_truncate_page which was * in turn mostly borrowed from ext3 @@ -625,19 +626,21 @@ static void do_qc(struct gfs2_quota_data *qd, s64 change) */ static int gfs2_adjust_quota(struct gfs2_inode *ip, loff_t loc, -s64 change, struct gfs2_quota_data *qd) +s64 change, struct gfs2_quota_data *qd, +struct fs_disk_quota *fdq) { struct inode *inode = ip-i_inode; struct address_space *mapping = inode-i_mapping; unsigned long index = loc PAGE_CACHE_SHIFT; unsigned offset = loc (PAGE_CACHE_SIZE - 1); unsigned blocksize, iblock, pos; - struct buffer_head *bh; + struct buffer_head *bh, *dibh; struct page *page; void *kaddr; struct gfs2_quota *qp; s64 value; int err = -EIO; + u64 size; if (gfs2_is_stuffed(ip)) gfs2_unstuff_dinode(ip, NULL); @@ -683,9 +686,34 @@ static int gfs2_adjust_quota(struct gfs2_inode *ip, loff_t loc, value = (s64)be64_to_cpu(qp-qu_value) + change; qp-qu_value = cpu_to_be64(value); qd-qd_qb.qb_value = qp-qu_value; + if (fdq) { + if (fdq-d_fieldmask FS_DQ_BSOFT) { + qp-qu_warn = cpu_to_be64(fdq-d_blk_softlimit); + qd-qd_qb.qb_warn = qp-qu_warn; + } + if (fdq-d_fieldmask FS_DQ_BHARD) { + qp-qu_limit = cpu_to_be64(fdq-d_blk_hardlimit); + qd-qd_qb.qb_limit = qp-qu_limit; + } + } flush_dcache_page(page); 
kunmap_atomic(kaddr, KM_USER0); - err = 0; + + err = gfs2_meta_inode_buffer(ip, dibh); + if (err) + goto unlock; + + size = loc + sizeof(struct gfs2_quota); + if (size inode-i_size) { + ip-i_disksize = size; + i_size_write(inode, size); + } + inode-i_mtime = inode-i_atime = CURRENT_TIME; + gfs2_trans_add_bh(ip-i_gl, dibh, 1); + gfs2_dinode_out(ip, dibh-b_data); + brelse(dibh); + mark_inode_dirty(inode); + unlock: unlock_page(page); page_cache_release(page); @@ -713,6 +741,7 @@ static int do_sync(unsigned int num_qd, struct gfs2_quota_data **qda) return -ENOMEM; sort(qda, num_qd, sizeof(struct gfs2_quota_data *), sort_qd, NULL); + mutex_lock_nested(ip-i_inode.i_mutex, I_MUTEX_QUOTA); for (qx = 0; qx num_qd; qx++) { error = gfs2_glock_nq_init(qda[qx]-qd_gl, LM_ST_EXCLUSIVE, GL_NOCACHE, ghs[qx]); @@ -768,8 +797,7 @@ static int do_sync(unsigned int num_qd, struct gfs2_quota_data **qda) for (x = 0; x num_qd; x++) { qd = qda[x]; offset = qd2offset(qd); - error = gfs2_adjust_quota(ip, offset, qd-qd_change_sync, - (struct gfs2_quota_data *)qd); + error = gfs2_adjust_quota(ip, offset, qd-qd_change_sync, qd, NULL); if (error) goto out_end_trans; @@ -789,20 +817,44 @@ out_gunlock: out: while (qx--) gfs2_glock_dq_uninit(ghs[qx]); + mutex_unlock(ip-i_inode.i_mutex); kfree(ghs); gfs2_log_flush(ip-i_gl-gl_sbd, ip-i_gl); return error; } +static int update_qd(struct gfs2_sbd *sdp, struct gfs2_quota_data *qd) +{ + struct gfs2_inode *ip = GFS2_I(sdp-sd_quota_inode); + struct gfs2_quota q; + struct gfs2_quota_lvb *qlvb; + loff_t pos; + int error; + + memset(q, 0, sizeof(struct gfs2_quota)); + pos = qd2offset(qd); + error = gfs2_internal_read(ip, NULL, (char *)q, pos, sizeof(q)); + if (error 0) + return error; + + qlvb = (struct gfs2_quota_lvb *)qd-qd_gl-gl_lvb; + qlvb-qb_magic = cpu_to_be32(GFS2_MAGIC); + qlvb-__pad = 0; + qlvb-qb_limit = q.qu_limit; + qlvb-qb_warn = q.qu_warn; + qlvb-qb_value = q.qu_value; + qd-qd_qb = *qlvb; + + return 0; +} + static
[Cluster-devel] [PATCH 20/30] VFS: Export dquot_send_warning
Sending a message to userspace in a generic format to warn of events (e.g. quota exceeded) in the quota subsystem is a generically useful feature. This patch makes some minor changes to the send_message function from dquot.c renaming it quota_send_message, moving it to quota.c and exporting it for use by filesystems which do not use the dquot code. Signed-off-by: Steven Whitehouse swhit...@redhat.com --- fs/quota/Kconfig |2 +- fs/quota/dquot.c | 93 + fs/quota/quota.c | 93 + include/linux/quota.h | 11 ++ 4 files changed, 114 insertions(+), 85 deletions(-) diff --git a/fs/quota/Kconfig b/fs/quota/Kconfig index 8047e01..353e78a 100644 --- a/fs/quota/Kconfig +++ b/fs/quota/Kconfig @@ -17,7 +17,7 @@ config QUOTA config QUOTA_NETLINK_INTERFACE bool Report quota messages through netlink interface - depends on QUOTA NET + depends on QUOTACTL NET help If you say Y here, quota warnings (about exceeding softlimit, reaching hardlimit, etc.) will be reported through netlink interface. If unsure, diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c index 39b49c4..9b6ad90 100644 --- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -77,10 +77,6 @@ #include linux/capability.h #include linux/quotaops.h #include linux/writeback.h /* for inode_lock, oddly enough.. 
*/ -#ifdef CONFIG_QUOTA_NETLINK_INTERFACE -#include net/netlink.h -#include net/genetlink.h -#endif #include asm/uaccess.h @@ -1071,73 +1067,6 @@ static void print_warning(struct dquot *dquot, const int warntype) } #endif -#ifdef CONFIG_QUOTA_NETLINK_INTERFACE - -/* Netlink family structure for quota */ -static struct genl_family quota_genl_family = { - .id = GENL_ID_GENERATE, - .hdrsize = 0, - .name = VFS_DQUOT, - .version = 1, - .maxattr = QUOTA_NL_A_MAX, -}; - -/* Send warning to userspace about user which exceeded quota */ -static void send_warning(const struct dquot *dquot, const char warntype) -{ - static atomic_t seq; - struct sk_buff *skb; - void *msg_head; - int ret; - int msg_size = 4 * nla_total_size(sizeof(u32)) + - 2 * nla_total_size(sizeof(u64)); - - /* We have to allocate using GFP_NOFS as we are called from a -* filesystem performing write and thus further recursion into -* the fs to free some data could cause deadlocks. */ - skb = genlmsg_new(msg_size, GFP_NOFS); - if (!skb) { - printk(KERN_ERR - VFS: Not enough memory to send quota warning.\n); - return; - } - msg_head = genlmsg_put(skb, 0, atomic_add_return(1, seq), - quota_genl_family, 0, QUOTA_NL_C_WARNING); - if (!msg_head) { - printk(KERN_ERR - VFS: Cannot store netlink header in quota warning.\n); - goto err_out; - } - ret = nla_put_u32(skb, QUOTA_NL_A_QTYPE, dquot-dq_type); - if (ret) - goto attr_err_out; - ret = nla_put_u64(skb, QUOTA_NL_A_EXCESS_ID, dquot-dq_id); - if (ret) - goto attr_err_out; - ret = nla_put_u32(skb, QUOTA_NL_A_WARNING, warntype); - if (ret) - goto attr_err_out; - ret = nla_put_u32(skb, QUOTA_NL_A_DEV_MAJOR, - MAJOR(dquot-dq_sb-s_dev)); - if (ret) - goto attr_err_out; - ret = nla_put_u32(skb, QUOTA_NL_A_DEV_MINOR, - MINOR(dquot-dq_sb-s_dev)); - if (ret) - goto attr_err_out; - ret = nla_put_u64(skb, QUOTA_NL_A_CAUSED_ID, current_uid()); - if (ret) - goto attr_err_out; - genlmsg_end(skb, msg_head); - - genlmsg_multicast(skb, 0, quota_genl_family.id, GFP_NOFS); - return; 
-attr_err_out: - printk(KERN_ERR VFS: Not enough space to compose quota message!\n); -err_out: - kfree_skb(skb); -} -#endif /* * Write warnings to the console and send warning messages over netlink. * @@ -1145,18 +1074,20 @@ err_out: */ static void flush_warnings(struct dquot *const *dquots, char *warntype) { + struct dquot *dq; int i; - for (i = 0; i MAXQUOTAS; i++) - if (dquots[i] warntype[i] != QUOTA_NL_NOWARN - !warning_issued(dquots[i], warntype[i])) { + for (i = 0; i MAXQUOTAS; i++) { + dq = dquots[i]; + if (dq warntype[i] != QUOTA_NL_NOWARN + !warning_issued(dq, warntype[i])) { #ifdef CONFIG_PRINT_QUOTA_WARNING - print_warning(dquots[i], warntype[i]); -#endif -#ifdef CONFIG_QUOTA_NETLINK_INTERFACE - send_warning(dquots[i], warntype[i]); + print_warning(dq, warntype[i]); #endif + quota_send_warning(dq-dq_type, dq-dq_id
[Cluster-devel] [PATCH 21/30] GFS2: Use dquot_send_warning()
This adds support to GFS2 to send quota warnings via netlink.
Also it removes a stray \r which was left over from when the code
used to print warnings on the console.

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/quota.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index e8db534..1d4fc04 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -47,6 +47,7 @@
 #include <linux/gfs2_ondisk.h>
 #include <linux/kthread.h>
 #include <linux/freezer.h>
+#include <linux/quota.h>
 #include <linux/dqblk_xfs.h>
 
 #include "gfs2.h"
@@ -1001,7 +1002,7 @@ static int print_message(struct gfs2_quota_data *qd, char *type)
 {
 	struct gfs2_sbd *sdp = qd->qd_gl->gl_sbd;
 
-	printk(KERN_INFO "GFS2: fsid=%s: quota %s for %s %u\r\n",
+	printk(KERN_INFO "GFS2: fsid=%s: quota %s for %s %u\n",
 	       sdp->sd_fsname, type,
 	       (test_bit(QDF_USER, &qd->qd_flags)) ? "user" : "group",
 	       qd->qd_id);
@@ -1038,6 +1039,10 @@ int gfs2_quota_check(struct gfs2_inode *ip, u32 uid, u32 gid)
 		if (be64_to_cpu(qd->qd_qb.qb_limit) &&
 		    (s64)be64_to_cpu(qd->qd_qb.qb_limit) < value) {
 			print_message(qd, "exceeded");
+			quota_send_warning(test_bit(QDF_USER, &qd->qd_flags) ?
+					   USRQUOTA : GRPQUOTA, qd->qd_id,
+					   sdp->sd_vfs->s_dev, QUOTA_NL_BHARDWARN);
+
 			error = -EDQUOT;
 			break;
 		} else if (be64_to_cpu(qd->qd_qb.qb_warn) &&
@@ -1045,6 +1050,9 @@ int gfs2_quota_check(struct gfs2_inode *ip, u32 uid, u32 gid)
 			   time_after_eq(jiffies, qd->qd_last_warn +
 					 gfs2_tune_get(sdp,
 						gt_quota_warn_period) * HZ)) {
+			quota_send_warning(test_bit(QDF_USER, &qd->qd_flags) ?
+					   USRQUOTA : GRPQUOTA, qd->qd_id,
+					   sdp->sd_vfs->s_dev, QUOTA_NL_BSOFTWARN);
 			error = print_message(qd, "warning");
 			qd->qd_last_warn = jiffies;
 		}
-- 
1.6.2.5
[Cluster-devel] [PATCH 22/30] GFS2: Improve statfs and quota usability
From: Benjamin Marzinski <bmarz...@redhat.com>

GFS2 now has three new mount options, statfs_quantum, quota_quantum and
statfs_percent. statfs_quantum and quota_quantum simply allow you to set
the tunables of the same name. Setting statfs_quantum to 0 will also turn
on the statfs_slow tunable. statfs_percent accepts an integer between 0
and 100. Numbers between 1 and 100 will cause GFS2 to do an early sync
when the local number of blocks free changes by at least statfs_percent
from the total number of blocks free. Setting statfs_percent to 0 disables
this.

Signed-off-by: Benjamin Marzinski <bmarz...@redhat.com>
Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/incore.h     |    4 +++
 fs/gfs2/ops_fstype.c |   14 --
 fs/gfs2/quota.c      |   21 +--
 fs/gfs2/quota.h      |    2 +
 fs/gfs2/super.c      |   69 +++---
 5 files changed, 100 insertions(+), 10 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 6edb423..c239b0f 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -430,6 +430,9 @@ struct gfs2_args {
 	unsigned int ar_discard:1; /* discard requests */
 	unsigned int ar_errors:2; /* errors=withdraw | panic */
 	int ar_commit; /* Commit interval */
+	int ar_statfs_quantum; /* The fast statfs interval */
+	int ar_quota_quantum; /* The quota interval */
+	int ar_statfs_percent; /* The % change to force sync */
 };
 
 struct gfs2_tune {
@@ -558,6 +561,7 @@ struct gfs2_sbd {
 	spinlock_t sd_statfs_spin;
 	struct gfs2_statfs_change_host sd_statfs_master;
 	struct gfs2_statfs_change_host sd_statfs_local;
+	int sd_statfs_force_sync;
 
 	/* Resource group stuff */
 
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 36b11cb..9744ee9 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -63,13 +63,10 @@ static void gfs2_tune_init(struct gfs2_tune *gt)
 	gt->gt_quota_warn_period = 10;
 	gt->gt_quota_scale_num = 1;
 	gt->gt_quota_scale_den = 1;
-	gt->gt_quota_quantum = 60;
 	gt->gt_new_files_jdata = 0;
 	gt->gt_max_readahead = 1 << 18;
 	gt->gt_stall_secs = 600;
 	gt->gt_complain_secs = 10;
-	gt->gt_statfs_quantum = 30;
-	gt->gt_statfs_slow = 0;
 }
 
 static struct gfs2_sbd *init_sbd(struct super_block *sb)
@@ -1153,6 +1150,15 @@ static int fill_super(struct super_block *sb, struct gfs2_args *args, int silent
 	sdp->sd_fsb2bb = 1 << sdp->sd_fsb2bb_shift;
 
 	sdp->sd_tune.gt_log_flush_secs = sdp->sd_args.ar_commit;
+	sdp->sd_tune.gt_quota_quantum = sdp->sd_args.ar_quota_quantum;
+	if (sdp->sd_args.ar_statfs_quantum) {
+		sdp->sd_tune.gt_statfs_slow = 0;
+		sdp->sd_tune.gt_statfs_quantum = sdp->sd_args.ar_statfs_quantum;
+	}
+	else {
+		sdp->sd_tune.gt_statfs_slow = 1;
+		sdp->sd_tune.gt_statfs_quantum = 30;
+	}
 
 	error = init_names(sdp, silent);
 	if (error)
@@ -1308,6 +1314,8 @@ static int gfs2_get_sb(struct file_system_type *fs_type, int flags,
 	args.ar_quota = GFS2_QUOTA_DEFAULT;
 	args.ar_data = GFS2_DATA_DEFAULT;
 	args.ar_commit = 60;
+	args.ar_statfs_quantum = 30;
+	args.ar_quota_quantum = 60;
 	args.ar_errors = GFS2_ERRORS_DEFAULT;
 
 	error = gfs2_mount_args(&args, data);
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 1d4fc04..e3bf6ea 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1344,6 +1344,14 @@ static void quotad_check_trunc_list(struct gfs2_sbd *sdp)
 	}
 }
 
+void gfs2_wake_up_statfs(struct gfs2_sbd *sdp) {
+	if (!sdp->sd_statfs_force_sync) {
+		sdp->sd_statfs_force_sync = 1;
+		wake_up(&sdp->sd_quota_wait);
+	}
+}
+
+
 /**
  * gfs2_quotad - Write cached quota changes into the quota file
  * @sdp: Pointer to GFS2 superblock
@@ -1363,8 +1371,15 @@ int gfs2_quotad(void *data)
 	while (!kthread_should_stop()) {
 
 		/* Update the master statfs file */
-		quotad_check_timeo(sdp, "statfs", gfs2_statfs_sync, t,
-				   &statfs_timeo, &tune->gt_statfs_quantum);
+		if (sdp->sd_statfs_force_sync) {
+			int error = gfs2_statfs_sync(sdp->sd_vfs, 0);
+			quotad_error(sdp, "statfs", error);
+			statfs_timeo = gfs2_tune_get(sdp, gt_statfs_quantum) * HZ;
+		}
+		else
+			quotad_check_timeo(sdp, "statfs", gfs2_statfs_sync, t,
+					   &statfs_timeo,
+					   &tune->gt_statfs_quantum);
 
 		/* Update quota file */
 		quotad_check_timeo(sdp, "sync", gfs2_quota_sync, t,
@@ -1381,7 +1396,7 @@ int gfs2_quotad(void *data
[Cluster-devel] [PATCH 23/30] GFS2: remove division from new statfs code
From: Benjamin Marzinski <bmarz...@redhat.com>

It's not necessary to do any 64bit division for the statfs sync code,
so remove it.

Signed-off-by: Benjamin Marzinski <bmarz...@redhat.com>
Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/super.c |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 3fee2fd..b1dcfab 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -472,7 +472,8 @@ void gfs2_statfs_change(struct gfs2_sbd *sdp, s64 total, s64 free,
 	struct gfs2_statfs_change_host *l_sc = &sdp->sd_statfs_local;
 	struct gfs2_statfs_change_host *m_sc = &sdp->sd_statfs_master;
 	struct buffer_head *l_bh;
-	int percent, sync_percent;
+	s64 x, y;
+	int need_sync = 0;
 	int error;
 
 	error = gfs2_meta_inode_buffer(l_ip, &l_bh);
@@ -486,16 +487,16 @@ void gfs2_statfs_change(struct gfs2_sbd *sdp, s64 total, s64 free,
 	l_sc->sc_free += free;
 	l_sc->sc_dinodes += dinodes;
 	gfs2_statfs_change_out(l_sc, l_bh->b_data + sizeof(struct gfs2_dinode));
-	if (m_sc->sc_free)
-		percent = (100 * l_sc->sc_free) / m_sc->sc_free;
-	else
-		percent = 100;
+	if (sdp->sd_args.ar_statfs_percent) {
+		x = 100 * l_sc->sc_free;
+		y = m_sc->sc_free * sdp->sd_args.ar_statfs_percent;
+		if (x >= y || x <= -y)
+			need_sync = 1;
+	}
 	spin_unlock(&sdp->sd_statfs_spin);
 
 	brelse(l_bh);
-	sync_percent = sdp->sd_args.ar_statfs_percent;
-	if (sync_percent && (percent >= sync_percent ||
-			     percent <= -sync_percent))
+	if (need_sync)
 		gfs2_wake_up_statfs(sdp);
 }
 
-- 
1.6.2.5
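
To see why the division can be dropped, it may help to compare the two forms side by side in a small userspace sketch. The function names need_sync_div() and need_sync_mul() are made up for this example. The comparison assumes the master free count is never negative and that the products do not overflow 64 bits; under those assumptions, testing 100 * local >= master * percent gives the same answer as computing the percentage first.

```c
#include <assert.h>
#include <stdint.h>

/* Old form: compute a truncated percentage, then compare it. */
static int need_sync_div(int64_t local_free, int64_t master_free, int pct)
{
	int64_t percent;

	if (!pct)
		return 0;	/* statfs_percent=0 disables the early sync */
	if (master_free)
		percent = (100 * local_free) / master_free;
	else
		percent = 100;
	return percent >= pct || percent <= -pct;
}

/* New form: cross-multiply so that no 64-bit division is needed. */
static int need_sync_mul(int64_t local_free, int64_t master_free, int pct)
{
	int64_t x, y;

	if (!pct)
		return 0;
	x = 100 * local_free;
	y = master_free * pct;
	return x >= y || x <= -y;
}
```

Avoiding the division matters in the kernel because a plain 64-bit divide is not available on every 32-bit architecture without a helper, whereas 64-bit multiplication is.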
[Cluster-devel] [PATCH 24/30] GFS2: add barrier/nobarrier mount options
From: Christoph Hellwig <h...@lst.de>

Currently gfs2 issues barriers unconditionally. There are various reasons
to disable them, be that just for testing or for stupid devices flushing
large battery backed caches. Add a nobarrier option that matches xfs and
btrfs for this. Also add a symmetric barrier option to turn it back on
at remount time.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/incore.h     |    1 +
 fs/gfs2/ops_fstype.c |    2 ++
 fs/gfs2/super.c      |   14 ++++++++++++++
 3 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index c239b0f..4792200 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -429,6 +429,7 @@ struct gfs2_args {
 	unsigned int ar_meta:1; /* mount metafs */
 	unsigned int ar_discard:1; /* discard requests */
 	unsigned int ar_errors:2; /* errors=withdraw | panic */
+	unsigned int ar_nobarrier:1; /* do not send barriers */
 	int ar_commit; /* Commit interval */
 	int ar_statfs_quantum; /* The fast statfs interval */
 	int ar_quota_quantum; /* The quota interval */
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 9744ee9..edfee24 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1131,6 +1131,8 @@ static int fill_super(struct super_block *sb, struct gfs2_args *args, int silent
 	}
 	if (sdp->sd_args.ar_posix_acl)
 		sb->s_flags |= MS_POSIXACL;
+	if (sdp->sd_args.ar_nobarrier)
+		set_bit(SDF_NOBARRIERS, &sdp->sd_flags);
 
 	sb->s_magic = GFS2_MAGIC;
 	sb->s_op = &gfs2_super_ops;
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index b1dcfab..5e4b314 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -73,6 +73,8 @@ enum {
 	Opt_statfs_quantum,
 	Opt_statfs_percent,
 	Opt_quota_quantum,
+	Opt_barrier,
+	Opt_nobarrier,
 	Opt_error,
 };
 
@@ -107,6 +109,8 @@ static const match_table_t tokens = {
 	{Opt_statfs_quantum, "statfs_quantum=%d"},
 	{Opt_statfs_percent, "statfs_percent=%d"},
 	{Opt_quota_quantum, "quota_quantum=%d"},
+	{Opt_barrier, "barrier"},
+	{Opt_nobarrier, "nobarrier"},
 	{Opt_error, NULL}
 };
 
@@ -253,6 +257,12 @@ int gfs2_mount_args(struct gfs2_args *args, char *options)
 			}
 			args->ar_errors = GFS2_ERRORS_PANIC;
 			break;
+		case Opt_barrier:
+			args->ar_nobarrier = 0;
+			break;
+		case Opt_nobarrier:
+			args->ar_nobarrier = 1;
+			break;
 		case Opt_error:
 		default:
 			printk(KERN_WARNING "GFS2: invalid mount option: %s\n", o);
@@ -1143,6 +1153,10 @@ static int gfs2_remount_fs(struct super_block *sb, int *flags, char *data)
 		sb->s_flags |= MS_POSIXACL;
 	else
 		sb->s_flags &= ~MS_POSIXACL;
+	if (sdp->sd_args.ar_nobarrier)
+		set_bit(SDF_NOBARRIERS, &sdp->sd_flags);
+	else
+		clear_bit(SDF_NOBARRIERS, &sdp->sd_flags);
 	spin_lock(&gt->gt_spin);
 	gt->gt_log_flush_secs = args.ar_commit;
 	gt->gt_quota_quantum = args.ar_quota_quantum;
-- 
1.6.2.5
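
The symmetric barrier/nobarrier pair can be pictured with a small userspace sketch: both options write the same single-bit flag, so whichever appears last in the option string wins, and a remount with "barrier" undoes an earlier "nobarrier". The names struct sketch_args and parse_barrier_opts() are invented for this illustration; the kernel code uses match_token() with the tokens table shown in the patch, not this hand-rolled parser.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical cut-down version of the mount arguments. */
struct sketch_args {
	unsigned int nobarrier:1;	/* 1 = do not send barriers */
};

/*
 * Walk a comma-separated option string and apply the barrier options.
 * Returns 0 on success, -1 if an unknown option is seen.
 */
static int parse_barrier_opts(struct sketch_args *args, const char *options)
{
	const char *p = options;

	while (*p) {
		size_t len = strcspn(p, ",");

		if (len == 7 && strncmp(p, "barrier", 7) == 0)
			args->nobarrier = 0;
		else if (len == 9 && strncmp(p, "nobarrier", 9) == 0)
			args->nobarrier = 1;
		else if (len != 0)
			return -1;	/* invalid mount option */

		p += len;
		if (*p == ',')
			p++;
	}
	return 0;
}
```

Storing the negative sense (nobarrier) keeps the zero-initialised default at "barriers on", which matches the default described in the patch.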
[Cluster-devel] [PATCH 25/30] GFS2: Display nobarrier option in /proc/mounts
Since the default is barriers on, this only displays the
nobarrier option when that is active.

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/super.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 5e4b314..c282ad4 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1336,6 +1336,9 @@ static int gfs2_show_options(struct seq_file *s, struct vfsmount *mnt)
 		}
 		seq_printf(s, ",errors=%s", state);
 	}
+	if (test_bit(SDF_NOBARRIERS, &sdp->sd_flags))
+		seq_printf(s, ",nobarrier");
+
 	return 0;
 }
 
-- 
1.6.2.5
[Cluster-devel] GFS2 git tree
Hi, Linus has pulled the -nmw tree. I'll leave it a few days before I start adding patches into the tree to avoid any confusion with those still trying to merge from linux-next, but don't let that delay you in posting any patches. If they are fixes, then they can go into the -fixes tree right away (currently empty) Steve.
[Cluster-devel] GFS2: Metadata address space clean up
This is a heads up on a patch I'm working on to clean up the metadata
address space which is used in GFS2. This is a preliminary version which
passes a few basic tests. I'll probably make a few more changes before
the final version.

Since the start of GFS2, an extra inode has been used to store the
metadata belonging to each inode. The only reason for using this inode
was to have an extra address space; the other fields were unused. This
means that the memory usage was rather inefficient.

The reason for keeping each inode's metadata in a separate address space
is that when glocks are requested on remote nodes, we need to be able to
efficiently locate the data and metadata relating to that glock (inode)
in order to sync or sync and invalidate it (depending on the remotely
requested lock mode).

This patch adds a new type of glock which, in addition to its normal
fields, has an address space. This applies to all inode and rgrp glocks
(but to no other glock types, which remain as before). As a result, we no
longer need to have the second inode.

This results in three major improvements:
 1. A saving of approx 25% of memory used in caching inodes
 2. A removal of the circular dependency between inodes and glocks
 3. No confusion between normal and metadata inodes in super.c

Although the first of these is the most immediately apparent, the second
is just as important as it now enables a number of clean ups at umount
time. Those will be the subject of future patches.

Steve.
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c index 7b8da94..16c8214 100644 --- a/fs/gfs2/aops.c +++ b/fs/gfs2/aops.c @@ -1061,11 +1061,17 @@ out: int gfs2_releasepage(struct page *page, gfp_t gfp_mask) { - struct inode *aspace = page-mapping-host; - struct gfs2_sbd *sdp = aspace-i_sb-s_fs_info; + struct address_space *mapping = page-mapping; + struct inode *inode = mapping-host; + struct gfs2_sbd *sdp; struct buffer_head *bh, *head; struct gfs2_bufdata *bd; + if (mapping-a_ops == gfs2_meta_aops) + sdp = (((struct gfs2_glock *)mapping) - 1)-gl_sbd; + else + sdp = inode-i_sb-s_fs_info; + if (!page_has_buffers(page)) return 0; diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index f455a03..736d05b 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -154,12 +154,14 @@ static unsigned int gl_hash(const struct gfs2_sbd *sdp, static void glock_free(struct gfs2_glock *gl) { struct gfs2_sbd *sdp = gl-gl_sbd; - struct inode *aspace = gl-gl_aspace; + struct address_space *mapping = gfs2_glock2aspace(gl); + struct kmem_cache *cachep = gfs2_glock_cachep; - if (aspace) - gfs2_aspace_put(aspace); + GLOCK_BUG_ON(gl, mapping mapping-nrpages); trace_gfs2_glock_put(gl); - sdp-sd_lockstruct.ls_ops-lm_put_lock(gfs2_glock_cachep, gl); + if (mapping) + cachep = gfs2_glock_aspace_cachep; + sdp-sd_lockstruct.ls_ops-lm_put_lock(cachep, gl); } /** @@ -750,10 +752,11 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number, const struct gfs2_glock_operations *glops, int create, struct gfs2_glock **glp) { + struct super_block *s = sdp-sd_vfs; struct lm_lockname name = { .ln_number = number, .ln_type = glops-go_type }; struct gfs2_glock *gl, *tmp; unsigned int hash = gl_hash(sdp, name); - int error; + struct address_space *mapping; read_lock(gl_lock_addr(hash)); gl = search_bucket(hash, sdp, name); @@ -765,7 +768,10 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number, if (!create) return -ENOENT; - gl = kmem_cache_alloc(gfs2_glock_cachep, GFP_KERNEL); + if (glops-go_flags GLOF_ASPACE) + gl = 
kmem_cache_alloc(gfs2_glock_aspace_cachep, GFP_KERNEL); + else + gl = kmem_cache_alloc(gfs2_glock_cachep, GFP_KERNEL); if (!gl) return -ENOMEM; @@ -783,18 +789,18 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number, gl-gl_tchange = jiffies; gl-gl_object = NULL; gl-gl_sbd = sdp; - gl-gl_aspace = NULL; INIT_DELAYED_WORK(gl-gl_work, glock_work_func); INIT_WORK(gl-gl_delete, delete_work_func); - /* If this glock protects actual on-disk data or metadata blocks, - create a VFS inode to manage the pages/buffers holding them. */ - if (glops == gfs2_inode_glops || glops == gfs2_rgrp_glops) { - gl-gl_aspace = gfs2_aspace_get(sdp); - if (!gl-gl_aspace) { - error = -ENOMEM; - goto fail; - } + mapping = gfs2_glock2aspace(gl); + if (mapping) { +mapping-a_ops = gfs2_meta_aops; + mapping-host = s-s_bdev-bd_inode; + mapping-flags = 0; +
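
The `((struct gfs2_glock *)mapping) - 1` arithmetic in gfs2_releasepage() suggests that the address space sits directly behind the glock in one allocation. The following userspace sketch shows that layout under that assumption. All names here (struct sketch_glock, struct sketch_aspace, SKETCH_HAS_ASPACE and the three helpers) are hypothetical and only mirror the roles of gfs2_glock, address_space, GLOF_ASPACE and gfs2_glock2aspace(); the real helper bodies are not shown in the posted diff.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_HAS_ASPACE 0x1	/* plays the role of GLOF_ASPACE */

struct sketch_glock {
	int flags;
	void *owner;		/* stands in for gl_sbd */
};

struct sketch_aspace {
	unsigned long nrpages;
};

/* The trailing object must be correctly aligned right after the glock. */
_Static_assert(sizeof(struct sketch_glock) %
	       _Alignof(struct sketch_aspace) == 0,
	       "aspace would be misaligned behind the glock");

/* Allocate a glock, optionally with an address space appended to it. */
static struct sketch_glock *glock_alloc(int with_aspace)
{
	size_t size = sizeof(struct sketch_glock);
	struct sketch_glock *gl;

	if (with_aspace)
		size += sizeof(struct sketch_aspace);
	gl = calloc(1, size);
	if (gl)
		gl->flags = with_aspace ? SKETCH_HAS_ASPACE : 0;
	return gl;
}

/* Glock to address space: step over the glock, or NULL if it has none. */
static struct sketch_aspace *glock2aspace(struct sketch_glock *gl)
{
	if (gl->flags & SKETCH_HAS_ASPACE)
		return (struct sketch_aspace *)(gl + 1);
	return NULL;
}

/* Address space back to glock: the "- 1" step used in the patch. */
static struct sketch_glock *aspace2glock(struct sketch_aspace *mapping)
{
	return ((struct sketch_glock *)mapping) - 1;
}
```

Two allocation sizes correspond to the two slab caches in the patch (gfs2_glock_cachep and gfs2_glock_aspace_cachep), so glock types that have no metadata of their own do not pay for the extra address space.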
[Cluster-devel] GFS2: Ensure uptodate inode size when using O_APPEND
The VFS reads the inode size during generic_file_aio_write() but with no
locking around it. In order to get the expected result from O_APPEND
opens, this patch updates the inode size before calling
generic_file_aio_write().

There is of course still a race here, in that there is nothing to prevent
another node coming in and extending the file in the meantime. On the
other hand, when used with file locking this will ensure that the
expected results are obtained.

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 4eb308a..a6abbae 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -569,6 +569,40 @@ static int gfs2_fsync(struct file *file, struct dentry *dentry, int datasync)
 	return ret;
 }
 
+/**
+ * gfs2_file_aio_write - Perform a write to a file
+ * @iocb: The io context
+ * @iov: The data to write
+ * @nr_segs: Number of @iov segments
+ * @pos: The file position
+ *
+ * We have to do a lock/unlock here to refresh the inode size for
+ * O_APPEND writes, otherwise we can land up writing at the wrong
+ * offset. There is still a race, but provided the app is using its
+ * own file locking, this will make O_APPEND work as expected.
+ *
+ */
+
+static ssize_t gfs2_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
+				   unsigned long nr_segs, loff_t pos)
+{
+	struct file *file = iocb->ki_filp;
+
+	if (file->f_flags & O_APPEND) {
+		struct dentry *dentry = file->f_dentry;
+		struct gfs2_inode *ip = GFS2_I(dentry->d_inode);
+		struct gfs2_holder gh;
+		int ret;
+
+		ret = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, 0, &gh);
+		if (ret)
+			return ret;
+		gfs2_glock_dq_uninit(&gh);
+	}
+
+	return generic_file_aio_write(iocb, iov, nr_segs, pos);
+}
+
 #ifdef CONFIG_GFS2_FS_LOCKING_DLM
 
 /**
@@ -711,7 +745,7 @@ const struct file_operations gfs2_file_fops = {
 	.read = do_sync_read,
 	.aio_read = generic_file_aio_read,
 	.write = do_sync_write,
-	.aio_write = generic_file_aio_write,
+	.aio_write = gfs2_file_aio_write,
 	.unlocked_ioctl = gfs2_ioctl,
 	.mmap = gfs2_mmap,
 	.open = gfs2_open,
@@ -741,7 +775,7 @@ const struct file_operations gfs2_file_fops_nolock = {
 	.read = do_sync_read,
 	.aio_read = generic_file_aio_read,
 	.write = do_sync_write,
-	.aio_write = generic_file_aio_write,
+	.aio_write = gfs2_file_aio_write,
 	.unlocked_ioctl = gfs2_ioctl,
 	.mmap = gfs2_mmap,
 	.open = gfs2_open,
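
From the application side, the "own file locking" that the comment in the patch refers to could look roughly like the sketch below: take an exclusive lock, do the O_APPEND write, release the lock. The helper name append_locked() is invented for this example, and the choice of POSIX fcntl() record locks rather than flock() is an assumption; the patch itself does not say which locking API the application uses.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Append len bytes to path while holding an exclusive whole-file lock.
 * Returns 0 on success and -1 on any failure (including a short write).
 */
static int append_locked(const char *path, const void *buf, size_t len)
{
	struct flock fl;
	ssize_t n;
	int fd, ret = -1;

	fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
	if (fd < 0)
		return -1;

	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;		/* 0 means "to end of file": lock it all */

	/* Block until no other process holds a conflicting lock. */
	if (fcntl(fd, F_SETLKW, &fl) == 0) {
		/* With O_APPEND the offset is moved to EOF at write time. */
		n = write(fd, buf, len);
		if (n == (ssize_t)len)
			ret = 0;

		fl.l_type = F_UNLCK;
		fcntl(fd, F_SETLK, &fl);
	}

	close(fd);
	return ret;
}
```

As long as every writer on every node follows the same lock-write-unlock pattern, no other node can extend the file between the size refresh and the write, which is the remaining race the patch description mentions.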
[Cluster-devel] GFS2: Remove loopy umount code
This is a follow-up to the patch I posted yesterday and does the next
bit of the changes.

From 69a14ddaf57449c3f6ecfe96a898df5ded1a4256 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhit...@redhat.com>
Date: Tue, 8 Dec 2009 15:45:50 +0000
Subject: GFS2: Remove loopy umount code

As a consequence of the previous patch, we can now remove the
loop which used to be required due to the circular dependency
between the inodes and glocks. Instead we can just invalidate
the inodes, and then clear up any glocks which are left.

Also we no longer need the rwsem since there is no longer any
danger of the inode invalidation calling back into the glock
code (and from there back into the inode code).

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/glock.c      |   35 ---
 fs/gfs2/ops_fstype.c |    3 +--
 fs/gfs2/super.c      |    1 +
 3 files changed, 6 insertions(+), 33 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 736d05b..6e1e526 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -19,7 +19,6 @@
 #include <linux/list.h>
 #include <linux/wait.h>
 #include <linux/module.h>
-#include <linux/rwsem.h>
 #include <asm/uaccess.h>
 #include <linux/seq_file.h>
 #include <linux/debugfs.h>
@@ -60,7 +59,6 @@ static int __dump_glock(struct seq_file *seq, const struct gfs2_glock *gl);
 #define GLOCK_BUG_ON(gl,x) do { if (unlikely(x)) { __dump_glock(NULL, gl); BUG(); } } while(0)
 static void do_xmote(struct gfs2_glock *gl, struct gfs2_holder *gh, unsigned int target);

-static DECLARE_RWSEM(gfs2_umount_flush_sem);
 static struct dentry *gfs2_root;
 static struct workqueue_struct *glock_workqueue;
 struct workqueue_struct *gfs2_delete_workqueue;
@@ -714,7 +712,6 @@ static void glock_work_func(struct work_struct *work)
 		finish_xmote(gl, gl->gl_reply);
 		drop_ref = 1;
 	}
-	down_read(&gfs2_umount_flush_sem);
 	spin_lock(&gl->gl_spin);
 	if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
 	    gl->gl_state != LM_ST_UNLOCKED &&
@@ -727,7 +724,6 @@ static void glock_work_func(struct work_struct *work)
 	}
 	run_queue(gl, 0);
 	spin_unlock(&gl->gl_spin);
-	up_read(&gfs2_umount_flush_sem);
 	if (!delay ||
 	    queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
 		gfs2_glock_put(gl);
@@ -1511,35 +1507,12 @@ void gfs2_glock_thaw(struct gfs2_sbd *sdp)

 void gfs2_gl_hash_clear(struct gfs2_sbd *sdp)
 {
-	unsigned long t;
 	unsigned int x;
-	int cont;

-	t = jiffies;
-
-	for (;;) {
-		cont = 0;
-		for (x = 0; x < GFS2_GL_HASH_SIZE; x++) {
-			if (examine_bucket(clear_glock, sdp, x))
-				cont = 1;
-		}
-
-		if (!cont)
-			break;
-
-		if (time_after_eq(jiffies,
-				  t + gfs2_tune_get(sdp, gt_stall_secs) * HZ)) {
-			fs_warn(sdp, "Unmount seems to be stalled. "
-				     "Dumping lock state...\n");
-			gfs2_dump_lockstate(sdp);
-			t = jiffies;
-		}
-
-		down_write(&gfs2_umount_flush_sem);
-		invalidate_inodes(sdp->sd_vfs);
-		up_write(&gfs2_umount_flush_sem);
-		msleep(10);
-	}
+	for (x = 0; x < GFS2_GL_HASH_SIZE; x++)
+		examine_bucket(clear_glock, sdp, x);
+	flush_workqueue(glock_workqueue);
+	gfs2_dump_lockstate(sdp);
 }

 void gfs2_glock_finish_truncate(struct gfs2_inode *ip)
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index edfee24..717222a 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1231,10 +1231,9 @@ fail_sb:
 fail_locking:
 	init_locking(sdp, &mount_gh, UNDO);
 fail_lm:
+	invalidate_inodes(sb);
 	gfs2_gl_hash_clear(sdp);
 	gfs2_lm_unmount(sdp);
-	while (invalidate_inodes(sb))
-		yield();
 fail_sys:
 	gfs2_sys_fs_del(sdp);
 fail:
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 8ddc613..c008b08 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -858,6 +858,7 @@ restart:
 	gfs2_clear_rgrpd(sdp);
 	gfs2_jindex_free(sdp);
 	/* Take apart glock structures and buffer lists */
+	invalidate_inodes(sdp->sd_vfs);
 	gfs2_gl_hash_clear(sdp);
 	/* Unmount the locking protocol */
 	gfs2_lm_unmount(sdp);
--
1.6.2.5
[Cluster-devel] GFS2: Fix locking bug in rename
From 07bb4585daae6008fd3ad0f3f081e318a4266d1d Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhit...@redhat.com>
Date: Wed, 9 Dec 2009 13:55:12 +0000
Subject: GFS2: Fix locking bug in rename

The rename code was taking a resource group lock in cases where
it wasn't actually needed. This caused problems if the rename
resulted in an inode being unlinked. The patch ensures that we
only take the rgrp lock early if it is really needed.

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/ops_inode.c |    6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c
index 247436c..78f73ca 100644
--- a/fs/gfs2/ops_inode.c
+++ b/fs/gfs2/ops_inode.c
@@ -748,7 +748,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 	struct gfs2_rgrpd *nrgd;
 	unsigned int num_gh;
 	int dir_rename = 0;
-	int alloc_required;
+	int alloc_required = 0;
 	unsigned int x;
 	int error;

@@ -867,7 +867,9 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 			goto out_gunlock;
 	}

-	alloc_required = error = gfs2_diradd_alloc_required(ndir, &ndentry->d_name);
+	if (nip == NULL)
+		alloc_required = gfs2_diradd_alloc_required(ndir, &ndentry->d_name);
+	error = alloc_required;
 	if (error < 0)
 		goto out_gunlock;
 	error = 0;
--
1.6.2.5
[Cluster-devel] GFS2: Fix gfs2_xattr_acl_chmod()
From a49cd198c9ed316255acc25a937ea147d03bccaa Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhit...@redhat.com>
Date: Mon, 21 Dec 2009 13:55:28 +0000
Subject: GFS2: Fix gfs2_xattr_acl_chmod()

The ref counting for the bh returned by gfs2_ea_find() was wrong. This
patch ensures that we always drop the ref count to that bh correctly.

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/xattr.c |   21 +++--
 1 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/gfs2/xattr.c b/fs/gfs2/xattr.c
index 8a04108..c2ebdf2 100644
--- a/fs/gfs2/xattr.c
+++ b/fs/gfs2/xattr.c
@@ -1296,6 +1296,7 @@ fail:

 int gfs2_xattr_acl_chmod(struct gfs2_inode *ip, struct iattr *attr, char *data)
 {
+	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 	struct gfs2_ea_location el;
 	struct buffer_head *dibh;
 	int error;
@@ -1305,16 +1306,17 @@ int gfs2_xattr_acl_chmod(struct gfs2_inode *ip, struct iattr *attr, char *data)
 		return error;

 	if (GFS2_EA_IS_STUFFED(el.el_ea)) {
-		error = gfs2_trans_begin(GFS2_SB(&ip->i_inode), RES_DINODE + RES_EATTR, 0);
-		if (error)
-			return error;
-
-		gfs2_trans_add_bh(ip->i_gl, el.el_bh, 1);
-		memcpy(GFS2_EA2DATA(el.el_ea), data,
-		       GFS2_EA_DATA_LEN(el.el_ea));
-	} else
+		error = gfs2_trans_begin(sdp, RES_DINODE + RES_EATTR, 0);
+		if (error == 0) {
+			gfs2_trans_add_bh(ip->i_gl, el.el_bh, 1);
+			memcpy(GFS2_EA2DATA(el.el_ea), data,
+			       GFS2_EA_DATA_LEN(el.el_ea));
+		}
+	} else {
 		error = ea_acl_chmod_unstuffed(ip, el.el_ea, data);
+	}

+	brelse(el.el_bh);
 	if (error)
 		return error;

@@ -1327,8 +1329,7 @@ int gfs2_xattr_acl_chmod(struct gfs2_inode *ip, struct iattr *attr, char *data)
 		brelse(dibh);
 	}

-	gfs2_trans_end(GFS2_SB(&ip->i_inode));
-
+	gfs2_trans_end(sdp);
 	return error;
 }

--
1.6.2.5
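The shape of this fix, a single release point that is reached on both the success and the failure path, can be shown in isolation. The following is a hedged userspace sketch only: struct sketch_buf, sketch_get(), sketch_put() and sketch_chmod() are hypothetical stand-ins and do not correspond to the kernel's buffer_head API.

```c
#include <assert.h>

/* Hypothetical stand-in for a reference-counted buffer. */
struct sketch_buf {
	int refcount;
};

static void sketch_get(struct sketch_buf *b)
{
	b->refcount++;
}

static void sketch_put(struct sketch_buf *b)
{
	assert(b->refcount > 0);
	b->refcount--;
}

/*
 * begin_error simulates the result of starting a transaction.
 * The lookup hands back a held buffer; it is released exactly once,
 * after the branch, instead of being leaked by an early return.
 */
static int sketch_chmod(struct sketch_buf *b, int begin_error)
{
	int error;

	sketch_get(b);		/* the lookup returns the buffer with a ref held */

	error = begin_error;
	if (error == 0) {
		/* the buffer contents would be modified here */
	}

	sketch_put(b);		/* single release point for both outcomes */
	return error;
}
```

In this arrangement the caller sees the same reference count before and after the call regardless of whether the simulated transaction start failed.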
[Cluster-devel] git trees
Hi, After some delay due to a couple of tricky issues, I'm now back updating the GFS2 git trees again. I will probably send a pull request for the fixes tree fairly shortly now, I'm just giving it a day or two in -next first. At the moment -fixes and -nmw are identical, but I will start pushing more patches into -nmw shortly too, Steve.
[Cluster-devel] GFS2: Use MAX_LFS_FILESIZE for meta inode size
From 2a6833f27a0ed34ae169dc61961552c414263770 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhit...@redhat.com>
Date: Fri, 8 Jan 2010 13:44:49 +0000
Subject: GFS2: Use MAX_LFS_FILESIZE for meta inode size

Using ~0ULL was causing sign issues in filemap_fdatawrite_range, so
use MAX_LFS_FILESIZE instead.

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/meta_io.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index cb8d7a9..6f68a5f 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -121,7 +121,7 @@ struct inode *gfs2_aspace_get(struct gfs2_sbd *sdp)
 	if (aspace) {
 		mapping_set_gfp_mask(aspace->i_mapping, GFP_NOFS);
 		aspace->i_mapping->a_ops = &aspace_aops;
-		aspace->i_size = ~0ULL;
+		aspace->i_size = MAX_LFS_FILESIZE;
 		ip = GFS2_I(aspace);
 		clear_bit(GIF_USER, &ip->i_flags);
 		insert_inode_hash(aspace);
--
1.6.2.5
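The sign issue mentioned in the commit message can be reproduced in userspace. This is only a sketch under stated assumptions: sketch_loff_t stands in for the kernel's signed 64-bit loff_t, SKETCH_MAX_LFS_FILESIZE is an assumed 64-bit value rather than the kernel's own definition, and the conversion of ~0ULL to a signed type is implementation-defined in C (it yields -1 on the usual two's-complement compilers).

```c
#include <limits.h>

/* Hypothetical stand-in for the kernel's signed 64-bit file offset type. */
typedef long long sketch_loff_t;

/* Assumed value for a 64-bit build; the kernel provides the real macro. */
#define SKETCH_MAX_LFS_FILESIZE ((sketch_loff_t)LLONG_MAX)

/* Returns 1 if a size, once held in a signed offset, compares as negative. */
static int sketch_size_is_negative(sketch_loff_t size)
{
	return size < 0;
}
```

Passing (sketch_loff_t)~0ULL to sketch_size_is_negative() reports a negative size on such compilers, whereas SKETCH_MAX_LFS_FILESIZE stays positive, which is the property any range comparison against i_size relies on.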
[Cluster-devel] GFS2: Metadata address space clean up
From 89bf4bea39ab65e0aa608cf5927d4d9b9e189c19 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhit...@redhat.com>
Date: Tue, 8 Dec 2009 12:12:13 +0000
Subject: GFS2: Metadata address space clean up

Since the start of GFS2, an extra inode has been used to store
the metadata belonging to each inode. The only reason for using
this inode was to have an extra address space; the other fields
were unused. This means that the memory usage was rather inefficient.

The reason for keeping each inode's metadata in a separate address
space is that when glocks are requested on remote nodes, we need to
be able to efficiently locate the data and metadata which relate to
that glock (inode) in order to sync or sync and invalidate it
(depending on the remotely requested lock mode).

This patch adds a new type of glock which, in addition to
its normal fields, has an address space. This applies to all
inode and rgrp glocks (but to no other glock types, which remain
as before). As a result, we no longer need to have the second inode.

This results in three major improvements:
 1. A saving of approx 25% of memory used in caching inodes
 2. A removal of the circular dependency between inodes and glocks
 3. No confusion between "normal" and "metadata" inodes in super.c

Although the first of these is the most immediately apparent, the
second is just as important as it now enables a number of clean
ups at umount time. Those will be the subject of future patches.
Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/aops.c     |    4 ++--
 fs/gfs2/glock.c    |   40 +---
 fs/gfs2/glock.h    |    7 +++
 fs/gfs2/glops.c    |   16 +---
 fs/gfs2/incore.h   |    4 ++--
 fs/gfs2/inode.c    |    6 ++
 fs/gfs2/lock_dlm.c |    5 -
 fs/gfs2/main.c     |   28
 fs/gfs2/meta_io.c  |   46 ++
 fs/gfs2/meta_io.h  |   12 ++--
 fs/gfs2/super.c    |   26 --
 fs/gfs2/util.c     |    1 +
 fs/gfs2/util.h     |    1 +
 13 files changed, 101 insertions(+), 95 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 7b8da94..0c1d0b8 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -1061,8 +1061,8 @@ out:

 int gfs2_releasepage(struct page *page, gfp_t gfp_mask)
 {
-	struct inode *aspace = page->mapping->host;
-	struct gfs2_sbd *sdp = aspace->i_sb->s_fs_info;
+	struct address_space *mapping = page->mapping;
+	struct gfs2_sbd *sdp = gfs2_mapping2sbd(mapping);
 	struct buffer_head *bh, *head;
 	struct gfs2_bufdata *bd;

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index f455a03..736d05b 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -154,12 +154,14 @@ static unsigned int gl_hash(const struct gfs2_sbd *sdp,
 static void glock_free(struct gfs2_glock *gl)
 {
 	struct gfs2_sbd *sdp = gl->gl_sbd;
-	struct inode *aspace = gl->gl_aspace;
+	struct address_space *mapping = gfs2_glock2aspace(gl);
+	struct kmem_cache *cachep = gfs2_glock_cachep;

-	if (aspace)
-		gfs2_aspace_put(aspace);
+	GLOCK_BUG_ON(gl, mapping && mapping->nrpages);
 	trace_gfs2_glock_put(gl);
-	sdp->sd_lockstruct.ls_ops->lm_put_lock(gfs2_glock_cachep, gl);
+	if (mapping)
+		cachep = gfs2_glock_aspace_cachep;
+	sdp->sd_lockstruct.ls_ops->lm_put_lock(cachep, gl);
 }

 /**
@@ -750,10 +752,11 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
 		   const struct gfs2_glock_operations *glops, int create,
 		   struct gfs2_glock **glp)
 {
+	struct super_block *s = sdp->sd_vfs;
 	struct lm_lockname name = { .ln_number = number, .ln_type = glops->go_type };
 	struct gfs2_glock *gl, *tmp;
 	unsigned int hash = gl_hash(sdp, &name);
-	int error;
+	struct address_space *mapping;

 	read_lock(gl_lock_addr(hash));
 	gl = search_bucket(hash, sdp, &name);
@@ -765,7 +768,10 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
 	if (!create)
 		return -ENOENT;

-	gl = kmem_cache_alloc(gfs2_glock_cachep, GFP_KERNEL);
+	if (glops->go_flags & GLOF_ASPACE)
+		gl = kmem_cache_alloc(gfs2_glock_aspace_cachep, GFP_KERNEL);
+	else
+		gl = kmem_cache_alloc(gfs2_glock_cachep, GFP_KERNEL);
 	if (!gl)
 		return -ENOMEM;

@@ -783,18 +789,18 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
 	gl->gl_tchange = jiffies;
 	gl->gl_object = NULL;
 	gl->gl_sbd = sdp;
-	gl->gl_aspace = NULL;
 	INIT_DELAYED_WORK(&gl->gl_work, glock_work_func);
 	INIT_WORK(&gl->gl_delete, delete_work_func);

-	/* If this glock protects actual on-disk data or metadata blocks,
-	   create a VFS inode to manage the pages/buffers holding them. */
-	if (glops == &gfs2_inode_glops || glops == &gfs2_rgrp_glops) {
-		gl->gl_aspace
[Cluster-devel] GFS2: Remove loopy umount code
From 086332de5db343f8029d4436725090c42fcac7c7 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhit...@redhat.com>
Date: Fri, 8 Jan 2010 16:14:29 +0000
Subject: GFS2: Remove loopy umount code

As a consequence of the previous patch, we can now remove the
loop which used to be required due to the circular dependency
between the inodes and glocks. Instead we can just invalidate
the inodes, and then clear up any glocks which are left.

Also we no longer need the rwsem since there is no longer any
danger of the inode invalidation calling back into the glock
code (and from there back into the inode code).

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/glock.c      |   35 ---
 fs/gfs2/incore.h     |    1 -
 fs/gfs2/ops_fstype.c |    4 +---
 fs/gfs2/super.c      |    1 +
 fs/gfs2/sys.c        |    2 --
 5 files changed, 6 insertions(+), 37 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 736d05b..6e1e526 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -19,7 +19,6 @@
 #include <linux/list.h>
 #include <linux/wait.h>
 #include <linux/module.h>
-#include <linux/rwsem.h>
 #include <asm/uaccess.h>
 #include <linux/seq_file.h>
 #include <linux/debugfs.h>
@@ -60,7 +59,6 @@ static int __dump_glock(struct seq_file *seq, const struct gfs2_glock *gl);
 #define GLOCK_BUG_ON(gl,x) do { if (unlikely(x)) { __dump_glock(NULL, gl); BUG(); } } while(0)
 static void do_xmote(struct gfs2_glock *gl, struct gfs2_holder *gh, unsigned int target);

-static DECLARE_RWSEM(gfs2_umount_flush_sem);
 static struct dentry *gfs2_root;
 static struct workqueue_struct *glock_workqueue;
 struct workqueue_struct *gfs2_delete_workqueue;
@@ -714,7 +712,6 @@ static void glock_work_func(struct work_struct *work)
 		finish_xmote(gl, gl->gl_reply);
 		drop_ref = 1;
 	}
-	down_read(&gfs2_umount_flush_sem);
 	spin_lock(&gl->gl_spin);
 	if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
 	    gl->gl_state != LM_ST_UNLOCKED &&
@@ -727,7 +724,6 @@ static void glock_work_func(struct work_struct *work)
 	}
 	run_queue(gl, 0);
 	spin_unlock(&gl->gl_spin);
-	up_read(&gfs2_umount_flush_sem);
 	if (!delay ||
 	    queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
 		gfs2_glock_put(gl);
@@ -1511,35 +1507,12 @@ void gfs2_glock_thaw(struct gfs2_sbd *sdp)

 void gfs2_gl_hash_clear(struct gfs2_sbd *sdp)
 {
-	unsigned long t;
 	unsigned int x;
-	int cont;

-	t = jiffies;
-
-	for (;;) {
-		cont = 0;
-		for (x = 0; x < GFS2_GL_HASH_SIZE; x++) {
-			if (examine_bucket(clear_glock, sdp, x))
-				cont = 1;
-		}
-
-		if (!cont)
-			break;
-
-		if (time_after_eq(jiffies,
-				  t + gfs2_tune_get(sdp, gt_stall_secs) * HZ)) {
-			fs_warn(sdp, "Unmount seems to be stalled. "
-				     "Dumping lock state...\n");
-			gfs2_dump_lockstate(sdp);
-			t = jiffies;
-		}
-
-		down_write(&gfs2_umount_flush_sem);
-		invalidate_inodes(sdp->sd_vfs);
-		up_write(&gfs2_umount_flush_sem);
-		msleep(10);
-	}
+	for (x = 0; x < GFS2_GL_HASH_SIZE; x++)
+		examine_bucket(clear_glock, sdp, x);
+	flush_workqueue(glock_workqueue);
+	gfs2_dump_lockstate(sdp);
 }

 void gfs2_glock_finish_truncate(struct gfs2_inode *ip)
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 0f0d55a..f93f9b9 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -451,7 +451,6 @@ struct gfs2_tune {
 	unsigned int gt_quota_quantum; /* Secs between syncs to quota file */
 	unsigned int gt_new_files_jdata;
 	unsigned int gt_max_readahead; /* Max bytes to read-ahead from disk */
-	unsigned int gt_stall_secs; /* Detects trouble! */
 	unsigned int gt_complain_secs;
 	unsigned int gt_statfs_quantum;
 	unsigned int gt_statfs_slow;
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index edfee24..968a99f 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -65,7 +65,6 @@ static void gfs2_tune_init(struct gfs2_tune *gt)
 	gt->gt_quota_scale_den = 1;
 	gt->gt_new_files_jdata = 0;
 	gt->gt_max_readahead = 1 << 18;
-	gt->gt_stall_secs = 600;
 	gt->gt_complain_secs = 10;
 }

@@ -1231,10 +1230,9 @@ fail_sb:
 fail_locking:
 	init_locking(sdp, &mount_gh, UNDO);
 fail_lm:
+	invalidate_inodes(sb);
 	gfs2_gl_hash_clear(sdp);
 	gfs2_lm_unmount(sdp);
-	while (invalidate_inodes(sb))
-		yield();
 fail_sys:
 	gfs2_sys_fs_del(sdp);
 fail:
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 8ddc613..c008b08 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -858,6 +858,7
[Cluster-devel] [PATCH 4/4] GFS2: Use MAX_LFS_FILESIZE for meta inode size
Using ~0ULL was causing sign issues in filemap_fdatawrite_range, so
use MAX_LFS_FILESIZE instead.

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/meta_io.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index cb8d7a9..6f68a5f 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -121,7 +121,7 @@ struct inode *gfs2_aspace_get(struct gfs2_sbd *sdp)
 	if (aspace) {
 		mapping_set_gfp_mask(aspace->i_mapping, GFP_NOFS);
 		aspace->i_mapping->a_ops = &aspace_aops;
-		aspace->i_size = ~0ULL;
+		aspace->i_size = MAX_LFS_FILESIZE;
 		ip = GFS2_I(aspace);
 		clear_bit(GIF_USER, &ip->i_flags);
 		insert_inode_hash(aspace);
--
1.6.2.5
[Cluster-devel] [PATCH 3/4] GFS2: Fix gfs2_xattr_acl_chmod()
The ref counting for the bh returned by gfs2_ea_find() was wrong. This
patch ensures that we always drop the ref count to that bh correctly.

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/xattr.c |   21 +++--
 1 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/gfs2/xattr.c b/fs/gfs2/xattr.c
index 8a04108..c2ebdf2 100644
--- a/fs/gfs2/xattr.c
+++ b/fs/gfs2/xattr.c
@@ -1296,6 +1296,7 @@ fail:

 int gfs2_xattr_acl_chmod(struct gfs2_inode *ip, struct iattr *attr, char *data)
 {
+	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 	struct gfs2_ea_location el;
 	struct buffer_head *dibh;
 	int error;
@@ -1305,16 +1306,17 @@ int gfs2_xattr_acl_chmod(struct gfs2_inode *ip, struct iattr *attr, char *data)
 		return error;

 	if (GFS2_EA_IS_STUFFED(el.el_ea)) {
-		error = gfs2_trans_begin(GFS2_SB(&ip->i_inode), RES_DINODE + RES_EATTR, 0);
-		if (error)
-			return error;
-
-		gfs2_trans_add_bh(ip->i_gl, el.el_bh, 1);
-		memcpy(GFS2_EA2DATA(el.el_ea), data,
-		       GFS2_EA_DATA_LEN(el.el_ea));
-	} else
+		error = gfs2_trans_begin(sdp, RES_DINODE + RES_EATTR, 0);
+		if (error == 0) {
+			gfs2_trans_add_bh(ip->i_gl, el.el_bh, 1);
+			memcpy(GFS2_EA2DATA(el.el_ea), data,
+			       GFS2_EA_DATA_LEN(el.el_ea));
+		}
+	} else {
 		error = ea_acl_chmod_unstuffed(ip, el.el_ea, data);
+	}

+	brelse(el.el_bh);
 	if (error)
 		return error;

@@ -1327,8 +1329,7 @@ int gfs2_xattr_acl_chmod(struct gfs2_inode *ip, struct iattr *attr, char *data)
 		brelse(dibh);
 	}

-	gfs2_trans_end(GFS2_SB(&ip->i_inode));
-
+	gfs2_trans_end(sdp);
 	return error;
 }

--
1.6.2.5
[Cluster-devel] [PATCH 1/4] GFS2: Ensure uptodate inode size when using O_APPEND
The VFS reads the inode size during generic_file_aio_write() but with
no locking around it. In order to get the expected result from O_APPEND
opens, this patch updates the inode size before calling
generic_file_aio_write().

There is of course still a race here, in that there is nothing to
prevent another node coming in and extending the file in the meantime.
On the other hand, when used with file locking this will ensure that
the expected results are obtained.

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/file.c |   38 --
 1 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 4eb308a..a6abbae 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -569,6 +569,40 @@ static int gfs2_fsync(struct file *file, struct dentry *dentry, int datasync)
 	return ret;
 }

+/**
+ * gfs2_file_aio_write - Perform a write to a file
+ * @iocb: The io context
+ * @iov: The data to write
+ * @nr_segs: Number of @iov segments
+ * @pos: The file position
+ *
+ * We have to do a lock/unlock here to refresh the inode size for
+ * O_APPEND writes, otherwise we can land up writing at the wrong
+ * offset. There is still a race, but provided the app is using its
+ * own file locking, this will make O_APPEND work as expected.
+ *
+ */
+
+static ssize_t gfs2_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
+				   unsigned long nr_segs, loff_t pos)
+{
+	struct file *file = iocb->ki_filp;
+
+	if (file->f_flags & O_APPEND) {
+		struct dentry *dentry = file->f_dentry;
+		struct gfs2_inode *ip = GFS2_I(dentry->d_inode);
+		struct gfs2_holder gh;
+		int ret;
+
+		ret = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, 0, &gh);
+		if (ret)
+			return ret;
+		gfs2_glock_dq_uninit(&gh);
+	}
+
+	return generic_file_aio_write(iocb, iov, nr_segs, pos);
+}
+
 #ifdef CONFIG_GFS2_FS_LOCKING_DLM

 /**
@@ -711,7 +745,7 @@ const struct file_operations gfs2_file_fops = {
 	.read		= do_sync_read,
 	.aio_read	= generic_file_aio_read,
 	.write		= do_sync_write,
-	.aio_write	= generic_file_aio_write,
+	.aio_write	= gfs2_file_aio_write,
 	.unlocked_ioctl	= gfs2_ioctl,
 	.mmap		= gfs2_mmap,
 	.open		= gfs2_open,
@@ -741,7 +775,7 @@ const struct file_operations gfs2_file_fops_nolock = {
 	.read		= do_sync_read,
 	.aio_read	= generic_file_aio_read,
 	.write		= do_sync_write,
-	.aio_write	= generic_file_aio_write,
+	.aio_write	= gfs2_file_aio_write,
 	.unlocked_ioctl	= gfs2_ioctl,
 	.mmap		= gfs2_mmap,
 	.open		= gfs2_open,
--
1.6.2.5
[Cluster-devel] [PATCH 2/4] GFS2: Fix locking bug in rename
The rename code was taking a resource group lock in cases where
it wasn't actually needed. This caused problems if the rename
resulted in an inode being unlinked. The patch ensures that we
only take the rgrp lock early if it is really needed.

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/ops_inode.c |    6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c
index 247436c..78f73ca 100644
--- a/fs/gfs2/ops_inode.c
+++ b/fs/gfs2/ops_inode.c
@@ -748,7 +748,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 	struct gfs2_rgrpd *nrgd;
 	unsigned int num_gh;
 	int dir_rename = 0;
-	int alloc_required;
+	int alloc_required = 0;
 	unsigned int x;
 	int error;

@@ -867,7 +867,9 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 			goto out_gunlock;
 	}

-	alloc_required = error = gfs2_diradd_alloc_required(ndir, &ndentry->d_name);
+	if (nip == NULL)
+		alloc_required = gfs2_diradd_alloc_required(ndir, &ndentry->d_name);
+	error = alloc_required;
 	if (error < 0)
 		goto out_gunlock;
 	error = 0;
--
1.6.2.5
[Cluster-devel] GFS2: Pre-pull patch posting (fixes)
Here are four small fixes for GFS2. Assuming that nobody spots any errors, I'll be sending a pull request for these shortly, Steve.
[Cluster-devel] GFS2: Pull request (fixes)
Hi,

Please consider pulling the following GFS2 bug fixes,

Steve.

The following changes since commit 74d2e4f8d79ae0c4b6ec027958d5b18058662eea:
  Linus Torvalds (1):
        Linux 2.6.33-rc3

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes.git master

Steven Whitehouse (4):
      GFS2: Ensure uptodate inode size when using O_APPEND
      GFS2: Fix locking bug in rename
      GFS2: Fix gfs2_xattr_acl_chmod()
      GFS2: Use MAX_LFS_FILESIZE for meta inode size

 fs/gfs2/file.c      |   38 --
 fs/gfs2/meta_io.c   |    2 +-
 fs/gfs2/ops_inode.c |    6 --
 fs/gfs2/xattr.c     |   21 +++--
 4 files changed, 52 insertions(+), 15 deletions(-)
[Cluster-devel] GFS2 git trees
Hi, Linus pulled the -fixes tree last night, so I'm just about to rebase both GFS2 git trees to the new upstream kernel, Steve.
Re: [Cluster-devel] [PATCH] gfs2: Fix refcnt leak on gfs2_follow_link() error path
Hi,

Thanks for the patch. I've pushed it into the GFS2 -fixes tree,

Steve.

On Tue, 2010-01-12 at 03:36 +0900, OGAWA Hirofumi wrote:
If the ->follow_link handler returns an error, it should decrement the
nd->path refcnt. This patch fixes it.

Signed-off-by: OGAWA Hirofumi <hirof...@mail.parknet.co.jp>
---

 fs/gfs2/ops_inode.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff -puN fs/gfs2/ops_inode.c~namei-gfs2-follow_link-fix fs/gfs2/ops_inode.c
--- linux-2.6/fs/gfs2/ops_inode.c~namei-gfs2-follow_link-fix	2010-01-12 00:15:12.0 +0900
+++ linux-2.6-hirofumi/fs/gfs2/ops_inode.c	2010-01-12 00:15:12.0 +0900
@@ -1086,7 +1086,8 @@ static void *gfs2_follow_link(struct den
 		error = vfs_follow_link(nd, buf);
 		if (buf != array)
 			kfree(buf);
-	}
+	} else
+		path_put(&nd->path);

 	return ERR_PTR(error);
 }
_
[Cluster-devel] GFS2: Wait for unlock completion on umount
This patch adds a wait on umount between the point at which we
dispose of all glocks and the point at which we unmount the
lock protocol. This ensures that we've received all the replies
to our unlock requests before we stop the locking.

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
Reported-by: Fabio M. Di Nitto <fdini...@redhat.com>

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index f93f9b9..b8025e5 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -543,6 +543,8 @@ struct gfs2_sbd {
 	struct gfs2_holder sd_live_gh;
 	struct gfs2_glock *sd_rename_gl;
 	struct gfs2_glock *sd_trans_gl;
+	wait_queue_head_t sd_glock_wait;
+	atomic_t sd_glock_disposal;

 	/* Inode Stuff */

diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
index 094839e..484411c 100644
--- a/fs/gfs2/lock_dlm.c
+++ b/fs/gfs2/lock_dlm.c
@@ -21,6 +21,7 @@ static void gdlm_ast(void *arg)
 {
 	struct gfs2_glock *gl = arg;
 	unsigned ret = gl->gl_state;
+	struct gfs2_sbd *sdp = gl->gl_sbd;

 	BUG_ON(gl->gl_lksb.sb_flags & DLM_SBF_DEMOTED);

@@ -33,6 +34,8 @@ static void gdlm_ast(void *arg)
 			kmem_cache_free(gfs2_glock_aspace_cachep, gl);
 		else
 			kmem_cache_free(gfs2_glock_cachep, gl);
+		if (atomic_dec_and_test(&sdp->sd_glock_disposal))
+			wake_up(&sdp->sd_glock_wait);
 		return;
 	case -DLM_ECANCEL: /* Cancel while getting lock */
 		ret |= LM_OUT_CANCELED;
@@ -170,7 +173,8 @@ static unsigned int gdlm_lock(struct gfs2_glock *gl,
 static void gdlm_put_lock(struct kmem_cache *cachep, void *ptr)
 {
 	struct gfs2_glock *gl = ptr;
-	struct lm_lockstruct *ls = &gl->gl_sbd->sd_lockstruct;
+	struct gfs2_sbd *sdp = gl->gl_sbd;
+	struct lm_lockstruct *ls = &sdp->sd_lockstruct;
 	int error;

 	if (gl->gl_lksb.sb_lkid == 0) {
@@ -186,6 +190,7 @@ static void gdlm_put_lock(struct kmem_cache *cachep, void *ptr)
 		       (unsigned long long)gl->gl_name.ln_number, error);
 		return;
 	}
+	atomic_inc(&sdp->sd_glock_disposal);
 }

 static void gdlm_cancel(struct gfs2_glock *gl)
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 968a99f..9baa566 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -81,6 +81,8 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb)

 	gfs2_tune_init(&sdp->sd_tune);

+	init_waitqueue_head(&sdp->sd_glock_wait);
+	atomic_set(&sdp->sd_glock_disposal, 0);
 	spin_lock_init(&sdp->sd_statfs_spin);

 	spin_lock_init(&sdp->sd_rindex_spin);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index c008b08..e2bf19f 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -21,6 +21,7 @@
 #include <linux/gfs2_ondisk.h>
 #include <linux/crc32.h>
 #include <linux/time.h>
+#include <linux/wait.h>

 #include "gfs2.h"
 #include "incore.h"
@@ -860,6 +861,8 @@ restart:
 	/* Take apart glock structures and buffer lists */
 	invalidate_inodes(sdp->sd_vfs);
 	gfs2_gl_hash_clear(sdp);
+	/* Wait for dlm to reply to all our unlock requests */
+	wait_event(sdp->sd_glock_wait, atomic_read(&sdp->sd_glock_disposal) == 0);
 	/* Unmount the locking protocol */
 	gfs2_lm_unmount(sdp);

[Cluster-devel] GFS2: Fix previous patch
This fixes the rgrp patch,

Steve.

From ea0d7284f2f2bd56386e6c4810bf970e50472054 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhit...@redhat.com>
Date: Fri, 29 Jan 2010 15:20:34 +0000
Subject: [PATCH 1/3] GFS2: Fix previous patch

The do_div() call needs to remain.

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/rgrp.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 6702b82..46534a5 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -591,6 +591,7 @@ static int gfs2_ri_update(struct gfs2_inode *ip)
 	u64 rgrp_count = ip->i_disksize;
 	int error;

+	do_div(rgrp_count, sizeof(struct gfs2_rindex));
 	clear_rgrpdi(sdp);

 	file_ra_state_init(&ra_state, inode->i_mapping);
--
1.6.2.5
[Cluster-devel] GFS2: Extend umount wait coverage to full glock lifetime
From 0f76b65f50e4f17324ba184dd074c35788928ba7 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhit...@redhat.com>
Date: Fri, 29 Jan 2010 15:21:27 +0000
Subject: [PATCH 2/3] GFS2: Extend umount wait coverage to full glock lifetime

Although all glocks are, by the time of the umount glock wait,
scheduled for demotion, some of them haven't made it far
enough through the process for the original set of waiting
code to wait for them.

This extends the ref count to the whole glock lifetime in order
to ensure that the waiting does catch all glocks. It does make
it a bit more invasive, but it seems the only sensible solution
at the moment.

Signed-off-by: Steven Whitehouse <swhit...@redhat.com>
---
 fs/gfs2/glock.c      |    2 ++
 fs/gfs2/glock.h      |    2 +-
 fs/gfs2/lock_dlm.c   |    6 +++---
 fs/gfs2/ops_fstype.c |   10 +-
 fs/gfs2/super.c      |    2 --
 5 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 6e1e526..4773f90 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -771,6 +771,7 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
 	if (!gl)
 		return -ENOMEM;

+	atomic_inc(&sdp->sd_glock_disposal);
 	gl->gl_flags = 0;
 	gl->gl_name = name;
 	atomic_set(&gl->gl_ref, 1);
@@ -1512,6 +1513,7 @@ void gfs2_gl_hash_clear(struct gfs2_sbd *sdp)
 	for (x = 0; x < GFS2_GL_HASH_SIZE; x++)
 		examine_bucket(clear_glock, sdp, x);
 	flush_workqueue(glock_workqueue);
+	wait_event(sdp->sd_glock_wait, atomic_read(&sdp->sd_glock_disposal) == 0);
 	gfs2_dump_lockstate(sdp);
 }

diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index dac7261..2bda191 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -123,7 +123,7 @@ struct lm_lockops {
 	int (*lm_mount) (struct gfs2_sbd *sdp, const char *fsname);
 	void (*lm_unmount) (struct gfs2_sbd *sdp);
 	void (*lm_withdraw) (struct gfs2_sbd *sdp);
-	void (*lm_put_lock) (struct kmem_cache *cachep, void *gl);
+	void (*lm_put_lock) (struct kmem_cache *cachep, struct gfs2_glock *gl);
 	unsigned int (*lm_lock) (struct gfs2_glock *gl,
 				 unsigned int req_state, unsigned int flags);
 	void (*lm_cancel) (struct gfs2_glock *gl);
diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
index 484411c..569b462 100644
--- a/fs/gfs2/lock_dlm.c
+++ b/fs/gfs2/lock_dlm.c
@@ -170,15 +170,16 @@ static unsigned int gdlm_lock(struct gfs2_glock *gl,
 	return LM_OUT_ASYNC;
 }

-static void gdlm_put_lock(struct kmem_cache *cachep, void *ptr)
+static void gdlm_put_lock(struct kmem_cache *cachep, struct gfs2_glock *gl)
 {
-	struct gfs2_glock *gl = ptr;
 	struct gfs2_sbd *sdp = gl->gl_sbd;
 	struct lm_lockstruct *ls = &sdp->sd_lockstruct;
 	int error;

 	if (gl->gl_lksb.sb_lkid == 0) {
 		kmem_cache_free(cachep, gl);
+		if (atomic_dec_and_test(&sdp->sd_glock_disposal))
+			wake_up(&sdp->sd_glock_wait);
 		return;
 	}

@@ -190,7 +191,6 @@ static void gdlm_put_lock(struct kmem_cache *cachep, void *ptr)
 		       (unsigned long long)gl->gl_name.ln_number, error);
 		return;
 	}
-	atomic_inc(&sdp->sd_glock_disposal);
 }

 static void gdlm_cancel(struct gfs2_glock *gl)
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 9baa566..d405c38 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -984,9 +984,17 @@ static const match_table_t nolock_tokens = {
 	{ Opt_err, NULL },
 };

+static void nolock_put_lock(struct kmem_cache *cachep, struct gfs2_glock *gl)
+{
+	struct gfs2_sbd *sdp = gl->gl_sbd;
+	kmem_cache_free(cachep, gl);
+	if (atomic_dec_and_test(&sdp->sd_glock_disposal))
+		wake_up(&sdp->sd_glock_wait);
+}
+
 static const struct lm_lockops nolock_ops = {
 	.lm_proto_name = "lock_nolock",
-	.lm_put_lock = kmem_cache_free,
+	.lm_put_lock = nolock_put_lock,
 	.lm_tokens = &nolock_tokens,
 };

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index e2bf19f..e5e2262 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -861,8 +861,6 @@ restart:
 	/* Take apart glock structures and buffer lists */
 	invalidate_inodes(sdp->sd_vfs);
 	gfs2_gl_hash_clear(sdp);
-	/* Wait for dlm to reply to all our unlock requests */
-	wait_event(sdp->sd_glock_wait, atomic_read(&sdp->sd_glock_disposal) == 0);
 	/* Unmount the locking protocol */
 	gfs2_lm_unmount(sdp);

--
1.6.2.5
[Cluster-devel] GFS2: Use GFP_NOFS for alloc structure
From 04988c7ee83641ca732910aff427ab08b0faa557 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse swhit...@redhat.com
Date: Fri, 29 Jan 2010 15:48:57 +
Subject: [PATCH 3/3] GFS2: Use GFP_NOFS for alloc structure

This is called under a glock, so it's a good plan to use GFP_NOFS

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/rgrp.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 46534a5..503b842 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -911,7 +911,7 @@ void gfs2_rgrp_repolish_clones(struct gfs2_rgrpd *rgd)
 struct gfs2_alloc *gfs2_alloc_get(struct gfs2_inode *ip)
 {
 	BUG_ON(ip->i_alloc != NULL);
-	ip->i_alloc = kzalloc(sizeof(struct gfs2_alloc), GFP_KERNEL);
+	ip->i_alloc = kzalloc(sizeof(struct gfs2_alloc), GFP_NOFS);
 	return ip->i_alloc;
 }
 
--
1.6.2.5
[Cluster-devel] [PATCH 1/4] GFS2: Fix refcnt leak on gfs2_follow_link() error path
From: OGAWA Hirofumi hirof...@mail.parknet.co.jp

If the ->follow_link handler returns an error, it should decrement the nd->path refcnt. This patch fixes it.

Signed-off-by: OGAWA Hirofumi hirof...@mail.parknet.co.jp
Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/ops_inode.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c
index 78f73ca..84350e1 100644
--- a/fs/gfs2/ops_inode.c
+++ b/fs/gfs2/ops_inode.c
@@ -1088,7 +1088,8 @@ static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
 		error = vfs_follow_link(nd, buf);
 		if (buf != array)
 			kfree(buf);
-	}
+	} else
+		path_put(&nd->path);
 
 	return ERR_PTR(error);
 }
--
1.6.2.5
[Cluster-devel] [PATCH 4/4] GFS2: Use GFP_NOFS for alloc structure
This is called under a glock, so it's a good plan to use GFP_NOFS

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/rgrp.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 46534a5..503b842 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -911,7 +911,7 @@ void gfs2_rgrp_repolish_clones(struct gfs2_rgrpd *rgd)
 struct gfs2_alloc *gfs2_alloc_get(struct gfs2_inode *ip)
 {
 	BUG_ON(ip->i_alloc != NULL);
-	ip->i_alloc = kzalloc(sizeof(struct gfs2_alloc), GFP_KERNEL);
+	ip->i_alloc = kzalloc(sizeof(struct gfs2_alloc), GFP_NOFS);
 	return ip->i_alloc;
 }
 
--
1.6.2.5
[Cluster-devel] [PATCH 2/4] GFS2: Don't withdraw on partial rindex entries
From: Benjamin Marzinski bmarz...@redhat.com

Since gfs2 writes the rindex file a block at a time, and releases the exclusive lock after each block, it is possible that another process will grab the lock in the middle of the write. Since rindex entries are not an even divisor of blocks, that other process may see partial entries. On grows, this is fine. The process can simply ignore the partial entries. Previously, the code withdrew when it saw partial entries. Now it simply ignores them.

Signed-off-by: Benjamin Marzinski bmarz...@redhat.com
Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/rgrp.c |    5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 0608f49..6702b82 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -591,11 +591,6 @@ static int gfs2_ri_update(struct gfs2_inode *ip)
 	u64 rgrp_count = ip->i_disksize;
 	int error;
 
-	if (do_div(rgrp_count, sizeof(struct gfs2_rindex))) {
-		gfs2_consist_inode(ip);
-		return -EIO;
-	}
-
 	clear_rgrpdi(sdp);
 
 	file_ra_state_init(&ra_state, inode->i_mapping);
--
1.6.2.5
[Cluster-devel] [PATCH 3/4] GFS2: Fix previous patch
The do_div() call needs to remain.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/rgrp.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 6702b82..46534a5 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -591,6 +591,7 @@ static int gfs2_ri_update(struct gfs2_inode *ip)
 	u64 rgrp_count = ip->i_disksize;
 	int error;
 
+	do_div(rgrp_count, sizeof(struct gfs2_rindex));
 	clear_rgrpdi(sdp);
 
 	file_ra_state_init(&ra_state, inode->i_mapping);
--
1.6.2.5
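As a note on why the division itself has to stay once the withdraw is gone: the kernel's do_div() divides its first argument in place and returns the remainder, so gfs2_ri_update() depends on it to turn the rindex file size into an entry count. A minimal userspace sketch of that behaviour follows. The helper names and the 96-byte entry size are assumptions made for illustration; the kernel code uses sizeof(struct gfs2_rindex).

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative entry size only; the kernel uses sizeof(struct gfs2_rindex). */
#define ENTRY_SIZE 96u

/*
 * Userspace stand-in for the kernel's do_div(): divides *n in place and
 * returns the remainder. (The real do_div() is a macro taking an lvalue.)
 */
static uint32_t do_div_model(uint64_t *n, uint32_t base)
{
	uint32_t rem;

	assert(base != 0);
	rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}

/*
 * Number of complete entries in an index file of the given size. A trailing
 * partial entry is ignored rather than treated as corruption.
 */
static uint64_t whole_entries(uint64_t file_size)
{
	uint64_t count = file_size;

	(void)do_div_model(&count, ENTRY_SIZE); /* remainder deliberately dropped */
	return count;
}
```

With these assumptions, a file holding three entries plus 40 stray bytes still yields a count of three, which is the behaviour the two patches together are aiming for.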
[Cluster-devel] [GFS2] Pull request (fixes)
Hi,

Please consider pulling the following patches,

Steve.

The following changes since commit 066000dd856709b6980123eb39b957fe26993f7b:
  Ananth N Mavinakayanahalli (1):
        Revert "x86, apic: Use logical flat on intel with <= 8 logical cpus"

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes.git master

Benjamin Marzinski (1):
      GFS2: Don't withdraw on partial rindex entries

OGAWA Hirofumi (1):
      GFS2: Fix refcnt leak on gfs2_follow_link() error path

Steven Whitehouse (2):
      GFS2: Fix previous patch
      GFS2: Use GFP_NOFS for alloc structure

 fs/gfs2/ops_inode.c |    3 ++-
 fs/gfs2/rgrp.c      |    8 ++--
 2 files changed, 4 insertions(+), 7 deletions(-)
[Cluster-devel] GFS2: Pre-pull patch posting (fixes)
Hi, Here are a couple of patches which between them fix a problem where occasionally it was possible for the GFS2 module to be unloaded before all the glocks were deallocated, which, needless to say, made the slab allocator unhappy, Steve.
[Cluster-devel] [PATCH 1/2] GFS2: Wait for unlock completion on umount
This patch adds a wait on umount between the point at which we dispose of all glocks and the point at which we unmount the lock protocol. This ensures that we've received all the replies to our unlock requests before we stop the locking. Signed-off-by: Steven Whitehouse swhit...@redhat.com Reported-by: Fabio M. Di Nitto fdini...@redhat.com --- fs/gfs2/incore.h |2 ++ fs/gfs2/lock_dlm.c |7 ++- fs/gfs2/ops_fstype.c |2 ++ fs/gfs2/super.c |3 +++ 4 files changed, 13 insertions(+), 1 deletions(-) diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index 4792200..bc0ad15 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -544,6 +544,8 @@ struct gfs2_sbd { struct gfs2_holder sd_live_gh; struct gfs2_glock *sd_rename_gl; struct gfs2_glock *sd_trans_gl; + wait_queue_head_t sd_glock_wait; + atomic_t sd_glock_disposal; /* Inode Stuff */ diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c index 46df988..cdd0755 100644 --- a/fs/gfs2/lock_dlm.c +++ b/fs/gfs2/lock_dlm.c @@ -21,6 +21,7 @@ static void gdlm_ast(void *arg) { struct gfs2_glock *gl = arg; unsigned ret = gl-gl_state; + struct gfs2_sbd *sdp = gl-gl_sbd; BUG_ON(gl-gl_lksb.sb_flags DLM_SBF_DEMOTED); @@ -30,6 +31,8 @@ static void gdlm_ast(void *arg) switch (gl-gl_lksb.sb_status) { case -DLM_EUNLOCK: /* Unlocked, so glock can be freed */ kmem_cache_free(gfs2_glock_cachep, gl); + if (atomic_dec_and_test(sdp-sd_glock_disposal)) + wake_up(sdp-sd_glock_wait); return; case -DLM_ECANCEL: /* Cancel while getting lock */ ret |= LM_OUT_CANCELED; @@ -167,7 +170,8 @@ static unsigned int gdlm_lock(struct gfs2_glock *gl, static void gdlm_put_lock(struct kmem_cache *cachep, void *ptr) { struct gfs2_glock *gl = ptr; - struct lm_lockstruct *ls = gl-gl_sbd-sd_lockstruct; + struct gfs2_sbd *sdp = gl-gl_sbd; + struct lm_lockstruct *ls = sdp-sd_lockstruct; int error; if (gl-gl_lksb.sb_lkid == 0) { @@ -183,6 +187,7 @@ static void gdlm_put_lock(struct kmem_cache *cachep, void *ptr) (unsigned long long)gl-gl_name.ln_number, error); return; } + 
atomic_inc(sdp-sd_glock_disposal); } static void gdlm_cancel(struct gfs2_glock *gl) diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index edfee24..9390fc7 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -82,6 +82,8 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb) gfs2_tune_init(sdp-sd_tune); + init_waitqueue_head(sdp-sd_glock_wait); + atomic_set(sdp-sd_glock_disposal, 0); spin_lock_init(sdp-sd_statfs_spin); spin_lock_init(sdp-sd_rindex_spin); diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index c282ad4..66242b3 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -21,6 +21,7 @@ #include linux/gfs2_ondisk.h #include linux/crc32.h #include linux/time.h +#include linux/wait.h #include gfs2.h #include incore.h @@ -860,6 +861,8 @@ restart: gfs2_jindex_free(sdp); /* Take apart glock structures and buffer lists */ gfs2_gl_hash_clear(sdp); + /* Wait for dlm to reply to all our unlock requests */ + wait_event(sdp-sd_glock_wait, atomic_read(sdp-sd_glock_disposal) == 0); /* Unmount the locking protocol */ gfs2_lm_unmount(sdp); -- 1.6.2.5
[Cluster-devel] [PATCH 2/2] GFS2: Extend umount wait coverage to full glock lifetime
Although all glocks are, by the time of the umount glock wait, scheduled for demotion, some of them haven't made it far enough through the process for the original set of waiting code to wait for them. This extends the ref count to the whole glock lifetime in order to ensure that the waiting does catch all glocks. It does make it a bit more invasive, but it seems the only sensible solution at the moment. Signed-off-by: Steven Whitehouse swhit...@redhat.com --- fs/gfs2/glock.c |4 fs/gfs2/glock.h |2 +- fs/gfs2/lock_dlm.c |6 +++--- fs/gfs2/ops_fstype.c | 10 +- fs/gfs2/super.c |2 -- 5 files changed, 17 insertions(+), 7 deletions(-) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index f455a03..f426633 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -769,6 +769,7 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number, if (!gl) return -ENOMEM; + atomic_inc(sdp-sd_glock_disposal); gl-gl_flags = 0; gl-gl_name = name; atomic_set(gl-gl_ref, 1); @@ -1538,6 +1539,9 @@ void gfs2_gl_hash_clear(struct gfs2_sbd *sdp) up_write(gfs2_umount_flush_sem); msleep(10); } + flush_workqueue(glock_workqueue); + wait_event(sdp-sd_glock_wait, atomic_read(sdp-sd_glock_disposal) == 0); + gfs2_dump_lockstate(sdp); } void gfs2_glock_finish_truncate(struct gfs2_inode *ip) diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h index 13f0bd2..c0262fa 100644 --- a/fs/gfs2/glock.h +++ b/fs/gfs2/glock.h @@ -123,7 +123,7 @@ struct lm_lockops { int (*lm_mount) (struct gfs2_sbd *sdp, const char *fsname); void (*lm_unmount) (struct gfs2_sbd *sdp); void (*lm_withdraw) (struct gfs2_sbd *sdp); - void (*lm_put_lock) (struct kmem_cache *cachep, void *gl); + void (*lm_put_lock) (struct kmem_cache *cachep, struct gfs2_glock *gl); unsigned int (*lm_lock) (struct gfs2_glock *gl, unsigned int req_state, unsigned int flags); void (*lm_cancel) (struct gfs2_glock *gl); diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c index cdd0755..0e5e0e7 100644 --- a/fs/gfs2/lock_dlm.c +++ b/fs/gfs2/lock_dlm.c @@ -167,15 +167,16 @@ 
static unsigned int gdlm_lock(struct gfs2_glock *gl, return LM_OUT_ASYNC; } -static void gdlm_put_lock(struct kmem_cache *cachep, void *ptr) +static void gdlm_put_lock(struct kmem_cache *cachep, struct gfs2_glock *gl) { - struct gfs2_glock *gl = ptr; struct gfs2_sbd *sdp = gl-gl_sbd; struct lm_lockstruct *ls = sdp-sd_lockstruct; int error; if (gl-gl_lksb.sb_lkid == 0) { kmem_cache_free(cachep, gl); + if (atomic_dec_and_test(sdp-sd_glock_disposal)) + wake_up(sdp-sd_glock_wait); return; } @@ -187,7 +188,6 @@ static void gdlm_put_lock(struct kmem_cache *cachep, void *ptr) (unsigned long long)gl-gl_name.ln_number, error); return; } - atomic_inc(sdp-sd_glock_disposal); } static void gdlm_cancel(struct gfs2_glock *gl) diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index 9390fc7..8a102f7 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -985,9 +985,17 @@ static const match_table_t nolock_tokens = { { Opt_err, NULL }, }; +static void nolock_put_lock(struct kmem_cache *cachep, struct gfs2_glock *gl) +{ + struct gfs2_sbd *sdp = gl-gl_sbd; + kmem_cache_free(cachep, gl); + if (atomic_dec_and_test(sdp-sd_glock_disposal)) + wake_up(sdp-sd_glock_wait); +} + static const struct lm_lockops nolock_ops = { .lm_proto_name = lock_nolock, - .lm_put_lock = kmem_cache_free, + .lm_put_lock = nolock_put_lock, .lm_tokens = nolock_tokens, }; diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index 66242b3..b9dd3da 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -861,8 +861,6 @@ restart: gfs2_jindex_free(sdp); /* Take apart glock structures and buffer lists */ gfs2_gl_hash_clear(sdp); - /* Wait for dlm to reply to all our unlock requests */ - wait_event(sdp-sd_glock_wait, atomic_read(sdp-sd_glock_disposal) == 0); /* Unmount the locking protocol */ gfs2_lm_unmount(sdp); -- 1.6.2.5
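The counting scheme in this pair of patches can be modelled in userspace as follows. This is only a sketch under stated assumptions: struct sbd_model and the *_model functions are hypothetical stand-ins, C11 atomics replace the kernel's atomic_t, and a plain counter stands in for wake_up() on sd_glock_wait. The count goes up when a glock is allocated and down when it is freed, and whoever drops it to zero wakes the unmount waiter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two fields added to struct gfs2_sbd. */
struct sbd_model {
	atomic_int glock_disposal; /* glocks allocated but not yet freed */
	int wakeups;               /* counts simulated wake_up() calls */
};

static void sbd_model_init(struct sbd_model *sdp)
{
	atomic_init(&sdp->glock_disposal, 0);
	sdp->wakeups = 0;
}

/* Models the atomic_inc() done when a glock is allocated. */
static void glock_alloc_model(struct sbd_model *sdp)
{
	atomic_fetch_add(&sdp->glock_disposal, 1);
}

/*
 * Models the free path: drop the count and, if it reached zero, wake the
 * unmount waiter. atomic_fetch_sub() returns the old value, so "old == 1"
 * plays the role of atomic_dec_and_test().
 */
static void glock_free_model(struct sbd_model *sdp)
{
	int old = atomic_fetch_sub(&sdp->glock_disposal, 1);

	assert(old > 0);
	if (old == 1)
		sdp->wakeups++;
}

/* The condition that wait_event() re-checks at unmount time. */
static bool umount_may_proceed(struct sbd_model *sdp)
{
	return atomic_load(&sdp->glock_disposal) == 0;
}
```

Because the increment now happens at allocation rather than when the unlock request is sent, a glock that has only been scheduled for demotion is already counted, so the waiter cannot slip past it.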
[Cluster-devel] GFS2: Pull request (fixes)
Hi,

Please consider pulling the following two changes,

Steve.

The following changes since commit 1a45dcfe2525e9432cb4aba461d4994fc2befe42:
  Linus Torvalds (1):
        Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes.git master

Steven Whitehouse (2):
      GFS2: Wait for unlock completion on umount
      GFS2: Extend umount wait coverage to full glock lifetime

 fs/gfs2/glock.c      |    4
 fs/gfs2/glock.h      |    2 +-
 fs/gfs2/incore.h     |    2 ++
 fs/gfs2/lock_dlm.c   |   11 ---
 fs/gfs2/ops_fstype.c |   12 +++-
 fs/gfs2/super.c      |    1 +
 6 files changed, 27 insertions(+), 5 deletions(-)
Re: [Cluster-devel] [PATCH 1/4] gfs2: add IO submission trace points
Hi,

On Fri, 2010-02-05 at 16:45 +1100, Dave Chinner wrote:
Useful for tracking down where specific IOs are being issued from.

Signed-off-by: Dave Chinner dchin...@redhat.com
---
 fs/gfs2/log.c        |    6 ++
 fs/gfs2/lops.c       |    6 ++
 fs/gfs2/trace_gfs2.h |   41 +
 3 files changed, 53 insertions(+), 0 deletions(-)

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 4511b08..bd26dff 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -121,6 +121,7 @@ __acquires(&sdp->sd_log_lock)
 		lock_buffer(bh);
 		if (test_clear_buffer_dirty(bh)) {
 			bh->b_end_io = end_buffer_write_sync;
+			trace_gfs2_submit_bh(bh, WRITE_SYNC_PLUG, __func__);
 			submit_bh(WRITE_SYNC_PLUG, bh);

This looks like it could be a generically useful function. I wonder if it would be possible to do this directly in submit_bh, since we should be able to use __builtin_return_address(0) to find out the origin of the call?

Steve.
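To illustrate the __builtin_return_address(0) suggestion, here is a small userspace sketch. It assumes GCC or Clang, and submit_model() and submit_from_log() are hypothetical stand-ins rather than kernel functions. The callee records the address it will return to, which identifies the call site without the caller having to pass __func__.

```c
#include <assert.h>
#include <stddef.h>

/* Address most recently recorded by submit_model(); illustration only. */
static void *last_submit_caller;

/*
 * Hypothetical stand-in for submit_bh(). noinline keeps a real call frame so
 * that __builtin_return_address(0) (a GCC/Clang builtin) yields the address
 * the call will return to, which identifies the call site.
 */
__attribute__((noinline))
static int submit_model(int rw)
{
	last_submit_caller = __builtin_return_address(0);
	return rw;
}

/* A caller, also kept out of line so that it has its own code address. */
__attribute__((noinline))
static void *submit_from_log(void)
{
	submit_model(1);
	return last_submit_caller;
}
```

In the kernel such an address is usually printed with the %pS format, which resolves it to a symbol name plus offset.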
Re: [Cluster-devel] [PATCH 2/4] gfs2: ordered writes are backwards
Hi,

This looks good. There is an argument for trying to sort the buffers as we write them (in case the application writes them out of order) but this seems a sensible change to catch 90% of cases. I'm just about to give this a quick test and I'll push this one in straight away if it looks good on my test,

Steve.

On Fri, 2010-02-05 at 16:45 +1100, Dave Chinner wrote:
When we queue data buffers for ordered write, the buffers are added to the head of the ordered write list. When the log needs to push these buffers to disk, it also walks the list from the head. The result is that the ordered buffers are submitted to disk in reverse order.

For large writes, this means that whenever the log flushes, large streams of reverse sequential order buffers are pushed down into the block layers. The elevators don't handle this particularly well, so IO rates tend to be significantly lower than if the IO was issued in ascending block order.

Queue new ordered buffers to the tail of the ordered buffer list to ensure that IO is dispatched in the order it was submitted. This should significantly improve large sequential write speeds. On a disk capable of 85MB/s, speeds increase from 50MB/s to 65MB/s for noop and from 38MB/s to 50MB/s for cfq.

Signed-off-by: Dave Chinner dchin...@redhat.com
---
 fs/gfs2/lops.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 5708edf..7278cf0 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -532,9 +532,9 @@ static void databuf_lo_add(struct gfs2_sbd *sdp, struct gfs2_log_element *le)
 		gfs2_pin(sdp, bd->bd_bh);
 		tr->tr_num_databuf_new++;
 		sdp->sd_log_num_databuf++;
-		list_add(&le->le_list, &sdp->sd_log_le_databuf);
+		list_add_tail(&le->le_list, &sdp->sd_log_le_databuf);
 	} else {
-		list_add(&le->le_list, &sdp->sd_log_le_ordered);
+		list_add_tail(&le->le_list, &sdp->sd_log_le_ordered);
 	}
 out:
 	gfs2_log_unlock(sdp);
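The ordering effect described in this patch can be reproduced with a tiny userspace list. This is a sketch only: the *_model names and struct buf_model are made up for illustration, and the list is a simplified imitation of the kernel's list_head. Inserting at the head and then walking from the head visits the newest buffer first, while inserting at the tail preserves submission order.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of the kernel's list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init_model(struct list_node *head)
{
	head->prev = head->next = head;
}

static void insert_between(struct list_node *n, struct list_node *prev,
			   struct list_node *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Insert right after the head: the newest entry is seen first. */
static void list_add_model(struct list_node *n, struct list_node *head)
{
	insert_between(n, head, head->next);
}

/* Insert right before the head: the newest entry is seen last. */
static void list_add_tail_model(struct list_node *n, struct list_node *head)
{
	insert_between(n, head->prev, head);
}

/* A queued buffer, identified by its block number. */
struct buf_model {
	struct list_node link; /* must stay the first member, see walk_blocks() */
	unsigned long blocknr;
};

/* Walk from the head and record block numbers in dispatch order. */
static size_t walk_blocks(struct list_node *head, unsigned long *out, size_t max)
{
	size_t n = 0;
	struct list_node *pos;

	for (pos = head->next; pos != head && n < max; pos = pos->next)
		out[n++] = ((struct buf_model *)pos)->blocknr; /* link is first member */
	return n;
}
```

Queuing blocks 10, 11 and 12 with list_add_model() makes the walk return them in descending order, whereas list_add_tail_model() returns them in ascending order, which is the order the elevators handle better according to the figures quoted above.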
Re: [Cluster-devel] [PATCH 3/4] gfs2: ordered buffer writes are not sync
Hi,

On Fri, 2010-02-05 at 16:45 +1100, Dave Chinner wrote:
Currently gfs2 ordered buffer writes use WRITE_SYNC_PLUG as the IO type being dispatched. They aren't sync writes; we issue all the IO pending, then wait for it all. IOWs, this is async IO with a bulk wait on the end. We should use normal WRITE tagging for this, and before we start waiting make sure that all the IO is issued by unplugging the device.

The use of normal WRITEs for these buffers should significantly reduce the overhead of processing in the cfq elevator and enable the disk subsystem to get much closer to disk bandwidth for large sequential writes.

Signed-off-by: Dave Chinner dchin...@redhat.com

That sounds reasonable. With respect to the new trace point, I'd raise the same question as per the initial patch in the series. Also I'm wondering about the calls to blk_run_backing_dev() as I'd thought that this would happen automatically when we get to wait for the I/O. Bearing in mind that your tests show no particular increase in performance for this change, I'm tempted to be a bit more cautious about applying it for now,

Steve.
Re: [Cluster-devel] [PATCH 4/4] gfs2: introduce AIL lock
Hi,

On Fri, 2010-02-05 at 16:45 +1100, Dave Chinner wrote:
The log lock is currently used to protect the AIL lists and the movements of buffers into and out of them. The lists are self contained and no log specific items outside the lists are accessed when starting or emptying the AIL lists. Hence the operation of the AIL does not require the protection of the log lock so split them out into a new AIL specific lock to reduce the amount of traffic on the log lock. This will also reduce the amount of serialisation that occurs when the gfs2_logd pushes on the AIL to move it forward.

This reduces the impact of log pushing on sequential write throughput. On no-op scheduler on a disk that can do 85MB/s, this increases the write rate from 65MB/s with the ordering fixes to 75MB/s.

Signed-off-by: Dave Chinner dchin...@redhat.com

This looks good, but a couple of comments:

---
 fs/gfs2/glops.c      |   10 --
 fs/gfs2/incore.h     |    1 +
 fs/gfs2/log.c        |   32 +---
 fs/gfs2/log.h        |   22 ++
 fs/gfs2/lops.c       |    5 -
 fs/gfs2/ops_fstype.c |    1 +
 6 files changed, 53 insertions(+), 18 deletions(-)

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 78554ac..65048f9 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -57,20 +57,26 @@ static void gfs2_ail_empty_gl(struct gfs2_glock *gl)
 	BUG_ON(current->journal_info);
 	current->journal_info = &tr;
 
-	gfs2_log_lock(sdp);
+	gfs2_ail_lock(sdp);

This abstraction of a spinlock is left over from the old gfs1 code. I'd prefer when adding new locks just to use spinlock() directly, rather than abstracting it out like this. That way we don't have to think about what kind of lock it is.

[snip]

diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 0fe2f3c..342d65e 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -80,7 +80,7 @@ static void gfs2_unpin(struct gfs2_sbd *sdp, struct buffer_head *bh,
 	mark_buffer_dirty(bh);
 	clear_buffer_pinned(bh);
 
-	gfs2_log_lock(sdp);
+	gfs2_ail_lock(sdp);
 	if (bd->bd_ail) {
 		list_del(&bd->bd_ail_st_list);
 		brelse(bh);
@@ -91,6 +91,9 @@ static void gfs2_unpin(struct gfs2_sbd *sdp, struct buffer_head *bh,
 	}
 	bd->bd_ail = ai;
 	list_add(&bd->bd_ail_st_list, &ai->ai_ail1_list);
+	gfs2_ail_unlock(sdp);
+
+	gfs2_log_lock(sdp);
 	clear_bit(GLF_LFLUSH, &bd->bd_gl->gl_flags);
 	trace_gfs2_pin(bd, 0);
 	gfs2_log_unlock(sdp);

I don't think the gfs2_log_lock() is actually required at this point. The LFLUSH bit is protected by the sd_log_flush_lock rwsem and the tracing doesn't need the log lock either,

Steve.
Re: [Cluster-devel] [GFS2 PATCH] - Bug 537201 - Better error reporting when mounting a gfs fs without enough journals
Hi,

Now in the GFS2 -nmw git tree. Thanks,

Steve.

On Fri, 2010-02-05 at 18:25 -0500, Abhijith Das wrote:
Please ignore the previous patch. The patch inlining didn't work right. Here's the unmangled one.
--Abhi

- Abhijith Das a...@redhat.com wrote:
From: Abhijith Das a...@redhat.com
To: cluster-devel cluster-devel@redhat.com
Sent: Friday, February 5, 2010 5:17:56 PM GMT -06:00 US/Canada Central
Subject: [Cluster-devel] [GFS2 PATCH] - Bug 537201 - Better error reporting when mounting a gfs fs without enough journals

Hi,

We need this one-liner to signal the mount helper of the 'insufficient journals' condition.

Signed-off-by: Abhijith Das a...@redhat.com

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index d405c38..a054b52 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -724,7 +724,7 @@ static int init_journal(struct gfs2_sbd *sdp, int undo)
 		goto fail;
 	}
 
-	error = -EINVAL;
+	error = -EUSERS;
 	if (!gfs2_jindex_size(sdp)) {
 		fs_err(sdp, "no journals!\n");
 		goto fail_jindex;
[Cluster-devel] GFS2: Fix bmap allocation corner-case bug
This patch solves a corner case during allocation which occurs if both metadata (indirect) and data blocks are required but there is an obstacle in the filesystem (e.g. a resource group header or another allocated block) such that when the allocation is requested only enough blocks for the metadata are returned.

By changing the exit condition of this loop, we ensure that a minimum of one data block will always be returned.

Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 6d47379..583e823 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -541,7 +541,7 @@ static int gfs2_bmap_alloc(struct inode *inode, const sector_t lblock,
 				*ptr++ = cpu_to_be64(bn++);
 			break;
 		}
-	} while (state != ALLOC_DATA);
+	} while ((state != ALLOC_DATA) || !dblock);
 
 	ip->i_height = height;
 	gfs2_add_inode_blocks(&ip->i_inode, alloced);
[Cluster-devel] [PATCH 2/2] GFS2: Fix bmap allocation corner-case bug
This patch solves a corner case during allocation which occurs if both metadata (indirect) and data blocks are required but there is an obstacle in the filesystem (e.g. a resource group header or another allocated block) such that when the allocation is requested only enough blocks for the metadata are returned.

By changing the exit condition of this loop, we ensure that a minimum of one data block will always be returned.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/bmap.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 6d47379..583e823 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -541,7 +541,7 @@ static int gfs2_bmap_alloc(struct inode *inode, const sector_t lblock,
 				*ptr++ = cpu_to_be64(bn++);
 			break;
 		}
-	} while (state != ALLOC_DATA);
+	} while ((state != ALLOC_DATA) || !dblock);
 
 	ip->i_height = height;
 	gfs2_add_inode_blocks(&ip->i_inode, alloced);
--
1.6.2.5
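The corner case can be seen in a heavily simplified model of the loop. This is not gfs2_bmap_alloc() itself: the two-state machine, the extents[] array standing in for what the allocator hands back, and all names below are assumptions made for illustration. When the first extent is exactly as long as the metadata requirement, the state has already moved on to ALLOC_DATA even though no data block was obtained, and the old exit test stops there.

```c
#include <assert.h>
#include <stddef.h>

enum alloc_state { ALLOC_META, ALLOC_DATA };

struct alloc_result {
	unsigned meta_blocks; /* metadata blocks obtained */
	unsigned data_blocks; /* data blocks obtained */
	unsigned calls;       /* allocator calls made */
};

/*
 * extents[] holds the lengths of the successive free runs the (modelled)
 * allocator returns. fixed selects the exit condition from the patch.
 */
static struct alloc_result alloc_model(const unsigned *extents, size_t nr_extents,
				       unsigned meta_needed, int fixed)
{
	struct alloc_result r = { 0, 0, 0 };
	enum alloc_state state = ALLOC_META;
	unsigned long dblock = 0; /* first data block; 0 means none yet */
	unsigned long bn = 100;   /* arbitrary starting block number */
	size_t i = 0;

	do {
		unsigned n;

		if (i == nr_extents)
			break; /* out of space; real code would return an error */
		n = extents[i++];
		r.calls++;

		switch (state) {
		case ALLOC_META:
			while (r.meta_blocks < meta_needed && n > 0) {
				r.meta_blocks++;
				n--;
				bn++;
			}
			if (r.meta_blocks == meta_needed)
				state = ALLOC_DATA;
			if (n == 0)
				break; /* extent used up entirely by metadata */
			/* fall through */
		case ALLOC_DATA:
			dblock = bn;
			r.data_blocks = n;
			bn += n;
			break;
		}
	} while (fixed ? (state != ALLOC_DATA || !dblock) : (state != ALLOC_DATA));

	return r;
}
```

With extents of 2 and 3 blocks and two metadata blocks required, the old condition exits after one call with no data blocks, while the patched condition goes round once more and returns three data blocks.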
[Cluster-devel] [GFS2] Pre-pull patch posting (fixes)
Hi, Here are a couple of GFS2 fixes. Both are one-liners, Steve.
[Cluster-devel] [PATCH 1/2] GFS2: Fix error code
From: Abhijith Das a...@redhat.com

We need this one-liner to signal the mount helper of the 'insufficient journals' condition.

Signed-off-by: Abhijith Das a...@redhat.com
Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/ops_fstype.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 8a102f7..a86ed63 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -725,7 +725,7 @@ static int init_journal(struct gfs2_sbd *sdp, int undo)
 		goto fail;
 	}
 
-	error = -EINVAL;
+	error = -EUSERS;
 	if (!gfs2_jindex_size(sdp)) {
 		fs_err(sdp, "no journals!\n");
 		goto fail_jindex;
--
1.6.2.5
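On the userspace side, a distinct errno lets a mount helper tell this failure apart from a generic bad-argument error. The sketch below is hypothetical: mount_error_hint() and its message strings are invented for illustration and are not taken from any existing mount helper. It only shows the kind of dispatch that the new error code makes possible.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: map the errno from a failed mount(2) to a hint for
 * the user. Only EUSERS is specific to the patch; the rest is illustrative.
 */
static const char *mount_error_hint(int err)
{
	switch (err) {
	case EUSERS:
		return "insufficient journals";
	case EINVAL:
		return "invalid argument or mount option";
	default:
		return strerror(err);
	}
}
```

Before the change both conditions would have arrived as EINVAL, leaving the helper nothing to distinguish them by.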
[Cluster-devel] GFS2: Pull request (fixes)
Hi,

Please consider pulling the following changes,

Steve.

-

The following changes since commit 676ad585531e965416fd958747894541dabcec96:
  Linus Torvalds (1):
        Merge branch 'for-linus' of git://git.kernel.org/.../bp/bp

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes.git master

Abhijith Das (1):
      GFS2: Fix error code

Steven Whitehouse (1):
      GFS2: Fix bmap allocation corner-case bug

 fs/gfs2/bmap.c       |    2 +-
 fs/gfs2/ops_fstype.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
[Cluster-devel] GFS2: -nmw git tree
Hi, Linus has pulled a couple of fixes, so I've rebased the -nmw git tree again, Steve.
[Cluster-devel] [dlm] Two small sysfs patches
Hi, Please queue the following two patches for the next merge window for dlm. The first one adds a new sysfs variable so that the lockspace can be obtained without resorting to parsing the initial line of the sysfs message. The second one removes some obsolete code relating to one of the sysfs files, Steve.
[Cluster-devel] [PATCH 1/2] dlm: Send lockspace name with uevents
Although it is possible to get this information from the path, it's much easier to provide the lockspace as a separate env variable.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/dlm/lockspace.c |   14 +-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c
index c010ecf..26a8bd4 100644
--- a/fs/dlm/lockspace.c
+++ b/fs/dlm/lockspace.c
@@ -191,6 +191,18 @@ static int do_uevent(struct dlm_ls *ls, int in)
 	return error;
 }
 
+static int dlm_uevent(struct kset *kset, struct kobject *kobj,
+		      struct kobj_uevent_env *env)
+{
+	struct dlm_ls *ls = container_of(kobj, struct dlm_ls, ls_kobj);
+
+	add_uevent_var(env, "LOCKSPACE=%s", ls->ls_name);
+	return 0;
+}
+
+static struct kset_uevent_ops dlm_uevent_ops = {
+	.uevent = dlm_uevent,
+};
 
 int __init dlm_lockspace_init(void)
 {
@@ -199,7 +211,7 @@ int __init dlm_lockspace_init(void)
 	INIT_LIST_HEAD(&lslist);
 	spin_lock_init(&lslist_lock);
 
-	dlm_kset = kset_create_and_add("dlm", NULL, kernel_kobj);
+	dlm_kset = kset_create_and_add("dlm", &dlm_uevent_ops, kernel_kobj);
 	if (!dlm_kset) {
 		printk(KERN_WARNING "%s: can not create kset\n", __func__);
 		return -ENOMEM;
--
1.6.2.5
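For a sense of what this buys a userspace listener, here is a minimal sketch of looking the variable up. It assumes the listener has already split the event into KEY=VALUE strings; uevent_var() is a hypothetical helper, and the variable values used in the usage note are invented. The lockspace name is then read directly instead of being parsed out of the path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Look up key in a NULL-terminated array of "KEY=VALUE" strings. Returns a
 * pointer to the value, or NULL if the key is absent. The '=' check stops a
 * key from matching a longer key that merely starts with the same letters.
 */
static const char *uevent_var(const char *const *env, const char *key)
{
	size_t klen = strlen(key);

	for (; *env != NULL; env++) {
		if (strncmp(*env, key, klen) == 0 && (*env)[klen] == '=')
			return *env + klen + 1;
	}
	return NULL;
}
```

Given an event carrying, say, ACTION=online and LOCKSPACE=example, uevent_var(env, "LOCKSPACE") returns a pointer to "example".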
[Cluster-devel] [PATCH 2/2] dlm: Remove obsolete lockspace lookup
We don't need to look up the lockspace in this particular case since we already have a pointer to it (which was being dereferenced in order to do the lookup in the first place).

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/dlm/lockspace.c |    6 +-
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c
index 26a8bd4..ce0fdf5 100644
--- a/fs/dlm/lockspace.c
+++ b/fs/dlm/lockspace.c
@@ -37,10 +37,6 @@ static ssize_t dlm_control_store(struct dlm_ls *ls, const char *buf, size_t len)
 	ssize_t ret = len;
 	int n = simple_strtol(buf, NULL, 0);
 
-	ls = dlm_find_lockspace_local(ls->ls_local_handle);
-	if (!ls)
-		return -EINVAL;
-
 	switch (n) {
 	case 0:
 		dlm_ls_stop(ls);
@@ -51,7 +47,7 @@ static ssize_t dlm_control_store(struct dlm_ls *ls, const char *buf, size_t len)
 	default:
 		ret = -EINVAL;
 	}
-	dlm_put_lockspace(ls);
+
 	return ret;
 }
 
--
1.6.2.5
[Cluster-devel] dlm: Remove/bypass astd
While investigating Red Hat bug #537010 I started looking at the dlm's astd thread. The way in which the cast and bast requests are queued looked as if it might cause reordering since the bast requests are always delivered after any pending cast requests which is not always the correct ordering. This patch doesn't fix that bug, but it will prevent any races in that bit of code, and the performance benefits are also well worth having. I noticed that astd seems to be extraneous to requirements. The notifications to astd are already running in process context, so they could be delivered directly. That should improve smp performance since all the notifications would no longer be funneled through a single thread. Also, the only other function of astd seemed to be stopping the delivery of these notifications during recovery. Since, however, the notifications which are intercepted at recovery time are neither modified, nor filtered in any way, the only effect is to delay notifications for no obvious reason. I thought that probably removing the astd thread and delivering the cast and bast notifications directly would improve performance due to the elimination of a scheduling delay. I wrote a small test module which creates a dlm lock space, and does 100,000 NL - EX - NL lock conversions. Having run this test 10 times each on a 2.6.33-rc8 kernel and then the modified kernel including this patch, I got the following results: Original: Avg time 24.62 us per conversion (NL - EX - NL) Modified: Avg time 9.93 us per conversion Which is a fairly dramatic speed up. Please consider applying this patch. I've tested it in both clustered and single node GFS2 configurations. The test figures are from a single node configuration which was a deliberate choice in order to avoid any effects of network latency. 
Signed-off-by: Steven Whitehouse swhit...@redhat.com --- fs/dlm/Makefile|3 +- fs/dlm/ast.c | 165 fs/dlm/ast.h | 26 fs/dlm/lock.c | 16 - fs/dlm/lockspace.c | 17 +- fs/dlm/recover.c |1 - fs/dlm/recoverd.c | 11 7 files changed, 15 insertions(+), 224 deletions(-) delete mode 100644 fs/dlm/ast.c delete mode 100644 fs/dlm/ast.h diff --git a/fs/dlm/Makefile b/fs/dlm/Makefile index ca1c912..8f9f4d2 100644 --- a/fs/dlm/Makefile +++ b/fs/dlm/Makefile @@ -1,6 +1,5 @@ obj-$(CONFIG_DLM) += dlm.o -dlm-y := ast.o \ - config.o \ +dlm-y := config.o \ dir.o \ lock.o \ lockspace.o \ diff --git a/fs/dlm/ast.c b/fs/dlm/ast.c deleted file mode 100644 index dc2ad60..000 --- a/fs/dlm/ast.c +++ /dev/null @@ -1,165 +0,0 @@ -/** -*** -** -** Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. -** Copyright (C) 2004-2008 Red Hat, Inc. All rights reserved. -** -** This copyrighted material is made available to anyone wishing to use, -** modify, copy, or redistribute it subject to the terms and conditions -** of the GNU General Public License v.2. 
-** -*** -**/ - -#include dlm_internal.h -#include lock.h -#include user.h -#include ast.h - -#define WAKE_ASTS 0 - -static struct list_headast_queue; -static spinlock_t ast_queue_lock; -static struct task_struct *astd_task; -static unsigned long astd_wakeflags; -static struct mutexastd_running; - - -void dlm_del_ast(struct dlm_lkb *lkb) -{ - spin_lock(ast_queue_lock); - if (lkb-lkb_ast_type (AST_COMP | AST_BAST)) - list_del(lkb-lkb_astqueue); - spin_unlock(ast_queue_lock); -} - -void dlm_add_ast(struct dlm_lkb *lkb, int type, int bastmode) -{ - if (lkb-lkb_flags DLM_IFL_USER) { - dlm_user_add_ast(lkb, type, bastmode); - return; - } - - spin_lock(ast_queue_lock); - if (!(lkb-lkb_ast_type (AST_COMP | AST_BAST))) { - kref_get(lkb-lkb_ref); - list_add_tail(lkb-lkb_astqueue, ast_queue); - } - lkb-lkb_ast_type |= type; - if (bastmode) - lkb-lkb_bastmode = bastmode; - spin_unlock(ast_queue_lock); - - set_bit(WAKE_ASTS, astd_wakeflags); - wake_up_process(astd_task); -} - -static void process_asts(void) -{ - struct dlm_ls *ls = NULL; - struct dlm_rsb *r = NULL; - struct dlm_lkb *lkb; - void (*cast) (void *astparam); - void (*bast) (void *astparam, int mode); - int type = 0, bastmode; - -repeat: - spin_lock(ast_queue_lock
Re: [Cluster-devel] dlm: Remove/bypass astd
Hi,

On Wed, 2010-02-17 at 13:43 +, Christine Caulfield wrote:
One of the reasons that ASTs are delivered in a separate thread was to allow ASTs to do other locking operations without causing a deadlock. e.g. it would allow locks to be dropped or converted inside a blocking AST callback routine.

Hmm... GFS2 doesn't require that at all, nor is it ever likely to since we have the glock layer to deal with that. I've looked at the OCFS2 code and I don't think they need it either - maybe Mark or Joel can confirm that for certain. Those are the only two users at the moment.

If it were to be the case that locking operations were being done in the context of the astd thread, the performance would be pretty poor since it's a single thread no matter how many locks and lock spaces are in use. The only reasonable use for such a thing would also involve having to deal with the cache control for the locked object too (which for all current cases means disk I/O and/or cache invalidation), which would then also be limited to this single thread.

So maybe either the new code already allows for this or it's functionality that's not needed in the kernel. It should still be an option for userspace applications, but that's a different story altogether, of course

Chrissie

Yes, I've left the userspace interface code alone for now. That continues to work in the original way. My main concern is with the kernel interface at the moment,

Steve.

On 17/02/10 13:23, Steven Whitehouse wrote:
While investigating Red Hat bug #537010 I started looking at the dlm's astd thread. The way in which the cast and bast requests are queued looked as if it might cause reordering since the bast requests are always delivered after any pending cast requests which is not always the correct ordering. This patch doesn't fix that bug, but it will prevent any races in that bit of code, and the performance benefits are also well worth having. I noticed that astd seems to be extraneous to requirements.
The notifications to astd are already running in process context, so they could be delivered directly. That should improve SMP performance since all the notifications would no longer be funneled through a single thread. Also, the only other function of astd seemed to be stopping the delivery of these notifications during recovery. Since, however, the notifications which are intercepted at recovery time are neither modified, nor filtered in any way, the only effect is to delay notifications for no obvious reason. I thought that removing the astd thread and delivering the cast and bast notifications directly would probably improve performance due to the elimination of a scheduling delay. I wrote a small test module which creates a dlm lock space, and does 100,000 NL -> EX -> NL lock conversions. Having run this test 10 times each on a 2.6.33-rc8 kernel and then the modified kernel including this patch, I got the following results: Original: Avg time 24.62 us per conversion (NL -> EX -> NL) Modified: Avg time 9.93 us per conversion Which is a fairly dramatic speed up. Please consider applying this patch. I've tested it in both clustered and single node GFS2 configurations. The test figures are from a single node configuration which was a deliberate choice in order to avoid any effects of network latency. Signed-off-by: Steven Whitehouse swhit...@redhat.com ---
Re: [Cluster-devel] [PATCH 2/2] dlm: Remove obsolete lockspace lookup
Hi, On Wed, 2010-02-17 at 15:12 -0500, David Teigland wrote: On Wed, Feb 17, 2010 at 09:41:35AM +, Steven Whitehouse wrote: We don't need to look up the lockspace in this particular case since we already have a pointer to it (which was being dereferenced in order to do the lookup in the first place). It'll take more to convince me that that reference from find isn't needed. My assumption is that I added it because it was. Dave I'm not sure what more I can say here: this is a sysfs file store function and one of the reasons for using it is that sysfs looks after the ref counting for you. Even aside from that, if you don't have a reference to the lockspace, then the dereference that is done to discover the lockspace name would be invalid, since the structure might have already been freed before the reference is obtained. You could also compare with the other store and show functions in that same file and notice that none of them try to grab a reference to the lockspace in that way. So if this is required, then it must be required for those functions too. Either way there is something not quite right here and having studied the code in some detail, I'm pretty sure this is the correct fix, Steve. Signed-off-by: Steven Whitehouse swhit...@redhat.com --- fs/dlm/lockspace.c |6 +- 1 files changed, 1 insertions(+), 5 deletions(-) diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c index 26a8bd4..ce0fdf5 100644 --- a/fs/dlm/lockspace.c +++ b/fs/dlm/lockspace.c @@ -37,10 +37,6 @@ static ssize_t dlm_control_store(struct dlm_ls *ls, const char *buf, size_t len) ssize_t ret = len; int n = simple_strtol(buf, NULL, 0); - ls = dlm_find_lockspace_local(ls->ls_local_handle); - if (!ls) - return -EINVAL; - switch (n) { case 0: dlm_ls_stop(ls); @@ -51,7 +47,7 @@ static ssize_t dlm_control_store(struct dlm_ls *ls, const char *buf, size_t len) default: ret = -EINVAL; } - dlm_put_lockspace(ls); + return ret; } -- 1.6.2.5
Re: [Cluster-devel] [PATCH 2/2] dlm: Remove obsolete lockspace lookup
Hi, On Thu, 2010-02-18 at 16:04 -0500, David Teigland wrote: On Thu, Feb 18, 2010 at 09:16:03AM +, Steven Whitehouse wrote: I'm not sure what more I can say here: this is a sysfs file store function and one of the reasons for using it is that sysfs looks after the ref counting for you. Even aside from that, if you don't have a reference to the lockspace, then the dereference that is done to discover the lockspace name would be invalid, since the structure might have already been freed before the reference is obtained. You could also compare with the other store and show functions in that same file and notice that none of them try to grab a reference to the lockspace in that way. So if this is required, then it must be required for those functions too. Either way there is something not quite right here and having studied the code in some detail, I'm pretty sure this is the correct fix, I guess you didn't see this oops in your tests. Can you show that the situation in this commit is no longer possible? No, I didn't hit it. I'm not sure how to reproduce whatever situation led to this in the first place. There was a clue though in the patch prior to the one you pointed out in the git tree; the comment in this patch doesn't make a lot of sense without the context from that patch. I noticed that where the sysfs function does this: + ls = dlm_find_lockspace_local(ls->ls_local_handle); + if (!ls) + return -EINVAL; + it isn't primarily a ref count operation. Yes, it does get a ref count on the object if it is successful, but the main purpose is testing to see if the shutdown process has started (i.e. is the lockspace still on the ls_list).
If the list removal used a list_del_init rather than a list_del, the dlm_find_lockspace_local() call could be replaced with: spin_lock(&lslist_lock); ret = list_empty(&ls->ls_list); if (!ret) ls->ls_count++; spin_unlock(&lslist_lock); if (ret) return -EINVAL; which might be a bit less confusing, and also saves traversing the list of lockspaces. This is basically a hold operation, rather than a find/get type operation. My confusion has arisen from the fact that there are three ref counters for the lockspace object. One is ls_count, one is ls_create_count and the other is the kobject ref count. ls_create_count seems to deal with user references, ls_count seems to be used for internal references and the kobject ref count only seems to be incremented/decremented on initial object creation/removal. Probably the correct long term solution is to at least merge the ls_count into the kobject ref count system, and maybe the ls_create_count too. I'll have to do some more investigation before I can see whether there are any reasons why that isn't possible. Either way, we are getting away from what was originally a small and simple patch, so I'll suggest ignoring this one for now, and just applying the first one of the two which I sent. I'll have another look at this in the meantime, Steve.
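For readers following the thread, the "hold" idea above can be sketched in userspace C. This is only an illustration under stated assumptions: the list helpers are minimal stand-ins for the kernel's list API, a pthread mutex stands in for the lslist_lock spinlock, and `struct lockspace` and `lockspace_hold()` are hypothetical names, not dlm code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

/* Insert item directly after head. */
static void list_add(struct list_head *item, struct list_head *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

/* Unlink and re-initialise, so list_empty() on the entry is true afterwards. */
static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	INIT_LIST_HEAD(entry);
}

/* Hypothetical, cut-down lockspace: only the fields this sketch needs. */
struct lockspace {
	struct list_head ls_list;
	int ls_count;
};

/* Stands in for the lslist_lock spinlock mentioned in the mail. */
static pthread_mutex_t lslist_lock = PTHREAD_MUTEX_INITIALIZER;

/* Take a reference only if the lockspace is still on the global list. */
static int lockspace_hold(struct lockspace *ls)
{
	bool gone;

	pthread_mutex_lock(&lslist_lock);
	gone = list_empty(&ls->ls_list);
	if (!gone)
		ls->ls_count++;
	pthread_mutex_unlock(&lslist_lock);

	return gone ? -EINVAL : 0;
}
```

The check only works because list_del_init() leaves the removed entry pointing at itself; a plain list_del() would leave stale (or poisoned) pointers and list_empty() on the entry would not be meaningful.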
[Cluster-devel] [PATCH 4/5] GFS2: ordered writes are backwards
From: Dave Chinner dchin...@redhat.com When we queue data buffers for ordered write, the buffers are added to the head of the ordered write list. When the log needs to push these buffers to disk, it also walks the list from the head. The result is that the ordered buffers are submitted to disk in reverse order. For large writes, this means that whenever the log flushes, large streams of reverse sequential order buffers are pushed down into the block layers. The elevators don't handle this particularly well, so IO rates tend to be significantly lower than if the IO was issued in ascending block order. Queue new ordered buffers to the tail of the ordered buffer list to ensure that IO is dispatched in the order it was submitted. This should significantly improve large sequential write speeds. On a disk capable of 85MB/s, speeds increase from 50MB/s to 65MB/s for noop and from 38MB/s to 50MB/s for cfq. Signed-off-by: Dave Chinner dchin...@redhat.com Signed-off-by: Steven Whitehouse swhit...@redhat.com --- fs/gfs2/lops.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c index de97632..adc260f 100644 --- a/fs/gfs2/lops.c +++ b/fs/gfs2/lops.c @@ -528,9 +528,9 @@ static void databuf_lo_add(struct gfs2_sbd *sdp, struct gfs2_log_element *le) gfs2_pin(sdp, bd->bd_bh); tr->tr_num_databuf_new++; sdp->sd_log_num_databuf++; - list_add(&le->le_list, &sdp->sd_log_le_databuf); + list_add_tail(&le->le_list, &sdp->sd_log_le_databuf); } else { - list_add(&le->le_list, &sdp->sd_log_le_ordered); + list_add_tail(&le->le_list, &sdp->sd_log_le_ordered); } out: gfs2_log_unlock(sdp); -- 1.6.2.5
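The ordering effect described in this patch can be reproduced with a small userspace sketch. It assumes a minimal re-implementation of the kernel's circular list (`list_add` inserts after the head, `list_add_tail` before it); `struct buf` and `queue_and_walk()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

static void insert_between(struct list_head *item,
			   struct list_head *prev, struct list_head *next)
{
	next->prev = item;
	item->next = next;
	item->prev = prev;
	prev->next = item;
}

/* Same semantics as the kernel helpers: add at the head, or at the tail. */
static void list_add(struct list_head *item, struct list_head *head)
{
	insert_between(item, head, head->next);
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	insert_between(item, head->prev, head);
}

/* Hypothetical buffer descriptor; link is first so a cast recovers the buf. */
struct buf {
	struct list_head link;
	unsigned long long blkno;
};

/*
 * Queue blocks 0..n-1 in ascending order, then walk the list from the head
 * (as the log flush does) and record the order in which blocks are visited.
 */
static void queue_and_walk(int use_tail, struct buf *bufs, size_t n,
			   unsigned long long *out)
{
	struct list_head head, *pos;
	size_t i;

	list_init(&head);
	for (i = 0; i < n; i++) {
		bufs[i].blkno = i;
		if (use_tail)
			list_add_tail(&bufs[i].link, &head);
		else
			list_add(&bufs[i].link, &head);
	}

	i = 0;
	for (pos = head.next; pos != &head; pos = pos->next)
		out[i++] = ((struct buf *)pos)->blkno;
}
```

With head insertion the walk visits the blocks in descending order; with tail insertion it visits them in the order they were queued, which is the behaviour the patch is after.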
[Cluster-devel] [PATCH 5/5] GFS2: print glock numbers in hex
From: Bob Peterson rpete...@redhat.com This patch changes glock numbers from printing in decimal to hex. Since DLM prints corresponding resource IDs in hex, it makes debugging easier. Signed-off-by: Bob Peterson rpete...@redhat.com Signed-off-by: Steven Whitehouse swhit...@redhat.com --- fs/gfs2/glock.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index 4773f90..454d4b4 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -1658,7 +1658,7 @@ static int __dump_glock(struct seq_file *seq, const struct gfs2_glock *gl) dtime *= 100/HZ; /* demote time in uSec */ if (!test_bit(GLF_DEMOTE, &gl->gl_flags)) dtime = 0; - gfs2_print_dbg(seq, "G: s:%s n:%u/%llu f:%s t:%s d:%s/%llu a:%d r:%d\n", + gfs2_print_dbg(seq, "G: s:%s n:%u/%llx f:%s t:%s d:%s/%llu a:%d r:%d\n", state2str(gl->gl_state), gl->gl_name.ln_type, (unsigned long long)gl->gl_name.ln_number, -- 1.6.2.5
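As a small illustration of the format change, the sketch below prints only the `n:` field with the new `%llx` conversion. `format_glock_name()` is a hypothetical helper written for this example; it is not part of GFS2.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the "n:<type>/<number>" field the way the
 * patched dump does, i.e. with the number in hex rather than decimal.
 */
static int format_glock_name(char *buf, size_t len,
			     unsigned int type, unsigned long long number)
{
	return snprintf(buf, len, "n:%u/%llx", type, number);
}
```

For a glock of type 2 with number 4660, the old `%llu` format would give `n:2/4660`, while the new one gives `n:2/1234`, which can be matched directly against hex resource IDs.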
[Cluster-devel] [PATCH 2/5] GFS2: Remove loopy umount code
As a consequence of the previous patch, we can now remove the loop which used to be required due to the circular dependency between the inodes and glocks. Instead we can just invalidate the inodes, and then clear up any glocks which are left. Also we no longer need the rwsem since there is no longer any danger of the inode invalidation calling back into the glock code (and from there back into the inode code). Signed-off-by: Steven Whitehouse swhit...@redhat.com --- fs/gfs2/glock.c | 33 ++--- fs/gfs2/incore.h |1 - fs/gfs2/ops_fstype.c |4 +--- fs/gfs2/super.c |1 + fs/gfs2/sys.c|2 -- 5 files changed, 4 insertions(+), 37 deletions(-) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index dfb10a4..4773f90 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -19,7 +19,6 @@ #include <linux/list.h> #include <linux/wait.h> #include <linux/module.h> -#include <linux/rwsem.h> #include <asm/uaccess.h> #include <linux/seq_file.h> #include <linux/debugfs.h> @@ -60,7 +59,6 @@ static int __dump_glock(struct seq_file *seq, const struct gfs2_glock *gl); #define GLOCK_BUG_ON(gl,x) do { if (unlikely(x)) { __dump_glock(NULL, gl); BUG(); } } while(0) static void do_xmote(struct gfs2_glock *gl, struct gfs2_holder *gh, unsigned int target); -static DECLARE_RWSEM(gfs2_umount_flush_sem); static struct dentry *gfs2_root; static struct workqueue_struct *glock_workqueue; struct workqueue_struct *gfs2_delete_workqueue; @@ -714,7 +712,6 @@ static void glock_work_func(struct work_struct *work) finish_xmote(gl, gl->gl_reply); drop_ref = 1; } - down_read(&gfs2_umount_flush_sem); spin_lock(&gl->gl_spin); if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) && gl->gl_state != LM_ST_UNLOCKED && @@ -727,7 +724,6 @@ static void glock_work_func(struct work_struct *work) } run_queue(gl, 0); spin_unlock(&gl->gl_spin); - up_read(&gfs2_umount_flush_sem); if (!delay || queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0) gfs2_glock_put(gl); @@ -1512,35 +1508,10 @@ void gfs2_glock_thaw(struct gfs2_sbd *sdp) void 
gfs2_gl_hash_clear(struct gfs2_sbd *sdp) { - unsigned long t; unsigned int x; - int cont; - t = jiffies; - - for (;;) { - cont = 0; - for (x = 0; x < GFS2_GL_HASH_SIZE; x++) { - if (examine_bucket(clear_glock, sdp, x)) - cont = 1; - } - - if (!cont) - break; - - if (time_after_eq(jiffies, - t + gfs2_tune_get(sdp, gt_stall_secs) * HZ)) { - fs_warn(sdp, "Unmount seems to be stalled. " - "Dumping lock state...\n"); - gfs2_dump_lockstate(sdp); - t = jiffies; - } - - down_write(&gfs2_umount_flush_sem); - invalidate_inodes(sdp->sd_vfs); - up_write(&gfs2_umount_flush_sem); - msleep(10); - } + for (x = 0; x < GFS2_GL_HASH_SIZE; x++) + examine_bucket(clear_glock, sdp, x); flush_workqueue(glock_workqueue); wait_event(sdp->sd_glock_wait, atomic_read(&sdp->sd_glock_disposal) == 0); gfs2_dump_lockstate(sdp); diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index 1de7e1b..b8025e5 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -451,7 +451,6 @@ struct gfs2_tune { unsigned int gt_quota_quantum; /* Secs between syncs to quota file */ unsigned int gt_new_files_jdata; unsigned int gt_max_readahead; /* Max bytes to read-ahead from disk */ - unsigned int gt_stall_secs; /* Detects trouble! 
*/ unsigned int gt_complain_secs; unsigned int gt_statfs_quantum; unsigned int gt_statfs_slow; diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index a86ed63..a054b52 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -65,7 +65,6 @@ static void gfs2_tune_init(struct gfs2_tune *gt) gt->gt_quota_scale_den = 1; gt->gt_new_files_jdata = 0; gt->gt_max_readahead = 1 << 18; - gt->gt_stall_secs = 600; gt->gt_complain_secs = 10; } @@ -1241,10 +1240,9 @@ fail_sb: fail_locking: init_locking(sdp, &mount_gh, UNDO); fail_lm: + invalidate_inodes(sb); gfs2_gl_hash_clear(sdp); gfs2_lm_unmount(sdp); - while (invalidate_inodes(sb)) - yield(); fail_sys: gfs2_sys_fs_del(sdp); fail: diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index ad7bc2d..e5e2262 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -859,6 +859,7 @@ restart: gfs2_clear_rgrpd(sdp); gfs2_jindex_free(sdp); /* Take apart glock structures and buffer lists */ + invalidate_inodes(sdp->sd_vfs
[Cluster-devel] GFS2: Pull request
Hi, Please consider pulling the following GFS2 changes, Steve. -- The following changes since commit 30ff056c42c665b9ea535d8515890857ae382540: Linus Torvalds (1): Merge branch 'x86-uv-for-linus' of git://git.kernel.org/.../tip/linux-2.6-tip are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw.git master Abhijith Das (1): GFS2: Remove old, unused linked list code from quota Bob Peterson (1): GFS2: print glock numbers in hex Dave Chinner (1): GFS2: ordered writes are backwards Steven Whitehouse (2): GFS2: Metadata address space clean up GFS2: Remove loopy umount code fs/gfs2/aops.c |4 +- fs/gfs2/glock.c | 75 ++- fs/gfs2/glock.h |7 fs/gfs2/glops.c | 16 + fs/gfs2/incore.h|5 +-- fs/gfs2/inode.c |6 +-- fs/gfs2/lock_dlm.c |5 ++- fs/gfs2/lops.c |4 +- fs/gfs2/main.c | 28 fs/gfs2/meta_io.c | 46 +++--- fs/gfs2/meta_io.h | 12 ++- fs/gfs2/ops_fstype.c|4 +-- fs/gfs2/super.c | 27 +-- fs/gfs2/sys.c |2 - fs/gfs2/util.c |1 + fs/gfs2/util.h |1 + include/linux/gfs2_ondisk.h | 30 + 17 files changed, 109 insertions(+), 164 deletions(-)
Re: [Cluster-devel] [PATCH] gfs2: do not select QUOTA
Hi, Looks good. Since I'm waiting for Linus to pull at the moment, I'll wait for that to happen and send this in the next batch of patches, Steve. On Wed, 2010-03-03 at 08:53 -0500, Christoph Hellwig wrote: gfs2 only needs the quotactl code, not the generic quota implementation. Signed-off-by: Christoph Hellwig h...@lst.de Index: linux-2.6/fs/gfs2/Kconfig === --- linux-2.6.orig/fs/gfs2/Kconfig2010-03-03 14:48:00.292026869 +0100 +++ linux-2.6/fs/gfs2/Kconfig 2010-03-03 14:48:03.546284090 +0100 @@ -8,7 +8,6 @@ config GFS2_FS select FS_POSIX_ACL select CRC32 select SLOW_WORK - select QUOTA select QUOTACTL help A cluster filesystem.
[Cluster-devel] GFS2 -nmw git tree
Hi, Now that 2.6.34-rc1 is out, I've redone the -nmw git tree. There is only one small patch in it at the moment. No doubt there will be more in the not too distant future, Steve.
[Cluster-devel] GFS2: Pre-pull patch posting
Here are three small (but important!) fixes to GFS2. Steve.
[Cluster-devel] [PATCH 2/3] GFS2: Allow the number of committed revokes to temporarily be negative
From: Benjamin Marzinski bmarz...@redhat.com GFS2 tracks the number of revokes and unrevokes that are part of committed transactions via sd_log_commited_revoke. It is possible for one process to add revokes during its transaction, while another process unrevokes them during its transaction. If the second process finishes its transaction first, sd_log_commited_revoke will be decremented by the number of unrevokes that the second process did, without first being incremented by the number of revokes the first process did. This is fine, since all started transactions must be completed before the journal can be flushed. However, sd_log_commited_revoke is an unsigned integer, and log_refund() causes an assertion failure if it would go negative at the end of a transaction. This patch makes sd_log_commited_revoke a signed integer and allows it to go negative. __gfs2_log_flush() still checks that it matches the actual number of revokes. Signed-off-by: Benjamin Marzinski bmarz...@redhat.com Signed-off-by: Steven Whitehouse swhit...@redhat.com --- fs/gfs2/incore.h |2 +- fs/gfs2/log.c|3 +-- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index b8025e5..3aac46f 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -616,7 +616,7 @@ struct gfs2_sbd { unsigned int sd_log_blks_reserved; unsigned int sd_log_commited_buf; unsigned int sd_log_commited_databuf; - unsigned int sd_log_commited_revoke; + int sd_log_commited_revoke; unsigned int sd_log_num_buf; unsigned int sd_log_num_revoke; diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c index 4511b08..e5bf4b5 100644 --- a/fs/gfs2/log.c +++ b/fs/gfs2/log.c @@ -417,7 +417,7 @@ static unsigned int calc_reserved(struct gfs2_sbd *sdp) databufhdrs_needed = (sdp->sd_log_commited_databuf + (dbuf_limit - 1)) / dbuf_limit; - if (sdp->sd_log_commited_revoke) + if (sdp->sd_log_commited_revoke > 0) revokes = gfs2_struct2blk(sdp, sdp->sd_log_commited_revoke, sizeof(u64)); @@ -790,7 +790,6 @@ static void 
log_refund(struct gfs2_sbd *sdp, struct gfs2_trans *tr) gfs2_assert_withdraw(sdp, (((int)sdp->sd_log_commited_buf) >= 0) || (((int)sdp->sd_log_commited_databuf) >= 0)); sdp->sd_log_commited_revoke += tr->tr_num_revoke - tr->tr_num_revoke_rm; - gfs2_assert_withdraw(sdp, ((int)sdp->sd_log_commited_revoke) >= 0); reserved = calc_reserved(sdp); gfs2_assert_withdraw(sdp, sdp->sd_log_blks_reserved + tr->tr_reserved >= reserved); unused = sdp->sd_log_blks_reserved - reserved + tr->tr_reserved; -- 1.6.2.5
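The accounting described in this patch can be modelled in a few lines of userspace C. This is a simplified sketch, not GFS2 code: `struct log_state`, `struct trans` and both functions are hypothetical, and the real calc_reserved() converts the revoke count into journal blocks rather than returning the count itself.

```c
#include <assert.h>

/* Hypothetical, cut-down model of the committed-revoke accounting. */
struct log_state {
	int committed_revoke;		/* signed: may dip below zero briefly */
};

struct trans {
	unsigned int num_revoke;	/* revokes added by this transaction */
	unsigned int num_revoke_rm;	/* unrevokes done by this transaction */
};

/* Apply one finished transaction to the running total. */
static void log_refund_revokes(struct log_state *log, const struct trans *tr)
{
	log->committed_revoke += (int)tr->num_revoke - (int)tr->num_revoke_rm;
}

/* Only a positive total needs space reserved; a negative one reserves none. */
static unsigned int revokes_to_reserve(const struct log_state *log)
{
	return log->committed_revoke > 0 ? (unsigned int)log->committed_revoke : 0;
}
```

If the unrevoking transaction finishes first, the total goes to a negative value and nothing is reserved for revokes; once the revoking transaction also finishes, the total returns to zero, which is the state the flush-time check expects.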
[Cluster-devel] [PATCH 1/3] GFS2: do not select QUOTA
From: Christoph Hellwig h...@infradead.org gfs2 only needs the quotactl code, not the generic quota implementation. Signed-off-by: Christoph Hellwig h...@lst.de Signed-off-by: Steven Whitehouse swhit...@redhat.com --- fs/gfs2/Kconfig |1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig index 4dcddf8..a47b431 100644 --- a/fs/gfs2/Kconfig +++ b/fs/gfs2/Kconfig @@ -8,7 +8,6 @@ config GFS2_FS select FS_POSIX_ACL select CRC32 select SLOW_WORK - select QUOTA select QUOTACTL help A cluster filesystem. -- 1.6.2.5
[Cluster-devel] [PATCH 3/3] GFS2: Skip check for mandatory locks when unlocking
From: Sachin Prabhu spra...@redhat.com gfs2_lock() will skip locks on files which have mode set to 02666. This is a problem in cases where the mode of the file is changed after a process has obtained a lock on the file. Such a lock will be skipped and will result in a BUG in locks_remove_flock(). gfs2_lock() should skip the check for mandatory locks when unlocking a file. Signed-off-by: Sachin Prabhu spra...@redhat.com Signed-off-by: Steven Whitehouse swhit...@redhat.com --- fs/gfs2/file.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c index a6abbae..e6dd2ae 100644 --- a/fs/gfs2/file.c +++ b/fs/gfs2/file.c @@ -640,7 +640,7 @@ static int gfs2_lock(struct file *file, int cmd, struct file_lock *fl) if (!(fl->fl_flags & FL_POSIX)) return -ENOLCK; - if (__mandatory_lock(&ip->i_inode)) + if (__mandatory_lock(&ip->i_inode) && fl->fl_type != F_UNLCK) return -ENOLCK; if (cmd == F_CANCELLK) { -- 1.6.2.5
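The check changed by this patch can be sketched in userspace C as follows. The mode test mirrors the usual mandatory-locking convention (setgid bit set, group-execute bit clear), which is what mode 02666 satisfies; `mode_is_mandatory()` and `check_lock_request()` are hypothetical names for this example only.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Mandatory locking is signalled by setgid set and group-execute clear. */
static bool mode_is_mandatory(mode_t mode)
{
	return (mode & (S_ISGID | S_IXGRP)) == S_ISGID;
}

/*
 * Refuse new locks on mandatory-lock files, but always let an unlock
 * through so that a lock taken before the mode change can be released.
 */
static int check_lock_request(mode_t mode, short lock_type)
{
	if (mode_is_mandatory(mode) && lock_type != F_UNLCK)
		return -ENOLCK;
	return 0;
}
```

Without the `lock_type != F_UNLCK` condition, the unlock request for a lock acquired before the chmod would be rejected, leaving the stale lock behind, which is the situation the patch description ties to the BUG in locks_remove_flock().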
[Cluster-devel] GFS2: Pull request (fixes)
Hi, Please consider pulling the following small fixes, Steve. -- The following changes since commit 57d54889cd00db2752994b389ba714138652e60c: Linus Torvalds (1): Linux 2.6.34-rc1 are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes.git master Benjamin Marzinski (1): GFS2: Allow the number of committed revokes to temporarily be negative Christoph Hellwig (1): GFS2: do not select QUOTA Sachin Prabhu (1): GFS2: Skip check for mandatory locks when unlocking fs/gfs2/Kconfig |1 - fs/gfs2/file.c |2 +- fs/gfs2/incore.h |2 +- fs/gfs2/log.c|3 +-- 4 files changed, 3 insertions(+), 5 deletions(-)
[Cluster-devel] GFS2: New truncate sequence
I've been working on this on and off in spare moments. This has now passed a few basic tests, so I think it's time to dust it down and post it more widely. There are in fact two parts to this; the second part is to remove the i_disksize variable since, after this initial patch, it will always be identical to the inode's i_size. That's a simple exercise for a follow-up patch. This is a nice clean up of the truncate code, reducing the code size by approx 50 lines of code. This patch also ensures that we correctly truncate files which have been extended but not written to (e.g. if the copy from userspace results in a segfault). Signed-off-by: Steven Whitehouse swhit...@redhat.com Cc: Nick Piggin npig...@suse.de diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c index 0c1d0b8..371bea5 100644 --- a/fs/gfs2/aops.c +++ b/fs/gfs2/aops.c @@ -698,8 +698,11 @@ out: return 0; page_cache_release(page); + gfs2_trans_end(sdp); if (pos + len > ip->i_inode.i_size) - vmtruncate(&ip->i_inode, ip->i_inode.i_size); + gfs2_trim_blocks(&ip->i_inode); + goto out_trans_fail; + out_endtrans: gfs2_trans_end(sdp); out_trans_fail: diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c index 583e823..56aed67 100644 --- a/fs/gfs2/bmap.c +++ b/fs/gfs2/bmap.c @@ -884,83 +884,14 @@ out: } /** - * do_grow - Make a file look bigger than it is - * @ip: the inode - * @size: the size to set the file to - * - * Called with an exclusive lock on @ip. 
- * - * Returns: errno - */ - -static int do_grow(struct gfs2_inode *ip, u64 size) -{ - struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode); - struct gfs2_alloc *al; - struct buffer_head *dibh; - int error; - - al = gfs2_alloc_get(ip); - if (!al) - return -ENOMEM; - - error = gfs2_quota_lock_check(ip); - if (error) - goto out; - - al->al_requested = sdp->sd_max_height + RES_DATA; - - error = gfs2_inplace_reserve(ip); - if (error) - goto out_gunlock_q; - - error = gfs2_trans_begin(sdp, - sdp->sd_max_height + al->al_rgd->rd_length + - RES_JDATA + RES_DINODE + RES_STATFS + RES_QUOTA, 0); - if (error) - goto out_ipres; - - error = gfs2_meta_inode_buffer(ip, &dibh); - if (error) - goto out_end_trans; - - if (size > sdp->sd_sb.sb_bsize - sizeof(struct gfs2_dinode)) { - if (gfs2_is_stuffed(ip)) { - error = gfs2_unstuff_dinode(ip, NULL); - if (error) - goto out_brelse; - } - } - - ip->i_disksize = size; - ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME; - gfs2_trans_add_bh(ip->i_gl, dibh, 1); - gfs2_dinode_out(ip, dibh->b_data); - -out_brelse: - brelse(dibh); -out_end_trans: - gfs2_trans_end(sdp); -out_ipres: - gfs2_inplace_release(ip); -out_gunlock_q: - gfs2_quota_unlock(ip); -out: - gfs2_alloc_put(ip); - return error; -} - - -/** * gfs2_block_truncate_page - Deal with zeroing out data for truncate * * This is partly borrowed from ext3. 
*/ -static int gfs2_block_truncate_page(struct address_space *mapping) +static int gfs2_block_truncate_page(struct address_space *mapping, loff_t from) { struct inode *inode = mapping->host; struct gfs2_inode *ip = GFS2_I(inode); - loff_t from = inode->i_size; unsigned long index = from >> PAGE_CACHE_SHIFT; unsigned offset = from & (PAGE_CACHE_SIZE-1); unsigned blocksize, iblock, length, pos; @@ -1022,9 +953,11 @@ unlock: return err; } -static int trunc_start(struct gfs2_inode *ip, u64 size) +static int trunc_start(struct inode *inode, u64 oldsize, u64 newsize) { - struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode); + struct gfs2_inode *ip = GFS2_I(inode); + struct gfs2_sbd *sdp = GFS2_SB(inode); + struct address_space *mapping = inode->i_mapping; struct buffer_head *dibh; int journaled = gfs2_is_jdata(ip); int error; @@ -1038,29 +971,27 @@ static int trunc_start(struct gfs2_inode *ip, u64 size) if (error) goto out; - if (gfs2_is_stuffed(ip)) { - ip->i_disksize = size; - ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME; - gfs2_trans_add_bh(ip->i_gl, dibh, 1); - gfs2_dinode_out(ip, dibh->b_data); - gfs2_buffer_clear_tail(dibh, sizeof(struct gfs2_dinode) + size); - error = 1; + gfs2_trans_add_bh(ip->i_gl, dibh, 1); + if (gfs2_is_stuffed(ip)) { + gfs2_buffer_clear_tail(dibh, sizeof(struct gfs2_dinode) + newsize); } else { - if (size & (u64)(sdp->sd_sb.sb_bsize - 1)) - error = gfs2_block_truncate_page(ip->i_inode.i_mapping); - - if (!error
[Cluster-devel] GFS2 nmw git tree
Hi, I've rebased this now that I have a couple of new patches since the last pull. I'm still working on the new truncate patches and I'll post an updated set of those patches once I've tracked down a couple of issues I've found in testing, Steve.