Re: [Cluster-devel] Re: gfs2-utils: master - gfs_controld: Remove three unused functions

2009-10-16 Thread Steven Whitehouse
Hi,

On Wed, 2009-10-14 at 12:53 -0500, David Teigland wrote:
 On Wed, Oct 14, 2009 at 02:55:04PM +, Steven Whitehouse wrote:
  gfs_controld: Remove three unused functions
  
  These functions are not called from anywhere and appear
  to be left over from earlier times.
 
 They were just added, but in translating the dlm_controld patch to
 gfs_controld I missed the bits that called them (both in
 cluster.git/STABLE3 and gfs2-utils.git)  I'll reapply this bit with the
 bits that are missing.
 
 Dave
 

I'm not sure I understand the purpose of this code. Is there more to
come yet?

The function find_mg_id() still seems to be unused. So far as I can
figure out the purpose of the new code seems to be to maintain two
timestamps: cluster_add_time whose sole purpose seems to be to check
against cg->create_time but I'm not quite sure why, and
cluster_remove_time which seems to not do anything at all at the moment.

I can't get any clues from dlm_controld because cluster_remove_time
seems to be unused there as well,

Steve.




Re: [Cluster-devel] Re: gfs2-utils: master - gfs_controld: Remove three unused functions

2009-10-16 Thread Steven Whitehouse
Hi,

On Fri, 2009-10-16 at 10:59 -0500, David Teigland wrote:
 On Fri, Oct 16, 2009 at 03:56:05PM +0100, Steven Whitehouse wrote:
  Hi,
  
  On Wed, 2009-10-14 at 12:53 -0500, David Teigland wrote:
   On Wed, Oct 14, 2009 at 02:55:04PM +, Steven Whitehouse wrote:
gfs_controld: Remove three unused functions

These functions are not called from anywhere and appear
to be left over from earlier times.
   
   They were just added, but in translating the dlm_controld patch to
   gfs_controld I missed the bits that called them (both in
   cluster.git/STABLE3 and gfs2-utils.git)  I'll reapply this bit with the
   bits that are missing.
   
   Dave
   
  
  I'm not sure I understand the purpose of this code. Is there more to
  come yet?
  
  The function find_mg_id() still seems to be unused. So far as I can
  figure out the purpose of the new code seems to be to maintain two
  timestamps: cluster_add_time whose sole purpose seems to be to check
  against cg->create_time but I'm not quite sure why, and
  cluster_remove_time which seems to not do anything at all at the moment.
  
  I can't get any clues from dlm_controld because cluster_remove_time
  seems to be unused there as well,
 
 Right, cluster_add_time is used, but cluster_remove_time isn't, although
 it can be very useful to know for debugging.
 
 Dave
 
Yes, but used for what exactly? What is the purpose of this bit of code?

Steve.




Re: [Cluster-devel] Re: gfs2-utils: master - gfs_controld: Remove three unused functions

2009-10-19 Thread Steven Whitehouse
Hi,

On Fri, 2009-10-16 at 11:33 -0500, David Teigland wrote:
 On Fri, Oct 16, 2009 at 05:01:18PM +0100, Steven Whitehouse wrote:
  Hi,
  
  On Fri, 2009-10-16 at 10:59 -0500, David Teigland wrote:
   On Fri, Oct 16, 2009 at 03:56:05PM +0100, Steven Whitehouse wrote:
Hi,

On Wed, 2009-10-14 at 12:53 -0500, David Teigland wrote:
 On Wed, Oct 14, 2009 at 02:55:04PM +, Steven Whitehouse wrote:
  gfs_controld: Remove three unused functions
  
  These functions are not called from anywhere and appear
  to be left over from earlier times.
 
 They were just added, but in translating the dlm_controld patch to
 gfs_controld I missed the bits that called them (both in
 cluster.git/STABLE3 and gfs2-utils.git)  I'll reapply this bit with 
 the
 bits that are missing.
 
 Dave
 

I'm not sure I understand the purpose of this code. Is there more to
come yet?

The function find_mg_id() still seems to be unused. So far as I can
figure out the purpose of the new code seems to be to maintain two
timestamps: cluster_add_time whose sole purpose seems to be to check
against cg->create_time but I'm not quite sure why, and
cluster_remove_time which seems to not do anything at all at the moment.

I can't get any clues from dlm_controld because cluster_remove_time
seems to be unused there as well,
   
   Right, cluster_add_time is used, but cluster_remove_time isn't, although
   it can be very useful to know for debugging.
   
   Dave
   
  Yes, but used for what exactly? What is the purpose of this bit of code?
 
 This bit about cluster_add_time?
 
 +   /* a node's start can't match a change if the node joined the cluster
 +      more recently than the change was created */
 +
 +   node = get_node_history(mg, hd->nodeid);
 +   if (!node) {
 +           log_group(mg, "match_change %d:%u skip cg %u no node history",
 +                     hd->nodeid, seq, cg->seq);
 +           return 0;
 +   }
 +
 +   if (node->cluster_add_time > cg->create_time) {
 +           log_group(mg, "match_change %d:%u skip cg %u created %llu "
 +                     "cluster add %llu", hd->nodeid, seq, cg->seq,
 +                     (unsigned long long)cg->create_time,
 +                     (unsigned long long)node->cluster_add_time);
 +           return 0;
 +   }
 
Yes, and the other bits that were recently added too. It all seems to be
related.

 The commit gave a brief summary and pointed to this other commit for the
 long description of the problems with sorting out events after partitions
 and merges:
 
 http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=bcc5fdef8473d99399c624a7bc15423a2af645c1
 
That's assuming that you trace the commit log back from gfs2-utils into
dlm and thence into cluster.git.

The question really is why we have all these (apparently) different
ideas of cluster membership. Looking at gfs_controld itself, it uses two
CPGs (one for all gfs_controlds which seems to only be used in
negotiating the protocol, of which there seems to be only one anyway,
and the other on a per mount group basis), each of which have their own
idea of which cluster members exist.

So I guess one question I have is, can we be certain that the per mount
group CPG will always have a membership which is a subset of the all
gfs_controlds CPG? Will the sequencing of delivery of membership events
be synchronised between the two CPGs wrt other events (i.e. message
delivery)? I guess that question might be more in Steve Dake's line, so
I've cc'd him too.

Given all that, what is the relationship which the membership events
reported in this new quorum_callback() have with the above? From my
earlier investigations, it appeared that dlm_controld was in charge of
ensuring quorum was attained before fencing took place, so I'm not quite
sure why that should affect gfs_controld directly.

The thing that I've not quite figured out yet is why we need to record
the times at all. My understanding of corosync is that it gives us a
guaranteed ordering of events, so that I'd expect to see a sequence
number rather than an actual timestamp. That is always assuming that the
timestamp isn't just being used as a monotonic sequence number, of
course.

Steve.
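
For reference, the cluster_add_time check quoted in this message boils down to a single timestamp comparison. The following is a minimal standalone sketch, not the gfs_controld code: the struct layouts and the function name start_can_match_change() are hypothetical, and only the field names cluster_add_time and create_time come from the quoted patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-ins for the gfs_controld structures. */
struct node_history {
	int nodeid;
	uint64_t cluster_add_time;  /* when the node last joined the cluster */
};

struct change {
	uint32_t seq;
	uint64_t create_time;       /* when this change was created */
};

/*
 * A node's start message cannot refer to a change that was created
 * before the node joined the cluster, so such a change is skipped.
 * Returns true if the change is still a candidate match.
 */
bool start_can_match_change(const struct node_history *node,
			    const struct change *cg)
{
	if (node == NULL)
		return false;   /* no node history: skip this change */
	if (node->cluster_add_time > cg->create_time)
		return false;   /* node joined after the change was created */
	return true;
}
```

Note that the comparison only needs the two values to be ordered consistently, which is why a monotonic sequence number would serve equally well in this sketch.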




Re: [Cluster-devel] [PATCH] GFS2: Improve statfs and quota usability, try 3

2009-10-21 Thread Steven Whitehouse
Hi,

Now in the -nmw tree. Thanks,

Steve.

On Tue, 2009-10-20 at 02:39 -0500, Benjamin Marzinski wrote:
 GFS2 now has three new mount options, statfs_quantum, quota_quantum and
 statfs_percent.  statfs_quantum and quota_quantum simply allow you to
 set the tunables of the same name.  Setting statfs_quantum to 0
 will also turn on the statfs_slow tunable.  statfs_percent accepts an
 integer between 0 and 100.  Numbers between 1 and 100 will cause GFS2 to
 do an early sync when the local number of blocks free changes by at
 least statfs_percent from the total number of blocks free.  Setting
 statfs_percent to 0 disables this.
 
 Signed-off-by: Benjamin Marzinski bmarz...@redhat.com
 ---
  fs/gfs2/incore.h |4 ++
  fs/gfs2/ops_fstype.c |   14 --
  fs/gfs2/quota.c  |   21 +--
  fs/gfs2/quota.h  |2 +
  fs/gfs2/super.c  |   69 
 ---
  5 files changed, 100 insertions(+), 10 deletions(-)
 
 Index: gfs2-2.6-nmw/fs/gfs2/incore.h
 ===
 --- gfs2-2.6-nmw.orig/fs/gfs2/incore.h
 +++ gfs2-2.6-nmw/fs/gfs2/incore.h
 @@ -430,6 +430,9 @@ struct gfs2_args {
   unsigned int ar_discard:1;  /* discard requests */
   unsigned int ar_errors:2;   /* errors=withdraw | panic */
   int ar_commit;  /* Commit interval */
 + int ar_statfs_quantum;  /* The fast statfs interval */
 + int ar_quota_quantum;   /* The quota interval */
 + int ar_statfs_percent;  /* The % change to force sync */
  };
  
  struct gfs2_tune {
 @@ -558,6 +561,7 @@ struct gfs2_sbd {
   spinlock_t sd_statfs_spin;
   struct gfs2_statfs_change_host sd_statfs_master;
   struct gfs2_statfs_change_host sd_statfs_local;
 + int sd_statfs_force_sync;
  
   /* Resource group stuff */
  
 Index: gfs2-2.6-nmw/fs/gfs2/ops_fstype.c
 ===
 --- gfs2-2.6-nmw.orig/fs/gfs2/ops_fstype.c
 +++ gfs2-2.6-nmw/fs/gfs2/ops_fstype.c
 @@ -63,13 +63,10 @@ static void gfs2_tune_init(struct gfs2_t
   gt->gt_quota_warn_period = 10;
   gt->gt_quota_scale_num = 1;
   gt->gt_quota_scale_den = 1;
 - gt->gt_quota_quantum = 60;
   gt->gt_new_files_jdata = 0;
   gt->gt_max_readahead = 1 << 18;
   gt->gt_stall_secs = 600;
   gt->gt_complain_secs = 10;
 - gt->gt_statfs_quantum = 30;
 - gt->gt_statfs_slow = 0;
  }
  
  static struct gfs2_sbd *init_sbd(struct super_block *sb)
 @@ -1153,6 +1150,15 @@ static int fill_super(struct super_block
   sdp->sd_fsb2bb = 1 << sdp->sd_fsb2bb_shift;
  
   sdp->sd_tune.gt_log_flush_secs = sdp->sd_args.ar_commit;
 + sdp->sd_tune.gt_quota_quantum = sdp->sd_args.ar_quota_quantum;
 + if (sdp->sd_args.ar_statfs_quantum) {
 +         sdp->sd_tune.gt_statfs_slow = 0;
 +         sdp->sd_tune.gt_statfs_quantum = sdp->sd_args.ar_statfs_quantum;
 + }
 + else {
 +         sdp->sd_tune.gt_statfs_slow = 1;
 +         sdp->sd_tune.gt_statfs_quantum = 30;
 + }
  
   error = init_names(sdp, silent);
   if (error)
 @@ -1308,6 +1314,8 @@ static int gfs2_get_sb(struct file_syste
   args.ar_quota = GFS2_QUOTA_DEFAULT;
   args.ar_data = GFS2_DATA_DEFAULT;
   args.ar_commit = 60;
 + args.ar_statfs_quantum = 30;
 + args.ar_quota_quantum = 60;
   args.ar_errors = GFS2_ERRORS_DEFAULT;
  
   error = gfs2_mount_args(&args, data);
 Index: gfs2-2.6-nmw/fs/gfs2/super.c
 ===
 --- gfs2-2.6-nmw.orig/fs/gfs2/super.c
 +++ gfs2-2.6-nmw/fs/gfs2/super.c
 @@ -70,6 +70,9 @@ enum {
   Opt_commit,
   Opt_err_withdraw,
   Opt_err_panic,
 + Opt_statfs_quantum,
 + Opt_statfs_percent,
 + Opt_quota_quantum,
   Opt_error,
  };
  
 @@ -101,6 +104,9 @@ static const match_table_t tokens = {
   {Opt_commit, "commit=%d"},
   {Opt_err_withdraw, "errors=withdraw"},
   {Opt_err_panic, "errors=panic"},
 + {Opt_statfs_quantum, "statfs_quantum=%d"},
 + {Opt_statfs_percent, "statfs_percent=%d"},
 + {Opt_quota_quantum, "quota_quantum=%d"},
   {Opt_error, NULL}
  };
  
 @@ -214,6 +220,28 @@ int gfs2_mount_args(struct gfs2_args *ar
   return rv ? rv : -EINVAL;
   }
   break;
 + case Opt_statfs_quantum:
 +         rv = match_int(&tmp[0], &args->ar_statfs_quantum);
 +         if (rv || args->ar_statfs_quantum < 0) {
 +                 printk(KERN_WARNING "GFS2: statfs_quantum mount option requires a non-negative numeric argument\n");
 +                 return rv ? rv : -EINVAL;
 +         }
 +         break;
 + case Opt_quota_quantum:
 +         rv = match_int(&tmp[0], &args->ar_quota_quantum);
 +  
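
The way the new mount arguments feed the tunables, as described in the patch summary and the fill_super() hunk above, can be sketched in isolation. This is a minimal userspace sketch under stated assumptions: the struct names, field names and apply_mount_args() are hypothetical simplifications, and the range check on statfs_percent is taken from the description ("an integer between 0 and 100") rather than from the truncated diff.

```c
#include <errno.h>

/* Hypothetical cut-down versions of the mount arguments and tunables. */
struct mount_args {
	int statfs_quantum;
	int quota_quantum;
	int statfs_percent;
};

struct tunables {
	int statfs_quantum;
	int statfs_slow;
	int quota_quantum;
};

/* Returns 0 on success or -EINVAL if an argument is out of range. */
int apply_mount_args(const struct mount_args *args, struct tunables *tune)
{
	if (args->statfs_quantum < 0)
		return -EINVAL;
	if (args->statfs_percent < 0 || args->statfs_percent > 100)
		return -EINVAL;

	tune->quota_quantum = args->quota_quantum;
	if (args->statfs_quantum) {
		tune->statfs_slow = 0;
		tune->statfs_quantum = args->statfs_quantum;
	} else {
		/* statfs_quantum=0 selects slow statfs; keep the default interval */
		tune->statfs_slow = 1;
		tune->statfs_quantum = 30;
	}
	return 0;
}
```

With the defaults from gfs2_get_sb() (statfs_quantum 30, quota_quantum 60) this leaves statfs_slow off, so the slow path is only selected when statfs_quantum=0 is given explicitly.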

[Cluster-devel] Re: linux-next: Tree for October 26 (gfs2)

2009-10-26 Thread Steven Whitehouse
Hi,

On Mon, 2009-10-26 at 08:43 -0700, Randy Dunlap wrote:
 On Mon, 26 Oct 2009 17:21:04 +1100 Stephen Rothwell wrote:
 
  Hi all,
  
  Changes since 20091016:
 
 
 on i386:
 
 (.text+0x723a8b): undefined reference to `__divdi3'
 
 super.c::gfs2_statfs_change():
   percent = (100 * l_sc->sc_free) / m_sc->sc_free;
 
 
 I guess it needs to use div64() etc.
 
 ---
 ~Randy

Yes, it looks like it,

Steve.




[Cluster-devel] Re: [PATCH] GFS2: remove division from new statfs code

2009-10-27 Thread Steven Whitehouse
Hi,

Now in the -nmw git tree. Thanks,

Steve.

On Mon, 2009-10-26 at 13:29 -0500, Benjamin Marzinski wrote:
 It's not necessary to do any 64bit division for the statfs sync code, so
 remove it.
 
 Signed-off-by: Benjamin Marzinski bmarz...@redhat.com
 ---
  fs/gfs2/super.c |   17 +
  1 file changed, 9 insertions(+), 8 deletions(-)
 
 Index: gfs2-2.6-nmw/fs/gfs2/super.c
 ===
 --- gfs2-2.6-nmw.orig/fs/gfs2/super.c
 +++ gfs2-2.6-nmw/fs/gfs2/super.c
 @@ -472,7 +472,8 @@ void gfs2_statfs_change(struct gfs2_sbd 
   struct gfs2_statfs_change_host *l_sc = &sdp->sd_statfs_local;
   struct gfs2_statfs_change_host *m_sc = &sdp->sd_statfs_master;
   struct buffer_head *l_bh;
 - int percent, sync_percent;
 + s64 x, y;
 + int need_sync = 0;
   int error;
  
   error = gfs2_meta_inode_buffer(l_ip, &l_bh);
 @@ -486,16 +487,16 @@ void gfs2_statfs_change(struct gfs2_sbd
   l_sc->sc_free += free;
   l_sc->sc_dinodes += dinodes;
   gfs2_statfs_change_out(l_sc, l_bh->b_data + sizeof(struct gfs2_dinode));
 - if (m_sc->sc_free)
 -         percent = (100 * l_sc->sc_free) / m_sc->sc_free;
 - else
 -         percent = 100;
 + if (sdp->sd_args.ar_statfs_percent) {
 +         x = 100 * l_sc->sc_free;
 +         y = m_sc->sc_free * sdp->sd_args.ar_statfs_percent;
 +         if (x >= y || x <= -y)
 +                 need_sync = 1;
 + }
   spin_unlock(&sdp->sd_statfs_spin);
  
   brelse(l_bh);
 - sync_percent = sdp->sd_args.ar_statfs_percent;
 - if (sync_percent && (percent >= sync_percent ||
 -                      percent <= -sync_percent))
 + if (need_sync)
           gfs2_wake_up_statfs(sdp);
  }
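
The trick in this patch is to multiply both sides of the percentage comparison instead of dividing, which avoids the 64-bit division that produced the `__divdi3` link error on i386. A minimal userspace sketch of the same comparison follows; the function name statfs_need_sync() and its parameters are hypothetical, and overflow of the 64-bit products is ignored here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether the local change in free blocks is large enough,
 * relative to the master count, to force an early statfs sync.
 * Comparing 100 * local against master * percent is equivalent to
 * comparing (100 * local) / master against percent when master > 0,
 * but needs no division.
 */
bool statfs_need_sync(int64_t local_free, int64_t master_free, int percent)
{
	int64_t x, y;

	if (percent == 0)
		return false;   /* statfs_percent=0 disables the early sync */

	x = 100 * local_free;
	y = master_free * percent;
	return x >= y || x <= -y;
}
```

When master_free is zero, y is zero and the test is always true, which matches the old code's fallback of treating the change as 100 percent.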
  



[Cluster-devel] Re: [PATCH] gfs2: add barrier/nobarrier mount options

2009-10-30 Thread Steven Whitehouse
Hi,

Thanks for the patch. I've pushed it to the -nmw tree now. I've also
added a two-liner of my own to display the nobarrier option
in /proc/mounts,

Steve.

On Fri, 2009-10-30 at 08:03 +0100, Christoph Hellwig wrote:
 Currently gfs2 issues barriers unconditionally.  There are various reasons
 to disable them, be that just for testing or for stupid devices flushing
 large battery backed caches.  Add a nobarrier option that matches xfs and
 btrfs for this.  Also add a symmetric barrier option to turn it back on
 at remount time.
 
 Signed-off-by: Christoph Hellwig h...@lst.de
 
 Index: linux-2.6/fs/gfs2/incore.h
 ===
 --- linux-2.6.orig/fs/gfs2/incore.h   2009-10-30 07:43:42.246023792 +0100
 +++ linux-2.6/fs/gfs2/incore.h   2009-10-30 07:44:11.173255988 +0100
 @@ -429,6 +429,7 @@ struct gfs2_args {
   unsigned int ar_meta:1; /* mount metafs */
   unsigned int ar_discard:1;  /* discard requests */
   unsigned int ar_errors:2;   /* errors=withdraw | panic */
 + unsigned int ar_nobarrier:1;/* do not send barriers */
   int ar_commit;  /* Commit interval */
  };
  
 Index: linux-2.6/fs/gfs2/super.c
 ===
 --- linux-2.6.orig/fs/gfs2/super.c   2009-10-30 07:44:29.832024397 +0100
 +++ linux-2.6/fs/gfs2/super.c 2009-10-30 07:53:24.117033618 +0100
 @@ -70,6 +70,8 @@ enum {
   Opt_commit,
   Opt_err_withdraw,
   Opt_err_panic,
 + Opt_barrier,
 + Opt_nobarrier,
   Opt_error,
  };
  
 @@ -98,6 +100,8 @@ static const match_table_t tokens = {
   {Opt_meta, "meta"},
   {Opt_discard, "discard"},
   {Opt_nodiscard, "nodiscard"},
 + {Opt_barrier, "barrier"},
 + {Opt_nobarrier, "nobarrier"},
   {Opt_commit, "commit=%d"},
   {Opt_err_withdraw, "errors=withdraw"},
   {Opt_err_panic, "errors=panic"},
 @@ -207,6 +211,12 @@ int gfs2_mount_args(struct gfs2_sbd *sdp
   case Opt_nodiscard:
           args->ar_discard = 0;
           break;
 + case Opt_barrier:
 +         args->ar_nobarrier = 0;
 +         break;
 + case Opt_nobarrier:
 +         args->ar_nobarrier = 1;
 +         break;
   case Opt_commit:
           rv = match_int(&tmp[0], &args->ar_commit);
           if (rv || args->ar_commit <= 0) {
 @@ -1097,6 +1107,10 @@ static int gfs2_remount_fs(struct super_
           sb->s_flags |= MS_POSIXACL;
   else
           sb->s_flags &= ~MS_POSIXACL;
 + if (sdp->sd_args.ar_nobarrier)
 +         set_bit(SDF_NOBARRIERS, &sdp->sd_flags);
 + else
 +         clear_bit(SDF_NOBARRIERS, &sdp->sd_flags);
   spin_lock(&gt->gt_spin);
   gt->gt_log_flush_secs = args.ar_commit;
   spin_unlock(&gt->gt_spin);
 Index: linux-2.6/fs/gfs2/ops_fstype.c
 ===
 --- linux-2.6.orig/fs/gfs2/ops_fstype.c   2009-10-30 07:52:11.050003877 +0100
 +++ linux-2.6/fs/gfs2/ops_fstype.c   2009-10-30 07:52:53.053005337 +0100
 @@ -1143,6 +1143,8 @@ static int fill_super(struct super_block
   }
   if (sdp->sd_args.ar_posix_acl)
           sb->s_flags |= MS_POSIXACL;
 + if (sdp->sd_args.ar_nobarrier)
 +         set_bit(SDF_NOBARRIERS, &sdp->sd_flags);
  
   sb->s_magic = GFS2_MAGIC;
   sb->s_op = &gfs2_super_ops;
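
The effect of the two options can be illustrated outside the kernel. The sketch below is a hypothetical userspace helper, not the gfs2 parser: barriers_enabled() walks a comma-separated option string, starts from the default of barriers being on, and lets a later barrier or nobarrier override an earlier one, on the assumption that options are applied in the order given, as the case statements in gfs2_mount_args() suggest.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if barriers end up enabled after applying the options. */
bool barriers_enabled(const char *options)
{
	bool nobarrier = false;         /* barriers default to on */
	const char *p = options;

	while (*p) {
		size_t len = strcspn(p, ",");   /* length of this option */

		if (len == 7 && strncmp(p, "barrier", 7) == 0)
			nobarrier = false;
		else if (len == 9 && strncmp(p, "nobarrier", 9) == 0)
			nobarrier = true;

		p += len;
		if (*p == ',')
			p++;                    /* skip the separator */
	}
	return !nobarrier;
}
```

Unrelated options such as discard or commit=30 are simply skipped by this helper, so it can be fed a full option string.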



Re: [Cluster-devel] Re: [PATCH] misc: use a proper range for minor number dynamic allocation

2009-11-10 Thread Steven Whitehouse
Hi,

On Mon, 2009-11-09 at 17:03 -0600, David Teigland wrote:
 On Mon, Nov 09, 2009 at 01:28:36PM -0800, Andrew Morton wrote:
  On Fri, 23 Oct 2009 21:28:17 -0200
  Thadeu Lima de Souza Cascardo casca...@holoscopio.com wrote:
  
   The current dynamic allocation of minor number for misc devices has some
   drawbacks.
   
   First of all, the range for dynamic numbers include some statically
   allocated numbers. It goes from 63 to 0, and we have numbers in the
   range from 1 to 15 already allocated. Although, it gives priority to the
   higher and not allocated numbers, we may end up in a situation where we
   must reject registering a driver which got a static number because a
   driver got its number with dynamic allocation. Considering fs/dlm/user.c
   allocates as many misc devices as lockspaces are created, and that we
   have more than 50 users around, it's not unreasonable to reach that
   situation.
  
  What is this DLM behaviour of which you speak?  It sounds broken.
 
 One for each userland lockspace, I know of three userland apps using dlm:
 1. rgmanager which is at the end of its life
 2. clvmd which is switching to a different lock manager
 3. ocfs2 tools, where the userland portion is transient; it only exists
while the tool executes.
 
 That said, it shouldn't be a problem to switch to a single device in the
 next version of the interface.
 
 Dave
 
As well as the per-userland lockspace misc devices there are also the
misc devices of which there is only one instance shared between all
lock spaces:

dlm_plock - Used for userland communication with posix locks
dlm-monitor - Used only to check that dlm_controld is running (so far
as I can tell)
dlm-control - Used to create/remove userland dlm lockspaces

I also had a look at other methods used by the dlm to communicate with
userspace, and this is what I've come up with so far:

configfs - Used to set up lockspaces
debugfs - Used to get lock state information for debugging
netlink - Used only to notify lock timeouts to dlm_controld
sysfs - Used to implement a wait for a userland event (wait for write to
a sysfs file)
uevents - Used to trigger dlm_controld into performing an action which
  results in the write to sysfs mentioned above. This is
  netlink again, but with a layer over the top of it.

If a change to the misc devices is planned, I'm wondering if it would be
possible to merge some of the other functions into a single interface to
simplify things a bit. In particular the netlink interface looks dubious
to me, since I think it should be doing a broadcast rather than the
rather strange scheme it uses now (which is possibly a security issue,
with any process able to send messages to it and set its own pid, so far
as I can see). I have to say that I didn't test that, but there is no
obvious check for privs that I can see in the dlm netlink code.

Steve.







[Cluster-devel] GFS2: Clean up recovery code

2009-11-10 Thread Steven Whitehouse

The following patch cleans up the recovery code and fixes a few
bugs along the way. The bugs are:
 o An incorrect assumption about the size of the journal
 o An issue where the superblock was being used to store variables
   local to the recovery process which would cause a problem if
   multiple journals were recovered at once.
 o Can report incorrect counts of blocks read & recovered in some cases
   (this is harmless, it's just a logging issue)

Features:
 o Moves the recovery code from lops.c into recovery.c which allows
   making a number of functions static and removing other bits of code.
 o Removes the before scan functions as they are not needed (partly
   merged into the scan functions)
 o Removes the after scan functions. These have also been merged into
   the scan functions
 o We no longer call any functions which may in turn call withdraw from
   the recovery code. If there is an issue with recovery, we report it
   to the caller (and userspace).
 o New uevent env variable is documented
 o Superblock shrinks by 32 bytes on 64 bit arches.
 o Code shrinks by about 100 lines (probably more since there are more
   comments now)

TODO:
 o Report where error has occurred in log, as well as what the error is
 o Check code for finding journal headers (maybe remove gfs2_log_header_host?)
 o Testing :-)

For the moment, this is just a heads up on what I'm working on. I hope it
won't be too long before I have a final version of this patch,

Steve.

diff --git a/Documentation/filesystems/gfs2-uevents.txt b/Documentation/filesystems/gfs2-uevents.txt
index fd966dc..c029596 100644
--- a/Documentation/filesystems/gfs2-uevents.txt
+++ b/Documentation/filesystems/gfs2-uevents.txt
@@ -44,6 +44,10 @@ for every journal recovered, whether it is during the initial mount
 process or as the result of gfs_controld requesting a specific journal
 recovery via the /sys/fs/gfs2/<fsname>/lock_module/recovery file.
 
+If the recovery has failed, then on recent versions of GFS2 the
+ERROR= variable will also be included. This returns a kernel
+error code indicating what went wrong during recovery.
+
 Because the CHANGE uevent was used (in early versions of gfs_controld)
 without checking the environment variables to discover the state, we
 cannot add any more functions to it without running the risk of
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 4792200..e497aaf 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -50,12 +50,6 @@ struct gfs2_log_operations {
void (*lo_add) (struct gfs2_sbd *sdp, struct gfs2_log_element *le);
void (*lo_before_commit) (struct gfs2_sbd *sdp);
void (*lo_after_commit) (struct gfs2_sbd *sdp, struct gfs2_ail *ai);
-   void (*lo_before_scan) (struct gfs2_jdesc *jd,
-   struct gfs2_log_header_host *head, int pass);
-   int (*lo_scan_elements) (struct gfs2_jdesc *jd, unsigned int start,
-struct gfs2_log_descriptor *ld, __be64 *ptr,
-int pass);
-   void (*lo_after_scan) (struct gfs2_jdesc *jd, int error, int pass);
const char *lo_name;
 };
 
@@ -648,15 +642,6 @@ struct gfs2_sbd {
struct list_head sd_ail2_list;
u64 sd_ail_sync_gen;
 
-   /* Replay stuff */
-
-   struct list_head sd_revoke_list;
-   unsigned int sd_replay_tail;
-
-   unsigned int sd_found_blocks;
-   unsigned int sd_found_revokes;
-   unsigned int sd_replayed_blocks;
-
/* For quiescing the filesystem */
 
struct gfs2_holder sd_freeze_gh;
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index de97632..4d301af 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -136,6 +136,12 @@ static void buf_lo_add(struct gfs2_sbd *sdp, struct gfs2_log_element *le)
    struct gfs2_trans *tr;
 
    lock_buffer(bd->bd_bh);
+   mh = (struct gfs2_meta_header *)bd->bd_bh->b_data;
+   if (unlikely(mh->mh_magic != cpu_to_be32(GFS2_MAGIC))) {
+           printk(KERN_ERR "GFS2: %s mh error: buf_lo_add block %llu\n",
+                  sdp->sd_fsname, (unsigned long long)bd->bd_bh->b_blocknr);
+           BUG();
+   }
    gfs2_log_lock(sdp);
    if (!list_empty(&bd->bd_list_tr))
            goto out;
@@ -147,9 +153,7 @@ static void buf_lo_add(struct gfs2_sbd *sdp, struct gfs2_log_element *le)
            goto out;
    set_bit(GLF_LFLUSH, &bd->bd_gl->gl_flags);
    set_bit(GLF_DIRTY, &bd->bd_gl->gl_flags);
-   gfs2_meta_check(sdp, bd->bd_bh);
    gfs2_pin(sdp, bd->bd_bh);
-   mh = (struct gfs2_meta_header *)bd->bd_bh->b_data;
    mh->__pad0 = cpu_to_be64(0);
    mh->mh_jid = cpu_to_be32(sdp->sd_jdesc->jd_jid);
    sdp->sd_log_num_buf++;
@@ -235,84 +239,6 @@ static void buf_lo_after_commit(struct gfs2_sbd *sdp, struct gfs2_ail *ai)
    gfs2_assert_warn(sdp, !sdp->sd_log_num_buf);
 }
 
-static void buf_lo_before_scan(struct gfs2_jdesc *jd,
-  struct gfs2_log_header_host *head, int 

[Cluster-devel] Re: [PATCH 2/2] dlm: Add down/up_write_non_owner to keep lockdep happy

2009-11-12 Thread Steven Whitehouse
Hi,

On Thu, 2009-11-12 at 17:45 +0100, Peter Zijlstra wrote:
 On Thu, 2009-11-12 at 11:14 -0600, David Teigland wrote:
  up_write_non_owner()
  addresses this trace, which as you say, is from doing the down and up from
  different threads (which is the intention): 
 
 That's really something I cannot advise doing. Aside from losing
 lock-dependency validation (not a good thing), asymmetric locking like
 that is generally very hard to analyze since it's not clear who 'owns'
 what data when.
 
 There are a few places in the kernel that use the non_owner things, and
 we should generally strive to remove them, not add more.
 
 Please consider solving your problem without adding things like this.
 
The code that does this already exists - it is not being added by the
patch. It's just that in recent kernels lockdep has started noticing the
problem. I did seriously consider changing the locking rather than just
silencing the messages, but it looks rather complicated and not easily
replaced with other primitives.

Any suggestions as to a better solution are welcome,

Steve.




[Cluster-devel] Re: [PATCH 2/2] dlm: Add down/up_write_non_owner to keep lockdep happy

2009-11-13 Thread Steven Whitehouse
Hi,

On Thu, 2009-11-12 at 12:34 -0600, David Teigland wrote:
 On Thu, Nov 12, 2009 at 05:24:12PM +, Steven Whitehouse wrote:
Nov 12 15:10:01 chywoon kernel: [ INFO: possible recursive locking
detected ]
   
   That recursive locking trace is something different.  up_write_non_owner()
   addresses this trace, which as you say, is from doing the down and up from
   different threads (which is the intention):
   
  I don't think it is different, the traces differ due to the ordering of
  running of dlm_recoverd and mount.gfs2,
 
 I explained the recursive locking warning back in Sep:
 
   I've not looked at how to remove this recursive message.  What
   happens is that mount calls dlm_new_lockspace() which returns with
   in_recovery locked.  mount then makes a lock request which blocks on
   in_recovery (as expected) until the dlm_recoverd thread completes
   recovery and releases the in_recovery lock (triggering the unlock
   balance) to allow locking activity.
 
 It doesn't appear to me that up_write_non_owner() would suppress that.
 
 Dave
 
It is simply down to the ordering of the running of the threads as to
which message you get at mount time. There are two possible scenarios:

Scenario 1:

1. mount.gfs2 calls (via mount sys call and gfs2) dlm_new_lockspace()
which takes the ls_in_recovery rwsem with a down_write()
2. mount.gfs2 goes on to try and take out a lock on the filesystem, and
calls dlm_lock which tries to do a down_read() on the rwsem. Since this
is from the same thread as the down_write() you get the recursive
locking message reported in the dmesg which I attached to my earlier
email.

In the second scenario, dlm_recoverd runs between steps 1 and 2 above.
This results in the trace which you reported, since ls_in_recovery has
then been unlocked from a different thread, which creates the unlocking
balance trace which you posted.

In both cases the cause is the same; it's just the running order of the
threads which results in it being reported in a different way. The patch
should fix both of these reports, since it annotates the up & down write
side of the rwsem,

Steve.




[Cluster-devel] GFS2: Move glock ref count drop out of finish_xmote

2009-11-20 Thread Steven Whitehouse

There have been a couple of instances reported recently where the
glock ref count has hit zero too soon. Since the only time when
this can happen is on the demote path (in other cases the ref count
is held elevated by the callers, as well as in the lock operation
itself) there is a good chance that the culprit is at the end
of finish_xmote.

This patch removes the ref count drop from the end of finish_xmote
and moves it into the callers of that function. This will ensure
that in future the ref count cannot be dropped too early.

Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index a3f90ad..3bc7d98 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -513,7 +513,6 @@ retry:
                    GLOCK_BUG_ON(gl, 1);
            }
            spin_unlock(&gl->gl_spin);
-           gfs2_glock_put(gl);
            return;
    }
 
@@ -524,8 +523,6 @@ retry:
    if (glops->go_xmote_bh) {
            spin_unlock(&gl->gl_spin);
            rv = glops->go_xmote_bh(gl, gh);
-           if (rv == -EAGAIN)
-                   return;
            spin_lock(&gl->gl_spin);
            if (rv) {
                    do_error(gl, rv);
@@ -540,7 +537,6 @@ out:
    clear_bit(GLF_LOCK, &gl->gl_flags);
 out_locked:
    spin_unlock(&gl->gl_spin);
-   gfs2_glock_put(gl);
 }
 
 static unsigned int gfs2_lm_lock(struct gfs2_sbd *sdp, void *lock,
@@ -600,7 +596,6 @@ __acquires(&gl->gl_spin)
 
    if (!(ret & LM_OUT_ASYNC)) {
            finish_xmote(gl, ret);
-           gfs2_glock_hold(gl);
            if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
                    gfs2_glock_put(gl);
    } else {
@@ -712,9 +707,12 @@ static void glock_work_func(struct work_struct *work)
 {
    unsigned long delay = 0;
    struct gfs2_glock *gl = container_of(work, struct gfs2_glock, gl_work.work);
+   int drop_ref = 0;
 
-   if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags))
+   if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags)) {
            finish_xmote(gl, gl->gl_reply);
+           drop_ref = 1;
+   }
    down_read(&gfs2_umount_flush_sem);
    spin_lock(&gl->gl_spin);
    if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
@@ -732,6 +730,8 @@ static void glock_work_func(struct work_struct *work)
    if (!delay ||
        queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
            gfs2_glock_put(gl);
+   if (drop_ref)
+           gfs2_glock_put(gl);
 }
 
 /**
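
The pattern this patch applies, moving the final reference drop out of the callee and into the caller, can be shown with a small userspace model. This is only a sketch under stated assumptions: struct obj, obj_hold(), obj_put(), finish_request() and work_func() are hypothetical stand-ins for the glock, gfs2_glock_hold(), gfs2_glock_put(), finish_xmote() and glock_work_func(), and no locking or freeing is modelled.

```c
#include <assert.h>

/* Hypothetical stand-in for a reference-counted glock. */
struct obj {
	int refs;    /* reference count */
	int state;   /* last state set by finish_request() */
	int visits;  /* how many times work_func() touched the object */
};

void obj_hold(struct obj *o)
{
	o->refs++;
}

void obj_put(struct obj *o)
{
	assert(o->refs > 0);    /* dropping below zero would be a bug */
	o->refs--;
}

/* Plays the role of finish_xmote(): it no longer drops a reference. */
void finish_request(struct obj *o, int new_state)
{
	o->state = new_state;
}

/* Plays the role of glock_work_func(): the caller owns the final put. */
void work_func(struct obj *o, int reply_pending, int reply)
{
	int drop_ref = 0;

	if (reply_pending) {
		finish_request(o, reply);
		drop_ref = 1;   /* remember the reference held for the reply */
	}

	o->visits++;            /* the object is still safe to use here */

	if (drop_ref)
		obj_put(o);     /* dropped only once we are done with o */
}
```

Because the put happens after the last use of the object in work_func(), the count cannot reach zero while the function is still working on it, which is the property the patch description is after.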




Re: [Cluster-devel] [RFC] Proposal to align autotool versions

2009-11-24 Thread Steven Whitehouse
Hi,

On Tue, 2009-11-24 at 14:33 +0100, Fabio M. Di Nitto wrote:
 Hi guys,
 
 I just completed testing of autotools in F13/rawhide and they seem to
 fulfill perfectly what we need so far.
 
 Fedora13 has:
 
 libtool 2.2.6 (doesn't carry the bug for which we were forcing 2.2.7)
 autoconf 2.64 (higher than what we require now)
 automake 1.11/m4/pkg-config in more than recent enough versions.
 
 corosync/openais will eventually get libtool support.
 cluster-stable3 autotool implementation is eventually on the schedule.
 cluster/master trees are already ported.
 
 My suggestion is simply to use those versions across the projects so
 that developers will not require any longer to manually build autotools
 and can start easily testing again master trees.
 
 Please ACK/NACK.
 
 Fabio
 

The sooner the better as far as I'm concerned,

Steve.




[Cluster-devel] GFS2: Extra early pre-pull patch posting

2009-11-25 Thread Steven Whitehouse
Due to the larger than usual content of new items in this patch set
I'm posting it a bit earlier than normal so that there is more time
for review.

There are a few bug fixes in this set, but most of the content is
new code relating to xattrs and quotas. The ACL support is cleaned
up and support for caching of ACLs has been added. At the same time
the xattr support has been fixed and cleaned up too.

There are a series of patches which add support for the XFS-style
quota interface to GFS2. There has always been support for quotas
in GFS2, but the interface was via a userland tool which manipulated
the quota file directly. Due to the way in which the GFS2 quotas
were implemented, they were a better fit for the XFS-style interface
than the dquot interface, so that was the one which we chose to
use. We do not support all features of the XFS quotas though (we
don't have project quotas) and quotas are also turned on and off
only via mount options as we still do not support the set_xstate
function (but we do allow querying of the current quota state via
get_xstate). Aside from that, it does cover most of the XFS feature
set, and everything that is needed to manipulate all supported GFS2
quota types.

The userland tools for generic quota manipulation do not yet
understand how to talk to GFS2's quota interface as they
assume that only XFS uses the XFS-style quota interface. That
is a future project.

In addition to that, the quota netlink notification interface is
made into a generic feature so that GFS2 can use it as well as
dquot based systems.

Other features:
 o Added a barrier/nobarrier option in common with other filesystems
   (N.B. this defaults to on if it isn't specified)
 o A spare field in our common-to-many-objects metadata header is now
   used to write the journal id of the last node to modify that bit
   of metadata. This is ignored by the filesystem, but useful for
   debugging purposes.

I have spotted that one of the patches starts FS2: instead of GFS2:
and I'll try to fix that before the merge. It's a pain as it is part
way down the patch series and I don't think I can fix it without
rebasing the tree.

Let me know if you spot anything else that's wrong,

Steve.




[Cluster-devel] [PATCH 02/30] GFS2: Fix -o meta mounts for subsequent mounts (i.e. all but the first one)

2009-11-25 Thread Steven Whitehouse
We have a long term plan to use the -o meta flag to GFS2 mounts to
access the alternate root which is used to store metadata for a GFS2
filesystem. This will allow us to eventually remove support for the
gfs2meta filesystem type (which is in any case just a front end to
the gfs2 filesystem type with the meta/master root).

Currently the -o meta option is only taken into account on the
initial mount of the filesystem. Subsequent mounts of the same
filesystem (i.e. on the same device) result in basically the same
as bind mounting the root of the original mount.

This patch changes that by using what is more or less a copy
of get_sb_bdev() and extending it so that it will take into
account the alternate root in all cases. The main difference
is that we have to parse the mount options a bit earlier. We can
then use them to select the appropriate root towards the end of
the function.

In addition this also fixes a bug where it was possible (but certainly
not desirable) to set different ro/rw options for the meta root
when mounted via the gfs2meta fs compared with the original mount.
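
The flow described above can be sketched in user space as follows. This is
only an illustration under stated assumptions: struct sketch_super,
sketch_sget() and sketch_mount() are hypothetical stand-ins, not the kernel's
sget() or the patch's gfs2_get_sb(), and the root names are placeholders. The
point it shows is that the root is chosen from the already-parsed option after
the superblock has been found or created, so the choice applies to every
mount of a device and not only the first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a superblock with two possible roots. */
struct sketch_super {
	const char *dev;		/* device this superblock belongs to */
	const char *normal_root;	/* regular filesystem root */
	const char *meta_root;		/* alternate (metadata) root */
	int in_use;
};

#define SKETCH_MAX_SUPERS 4
static struct sketch_super sketch_supers[SKETCH_MAX_SUPERS];

/* Find the superblock for a device, or set one up on the first mount. */
struct sketch_super *sketch_sget(const char *dev)
{
	struct sketch_super *free_slot = NULL;
	size_t i;

	for (i = 0; i < SKETCH_MAX_SUPERS; i++) {
		struct sketch_super *s = &sketch_supers[i];

		if (s->in_use && strcmp(s->dev, dev) == 0)
			return s;	/* subsequent mount: reuse it */
		if (!s->in_use && free_slot == NULL)
			free_slot = s;
	}
	if (free_slot == NULL)
		return NULL;
	free_slot->dev = dev;
	free_slot->normal_root = "/";
	free_slot->meta_root = "/.meta";
	free_slot->in_use = 1;
	return free_slot;
}

/* The option is known before the lookup and selects the root at the end. */
const char *sketch_mount(const char *dev, int want_meta)
{
	struct sketch_super *s;

	assert(dev != NULL);
	s = sketch_sget(dev);
	if (s == NULL)
		return NULL;
	return want_meta ? s->meta_root : s->normal_root;
}
```

With this shape a second mount of the same device with the meta option set
gets the alternate root even though the superblock already exists.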

Signed-off-by: Steven Whitehouse swhit...@redhat.com
Cc: Alexander Viro av...@redhat.com
---
 fs/gfs2/ops_fstype.c |  135 +++--
 fs/gfs2/super.c  |   16 +++---
 fs/gfs2/super.h  |2 +-
 3 files changed, 127 insertions(+), 26 deletions(-)

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 52fb6c0..e5ee062 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1114,7 +1114,7 @@ void gfs2_online_uevent(struct gfs2_sbd *sdp)
  * Returns: errno
  */
 
-static int fill_super(struct super_block *sb, void *data, int silent)
+static int fill_super(struct super_block *sb, struct gfs2_args *args, int 
silent)
 {
struct gfs2_sbd *sdp;
struct gfs2_holder mount_gh;
@@ -1125,17 +1125,7 @@ static int fill_super(struct super_block *sb, void 
*data, int silent)
printk(KERN_WARNING GFS2: can't alloc struct gfs2_sbd\n);
return -ENOMEM;
}
-
-   sdp-sd_args.ar_quota = GFS2_QUOTA_DEFAULT;
-   sdp-sd_args.ar_data = GFS2_DATA_DEFAULT;
-   sdp-sd_args.ar_commit = 60;
-   sdp-sd_args.ar_errors = GFS2_ERRORS_DEFAULT;
-
-   error = gfs2_mount_args(sdp, sdp-sd_args, data);
-   if (error) {
-   printk(KERN_WARNING GFS2: can't parse mount arguments\n);
-   goto fail;
-   }
+   sdp-sd_args = *args;
 
if (sdp-sd_args.ar_spectator) {
 sb-s_flags |= MS_RDONLY;
@@ -1243,18 +1233,125 @@ fail:
return error;
 }
 
-static int gfs2_get_sb(struct file_system_type *fs_type, int flags,
-  const char *dev_name, void *data, struct vfsmount *mnt)
+static int set_gfs2_super(struct super_block *s, void *data)
 {
-   return get_sb_bdev(fs_type, flags, dev_name, data, fill_super, mnt);
+   s-s_bdev = data;
+   s-s_dev = s-s_bdev-bd_dev;
+
+   /*
+* We set the bdi here to the queue backing, file systems can
+* overwrite this in -fill_super()
+*/
+   s-s_bdi = bdev_get_queue(s-s_bdev)-backing_dev_info;
+   return 0;
 }
 
-static int test_meta_super(struct super_block *s, void *ptr)
+static int test_gfs2_super(struct super_block *s, void *ptr)
 {
struct block_device *bdev = ptr;
return (bdev == s-s_bdev);
 }
 
+/**
+ * gfs2_get_sb - Get the GFS2 superblock
+ * @fs_type: The GFS2 filesystem type
+ * @flags: Mount flags
+ * @dev_name: The name of the device
+ * @data: The mount arguments
+ * @mnt: The vfsmnt for this mount
+ *
+ * Q. Why not use get_sb_bdev() ?
+ * A. We need to select one of two root directories to mount, independent
+ *of whether this is the initial, or subsequent, mount of this sb
+ *
+ * Returns: 0 or -ve on error
+ */
+
+static int gfs2_get_sb(struct file_system_type *fs_type, int flags,
+  const char *dev_name, void *data, struct vfsmount *mnt)
+{
+   struct block_device *bdev;
+   struct super_block *s;
+   fmode_t mode = FMODE_READ;
+   int error;
+   struct gfs2_args args;
+   struct gfs2_sbd *sdp;
+
+   if (!(flags  MS_RDONLY))
+   mode |= FMODE_WRITE;
+
+   bdev = open_bdev_exclusive(dev_name, mode, fs_type);
+   if (IS_ERR(bdev))
+   return PTR_ERR(bdev);
+
+   /*
+* once the super is inserted into the list by sget, s_umount
+* will protect the lockfs code from trying to start a snapshot
+* while we are mounting
+*/
+   mutex_lock(bdev-bd_fsfreeze_mutex);
+   if (bdev-bd_fsfreeze_count  0) {
+   mutex_unlock(bdev-bd_fsfreeze_mutex);
+   error = -EBUSY;
+   goto error_bdev;
+   }
+   s = sget(fs_type, test_gfs2_super, set_gfs2_super, bdev);
+   mutex_unlock(bdev-bd_fsfreeze_mutex);
+   error = PTR_ERR(s);
+   if (IS_ERR(s))
+   goto error_bdev;
+
+   memset(args

[Cluster-devel] [PATCH 03/30] GFS2: Fix up system xattrs

2009-11-25 Thread Steven Whitehouse
This code has been shamelessly stolen from XFS at the suggestion
of Christoph Hellwig. I've not added support for cached ACLs so
far... watch for that in a later patch, although this is designed
in such a way that they should be easy to add.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
Cc: Christoph Hellwig h...@infradead.org
---
 fs/gfs2/acl.c   |  170 +--
 fs/gfs2/acl.h   |   24 ++--
 fs/gfs2/xattr.c |   18 --
 3 files changed, 120 insertions(+), 92 deletions(-)

diff --git a/fs/gfs2/acl.c b/fs/gfs2/acl.c
index 3fc4e3a..2168da1 100644
--- a/fs/gfs2/acl.c
+++ b/fs/gfs2/acl.c
@@ -12,6 +12,7 @@
 #include linux/spinlock.h
 #include linux/completion.h
 #include linux/buffer_head.h
+#include linux/xattr.h
 #include linux/posix_acl.h
 #include linux/posix_acl_xattr.h
 #include linux/gfs2_ondisk.h
@@ -26,61 +27,6 @@
 #include trans.h
 #include util.h
 
-#define ACL_ACCESS 1
-#define ACL_DEFAULT 0
-
-int gfs2_acl_validate_set(struct gfs2_inode *ip, int access,
- struct gfs2_ea_request *er, int *remove, mode_t *mode)
-{
-   struct posix_acl *acl;
-   int error;
-
-   error = gfs2_acl_validate_remove(ip, access);
-   if (error)
-   return error;
-
-   if (!er-er_data)
-   return -EINVAL;
-
-   acl = posix_acl_from_xattr(er-er_data, er-er_data_len);
-   if (IS_ERR(acl))
-   return PTR_ERR(acl);
-   if (!acl) {
-   *remove = 1;
-   return 0;
-   }
-
-   error = posix_acl_valid(acl);
-   if (error)
-   goto out;
-
-   if (access) {
-   error = posix_acl_equiv_mode(acl, mode);
-   if (!error)
-   *remove = 1;
-   else if (error  0)
-   error = 0;
-   }
-
-out:
-   posix_acl_release(acl);
-   return error;
-}
-
-int gfs2_acl_validate_remove(struct gfs2_inode *ip, int access)
-{
-   if (!GFS2_SB(ip-i_inode)-sd_args.ar_posix_acl)
-   return -EOPNOTSUPP;
-   if (!is_owner_or_cap(ip-i_inode))
-   return -EPERM;
-   if (S_ISLNK(ip-i_inode.i_mode))
-   return -EOPNOTSUPP;
-   if (!access  !S_ISDIR(ip-i_inode.i_mode))
-   return -EACCES;
-
-   return 0;
-}
-
 static int acl_get(struct gfs2_inode *ip, const char *name,
   struct posix_acl **acl, struct gfs2_ea_location *el,
   char **datap, unsigned int *lenp)
@@ -277,3 +223,117 @@ out_brelse:
return error;
 }
 
+static int gfs2_acl_type(const char *name)
+{
+   if (strcmp(name, GFS2_POSIX_ACL_ACCESS) == 0)
+   return ACL_TYPE_ACCESS;
+   if (strcmp(name, GFS2_POSIX_ACL_DEFAULT) == 0)
+   return ACL_TYPE_DEFAULT;
+   return -EINVAL;
+}
+
+static int gfs2_xattr_system_get(struct inode *inode, const char *name,
+void *buffer, size_t size)
+{
+   int type;
+
+   type = gfs2_acl_type(name);
+   if (type  0)
+   return type;
+
+   return gfs2_xattr_get(inode, GFS2_EATYPE_SYS, name, buffer, size);
+}
+
+static int gfs2_set_mode(struct inode *inode, mode_t mode)
+{
+   int error = 0;
+
+   if (mode != inode-i_mode) {
+   struct iattr iattr;
+
+   iattr.ia_valid = ATTR_MODE;
+   iattr.ia_mode = mode;
+
+   error = gfs2_setattr_simple(GFS2_I(inode), iattr);
+   }
+
+   return error;
+}
+
+static int gfs2_xattr_system_set(struct inode *inode, const char *name,
+const void *value, size_t size, int flags)
+{
+   struct gfs2_sbd *sdp = GFS2_SB(inode);
+   struct posix_acl *acl = NULL;
+   int error = 0, type;
+
+   if (!sdp-sd_args.ar_posix_acl)
+   return -EOPNOTSUPP;
+
+   type = gfs2_acl_type(name);
+   if (type  0)
+   return type;
+   if (flags  XATTR_CREATE)
+   return -EINVAL;
+   if (type == ACL_TYPE_DEFAULT  !S_ISDIR(inode-i_mode))
+   return value ? -EACCES : 0;
+   if ((current_fsuid() != inode-i_uid)  !capable(CAP_FOWNER))
+   return -EPERM;
+   if (S_ISLNK(inode-i_mode))
+   return -EOPNOTSUPP;
+
+   if (!value)
+   goto set_acl;
+
+   acl = posix_acl_from_xattr(value, size);
+   if (!acl) {
+   /*
+* acl_set_file(3) may request that we set default ACLs with
+* zero length -- defend (gracefully) against that here.
+*/
+   goto out;
+   }
+   if (IS_ERR(acl)) {
+   error = PTR_ERR(acl);
+   goto out;
+   }
+
+   error = posix_acl_valid(acl);
+   if (error)
+   goto out_release;
+
+   error = -EINVAL;
+   if (acl-a_count  GFS2_ACL_MAX_ENTRIES)
+   goto out_release;
+
+   if (type == ACL_TYPE_ACCESS

[Cluster-devel] [PATCH 04/30] VFS: Add forget_all_cached_acls()

2009-11-25 Thread Steven Whitehouse
This is required for cluster filesystems which want to use
cached ACLs so that they can invalidate the cache when
required.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
Cc: Alexander Viro av...@redhat.com
Cc: Christoph Hellwig h...@infradead.org
---
 include/linux/posix_acl.h |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/include/linux/posix_acl.h b/include/linux/posix_acl.h
index 065a365..6760816 100644
--- a/include/linux/posix_acl.h
+++ b/include/linux/posix_acl.h
@@ -147,6 +147,20 @@ static inline void forget_cached_acl(struct inode *inode, int type)
 	if (old != ACL_NOT_CACHED)
 		posix_acl_release(old);
 }
+
+static inline void forget_all_cached_acls(struct inode *inode)
+{
+	struct posix_acl *old_access, *old_default;
+	spin_lock(&inode->i_lock);
+	old_access = inode->i_acl;
+	old_default = inode->i_default_acl;
+	inode->i_acl = inode->i_default_acl = ACL_NOT_CACHED;
+	spin_unlock(&inode->i_lock);
+	if (old_access != ACL_NOT_CACHED)
+		posix_acl_release(old_access);
+	if (old_default != ACL_NOT_CACHED)
+		posix_acl_release(old_default);
+}
 #endif
 
 static inline void cache_no_acl(struct inode *inode)
-- 
1.6.2.5
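
For readers who want to try the pattern outside the kernel, here is a minimal
single-threaded sketch of what the helper in the patch above does: both cache
slots are reset to a "not cached" sentinel first, and the old references are
dropped afterwards. All sketch_* names are hypothetical; the real code uses
struct posix_acl, ACL_NOT_CACHED and inode->i_lock, and the locking is only
indicated by comments here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_acl {
	int refcount;
};

/* Sentinel meaning "nothing cached"; distinct from NULL, which means "no ACL". */
#define SKETCH_NOT_CACHED ((struct sketch_acl *)-1)

struct sketch_inode {
	struct sketch_acl *acl;
	struct sketch_acl *default_acl;
};

static int sketch_frees;	/* counts objects actually freed */

void sketch_acl_release(struct sketch_acl *acl)
{
	if (acl == NULL)
		return;
	assert(acl->refcount > 0);
	if (--acl->refcount == 0) {
		free(acl);
		sketch_frees++;
	}
}

void sketch_forget_all(struct sketch_inode *inode)
{
	struct sketch_acl *old_access, *old_default;

	/* The kernel version takes the inode spinlock here. */
	old_access = inode->acl;
	old_default = inode->default_acl;
	inode->acl = inode->default_acl = SKETCH_NOT_CACHED;
	/* ...and drops it here, so the releases run unlocked. */

	if (old_access != SKETCH_NOT_CACHED)
		sketch_acl_release(old_access);
	if (old_default != SKETCH_NOT_CACHED)
		sketch_acl_release(old_default);
}
```

Calling sketch_forget_all() twice in a row is harmless, because the second
call only sees the sentinel in both slots.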



[Cluster-devel] [PATCH 05/30] GFS2: Use forget_all_cached_acls()

2009-11-25 Thread Steven Whitehouse
Invalidate all the cached ACLs when we drop the glock.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/glops.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 6985eef..78554ac 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -13,6 +13,7 @@
 #include <linux/buffer_head.h>
 #include <linux/gfs2_ondisk.h>
 #include <linux/bio.h>
+#include <linux/posix_acl.h>
 
 #include "gfs2.h"
 #include "incore.h"
@@ -184,8 +185,10 @@ static void inode_go_inval(struct gfs2_glock *gl, int flags)
 	if (flags & DIO_METADATA) {
 		struct address_space *mapping = gl->gl_aspace->i_mapping;
 		truncate_inode_pages(mapping, 0);
-		if (ip)
+		if (ip) {
 			set_bit(GIF_INVALID, &ip->i_flags);
+			forget_all_cached_acls(&ip->i_inode);
+		}
 	}
 
 	if (ip == GFS2_I(gl->gl_sbd->sd_rindex))
-- 
1.6.2.5



[Cluster-devel] [PATCH 06/30] GFS2: Use gfs2_set_mode() instead of munge_mode()

2009-11-25 Thread Steven Whitehouse
These two functions do the same thing, so let's only use
one of them.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/acl.c |   46 +++---
 1 files changed, 11 insertions(+), 35 deletions(-)

diff --git a/fs/gfs2/acl.c b/fs/gfs2/acl.c
index 2168da1..1be3148 100644
--- a/fs/gfs2/acl.c
+++ b/fs/gfs2/acl.c
@@ -104,29 +104,20 @@ int gfs2_check_acl(struct inode *inode, int mask)
return -EAGAIN;
 }
 
-static int munge_mode(struct gfs2_inode *ip, mode_t mode)
+static int gfs2_set_mode(struct inode *inode, mode_t mode)
 {
-   struct gfs2_sbd *sdp = GFS2_SB(ip-i_inode);
-   struct buffer_head *dibh;
-   int error;
+   int error = 0;
 
-   error = gfs2_trans_begin(sdp, RES_DINODE, 0);
-   if (error)
-   return error;
+   if (mode != inode-i_mode) {
+   struct iattr iattr;
 
-   error = gfs2_meta_inode_buffer(ip, dibh);
-   if (!error) {
-   gfs2_assert_withdraw(sdp,
-   (ip-i_inode.i_mode  S_IFMT) == (mode  
S_IFMT));
-   ip-i_inode.i_mode = mode;
-   gfs2_trans_add_bh(ip-i_gl, dibh, 1);
-   gfs2_dinode_out(ip, dibh-b_data);
-   brelse(dibh);
-   }
+   iattr.ia_valid = ATTR_MODE;
+   iattr.ia_mode = mode;
 
-   gfs2_trans_end(sdp);
+   error = gfs2_setattr_simple(GFS2_I(inode), iattr);
+   }
 
-   return 0;
+   return error;
 }
 
 int gfs2_acl_create(struct gfs2_inode *dip, struct gfs2_inode *ip)
@@ -151,7 +142,7 @@ int gfs2_acl_create(struct gfs2_inode *dip, struct 
gfs2_inode *ip)
if (!acl) {
mode = ~current_umask();
if (mode != ip-i_inode.i_mode)
-   error = munge_mode(ip, mode);
+   error = gfs2_set_mode(ip-i_inode, mode);
return error;
}
 
@@ -181,7 +172,7 @@ int gfs2_acl_create(struct gfs2_inode *dip, struct 
gfs2_inode *ip)
if (error)
goto out;
 munge:
-   error = munge_mode(ip, mode);
+   error = gfs2_set_mode(ip-i_inode, mode);
 out:
posix_acl_release(acl);
kfree(data);
@@ -244,21 +235,6 @@ static int gfs2_xattr_system_get(struct inode *inode, 
const char *name,
return gfs2_xattr_get(inode, GFS2_EATYPE_SYS, name, buffer, size);
 }
 
-static int gfs2_set_mode(struct inode *inode, mode_t mode)
-{
-   int error = 0;
-
-   if (mode != inode-i_mode) {
-   struct iattr iattr;
-
-   iattr.ia_valid = ATTR_MODE;
-   iattr.ia_mode = mode;
-
-   error = gfs2_setattr_simple(GFS2_I(inode), iattr);
-   }
-
-   return error;
-}
 
 static int gfs2_xattr_system_set(struct inode *inode, const char *name,
 const void *value, size_t size, int flags)
-- 
1.6.2.5



[Cluster-devel] [PATCH 07/30] GFS2: Clean up ACLs

2009-11-25 Thread Steven Whitehouse
To prepare for support for caching of ACLs, this cleans up the GFS2
ACL support by pushing the xattr code back into xattr.c and changing
the acl_get function into one which only returns ACLs so that we
can drop the caching function into it shortly.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/acl.c   |  164 ---
 fs/gfs2/acl.h   |2 +-
 fs/gfs2/inode.c |2 +-
 fs/gfs2/xattr.c |   56 +++
 fs/gfs2/xattr.h |8 +--
 5 files changed, 132 insertions(+), 100 deletions(-)

diff --git a/fs/gfs2/acl.c b/fs/gfs2/acl.c
index 1be3148..bd0fce9 100644
--- a/fs/gfs2/acl.c
+++ b/fs/gfs2/acl.c
@@ -27,53 +27,40 @@
 #include trans.h
 #include util.h
 
-static int acl_get(struct gfs2_inode *ip, const char *name,
-  struct posix_acl **acl, struct gfs2_ea_location *el,
-  char **datap, unsigned int *lenp)
+static const char *gfs2_acl_name(int type)
 {
-   char *data;
-   unsigned int len;
-   int error;
+   switch (type) {
+   case ACL_TYPE_ACCESS:
+   return GFS2_POSIX_ACL_ACCESS;
+   case ACL_TYPE_DEFAULT:
+   return GFS2_POSIX_ACL_DEFAULT;
+   }
+   return NULL;
+}
 
-   el-el_bh = NULL;
+static struct posix_acl *gfs2_acl_get(struct gfs2_inode *ip, int type)
+{
+   struct posix_acl *acl;
+   const char *name;
+   char *data;
+   int len;
 
if (!ip-i_eattr)
-   return 0;
-
-   error = gfs2_ea_find(ip, GFS2_EATYPE_SYS, name, el);
-   if (error)
-   return error;
-   if (!el-el_ea)
-   return 0;
-   if (!GFS2_EA_DATA_LEN(el-el_ea))
-   goto out;
-
-   len = GFS2_EA_DATA_LEN(el-el_ea);
-   data = kmalloc(len, GFP_NOFS);
-   error = -ENOMEM;
-   if (!data)
-   goto out;
+   return NULL;
 
-   error = gfs2_ea_get_copy(ip, el, data, len);
-   if (error  0)
-   goto out_kfree;
-   error = 0;
+   name = gfs2_acl_name(type);
+   if (name == NULL)
+   return ERR_PTR(-EINVAL);
 
-   if (acl) {
-   *acl = posix_acl_from_xattr(data, len);
-   if (IS_ERR(*acl))
-   error = PTR_ERR(*acl);
-   }
+   len = gfs2_xattr_acl_get(ip, name, data);
+   if (len  0)
+   return ERR_PTR(len);
+   if (len == 0)
+   return NULL;
 
-out_kfree:
-   if (error || !datap) {
-   kfree(data);
-   } else {
-   *datap = data;
-   *lenp = len;
-   }
-out:
-   return error;
+   acl = posix_acl_from_xattr(data, len);
+   kfree(data);
+   return acl;
 }
 
 /**
@@ -86,14 +73,12 @@ out:
 
 int gfs2_check_acl(struct inode *inode, int mask)
 {
-   struct gfs2_ea_location el;
-   struct posix_acl *acl = NULL;
+   struct posix_acl *acl;
int error;
 
-   error = acl_get(GFS2_I(inode), GFS2_POSIX_ACL_ACCESS, acl, el, NULL, 
NULL);
-   brelse(el.el_bh);
-   if (error)
-   return error;
+   acl = gfs2_acl_get(GFS2_I(inode), ACL_TYPE_ACCESS);
+   if (IS_ERR(acl))
+   return PTR_ERR(acl);
 
if (acl) {
error = posix_acl_permission(inode, acl, mask);
@@ -120,32 +105,57 @@ static int gfs2_set_mode(struct inode *inode, mode_t mode)
return error;
 }
 
-int gfs2_acl_create(struct gfs2_inode *dip, struct gfs2_inode *ip)
+static int gfs2_acl_set(struct inode *inode, int type, struct posix_acl *acl)
 {
-   struct gfs2_ea_location el;
-   struct gfs2_sbd *sdp = GFS2_SB(dip-i_inode);
-   struct posix_acl *acl = NULL, *clone;
-   mode_t mode = ip-i_inode.i_mode;
-   char *data = NULL;
-   unsigned int len;
int error;
+   int len;
+   char *data;
+   const char *name = gfs2_acl_name(type);
+
+   BUG_ON(name == NULL);
+   len = posix_acl_to_xattr(acl, NULL, 0);
+   if (len == 0)
+   return 0;
+   data = kmalloc(len, GFP_NOFS);
+   if (data == NULL)
+   return -ENOMEM;
+   error = posix_acl_to_xattr(acl, data, len);
+   if (error  0)
+   goto out;
+   error = gfs2_xattr_set(inode, GFS2_EATYPE_SYS, name, data, len, 0);
+out:
+   kfree(data);
+   return error;
+}
+
+int gfs2_acl_create(struct gfs2_inode *dip, struct inode *inode)
+{
+   struct gfs2_sbd *sdp = GFS2_SB(dip-i_inode);
+   struct posix_acl *acl, *clone;
+   mode_t mode = inode-i_mode;
+   int error = 0;
 
if (!sdp-sd_args.ar_posix_acl)
return 0;
-   if (S_ISLNK(ip-i_inode.i_mode))
+   if (S_ISLNK(inode-i_mode))
return 0;
 
-   error = acl_get(dip, GFS2_POSIX_ACL_DEFAULT, acl, el, data, len);
-   brelse(el.el_bh);
-   if (error)
-   return error;
+   acl = gfs2_acl_get(dip, ACL_TYPE_DEFAULT);
+   if (IS_ERR(acl

[Cluster-devel] [PATCH 08/30] GFS2: Add cached ACLs support

2009-11-25 Thread Steven Whitehouse
The other patches in this series have been building towards
being able to support cached ACLs like other filesystems. The
only real difference with GFS2 is that we have to invalidate
the cache when we drop a glock, but that is dealt with in earlier
patches.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/acl.c |   27 +--
 1 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/acl.c b/fs/gfs2/acl.c
index bd0fce9..3eb1ea8 100644
--- a/fs/gfs2/acl.c
+++ b/fs/gfs2/acl.c
@@ -48,6 +48,10 @@ static struct posix_acl *gfs2_acl_get(struct gfs2_inode *ip, 
int type)
if (!ip-i_eattr)
return NULL;
 
+   acl = get_cached_acl(ip-i_inode, type);
+   if (acl != ACL_NOT_CACHED)
+   return acl;
+
name = gfs2_acl_name(type);
if (name == NULL)
return ERR_PTR(-EINVAL);
@@ -123,6 +127,8 @@ static int gfs2_acl_set(struct inode *inode, int type, 
struct posix_acl *acl)
if (error  0)
goto out;
error = gfs2_xattr_set(inode, GFS2_EATYPE_SYS, name, data, len, 0);
+   if (!error)
+   set_cached_acl(inode, type, acl);
 out:
kfree(data);
return error;
@@ -209,6 +215,7 @@ int gfs2_acl_chmod(struct gfs2_inode *ip, struct iattr 
*attr)
posix_acl_to_xattr(acl, data, len);
error = gfs2_xattr_acl_chmod(ip, attr, data);
kfree(data);
+   set_cached_acl(ip-i_inode, ACL_TYPE_ACCESS, acl);
}
 
 out:
@@ -228,15 +235,25 @@ static int gfs2_acl_type(const char *name)
 static int gfs2_xattr_system_get(struct inode *inode, const char *name,
 void *buffer, size_t size)
 {
+   struct posix_acl *acl;
int type;
+   int error;
 
type = gfs2_acl_type(name);
if (type  0)
return type;
 
-   return gfs2_xattr_get(inode, GFS2_EATYPE_SYS, name, buffer, size);
-}
+   acl = gfs2_acl_get(GFS2_I(inode), type);
+   if (IS_ERR(acl))
+   return PTR_ERR(acl);
+   if (acl == NULL)
+   return -ENODATA;
 
+   error = posix_acl_to_xattr(acl, buffer, size);
+   posix_acl_release(acl);
+
+   return error;
+}
 
 static int gfs2_xattr_system_set(struct inode *inode, const char *name,
 const void *value, size_t size, int flags)
@@ -303,6 +320,12 @@ static int gfs2_xattr_system_set(struct inode *inode, 
const char *name,
 
 set_acl:
error = gfs2_xattr_set(inode, GFS2_EATYPE_SYS, name, value, size, 0);
+   if (!error) {
+   if (acl)
+   set_cached_acl(inode, type, acl);
+   else
+   forget_cached_acl(inode, type);
+   }
 out_release:
posix_acl_release(acl);
 out:
-- 
1.6.2.5
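
The caching flow in the patch above can be summarised with a small user-space
sketch. It assumes an integer stands in for an ACL and uses hypothetical
sketch_* names throughout; it is not the GFS2 code. The lookup returns the
cached value when there is one and otherwise falls back to the slow path, a
successful set updates the cache, and dropping the glock invalidates it so
that a change made by another node is seen on the next lookup.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NOT_CACHED (-1)	/* nothing cached; real values are >= 0 */

struct sketch_inode {
	int cached_acl;		/* SKETCH_NOT_CACHED or the cached value */
	int on_disk_acl;	/* stand-in for the xattr stored on disk */
	int disk_reads;		/* counts slow-path lookups */
};

/* Lookup: serve from the cache if possible, else take the slow path. */
int sketch_acl_get(struct sketch_inode *inode)
{
	if (inode->cached_acl != SKETCH_NOT_CACHED)
		return inode->cached_acl;
	inode->disk_reads++;
	return inode->on_disk_acl;
}

/* Set: write through, then cache the new value on success. */
int sketch_acl_set(struct sketch_inode *inode, int acl)
{
	assert(acl >= 0);
	inode->on_disk_acl = acl;	/* the write always succeeds here */
	inode->cached_acl = acl;
	return 0;
}

/* Invalidation: called when the cluster lock protecting the inode is dropped. */
void sketch_glock_dropped(struct sketch_inode *inode)
{
	inode->cached_acl = SKETCH_NOT_CACHED;
}
```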



[Cluster-devel] [PATCH 09/30] VFS: Use GFP_NOFS in posix_acl_from_xattr()

2009-11-25 Thread Steven Whitehouse
GFS2 needs to call this from under a glock, so we need GFP_NOFS
and I suspect that other filesystems might require this too.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/xattr_acl.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/xattr_acl.c b/fs/xattr_acl.c
index c6ad7c7..05ac0fe 100644
--- a/fs/xattr_acl.c
+++ b/fs/xattr_acl.c
@@ -36,7 +36,7 @@ posix_acl_from_xattr(const void *value, size_t size)
 	if (count == 0)
 		return NULL;
 
-	acl = posix_acl_alloc(count, GFP_KERNEL);
+	acl = posix_acl_alloc(count, GFP_NOFS);
 	if (!acl)
 		return ERR_PTR(-ENOMEM);
 	acl_e = acl->a_entries;
-- 
1.6.2.5



[Cluster-devel] [PATCH 10/30] GFS2: Alter arguments of gfs2_quota/statfs_sync

2009-11-25 Thread Steven Whitehouse
These two functions are altered so that gfs2_quota_sync may
in future be called directly from the VFS. The GFS2 superblock
argument is replaced by a VFS super block, and an int argument,
which is currently ignored, is added.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/quota.c |7 ---
 fs/gfs2/quota.h |2 +-
 fs/gfs2/super.c |7 ---
 fs/gfs2/super.h |2 +-
 fs/gfs2/sys.c   |4 ++--
 5 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 2e9b932..ed9e197 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1069,8 +1069,9 @@ void gfs2_quota_change(struct gfs2_inode *ip, s64 change,
}
 }
 
-int gfs2_quota_sync(struct gfs2_sbd *sdp)
+int gfs2_quota_sync(struct super_block *sb, int type)
 {
+   struct gfs2_sbd *sdp = sb-s_fs_info;
struct gfs2_quota_data **qda;
unsigned int max_qd = gfs2_tune_get(sdp, gt_quota_simul_sync);
unsigned int num_qd;
@@ -1298,12 +1299,12 @@ static void quotad_error(struct gfs2_sbd *sdp, const 
char *msg, int error)
 }
 
 static void quotad_check_timeo(struct gfs2_sbd *sdp, const char *msg,
-  int (*fxn)(struct gfs2_sbd *sdp),
+  int (*fxn)(struct super_block *sb, int type),
   unsigned long t, unsigned long *timeo,
   unsigned int *new_timeo)
 {
if (t = *timeo) {
-   int error = fxn(sdp);
+   int error = fxn(sdp-sd_vfs, 0);
quotad_error(sdp, msg, error);
*timeo = gfs2_tune_get_i(sdp-sd_tune, new_timeo) * HZ;
} else {
diff --git a/fs/gfs2/quota.h b/fs/gfs2/quota.h
index 0fa5fa6..437afa7 100644
--- a/fs/gfs2/quota.h
+++ b/fs/gfs2/quota.h
@@ -25,7 +25,7 @@ extern int gfs2_quota_check(struct gfs2_inode *ip, u32 uid, 
u32 gid);
 extern void gfs2_quota_change(struct gfs2_inode *ip, s64 change,
  u32 uid, u32 gid);
 
-extern int gfs2_quota_sync(struct gfs2_sbd *sdp);
+extern int gfs2_quota_sync(struct super_block *sb, int type);
 extern int gfs2_quota_refresh(struct gfs2_sbd *sdp, int user, u32 id);
 
 extern int gfs2_quota_init(struct gfs2_sbd *sdp);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 42e5458..e7b24d5 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -484,8 +484,9 @@ void update_statfs(struct gfs2_sbd *sdp, struct buffer_head 
*m_bh,
gfs2_statfs_change_out(m_sc, m_bh-b_data + sizeof(struct gfs2_dinode));
 }
 
-int gfs2_statfs_sync(struct gfs2_sbd *sdp)
+int gfs2_statfs_sync(struct super_block *sb, int type)
 {
+   struct gfs2_sbd *sdp = sb-s_fs_info;
struct gfs2_inode *m_ip = GFS2_I(sdp-sd_statfs_inode);
struct gfs2_inode *l_ip = GFS2_I(sdp-sd_sc_inode);
struct gfs2_statfs_change_host *m_sc = sdp-sd_statfs_master;
@@ -712,8 +713,8 @@ static int gfs2_make_fs_ro(struct gfs2_sbd *sdp)
int error;
 
flush_workqueue(gfs2_delete_workqueue);
-   gfs2_quota_sync(sdp);
-   gfs2_statfs_sync(sdp);
+   gfs2_quota_sync(sdp-sd_vfs, 0);
+   gfs2_statfs_sync(sdp-sd_vfs, 0);
 
error = gfs2_glock_nq_init(sdp-sd_trans_gl, LM_ST_SHARED, GL_NOCACHE,
   t_gh);
diff --git a/fs/gfs2/super.h b/fs/gfs2/super.h
index ed962ea..3df60f2 100644
--- a/fs/gfs2/super.h
+++ b/fs/gfs2/super.h
@@ -44,7 +44,7 @@ extern void gfs2_statfs_change_in(struct 
gfs2_statfs_change_host *sc,
  const void *buf);
 extern void update_statfs(struct gfs2_sbd *sdp, struct buffer_head *m_bh,
  struct buffer_head *l_bh);
-extern int gfs2_statfs_sync(struct gfs2_sbd *sdp);
+extern int gfs2_statfs_sync(struct super_block *sb, int type);
 
 extern int gfs2_freeze_fs(struct gfs2_sbd *sdp);
 extern void gfs2_unfreeze_fs(struct gfs2_sbd *sdp);
diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index 4463297..be1b8ac 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -158,7 +158,7 @@ static ssize_t statfs_sync_store(struct gfs2_sbd *sdp, 
const char *buf,
if (simple_strtol(buf, NULL, 0) != 1)
return -EINVAL;
 
-   gfs2_statfs_sync(sdp);
+   gfs2_statfs_sync(sdp-sd_vfs, 0);
return len;
 }
 
@@ -171,7 +171,7 @@ static ssize_t quota_sync_store(struct gfs2_sbd *sdp, const 
char *buf,
if (simple_strtol(buf, NULL, 0) != 1)
return -EINVAL;
 
-   gfs2_quota_sync(sdp);
+   gfs2_quota_sync(sdp-sd_vfs, 0);
return len;
 }
 
-- 
1.6.2.5
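
As a rough illustration of the signature change in the patch above, the sketch
below shows a sync function that takes the generic super block plus an int,
recovers the filesystem-private structure from it, and is invoked by an
internal caller through the back pointer. The struct and function names are
hypothetical simplifications of struct super_block, struct gfs2_sbd and
gfs2_quota_sync(), reduced to the two fields that matter here.

```c
#include <assert.h>
#include <stddef.h>

/* Generic object, as the VFS would see it. */
struct sketch_super_block {
	void *s_fs_info;		/* filesystem-private data */
};

/* Filesystem-private object with a pointer back to the generic one. */
struct sketch_sbd {
	struct sketch_super_block *sd_vfs;
	int syncs;			/* counts sync calls for the example */
};

/* New-style entry point: generic super block plus an (ignored) type. */
int sketch_quota_sync(struct sketch_super_block *sb, int type)
{
	struct sketch_sbd *sdp = sb->s_fs_info;

	(void)type;			/* currently ignored */
	assert(sdp != NULL);
	sdp->syncs++;
	return 0;
}

typedef int (*sketch_sync_fn)(struct sketch_super_block *sb, int type);

/* Internal callers hold the private object and pass the generic one. */
int sketch_run_sync(struct sketch_sbd *sdp, sketch_sync_fn fxn)
{
	return fxn(sdp->sd_vfs, 0);
}
```

Because the function pointer type now matches what a generic caller would
use, the same function can later be placed in an operations table without a
wrapper.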



[Cluster-devel] [PATCH 11/30] GFS2: Hook gfs2_quota_sync into VFS via gfs2_quotactl_ops

2009-11-25 Thread Steven Whitehouse
The plan is to add further operations to the gfs2_quotactl_ops
in future patches. The sync operation is easy, so we start with
that one.

We plan to use the XFS quota control functions because they more
closely match the GFS2 ones.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/Kconfig  |2 ++
 fs/gfs2/ops_fstype.c |3 +++
 fs/gfs2/quota.c  |4 
 fs/gfs2/quota.h  |1 +
 4 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig
index 5971359..4dcddf8 100644
--- a/fs/gfs2/Kconfig
+++ b/fs/gfs2/Kconfig
@@ -8,6 +8,8 @@ config GFS2_FS
select FS_POSIX_ACL
select CRC32
select SLOW_WORK
+   select QUOTA
+   select QUOTACTL
help
  A cluster filesystem.
 
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index e5ee062..36b11cb 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -18,6 +18,7 @@
 #include <linux/mount.h>
 #include <linux/gfs2_ondisk.h>
 #include <linux/slow-work.h>
+#include <linux/quotaops.h>
 
 #include "gfs2.h"
 #include "incore.h"
@@ -1138,6 +1139,8 @@ static int fill_super(struct super_block *sb, struct gfs2_args *args, int silent
 	sb->s_op = &gfs2_super_ops;
 	sb->s_export_op = &gfs2_export_ops;
 	sb->s_xattr = gfs2_xattr_handlers;
+	sb->s_qcop = &gfs2_quotactl_ops;
+	sb_dqopt(sb)->flags |= DQUOT_QUOTA_SYS_FILE;
 	sb->s_time_gran = 1;
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
 
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index ed9e197..73a43ce 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1378,3 +1378,7 @@ int gfs2_quotad(void *data)
return 0;
 }
 
+const struct quotactl_ops gfs2_quotactl_ops = {
+   .quota_sync = gfs2_quota_sync,
+};
+
diff --git a/fs/gfs2/quota.h b/fs/gfs2/quota.h
index 437afa7..025d15b 100644
--- a/fs/gfs2/quota.h
+++ b/fs/gfs2/quota.h
@@ -50,5 +50,6 @@ static inline int gfs2_quota_lock_check(struct gfs2_inode *ip)
 }
 
 extern int gfs2_shrink_qd_memory(int nr, gfp_t gfp_mask);
+extern const struct quotactl_ops gfs2_quotactl_ops;
 
 #endif /* __QUOTA_DOT_H__ */
-- 
1.6.2.5



[Cluster-devel] [PATCH 12/30] GFS2: Remove obsolete code in quota.c

2009-11-25 Thread Steven Whitehouse
There is no point in testing for GLF_DEMOTE here, we might as
well always release the glock at that point.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/glock.h |9 -
 fs/gfs2/quota.c |   13 +
 2 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index c609894..13f0bd2 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -180,15 +180,6 @@ static inline int gfs2_glock_is_held_shrd(struct 
gfs2_glock *gl)
return gl-gl_state == LM_ST_SHARED;
 }
 
-static inline int gfs2_glock_is_blocking(struct gfs2_glock *gl)
-{
-   int ret;
-   spin_lock(gl-gl_spin);
-   ret = test_bit(GLF_DEMOTE, gl-gl_flags);
-   spin_unlock(gl-gl_spin);
-   return ret;
-}
-
 int gfs2_glock_get(struct gfs2_sbd *sdp,
   u64 number, const struct gfs2_glock_operations *glops,
   int create, struct gfs2_glock **glp);
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 73a43ce..6aaa6c5 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -843,9 +843,8 @@ restart:
if (force_refresh || qd-qd_qb.qb_magic != cpu_to_be32(GFS2_MAGIC)) {
loff_t pos;
gfs2_glock_dq_uninit(q_gh);
-   error = gfs2_glock_nq_init(qd-qd_gl,
-  LM_ST_EXCLUSIVE, GL_NOCACHE,
-  q_gh);
+   error = gfs2_glock_nq_init(qd-qd_gl, LM_ST_EXCLUSIVE,
+  GL_NOCACHE, q_gh);
if (error)
return error;
 
@@ -871,11 +870,9 @@ restart:
qlvb-qb_value = cpu_to_be64(q.qu_value);
qd-qd_qb = *qlvb;
 
-   if (gfs2_glock_is_blocking(qd-qd_gl)) {
-   gfs2_glock_dq_uninit(q_gh);
-   force_refresh = 0;
-   goto restart;
-   }
+   gfs2_glock_dq_uninit(q_gh);
+   force_refresh = 0;
+   goto restart;
}
 
return 0;
-- 
1.6.2.5



[Cluster-devel] [PATCH 13/30] GFS2: Add get_xstate quota function

2009-11-25 Thread Steven Whitehouse
This allows querying of the quota state via the XFS quota
API.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/quota.c |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 6aaa6c5..e7114be 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -47,6 +47,7 @@
 #include <linux/gfs2_ondisk.h>
 #include <linux/kthread.h>
 #include <linux/freezer.h>
+#include <linux/dqblk_xfs.h>
 
 #include "gfs2.h"
 #include "incore.h"
@@ -1375,7 +1376,29 @@ int gfs2_quotad(void *data)
 	return 0;
 }
 
+static int gfs2_quota_get_xstate(struct super_block *sb,
+				 struct fs_quota_stat *fqs)
+{
+	struct gfs2_sbd *sdp = sb->s_fs_info;
+
+	memset(fqs, 0, sizeof(struct fs_quota_stat));
+	fqs->qs_version = FS_QSTAT_VERSION;
+	if (sdp->sd_args.ar_quota == GFS2_QUOTA_ON)
+		fqs->qs_flags = (XFS_QUOTA_UDQ_ENFD | XFS_QUOTA_GDQ_ENFD);
+	else if (sdp->sd_args.ar_quota == GFS2_QUOTA_ACCOUNT)
+		fqs->qs_flags = (XFS_QUOTA_UDQ_ACCT | XFS_QUOTA_GDQ_ACCT);
+	if (sdp->sd_quota_inode) {
+		fqs->qs_uquota.qfs_ino = GFS2_I(sdp->sd_quota_inode)->i_no_addr;
+		fqs->qs_uquota.qfs_nblks = sdp->sd_quota_inode->i_blocks;
+	}
+	fqs->qs_uquota.qfs_nextents = 1; /* unsupported */
+	fqs->qs_gquota = fqs->qs_uquota; /* its the same inode in both cases */
+	fqs->qs_incoredqs = atomic_read(&qd_lru_count);
+	return 0;
+}
+
 const struct quotactl_ops gfs2_quotactl_ops = {
 	.quota_sync = gfs2_quota_sync,
+	.get_xstate = gfs2_quota_get_xstate,
 };
 
 
-- 
1.6.2.5
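
The state reporting in the patch above boils down to a small mapping, which
can be sketched in user space as follows. The enum, flag values and struct
here are hypothetical placeholders, not the real XFS_QUOTA_* constants or
struct fs_quota_stat; what carries over is the shape: zero the result, map the
mount-time quota mode onto enforcement or accounting flags for both users and
groups, and report the same quota inode for both.

```c
#include <assert.h>
#include <string.h>

enum sketch_quota_mode {
	SKETCH_QUOTA_OFF,
	SKETCH_QUOTA_ACCOUNT,
	SKETCH_QUOTA_ON
};

/* Hypothetical flag bits, chosen only for this example. */
#define SKETCH_UDQ_ACCT 0x1u
#define SKETCH_UDQ_ENFD 0x2u
#define SKETCH_GDQ_ACCT 0x4u
#define SKETCH_GDQ_ENFD 0x8u

struct sketch_quota_stat {
	unsigned int flags;
	unsigned long user_ino;		/* inode holding user quotas */
	unsigned long group_ino;	/* inode holding group quotas */
};

int sketch_get_xstate(enum sketch_quota_mode mode, unsigned long quota_ino,
		      struct sketch_quota_stat *st)
{
	assert(st != NULL);
	memset(st, 0, sizeof(*st));
	if (mode == SKETCH_QUOTA_ON)
		st->flags = SKETCH_UDQ_ENFD | SKETCH_GDQ_ENFD;
	else if (mode == SKETCH_QUOTA_ACCOUNT)
		st->flags = SKETCH_UDQ_ACCT | SKETCH_GDQ_ACCT;
	st->user_ino = quota_ino;
	st->group_ino = st->user_ino;	/* one file serves both quota types */
	return 0;
}
```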



[Cluster-devel] [PATCH 14/30] GFS2: Add proper error reporting to quota sync via sysfs

2009-11-25 Thread Steven Whitehouse
For some reason, the errors were not making it to userspace.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/sys.c |   10 ++
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index be1b8ac..c5dad1e 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -178,6 +178,7 @@ static ssize_t quota_sync_store(struct gfs2_sbd *sdp, const char *buf,
 static ssize_t quota_refresh_user_store(struct gfs2_sbd *sdp, const char *buf,
 					size_t len)
 {
+	int error;
 	u32 id;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -185,13 +186,14 @@ static ssize_t quota_refresh_user_store(struct gfs2_sbd *sdp, const char *buf,
 
 	id = simple_strtoul(buf, NULL, 0);
 
-	gfs2_quota_refresh(sdp, 1, id);
-	return len;
+	error = gfs2_quota_refresh(sdp, 1, id);
+	return error ? error : len;
 }
 
 static ssize_t quota_refresh_group_store(struct gfs2_sbd *sdp, const char *buf,
 					 size_t len)
 {
+	int error;
 	u32 id;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -199,8 +201,8 @@ static ssize_t quota_refresh_group_store(struct gfs2_sbd *sdp, const char *buf,
 
 	id = simple_strtoul(buf, NULL, 0);
 
-	gfs2_quota_refresh(sdp, 0, id);
-	return len;
+	error = gfs2_quota_refresh(sdp, 0, id);
+	return error ? error : len;
 }
 
 static ssize_t demote_rq_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
-- 
1.6.2.5
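
For readers unfamiliar with the sysfs store convention used in the patch
above, here is a minimal userspace sketch of it. A store handler returns
the number of bytes consumed on success or a negative errno on failure,
so dropping the callee's result means the writer always sees success.
Everything named here (refresh_stub, store_ignoring_error,
store_reporting_error) is a hypothetical stand-in for illustration and
not GFS2 code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for gfs2_quota_refresh(): returns 0 on success
 * and a negative errno on failure. Here id 0 is made to fail. */
static int refresh_stub(unsigned int id)
{
	return id == 0 ? -ENOENT : 0;
}

/* Old pattern: the result is dropped, so the caller always sees 'len'. */
long store_ignoring_error(const char *buf, size_t len, unsigned int id)
{
	assert(buf != NULL);
	refresh_stub(id);
	return (long)len;
}

/* New pattern: a negative errno is passed back instead of the byte count. */
long store_reporting_error(const char *buf, size_t len, unsigned int id)
{
	int error;

	assert(buf != NULL);
	error = refresh_stub(id);
	return error ? error : (long)len;
}
```

With the second form, a write(2) to the sysfs file fails with the
matching errno rather than reporting that all bytes were written.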



[Cluster-devel] [PATCH 15/30] GFS2: Remove constant argument from qdsb_get()

2009-11-25 Thread Steven Whitehouse
The create argument to qdsb_get() was only ever set to true,
so this patch removes that argument.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/quota.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index e7114be..f790f5a 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -462,12 +462,12 @@ static void qd_unlock(struct gfs2_quota_data *qd)
 	qd_put(qd);
 }
 
-static int qdsb_get(struct gfs2_sbd *sdp, int user, u32 id, int create,
+static int qdsb_get(struct gfs2_sbd *sdp, int user, u32 id,
 		    struct gfs2_quota_data **qdp)
 {
 	int error;
 
-	error = qd_get(sdp, user, id, create, qdp);
+	error = qd_get(sdp, user, id, CREATE, qdp);
 	if (error)
 		return error;
 
@@ -509,20 +509,20 @@ int gfs2_quota_hold(struct gfs2_inode *ip, u32 uid, u32 gid)
 	if (sdp->sd_args.ar_quota == GFS2_QUOTA_OFF)
 		return 0;
 
-	error = qdsb_get(sdp, QUOTA_USER, ip->i_inode.i_uid, CREATE, qd);
+	error = qdsb_get(sdp, QUOTA_USER, ip->i_inode.i_uid, qd);
 	if (error)
 		goto out;
 	al->al_qd_num++;
 	qd++;
 
-	error = qdsb_get(sdp, QUOTA_GROUP, ip->i_inode.i_gid, CREATE, qd);
+	error = qdsb_get(sdp, QUOTA_GROUP, ip->i_inode.i_gid, qd);
 	if (error)
 		goto out;
 	al->al_qd_num++;
 	qd++;
 
 	if (uid != NO_QUOTA_CHANGE && uid != ip->i_inode.i_uid) {
-		error = qdsb_get(sdp, QUOTA_USER, uid, CREATE, qd);
+		error = qdsb_get(sdp, QUOTA_USER, uid, qd);
 		if (error)
 			goto out;
 		al->al_qd_num++;
@@ -530,7 +530,7 @@ int gfs2_quota_hold(struct gfs2_inode *ip, u32 uid, u32 gid)
 	}
 
 	if (gid != NO_QUOTA_CHANGE && gid != ip->i_inode.i_gid) {
-		error = qdsb_get(sdp, QUOTA_GROUP, gid, CREATE, qd);
+		error = qdsb_get(sdp, QUOTA_GROUP, gid, qd);
 		if (error)
 			goto out;
 		al->al_qd_num++;
-- 
1.6.2.5



[Cluster-devel] [PATCH 16/30] GFS2: Remove constant argument from qd_get()

2009-11-25 Thread Steven Whitehouse
This function was only ever called with the create
argument set to true, so we can remove it.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/quota.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index f790f5a..db124af 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -165,7 +165,7 @@ fail:
 	return error;
 }
 
-static int qd_get(struct gfs2_sbd *sdp, int user, u32 id, int create,
+static int qd_get(struct gfs2_sbd *sdp, int user, u32 id,
 		  struct gfs2_quota_data **qdp)
 {
 	struct gfs2_quota_data *qd = NULL, *new_qd = NULL;
@@ -203,7 +203,7 @@ static int qd_get(struct gfs2_sbd *sdp, int user, u32 id, int create,
 
 		spin_unlock(&qd_lru_lock);
 
-		if (qd || !create) {
+		if (qd) {
 			if (new_qd) {
 				gfs2_glock_put(new_qd->qd_gl);
 				kmem_cache_free(gfs2_quotad_cachep, new_qd);
@@ -467,7 +467,7 @@ static int qdsb_get(struct gfs2_sbd *sdp, int user, u32 id,
 {
 	int error;
 
-	error = qd_get(sdp, user, id, CREATE, qdp);
+	error = qd_get(sdp, user, id, qdp);
 	if (error)
 		return error;
 
@@ -1117,7 +1117,7 @@ int gfs2_quota_refresh(struct gfs2_sbd *sdp, int user, u32 id)
 	struct gfs2_holder q_gh;
 	int error;
 
-	error = qd_get(sdp, user, id, CREATE, &qd);
+	error = qd_get(sdp, user, id, &qd);
 	if (error)
 		return error;
 
-- 
1.6.2.5



[Cluster-devel] [PATCH 17/30] GFS2: Clean up gfs2_adjust_quota() and do_glock()

2009-11-25 Thread Steven Whitehouse
Both of these functions contained confusing and in one case
duplicate code. This patch adds a new check in do_glock()
so that we report -ENOENT if we are asked to sync a quota
entry which doesn't exist. Due to the previous patch this is
now reported correctly to userspace.

Also there are a few new comments, and I hope that the code
is easier to understand now.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/quota.c |   82 +-
 1 files changed, 26 insertions(+), 56 deletions(-)

diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index db124af..33e369f 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -15,7 +15,7 @@
  * fuzziness in the current usage value of IDs that are being used on different
  * nodes in the cluster simultaneously.  So, it is possible for a user on
  * multiple nodes to overrun their quota, but that overrun is controlable.
- * Since quota tags are part of transactions, there is no need to a quota check
+ * Since quota tags are part of transactions, there is no need for a quota check
  * program to be run on node crashes or anything like that.
  *
  * There are couple of knobs that let the administrator manage the quota
@@ -66,13 +66,6 @@
 #define QUOTA_USER 1
 #define QUOTA_GROUP 0
 
-struct gfs2_quota_host {
-	u64 qu_limit;
-	u64 qu_warn;
-	s64 qu_value;
-	u32 qu_ll_next;
-};
-
 struct gfs2_quota_change_host {
 	u64 qc_change;
 	u32 qc_flags; /* GFS2_QCF_... */
@@ -618,33 +611,19 @@ static void do_qc(struct gfs2_quota_data *qd, s64 change)
 	mutex_unlock(&sdp->sd_quota_mutex);
 }
 
-static void gfs2_quota_in(struct gfs2_quota_host *qu, const void *buf)
-{
-	const struct gfs2_quota *str = buf;
-
-	qu->qu_limit = be64_to_cpu(str->qu_limit);
-	qu->qu_warn = be64_to_cpu(str->qu_warn);
-	qu->qu_value = be64_to_cpu(str->qu_value);
-	qu->qu_ll_next = be32_to_cpu(str->qu_ll_next);
-}
-
-static void gfs2_quota_out(const struct gfs2_quota_host *qu, void *buf)
-{
-	struct gfs2_quota *str = buf;
-
-	str->qu_limit = cpu_to_be64(qu->qu_limit);
-	str->qu_warn = cpu_to_be64(qu->qu_warn);
-	str->qu_value = cpu_to_be64(qu->qu_value);
-	str->qu_ll_next = cpu_to_be32(qu->qu_ll_next);
-	memset(&str->qu_reserved, 0, sizeof(str->qu_reserved));
-}
-
 /**
- * gfs2_adjust_quota
+ * gfs2_adjust_quota - adjust record of current block usage
+ * @ip: The quota inode
+ * @loc: Offset of the entry in the quota file
+ * @change: The amount of change to record
+ * @qd: The quota data
  *
  * This function was mostly borrowed from gfs2_block_truncate_page which was
  * in turn mostly borrowed from ext3
+ *
+ * Returns: 0 or -ve on error
  */
+
 static int gfs2_adjust_quota(struct gfs2_inode *ip, loff_t loc,
 			     s64 change, struct gfs2_quota_data *qd)
 {
@@ -656,8 +635,7 @@ static int gfs2_adjust_quota(struct gfs2_inode *ip, loff_t loc,
 	struct buffer_head *bh;
 	struct page *page;
 	void *kaddr;
-	char *ptr;
-	struct gfs2_quota_host qp;
+	struct gfs2_quota *qp;
 	s64 value;
 	int err = -EIO;
 
@@ -701,18 +679,13 @@ static int gfs2_adjust_quota(struct gfs2_inode *ip, loff_t loc,
 	gfs2_trans_add_bh(ip->i_gl, bh, 0);
 
 	kaddr = kmap_atomic(page, KM_USER0);
-	ptr = kaddr + offset;
-	gfs2_quota_in(&qp, ptr);
-	qp.qu_value += change;
-	value = qp.qu_value;
-	gfs2_quota_out(&qp, ptr);
+	qp = kaddr + offset;
+	value = (s64)be64_to_cpu(qp->qu_value) + change;
+	qp->qu_value = cpu_to_be64(value);
+	qd->qd_qb.qb_value = qp->qu_value;
 	flush_dcache_page(page);
 	kunmap_atomic(kaddr, KM_USER0);
 	err = 0;
-	qd->qd_qb.qb_magic = cpu_to_be32(GFS2_MAGIC);
-	qd->qd_qb.qb_value = cpu_to_be64(value);
-	((struct gfs2_quota_lvb*)(qd->qd_gl->gl_lvb))->qb_magic = cpu_to_be32(GFS2_MAGIC);
-	((struct gfs2_quota_lvb*)(qd->qd_gl->gl_lvb))->qb_value = cpu_to_be64(value);
 unlock:
 	unlock_page(page);
 	page_cache_release(page);
@@ -741,8 +714,7 @@ static int do_sync(unsigned int num_qd, struct gfs2_quota_data **qda)
 
 	sort(qda, num_qd, sizeof(struct gfs2_quota_data *), sort_qd, NULL);
 	for (qx = 0; qx < num_qd; qx++) {
-		error = gfs2_glock_nq_init(qda[qx]->qd_gl,
-					   LM_ST_EXCLUSIVE,
+		error = gfs2_glock_nq_init(qda[qx]->qd_gl, LM_ST_EXCLUSIVE,
 					   GL_NOCACHE, &ghs[qx]);
 		if (error)
 			goto out;
@@ -797,8 +769,7 @@ static int do_sync(unsigned int num_qd, struct gfs2_quota_data **qda)
 		qd = qda[x];
 		offset = qd2offset(qd);
 		error = gfs2_adjust_quota(ip, offset, qd->qd_change_sync,
-					  (struct gfs2_quota_data *)
-					  qd

[Cluster-devel] [PATCH 18/30] GFS2: Add get_xquota support

2009-11-25 Thread Steven Whitehouse
This adds support for viewing the current GFS2 quota settings
via the XFS quota API. The setting of quotas will be addressed
in a later patch. Fields which are not supported here are left
set to zero.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
Reviewed-by: Bob Peterson rpete...@redhat.com
---
 fs/gfs2/quota.c |   43 +++
 1 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 33e369f..6c5d6aa 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1367,8 +1367,51 @@ static int gfs2_quota_get_xstate(struct super_block *sb,
 	return 0;
 }
 
+static int gfs2_xquota_get(struct super_block *sb, int type, qid_t id,
+			   struct fs_disk_quota *fdq)
+{
+	struct gfs2_sbd *sdp = sb->s_fs_info;
+	struct gfs2_quota_lvb *qlvb;
+	struct gfs2_quota_data *qd;
+	struct gfs2_holder q_gh;
+	int error;
+
+	memset(fdq, 0, sizeof(struct fs_disk_quota));
+
+	if (sdp->sd_args.ar_quota == GFS2_QUOTA_OFF)
+		return -ESRCH; /* Crazy XFS error code */
+
+	if (type == USRQUOTA)
+		type = QUOTA_USER;
+	else if (type == GRPQUOTA)
+		type = QUOTA_GROUP;
+	else
+		return -EINVAL;
+
+	error = qd_get(sdp, type, id, &qd);
+	if (error)
+		return error;
+	error = do_glock(qd, FORCE, &q_gh);
+	if (error)
+		goto out;
+
+	qlvb = (struct gfs2_quota_lvb *)qd->qd_gl->gl_lvb;
+	fdq->d_version = FS_DQUOT_VERSION;
+	fdq->d_flags = (type == QUOTA_USER) ? XFS_USER_QUOTA : XFS_GROUP_QUOTA;
+	fdq->d_id = id;
+	fdq->d_blk_hardlimit = be64_to_cpu(qlvb->qb_limit);
+	fdq->d_blk_softlimit = be64_to_cpu(qlvb->qb_warn);
+	fdq->d_bcount = be64_to_cpu(qlvb->qb_value);
+
+	gfs2_glock_dq_uninit(&q_gh);
+out:
+	qd_put(qd);
+	return error;
+}
+
 const struct quotactl_ops gfs2_quotactl_ops = {
 	.quota_sync = gfs2_quota_sync,
 	.get_xstate = gfs2_quota_get_xstate,
+	.get_xquota = gfs2_xquota_get,
 };
 
-- 
1.6.2.5



[Cluster-devel] [PATCH 19/30] GFS2: Add set_xquota support

2009-11-25 Thread Steven Whitehouse
This patch adds the ability to set GFS2 quota limit and
warning levels via the XFS quota API.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/quota.c |  198 +++---
 1 files changed, 172 insertions(+), 26 deletions(-)

diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 6c5d6aa..e8db534 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -615,8 +615,9 @@ static void do_qc(struct gfs2_quota_data *qd, s64 change)
  * gfs2_adjust_quota - adjust record of current block usage
  * @ip: The quota inode
  * @loc: Offset of the entry in the quota file
- * @change: The amount of change to record
+ * @change: The amount of usage change to record
  * @qd: The quota data
+ * @fdq: The updated limits to record
  *
  * This function was mostly borrowed from gfs2_block_truncate_page which was
  * in turn mostly borrowed from ext3
@@ -625,19 +626,21 @@ static void do_qc(struct gfs2_quota_data *qd, s64 change)
  */
 
 static int gfs2_adjust_quota(struct gfs2_inode *ip, loff_t loc,
-			     s64 change, struct gfs2_quota_data *qd)
+			     s64 change, struct gfs2_quota_data *qd,
+			     struct fs_disk_quota *fdq)
 {
 	struct inode *inode = &ip->i_inode;
 	struct address_space *mapping = inode->i_mapping;
 	unsigned long index = loc >> PAGE_CACHE_SHIFT;
 	unsigned offset = loc & (PAGE_CACHE_SIZE - 1);
 	unsigned blocksize, iblock, pos;
-	struct buffer_head *bh;
+	struct buffer_head *bh, *dibh;
 	struct page *page;
 	void *kaddr;
 	struct gfs2_quota *qp;
 	s64 value;
 	int err = -EIO;
+	u64 size;
 
 	if (gfs2_is_stuffed(ip))
 		gfs2_unstuff_dinode(ip, NULL);
@@ -683,9 +686,34 @@ static int gfs2_adjust_quota(struct gfs2_inode *ip, loff_t loc,
 	value = (s64)be64_to_cpu(qp->qu_value) + change;
 	qp->qu_value = cpu_to_be64(value);
 	qd->qd_qb.qb_value = qp->qu_value;
+	if (fdq) {
+		if (fdq->d_fieldmask & FS_DQ_BSOFT) {
+			qp->qu_warn = cpu_to_be64(fdq->d_blk_softlimit);
+			qd->qd_qb.qb_warn = qp->qu_warn;
+		}
+		if (fdq->d_fieldmask & FS_DQ_BHARD) {
+			qp->qu_limit = cpu_to_be64(fdq->d_blk_hardlimit);
+			qd->qd_qb.qb_limit = qp->qu_limit;
+		}
+	}
 	flush_dcache_page(page);
 	kunmap_atomic(kaddr, KM_USER0);
-	err = 0;
+
+	err = gfs2_meta_inode_buffer(ip, &dibh);
+	if (err)
+		goto unlock;
+
+	size = loc + sizeof(struct gfs2_quota);
+	if (size > inode->i_size) {
+		ip->i_disksize = size;
+		i_size_write(inode, size);
+	}
+	inode->i_mtime = inode->i_atime = CURRENT_TIME;
+	gfs2_trans_add_bh(ip->i_gl, dibh, 1);
+	gfs2_dinode_out(ip, dibh->b_data);
+	brelse(dibh);
+	mark_inode_dirty(inode);
+
 unlock:
 	unlock_page(page);
 	page_cache_release(page);
@@ -713,6 +741,7 @@ static int do_sync(unsigned int num_qd, struct gfs2_quota_data **qda)
 		return -ENOMEM;
 
 	sort(qda, num_qd, sizeof(struct gfs2_quota_data *), sort_qd, NULL);
+	mutex_lock_nested(&ip->i_inode.i_mutex, I_MUTEX_QUOTA);
 	for (qx = 0; qx < num_qd; qx++) {
 		error = gfs2_glock_nq_init(qda[qx]->qd_gl, LM_ST_EXCLUSIVE,
 					   GL_NOCACHE, &ghs[qx]);
@@ -768,8 +797,7 @@ static int do_sync(unsigned int num_qd, struct gfs2_quota_data **qda)
 	for (x = 0; x < num_qd; x++) {
 		qd = qda[x];
 		offset = qd2offset(qd);
-		error = gfs2_adjust_quota(ip, offset, qd->qd_change_sync,
-					  (struct gfs2_quota_data *)qd);
+		error = gfs2_adjust_quota(ip, offset, qd->qd_change_sync, qd, NULL);
 		if (error)
 			goto out_end_trans;
 
@@ -789,20 +817,44 @@ out_gunlock:
 out:
 	while (qx--)
 		gfs2_glock_dq_uninit(&ghs[qx]);
+	mutex_unlock(&ip->i_inode.i_mutex);
 	kfree(ghs);
 	gfs2_log_flush(ip->i_gl->gl_sbd, ip->i_gl);
 	return error;
 }
 
+static int update_qd(struct gfs2_sbd *sdp, struct gfs2_quota_data *qd)
+{
+	struct gfs2_inode *ip = GFS2_I(sdp->sd_quota_inode);
+	struct gfs2_quota q;
+	struct gfs2_quota_lvb *qlvb;
+	loff_t pos;
+	int error;
+
+	memset(&q, 0, sizeof(struct gfs2_quota));
+	pos = qd2offset(qd);
+	error = gfs2_internal_read(ip, NULL, (char *)&q, &pos, sizeof(q));
+	if (error < 0)
+		return error;
+
+	qlvb = (struct gfs2_quota_lvb *)qd->qd_gl->gl_lvb;
+	qlvb->qb_magic = cpu_to_be32(GFS2_MAGIC);
+	qlvb->__pad = 0;
+	qlvb->qb_limit = q.qu_limit;
+	qlvb->qb_warn = q.qu_warn;
+	qlvb->qb_value = q.qu_value;
+	qd->qd_qb = *qlvb;
+
+	return 0;
+}
+
 static

[Cluster-devel] [PATCH 20/30] VFS: Export dquot_send_warning

2009-11-25 Thread Steven Whitehouse
Sending a message to userspace in a generic format to warn
of events (e.g. quota exceeded) in the quota subsystem is
a generically useful feature. This patch makes some minor
changes to the send_warning function from dquot.c, renaming
it quota_send_warning, moving it to quota.c and exporting it
for use by filesystems which do not use the dquot code.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/quota/Kconfig  |2 +-
 fs/quota/dquot.c  |   93 +
 fs/quota/quota.c  |   93 +
 include/linux/quota.h |   11 ++
 4 files changed, 114 insertions(+), 85 deletions(-)

diff --git a/fs/quota/Kconfig b/fs/quota/Kconfig
index 8047e01..353e78a 100644
--- a/fs/quota/Kconfig
+++ b/fs/quota/Kconfig
@@ -17,7 +17,7 @@ config QUOTA
 
 config QUOTA_NETLINK_INTERFACE
 	bool "Report quota messages through netlink interface"
-	depends on QUOTA && NET
+	depends on QUOTACTL && NET
 	help
 	  If you say Y here, quota warnings (about exceeding softlimit, reaching
 	  hardlimit, etc.) will be reported through netlink interface. If unsure,
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 39b49c4..9b6ad90 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -77,10 +77,6 @@
 #include <linux/capability.h>
 #include <linux/quotaops.h>
 #include <linux/writeback.h> /* for inode_lock, oddly enough.. */
-#ifdef CONFIG_QUOTA_NETLINK_INTERFACE
-#include <net/netlink.h>
-#include <net/genetlink.h>
-#endif
 
 #include <asm/uaccess.h>
 
@@ -1071,73 +1067,6 @@ static void print_warning(struct dquot *dquot, const int warntype)
 }
 #endif
 
-#ifdef CONFIG_QUOTA_NETLINK_INTERFACE
-
-/* Netlink family structure for quota */
-static struct genl_family quota_genl_family = {
-	.id = GENL_ID_GENERATE,
-	.hdrsize = 0,
-	.name = "VFS_DQUOT",
-	.version = 1,
-	.maxattr = QUOTA_NL_A_MAX,
-};
-
-/* Send warning to userspace about user which exceeded quota */
-static void send_warning(const struct dquot *dquot, const char warntype)
-{
-	static atomic_t seq;
-	struct sk_buff *skb;
-	void *msg_head;
-	int ret;
-	int msg_size = 4 * nla_total_size(sizeof(u32)) +
-		       2 * nla_total_size(sizeof(u64));
-
-	/* We have to allocate using GFP_NOFS as we are called from a
-	 * filesystem performing write and thus further recursion into
-	 * the fs to free some data could cause deadlocks. */
-	skb = genlmsg_new(msg_size, GFP_NOFS);
-	if (!skb) {
-		printk(KERN_ERR
-		  "VFS: Not enough memory to send quota warning.\n");
-		return;
-	}
-	msg_head = genlmsg_put(skb, 0, atomic_add_return(1, &seq),
-			&quota_genl_family, 0, QUOTA_NL_C_WARNING);
-	if (!msg_head) {
-		printk(KERN_ERR
-		  "VFS: Cannot store netlink header in quota warning.\n");
-		goto err_out;
-	}
-	ret = nla_put_u32(skb, QUOTA_NL_A_QTYPE, dquot->dq_type);
-	if (ret)
-		goto attr_err_out;
-	ret = nla_put_u64(skb, QUOTA_NL_A_EXCESS_ID, dquot->dq_id);
-	if (ret)
-		goto attr_err_out;
-	ret = nla_put_u32(skb, QUOTA_NL_A_WARNING, warntype);
-	if (ret)
-		goto attr_err_out;
-	ret = nla_put_u32(skb, QUOTA_NL_A_DEV_MAJOR,
-		MAJOR(dquot->dq_sb->s_dev));
-	if (ret)
-		goto attr_err_out;
-	ret = nla_put_u32(skb, QUOTA_NL_A_DEV_MINOR,
-		MINOR(dquot->dq_sb->s_dev));
-	if (ret)
-		goto attr_err_out;
-	ret = nla_put_u64(skb, QUOTA_NL_A_CAUSED_ID, current_uid());
-	if (ret)
-		goto attr_err_out;
-	genlmsg_end(skb, msg_head);
-
-	genlmsg_multicast(skb, 0, quota_genl_family.id, GFP_NOFS);
-	return;
-attr_err_out:
-	printk(KERN_ERR "VFS: Not enough space to compose quota message!\n");
-err_out:
-	kfree_skb(skb);
-}
-#endif
 /*
  * Write warnings to the console and send warning messages over netlink.
  *
@@ -1145,18 +1074,20 @@ err_out:
  */
 static void flush_warnings(struct dquot *const *dquots, char *warntype)
 {
+	struct dquot *dq;
 	int i;
 
-	for (i = 0; i < MAXQUOTAS; i++)
-		if (dquots[i] && warntype[i] != QUOTA_NL_NOWARN &&
-		    !warning_issued(dquots[i], warntype[i])) {
+	for (i = 0; i < MAXQUOTAS; i++) {
+		dq = dquots[i];
+		if (dq && warntype[i] != QUOTA_NL_NOWARN &&
+		    !warning_issued(dq, warntype[i])) {
 #ifdef CONFIG_PRINT_QUOTA_WARNING
-			print_warning(dquots[i], warntype[i]);
-#endif
-#ifdef CONFIG_QUOTA_NETLINK_INTERFACE
-			send_warning(dquots[i], warntype[i]);
+			print_warning(dq, warntype[i]);
 #endif
+			quota_send_warning(dq->dq_type, dq->dq_id

[Cluster-devel] [PATCH 21/30] GFS2: Use dquot_send_warning()

2009-11-25 Thread Steven Whitehouse
This adds support to GFS2 to send quota warnings via netlink.
Also it removes a stray \r which was left over from when the
code used to print warnings on the console.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/quota.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index e8db534..1d4fc04 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -47,6 +47,7 @@
 #include <linux/gfs2_ondisk.h>
 #include <linux/kthread.h>
 #include <linux/freezer.h>
+#include <linux/quota.h>
 #include <linux/dqblk_xfs.h>
 
 #include "gfs2.h"
@@ -1001,7 +1002,7 @@ static int print_message(struct gfs2_quota_data *qd, char *type)
 {
 	struct gfs2_sbd *sdp = qd->qd_gl->gl_sbd;
 
-	printk(KERN_INFO "GFS2: fsid=%s: quota %s for %s %u\r\n",
+	printk(KERN_INFO "GFS2: fsid=%s: quota %s for %s %u\n",
 	       sdp->sd_fsname, type,
 	       (test_bit(QDF_USER, &qd->qd_flags)) ? "user" : "group",
 	       qd->qd_id);
@@ -1038,6 +1039,10 @@ int gfs2_quota_check(struct gfs2_inode *ip, u32 uid, u32 gid)
 
 		if (be64_to_cpu(qd->qd_qb.qb_limit) && (s64)be64_to_cpu(qd->qd_qb.qb_limit) < value) {
 			print_message(qd, "exceeded");
+			quota_send_warning(test_bit(QDF_USER, &qd->qd_flags) ?
+					   USRQUOTA : GRPQUOTA, qd->qd_id,
+					   sdp->sd_vfs->s_dev, QUOTA_NL_BHARDWARN);
+
 			error = -EDQUOT;
 			break;
 		} else if (be64_to_cpu(qd->qd_qb.qb_warn) &&
@@ -1045,6 +1050,9 @@ int gfs2_quota_check(struct gfs2_inode *ip, u32 uid, u32 gid)
 			   time_after_eq(jiffies, qd->qd_last_warn +
 					 gfs2_tune_get(sdp,
 						gt_quota_warn_period) * HZ)) {
+			quota_send_warning(test_bit(QDF_USER, &qd->qd_flags) ?
+					   USRQUOTA : GRPQUOTA, qd->qd_id,
+					   sdp->sd_vfs->s_dev, QUOTA_NL_BSOFTWARN);
 			error = print_message(qd, "warning");
 			qd->qd_last_warn = jiffies;
 		}
-- 
1.6.2.5



[Cluster-devel] [PATCH 22/30] GFS2: Improve statfs and quota usability

2009-11-25 Thread Steven Whitehouse
From: Benjamin Marzinski bmarz...@redhat.com

GFS2 now has three new mount options, statfs_quantum, quota_quantum and
statfs_percent.  statfs_quantum and quota_quantum simply allow you to
set the tunables of the same name.  Setting statfs_quantum to 0
will also turn on the statfs_slow tunable.  statfs_percent accepts an
integer between 0 and 100.  Numbers between 1 and 100 will cause GFS2 to
do an early sync when the local number of blocks free changes by at
least statfs_percent from the total number of blocks free.  Setting
statfs_percent to 0 disables this.

Signed-off-by: Benjamin Marzinski bmarz...@redhat.com
Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/incore.h |4 +++
 fs/gfs2/ops_fstype.c |   14 --
 fs/gfs2/quota.c  |   21 +--
 fs/gfs2/quota.h  |2 +
 fs/gfs2/super.c  |   69 +++---
 5 files changed, 100 insertions(+), 10 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 6edb423..c239b0f 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -430,6 +430,9 @@ struct gfs2_args {
 	unsigned int ar_discard:1;  /* discard requests */
 	unsigned int ar_errors:2;   /* errors=withdraw | panic */
 	int ar_commit;  /* Commit interval */
+	int ar_statfs_quantum;  /* The fast statfs interval */
+	int ar_quota_quantum;   /* The quota interval */
+	int ar_statfs_percent;  /* The % change to force sync */
 };
 
 struct gfs2_tune {
@@ -558,6 +561,7 @@ struct gfs2_sbd {
 	spinlock_t sd_statfs_spin;
 	struct gfs2_statfs_change_host sd_statfs_master;
 	struct gfs2_statfs_change_host sd_statfs_local;
+	int sd_statfs_force_sync;
 
 	/* Resource group stuff */
 
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 36b11cb..9744ee9 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -63,13 +63,10 @@ static void gfs2_tune_init(struct gfs2_tune *gt)
 	gt->gt_quota_warn_period = 10;
 	gt->gt_quota_scale_num = 1;
 	gt->gt_quota_scale_den = 1;
-	gt->gt_quota_quantum = 60;
 	gt->gt_new_files_jdata = 0;
 	gt->gt_max_readahead = 1 << 18;
 	gt->gt_stall_secs = 600;
 	gt->gt_complain_secs = 10;
-	gt->gt_statfs_quantum = 30;
-	gt->gt_statfs_slow = 0;
 }
 
 static struct gfs2_sbd *init_sbd(struct super_block *sb)
@@ -1153,6 +1150,15 @@ static int fill_super(struct super_block *sb, struct gfs2_args *args, int silent
 	sdp->sd_fsb2bb = 1 << sdp->sd_fsb2bb_shift;
 
 	sdp->sd_tune.gt_log_flush_secs = sdp->sd_args.ar_commit;
+	sdp->sd_tune.gt_quota_quantum = sdp->sd_args.ar_quota_quantum;
+	if (sdp->sd_args.ar_statfs_quantum) {
+		sdp->sd_tune.gt_statfs_slow = 0;
+		sdp->sd_tune.gt_statfs_quantum = sdp->sd_args.ar_statfs_quantum;
+	}
+	else {
+		sdp->sd_tune.gt_statfs_slow = 1;
+		sdp->sd_tune.gt_statfs_quantum = 30;
+	}
 
 	error = init_names(sdp, silent);
 	if (error)
@@ -1308,6 +1314,8 @@ static int gfs2_get_sb(struct file_system_type *fs_type, int flags,
 	args.ar_quota = GFS2_QUOTA_DEFAULT;
 	args.ar_data = GFS2_DATA_DEFAULT;
 	args.ar_commit = 60;
+	args.ar_statfs_quantum = 30;
+	args.ar_quota_quantum = 60;
 	args.ar_errors = GFS2_ERRORS_DEFAULT;
 
 	error = gfs2_mount_args(&args, data);
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 1d4fc04..e3bf6ea 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1344,6 +1344,14 @@ static void quotad_check_trunc_list(struct gfs2_sbd *sdp)
 	}
 }
 
+void gfs2_wake_up_statfs(struct gfs2_sbd *sdp) {
+	if (!sdp->sd_statfs_force_sync) {
+		sdp->sd_statfs_force_sync = 1;
+		wake_up(&sdp->sd_quota_wait);
+	}
+}
+
+
 /**
  * gfs2_quotad - Write cached quota changes into the quota file
  * @sdp: Pointer to GFS2 superblock
@@ -1363,8 +1371,15 @@ int gfs2_quotad(void *data)
 	while (!kthread_should_stop()) {
 
 		/* Update the master statfs file */
-		quotad_check_timeo(sdp, "statfs", gfs2_statfs_sync, t,
-				   &statfs_timeo, &tune->gt_statfs_quantum);
+		if (sdp->sd_statfs_force_sync) {
+			int error = gfs2_statfs_sync(sdp->sd_vfs, 0);
+			quotad_error(sdp, "statfs", error);
+			statfs_timeo = gfs2_tune_get(sdp, gt_statfs_quantum) * HZ;
+		}
+		else
+			quotad_check_timeo(sdp, "statfs", gfs2_statfs_sync, t,
+					   &statfs_timeo,
+					   &tune->gt_statfs_quantum);
 
 		/* Update quota file */
 		quotad_check_timeo(sdp, "sync", gfs2_quota_sync, t,
@@ -1381,7 +1396,7 @@ int gfs2_quotad(void *data

[Cluster-devel] [PATCH 23/30] GFS2: remove division from new statfs code

2009-11-25 Thread Steven Whitehouse
From: Benjamin Marzinski bmarz...@redhat.com

It's not necessary to do any 64bit division for the statfs sync code, so
remove it.

Signed-off-by: Benjamin Marzinski bmarz...@redhat.com
Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/super.c |   17 +
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 3fee2fd..b1dcfab 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -472,7 +472,8 @@ void gfs2_statfs_change(struct gfs2_sbd *sdp, s64 total, s64 free,
 	struct gfs2_statfs_change_host *l_sc = &sdp->sd_statfs_local;
 	struct gfs2_statfs_change_host *m_sc = &sdp->sd_statfs_master;
 	struct buffer_head *l_bh;
-	int percent, sync_percent;
+	s64 x, y;
+	int need_sync = 0;
 	int error;
 
 	error = gfs2_meta_inode_buffer(l_ip, &l_bh);
@@ -486,16 +487,16 @@ void gfs2_statfs_change(struct gfs2_sbd *sdp, s64 total, s64 free,
 	l_sc->sc_free += free;
 	l_sc->sc_dinodes += dinodes;
 	gfs2_statfs_change_out(l_sc, l_bh->b_data + sizeof(struct gfs2_dinode));
-	if (m_sc->sc_free)
-		percent = (100 * l_sc->sc_free) / m_sc->sc_free;
-	else
-		percent = 100;
+	if (sdp->sd_args.ar_statfs_percent) {
+		x = 100 * l_sc->sc_free;
+		y = m_sc->sc_free * sdp->sd_args.ar_statfs_percent;
+		if (x >= y || x <= -y)
+			need_sync = 1;
+	}
 	spin_unlock(&sdp->sd_statfs_spin);
 
 	brelse(l_bh);
-	sync_percent = sdp->sd_args.ar_statfs_percent;
-	if (sync_percent && (percent >= sync_percent ||
-			     percent <= -sync_percent))
+	if (need_sync)
 		gfs2_wake_up_statfs(sdp);
 }
 
-- 
1.6.2.5
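
The rewrite in the patch above replaces a 64-bit division with a
cross-multiplication. As a rough userspace sketch of why the two forms
agree: for a positive master count m and threshold p, the truncating
quotient (100 * l) / m reaches p exactly when 100 * l >= m * p, and it
reaches -p exactly when 100 * l <= -(m * p). The function and parameter
names below are made up for the illustration, and overflow of the
products is ignored here.

```c
#include <assert.h>
#include <stdint.h>

/* Division-based check, in the spirit of the code the patch removes.
 * Returns nonzero when the local change reaches the threshold. */
int need_sync_div(int64_t local_free, int64_t master_free, int threshold)
{
	int64_t percent;

	if (!threshold)
		return 0;
	percent = master_free ? (100 * local_free) / master_free : 100;
	return percent >= threshold || percent <= -threshold;
}

/* Division-free check, in the spirit of the code the patch adds:
 * compare 100 * local against master * threshold instead. */
int need_sync_mul(int64_t local_free, int64_t master_free, int threshold)
{
	int64_t x, y;

	assert(threshold >= 0 && threshold <= 100);
	if (!threshold)
		return 0;
	x = 100 * local_free;
	y = master_free * threshold;
	return x >= y || x <= -y;
}
```

For example, with a master count of 1000 free blocks and a threshold of
10, a local change of 100 or of -100 blocks triggers a sync in both
forms, while a change of 50 does not. Avoiding the division matters in
the kernel because a plain 64-bit divide is not available on all 32-bit
architectures.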



[Cluster-devel] [PATCH 24/30] GFS2: add barrier/nobarrier mount options

2009-11-25 Thread Steven Whitehouse
From: Christoph Hellwig h...@lst.de

Currently gfs2 issues barriers unconditionally.  There are various reasons
to disable them, be that just for testing or for stupid devices flushing
large battery backed caches.  Add a nobarrier option that matches xfs and
btrfs for this.  Also add a symmetric barrier option to turn it back on
at remount time.

Signed-off-by: Christoph Hellwig h...@lst.de
Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/incore.h |1 +
 fs/gfs2/ops_fstype.c |2 ++
 fs/gfs2/super.c  |   14 ++
 3 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index c239b0f..4792200 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -429,6 +429,7 @@ struct gfs2_args {
 	unsigned int ar_meta:1; /* mount metafs */
 	unsigned int ar_discard:1;  /* discard requests */
 	unsigned int ar_errors:2;   /* errors=withdraw | panic */
+	unsigned int ar_nobarrier:1;/* do not send barriers */
 	int ar_commit;  /* Commit interval */
 	int ar_statfs_quantum;  /* The fast statfs interval */
 	int ar_quota_quantum;   /* The quota interval */
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 9744ee9..edfee24 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1131,6 +1131,8 @@ static int fill_super(struct super_block *sb, struct gfs2_args *args, int silent
 	}
 	if (sdp->sd_args.ar_posix_acl)
 		sb->s_flags |= MS_POSIXACL;
+	if (sdp->sd_args.ar_nobarrier)
+		set_bit(SDF_NOBARRIERS, &sdp->sd_flags);
 
 	sb->s_magic = GFS2_MAGIC;
 	sb->s_op = &gfs2_super_ops;
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index b1dcfab..5e4b314 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -73,6 +73,8 @@ enum {
 	Opt_statfs_quantum,
 	Opt_statfs_percent,
 	Opt_quota_quantum,
+	Opt_barrier,
+	Opt_nobarrier,
 	Opt_error,
 };
 
@@ -107,6 +109,8 @@ static const match_table_t tokens = {
 	{Opt_statfs_quantum, "statfs_quantum=%d"},
 	{Opt_statfs_percent, "statfs_percent=%d"},
 	{Opt_quota_quantum, "quota_quantum=%d"},
+	{Opt_barrier, "barrier"},
+	{Opt_nobarrier, "nobarrier"},
 	{Opt_error, NULL}
 };
 
@@ -253,6 +257,12 @@ int gfs2_mount_args(struct gfs2_args *args, char *options)
 			}
 			args->ar_errors = GFS2_ERRORS_PANIC;
 			break;
+		case Opt_barrier:
+			args->ar_nobarrier = 0;
+			break;
+		case Opt_nobarrier:
+			args->ar_nobarrier = 1;
+			break;
 		case Opt_error:
 		default:
 			printk(KERN_WARNING "GFS2: invalid mount option: %s\n", o);
@@ -1143,6 +1153,10 @@ static int gfs2_remount_fs(struct super_block *sb, int *flags, char *data)
 		sb->s_flags |= MS_POSIXACL;
 	else
 		sb->s_flags &= ~MS_POSIXACL;
+	if (sdp->sd_args.ar_nobarrier)
+		set_bit(SDF_NOBARRIERS, &sdp->sd_flags);
+	else
+		clear_bit(SDF_NOBARRIERS, &sdp->sd_flags);
 	spin_lock(&gt->gt_spin);
 	gt->gt_log_flush_secs = args.ar_commit;
 	gt->gt_quota_quantum = args.ar_quota_quantum;
-- 
1.6.2.5
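
To illustrate how a symmetric barrier/nobarrier pair behaves when both
appear in one option string, here is a small userspace sketch. It is
only an approximation of the idea: the kernel code above uses
match_token() with the tokens table, whereas this sketch uses strtok()
and strcmp(), and struct mount_args and parse_barrier_opts() are
hypothetical names chosen for the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the relevant bit of struct gfs2_args. */
struct mount_args {
	unsigned int nobarrier:1;	/* 1 = do not send barriers */
};

/* Parse a comma-separated option string. Later options override earlier
 * ones, so "nobarrier,barrier" leaves barriers enabled.
 * Returns 0 on success, -1 on an unknown or over-long option string. */
int parse_barrier_opts(struct mount_args *args, const char *options)
{
	char buf[128];
	char *tok;

	assert(args != NULL && options != NULL);
	if (strlen(options) >= sizeof(buf))
		return -1;
	strcpy(buf, options);	/* strtok() modifies its input */

	for (tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
		if (strcmp(tok, "barrier") == 0)
			args->nobarrier = 0;
		else if (strcmp(tok, "nobarrier") == 0)
			args->nobarrier = 1;
		else
			return -1;
	}
	return 0;
}
```

Because the parsed value is a plain flag, re-running the parser at
remount time with "barrier" is enough to turn barriers back on, which is
the purpose of adding the symmetric option.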



[Cluster-devel] [PATCH 25/30] GFS2: Display nobarrier option in /proc/mounts

2009-11-25 Thread Steven Whitehouse
Since the default is barriers on, this only displays the
nobarrier option when that is active.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/super.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 5e4b314..c282ad4 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1336,6 +1336,9 @@ static int gfs2_show_options(struct seq_file *s, struct vfsmount *mnt)
 		}
 		seq_printf(s, ",errors=%s", state);
 	}
+	if (test_bit(SDF_NOBARRIERS, &sdp->sd_flags))
+		seq_printf(s, ",nobarrier");
+
 	return 0;
 }
 
-- 
1.6.2.5



[Cluster-devel] GFS2 git tree

2009-12-07 Thread Steven Whitehouse
Hi,

Linus has pulled the -nmw tree. I'll leave it a few days before I start
adding patches into the tree to avoid any confusion with those still
trying to merge from linux-next, but don't let that delay you in posting
any patches. If they are fixes, then they can go into the -fixes tree
right away (currently empty)

Steve.




[Cluster-devel] GFS2: Metadata address space clean up

2009-12-07 Thread Steven Whitehouse

This is a heads up on a patch I'm working on to clean up the
metadata address space which is used in GFS2. This is a preliminary
version which passes a few basic tests. I'll probably make a few
more changes before the final version.

Since the start of GFS2, an extra inode has been used to store
the metadata belonging to each inode. The only reason for using
this inode was to have an extra address space; the other fields
were unused. This means that the memory usage was rather inefficient.

The reason for keeping each inode's metadata in a separate address
space is that when glocks are requested on remote nodes, we need to
be able to efficiently locate the data and metadata which relate
to that glock (inode) in order to sync or sync and invalidate it
(depending on the remotely requested lock mode).

This patch adds a new type of glock which, in addition to
its normal fields, has an address space. This applies to all
inode and rgrp glocks (but to no other glock types, which remain
as before). As a result, we no longer need to have the second
inode.
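
To make the layout idea concrete, here is a minimal user-space sketch of a
lock structure that is followed directly in memory by its address space.
All toy_* names are hypothetical stand-ins, not the kernel's types, and the
sketch uses offsetof() to walk back from the mapping to the glock, which is
an assumption about how one might write it rather than a copy of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types; only the memory layout matters here. */
struct toy_sbd { int id; };
struct toy_address_space { unsigned long nrpages; };

struct toy_glock {
	struct toy_sbd *gl_sbd;
	int has_aspace;		/* plays the role of a GLOF_ASPACE-style flag */
};

/* A glock that is immediately followed in memory by its address space. */
struct toy_glock_aspace {
	struct toy_glock glock;
	struct toy_address_space mapping;
};

/* glock -> address space, or NULL when this glock type has none. */
static struct toy_address_space *toy_glock2aspace(struct toy_glock *gl)
{
	assert(gl != NULL);
	if (!gl->has_aspace)
		return NULL;
	return &((struct toy_glock_aspace *)gl)->mapping;
}

/* address space -> owning glock: step back from the member to its container. */
static struct toy_glock *toy_aspace2glock(struct toy_address_space *mapping)
{
	char *base;

	assert(mapping != NULL);
	base = (char *)mapping - offsetof(struct toy_glock_aspace, mapping);
	return &((struct toy_glock_aspace *)base)->glock;
}
```

Because the mapping lives inside the same allocation as the glock, no second
inode is needed just to own an address space, and either structure can be
reached from the other with pointer arithmetic alone.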

This results in three major improvements:
 1. A saving of approx 25% of memory used in caching inodes
 2. A removal of the circular dependency between inodes and glocks
 3. No confusion between normal and metadata inodes in super.c

Although the first of these is the most immediately apparent, the
second is just as important as it now enables a number of clean
ups at umount time. Those will be the subject of future patches.

Steve.



diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 7b8da94..16c8214 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -1061,11 +1061,17 @@ out:
 
 int gfs2_releasepage(struct page *page, gfp_t gfp_mask)
 {
-	struct inode *aspace = page->mapping->host;
-	struct gfs2_sbd *sdp = aspace->i_sb->s_fs_info;
+	struct address_space *mapping = page->mapping;
+	struct inode *inode = mapping->host;
+	struct gfs2_sbd *sdp;
 	struct buffer_head *bh, *head;
 	struct gfs2_bufdata *bd;
 
+	if (mapping->a_ops == &gfs2_meta_aops)
+		sdp = (((struct gfs2_glock *)mapping) - 1)->gl_sbd;
+	else
+		sdp = inode->i_sb->s_fs_info;
+
 	if (!page_has_buffers(page))
 		return 0;
 
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index f455a03..736d05b 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -154,12 +154,14 @@ static unsigned int gl_hash(const struct gfs2_sbd *sdp,
 static void glock_free(struct gfs2_glock *gl)
 {
 	struct gfs2_sbd *sdp = gl->gl_sbd;
-	struct inode *aspace = gl->gl_aspace;
+	struct address_space *mapping = gfs2_glock2aspace(gl);
+	struct kmem_cache *cachep = gfs2_glock_cachep;
 
-	if (aspace)
-		gfs2_aspace_put(aspace);
+	GLOCK_BUG_ON(gl, mapping && mapping->nrpages);
 	trace_gfs2_glock_put(gl);
-	sdp->sd_lockstruct.ls_ops->lm_put_lock(gfs2_glock_cachep, gl);
+	if (mapping)
+		cachep = gfs2_glock_aspace_cachep;
+	sdp->sd_lockstruct.ls_ops->lm_put_lock(cachep, gl);
 }
 
 /**
@@ -750,10 +752,11 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
 		   const struct gfs2_glock_operations *glops, int create,
 		   struct gfs2_glock **glp)
 {
+	struct super_block *s = sdp->sd_vfs;
 	struct lm_lockname name = { .ln_number = number, .ln_type = glops->go_type };
 	struct gfs2_glock *gl, *tmp;
 	unsigned int hash = gl_hash(sdp, &name);
-	int error;
+	struct address_space *mapping;
 
 	read_lock(gl_lock_addr(hash));
 	gl = search_bucket(hash, sdp, &name);
@@ -765,7 +768,10 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
 	if (!create)
 		return -ENOENT;
 
-	gl = kmem_cache_alloc(gfs2_glock_cachep, GFP_KERNEL);
+	if (glops->go_flags & GLOF_ASPACE)
+		gl = kmem_cache_alloc(gfs2_glock_aspace_cachep, GFP_KERNEL);
+	else
+		gl = kmem_cache_alloc(gfs2_glock_cachep, GFP_KERNEL);
 	if (!gl)
 		return -ENOMEM;
 
@@ -783,18 +789,18 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
 	gl->gl_tchange = jiffies;
 	gl->gl_object = NULL;
 	gl->gl_sbd = sdp;
-	gl->gl_aspace = NULL;
 	INIT_DELAYED_WORK(&gl->gl_work, glock_work_func);
 	INIT_WORK(&gl->gl_delete, delete_work_func);
 
-	/* If this glock protects actual on-disk data or metadata blocks,
-	   create a VFS inode to manage the pages/buffers holding them. */
-	if (glops == &gfs2_inode_glops || glops == &gfs2_rgrp_glops) {
-		gl->gl_aspace = gfs2_aspace_get(sdp);
-		if (!gl->gl_aspace) {
-			error = -ENOMEM;
-			goto fail;
-		}
+	mapping = gfs2_glock2aspace(gl);
+	if (mapping) {
+		mapping->a_ops = &gfs2_meta_aops;
+		mapping->host = s->s_bdev->bd_inode;
+		mapping->flags = 0;
+   

[Cluster-devel] GFS2: Ensure uptodate inode size when using O_APPEND

2009-12-07 Thread Steven Whitehouse

The VFS reads the inode size during generic_file_aio_write() but
with no locking around it. In order to get the expected result
from O_APPEND opens, this patch updates the inode size before
calling generic_file_aio_write().

There is of course still a race here, in that there is nothing to
prevent another node coming in and extending the file in the
meantime. On the other hand, when used with file locking this
will ensure that the expected results are obtained.
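
For the application side of that last point, here is a minimal user-space
sketch of an O_APPEND writer that takes its own advisory lock around each
write. It uses only standard POSIX calls (open, fcntl with F_SETLKW, write);
the function name append_record_locked is hypothetical, and whether fcntl
locks are the right choice for a given cluster setup is an assumption, not
something this thread states.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Append one record to the file at `path`, holding an exclusive advisory
 * lock on the whole file for the duration of the write.
 * Returns the number of bytes written, or -1 on error.
 */
static ssize_t append_record_locked(const char *path, const char *rec)
{
	struct flock lk;
	ssize_t n;
	int fd;

	assert(path != NULL && rec != NULL);

	fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
	if (fd < 0)
		return -1;

	memset(&lk, 0, sizeof(lk));
	lk.l_type = F_WRLCK;
	lk.l_whence = SEEK_SET;	/* l_start = 0 and l_len = 0: whole file */
	if (fcntl(fd, F_SETLKW, &lk) < 0) {
		close(fd);
		return -1;
	}

	/* With O_APPEND the write lands at the end of file as seen now. */
	n = write(fd, rec, strlen(rec));

	lk.l_type = F_UNLCK;
	fcntl(fd, F_SETLK, &lk);
	close(fd);
	return n;
}
```

With every writer serialised by the lock, the size refresh described above
happens while no other writer can extend the file, which is the case the
commit message says will behave as expected.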

Signed-off-by: Steven Whitehouse swhit...@redhat.com



diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 4eb308a..a6abbae 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -569,6 +569,40 @@ static int gfs2_fsync(struct file *file, struct dentry *dentry, int datasync)
 	return ret;
 }
 
+/**
+ * gfs2_file_aio_write - Perform a write to a file
+ * @iocb: The io context
+ * @iov: The data to write
+ * @nr_segs: Number of @iov segments
+ * @pos: The file position
+ *
+ * We have to do a lock/unlock here to refresh the inode size for
+ * O_APPEND writes, otherwise we can land up writing at the wrong
+ * offset. There is still a race, but provided the app is using its
+ * own file locking, this will make O_APPEND work as expected.
+ *
+ */
+
+static ssize_t gfs2_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
+				   unsigned long nr_segs, loff_t pos)
+{
+	struct file *file = iocb->ki_filp;
+
+	if (file->f_flags & O_APPEND) {
+		struct dentry *dentry = file->f_dentry;
+		struct gfs2_inode *ip = GFS2_I(dentry->d_inode);
+		struct gfs2_holder gh;
+		int ret;
+
+		ret = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, 0, &gh);
+		if (ret)
+			return ret;
+		gfs2_glock_dq_uninit(&gh);
+	}
+
+	return generic_file_aio_write(iocb, iov, nr_segs, pos);
+}
+
 #ifdef CONFIG_GFS2_FS_LOCKING_DLM
 
 /**
@@ -711,7 +745,7 @@ const struct file_operations gfs2_file_fops = {
.read   = do_sync_read,
.aio_read   = generic_file_aio_read,
.write  = do_sync_write,
-   .aio_write  = generic_file_aio_write,
+   .aio_write  = gfs2_file_aio_write,
.unlocked_ioctl = gfs2_ioctl,
.mmap   = gfs2_mmap,
.open   = gfs2_open,
@@ -741,7 +775,7 @@ const struct file_operations gfs2_file_fops_nolock = {
.read   = do_sync_read,
.aio_read   = generic_file_aio_read,
.write  = do_sync_write,
-   .aio_write  = generic_file_aio_write,
+   .aio_write  = gfs2_file_aio_write,
.unlocked_ioctl = gfs2_ioctl,
.mmap   = gfs2_mmap,
.open   = gfs2_open,




[Cluster-devel] GFS2: Remove loopy umount code

2009-12-08 Thread Steven Whitehouse
This is a follow up to the patch I posted yesterday and does
the next bit of the changes.

From 69a14ddaf57449c3f6ecfe96a898df5ded1a4256 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse swhit...@redhat.com
Date: Tue, 8 Dec 2009 15:45:50 +
Subject: GFS2: Remove loopy umount code

As a consequence of the previous patch, we can now remove the
loop which used to be required due to the circular dependency
between the inodes and glocks. Instead we can just invalidate
the inodes, and then clear up any glocks which are left.

Also we no longer need the rwsem since there is no longer any
danger of the inode invalidation calling back into the glock
code (and from there back into the inode code).
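
The ordering argument can be illustrated with a small toy model. This is
only a sketch under simplifying assumptions: inodes hold a reference on
their glock, glocks hold nothing back on inodes, and all toy_* names are
hypothetical rather than taken from the GFS2 source.

```c
#include <assert.h>
#include <stddef.h>

struct toy_glock { int refs; int freed; };
struct toy_inode { struct toy_glock *gl; int live; };

/* Dropping an inode releases the reference it holds on its glock. */
static void toy_invalidate_inodes(struct toy_inode *inodes, size_t n)
{
	size_t i;

	assert(inodes != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!inodes[i].live)
			continue;
		inodes[i].live = 0;
		inodes[i].gl->refs--;
	}
}

/* One pass over the glocks; returns how many were still referenced. */
static size_t toy_clear_glocks(struct toy_glock *gls, size_t n)
{
	size_t i, busy = 0;

	for (i = 0; i < n; i++) {
		if (gls[i].refs == 0)
			gls[i].freed = 1;
		else
			busy++;
	}
	return busy;
}

/* Teardown in the new order: inodes first, then a single glock pass. */
static size_t toy_umount(struct toy_inode *inodes, size_t ni,
			 struct toy_glock *gls, size_t ng)
{
	toy_invalidate_inodes(inodes, ni);
	return toy_clear_glocks(gls, ng);
}
```

In this model a glock pass run before the inodes are invalidated leaves busy
glocks behind, while the inode-then-glock order finishes in one pass with no
retry loop.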

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/glock.c  |   35 ---
 fs/gfs2/ops_fstype.c |3 +--
 fs/gfs2/super.c  |1 +
 3 files changed, 6 insertions(+), 33 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 736d05b..6e1e526 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -19,7 +19,6 @@
 #include <linux/list.h>
 #include <linux/wait.h>
 #include <linux/module.h>
-#include <linux/rwsem.h>
 #include <asm/uaccess.h>
 #include <linux/seq_file.h>
 #include <linux/debugfs.h>
@@ -60,7 +59,6 @@ static int __dump_glock(struct seq_file *seq, const struct gfs2_glock *gl);
 #define GLOCK_BUG_ON(gl,x) do { if (unlikely(x)) { __dump_glock(NULL, gl); BUG(); } } while(0)
 static void do_xmote(struct gfs2_glock *gl, struct gfs2_holder *gh, unsigned int target);
 
-static DECLARE_RWSEM(gfs2_umount_flush_sem);
 static struct dentry *gfs2_root;
 static struct workqueue_struct *glock_workqueue;
 struct workqueue_struct *gfs2_delete_workqueue;
@@ -714,7 +712,6 @@ static void glock_work_func(struct work_struct *work)
 		finish_xmote(gl, gl->gl_reply);
 		drop_ref = 1;
 	}
-	down_read(&gfs2_umount_flush_sem);
 	spin_lock(&gl->gl_spin);
 	if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
 	    gl->gl_state != LM_ST_UNLOCKED &&
@@ -727,7 +724,6 @@ static void glock_work_func(struct work_struct *work)
 	}
 	run_queue(gl, 0);
 	spin_unlock(&gl->gl_spin);
-	up_read(&gfs2_umount_flush_sem);
 	if (!delay ||
 	    queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
 		gfs2_glock_put(gl);
@@ -1511,35 +1507,12 @@ void gfs2_glock_thaw(struct gfs2_sbd *sdp)
 
 void gfs2_gl_hash_clear(struct gfs2_sbd *sdp)
 {
-	unsigned long t;
 	unsigned int x;
-	int cont;
 
-	t = jiffies;
-
-	for (;;) {
-		cont = 0;
-		for (x = 0; x < GFS2_GL_HASH_SIZE; x++) {
-			if (examine_bucket(clear_glock, sdp, x))
-				cont = 1;
-		}
-
-		if (!cont)
-			break;
-
-		if (time_after_eq(jiffies,
-				  t + gfs2_tune_get(sdp, gt_stall_secs) * HZ)) {
-			fs_warn(sdp, "Unmount seems to be stalled. "
-				     "Dumping lock state...\n");
-			gfs2_dump_lockstate(sdp);
-			t = jiffies;
-		}
-
-		down_write(&gfs2_umount_flush_sem);
-		invalidate_inodes(sdp->sd_vfs);
-		up_write(&gfs2_umount_flush_sem);
-		msleep(10);
-	}
+	for (x = 0; x < GFS2_GL_HASH_SIZE; x++)
+		examine_bucket(clear_glock, sdp, x);
+	flush_workqueue(glock_workqueue);
+	gfs2_dump_lockstate(sdp);
 }
 
 void gfs2_glock_finish_truncate(struct gfs2_inode *ip)
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index edfee24..717222a 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1231,10 +1231,9 @@ fail_sb:
 fail_locking:
 	init_locking(sdp, &mount_gh, UNDO);
 fail_lm:
+	invalidate_inodes(sb);
 	gfs2_gl_hash_clear(sdp);
 	gfs2_lm_unmount(sdp);
-	while (invalidate_inodes(sb))
-		yield();
 fail_sys:
 	gfs2_sys_fs_del(sdp);
 fail:
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 8ddc613..c008b08 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -858,6 +858,7 @@ restart:
 	gfs2_clear_rgrpd(sdp);
 	gfs2_jindex_free(sdp);
 	/*  Take apart glock structures and buffer lists  */
+	invalidate_inodes(sdp->sd_vfs);
 	gfs2_gl_hash_clear(sdp);
 	/*  Unmount the locking protocol  */
 	gfs2_lm_unmount(sdp);
-- 
1.6.2.5





[Cluster-devel] GFS2: Fix locking bug in rename

2009-12-09 Thread Steven Whitehouse
From 07bb4585daae6008fd3ad0f3f081e318a4266d1d Mon Sep 17 00:00:00 2001
From: Steven Whitehouse swhit...@redhat.com
Date: Wed, 9 Dec 2009 13:55:12 +
Subject: GFS2: Fix locking bug in rename

The rename code was taking a resource group lock in cases where
it wasn't actually needed. This caused problems if the rename
resulted in an inode being unlinked. The patch ensures that
we only take the rgrp lock early if it is really needed.
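
The decision the patch changes can be sketched in isolation. This is a toy
model, not the kernel function: toy_rename_alloc_required and the callback
type are hypothetical, with the callback standing in for
gfs2_diradd_alloc_required() (negative for an error, 0 for no allocation,
positive when a new directory block is needed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the "does adding an entry need a block?" check. */
typedef int (*toy_diradd_check_fn)(void *ctx);

/* Example check used for illustration: counts calls and reports "yes". */
static int toy_check_needs_block(void *ctx)
{
	int *calls = ctx;

	(*calls)++;
	return 1;
}

/*
 * Only ask whether an allocation is required when the rename target does
 * not already exist; replacing an existing entry adds nothing new.
 */
static int toy_rename_alloc_required(bool target_exists,
				     toy_diradd_check_fn check, void *ctx)
{
	int alloc_required = 0;

	assert(check != NULL);
	if (!target_exists)
		alloc_required = check(ctx);
	return alloc_required;
}
```

When the target already exists the check is skipped and the result stays at
0, so no resource group lock would be taken up front on that path.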

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/ops_inode.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c
index 247436c..78f73ca 100644
--- a/fs/gfs2/ops_inode.c
+++ b/fs/gfs2/ops_inode.c
@@ -748,7 +748,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 	struct gfs2_rgrpd *nrgd;
 	unsigned int num_gh;
 	int dir_rename = 0;
-	int alloc_required;
+	int alloc_required = 0;
 	unsigned int x;
 	int error;
 
@@ -867,7 +867,9 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 			goto out_gunlock;
 	}
 
-	alloc_required = error = gfs2_diradd_alloc_required(ndir, &ndentry->d_name);
+	if (nip == NULL)
+		alloc_required = gfs2_diradd_alloc_required(ndir, &ndentry->d_name);
+	error = alloc_required;
 	if (error < 0)
 		goto out_gunlock;
 	error = 0;
-- 
1.6.2.5





[Cluster-devel] GFS2: Fix gfs2_xattr_acl_chmod()

2009-12-21 Thread Steven Whitehouse
From a49cd198c9ed316255acc25a937ea147d03bccaa Mon Sep 17 00:00:00 2001
From: Steven Whitehouse swhit...@redhat.com
Date: Mon, 21 Dec 2009 13:55:28 +
Subject: GFS2: Fix gfs2_xattr_acl_chmod()

The ref counting for the bh returned by gfs2_ea_find() was
wrong. This patch ensures that we always drop the ref count
to that bh correctly.
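
The shape of the fix, releasing the buffer on one common path instead of
returning early while still holding it, can be sketched with a toy
reference-counted buffer. The toy_* names are hypothetical and the failure
is simulated with a flag; this is not the GFS2 code itself.

```c
#include <assert.h>
#include <stddef.h>

struct toy_buf { int refcount; char data[8]; };

/* Take a reference, as a lookup that returns a held buffer would. */
static struct toy_buf *toy_get(struct toy_buf *b)
{
	assert(b != NULL);
	b->refcount++;
	return b;
}

/* Drop a reference; plays the role of brelse() in this model. */
static void toy_put(struct toy_buf *b)
{
	assert(b != NULL && b->refcount > 0);
	b->refcount--;
}

/*
 * Update the buffer. `fail_begin` simulates the "begin transaction" step
 * failing. Returns 0 on success, -1 on failure.
 */
static int toy_update(struct toy_buf *b, char value, int fail_begin)
{
	struct toy_buf *bh = toy_get(b);
	int error = fail_begin ? -1 : 0;

	if (error == 0)
		bh->data[0] = value;

	toy_put(bh);	/* reached on both the success and the error path */
	return error;
}
```

Because the release sits after the branch rather than inside it, the
reference count returns to its starting value whichever way the update goes.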

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/xattr.c |   21 +++--
 1 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/gfs2/xattr.c b/fs/gfs2/xattr.c
index 8a04108..c2ebdf2 100644
--- a/fs/gfs2/xattr.c
+++ b/fs/gfs2/xattr.c
@@ -1296,6 +1296,7 @@ fail:
 
 int gfs2_xattr_acl_chmod(struct gfs2_inode *ip, struct iattr *attr, char *data)
 {
+	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 	struct gfs2_ea_location el;
 	struct buffer_head *dibh;
 	int error;
@@ -1305,16 +1306,17 @@ int gfs2_xattr_acl_chmod(struct gfs2_inode *ip, struct iattr *attr, char *data)
 		return error;
 
 	if (GFS2_EA_IS_STUFFED(el.el_ea)) {
-		error = gfs2_trans_begin(GFS2_SB(&ip->i_inode), RES_DINODE + RES_EATTR, 0);
-		if (error)
-			return error;
-
-		gfs2_trans_add_bh(ip->i_gl, el.el_bh, 1);
-		memcpy(GFS2_EA2DATA(el.el_ea), data,
-		       GFS2_EA_DATA_LEN(el.el_ea));
-	} else
+		error = gfs2_trans_begin(sdp, RES_DINODE + RES_EATTR, 0);
+		if (error == 0) {
+			gfs2_trans_add_bh(ip->i_gl, el.el_bh, 1);
+			memcpy(GFS2_EA2DATA(el.el_ea), data,
+			       GFS2_EA_DATA_LEN(el.el_ea));
+		}
+	} else {
 		error = ea_acl_chmod_unstuffed(ip, el.el_ea, data);
+	}
 
+	brelse(el.el_bh);
 	if (error)
 		return error;
 
@@ -1327,8 +1329,7 @@ int gfs2_xattr_acl_chmod(struct gfs2_inode *ip, struct iattr *attr, char *data)
 		brelse(dibh);
 	}
 
-	gfs2_trans_end(GFS2_SB(&ip->i_inode));
-
+	gfs2_trans_end(sdp);
 	return error;
 }
 
-- 
1.6.2.5





[Cluster-devel] git trees

2010-01-08 Thread Steven Whitehouse
Hi,

After some delay due to a couple of tricky issues, I'm now back updating
the GFS2 git trees again. I will probably send a pull request for the
fixes tree fairly shortly now, I'm just giving it a day or two in -next
first.

At the moment -fixes and -nmw are identical, but I will start pushing
more patches into -nmw shortly too,

Steve.




[Cluster-devel] GFS2: Use MAX_LFS_FILESIZE for meta inode size

2010-01-08 Thread Steven Whitehouse
From 2a6833f27a0ed34ae169dc61961552c414263770 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse swhit...@redhat.com
Date: Fri, 8 Jan 2010 13:44:49 +
Subject: GFS2: Use MAX_LFS_FILESIZE for meta inode size

Using ~0ULL was causing sign issues in filemap_fdatawrite_range, so
use MAX_LFS_FILESIZE instead.
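
A small user-space sketch shows the general kind of sign problem involved.
It assumes the size field is a signed 64-bit type (modelled as int64_t) and
that the largest valid size is the largest positive value of that type; the
toy_* names and the range check are illustrative only and are not taken
from filemap_fdatawrite_range.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the file size type is signed 64-bit, modelled as int64_t. */
typedef int64_t toy_loff_t;

/* Assumed stand-in for MAX_LFS_FILESIZE: the largest positive toy_loff_t. */
#define TOY_MAX_LFS_FILESIZE ((toy_loff_t)INT64_MAX)

/*
 * Store an unsigned 64-bit value into the signed size field. The result of
 * an out-of-range conversion is implementation-defined in C; on the usual
 * two's complement compilers all-ones becomes -1.
 */
static toy_loff_t toy_store_size(uint64_t v)
{
	return (toy_loff_t)v;
}

/* A range check of the sort that misbehaves once the end is negative. */
static int toy_range_is_valid(toy_loff_t start, toy_loff_t end)
{
	assert(start >= 0);
	return end >= start;
}
```

Storing ~0ULL yields a negative "size", so any signed comparison against it
treats the range as empty or invalid, whereas the maximum positive value
keeps the intended "covers everything" meaning.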

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/meta_io.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index cb8d7a9..6f68a5f 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -121,7 +121,7 @@ struct inode *gfs2_aspace_get(struct gfs2_sbd *sdp)
 	if (aspace) {
 		mapping_set_gfp_mask(aspace->i_mapping, GFP_NOFS);
 		aspace->i_mapping->a_ops = &aspace_aops;
-		aspace->i_size = ~0ULL;
+		aspace->i_size = MAX_LFS_FILESIZE;
 		ip = GFS2_I(aspace);
 		clear_bit(GIF_USER, &ip->i_flags);
 		insert_inode_hash(aspace);
-- 
1.6.2.5





[Cluster-devel] GFS2: Metadata address space clean up

2010-01-08 Thread Steven Whitehouse
From 89bf4bea39ab65e0aa608cf5927d4d9b9e189c19 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse swhit...@redhat.com
Date: Tue, 8 Dec 2009 12:12:13 +
Subject: GFS2: Metadata address space clean up

Since the start of GFS2, an extra inode has been used to store
the metadata belonging to each inode. The only reason for using
this inode was to have an extra address space, the other fields
were unused. This means that the memory usage was rather inefficient.

The reason for keeping each inode's metadata in a separate address
space is that when glocks are requested on remote nodes, we need to
be able to efficiently locate the data and metadata which relate
to that glock (inode) in order to sync or sync and invalidate it
(depending on the remotely requested lock mode).

This patch adds a new type of glock which, in addition to
its normal fields, has an address space. This applies to all
inode and rgrp glocks (but to no other glock types, which remain
as before). As a result, we no longer need to have the second
inode.

This results in three major improvements:
 1. A saving of approx 25% of memory used in caching inodes
 2. A removal of the circular dependency between inodes and glocks
 3. No confusion between normal and metadata inodes in super.c

Although the first of these is the most immediately apparent, the
second is just as important as it now enables a number of clean
ups at umount time. Those will be the subject of future patches.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/aops.c |4 ++--
 fs/gfs2/glock.c|   40 +---
 fs/gfs2/glock.h|7 +++
 fs/gfs2/glops.c|   16 +---
 fs/gfs2/incore.h   |4 ++--
 fs/gfs2/inode.c|6 ++
 fs/gfs2/lock_dlm.c |5 -
 fs/gfs2/main.c |   28 
 fs/gfs2/meta_io.c  |   46 ++
 fs/gfs2/meta_io.h  |   12 ++--
 fs/gfs2/super.c|   26 --
 fs/gfs2/util.c |1 +
 fs/gfs2/util.h |1 +
 13 files changed, 101 insertions(+), 95 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 7b8da94..0c1d0b8 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -1061,8 +1061,8 @@ out:
 
 int gfs2_releasepage(struct page *page, gfp_t gfp_mask)
 {
-	struct inode *aspace = page->mapping->host;
-	struct gfs2_sbd *sdp = aspace->i_sb->s_fs_info;
+	struct address_space *mapping = page->mapping;
+	struct gfs2_sbd *sdp = gfs2_mapping2sbd(mapping);
 	struct buffer_head *bh, *head;
 	struct gfs2_bufdata *bd;
 
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index f455a03..736d05b 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -154,12 +154,14 @@ static unsigned int gl_hash(const struct gfs2_sbd *sdp,
 static void glock_free(struct gfs2_glock *gl)
 {
 	struct gfs2_sbd *sdp = gl->gl_sbd;
-	struct inode *aspace = gl->gl_aspace;
+	struct address_space *mapping = gfs2_glock2aspace(gl);
+	struct kmem_cache *cachep = gfs2_glock_cachep;
 
-	if (aspace)
-		gfs2_aspace_put(aspace);
+	GLOCK_BUG_ON(gl, mapping && mapping->nrpages);
 	trace_gfs2_glock_put(gl);
-	sdp->sd_lockstruct.ls_ops->lm_put_lock(gfs2_glock_cachep, gl);
+	if (mapping)
+		cachep = gfs2_glock_aspace_cachep;
+	sdp->sd_lockstruct.ls_ops->lm_put_lock(cachep, gl);
 }
 
 /**
@@ -750,10 +752,11 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
 		   const struct gfs2_glock_operations *glops, int create,
 		   struct gfs2_glock **glp)
 {
+	struct super_block *s = sdp->sd_vfs;
 	struct lm_lockname name = { .ln_number = number, .ln_type = glops->go_type };
 	struct gfs2_glock *gl, *tmp;
 	unsigned int hash = gl_hash(sdp, &name);
-	int error;
+	struct address_space *mapping;
 
 	read_lock(gl_lock_addr(hash));
 	gl = search_bucket(hash, sdp, &name);
@@ -765,7 +768,10 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
 	if (!create)
 		return -ENOENT;
 
-	gl = kmem_cache_alloc(gfs2_glock_cachep, GFP_KERNEL);
+	if (glops->go_flags & GLOF_ASPACE)
+		gl = kmem_cache_alloc(gfs2_glock_aspace_cachep, GFP_KERNEL);
+	else
+		gl = kmem_cache_alloc(gfs2_glock_cachep, GFP_KERNEL);
 	if (!gl)
 		return -ENOMEM;
 
@@ -783,18 +789,18 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
 	gl->gl_tchange = jiffies;
 	gl->gl_object = NULL;
 	gl->gl_sbd = sdp;
-	gl->gl_aspace = NULL;
 	INIT_DELAYED_WORK(&gl->gl_work, glock_work_func);
 	INIT_WORK(&gl->gl_delete, delete_work_func);
 
-	/* If this glock protects actual on-disk data or metadata blocks,
-	   create a VFS inode to manage the pages/buffers holding them. */
-	if (glops == &gfs2_inode_glops || glops == &gfs2_rgrp_glops) {
-   gl-gl_aspace

[Cluster-devel] GFS2: Remove loopy umount code

2010-01-08 Thread Steven Whitehouse
From 086332de5db343f8029d4436725090c42fcac7c7 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse swhit...@redhat.com
Date: Fri, 8 Jan 2010 16:14:29 +
Subject: GFS2: Remove loopy umount code

As a consequence of the previous patch, we can now remove the
loop which used to be required due to the circular dependency
between the inodes and glocks. Instead we can just invalidate
the inodes, and then clear up any glocks which are left.

Also we no longer need the rwsem since there is no longer any
danger of the inode invalidation calling back into the glock
code (and from there back into the inode code).

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/glock.c  |   35 ---
 fs/gfs2/incore.h |1 -
 fs/gfs2/ops_fstype.c |4 +---
 fs/gfs2/super.c  |1 +
 fs/gfs2/sys.c|2 --
 5 files changed, 6 insertions(+), 37 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 736d05b..6e1e526 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -19,7 +19,6 @@
 #include <linux/list.h>
 #include <linux/wait.h>
 #include <linux/module.h>
-#include <linux/rwsem.h>
 #include <asm/uaccess.h>
 #include <linux/seq_file.h>
 #include <linux/debugfs.h>
@@ -60,7 +59,6 @@ static int __dump_glock(struct seq_file *seq, const struct gfs2_glock *gl);
 #define GLOCK_BUG_ON(gl,x) do { if (unlikely(x)) { __dump_glock(NULL, gl); BUG(); } } while(0)
 static void do_xmote(struct gfs2_glock *gl, struct gfs2_holder *gh, unsigned int target);
 
-static DECLARE_RWSEM(gfs2_umount_flush_sem);
 static struct dentry *gfs2_root;
 static struct workqueue_struct *glock_workqueue;
 struct workqueue_struct *gfs2_delete_workqueue;
@@ -714,7 +712,6 @@ static void glock_work_func(struct work_struct *work)
 		finish_xmote(gl, gl->gl_reply);
 		drop_ref = 1;
 	}
-	down_read(&gfs2_umount_flush_sem);
 	spin_lock(&gl->gl_spin);
 	if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
 	    gl->gl_state != LM_ST_UNLOCKED &&
@@ -727,7 +724,6 @@ static void glock_work_func(struct work_struct *work)
 	}
 	run_queue(gl, 0);
 	spin_unlock(&gl->gl_spin);
-	up_read(&gfs2_umount_flush_sem);
 	if (!delay ||
 	    queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
 		gfs2_glock_put(gl);
@@ -1511,35 +1507,12 @@ void gfs2_glock_thaw(struct gfs2_sbd *sdp)
 
 void gfs2_gl_hash_clear(struct gfs2_sbd *sdp)
 {
-	unsigned long t;
 	unsigned int x;
-	int cont;
 
-	t = jiffies;
-
-	for (;;) {
-		cont = 0;
-		for (x = 0; x < GFS2_GL_HASH_SIZE; x++) {
-			if (examine_bucket(clear_glock, sdp, x))
-				cont = 1;
-		}
-
-		if (!cont)
-			break;
-
-		if (time_after_eq(jiffies,
-				  t + gfs2_tune_get(sdp, gt_stall_secs) * HZ)) {
-			fs_warn(sdp, "Unmount seems to be stalled. "
-				     "Dumping lock state...\n");
-			gfs2_dump_lockstate(sdp);
-			t = jiffies;
-		}
-
-		down_write(&gfs2_umount_flush_sem);
-		invalidate_inodes(sdp->sd_vfs);
-		up_write(&gfs2_umount_flush_sem);
-		msleep(10);
-	}
+	for (x = 0; x < GFS2_GL_HASH_SIZE; x++)
+		examine_bucket(clear_glock, sdp, x);
+	flush_workqueue(glock_workqueue);
+	gfs2_dump_lockstate(sdp);
 }
 
 void gfs2_glock_finish_truncate(struct gfs2_inode *ip)
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 0f0d55a..f93f9b9 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -451,7 +451,6 @@ struct gfs2_tune {
unsigned int gt_quota_quantum; /* Secs between syncs to quota file */
unsigned int gt_new_files_jdata;
unsigned int gt_max_readahead; /* Max bytes to read-ahead from disk */
-   unsigned int gt_stall_secs; /* Detects trouble! */
unsigned int gt_complain_secs;
unsigned int gt_statfs_quantum;
unsigned int gt_statfs_slow;
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index edfee24..968a99f 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -65,7 +65,6 @@ static void gfs2_tune_init(struct gfs2_tune *gt)
 	gt->gt_quota_scale_den = 1;
 	gt->gt_new_files_jdata = 0;
 	gt->gt_max_readahead = 1 << 18;
-	gt->gt_stall_secs = 600;
 	gt->gt_complain_secs = 10;
 }
 
@@ -1231,10 +1230,9 @@ fail_sb:
 fail_locking:
 	init_locking(sdp, &mount_gh, UNDO);
 fail_lm:
+	invalidate_inodes(sb);
 	gfs2_gl_hash_clear(sdp);
 	gfs2_lm_unmount(sdp);
-	while (invalidate_inodes(sb))
-		yield();
 fail_sys:
 	gfs2_sys_fs_del(sdp);
 fail:
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 8ddc613..c008b08 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -858,6 +858,7

[Cluster-devel] [PATCH 4/4] GFS2: Use MAX_LFS_FILESIZE for meta inode size

2010-01-11 Thread Steven Whitehouse
Using ~0ULL was causing sign issues in filemap_fdatawrite_range, so
use MAX_LFS_FILESIZE instead.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/meta_io.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index cb8d7a9..6f68a5f 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -121,7 +121,7 @@ struct inode *gfs2_aspace_get(struct gfs2_sbd *sdp)
 	if (aspace) {
 		mapping_set_gfp_mask(aspace->i_mapping, GFP_NOFS);
 		aspace->i_mapping->a_ops = &aspace_aops;
-		aspace->i_size = ~0ULL;
+		aspace->i_size = MAX_LFS_FILESIZE;
 		ip = GFS2_I(aspace);
 		clear_bit(GIF_USER, &ip->i_flags);
 		insert_inode_hash(aspace);
-- 
1.6.2.5



[Cluster-devel] [PATCH 3/4] GFS2: Fix gfs2_xattr_acl_chmod()

2010-01-11 Thread Steven Whitehouse
The ref counting for the bh returned by gfs2_ea_find() was
wrong. This patch ensures that we always drop the ref count
to that bh correctly.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/xattr.c |   21 +++--
 1 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/gfs2/xattr.c b/fs/gfs2/xattr.c
index 8a04108..c2ebdf2 100644
--- a/fs/gfs2/xattr.c
+++ b/fs/gfs2/xattr.c
@@ -1296,6 +1296,7 @@ fail:
 
 int gfs2_xattr_acl_chmod(struct gfs2_inode *ip, struct iattr *attr, char *data)
 {
+	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 	struct gfs2_ea_location el;
 	struct buffer_head *dibh;
 	int error;
@@ -1305,16 +1306,17 @@ int gfs2_xattr_acl_chmod(struct gfs2_inode *ip, struct iattr *attr, char *data)
 		return error;
 
 	if (GFS2_EA_IS_STUFFED(el.el_ea)) {
-		error = gfs2_trans_begin(GFS2_SB(&ip->i_inode), RES_DINODE + RES_EATTR, 0);
-		if (error)
-			return error;
-
-		gfs2_trans_add_bh(ip->i_gl, el.el_bh, 1);
-		memcpy(GFS2_EA2DATA(el.el_ea), data,
-		       GFS2_EA_DATA_LEN(el.el_ea));
-	} else
+		error = gfs2_trans_begin(sdp, RES_DINODE + RES_EATTR, 0);
+		if (error == 0) {
+			gfs2_trans_add_bh(ip->i_gl, el.el_bh, 1);
+			memcpy(GFS2_EA2DATA(el.el_ea), data,
+			       GFS2_EA_DATA_LEN(el.el_ea));
+		}
+	} else {
 		error = ea_acl_chmod_unstuffed(ip, el.el_ea, data);
+	}
 
+	brelse(el.el_bh);
 	if (error)
 		return error;
 
@@ -1327,8 +1329,7 @@ int gfs2_xattr_acl_chmod(struct gfs2_inode *ip, struct iattr *attr, char *data)
 		brelse(dibh);
 	}
 
-	gfs2_trans_end(GFS2_SB(&ip->i_inode));
-
+	gfs2_trans_end(sdp);
 	return error;
 }
 
-- 
1.6.2.5



[Cluster-devel] [PATCH 1/4] GFS2: Ensure uptodate inode size when using O_APPEND

2010-01-11 Thread Steven Whitehouse
The VFS reads the inode size during generic_file_aio_write() but
with no locking around it. In order to get the expected result
from O_APPEND opens, this patch updates the inode size before
calling generic_file_aio_write().

There is of course still a race here, in that there is nothing to
prevent another node coming in and extending the file in the
meantime. On the other hand, when used with file locking this
will ensure that the expected results are obtained.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/file.c |   38 --
 1 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 4eb308a..a6abbae 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -569,6 +569,40 @@ static int gfs2_fsync(struct file *file, struct dentry *dentry, int datasync)
 	return ret;
 }
 
+/**
+ * gfs2_file_aio_write - Perform a write to a file
+ * @iocb: The io context
+ * @iov: The data to write
+ * @nr_segs: Number of @iov segments
+ * @pos: The file position
+ *
+ * We have to do a lock/unlock here to refresh the inode size for
+ * O_APPEND writes, otherwise we can land up writing at the wrong
+ * offset. There is still a race, but provided the app is using its
+ * own file locking, this will make O_APPEND work as expected.
+ *
+ */
+
+static ssize_t gfs2_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
+				   unsigned long nr_segs, loff_t pos)
+{
+	struct file *file = iocb->ki_filp;
+
+	if (file->f_flags & O_APPEND) {
+		struct dentry *dentry = file->f_dentry;
+		struct gfs2_inode *ip = GFS2_I(dentry->d_inode);
+		struct gfs2_holder gh;
+		int ret;
+
+		ret = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, 0, &gh);
+		if (ret)
+			return ret;
+		gfs2_glock_dq_uninit(&gh);
+	}
+
+	return generic_file_aio_write(iocb, iov, nr_segs, pos);
+}
+
 #ifdef CONFIG_GFS2_FS_LOCKING_DLM
 
 /**
@@ -711,7 +745,7 @@ const struct file_operations gfs2_file_fops = {
.read   = do_sync_read,
.aio_read   = generic_file_aio_read,
.write  = do_sync_write,
-   .aio_write  = generic_file_aio_write,
+   .aio_write  = gfs2_file_aio_write,
.unlocked_ioctl = gfs2_ioctl,
.mmap   = gfs2_mmap,
.open   = gfs2_open,
@@ -741,7 +775,7 @@ const struct file_operations gfs2_file_fops_nolock = {
.read   = do_sync_read,
.aio_read   = generic_file_aio_read,
.write  = do_sync_write,
-   .aio_write  = generic_file_aio_write,
+   .aio_write  = gfs2_file_aio_write,
.unlocked_ioctl = gfs2_ioctl,
.mmap   = gfs2_mmap,
.open   = gfs2_open,
-- 
1.6.2.5



[Cluster-devel] [PATCH 2/4] GFS2: Fix locking bug in rename

2010-01-11 Thread Steven Whitehouse
The rename code was taking a resource group lock in cases where
it wasn't actually needed. This caused problems if the rename
resulted in an inode being unlinked. The patch ensures that
we only take the rgrp lock early if it is really needed.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/ops_inode.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c
index 247436c..78f73ca 100644
--- a/fs/gfs2/ops_inode.c
+++ b/fs/gfs2/ops_inode.c
@@ -748,7 +748,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 	struct gfs2_rgrpd *nrgd;
 	unsigned int num_gh;
 	int dir_rename = 0;
-	int alloc_required;
+	int alloc_required = 0;
 	unsigned int x;
 	int error;
 
@@ -867,7 +867,9 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 			goto out_gunlock;
 	}
 
-	alloc_required = error = gfs2_diradd_alloc_required(ndir, &ndentry->d_name);
+	if (nip == NULL)
+		alloc_required = gfs2_diradd_alloc_required(ndir, &ndentry->d_name);
+	error = alloc_required;
 	if (error < 0)
 		goto out_gunlock;
 	error = 0;
-- 
1.6.2.5



[Cluster-devel] GFS2: Pre-pull patch posting (fixes)

2010-01-11 Thread Steven Whitehouse
Here are four small fixes for GFS2. Assuming that nobody spots
any errors, I'll be sending a pull request for these shortly,

Steve.



[Cluster-devel] GFS2: Pull request (fixes)

2010-01-11 Thread Steven Whitehouse
Hi,

Please consider pulling the following GFS2 bug fixes,

Steve.


The following changes since commit 74d2e4f8d79ae0c4b6ec027958d5b18058662eea:
  Linus Torvalds (1):
Linux 2.6.33-rc3

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes.git master

Steven Whitehouse (4):
  GFS2: Ensure uptodate inode size when using O_APPEND
  GFS2: Fix locking bug in rename
  GFS2: Fix gfs2_xattr_acl_chmod()
  GFS2: Use MAX_LFS_FILESIZE for meta inode size

 fs/gfs2/file.c  |   38 --
 fs/gfs2/meta_io.c   |2 +-
 fs/gfs2/ops_inode.c |6 --
 fs/gfs2/xattr.c |   21 +++--
 4 files changed, 52 insertions(+), 15 deletions(-)




[Cluster-devel] GFS2 git trees

2010-01-12 Thread Steven Whitehouse
Hi,

Linus pulled the -fixes tree last night, so I'm just about to rebase
both GFS2 git trees to the new upstream kernel,

Steve.




Re: [Cluster-devel] [PATCH] gfs2: Fix refcnt leak on gfs2_follow_link() error path

2010-01-12 Thread Steven Whitehouse
Hi,

Thanks for the patch. I've pushed it into the GFS2 -fixes tree,

Steve.

On Tue, 2010-01-12 at 03:36 +0900, OGAWA Hirofumi wrote:
 If the ->follow_link handler returns an error, it should decrement
 the nd->path refcount.
 
 This patch fixes it.
 
 Signed-off-by: OGAWA Hirofumi hirof...@mail.parknet.co.jp
 ---
 
  fs/gfs2/ops_inode.c |3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff -puN fs/gfs2/ops_inode.c~namei-gfs2-follow_link-fix fs/gfs2/ops_inode.c
 --- linux-2.6/fs/gfs2/ops_inode.c~namei-gfs2-follow_link-fix  2010-01-12 
 00:15:12.0 +0900
 +++ linux-2.6-hirofumi/fs/gfs2/ops_inode.c2010-01-12 00:15:12.0 
 +0900
 @@ -1086,7 +1086,8 @@ static void *gfs2_follow_link(struct den
   error = vfs_follow_link(nd, buf);
   if (buf != array)
   kfree(buf);
 - }
 + } else
 + path_put(&nd->path);
  
   return ERR_PTR(error);
  }
 _
 




[Cluster-devel] GFS2: Wait for unlock completion on umount

2010-01-14 Thread Steven Whitehouse

This patch adds a wait on umount between the point at which we
dispose of all glocks and the point at which we unmount the
lock protocol. This ensures that we've received all the replies
to our unlock requests before we stop the locking.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
Reported-by: Fabio M. Di Nitto fdini...@redhat.com

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index f93f9b9..b8025e5 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -543,6 +543,8 @@ struct gfs2_sbd {
struct gfs2_holder sd_live_gh;
struct gfs2_glock *sd_rename_gl;
struct gfs2_glock *sd_trans_gl;
+   wait_queue_head_t sd_glock_wait;
+   atomic_t sd_glock_disposal;
 
/* Inode Stuff */
 
diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
index 094839e..484411c 100644
--- a/fs/gfs2/lock_dlm.c
+++ b/fs/gfs2/lock_dlm.c
@@ -21,6 +21,7 @@ static void gdlm_ast(void *arg)
 {
struct gfs2_glock *gl = arg;
unsigned ret = gl->gl_state;
+   struct gfs2_sbd *sdp = gl->gl_sbd;
 
BUG_ON(gl->gl_lksb.sb_flags & DLM_SBF_DEMOTED);
 
@@ -33,6 +34,8 @@ static void gdlm_ast(void *arg)
kmem_cache_free(gfs2_glock_aspace_cachep, gl);
else
kmem_cache_free(gfs2_glock_cachep, gl);
+   if (atomic_dec_and_test(&sdp->sd_glock_disposal))
+   wake_up(&sdp->sd_glock_wait);
return;
case -DLM_ECANCEL: /* Cancel while getting lock */
ret |= LM_OUT_CANCELED;
@@ -170,7 +173,8 @@ static unsigned int gdlm_lock(struct gfs2_glock *gl,
 static void gdlm_put_lock(struct kmem_cache *cachep, void *ptr)
 {
struct gfs2_glock *gl = ptr;
-   struct lm_lockstruct *ls = &gl->gl_sbd->sd_lockstruct;
+   struct gfs2_sbd *sdp = gl->gl_sbd;
+   struct lm_lockstruct *ls = &sdp->sd_lockstruct;
int error;
 
if (gl->gl_lksb.sb_lkid == 0) {
@@ -186,6 +190,7 @@ static void gdlm_put_lock(struct kmem_cache *cachep, void *ptr)
   (unsigned long long)gl->gl_name.ln_number, error);
return;
}
+   atomic_inc(&sdp->sd_glock_disposal);
 }
 
 static void gdlm_cancel(struct gfs2_glock *gl)
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 968a99f..9baa566 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -81,6 +81,8 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb)
 
gfs2_tune_init(&sdp->sd_tune);
 
+   init_waitqueue_head(&sdp->sd_glock_wait);
+   atomic_set(&sdp->sd_glock_disposal, 0);
spin_lock_init(&sdp->sd_statfs_spin);
 
spin_lock_init(&sdp->sd_rindex_spin);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index c008b08..e2bf19f 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -21,6 +21,7 @@
 #include <linux/gfs2_ondisk.h>
 #include <linux/crc32.h>
 #include <linux/time.h>
+#include <linux/wait.h>
 
 #include "gfs2.h"
 #include "incore.h"
@@ -860,6 +861,8 @@ restart:
/*  Take apart glock structures and buffer lists  */
invalidate_inodes(sdp->sd_vfs);
gfs2_gl_hash_clear(sdp);
+   /* Wait for dlm to reply to all our unlock requests */
+   wait_event(sdp->sd_glock_wait, atomic_read(&sdp->sd_glock_disposal) == 0);
/*  Unmount the locking protocol  */
gfs2_lm_unmount(sdp);
 




[Cluster-devel] GFS2: Fix previous patch

2010-01-29 Thread Steven Whitehouse
This fixes the rgrp patch,

Steve.

From ea0d7284f2f2bd56386e6c4810bf970e50472054 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse swhit...@redhat.com
Date: Fri, 29 Jan 2010 15:20:34 +
Subject: [PATCH 1/3] GFS2: Fix previous patch

The do_div() call needs to remain.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/rgrp.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 6702b82..46534a5 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -591,6 +591,7 @@ static int gfs2_ri_update(struct gfs2_inode *ip)
u64 rgrp_count = ip->i_disksize;
int error;
 
+   do_div(rgrp_count, sizeof(struct gfs2_rindex));
clear_rgrpdi(sdp);
 
file_ra_state_init(&ra_state, inode->i_mapping);
-- 
1.6.2.5





[Cluster-devel] GFS2: Extend umount wait coverage to full glock lifetime

2010-01-29 Thread Steven Whitehouse
From 0f76b65f50e4f17324ba184dd074c35788928ba7 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse swhit...@redhat.com
Date: Fri, 29 Jan 2010 15:21:27 +
Subject: [PATCH 2/3] GFS2: Extend umount wait coverage to full glock lifetime

Although all glocks are, by the time of the umount glock wait,
scheduled for demotion, some of them haven't made it far
enough through the process for the original set of waiting
code to wait for them.

This extends the ref count to the whole glock lifetime in order
to ensure that the waiting does catch all glocks. It does make
it a bit more invasive, but it seems the only sensible solution
at the moment.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/glock.c  |2 ++
 fs/gfs2/glock.h  |2 +-
 fs/gfs2/lock_dlm.c   |6 +++---
 fs/gfs2/ops_fstype.c |   10 +-
 fs/gfs2/super.c  |2 --
 5 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 6e1e526..4773f90 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -771,6 +771,7 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
if (!gl)
return -ENOMEM;
 
+   atomic_inc(&sdp->sd_glock_disposal);
gl->gl_flags = 0;
gl->gl_name = name;
atomic_set(&gl->gl_ref, 1);
@@ -1512,6 +1513,7 @@ void gfs2_gl_hash_clear(struct gfs2_sbd *sdp)
for (x = 0; x < GFS2_GL_HASH_SIZE; x++)
examine_bucket(clear_glock, sdp, x);
flush_workqueue(glock_workqueue);
+   wait_event(sdp->sd_glock_wait, atomic_read(&sdp->sd_glock_disposal) == 0);
gfs2_dump_lockstate(sdp);
 }
 
diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index dac7261..2bda191 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -123,7 +123,7 @@ struct lm_lockops {
int (*lm_mount) (struct gfs2_sbd *sdp, const char *fsname);
void (*lm_unmount) (struct gfs2_sbd *sdp);
void (*lm_withdraw) (struct gfs2_sbd *sdp);
-   void (*lm_put_lock) (struct kmem_cache *cachep, void *gl);
+   void (*lm_put_lock) (struct kmem_cache *cachep, struct gfs2_glock *gl);
unsigned int (*lm_lock) (struct gfs2_glock *gl,
 unsigned int req_state, unsigned int flags);
void (*lm_cancel) (struct gfs2_glock *gl);
diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
index 484411c..569b462 100644
--- a/fs/gfs2/lock_dlm.c
+++ b/fs/gfs2/lock_dlm.c
@@ -170,15 +170,16 @@ static unsigned int gdlm_lock(struct gfs2_glock *gl,
return LM_OUT_ASYNC;
 }
 
-static void gdlm_put_lock(struct kmem_cache *cachep, void *ptr)
+static void gdlm_put_lock(struct kmem_cache *cachep, struct gfs2_glock *gl)
 {
-   struct gfs2_glock *gl = ptr;
struct gfs2_sbd *sdp = gl->gl_sbd;
struct lm_lockstruct *ls = &sdp->sd_lockstruct;
int error;
 
if (gl->gl_lksb.sb_lkid == 0) {
kmem_cache_free(cachep, gl);
+   if (atomic_dec_and_test(&sdp->sd_glock_disposal))
+   wake_up(&sdp->sd_glock_wait);
return;
}
 
@@ -190,7 +191,6 @@ static void gdlm_put_lock(struct kmem_cache *cachep, void *ptr)
   (unsigned long long)gl->gl_name.ln_number, error);
return;
}
-   atomic_inc(&sdp->sd_glock_disposal);
 }
 
 static void gdlm_cancel(struct gfs2_glock *gl)
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 9baa566..d405c38 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -984,9 +984,17 @@ static const match_table_t nolock_tokens = {
{ Opt_err, NULL },
 };
 
+static void nolock_put_lock(struct kmem_cache *cachep, struct gfs2_glock *gl)
+{
+   struct gfs2_sbd *sdp = gl->gl_sbd;
+   kmem_cache_free(cachep, gl);
+   if (atomic_dec_and_test(&sdp->sd_glock_disposal))
+   wake_up(&sdp->sd_glock_wait);
+}
+
 static const struct lm_lockops nolock_ops = {
.lm_proto_name = "lock_nolock",
-   .lm_put_lock = kmem_cache_free,
+   .lm_put_lock = nolock_put_lock,
.lm_tokens = nolock_tokens,
 };
 
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index e2bf19f..e5e2262 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -861,8 +861,6 @@ restart:
/*  Take apart glock structures and buffer lists  */
invalidate_inodes(sdp->sd_vfs);
gfs2_gl_hash_clear(sdp);
-   /* Wait for dlm to reply to all our unlock requests */
-   wait_event(sdp->sd_glock_wait, atomic_read(&sdp->sd_glock_disposal) == 0);
/*  Unmount the locking protocol  */
gfs2_lm_unmount(sdp);
 
-- 
1.6.2.5





[Cluster-devel] GFS2: Use GFP_NOFS for alloc structure

2010-01-29 Thread Steven Whitehouse
From 04988c7ee83641ca732910aff427ab08b0faa557 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse swhit...@redhat.com
Date: Fri, 29 Jan 2010 15:48:57 +
Subject: [PATCH 3/3] GFS2: Use GFP_NOFS for alloc structure

This is called under a glock, so it's a good plan to use GFP_NOFS

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/rgrp.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 46534a5..503b842 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -911,7 +911,7 @@ void gfs2_rgrp_repolish_clones(struct gfs2_rgrpd *rgd)
 struct gfs2_alloc *gfs2_alloc_get(struct gfs2_inode *ip)
 {
BUG_ON(ip->i_alloc != NULL);
-   ip->i_alloc = kzalloc(sizeof(struct gfs2_alloc), GFP_KERNEL);
+   ip->i_alloc = kzalloc(sizeof(struct gfs2_alloc), GFP_NOFS);
return ip->i_alloc;
 }
 
-- 
1.6.2.5





[Cluster-devel] [PATCH 1/4] GFS2: Fix refcnt leak on gfs2_follow_link() error path

2010-02-02 Thread Steven Whitehouse
From: OGAWA Hirofumi hirof...@mail.parknet.co.jp

If the ->follow_link handler returns an error, it should decrement
the nd->path refcount.

This patch fixes it.

Signed-off-by: OGAWA Hirofumi hirof...@mail.parknet.co.jp
Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/ops_inode.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c
index 78f73ca..84350e1 100644
--- a/fs/gfs2/ops_inode.c
+++ b/fs/gfs2/ops_inode.c
@@ -1088,7 +1088,8 @@ static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
error = vfs_follow_link(nd, buf);
if (buf != array)
kfree(buf);
-   }
+   } else
+   path_put(&nd->path);
 
return ERR_PTR(error);
 }
-- 
1.6.2.5



[Cluster-devel] [PATCH 4/4] GFS2: Use GFP_NOFS for alloc structure

2010-02-02 Thread Steven Whitehouse
This is called under a glock, so it's a good plan to use GFP_NOFS

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/rgrp.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 46534a5..503b842 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -911,7 +911,7 @@ void gfs2_rgrp_repolish_clones(struct gfs2_rgrpd *rgd)
 struct gfs2_alloc *gfs2_alloc_get(struct gfs2_inode *ip)
 {
BUG_ON(ip->i_alloc != NULL);
-   ip->i_alloc = kzalloc(sizeof(struct gfs2_alloc), GFP_KERNEL);
+   ip->i_alloc = kzalloc(sizeof(struct gfs2_alloc), GFP_NOFS);
return ip->i_alloc;
 }
 
-- 
1.6.2.5



[Cluster-devel] [PATCH 2/4] GFS2: Don't withdraw on partial rindex entries

2010-02-02 Thread Steven Whitehouse
From: Benjamin Marzinski bmarz...@redhat.com

Since gfs2 writes the rindex file a block at a time, and releases the
exclusive lock after each block, it is possible that another process
will grab the lock in the middle of the write.  Since the size of an
rindex entry does not evenly divide the block size, that other process
may see partial entries.  On grows, this is fine.  The process can
simply ignore the partial entries. Previously, the code withdrew when
it saw partial entries. Now it simply ignores them.

Signed-off-by: Benjamin Marzinski bmarz...@redhat.com
Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/rgrp.c |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 0608f49..6702b82 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -591,11 +591,6 @@ static int gfs2_ri_update(struct gfs2_inode *ip)
u64 rgrp_count = ip->i_disksize;
int error;
 
-   if (do_div(rgrp_count, sizeof(struct gfs2_rindex))) {
-   gfs2_consist_inode(ip);
-   return -EIO;
-   }
-
clear_rgrpdi(sdp);
 
file_ra_state_init(&ra_state, inode->i_mapping);
-- 
1.6.2.5



[Cluster-devel] [PATCH 3/4] GFS2: Fix previous patch

2010-02-02 Thread Steven Whitehouse
The do_div() call needs to remain.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/rgrp.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 6702b82..46534a5 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -591,6 +591,7 @@ static int gfs2_ri_update(struct gfs2_inode *ip)
u64 rgrp_count = ip->i_disksize;
int error;
 
+   do_div(rgrp_count, sizeof(struct gfs2_rindex));
clear_rgrpdi(sdp);
 
file_ra_state_init(&ra_state, inode->i_mapping);
-- 
1.6.2.5



[Cluster-devel] [GFS2] Pull request (fixes)

2010-02-02 Thread Steven Whitehouse
Hi,

Please consider pulling the following patches,

Steve.


The following changes since commit 066000dd856709b6980123eb39b957fe26993f7b:
  Ananth N Mavinakayanahalli (1):
Revert "x86, apic: Use logical flat on intel with <= 8 logical cpus"

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes.git master

Benjamin Marzinski (1):
  GFS2: Don't withdraw on partial rindex entries

OGAWA Hirofumi (1):
  GFS2: Fix refcnt leak on gfs2_follow_link() error path

Steven Whitehouse (2):
  GFS2: Fix previous patch
  GFS2: Use GFP_NOFS for alloc structure

 fs/gfs2/ops_inode.c |3 ++-
 fs/gfs2/rgrp.c  |8 ++--
 2 files changed, 4 insertions(+), 7 deletions(-)




[Cluster-devel] GFS2: Pre-pull patch posting (fixes)

2010-02-04 Thread Steven Whitehouse
Hi,

Here are a couple of patches which between them fix a problem where
occasionally it was possible for the GFS2 module to be unloaded
before all the glocks were deallocated, which, needless to say, made
the slab allocator unhappy,

Steve.



[Cluster-devel] [PATCH 1/2] GFS2: Wait for unlock completion on umount

2010-02-04 Thread Steven Whitehouse
This patch adds a wait on umount between the point at which we
dispose of all glocks and the point at which we unmount the
lock protocol. This ensures that we've received all the replies
to our unlock requests before we stop the locking.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
Reported-by: Fabio M. Di Nitto fdini...@redhat.com
---
 fs/gfs2/incore.h |2 ++
 fs/gfs2/lock_dlm.c   |7 ++-
 fs/gfs2/ops_fstype.c |2 ++
 fs/gfs2/super.c  |3 +++
 4 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 4792200..bc0ad15 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -544,6 +544,8 @@ struct gfs2_sbd {
struct gfs2_holder sd_live_gh;
struct gfs2_glock *sd_rename_gl;
struct gfs2_glock *sd_trans_gl;
+   wait_queue_head_t sd_glock_wait;
+   atomic_t sd_glock_disposal;
 
/* Inode Stuff */
 
diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
index 46df988..cdd0755 100644
--- a/fs/gfs2/lock_dlm.c
+++ b/fs/gfs2/lock_dlm.c
@@ -21,6 +21,7 @@ static void gdlm_ast(void *arg)
 {
struct gfs2_glock *gl = arg;
unsigned ret = gl->gl_state;
+   struct gfs2_sbd *sdp = gl->gl_sbd;
 
BUG_ON(gl->gl_lksb.sb_flags & DLM_SBF_DEMOTED);
 
@@ -30,6 +31,8 @@ static void gdlm_ast(void *arg)
switch (gl->gl_lksb.sb_status) {
case -DLM_EUNLOCK: /* Unlocked, so glock can be freed */
kmem_cache_free(gfs2_glock_cachep, gl);
+   if (atomic_dec_and_test(&sdp->sd_glock_disposal))
+   wake_up(&sdp->sd_glock_wait);
return;
case -DLM_ECANCEL: /* Cancel while getting lock */
ret |= LM_OUT_CANCELED;
@@ -167,7 +170,8 @@ static unsigned int gdlm_lock(struct gfs2_glock *gl,
 static void gdlm_put_lock(struct kmem_cache *cachep, void *ptr)
 {
struct gfs2_glock *gl = ptr;
-   struct lm_lockstruct *ls = &gl->gl_sbd->sd_lockstruct;
+   struct gfs2_sbd *sdp = gl->gl_sbd;
+   struct lm_lockstruct *ls = &sdp->sd_lockstruct;
int error;
 
if (gl->gl_lksb.sb_lkid == 0) {
@@ -183,6 +187,7 @@ static void gdlm_put_lock(struct kmem_cache *cachep, void *ptr)
   (unsigned long long)gl->gl_name.ln_number, error);
return;
}
+   atomic_inc(&sdp->sd_glock_disposal);
 }
 
 static void gdlm_cancel(struct gfs2_glock *gl)
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index edfee24..9390fc7 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -82,6 +82,8 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb)
 
gfs2_tune_init(&sdp->sd_tune);
 
+   init_waitqueue_head(&sdp->sd_glock_wait);
+   atomic_set(&sdp->sd_glock_disposal, 0);
spin_lock_init(&sdp->sd_statfs_spin);
 
spin_lock_init(&sdp->sd_rindex_spin);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index c282ad4..66242b3 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -21,6 +21,7 @@
 #include <linux/gfs2_ondisk.h>
 #include <linux/crc32.h>
 #include <linux/time.h>
+#include <linux/wait.h>
 
 #include "gfs2.h"
 #include "incore.h"
@@ -860,6 +861,8 @@ restart:
gfs2_jindex_free(sdp);
/*  Take apart glock structures and buffer lists  */
gfs2_gl_hash_clear(sdp);
+   /* Wait for dlm to reply to all our unlock requests */
+   wait_event(sdp->sd_glock_wait, atomic_read(&sdp->sd_glock_disposal) == 0);
/*  Unmount the locking protocol  */
gfs2_lm_unmount(sdp);
 
-- 
1.6.2.5



[Cluster-devel] [PATCH 2/2] GFS2: Extend umount wait coverage to full glock lifetime

2010-02-04 Thread Steven Whitehouse
Although all glocks are, by the time of the umount glock wait,
scheduled for demotion, some of them haven't made it far
enough through the process for the original set of waiting
code to wait for them.

This extends the ref count to the whole glock lifetime in order
to ensure that the waiting does catch all glocks. It does make
it a bit more invasive, but it seems the only sensible solution
at the moment.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/glock.c  |4 
 fs/gfs2/glock.h  |2 +-
 fs/gfs2/lock_dlm.c   |6 +++---
 fs/gfs2/ops_fstype.c |   10 +-
 fs/gfs2/super.c  |2 --
 5 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index f455a03..f426633 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -769,6 +769,7 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
if (!gl)
return -ENOMEM;
 
+   atomic_inc(&sdp->sd_glock_disposal);
gl->gl_flags = 0;
gl->gl_name = name;
atomic_set(&gl->gl_ref, 1);
@@ -1538,6 +1539,9 @@ void gfs2_gl_hash_clear(struct gfs2_sbd *sdp)
up_write(&gfs2_umount_flush_sem);
msleep(10);
}
+   flush_workqueue(glock_workqueue);
+   wait_event(sdp->sd_glock_wait, atomic_read(&sdp->sd_glock_disposal) == 0);
+   gfs2_dump_lockstate(sdp);
 }
 
 void gfs2_glock_finish_truncate(struct gfs2_inode *ip)
diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index 13f0bd2..c0262fa 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -123,7 +123,7 @@ struct lm_lockops {
int (*lm_mount) (struct gfs2_sbd *sdp, const char *fsname);
void (*lm_unmount) (struct gfs2_sbd *sdp);
void (*lm_withdraw) (struct gfs2_sbd *sdp);
-   void (*lm_put_lock) (struct kmem_cache *cachep, void *gl);
+   void (*lm_put_lock) (struct kmem_cache *cachep, struct gfs2_glock *gl);
unsigned int (*lm_lock) (struct gfs2_glock *gl,
 unsigned int req_state, unsigned int flags);
void (*lm_cancel) (struct gfs2_glock *gl);
diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
index cdd0755..0e5e0e7 100644
--- a/fs/gfs2/lock_dlm.c
+++ b/fs/gfs2/lock_dlm.c
@@ -167,15 +167,16 @@ static unsigned int gdlm_lock(struct gfs2_glock *gl,
return LM_OUT_ASYNC;
 }
 
-static void gdlm_put_lock(struct kmem_cache *cachep, void *ptr)
+static void gdlm_put_lock(struct kmem_cache *cachep, struct gfs2_glock *gl)
 {
-   struct gfs2_glock *gl = ptr;
struct gfs2_sbd *sdp = gl->gl_sbd;
struct lm_lockstruct *ls = &sdp->sd_lockstruct;
int error;
 
if (gl->gl_lksb.sb_lkid == 0) {
kmem_cache_free(cachep, gl);
+   if (atomic_dec_and_test(&sdp->sd_glock_disposal))
+   wake_up(&sdp->sd_glock_wait);
return;
}
 
@@ -187,7 +188,6 @@ static void gdlm_put_lock(struct kmem_cache *cachep, void *ptr)
   (unsigned long long)gl->gl_name.ln_number, error);
return;
}
-   atomic_inc(&sdp->sd_glock_disposal);
 }
 
 static void gdlm_cancel(struct gfs2_glock *gl)
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 9390fc7..8a102f7 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -985,9 +985,17 @@ static const match_table_t nolock_tokens = {
{ Opt_err, NULL },
 };
 
+static void nolock_put_lock(struct kmem_cache *cachep, struct gfs2_glock *gl)
+{
+   struct gfs2_sbd *sdp = gl->gl_sbd;
+   kmem_cache_free(cachep, gl);
+   if (atomic_dec_and_test(&sdp->sd_glock_disposal))
+   wake_up(&sdp->sd_glock_wait);
+}
+
 static const struct lm_lockops nolock_ops = {
.lm_proto_name = "lock_nolock",
-   .lm_put_lock = kmem_cache_free,
+   .lm_put_lock = nolock_put_lock,
.lm_tokens = nolock_tokens,
 };
 
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 66242b3..b9dd3da 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -861,8 +861,6 @@ restart:
gfs2_jindex_free(sdp);
/*  Take apart glock structures and buffer lists  */
gfs2_gl_hash_clear(sdp);
-   /* Wait for dlm to reply to all our unlock requests */
-   wait_event(sdp->sd_glock_wait, atomic_read(&sdp->sd_glock_disposal) == 0);
/*  Unmount the locking protocol  */
gfs2_lm_unmount(sdp);
 
-- 
1.6.2.5



[Cluster-devel] GFS2: Pull request (fixes)

2010-02-04 Thread Steven Whitehouse
Hi,

Please consider pulling the following two changes,

Steve.


The following changes since commit 1a45dcfe2525e9432cb4aba461d4994fc2befe42:
  Linus Torvalds (1):
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes.git master

Steven Whitehouse (2):
  GFS2: Wait for unlock completion on umount
  GFS2: Extend umount wait coverage to full glock lifetime

 fs/gfs2/glock.c  |4 
 fs/gfs2/glock.h  |2 +-
 fs/gfs2/incore.h |2 ++
 fs/gfs2/lock_dlm.c   |   11 ---
 fs/gfs2/ops_fstype.c |   12 +++-
 fs/gfs2/super.c  |1 +
 6 files changed, 27 insertions(+), 5 deletions(-)




Re: [Cluster-devel] [PATCH 1/4] gfs2: add IO submission trace points

2010-02-05 Thread Steven Whitehouse
Hi,

On Fri, 2010-02-05 at 16:45 +1100, Dave Chinner wrote:
 Useful for tracking down where specific IOs are being issued
 from.
 
 Signed-off-by: Dave Chinner dchin...@redhat.com
 ---
  fs/gfs2/log.c|6 ++
  fs/gfs2/lops.c   |6 ++
  fs/gfs2/trace_gfs2.h |   41 +
  3 files changed, 53 insertions(+), 0 deletions(-)
 
 diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
 index 4511b08..bd26dff 100644
 --- a/fs/gfs2/log.c
 +++ b/fs/gfs2/log.c
 @@ -121,6 +121,7 @@ __acquires(&sdp->sd_log_lock)
   lock_buffer(bh);
   if (test_clear_buffer_dirty(bh)) {
   bh->b_end_io = end_buffer_write_sync;
 + trace_gfs2_submit_bh(bh, WRITE_SYNC_PLUG, __func__);
   submit_bh(WRITE_SYNC_PLUG, bh);
This looks like it could be a generically useful function, I wonder if
it would be possible to do this directly in submit_bh, since we should
be able to use __builtin_return_address(0) to find out the origin of the
call?

Steve.




Re: [Cluster-devel] [PATCH 2/4] gfs2: ordered writes are backwards

2010-02-05 Thread Steven Whitehouse
Hi,

This looks good. There is an argument for trying to sort the buffers as
we write them (in case the application writes them out of order) but
this seems a sensible change to catch 90% of cases. I'm just about to
give this a quick test and I'll push this one in straight away if it
looks good on my test,

Steve.

On Fri, 2010-02-05 at 16:45 +1100, Dave Chinner wrote:
 When we queue data buffers for ordered write, the buffers are added
 to the head of the ordered write list. When the log needs to push
 these buffers to disk, it also walks the list from the head. The
 result is that the ordered buffers are submitted to disk in
 reverse order.
 
 For large writes, this means that whenever the log flushes, large
 streams of buffers in reverse sequential order are pushed down into
 the block layer. The elevators don't handle this particularly well, so
 IO rates tend to be significantly lower than if the IO was issued in
 ascending block order.
 
 Queue new ordered buffers to the tail of the ordered buffer list to
 ensure that IO is dispatched in the order it was submitted. This
 should significantly improve large sequential write speeds. On a
 disk capable of 85MB/s, speeds increase from 50MB/s to 65MB/s for
 noop and from 38MB/s to 50MB/s for cfq.
 
 Signed-off-by: Dave Chinner dchin...@redhat.com
 ---
  fs/gfs2/lops.c |4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
 index 5708edf..7278cf0 100644
 --- a/fs/gfs2/lops.c
 +++ b/fs/gfs2/lops.c
 @@ -532,9 +532,9 @@ static void databuf_lo_add(struct gfs2_sbd *sdp, struct gfs2_log_element *le)
   gfs2_pin(sdp, bd->bd_bh);
   tr->tr_num_databuf_new++;
   sdp->sd_log_num_databuf++;
 - list_add(&le->le_list, &sdp->sd_log_le_databuf);
 + list_add_tail(&le->le_list, &sdp->sd_log_le_databuf);
   } else {
 - list_add(&le->le_list, &sdp->sd_log_le_ordered);
 + list_add_tail(&le->le_list, &sdp->sd_log_le_ordered);
   }
  out:
   gfs2_log_unlock(sdp);




Re: [Cluster-devel] [PATCH 3/4] gfs2: ordered buffer writes are not sync

2010-02-05 Thread Steven Whitehouse
Hi,

On Fri, 2010-02-05 at 16:45 +1100, Dave Chinner wrote:
 Currently gfs2 ordered buffer writes use WRITE_SYNC_PLUG as the IO
 type being dispatched. They aren't sync writes; we issue all the IO
 pending, then wait for it all. IOWs, this is async IO with a bulk
 wait on the end.
 
 We should use normal WRITE tagging for this, and before we start
 waiting make sure that all the IO is issued by unplugging the
 device. The use of normal WRITEs for these buffers should
 significantly reduce the overhead of processing in the cfq elevator
 and enable the disk subsystem to get much closer to disk bandwidth
 for large sequential writes.
 
 Signed-off-by: Dave Chinner dchin...@redhat.com

That sounds reasonable. With respect to the new trace point, I'd raise
the same question as per the initial patch in the series. Also I'm
wondering about the calls to blk_run_backing_dev() as I'd thought that
this would happen automatically when we get to wait for the I/O.

Bearing in mind that your tests show no particular increase in
performance for this change, I'm tempted to be a bit more cautious about
applying it for now,

Steve.




Re: [Cluster-devel] [PATCH 4/4] gfs2: introduce AIL lock

2010-02-05 Thread Steven Whitehouse
Hi,

On Fri, 2010-02-05 at 16:45 +1100, Dave Chinner wrote:
 The log lock is currently used to protect the AIL lists and
 the movements of buffers into and out of them. The lists
 are self contained and no log specific items outside the
 lists are accessed when starting or emptying the AIL lists.
 
 Hence the operation of the AIL does not require the protection
 of the log lock so split them out into a new AIL specific lock
 to reduce the amount of traffic on the log lock. This will
 also reduce the amount of serialisation that occurs when
 the gfs2_logd pushes on the AIL to move it forward.
 
 This reduces the impact of log pushing on sequential write
 throughput. On no-op scheduler on a disk that can do 85MB/s,
 this increases the write rate from 65MB/s with the ordering
 fixes to 75MB/s.
 
 Signed-off-by: Dave Chinner dchin...@redhat.com

This looks good, but a couple of comments:

 ---
  fs/gfs2/glops.c  |   10 --
  fs/gfs2/incore.h |1 +
  fs/gfs2/log.c|   32 +---
  fs/gfs2/log.h|   22 ++
  fs/gfs2/lops.c   |5 -
  fs/gfs2/ops_fstype.c |1 +
  6 files changed, 53 insertions(+), 18 deletions(-)
 
 diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
 index 78554ac..65048f9 100644
 --- a/fs/gfs2/glops.c
 +++ b/fs/gfs2/glops.c
 @@ -57,20 +57,26 @@ static void gfs2_ail_empty_gl(struct gfs2_glock *gl)
   BUG_ON(current->journal_info);
   current->journal_info = tr;
  
 - gfs2_log_lock(sdp);
 + gfs2_ail_lock(sdp);
This abstraction of a spinlock is left over from the old gfs1 code.
When adding new locks, I'd prefer to just use spin_lock() directly
rather than abstracting it out like this. That way we don't
have to think about what kind of lock it is.

[snip]
 diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
 index 0fe2f3c..342d65e 100644
 --- a/fs/gfs2/lops.c
 +++ b/fs/gfs2/lops.c
 @@ -80,7 +80,7 @@ static void gfs2_unpin(struct gfs2_sbd *sdp, struct buffer_head *bh,
   mark_buffer_dirty(bh);
   clear_buffer_pinned(bh);
  
 - gfs2_log_lock(sdp);
 + gfs2_ail_lock(sdp);
   if (bd->bd_ail) {
   list_del(&bd->bd_ail_st_list);
   brelse(bh);
 @@ -91,6 +91,9 @@ static void gfs2_unpin(struct gfs2_sbd *sdp, struct buffer_head *bh,
   }
   bd->bd_ail = ai;
   list_add(&bd->bd_ail_st_list, &ai->ai_ail1_list);
 + gfs2_ail_unlock(sdp);
 +
 + gfs2_log_lock(sdp);
   clear_bit(GLF_LFLUSH, &bd->bd_gl->gl_flags);
   trace_gfs2_pin(bd, 0);
   gfs2_log_unlock(sdp);
I don't think the gfs2_log_lock() is actually required at this point.
The LFLUSH bit is protected by the sd_log_flush_lock rwsem,
and the tracing doesn't need the log lock either,

Steve.




Re: [Cluster-devel] [GFS2 PATCH] - Bug 537201 - Better error reporting when mounting a gfs fs without enough journals

2010-02-08 Thread Steven Whitehouse
Hi,

Now in the GFS2 -nmw git tree. Thanks,

Steve.

On Fri, 2010-02-05 at 18:25 -0500, Abhijith Das wrote:
 Please ignore the previous patch. The patch inlining didn't work right. 
 Here's the unmangled one.
 
 --Abhi
 
 - Abhijith Das a...@redhat.com wrote:
 
  From: Abhijith Das a...@redhat.com
  To: cluster-devel cluster-devel@redhat.com
  Sent: Friday, February 5, 2010 5:17:56 PM GMT -06:00 US/Canada Central
  Subject: [Cluster-devel] [GFS2 PATCH] - Bug 537201 - Better error reporting 
  when mounting a gfs fs without enough
  journals
 
  Hi,
  
  We need this one-liner to signal the mount helper of the 'insufficient
  journals' condition.
  
  Signed-off-by: Abhijith Das a...@redhat.com
  
  diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
  index d405c38..a054b52 100644
  --- a/fs/gfs2/ops_fstype.c
  +++ b/fs/gfs2/ops_fstype.c
  @@ -724,7 +724,7 @@ static int init_journal(struct gfs2_sbd *sdp, int undo)
  goto fail;
  }
   
  -   error = -EINVAL;
  +   error = -EUSERS;
  if (!gfs2_jindex_size(sdp)) {
  fs_err(sdp, "no journals!\n");
  goto fail_jindex;




[Cluster-devel] GFS2: Fix bmap allocation corner-case bug

2010-02-11 Thread Steven Whitehouse

This patch solves a corner case during allocation which occurs if both
metadata (indirect) and data blocks are required but there is an
obstacle in the filesystem (e.g. a resource group header or another
allocated block) such that when the allocation is requested only
enough blocks for the metadata are returned.

By changing the exit condition of this loop, we ensure that a
minimum of one data block will always be returned.

Signed-off-by: Steven Whitehouse swhit...@redhat.com

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 6d47379..583e823 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -541,7 +541,7 @@ static int gfs2_bmap_alloc(struct inode *inode, const sector_t lblock,
*ptr++ = cpu_to_be64(bn++);
break;
}
-   } while (state != ALLOC_DATA);
+   } while ((state != ALLOC_DATA) || !dblock);
 
ip->i_height = height;
gfs2_add_inode_blocks(&ip->i_inode, alloced);




[Cluster-devel] [PATCH 2/2] GFS2: Fix bmap allocation corner-case bug

2010-02-12 Thread Steven Whitehouse
This patch solves a corner case during allocation which occurs if both
metadata (indirect) and data blocks are required but there is an
obstacle in the filesystem (e.g. a resource group header or another
allocated block) such that when the allocation is requested only
enough blocks for the metadata are returned.

By changing the exit condition of this loop, we ensure that a
minimum of one data block will always be returned.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/bmap.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 6d47379..583e823 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -541,7 +541,7 @@ static int gfs2_bmap_alloc(struct inode *inode, const sector_t lblock,
*ptr++ = cpu_to_be64(bn++);
break;
}
-   } while (state != ALLOC_DATA);
+   } while ((state != ALLOC_DATA) || !dblock);
 
ip->i_height = height;
gfs2_add_inode_blocks(&ip->i_inode, alloced);
-- 
1.6.2.5



[Cluster-devel] [GFS2] Pre-pull patch posting (fixes)

2010-02-12 Thread Steven Whitehouse
Hi,

Here are a couple of GFS2 fixes. Both are one-liners,

Steve.



[Cluster-devel] [PATCH 1/2] GFS2: Fix error code

2010-02-12 Thread Steven Whitehouse
From: Abhijith Das a...@redhat.com

We need this one-liner to signal the mount helper of the 'insufficient 
journals' condition.

Signed-off-by: Abhijith Das a...@redhat.com
Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/ops_fstype.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 8a102f7..a86ed63 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -725,7 +725,7 @@ static int init_journal(struct gfs2_sbd *sdp, int undo)
goto fail;
}
 
-   error = -EINVAL;
+   error = -EUSERS;
if (!gfs2_jindex_size(sdp)) {
fs_err(sdp, "no journals!\n");
goto fail_jindex;
-- 
1.6.2.5



[Cluster-devel] GFS2: Pull request (fixes)

2010-02-12 Thread Steven Whitehouse
Hi,

Please consider pulling the following changes,

Steve.

-
The following changes since commit 676ad585531e965416fd958747894541dabcec96:
  Linus Torvalds (1):
Merge branch 'for-linus' of git://git.kernel.org/.../bp/bp

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes.git master

Abhijith Das (1):
  GFS2: Fix error code

Steven Whitehouse (1):
  GFS2: Fix bmap allocation corner-case bug

 fs/gfs2/bmap.c   |2 +-
 fs/gfs2/ops_fstype.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)




[Cluster-devel] GFS2: -nmw git tree

2010-02-13 Thread Steven Whitehouse
Hi,

Linus has pulled a couple of fixes, so I've rebased the -nmw git tree
again,

Steve.




[Cluster-devel] [dlm] Two small sysfs patches

2010-02-17 Thread Steven Whitehouse
Hi,

Please queue the following two patches for the next merge window
for dlm. The first one adds a new sysfs variable so that the
lockspace can be obtained without resorting to parsing the
initial line of the sysfs message.

The second one removes some obsolete code relating to one of the
sysfs files,

Steve.




[Cluster-devel] [PATCH 1/2] dlm: Send lockspace name with uevents

2010-02-17 Thread Steven Whitehouse
Although it is possible to get this information from the path,
it's much easier to provide the lockspace as a separate env
variable.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/dlm/lockspace.c |   14 +-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c
index c010ecf..26a8bd4 100644
--- a/fs/dlm/lockspace.c
+++ b/fs/dlm/lockspace.c
@@ -191,6 +191,18 @@ static int do_uevent(struct dlm_ls *ls, int in)
return error;
 }
 
+static int dlm_uevent(struct kset *kset, struct kobject *kobj,
+ struct kobj_uevent_env *env)
+{
+   struct dlm_ls *ls = container_of(kobj, struct dlm_ls, ls_kobj);
+
+   add_uevent_var(env, "LOCKSPACE=%s", ls->ls_name);
+   return 0;
+}
+
+static struct kset_uevent_ops dlm_uevent_ops = {
+   .uevent = dlm_uevent,
+};
 
 int __init dlm_lockspace_init(void)
 {
@@ -199,7 +211,7 @@ int __init dlm_lockspace_init(void)
INIT_LIST_HEAD(&lslist);
spin_lock_init(&lslist_lock);
 
-   dlm_kset = kset_create_and_add("dlm", NULL, kernel_kobj);
+   dlm_kset = kset_create_and_add("dlm", &dlm_uevent_ops, kernel_kobj);
if (!dlm_kset) {
printk(KERN_WARNING "%s: can not create kset\n", __func__);
return -ENOMEM;
-- 
1.6.2.5



[Cluster-devel] [PATCH 2/2] dlm: Remove obsolete lockspace lookup

2010-02-17 Thread Steven Whitehouse
We don't need to look up the lockspace in this particular
case since we already have a pointer to it (which was being
dereferenced in order to do the lookup in the first place).

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/dlm/lockspace.c |6 +-
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c
index 26a8bd4..ce0fdf5 100644
--- a/fs/dlm/lockspace.c
+++ b/fs/dlm/lockspace.c
@@ -37,10 +37,6 @@ static ssize_t dlm_control_store(struct dlm_ls *ls, const char *buf, size_t len)
ssize_t ret = len;
int n = simple_strtol(buf, NULL, 0);
 
-   ls = dlm_find_lockspace_local(ls->ls_local_handle);
-   if (!ls)
-   return -EINVAL;
-
switch (n) {
case 0:
dlm_ls_stop(ls);
@@ -51,7 +47,7 @@ static ssize_t dlm_control_store(struct dlm_ls *ls, const char *buf, size_t len)
default:
ret = -EINVAL;
}
-   dlm_put_lockspace(ls);
+
return ret;
 }
 
-- 
1.6.2.5



[Cluster-devel] dlm: Remove/bypass astd

2010-02-17 Thread Steven Whitehouse

While investigating Red Hat bug #537010 I started looking at the dlm's astd
thread. The way in which the cast and bast requests are queued looked
as if it might cause reordering since the bast requests are always
delivered after any pending cast requests which is not always the
correct ordering. This patch doesn't fix that bug, but it will prevent any
races in that bit of code, and the performance benefits are also well
worth having.

I noticed that astd seems to be extraneous to requirements. The notifications
to astd are already running in process context, so they could be delivered
directly. That should improve smp performance since all the notifications
would no longer be funneled through a single thread.

Also, the only other function of astd seemed to be stopping the delivery
of these notifications during recovery. Since, however, the notifications
which are intercepted at recovery time are neither modified, nor filtered
in any way, the only effect is to delay notifications for no obvious reason.

I thought that probably removing the astd thread and delivering the cast
and bast notifications directly would improve performance due to the
elimination of a scheduling delay. I wrote a small test module which
creates a dlm lock space, and does 100,000 NL -> EX -> NL lock conversions.

Having run this test 10 times each on a 2.6.33-rc8 kernel and then the modified
kernel including this patch, I got the following results:

Original: Avg time 24.62 us per conversion (NL -> EX -> NL)
Modified: Avg time 9.93 us per conversion

Which is a fairly dramatic speed up. Please consider applying this patch.
I've tested it in both clustered and single node GFS2 configurations. The test
figures are from a single node configuration which was a deliberate choice
in order to avoid any effects of network latency.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/dlm/Makefile|3 +-
 fs/dlm/ast.c   |  165 
 fs/dlm/ast.h   |   26 
 fs/dlm/lock.c  |   16 -
 fs/dlm/lockspace.c |   17 +-
 fs/dlm/recover.c   |1 -
 fs/dlm/recoverd.c  |   11 
 7 files changed, 15 insertions(+), 224 deletions(-)
 delete mode 100644 fs/dlm/ast.c
 delete mode 100644 fs/dlm/ast.h

diff --git a/fs/dlm/Makefile b/fs/dlm/Makefile
index ca1c912..8f9f4d2 100644
--- a/fs/dlm/Makefile
+++ b/fs/dlm/Makefile
@@ -1,6 +1,5 @@
 obj-$(CONFIG_DLM) +=   dlm.o
-dlm-y :=   ast.o \
-   config.o \
+dlm-y :=   config.o \
dir.o \
lock.o \
lockspace.o \
diff --git a/fs/dlm/ast.c b/fs/dlm/ast.c
deleted file mode 100644
index dc2ad60..000
--- a/fs/dlm/ast.c
+++ /dev/null
@@ -1,165 +0,0 @@
-/**
-***
-**
-**  Copyright (C) Sistina Software, Inc.  1997-2003  All rights reserved.
-**  Copyright (C) 2004-2008 Red Hat, Inc.  All rights reserved.
-**
-**  This copyrighted material is made available to anyone wishing to use,
-**  modify, copy, or redistribute it subject to the terms and conditions
-**  of the GNU General Public License v.2.
-**
-***
-**/
-
-#include "dlm_internal.h"
-#include "lock.h"
-#include "user.h"
-#include "ast.h"
-
-#define WAKE_ASTS  0
-
-static struct list_head ast_queue;
-static spinlock_t  ast_queue_lock;
-static struct task_struct *astd_task;
-static unsigned long   astd_wakeflags;
-static struct mutex astd_running;
-
-
-void dlm_del_ast(struct dlm_lkb *lkb)
-{
-   spin_lock(&ast_queue_lock);
-   if (lkb->lkb_ast_type & (AST_COMP | AST_BAST))
-   list_del(&lkb->lkb_astqueue);
-   spin_unlock(&ast_queue_lock);
-}
-
-void dlm_add_ast(struct dlm_lkb *lkb, int type, int bastmode)
-{
-   if (lkb->lkb_flags & DLM_IFL_USER) {
-   dlm_user_add_ast(lkb, type, bastmode);
-   return;
-   }
-
-   spin_lock(&ast_queue_lock);
-   if (!(lkb->lkb_ast_type & (AST_COMP | AST_BAST))) {
-   kref_get(&lkb->lkb_ref);
-   list_add_tail(&lkb->lkb_astqueue, &ast_queue);
-   }
-   lkb->lkb_ast_type |= type;
-   if (bastmode)
-   lkb->lkb_bastmode = bastmode;
-   spin_unlock(&ast_queue_lock);
-
-   set_bit(WAKE_ASTS, &astd_wakeflags);
-   wake_up_process(astd_task);
-}
-
-static void process_asts(void)
-{
-   struct dlm_ls *ls = NULL;
-   struct dlm_rsb *r = NULL;
-   struct dlm_lkb *lkb;
-   void (*cast) (void *astparam);
-   void (*bast) (void *astparam, int mode);
-   int type = 0, bastmode;
-
-repeat:
-   spin_lock(&ast_queue_lock

Re: [Cluster-devel] dlm: Remove/bypass astd

2010-02-17 Thread Steven Whitehouse
Hi,

On Wed, 2010-02-17 at 13:43 +, Christine Caulfield wrote:
 One of the reasons that ASTs are delivered in a separate thread was to 
 allow ASTs to do other locking operations without causing a deadlock. 
 eg. it would allow locks to be dropped or converted inside a blocking 
 AST callback routine.
 
Hmm... GFS2 doesn't require that at all, nor is it ever likely to since
we have the glock layer to deal with that. I've looked at the OCFS2 code
and I don't think they need it either - maybe Mark or Joel can confirm
that for certain. Those are the only two users at the moment.

If it were to be the case that locking operations were being done in the
context of the astd thread, the performance would be pretty poor since
it's a single thread no matter how many locks and lock spaces are in use.
The only reasonable use for such a thing would also involve having to
deal with the cache control for the locked object too (which for all
current cases means disk I/O and/or cache invalidation), which would
then also be limited to this single thread.

 So maybe either the new code already allows for this or it's 
 functionality that's not needed in the kernel. It should still be an 
 option for userspace applications, but that's a different story 
 altogether, of course
 
 Chrissie
 
Yes, I've left the userspace interface code alone for now. That
continues to work in the original way. My main concern is with the
kernel interface at the moment,

Steve.

 On 17/02/10 13:23, Steven Whitehouse wrote:
 
  While investigating Red Hat bug #537010 I started looking at the dlm's astd
  thread. The way in which the cast and bast requests are queued looked
  as if it might cause reordering since the bast requests are always
  delivered after any pending cast requests which is not always the
  correct ordering. This patch doesn't fix that bug, but it will prevent any
  races in that bit of code, and the performance benefits are also well
  worth having.
 
  I noticed that astd seems to be extraneous to requirements. The 
  notifications
  to astd are already running in process context, so they could be delivered
  directly. That should improve smp performance since all the notifications
  would no longer be funneled through a single thread.
 
  Also, the only other function of astd seemed to be stopping the delivery
  of these notifications during recovery. Since, however, the notifications
  which are intercepted at recovery time are neither modified, nor filtered
  in any way, the only effect is to delay notifications for no obvious reason.
 
  I thought that probably removing the astd thread and delivering the cast
  and bast notifications directly would improve performance due to the
  elimination of a scheduling delay. I wrote a small test module which
  creates a dlm lock space, and does 100,000 NL -> EX -> NL lock 
  conversions.
 
  Having run this test 10 times each on a 2.6.33-rc8 kernel and then the 
  modified
  kernel including this patch, I got the following results:
 
  Original: Avg time 24.62 us per conversion (NL -> EX -> NL)
  Modified: Avg time 9.93 us per conversion
 
  Which is a fairly dramatic speed up. Please consider applying this patch.
  I've tested it in both clustered and single node GFS2 configurations. The 
  test
  figures are from a single node configuration which was a deliberate choice
  in order to avoid any effects of network latency.
 
  Signed-off-by: Steven Whitehouse swhit...@redhat.com
  ---




Re: [Cluster-devel] [PATCH 2/2] dlm: Remove obsolete lockspace lookup

2010-02-18 Thread Steven Whitehouse
Hi,

On Wed, 2010-02-17 at 15:12 -0500, David Teigland wrote:
 On Wed, Feb 17, 2010 at 09:41:35AM +, Steven Whitehouse wrote:
  We don't need to look up the lockspace in this particular
  case since we already have a pointer to it (which was being
  dereferenced in order to do the lookup in the first place).
 
 It'll take more to convince me that that reference from find isn't needed.
 My assumption is that I added it because it was.
 
 Dave
 
I'm not sure what more I can say here: this is a sysfs file store
function and one of the reasons for using it is that sysfs looks after
the ref counting for you.

Even aside from that, if you don't have a reference to the lockspace,
then the dereference that is done to discover the lockspace name would
be invalid, since the structure might have already been freed before the
reference is obtained.

You could also compare with the other store and show functions in
that same file and notice that none of them try to grab a reference to
the lockspace in that way. So if this is required, then it must be
required for those functions too.

Either way there is something not quite right here and having studied
the code in some detail, I'm pretty sure this is the correct fix,

Steve.

  Signed-off-by: Steven Whitehouse swhit...@redhat.com
  ---
   fs/dlm/lockspace.c |6 +-
   1 files changed, 1 insertions(+), 5 deletions(-)
  
  diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c
  index 26a8bd4..ce0fdf5 100644
  --- a/fs/dlm/lockspace.c
  +++ b/fs/dlm/lockspace.c
  @@ -37,10 +37,6 @@ static ssize_t dlm_control_store(struct dlm_ls *ls, const char *buf, size_t len)
  ssize_t ret = len;
  int n = simple_strtol(buf, NULL, 0);
   
  -   ls = dlm_find_lockspace_local(ls->ls_local_handle);
  -   if (!ls)
  -   return -EINVAL;
  -
  switch (n) {
  case 0:
  dlm_ls_stop(ls);
  @@ -51,7 +47,7 @@ static ssize_t dlm_control_store(struct dlm_ls *ls, const char *buf, size_t len)
  default:
  ret = -EINVAL;
  }
  -   dlm_put_lockspace(ls);
  +
  return ret;
   }
   
  -- 
  1.6.2.5




Re: [Cluster-devel] [PATCH 2/2] dlm: Remove obsolete lockspace lookup

2010-02-19 Thread Steven Whitehouse
Hi,

On Thu, 2010-02-18 at 16:04 -0500, David Teigland wrote:
 On Thu, Feb 18, 2010 at 09:16:03AM +, Steven Whitehouse wrote:
  I'm not sure what more I can say here: this is a sysfs file store
  function and one of the reasons for using it is that sysfs looks after
  the ref counting for you.
  
  Even aside from that, if you don't have a reference to the lockspace,
  then the dereference that is done to discover the lockspace name would
  be invalid, since the structure might have already been freed before the
  reference is obtained.
  
  You could also compare with the other store and show functions in
  that same file and notice that none of them try to grab a reference to
  the lockspace in that way. So if this is required, then it must be
  required for those functions too.
  
  Either way there is something not quite right here and having studied
  the code in some detail, I'm pretty sure this is the correct fix,
 
 I guess you didn't see this oops in your tests.  Can you show that the
 situation in this commit is no longer possible?
 
No, I didn't hit it. I'm not sure how to reproduce whatever situation
led to this in the first place.

There was a clue though in the patch prior to the one you pointed out in
the git tree; the comment in this patch doesn't make a lot of sense
without the context from that patch. I noticed that where the
sysfs function does this:

 + ls = dlm_find_lockspace_local(ls->ls_local_handle);
 + if (!ls)
 + return -EINVAL;
 +

it isn't primarily a ref count operation. Yes, it does get a ref count
on the object if it is successful, but the main purpose is testing to
see if the shutdown process has started (i.e. is the lockspace still on
the ls_list). If the list removal used a list_del_init rather than a
list_del, the dlm_find_lockspace_local() call could be replaced with:

	spin_lock(&lslist_lock);
	ret = list_empty(&ls->ls_list);
	if (!ret)
		ls->ls_count++;
	spin_unlock(&lslist_lock);
	if (ret)
		return -EINVAL;

which might be a bit less confusing, and also saves traversing the list
of lockspaces. This is basically a hold operation, rather than a
find/get type operation.
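
To make the hold idea concrete, here is a minimal userspace sketch. It is not
dlm code: the list helpers are small stand-ins for the kernel's list.h, the
cut-down struct lockspace and the function lockspace_hold() are hypothetical,
and the locking is left to the caller. It relies on the unlink leaving the
entry pointing at itself, which is what list_del_init does and list_del does
not.

```c
#include <errno.h>
#include <stdbool.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static inline void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static inline bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static inline void list_add_to_tail(struct list_head *entry,
				    struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Like list_del_init(): unlink the entry and leave it pointing at itself. */
static inline void list_unlink_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	list_init(entry);
}

/* Hypothetical cut-down lockspace with only the fields the sketch needs. */
struct lockspace {
	struct list_head ls_list;
	int ls_count;
};

/*
 * Take a reference only if the lockspace is still on the global list.
 * The caller is assumed to hold whatever lock protects that list.
 */
int lockspace_hold(struct lockspace *ls)
{
	if (list_is_empty(&ls->ls_list))
		return -EINVAL;	/* already unlinked: shutdown has begun */
	ls->ls_count++;
	return 0;
}
```

Once the entry has been unlinked with list_unlink_init(), lockspace_hold()
fails without touching the count, and no walk over the list of lockspaces is
needed.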

My confusion has arisen from the fact that there are three ref counters
for the lockspace object. One is ls_count, one is ls_create_count and
the other is the kobject ref count.

ls_create_count seems to deal with user references, ls_count seems to be
used for internal references and the kobject ref count only seems to be
incremented/decremented on initial object creation/removal.

Probably the correct long term solution is to at least merge the
ls_count into kobject ref count system, and maybe the ls_create_count
too. I'll have to do some more investigation before I can see whether
there are any reasons why that isn't possible.

Either way, we are getting away from what was originally a small and
simple patch, so I suggest ignoring this one for now and just applying
the first one of the two which I sent. I'll have another look at this in
the mean time,

Steve.





[Cluster-devel] [PATCH 4/5] GFS2: ordered writes are backwards

2010-03-01 Thread Steven Whitehouse
From: Dave Chinner dchin...@redhat.com

When we queue data buffers for ordered write, the buffers are added
to the head of the ordered write list. When the log needs to push
these buffers to disk, it also walks the list from the head. The
result is that the ordered buffers are submitted to disk in
reverse order.

For large writes, this means that whenever the log flushes, large
streams of reverse sequential order buffers are pushed down into the
block layers. The elevators don't handle this particularly well, so
IO rates tend to be significantly lower than if the IO was issued in
ascending block order.

Queue new ordered buffers to the tail of the ordered buffer list to
ensure that IO is dispatched in the order it was submitted. This
should significantly improve large sequential write speeds. On a
disk capable of 85MB/s, speeds increase from 50MB/s to 65MB/s for
noop and from 38MB/s to 50MB/s for cfq.

Signed-off-by: Dave Chinner dchin...@redhat.com
Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/lops.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index de97632..adc260f 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -528,9 +528,9 @@ static void databuf_lo_add(struct gfs2_sbd *sdp, struct gfs2_log_element *le)
gfs2_pin(sdp, bd->bd_bh);
tr->tr_num_databuf_new++;
sdp->sd_log_num_databuf++;
-   list_add(&le->le_list, &sdp->sd_log_le_databuf);
+   list_add_tail(&le->le_list, &sdp->sd_log_le_databuf);
} else {
-   list_add(&le->le_list, &sdp->sd_log_le_ordered);
+   list_add_tail(&le->le_list, &sdp->sd_log_le_ordered);
}
 out:
gfs2_log_unlock(sdp);
-- 
1.6.2.5
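
The ordering effect described in the patch above is easy to reproduce outside
the kernel. The following is a sketch only: struct buf, add_at_head(),
add_at_tail() and queue_and_walk() are hypothetical userspace stand-ins for
the kernel list API and the GFS2 log lists. It queues blocks in ascending
order and then walks the list from the head, as the log flush is described
as doing.

```c
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical buffer descriptor: just a list link and a block number. */
struct buf {
	struct list_head link;
	unsigned long blkno;
};

static inline void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Same insertion point as the kernel's list_add(): right after the head. */
static inline void add_at_head(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Same insertion point as list_add_tail(): right before the head. */
static inline void add_at_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/*
 * Queue blocks 0..n-1 in ascending order, then walk the list from the head
 * and record the order in which the blocks would be submitted.
 */
void queue_and_walk(struct buf *bufs, size_t n, int at_tail,
		    unsigned long *out)
{
	struct list_head head;
	struct list_head *pos;
	size_t i;

	list_init(&head);
	for (i = 0; i < n; i++) {
		bufs[i].blkno = i;
		if (at_tail)
			add_at_tail(&bufs[i].link, &head);
		else
			add_at_head(&bufs[i].link, &head);
	}

	i = 0;
	for (pos = head.next; pos != &head; pos = pos->next) {
		struct buf *b = (struct buf *)((char *)pos -
					       offsetof(struct buf, link));
		out[i++] = b->blkno;
	}
}
```

With head insertion the walk visits the blocks in descending order; with tail
insertion it visits them in the order they were queued.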



[Cluster-devel] [PATCH 5/5] GFS2: print glock numbers in hex

2010-03-01 Thread Steven Whitehouse
From: Bob Peterson rpete...@redhat.com

This patch changes glock numbers from printing in decimal to hex.
Since DLM prints corresponding resource IDs in hex, it makes debugging
easier.

Signed-off-by: Bob Peterson rpete...@redhat.com
Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/glock.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 4773f90..454d4b4 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1658,7 +1658,7 @@ static int __dump_glock(struct seq_file *seq, const struct gfs2_glock *gl)
dtime *= 1000000/HZ; /* demote time in uSec */
if (!test_bit(GLF_DEMOTE, &gl->gl_flags))
dtime = 0;
-   gfs2_print_dbg(seq, "G:  s:%s n:%u/%llu f:%s t:%s d:%s/%llu a:%d r:%d\n",
+   gfs2_print_dbg(seq, "G:  s:%s n:%u/%llx f:%s t:%s d:%s/%llu a:%d r:%d\n",
  state2str(gl->gl_state),
  gl->gl_name.ln_type,
  (unsigned long long)gl->gl_name.ln_number,
-- 
1.6.2.5
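
As a quick illustration of the format change in the patch above, this
userspace sketch prints the same type/number pair both ways. The function
format_glock_name() and its hex flag are hypothetical; only the "n:%u/%llu"
and "n:%u/%llx" fragments are taken from the patch.

```c
#include <stdio.h>

/*
 * Format a glock name as "n:<type>/<number>". With hex set the number is
 * printed with %llx (as after the patch), otherwise with %llu (as before).
 * Returns what snprintf() returns.
 */
int format_glock_name(char *buf, size_t len, unsigned type,
		      unsigned long long number, int hex)
{
	if (hex)
		return snprintf(buf, len, "n:%u/%llx", type, number);
	return snprintf(buf, len, "n:%u/%llu", type, number);
}
```

The hex form can be compared directly with a resource ID that is itself
printed in hex, without converting by hand.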



[Cluster-devel] [PATCH 2/5] GFS2: Remove loopy umount code

2010-03-01 Thread Steven Whitehouse
As a consequence of the previous patch, we can now remove the
loop which used to be required due to the circular dependency
between the inodes and glocks. Instead we can just invalidate
the inodes, and then clear up any glocks which are left.

Also we no longer need the rwsem since there is no longer any
danger of the inode invalidation calling back into the glock
code (and from there back into the inode code).

Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/glock.c  |   33 ++---
 fs/gfs2/incore.h |1 -
 fs/gfs2/ops_fstype.c |4 +---
 fs/gfs2/super.c  |1 +
 fs/gfs2/sys.c|2 --
 5 files changed, 4 insertions(+), 37 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index dfb10a4..4773f90 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -19,7 +19,6 @@
 #include <linux/list.h>
 #include <linux/wait.h>
 #include <linux/module.h>
-#include <linux/rwsem.h>
 #include <asm/uaccess.h>
 #include <linux/seq_file.h>
 #include <linux/debugfs.h>
@@ -60,7 +59,6 @@ static int __dump_glock(struct seq_file *seq, const struct gfs2_glock *gl);
 #define GLOCK_BUG_ON(gl,x) do { if (unlikely(x)) { __dump_glock(NULL, gl); BUG(); } } while(0)
 static void do_xmote(struct gfs2_glock *gl, struct gfs2_holder *gh, unsigned int target);
 
-static DECLARE_RWSEM(gfs2_umount_flush_sem);
 static struct dentry *gfs2_root;
 static struct workqueue_struct *glock_workqueue;
 struct workqueue_struct *gfs2_delete_workqueue;
@@ -714,7 +712,6 @@ static void glock_work_func(struct work_struct *work)
finish_xmote(gl, gl->gl_reply);
drop_ref = 1;
}
-   down_read(&gfs2_umount_flush_sem);
spin_lock(&gl->gl_spin);
if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
gl->gl_state != LM_ST_UNLOCKED &&
@@ -727,7 +724,6 @@ static void glock_work_func(struct work_struct *work)
}
run_queue(gl, 0);
spin_unlock(&gl->gl_spin);
-   up_read(&gfs2_umount_flush_sem);
if (!delay ||
queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
gfs2_glock_put(gl);
@@ -1512,35 +1508,10 @@ void gfs2_glock_thaw(struct gfs2_sbd *sdp)
 
 void gfs2_gl_hash_clear(struct gfs2_sbd *sdp)
 {
-   unsigned long t;
unsigned int x;
-   int cont;
 
-   t = jiffies;
-
-   for (;;) {
-   cont = 0;
-   for (x = 0; x < GFS2_GL_HASH_SIZE; x++) {
-   if (examine_bucket(clear_glock, sdp, x))
-   cont = 1;
-   }
-
-   if (!cont)
-   break;
-
-   if (time_after_eq(jiffies,
- t + gfs2_tune_get(sdp, gt_stall_secs) * HZ)) {
-   fs_warn(sdp, "Unmount seems to be stalled. "
-"Dumping lock state...\n");
-   gfs2_dump_lockstate(sdp);
-   t = jiffies;
-   }
-
-   down_write(&gfs2_umount_flush_sem);
-   invalidate_inodes(sdp->sd_vfs);
-   up_write(&gfs2_umount_flush_sem);
-   msleep(10);
-   }
+   for (x = 0; x < GFS2_GL_HASH_SIZE; x++)
+   examine_bucket(clear_glock, sdp, x);
flush_workqueue(glock_workqueue);
wait_event(sdp->sd_glock_wait, atomic_read(&sdp->sd_glock_disposal) == 0);
gfs2_dump_lockstate(sdp);
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 1de7e1b..b8025e5 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -451,7 +451,6 @@ struct gfs2_tune {
unsigned int gt_quota_quantum; /* Secs between syncs to quota file */
unsigned int gt_new_files_jdata;
unsigned int gt_max_readahead; /* Max bytes to read-ahead from disk */
-   unsigned int gt_stall_secs; /* Detects trouble! */
unsigned int gt_complain_secs;
unsigned int gt_statfs_quantum;
unsigned int gt_statfs_slow;
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index a86ed63..a054b52 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -65,7 +65,6 @@ static void gfs2_tune_init(struct gfs2_tune *gt)
gt->gt_quota_scale_den = 1;
gt->gt_new_files_jdata = 0;
gt->gt_max_readahead = 1 << 18;
-   gt->gt_stall_secs = 600;
gt->gt_complain_secs = 10;
 }
 
@@ -1241,10 +1240,9 @@ fail_sb:
 fail_locking:
init_locking(sdp, &mount_gh, UNDO);
 fail_lm:
+   invalidate_inodes(sb);
gfs2_gl_hash_clear(sdp);
gfs2_lm_unmount(sdp);
-   while (invalidate_inodes(sb))
-   yield();
 fail_sys:
gfs2_sys_fs_del(sdp);
 fail:
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index ad7bc2d..e5e2262 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -859,6 +859,7 @@ restart:
gfs2_clear_rgrpd(sdp);
gfs2_jindex_free(sdp);
/*  Take apart glock structures and buffer lists  */
+   invalidate_inodes(sdp->sd_vfs

[Cluster-devel] GFS2: Pull request

2010-03-02 Thread Steven Whitehouse
Hi,

Please consider pulling the following GFS2 changes,

Steve.

--
The following changes since commit 30ff056c42c665b9ea535d8515890857ae382540:
  Linus Torvalds (1):
Merge branch 'x86-uv-for-linus' of 
git://git.kernel.org/.../tip/linux-2.6-tip

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw.git master

Abhijith Das (1):
  GFS2: Remove old, unused linked list code from quota

Bob Peterson (1):
  GFS2: print glock numbers in hex

Dave Chinner (1):
  GFS2: ordered writes are backwards

Steven Whitehouse (2):
  GFS2: Metadata address space clean up
  GFS2: Remove loopy umount code

 fs/gfs2/aops.c  |4 +-
 fs/gfs2/glock.c |   75 ++-
 fs/gfs2/glock.h |7 
 fs/gfs2/glops.c |   16 +
 fs/gfs2/incore.h|5 +--
 fs/gfs2/inode.c |6 +--
 fs/gfs2/lock_dlm.c  |5 ++-
 fs/gfs2/lops.c  |4 +-
 fs/gfs2/main.c  |   28 
 fs/gfs2/meta_io.c   |   46 +++---
 fs/gfs2/meta_io.h   |   12 ++-
 fs/gfs2/ops_fstype.c|4 +--
 fs/gfs2/super.c |   27 +--
 fs/gfs2/sys.c   |2 -
 fs/gfs2/util.c  |1 +
 fs/gfs2/util.h  |1 +
 include/linux/gfs2_ondisk.h |   30 +
 17 files changed, 109 insertions(+), 164 deletions(-)




Re: [Cluster-devel] [PATCH] gfs2: do not select QUOTA

2010-03-03 Thread Steven Whitehouse
Hi,

Looks good. Since I'm waiting for Linus to pull at the moment, I'll wait
for that to happen and send this in the next batch of patches,

Steve.

On Wed, 2010-03-03 at 08:53 -0500, Christoph Hellwig wrote:
 gfs2 only needs the quotactl code, not the generic quota implementation.
 
 Signed-off-by: Christoph Hellwig h...@lst.de
 
 Index: linux-2.6/fs/gfs2/Kconfig
 ===
 --- linux-2.6.orig/fs/gfs2/Kconfig2010-03-03 14:48:00.292026869 +0100
 +++ linux-2.6/fs/gfs2/Kconfig 2010-03-03 14:48:03.546284090 +0100
 @@ -8,7 +8,6 @@ config GFS2_FS
   select FS_POSIX_ACL
   select CRC32
   select SLOW_WORK
 - select QUOTA
   select QUOTACTL
   help
 A cluster filesystem.




[Cluster-devel] GFS2 -nmw git tree

2010-03-09 Thread Steven Whitehouse
Hi,

Now that 2.6.34-rc1 is out, I've redone the -nmw git tree. There is only
one small patch in it at the moment. No doubt there will be more in the
not too distant future,

Steve.




[Cluster-devel] GFS2: Pre-pull patch posting

2010-03-11 Thread Steven Whitehouse
Here are three small (but important!) fixes to GFS2.

Steve.



[Cluster-devel] [PATCH 2/3] GFS2: Allow the number of committed revokes to temporarily be negative

2010-03-11 Thread Steven Whitehouse
From: Benjamin Marzinski bmarz...@redhat.com

GFS2 tracks the number of revokes and unrevokes that are part of committed
transactions via sd_log_commited_revoke. It is possible for one process to add
revokes during its transaction, while another process unrevokes them during its
transaction. If the second process finishes its transaction first,
sd_log_commited_revoke will be decremented by the number of unrevokes that the
second process did, without first being incremented by the number of revokes
the first process did. This is fine, since all started transactions must be
completed before the journal can be flushed.  However, sd_log_commited_revoke
is an unsigned integer, and log_refund() causes an assertion failure if it
would go negative at the end of a transaction.  This patch makes
sd_log_commited_revoke a signed integer and allows it to go negative.
__gfs2_log_flush() still checks that it matches the actual number of revokes.

Signed-off-by: Benjamin Marzinski bmarz...@redhat.com
Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/incore.h |2 +-
 fs/gfs2/log.c|3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index b8025e5..3aac46f 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -616,7 +616,7 @@ struct gfs2_sbd {
unsigned int sd_log_blks_reserved;
unsigned int sd_log_commited_buf;
unsigned int sd_log_commited_databuf;
-   unsigned int sd_log_commited_revoke;
+   int sd_log_commited_revoke;
 
unsigned int sd_log_num_buf;
unsigned int sd_log_num_revoke;
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 4511b08..e5bf4b5 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -417,7 +417,7 @@ static unsigned int calc_reserved(struct gfs2_sbd *sdp)
databufhdrs_needed = (sdp->sd_log_commited_databuf +
  (dbuf_limit - 1)) / dbuf_limit;
 
-   if (sdp->sd_log_commited_revoke)
+   if (sdp->sd_log_commited_revoke > 0)
revokes = gfs2_struct2blk(sdp, sdp->sd_log_commited_revoke,
  sizeof(u64));
 
@@ -790,7 +790,6 @@ static void log_refund(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
gfs2_assert_withdraw(sdp, (((int)sdp->sd_log_commited_buf) >= 0) ||
 (((int)sdp->sd_log_commited_databuf) >= 0));
sdp->sd_log_commited_revoke += tr->tr_num_revoke - tr->tr_num_revoke_rm;
-   gfs2_assert_withdraw(sdp, ((int)sdp->sd_log_commited_revoke) >= 0);
reserved = calc_reserved(sdp);
gfs2_assert_withdraw(sdp, sdp->sd_log_blks_reserved + tr->tr_reserved >= reserved);
unused = sdp->sd_log_blks_reserved - reserved + tr->tr_reserved;
-- 
1.6.2.5
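
The accounting described in the patch above can be sketched in userspace as
follows. This is an illustration under stated assumptions, not the GFS2 code:
struct log_counters, log_refund_revokes() and revoke_blocks_needed() are
hypothetical names, and the block calculation is a plain round-up division
rather than what gfs2_struct2blk() really does.

```c
/* Hypothetical cut-down counter set; only the revoke count is modelled. */
struct log_counters {
	int committed_revoke;	/* signed, so it may dip below zero */
};

/* Apply one finished transaction's revokes and unrevokes to the counter. */
void log_refund_revokes(struct log_counters *c, unsigned revokes,
			unsigned unrevokes)
{
	c->committed_revoke += (int)revokes - (int)unrevokes;
}

/*
 * Reserve blocks for revokes only while the counter is positive, mirroring
 * the "> 0" test in the patch. per_block is the assumed number of revokes
 * that fit in one log block.
 */
unsigned revoke_blocks_needed(const struct log_counters *c, unsigned per_block)
{
	if (c->committed_revoke > 0)
		return ((unsigned)c->committed_revoke + per_block - 1) / per_block;
	return 0;
}
```

If the unrevoking transaction finishes first, the counter sits at a negative
value until the revoking transaction is refunded; with an unsigned counter the
same sequence would wrap to a very large value instead.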



[Cluster-devel] [PATCH 1/3] GFS2: do not select QUOTA

2010-03-11 Thread Steven Whitehouse
From: Christoph Hellwig h...@infradead.org

gfs2 only needs the quotactl code, not the generic quota implementation.

Signed-off-by: Christoph Hellwig h...@lst.de
Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/Kconfig |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig
index 4dcddf8..a47b431 100644
--- a/fs/gfs2/Kconfig
+++ b/fs/gfs2/Kconfig
@@ -8,7 +8,6 @@ config GFS2_FS
select FS_POSIX_ACL
select CRC32
select SLOW_WORK
-   select QUOTA
select QUOTACTL
help
  A cluster filesystem.
-- 
1.6.2.5



[Cluster-devel] [PATCH 3/3] GFS2: Skip check for mandatory locks when unlocking

2010-03-11 Thread Steven Whitehouse
From: Sachin Prabhu spra...@redhat.com

gfs2_lock() will skip locks on files which have mode set to 02666. This is a 
problem in cases where the mode of the file is changed after a process has 
obtained a lock on the file. Such a lock will be skipped and will result in a 
BUG in locks_remove_flock().

gfs2_lock() should skip the check for mandatory locks when unlocking a file.

Signed-off-by: Sachin Prabhu spra...@redhat.com
Signed-off-by: Steven Whitehouse swhit...@redhat.com
---
 fs/gfs2/file.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index a6abbae..e6dd2ae 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -640,7 +640,7 @@ static int gfs2_lock(struct file *file, int cmd, struct file_lock *fl)
 
if (!(fl->fl_flags & FL_POSIX))
return -ENOLCK;
-   if (__mandatory_lock(&ip->i_inode))
+   if (__mandatory_lock(&ip->i_inode) && fl->fl_type != F_UNLCK)
return -ENOLCK;
 
if (cmd == F_CANCELLK) {
-- 
1.6.2.5
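
The check changed by the patch above can be modelled in userspace like this.
It is a sketch only: mode_is_mandatory() and check_lock_request() are
hypothetical helpers, and the mode test assumes the usual convention that a
file is a mandatory locking candidate when setgid is set and group execute is
clear, which is what a mode of 02666 gives.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Setgid set and group execute clear, e.g. mode 02666. */
static inline bool mode_is_mandatory(mode_t mode)
{
	return (mode & (S_ISGID | S_IXGRP)) == S_ISGID;
}

/*
 * Refuse new lock requests on a mandatory-lock file, but always let an
 * unlock through so that a lock taken before the mode change can still
 * be released.
 */
int check_lock_request(mode_t mode, int lock_type)
{
	if (mode_is_mandatory(mode) && lock_type != F_UNLCK)
		return -ENOLCK;
	return 0;
}
```

A write lock request on a 02666 file is refused, while an F_UNLCK request on
the same file is allowed to proceed.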



[Cluster-devel] GFS2: Pull request (fixes)

2010-03-11 Thread Steven Whitehouse
Hi,

Please consider pulling the following small fixes,

Steve.

--
The following changes since commit 57d54889cd00db2752994b389ba714138652e60c:
  Linus Torvalds (1):
Linux 2.6.34-rc1

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes.git master

Benjamin Marzinski (1):
  GFS2: Allow the number of committed revokes to temporarily be negative

Christoph Hellwig (1):
  GFS2: do not select QUOTA

Sachin Prabhu (1):
  GFS2: Skip check for mandatory locks when unlocking

 fs/gfs2/Kconfig  |1 -
 fs/gfs2/file.c   |2 +-
 fs/gfs2/incore.h |2 +-
 fs/gfs2/log.c|3 +--
 4 files changed, 3 insertions(+), 5 deletions(-)




[Cluster-devel] GFS2: New truncate sequence

2010-03-12 Thread Steven Whitehouse

I've been working on this on and off in spare moments. This has now
passed a few basic tests, so I think it's time to dust it down and post
it more widely.

There are in fact two parts to this: the second part is to remove the
i_disksize variable since, after this initial patch, it will always be
identical to the inode's i_size. That's a simple exercise for a follow-up
patch.

This is a nice clean up of the truncate code, reducing the code size by
approx 50 lines of code. This patch also ensures that we correctly
truncate files which have been extended but not written to (e.g. if the
copy from userspace results in a segfault).

Signed-off-by: Steven Whitehouse swhit...@redhat.com
Cc: Nick Piggin npig...@suse.de


diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 0c1d0b8..371bea5 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -698,8 +698,11 @@ out:
return 0;
 
page_cache_release(page);
+   gfs2_trans_end(sdp);
if (pos + len > ip->i_inode.i_size)
-   vmtruncate(&ip->i_inode, ip->i_inode.i_size);
+   gfs2_trim_blocks(&ip->i_inode);
+   goto out_trans_fail;
+
 out_endtrans:
gfs2_trans_end(sdp);
 out_trans_fail:
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 583e823..56aed67 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -884,83 +884,14 @@ out:
 }
 
 /**
- * do_grow - Make a file look bigger than it is
- * @ip: the inode
- * @size: the size to set the file to
- *
- * Called with an exclusive lock on @ip.
- *
- * Returns: errno
- */
-
-static int do_grow(struct gfs2_inode *ip, u64 size)
-{
-   struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
-   struct gfs2_alloc *al;
-   struct buffer_head *dibh;
-   int error;
-
-   al = gfs2_alloc_get(ip);
-   if (!al)
-   return -ENOMEM;
-
-   error = gfs2_quota_lock_check(ip);
-   if (error)
-   goto out;
-
-   al->al_requested = sdp->sd_max_height + RES_DATA;
-
-   error = gfs2_inplace_reserve(ip);
-   if (error)
-   goto out_gunlock_q;
-
-   error = gfs2_trans_begin(sdp,
-   sdp->sd_max_height + al->al_rgd->rd_length +
-   RES_JDATA + RES_DINODE + RES_STATFS + RES_QUOTA, 0);
-   if (error)
-   goto out_ipres;
-
-   error = gfs2_meta_inode_buffer(ip, &dibh);
-   if (error)
-   goto out_end_trans;
-
-   if (size > sdp->sd_sb.sb_bsize - sizeof(struct gfs2_dinode)) {
-   if (gfs2_is_stuffed(ip)) {
-   error = gfs2_unstuff_dinode(ip, NULL);
-   if (error)
-   goto out_brelse;
-   }
-   }
-
-   ip->i_disksize = size;
-   ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME;
-   gfs2_trans_add_bh(ip->i_gl, dibh, 1);
-   gfs2_dinode_out(ip, dibh->b_data);
-
-out_brelse:
-   brelse(dibh);
-out_end_trans:
-   gfs2_trans_end(sdp);
-out_ipres:
-   gfs2_inplace_release(ip);
-out_gunlock_q:
-   gfs2_quota_unlock(ip);
-out:
-   gfs2_alloc_put(ip);
-   return error;
-}
-
-
-/**
  * gfs2_block_truncate_page - Deal with zeroing out data for truncate
  *
  * This is partly borrowed from ext3.
  */
-static int gfs2_block_truncate_page(struct address_space *mapping)
+static int gfs2_block_truncate_page(struct address_space *mapping, loff_t from)
 {
struct inode *inode = mapping->host;
struct gfs2_inode *ip = GFS2_I(inode);
-   loff_t from = inode->i_size;
unsigned long index = from >> PAGE_CACHE_SHIFT;
unsigned offset = from & (PAGE_CACHE_SIZE-1);
unsigned blocksize, iblock, length, pos;
@@ -1022,9 +953,11 @@ unlock:
return err;
 }
 
-static int trunc_start(struct gfs2_inode *ip, u64 size)
+static int trunc_start(struct inode *inode, u64 oldsize, u64 newsize)
 {
-   struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
+   struct gfs2_inode *ip = GFS2_I(inode);
+   struct gfs2_sbd *sdp = GFS2_SB(inode);
+   struct address_space *mapping = inode->i_mapping;
struct buffer_head *dibh;
int journaled = gfs2_is_jdata(ip);
int error;
@@ -1038,29 +971,27 @@ static int trunc_start(struct gfs2_inode *ip, u64 size)
if (error)
goto out;
 
-   if (gfs2_is_stuffed(ip)) {
-   ip->i_disksize = size;
-   ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME;
-   gfs2_trans_add_bh(ip->i_gl, dibh, 1);
-   gfs2_dinode_out(ip, dibh->b_data);
-   gfs2_buffer_clear_tail(dibh, sizeof(struct gfs2_dinode) + size);
-   error = 1;
+   gfs2_trans_add_bh(ip->i_gl, dibh, 1);
 
+   if (gfs2_is_stuffed(ip)) {
+   gfs2_buffer_clear_tail(dibh, sizeof(struct gfs2_dinode) + newsize);
} else {
-   if (size & (u64)(sdp->sd_sb.sb_bsize - 1))
-   error = gfs2_block_truncate_page(ip->i_inode.i_mapping);
-
-   if (!error

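In gfs2_block_truncate_page() above, the byte position `from` is split into a
page index and an offset within that page using a shift and a mask. The
following is a small standalone sketch of that arithmetic in C. It assumes
4 KiB pages; SKETCH_PAGE_SHIFT, SKETCH_PAGE_SIZE, struct page_pos and
split_pos() are hypothetical names used only for illustration and stand in
for the kernel's PAGE_CACHE_SHIFT and PAGE_CACHE_SIZE.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages. The kernel value depends on the architecture. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Hypothetical container for the two values the kernel code computes. */
struct page_pos {
    uint64_t index;   /* which page the byte position falls in */
    unsigned offset;  /* byte offset inside that page */
};

/* Split a byte position into (page index, offset within page). */
static struct page_pos split_pos(uint64_t from)
{
    struct page_pos p;

    p.index = from >> SKETCH_PAGE_SHIFT;
    p.offset = (unsigned)(from & (SKETCH_PAGE_SIZE - 1));
    return p;
}
```

Passing the position in as a parameter, as the patch does with `from`, lets
the caller choose the truncation point instead of always using the inode's
current i_size.
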
[Cluster-devel] GFS2 nmw git tree

2010-03-23 Thread Steven Whitehouse
Hi,

I've rebased this now that I have a couple of new patches since the last
pull. I'm still working on the new truncate patches and I'll post an
updated set of those patches once I've tracked down a couple of issues
I've found in testing,

Steve.



