from:"Bob Peterson"

Re: [Cluster-devel] [GFS2] glock dump with hex lock name

2007-08-21 Thread Bob Peterson

On Tue, 2007-08-21 at 10:38 -0400, Wendy Cheng wrote:
> Trivial change ... Wendy
>
Wendy / Abhi,

Can we please change this to have 0x in front?
+   print_dbg(gi, "Glock 0x%p (%u, 0x%llx)\n", gl, gl->gl_name.ln_type,

That'll make it more user-friendly to paste into gfs2_edit.

Bob
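
For illustration, a minimal userspace sketch of why the 0x prefix helps. The helper names format_glock_name() and parse_number() are hypothetical and not part of GFS2 or gfs2_edit; the only assumption is that the consuming tool parses numbers with base auto-detection, as strtoull() does with base 0.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: prints a lock type and lock number the way the
 * amended dump line would, with an explicit 0x prefix on the number. */
static int format_glock_name(char *buf, size_t len, unsigned int type,
                             unsigned long long number)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "(%u, 0x%llx)", type, number);
}

/* Hypothetical parser using base auto-detection (base 0): text with a
 * 0x prefix is read as hex, while bare digits are read as decimal. */
static unsigned long long parse_number(const char *text)
{
    return strtoull(text, NULL, 0);
}
```

Without the prefix, a pasted value such as 1234 would be ambiguous to any parser that auto-detects the base.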

Re: [Cluster-devel] [PATCH] Cleanup some common CFLAGS

2007-08-27 Thread Bob Peterson

On Sun, 2007-08-26 at 07:35 +0200, Fabio Massimo Di Nitto wrote:
> The following patch does:
>
> - change the default CFLAGS to -Wall -O2 -g
> - add --debug option to configure that will override the default
>   CFLAGS to -Wall -O0 -DDEBUG -g
> - clean up all the relevant Makefiles
> - add a few missing ; to configure script (almost cosmetic since
>   perl didn't trip on them).
>
> NOTE that some subproject were using -ggdb with notes that had to be removed.
> This is basically the only major change in the set since it will switch to a
> more common -g. According to gcc 4.1 man page -g is still good enough to
> generate enough debugging information for gdb.
>
> Please ACK or apply.
>
> Fabio
>
> PS The patch depends on the previous Makefile cleanup I posted on the mailing
> list right before this one.

ACK, but I only looked at the defaults and the gfs2 userland bits.

Regards,

Bob Peterson

[Cluster-devel] [PATCH] [GFS2] bz 276631 : GFS2: chmod hung

2007-09-13 Thread Bob Peterson

The following is a patch for bugzilla bug 276631.

The problem boiled down to a race between the gdlm_init_threads()
function initializing thread1 and its setting of blist = 1.
Essentially, if (current == ls->thread1) was checked by the thread
before the thread creator set ls->thread1.

Since thread1 is the only thread who is allowed to work on the
blocking queue, and since neither thread thought it was thread1, no one
was working on the queue.  So everything just sat.

This patch reuses the ls->async_lock spin_lock to fix the race,
and it fixes the problem.  I've done more than 2000 iterations of the
loop that was recreating the failure and it seems to work.

Dave Teigland brought up the question of whether we should do this
another way.  For example, by checking for the task name lock_dlm1
instead.  I'm open to opinions.
--
Signed-off-by: Bob Peterson [EMAIL PROTECTED] 
--
diff -pur a/fs/gfs2/locking/dlm/thread.c b/fs/gfs2/locking/dlm/thread.c
--- a/fs/gfs2/locking/dlm/thread.c  2007-09-13 15:51:08.0 -0500
+++ b/fs/gfs2/locking/dlm/thread.c  2007-09-13 15:21:07.0 -0500
@@ -279,8 +279,10 @@ static int gdlm_thread(void *data)
/* Only thread1 is allowed to do blocking callbacks since gfs
   may wait for a completion callback within a blocking cb. */
 
+   spin_lock(&ls->async_lock);
if (current == ls->thread1)
blist = 1;
+   spin_unlock(&ls->async_lock);
 
while (!kthread_should_stop()) {
set_current_state(TASK_INTERRUPTIBLE);
@@ -338,6 +340,7 @@ int gdlm_init_threads(struct gdlm_ls *ls
struct task_struct *p;
int error;
 
+   spin_lock(&ls->async_lock);
p = kthread_run(gdlm_thread, ls, "lock_dlm1");
error = IS_ERR(p);
if (error) {
@@ -354,6 +357,7 @@ int gdlm_init_threads(struct gdlm_ls *ls
return error;
}
ls->thread2 = p;
+   spin_unlock(&ls->async_lock);
 
return 0;
 }

[Cluster-devel] [PATCH] [GFS2] bz 276631 : GFS2: chmod hung - TRY 3

2007-09-14 Thread Bob Peterson

This is a rewrite of the patch.  We decided it was a better
approach to call separate wrapper functions than trying to work around
the problem with a spin_lock.
--
The problem boiled down to a race between the gdlm_init_threads()
function initializing thread1 and its setting of blist = 1.
Essentially, if (current == ls->thread1) was checked by the thread
before the thread creator set ls->thread1.

Since thread1 is the only thread who is allowed to work on the
blocking queue, and since neither thread thought it was thread1, no one
was working on the queue.  So everything just sat.

This patch reuses the ls->async_lock spin_lock to fix the race,
and it fixes the problem.  I've done more than 2000 iterations of the
loop that was recreating the failure and it seems to work.

Dave Teigland brought up the question of whether we should do this
another way.  For example, by checking for the task name lock_dlm1
instead.  I'm open to opinions.
--
Signed-off-by: Bob Peterson [EMAIL PROTECTED] 
--
diff -pur a/fs/gfs2/locking/dlm/thread.c b/fs/gfs2/locking/dlm/thread.c
--- a/fs/gfs2/locking/dlm/thread.c  2007-09-13 17:33:58.0 -0500
+++ b/fs/gfs2/locking/dlm/thread.c  2007-09-14 09:16:07.0 -0500
@@ -268,20 +268,16 @@ static inline int check_drop(struct gdlm
return 0;
 }
 
-static int gdlm_thread(void *data)
+static int gdlm_thread(void *data, int blist)
 {
struct gdlm_ls *ls = (struct gdlm_ls *) data;
struct gdlm_lock *lp = NULL;
-   int blist = 0;
uint8_t complete, blocking, submit, drop;
DECLARE_WAITQUEUE(wait, current);
 
/* Only thread1 is allowed to do blocking callbacks since gfs
   may wait for a completion callback within a blocking cb. */
 
-   if (current == ls->thread1)
-   blist = 1;
-
while (!kthread_should_stop()) {
set_current_state(TASK_INTERRUPTIBLE);
add_wait_queue(&ls->thread_wait, &wait);
@@ -333,12 +329,22 @@ static int gdlm_thread(void *data)
return 0;
 }
 
+static int gdlm_thread1(void *data)
+{
+   return gdlm_thread(data, 1);
+}
+
+static int gdlm_thread2(void *data)
+{
+   return gdlm_thread(data, 0);
+}
+
 int gdlm_init_threads(struct gdlm_ls *ls)
 {
struct task_struct *p;
int error;
 
-   p = kthread_run(gdlm_thread, ls, "lock_dlm1");
+   p = kthread_run(gdlm_thread1, ls, "lock_dlm1");
error = IS_ERR(p);
if (error) {
log_error("can't start lock_dlm1 thread %d", error);
@@ -346,7 +352,7 @@ int gdlm_init_threads(struct gdlm_ls *ls
}
ls->thread1 = p;
 
-   p = kthread_run(gdlm_thread, ls, "lock_dlm2");
+   p = kthread_run(gdlm_thread2, ls, "lock_dlm2");
error = IS_ERR(p);
if (error) {
log_error("can't start lock_dlm2 thread %d", error);
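
The wrapper approach in the rewritten patch can be sketched outside the kernel roughly as follows. This is only an illustration under stated assumptions: struct lockspace, worker(), worker1(), worker2() and run_entry() are hypothetical names, and run_entry() calls the entry point synchronously instead of creating a real thread. The point it shows is that the role (blist) is carried by the entry point itself, so the worker never has to compare itself against a handle the creator may not have stored yet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-lockspace state. */
struct lockspace {
    int blocking_handled;    /* work taken from the blocking queue */
    int completions_handled; /* work taken from the completion queue */
};

/* Shared worker body. The role arrives as an argument, so it is known
 * before the worker runs its first statement; nothing depends on what
 * the creator has or has not assigned yet. */
static int worker(void *data, int blist)
{
    struct lockspace *ls = data;

    assert(ls != NULL);
    if (blist)
        ls->blocking_handled++;  /* only the blist worker touches this */
    ls->completions_handled++;
    return 0;
}

/* Two thin entry points with the int (*)(void *) shape that a thread
 * creation call expects; each one hard-codes its role. */
static int worker1(void *data) { return worker(data, 1); }
static int worker2(void *data) { return worker(data, 0); }

/* Hypothetical stand-in for thread creation: runs the entry point
 * synchronously so the sketch stays single-threaded. */
static int run_entry(int (*fn)(void *), void *data)
{
    return fn(data);
}
```

The cost is two tiny extra functions; the benefit is that no lock or ordering is needed between the creator's assignment and the worker's first check.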

[Cluster-devel] [PATCH][GFS2] Given device ID rather than s_id in id sysfs file

2007-11-02 Thread Bob Peterson

Hi,

This patch changes the /sys/fs/gfs2/s_id/id file to give the device
id major:minor rather than the s_id.  That enables gfs2_tool to
match devices properly (by id, not name) when locating the tuning files.

Regards,

Bob Peterson
--
Signed-off-by: Bob Peterson [EMAIL PROTECTED]
--
 fs/gfs2/sys.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index 06e0b77..10807b7 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -32,7 +32,8 @@ spinlock_t gfs2_sys_margs_lock;
 
 static ssize_t id_show(struct gfs2_sbd *sdp, char *buf)
 {
-   return snprintf(buf, PAGE_SIZE, "%s\n", sdp->sd_vfs->s_id);
+   return snprintf(buf, PAGE_SIZE, "%u:%u\n",
+   MAJOR(sdp->sd_vfs->s_dev), MINOR(sdp->sd_vfs->s_dev));
 }
 
 static ssize_t fsname_show(struct gfs2_sbd *sdp, char *buf)
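
As a rough userspace counterpart, a tool could build the same major:minor string from a dev_t and compare it with what the sysfs file contains. This is a sketch only: format_dev_id() and dev_id_matches() are hypothetical helpers, not gfs2_tool code, and it assumes Linux with glibc, where major(), minor() and makedev() come from <sys/sysmacros.h>.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/sysmacros.h>

/* Hypothetical helper: renders a device number in the same
 * "major:minor\n" form that the patched id file would contain. */
static int format_dev_id(char *buf, size_t len, dev_t dev)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%u:%u\n", major(dev), minor(dev));
}

/* Hypothetical helper: returns nonzero when the text read from the
 * sysfs id file names the given device. */
static int dev_id_matches(const char *sysfs_text, dev_t dev)
{
    char expect[32];

    format_dev_id(expect, sizeof(expect), dev);
    return strcmp(sysfs_text, expect) == 0;
}
```

Matching on the device number rather than the name avoids ambiguity when two mounts report similar names.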

Re: [Cluster-devel] [PATCH][GFS2] Given device ID rather than s_id in id sysfs file

2007-11-02 Thread Bob Peterson

On Fri, 2007-11-02 at 09:55 -0500, David Teigland wrote:
> We have to be extremely cautious when changing the kernel abi like this;
> have you verified that it doesn't break any existing programs?

Dave has a good point here.  I've verified that no other gfs2 util
uses the id except for gfs2_tool.  However, this kernel change
makes previous userland versions of gfs2_tool stop working in some
cases, for example, gfs2_tool gettune and settune.
Customers would need to upgrade their gfs2_tool to use the new
gfs2 kernel module.

For that reason, perhaps we should revert this patch and instead
export the device id to a separate /sys/fs file called
/sys/fs/gfs2/s_id/device_id or some such.  That would ensure
backward compatibility with older userland tools.

BTW, this also led me to discover that the gfs2 quota tool
uses /sys/fs/gfs2/<lock table>/xxx which, in the case of stand-alone
file systems, would be a NULL string.  The quota tool should
be changed to use the same interface as gfs2_tool to determine
the proper path in sysfs.  Perhaps I'll open a bugzilla record on it.

Regards,

Bob Peterson
Red Hat Cluster Suite

[Cluster-devel] [GFS2][Patch] Remove function gfs2_get_block

2007-12-10 Thread Bob Peterson

Hi,

This patch is just a cleanup.  Function gfs2_get_block() just calls
function gfs2_block_map reversing the last two parameters.  By
reversing the parameters, gfs2_block_map() may be called directly
and function gfs2_get_block may be eliminated altogether.
Since this function is called for every block operation,
this streamlines the code and makes it a little bit more efficient.

Regards,

Bob Peterson

Signed-off-by: Bob Peterson [EMAIL PROTECTED] 
--
 fs/gfs2/bmap.c|8 
 fs/gfs2/bmap.h|2 +-
 fs/gfs2/log.c |2 +-
 fs/gfs2/ops_address.c |   30 +++---
 fs/gfs2/ops_address.h |2 --
 fs/gfs2/ops_file.c|2 +-
 fs/gfs2/quota.c   |4 ++--
 fs/gfs2/recovery.c|2 +-
 8 files changed, 17 insertions(+), 35 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 1cfd493..4948602 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -452,8 +452,8 @@ static inline void bmap_unlock(struct inode *inode, int create)
  * Returns: errno
  */
 
-int gfs2_block_map(struct inode *inode, u64 lblock, int create,
-  struct buffer_head *bh_map)
+int gfs2_block_map(struct inode *inode, sector_t lblock,
+  struct buffer_head *bh_map, int create)
 {
struct gfs2_inode *ip = GFS2_I(inode);
struct gfs2_sbd *sdp = GFS2_SB(inode);
@@ -559,7 +559,7 @@ int gfs2_extent_map(struct inode *inode, u64 lblock, int *new, u64 *dblock, unsi
BUG_ON(!new);
 
bh.b_size = 1 << (inode->i_blkbits + 5);
-   ret = gfs2_block_map(inode, lblock, create, &bh);
+   ret = gfs2_block_map(inode, lblock, &bh, create);
*extlen = bh.b_size >> inode->i_blkbits;
*dblock = bh.b_blocknr;
if (buffer_new(&bh))
@@ -909,7 +909,7 @@ static int gfs2_block_truncate_page(struct address_space *mapping)
err = 0;
 
if (!buffer_mapped(bh)) {
-   gfs2_get_block(inode, iblock, bh, 0);
+   gfs2_block_map(inode, iblock, bh, 0);
/* unmapped? It's a hole - nothing to do */
if (!buffer_mapped(bh))
goto unlock;
diff --git a/fs/gfs2/bmap.h b/fs/gfs2/bmap.h
index ac2fd04..4e6cde2 100644
--- a/fs/gfs2/bmap.h
+++ b/fs/gfs2/bmap.h
@@ -15,7 +15,7 @@ struct gfs2_inode;
 struct page;
 
 int gfs2_unstuff_dinode(struct gfs2_inode *ip, struct page *page);
-int gfs2_block_map(struct inode *inode, u64 lblock, int create, struct buffer_head *bh);
+int gfs2_block_map(struct inode *inode, sector_t lblock, struct buffer_head *bh, int create);
 int gfs2_extent_map(struct inode *inode, u64 lblock, int *new, u64 *dblock, unsigned *extlen);
 
 int gfs2_truncatei(struct gfs2_inode *ip, u64 size);
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 143311c..7f9ab89 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -337,7 +337,7 @@ static u64 log_bmap(struct gfs2_sbd *sdp, unsigned int lbn)
struct buffer_head bh_map = { .b_state = 0, .b_blocknr = 0 };
 
bh_map.b_size = 1 << inode->i_blkbits;
-   error = gfs2_block_map(inode, lbn, 0, &bh_map);
+   error = gfs2_block_map(inode, lbn, &bh_map, 0);
if (error || !bh_map.b_blocknr)
printk(KERN_INFO "error=%d, dbn=%llu lbn=%u", error,
   (unsigned long long)bh_map.b_blocknr, lbn);
diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c
index 7353933..8f94e30 100644
--- a/fs/gfs2/ops_address.c
+++ b/fs/gfs2/ops_address.c
@@ -59,22 +59,6 @@ static void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
 }
 
 /**
- * gfs2_get_block - Fills in a buffer head with details about a block
- * @inode: The inode
- * @lblock: The block number to look up
- * @bh_result: The buffer head to return the result in
- * @create: Non-zero if we may add block to the file
- *
- * Returns: errno
- */
-
-int gfs2_get_block(struct inode *inode, sector_t lblock,
-  struct buffer_head *bh_result, int create)
-{
-   return gfs2_block_map(inode, lblock, create, bh_result);
-}
-
-/**
  * gfs2_get_block_noalloc - Fills in a buffer head with details about a block
  * @inode: The inode
  * @lblock: The block number to look up
@@ -89,7 +73,7 @@ static int gfs2_get_block_noalloc(struct inode *inode, sector_t lblock,
 {
int error;
 
-   error = gfs2_block_map(inode, lblock, 0, bh_result);
+   error = gfs2_block_map(inode, lblock, bh_result, 0);
if (error)
return error;
if (!buffer_mapped(bh_result))
@@ -100,7 +84,7 @@ static int gfs2_get_block_noalloc(struct inode *inode, sector_t lblock,
 static int gfs2_get_block_direct(struct inode *inode, sector_t lblock,
 struct buffer_head *bh_result, int create)
 {
-   return gfs2_block_map(inode, lblock, 0, bh_result);
+   return gfs2_block_map(inode, lblock, bh_result, 0);
 }
 
 /**
@@ -504,7 +488,7 @@ static int __gfs2_readpage(void *file, struct page *page)
error = stuffed_readpage
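
The idea behind this cleanup can be sketched in plain C: once the mapping function takes its parameters in the same order as the callback type, it can be handed to generic code directly and the adapter function disappears. The names map_fn, block_map() and map_one() below are hypothetical, the types are simplified stand-ins for the kernel ones, and the mapping itself is a placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type, shaped like a get_block-style hook:
 * (file, logical block, result, create flag). */
typedef int (*map_fn)(int file, unsigned long lblock,
                      unsigned long *result, int create);

/* Mapper declared in callback order, so it needs no adapter.
 * The arithmetic (lblock + 1000) is only a placeholder mapping. */
static int block_map(int file, unsigned long lblock,
                     unsigned long *result, int create)
{
    (void)file;
    (void)create;
    assert(result != NULL);
    *result = lblock + 1000;
    return 0;
}

/* Generic caller that knows only the callback type. */
static int map_one(map_fn fn, int file, unsigned long lblock,
                   unsigned long *result)
{
    return fn(file, lblock, result, 0);
}
```

Calling map_one(block_map, ...) passes the mapper itself, which removes one call level from every lookup.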

[Cluster-devel] [GFS2][Patch 0/10] GFS2 performance tweaks

2007-12-11 Thread Bob Peterson

Hi GFS2 Folks,

The following is a set of ten patches designed to give gfs2 somewhat
better performance than before.  Some are trivial and may or may not
have a real impact on performance.  I'll let Steve Whitehouse decide
which ones should go upstream and which ones aren't worth it.
The patches may be summarized as follows:

1. Journal extent mapping
2. Get rid of useless found variable in quota.c
3. Run through full bitmaps quicker in gfs2_bitfit
4. Get rid of sd_statfs_mutex
5. Shortcut in gfs2_write_alloc_required if writing past eof.
6. Reorganize function gfs2_glmutex_lock
7. Only fetch the dinode once in block_map
8. Only find indirect pointer buffers once in block_map
9. Move meta_inval to glops.c and declare static, move attach_bufdata
   to trans.c and declare static.
10. Function meta_read optimization.

Regards,

Bob Peterson
Red Hat GFS

[Cluster-devel] [GFS2] [Patch 1/10] Journal extent mapping

2007-12-11 Thread Bob Peterson

Hi,

This patch saves a little time when gfs2 writes to the journals by
keeping a mapping between logical and physical blocks on disk.
That's better than constantly looking up indirect pointers in
buffers when the journals are several levels of indirection deep
(which they typically are).

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson [EMAIL PROTECTED] 
--
 fs/gfs2/incore.h |   11 +++-
 fs/gfs2/log.c|   22 ++-
 fs/gfs2/ops_fstype.c |   68 +-
 fs/gfs2/super.c  |   13 -
 4 files changed, 97 insertions(+), 17 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 743ebba..14862d1 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -1,6 +1,6 @@
 /*
  * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
- * Copyright (C) 2004-2006 Red Hat, Inc.  All rights reserved.
+ * Copyright (C) 2004-2007 Red Hat, Inc.  All rights reserved.
  *
  * This copyrighted material is made available to anyone wishing to use,
  * modify, copy, or redistribute it subject to the terms and conditions
@@ -360,8 +360,17 @@ struct gfs2_ail {
u64 ai_sync_gen;
 };
 
+struct gfs2_journal_extent {
+   struct list_head extent_list;
+
+   unsigned int lblock; /* First logical block */
+   u64 dblock; /* First disk block */
+   u64 blocks;
+};
+
 struct gfs2_jdesc {
struct list_head jd_list;
+   struct list_head extent_list;
 
struct inode *jd_inode;
unsigned int jd_jid;
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 9bece94..0833e27 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -1,6 +1,6 @@
 /*
  * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
- * Copyright (C) 2004-2006 Red Hat, Inc.  All rights reserved.
+ * Copyright (C) 2004-2007 Red Hat, Inc.  All rights reserved.
  *
  * This copyrighted material is made available to anyone wishing to use,
  * modify, copy, or redistribute it subject to the terms and conditions
@@ -332,18 +332,14 @@ retry:
 
 static u64 log_bmap(struct gfs2_sbd *sdp, unsigned int lbn)
 {
-   struct inode *inode = sdp->sd_jdesc->jd_inode;
-   int error;
-   struct buffer_head bh_map = { .b_state = 0, .b_blocknr = 0 };
-
-   bh_map.b_size = 1 << inode->i_blkbits;
-   error = gfs2_block_map(inode, lbn, &bh_map, 0);
-   if (error || !bh_map.b_blocknr)
-   printk(KERN_INFO "error=%d, dbn=%llu lbn=%u", error,
-  (unsigned long long)bh_map.b_blocknr, lbn);
-   gfs2_assert_withdraw(sdp, !error && bh_map.b_blocknr);
-
-   return bh_map.b_blocknr;
+   struct gfs2_journal_extent *je;
+
+   list_for_each_entry(je, &sdp->sd_jdesc->extent_list, extent_list) {
+   if (lbn >= je->lblock && lbn < je->lblock + je->blocks)
+   return je->dblock + lbn;
+   }
+
+   return -1;
 }
 
 /**
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 6f5fa5e..35ec630 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1,6 +1,6 @@
 /*
  * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
- * Copyright (C) 2004-2006 Red Hat, Inc.  All rights reserved.
+ * Copyright (C) 2004-2007 Red Hat, Inc.  All rights reserved.
  *
  * This copyrighted material is made available to anyone wishing to use,
  * modify, copy, or redistribute it subject to the terms and conditions
@@ -21,6 +21,7 @@
 
 #include "gfs2.h"
 #include "incore.h"
+#include "bmap.h"
 #include "daemon.h"
 #include "glock.h"
 #include "glops.h"
@@ -303,6 +304,68 @@ out:
return error;
 }
 
+/**
+ * map_journal_extents - create a reusable extent mapping from all logical
+ * blocks to all physical blocks for the given journal.  This will save
+ * us time when writing journal blocks.  Most journals will have only one
+ * extent that maps all their logical blocks.  That's because gfs2.mkfs
+ * arranges the journal blocks sequentially to maximize performance.
+ * So the extent would map the first block for the entire file length.
+ * However, gfs2_jadd can happen while file activity is happening, so
+ * those journals may not be sequential.  Less likely is the case where
+ * the users created their own journals by mounting the metafs and
+ * laying it out.  But it's still possible.  These journals might have
+ * several extents.
+ *
+ * TODO: This should be done in bigger chunks rather than one block at a time,
+ *   but since it's only done at mount time, I'm not worried about the
+ *   time it takes.
+ */
+static int map_journal_extents(struct gfs2_sbd *sdp)
+{
+   struct gfs2_jdesc *jd = sdp->sd_jdesc;
+   unsigned int lb;
+   u64 db, prev_db; /* logical block, disk block, prev disk block */
+   struct gfs2_inode *ip = GFS2_I(jd->jd_inode);
+   struct gfs2_journal_extent *jext = NULL;
+   struct buffer_head bh;
+   int rc = 0;
+
+   INIT_LIST_HEAD(&jd->extent_list);
+   prev_db = 0;
+
+   for (lb = 0; lb < ip
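
The extent idea can be sketched in userspace as follows. This is not the patch's code: struct journal_extent, build_extents(), extent_lookup() and NO_BLOCK are hypothetical names, and an array stands in for the kernel list. Note one deliberate assumption: the lookup here returns the extent's first disk block plus the offset of the logical block within that extent, which is what a multi-extent journal needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace analog of a journal extent. */
struct journal_extent {
    unsigned int lblock; /* first logical block */
    uint64_t dblock;     /* first disk block */
    uint64_t blocks;     /* number of blocks in the extent */
};

#define NO_BLOCK UINT64_MAX /* hypothetical "not mapped" marker */

/* Builds extents from a per-block logical-to-disk table, one block at
 * a time, merging each block that continues the previous extent.
 * Returns the number of extents written (at most max). */
static size_t build_extents(const uint64_t *dblocks, size_t nblocks,
                            struct journal_extent *out, size_t max)
{
    size_t n = 0;

    assert(dblocks != NULL && out != NULL);
    for (size_t lb = 0; lb < nblocks; lb++) {
        if (n > 0 && dblocks[lb] == out[n - 1].dblock + out[n - 1].blocks) {
            out[n - 1].blocks++; /* contiguous: grow the extent */
            continue;
        }
        if (n == max)
            return n;            /* out of room: stop early */
        out[n].lblock = (unsigned int)lb;
        out[n].dblock = dblocks[lb];
        out[n].blocks = 1;
        n++;
    }
    return n;
}

/* Maps a logical block to a disk block by scanning the extents. */
static uint64_t extent_lookup(const struct journal_extent *ext, size_t n,
                              unsigned int lbn)
{
    for (size_t i = 0; i < n; i++) {
        const struct journal_extent *je = &ext[i];

        if (lbn >= je->lblock && lbn < je->lblock + je->blocks)
            return je->dblock + (lbn - je->lblock);
    }
    return NO_BLOCK;
}
```

For a journal laid out sequentially this yields a single extent, so each lookup is one comparison instead of a walk through indirect blocks.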

[Cluster-devel] [GFS2] [Patch 3/10] Run through full bitmaps quicker in gfs2_bitfit

2007-12-11 Thread Bob Peterson

Hi,

I eliminated the passing of an unused parameter into gfs2_bitfit called rgd.

This also changes the gfs2_bitfit code that searches for free (or used) blocks.
Before, the code was trying to check for bytes that indicated 4 blocks in
the undesired state.  The problem is, it was spending more time trying to
do this than it actually was saving.  This version only optimizes the case
where we're looking for free blocks, and it checks a machine word at a time.
So on 32-bit machines, it will check 32-bits (16 blocks) and on 64-bit
machines, it will check 64-bits (32 blocks) at a time.  The compiler
optimizes that quite well and we save some time, especially when running
through full bitmaps (like the bitmaps allocated for the journals).

There's probably a more elegant or optimized way to do this, but I haven't
thought of it yet.  I'm open to suggestions.

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson [EMAIL PROTECTED] 
--
 .../fs/gfs2/rgrp.c |   54 +++-
 70 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/gfs2-2.6.git.patch2/fs/gfs2/rgrp.c b/gfs2-2.6.git.patch3/fs/gfs2/rgrp.c
index e0ee195..d7ff9cf 100644
--- a/gfs2-2.6.git.patch2/fs/gfs2/rgrp.c
+++ b/gfs2-2.6.git.patch3/fs/gfs2/rgrp.c
@@ -126,41 +126,46 @@ static unsigned char gfs2_testbit(struct gfs2_rgrpd *rgd, unsigned char *buffer,
  * Return: the block number (bitmap buffer scope) that was found
  */
 
-static u32 gfs2_bitfit(struct gfs2_rgrpd *rgd, unsigned char *buffer,
-   unsigned int buflen, u32 goal,
-   unsigned char old_state)
+static u32 gfs2_bitfit(unsigned char *buffer, unsigned int buflen, u32 goal,
+  unsigned char old_state)
 {
-   unsigned char *byte, *end, alloc;
+   unsigned char *byte;
u32 blk = goal;
-   unsigned int bit;
+   unsigned int bit, bitlong;
+   unsigned long *plong, plong55;
+   static int c = 0;
 
byte = buffer + (goal / GFS2_NBBY);
+   plong = buffer + (goal / GFS2_NBBY);
bit = (goal % GFS2_NBBY) * GFS2_BIT_SIZE;
-   end = buffer + buflen;
-   alloc = (old_state == GFS2_BLKST_FREE) ? 0x55 : 0;
-
-   while (byte < end) {
-   /* If we're looking for a free block we can eliminate all
-  bitmap settings with 0x55, which represents four data
-  blocks in a row.  If we're looking for a data block, we can
-  eliminate 0x00 which corresponds to four free blocks. */
-   if ((*byte & 0x55) == alloc) {
-   blk += (8 - bit) >> 1;
-
-   bit = 0;
-   byte++;
-
+   bitlong = bit;
+#if BITS_PER_LONG == 32
+   plong55 = 0x55555555;
+#else
+   plong55 = 0x5555555555555555;
+#endif
+   while (byte < buffer + buflen) {
+
+   if (bitlong == 0 && old_state == 0 && *plong == plong55) {
+   plong++;
+   byte += sizeof(unsigned long);
+   blk += sizeof(unsigned long) * GFS2_NBBY;
continue;
}
-
-   if (((*byte >> bit) & GFS2_BIT_MASK) == old_state)
+   if (((*byte >> bit) & GFS2_BIT_MASK) == old_state) {
+   c++;
return blk;
-
+   }
bit += GFS2_BIT_SIZE;
if (bit >= 8) {
bit = 0;
byte++;
}
+   bitlong += GFS2_BIT_SIZE;
+   if (bitlong >= sizeof(unsigned long) * 8) {
+   bitlong = 0;
+   plong++;
+   }
 
blk++;
}
@@ -1318,11 +1323,10 @@ static u32 rgblk_search(struct gfs2_rgrpd *rgd, u32 goal,
/* The GFS2_BLKST_UNLINKED state doesn't apply to the clone
   bitmaps, so we must search the originals for that. */
if (old_state != GFS2_BLKST_UNLINKED && bi->bi_clone)
-   blk = gfs2_bitfit(rgd, bi->bi_clone + bi->bi_offset,
+   blk = gfs2_bitfit(bi->bi_clone + bi->bi_offset,
  bi->bi_len, goal, old_state);
else
-   blk = gfs2_bitfit(rgd,
- bi->bi_bh->b_data + bi->bi_offset,
+   blk = gfs2_bitfit(bi->bi_bh->b_data + bi->bi_offset,
  bi->bi_len, goal, old_state);
if (blk != BFITNOENT)
break;
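
The word-at-a-time idea described in this message can be sketched in userspace like this. It is an independent illustration, not the patch: bitfit(), NBBY, BIT_SIZE, BIT_MASK and NOT_FOUND are hypothetical names, state 0 is assumed to mean "free" and the two-bit pattern 01 to mean "used", and the word is loaded with memcpy() so the sketch has no alignment or pointer-type concerns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBBY      4          /* blocks per byte: two bits per block */
#define BIT_SIZE  2
#define BIT_MASK  3
#define NOT_FOUND UINT32_MAX /* hypothetical "no such block" marker */

/* Returns the first block at or after goal whose two-bit state equals
 * state, or NOT_FOUND. When searching for free blocks (state 0), whole
 * machine words in which every block is "used" (01) are skipped. */
static uint32_t bitfit(const unsigned char *buf, size_t buflen,
                       uint32_t goal, unsigned char state)
{
    /* All bits set, divided by 3, gives the 0101...01 pattern for any
     * word size (0x55555555 or 0x5555555555555555). */
    const unsigned long all_used = (unsigned long)-1 / 3;
    size_t byte = goal / NBBY;
    unsigned int bit = (goal % NBBY) * BIT_SIZE;
    uint32_t blk = goal;

    assert(buf != NULL);
    while (byte < buflen) {
        if (state == 0 && bit == 0 &&
            byte + sizeof(unsigned long) <= buflen) {
            unsigned long word;

            memcpy(&word, buf + byte, sizeof(word));
            if (word == all_used) {
                byte += sizeof(word);
                blk += sizeof(word) * NBBY;
                continue;
            }
        }
        if (((buf[byte] >> bit) & BIT_MASK) == state)
            return blk;
        bit += BIT_SIZE;
        if (bit >= 8) {
            bit = 0;
            byte++;
        }
        blk++;
    }
    return NOT_FOUND;
}
```

On a bitmap that is entirely in use, such as one covering a journal, the loop advances a full word per iteration instead of two bits.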

[Cluster-devel] [GFS2] [Patch 7/10] Only fetch the dinode once in block_map

2007-12-11 Thread Bob Peterson

Hi,

Function gfs2_block_map was often looking up the disk inode twice.
This optimizes it so that it only does so once.

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson [EMAIL PROTECTED] 
--
 .../fs/gfs2/bmap.c |   14 +++---
  1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/gfs2-2.6.git.patch6/fs/gfs2/bmap.c b/gfs2-2.6.git.patch7/fs/gfs2/bmap.c
index 4cdf4d4..0974912 100644
--- a/gfs2-2.6.git.patch6/fs/gfs2/bmap.c
+++ b/gfs2-2.6.git.patch7/fs/gfs2/bmap.c
@@ -469,6 +469,7 @@ int gfs2_block_map(struct inode *inode, sector_t lblock,
unsigned int maxlen = bh_map->b_size >> inode->i_blkbits;
struct metapath mp;
u64 size;
+   struct buffer_head *dibh = NULL;
 
BUG_ON(maxlen == 0);
 
@@ -499,6 +500,8 @@ int gfs2_block_map(struct inode *inode, sector_t lblock,
error = gfs2_meta_inode_buffer(ip, &bh);
if (error)
goto out_fail;
+   dibh = bh;
+   get_bh(dibh);
 
for (x = 0; x < end_of_metadata; x++) {
lookup_block(ip, bh, x, &mp, create, &new, &dblock);
@@ -517,13 +520,8 @@ int gfs2_block_map(struct inode *inode, sector_t lblock,
if (boundary)
set_buffer_boundary(bh_map);
if (new) {
-   struct buffer_head *dibh;
-   error = gfs2_meta_inode_buffer(ip, &dibh);
-   if (!error) {
-   gfs2_trans_add_bh(ip->i_gl, dibh, 1);
-   gfs2_dinode_out(ip, dibh->b_data);
-   brelse(dibh);
-   }
+   gfs2_trans_add_bh(ip->i_gl, dibh, 1);
+   gfs2_dinode_out(ip, dibh->b_data);
set_buffer_new(bh_map);
goto out_brelse;
}
@@ -544,6 +542,8 @@ out_brelse:
 out_ok:
error = 0;
 out_fail:
+   if (dibh)
+   brelse(dibh);
bmap_unlock(inode, create);
return error;
 }
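
The pattern in this patch, looking a buffer up once and holding an extra reference for later use, can be sketched with a toy reference counter. Everything here is hypothetical (struct buf, lookup(), get_ref(), put_ref(), map_block()); it only mirrors the shape of the change, not the kernel's buffer_head API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted buffer. */
struct buf {
    int refs;    /* outstanding references */
    int lookups; /* how many times the buffer was looked up */
    int dirty;   /* set when the buffer has been updated */
};

/* A lookup is the expensive step; it also takes a reference. */
static struct buf *lookup(struct buf *b)
{
    b->lookups++;
    b->refs++;
    return b;
}

static void get_ref(struct buf *b) { b->refs++; }

static void put_ref(struct buf *b)
{
    assert(b->refs > 0);
    b->refs--;
}

/* Looks the inode buffer up once, keeps a second reference for the
 * later update, and releases both references before returning. */
static int map_block(struct buf *inode_buf, int is_new)
{
    struct buf *bh = lookup(inode_buf);
    struct buf *dibh = bh;

    get_ref(dibh);       /* our own reference, independent of bh */
    put_ref(bh);         /* the metadata walk is done with bh */
    if (is_new)
        dibh->dirty = 1; /* update via dibh; no second lookup */
    put_ref(dibh);
    return 0;
}
```

The extra reference is what makes it safe to keep using the buffer after the walk has released its own.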

[Cluster-devel] [GFS2] [Patch 9/10] Move functions and make them static

2007-12-11 Thread Bob Peterson

Hi,

This patch doesn't change any code, it just moves it around.
Function gfs2_meta_inval is moved from meta_io.c to glops.c
and function gfs2_attach_bufdata is moved from meta_io.c
to trans.c.  Both functions are then declared static.

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson [EMAIL PROTECTED] 
--
 .../fs/gfs2/glops.c|   23 +++-
 .../fs/gfs2/meta_io.c  |   59 +---
 .../fs/gfs2/meta_io.h  |6 +--
 .../fs/gfs2/trans.c|   38 -
  4 files changed, 61 insertions(+), 65 deletions(-)

diff --git a/gfs2-2.6.git.patch8/fs/gfs2/glops.c b/gfs2-2.6.git.patch9/fs/gfs2/glops.c
index c663b7a..1c51758 100644
--- a/gfs2-2.6.git.patch8/fs/gfs2/glops.c
+++ b/gfs2-2.6.git.patch9/fs/gfs2/glops.c
@@ -1,6 +1,6 @@
 /*
  * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
- * Copyright (C) 2004-2006 Red Hat, Inc.  All rights reserved.
+ * Copyright (C) 2004-2007 Red Hat, Inc.  All rights reserved.
  *
  * This copyrighted material is made available to anyone wishing to use,
  * modify, copy, or redistribute it subject to the terms and conditions
@@ -114,6 +114,27 @@ static void meta_go_sync(struct gfs2_glock *gl)
 }
 
 /**
+ * gfs2_meta_inval - Invalidate all buffers associated with a glock
+ * @gl: the glock
+ *
+ */
+
+static void gfs2_meta_inval(struct gfs2_glock *gl)
+{
+   struct gfs2_sbd *sdp = gl->gl_sbd;
+   struct inode *aspace = gl->gl_aspace;
+   struct address_space *mapping = gl->gl_aspace->i_mapping;
+
+   gfs2_assert_withdraw(sdp, !atomic_read(&gl->gl_ail_count));
+
+   atomic_inc(&aspace->i_writecount);
+   truncate_inode_pages(mapping, 0);
+   atomic_dec(&aspace->i_writecount);
+
+   gfs2_assert_withdraw(sdp, !mapping->nrpages);
+}
+
+/**
  * meta_go_inval - invalidate the metadata for this glock
  * @gl: the glock
  * @flags:
diff --git a/gfs2-2.6.git.patch8/fs/gfs2/meta_io.c b/gfs2-2.6.git.patch9/fs/gfs2/meta_io.c
index 9688785..9fdbfd3 100644
--- a/gfs2-2.6.git.patch8/fs/gfs2/meta_io.c
+++ b/gfs2-2.6.git.patch9/fs/gfs2/meta_io.c
@@ -1,6 +1,6 @@
 /*
  * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
- * Copyright (C) 2004-2006 Red Hat, Inc.  All rights reserved.
+ * Copyright (C) 2004-2007 Red Hat, Inc.  All rights reserved.
  *
  * This copyrighted material is made available to anyone wishing to use,
  * modify, copy, or redistribute it subject to the terms and conditions
@@ -88,27 +88,6 @@ void gfs2_aspace_put(struct inode *aspace)
 }
 
 /**
- * gfs2_meta_inval - Invalidate all buffers associated with a glock
- * @gl: the glock
- *
- */
-
-void gfs2_meta_inval(struct gfs2_glock *gl)
-{
-   struct gfs2_sbd *sdp = gl->gl_sbd;
-   struct inode *aspace = gl->gl_aspace;
-   struct address_space *mapping = gl->gl_aspace->i_mapping;
-
-   gfs2_assert_withdraw(sdp, !atomic_read(&gl->gl_ail_count));
-
-   atomic_inc(&aspace->i_writecount);
-   truncate_inode_pages(mapping, 0);
-   atomic_dec(&aspace->i_writecount);
-
-   gfs2_assert_withdraw(sdp, !mapping->nrpages);
-}
-
-/**
  * gfs2_meta_sync - Sync all buffers associated with a glock
  * @gl: The glock
  *
@@ -262,42 +241,6 @@ int gfs2_meta_wait(struct gfs2_sbd *sdp, struct buffer_head *bh)
return 0;
 }
 
-/**
- * gfs2_attach_bufdata - attach a struct gfs2_bufdata structure to a buffer
- * @gl: the glock the buffer belongs to
- * @bh: The buffer to be attached to
- * @meta: Flag to indicate whether its metadata or not
- */
-
-void gfs2_attach_bufdata(struct gfs2_glock *gl, struct buffer_head *bh,
-int meta)
-{
-   struct gfs2_bufdata *bd;
-
-   if (meta)
-   lock_page(bh->b_page);
-
-   if (bh->b_private) {
-   if (meta)
-   unlock_page(bh->b_page);
-   return;
-   }
-
-   bd = kmem_cache_zalloc(gfs2_bufdata_cachep, GFP_NOFS | __GFP_NOFAIL),
-   bd->bd_bh = bh;
-   bd->bd_gl = gl;
-
-   INIT_LIST_HEAD(&bd->bd_list_tr);
-   if (meta)
-   lops_init_le(&bd->bd_le, &gfs2_buf_lops);
-   else
-   lops_init_le(&bd->bd_le, &gfs2_databuf_lops);
-   bh->b_private = bd;
-
-   if (meta)
-   unlock_page(bh->b_page);
-}
-
 void gfs2_remove_from_journal(struct buffer_head *bh, struct gfs2_trans *tr, int meta)
 {
struct gfs2_sbd *sdp = GFS2_SB(bh->b_page->mapping->host);
diff --git a/gfs2-2.6.git.patch8/fs/gfs2/meta_io.h b/gfs2-2.6.git.patch9/fs/gfs2/meta_io.h
index 73e3b1c..07820c7 100644
--- a/gfs2-2.6.git.patch8/fs/gfs2/meta_io.h
+++ b/gfs2-2.6.git.patch9/fs/gfs2/meta_io.h
@@ -1,6 +1,6 @@
 /*
  * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
- * Copyright (C) 2004-2006 Red Hat, Inc.  All rights reserved.
+ * Copyright (C) 2004-2007 Red Hat, Inc.  All rights reserved.
  *
  * This copyrighted material is made available to anyone

[Cluster-devel] [GFS2] [Patch 10/10] Function meta_read optimization

2007-12-11 Thread Bob Peterson

Hi,

This patch optimizes function gfs2_meta_read.  Basically, gfs2_meta_wait
was being called regardless of whether a disk read was requested.
This just pulls that wait into the if that triggers the read.

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson [EMAIL PROTECTED] 
--
 .../fs/gfs2/meta_io.c  |   13 +++--
  1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/gfs2-2.6.git.patch9/fs/gfs2/meta_io.c b/gfs2-2.6.git.patch10/fs/gfs2/meta_io.c
index 9fdbfd3..f7400c4 100644
--- a/gfs2-2.6.git.patch9/fs/gfs2/meta_io.c
+++ b/gfs2-2.6.git.patch10/fs/gfs2/meta_io.c
@@ -201,13 +201,14 @@ int gfs2_meta_read(struct gfs2_glock *gl, u64 blkno, int flags,
   struct buffer_head **bhp)
 {
*bhp = getbuf(gl, blkno, CREATE);
-   if (!buffer_uptodate(*bhp))
+   if (!buffer_uptodate(*bhp)) {
ll_rw_block(READ_META, 1, bhp);
-   if (flags & DIO_WAIT) {
-   int error = gfs2_meta_wait(gl->gl_sbd, *bhp);
-   if (error) {
-   brelse(*bhp);
-   return error;
+   if (flags & DIO_WAIT) {
+   int error = gfs2_meta_wait(gl->gl_sbd, *bhp);
+   if (error) {
+   brelse(*bhp);
+   return error;
+   }
}
}
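
The control-flow change can be sketched with a toy buffer that counts reads and waits. All names below (struct buffer, WAIT_FLAG, submit_read(), wait_for_read(), meta_read()) are hypothetical stand-ins; the sketch only shows that the wait is reached solely when a read was actually issued.

```c
#include <assert.h>
#include <stddef.h>

#define WAIT_FLAG 0x1 /* hypothetical analog of a "wait for I/O" flag */

/* Hypothetical buffer with counters for observing behaviour. */
struct buffer {
    int uptodate;     /* nonzero when contents are already valid */
    int reads_issued; /* number of reads submitted */
    int waits;        /* number of times a caller waited */
};

static void submit_read(struct buffer *b)
{
    b->reads_issued++;
}

static int wait_for_read(struct buffer *b)
{
    b->waits++;
    b->uptodate = 1; /* pretend the read completed successfully */
    return 0;
}

/* Waits only inside the branch that issued the read, so a buffer that
 * is already up to date returns without touching the wait path. */
static int meta_read(struct buffer *b, int flags)
{
    assert(b != NULL);
    if (!b->uptodate) {
        submit_read(b);
        if (flags & WAIT_FLAG) {
            int error = wait_for_read(b);

            if (error)
                return error;
        }
    }
    return 0;
}
```

For cached metadata this removes a function call from what is likely the common path.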

Re: [Cluster-devel] [GFS2] [Patch 4/10] Get rid of sd_statfs_mutex

2007-12-12 Thread Bob Peterson


On Wed, 2007-12-12 at 09:18 +, Steven Whitehouse wrote:
> You can't do gfs2_trans_add_bh under a spinlock, but there is no reason
> why you can't just reverse the order of these two statements to fix it,
>
> Steve.

Hi Steve,

If we reverse the two statements, the trans_add_bh is not protected at
all, which I assume was the purpose of the mutex in the first place.
I'm not sure this is buying us much anyway, so perhaps we should forget
it.

Regards,

Bob Peterson

Re: [Cluster-devel] [GFS2] [Patch 3/10] Run through full bitmaps quicker in gfs2_bitfit

2007-12-12 Thread Bob Peterson

On Wed, 2007-12-12 at 11:07 +, Steven Whitehouse wrote:
> Hi,
>
> fs/gfs2/rgrp.c: In function ‘gfs2_bitfit’:
> fs/gfs2/rgrp.c:139: warning: assignment from incompatible pointer type
>
> Please can you send me a fix for the above. Thanks,
>
> Steve.

Hi,

Sorry about that.  Here is a replacement for patch 3 that fixes the
compile warning.

Note that the previous patch3 I sent also had some leftover useless
debug code in it (regarding static int c) which needed to be taken out.

Regards,

Bob Peterson
--
Signed-off-by: Bob Peterson [EMAIL PROTECTED] 
--
 .../fs/gfs2/rgrp.c |   49 ++--
  1 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index e0ee195..7a28339 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -126,41 +126,43 @@ static unsigned char gfs2_testbit(struct gfs2_rgrpd *rgd, unsigned char *buffer,
  * Return: the block number (bitmap buffer scope) that was found
  */
 
-static u32 gfs2_bitfit(struct gfs2_rgrpd *rgd, unsigned char *buffer,
-   unsigned int buflen, u32 goal,
-   unsigned char old_state)
+static u32 gfs2_bitfit(unsigned char *buffer, unsigned int buflen, u32 goal,
+  unsigned char old_state)
 {
-   unsigned char *byte, *end, alloc;
+   unsigned char *byte;
u32 blk = goal;
-   unsigned int bit;
+   unsigned int bit, bitlong;
+   unsigned long *plong, plong55;
 
byte = buffer + (goal / GFS2_NBBY);
+   plong = (unsigned long *)buffer + (goal / GFS2_NBBY);
bit = (goal % GFS2_NBBY) * GFS2_BIT_SIZE;
-   end = buffer + buflen;
-   alloc = (old_state == GFS2_BLKST_FREE) ? 0x55 : 0;
-
-   while (byte < end) {
-   /* If we're looking for a free block we can eliminate all
-  bitmap settings with 0x55, which represents four data
-  blocks in a row.  If we're looking for a data block, we can
-  eliminate 0x00 which corresponds to four free blocks. */
-   if ((*byte & 0x55) == alloc) {
-   blk += (8 - bit) >> 1;
-
-   bit = 0;
-   byte++;
-
+   bitlong = bit;
+#if BITS_PER_LONG == 32
+   plong55 = 0x55555555;
+#else
+   plong55 = 0x5555555555555555;
+#endif
+   while (byte < buffer + buflen) {
+
+   if (bitlong == 0 && old_state == 0 && *plong == plong55) {
+   plong++;
+   byte += sizeof(unsigned long);
+   blk += sizeof(unsigned long) * GFS2_NBBY;
continue;
}
-
if (((*byte >> bit) & GFS2_BIT_MASK) == old_state)
return blk;
-
bit += GFS2_BIT_SIZE;
if (bit >= 8) {
bit = 0;
byte++;
}
+   bitlong += GFS2_BIT_SIZE;
+   if (bitlong >= sizeof(unsigned long) * 8) {
+   bitlong = 0;
+   plong++;
+   }
 
blk++;
}
@@ -1318,11 +1320,10 @@ static u32 rgblk_search(struct gfs2_rgrpd *rgd, u32 goal,
/* The GFS2_BLKST_UNLINKED state doesn't apply to the clone
   bitmaps, so we must search the originals for that. */
if (old_state != GFS2_BLKST_UNLINKED && bi->bi_clone)
-   blk = gfs2_bitfit(rgd, bi->bi_clone + bi->bi_offset,
+   blk = gfs2_bitfit(bi->bi_clone + bi->bi_offset,
  bi->bi_len, goal, old_state);
else
-   blk = gfs2_bitfit(rgd,
- bi->bi_bh->b_data + bi->bi_offset,
+   blk = gfs2_bitfit(bi->bi_bh->b_data + bi->bi_offset,
  bi->bi_len, goal, old_state);
if (blk != BFITNOENT)
break;

[Cluster-devel] [GFS2 patch] Initialize extent_list earlier

2008-01-03 Thread Bob Peterson

Hi,

Here is a patch for the latest upstream GFS2 code:
The journal extent map needs to be initialized sooner than it
currently is.  Otherwise failed mount attempts (e.g. not enough
journals, etc.) may panic trying to access the uninitialized list.
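The failure mode can be illustrated outside the kernel.  What follows is only a minimal userspace sketch, not the GFS2 code: the list helpers are simplified stand-ins for the kernel's struct list_head API, and struct jdesc, jdesc_alloc() and jdesc_free() are hypothetical names.  It shows the list head being initialized as soon as the descriptor exists, so that a cleanup path can walk it even if mapping never ran.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Hypothetical journal descriptor carrying an extent list. */
struct jdesc {
	struct list_head extent_list;
};

/* Initialize the list immediately after allocation, before any step
 * that can fail.  Without this, a zeroed list head has next == NULL
 * and the cleanup loop below would dereference it. */
static struct jdesc *jdesc_alloc(void)
{
	struct jdesc *jd = calloc(1, sizeof(*jd));

	if (jd)
		init_list_head(&jd->extent_list);
	return jd;
}

/* Cleanup path: unlinks every extent and returns how many it saw.
 * Safe to call even when no extents were ever mapped. */
static int jdesc_free(struct jdesc *jd)
{
	int n = 0;

	while (!list_is_empty(&jd->extent_list)) {
		struct list_head *e = jd->extent_list.next;

		jd->extent_list.next = e->next;
		e->next->prev = &jd->extent_list;
		n++;
	}
	free(jd);
	return n;
}
```

With the initialization in the allocator, a failed mount that frees the descriptor early simply finds an empty list.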

Regards,

Bob Peterson

Signed-off-by: Bob Peterson [EMAIL PROTECTED]
--
 fs/gfs2/ops_fstype.c |1 -
 fs/gfs2/super.c  |1 +
 2 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index a77d41f..9ffc10b 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -330,7 +330,6 @@ static int map_journal_extents(struct gfs2_sbd *sdp)
struct buffer_head bh;
int rc = 0;
 
-   INIT_LIST_HEAD(&jd->extent_list);
prev_db = 0;
 
for (lb = 0; lb < ip->i_di.di_size >> sdp->sd_sb.sb_bsize_shift; lb++) {
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 73e49df..fa86038 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -385,6 +385,7 @@ int gfs2_jindex_hold(struct gfs2_sbd *sdp, struct gfs2_holder *ji_gh)
if (!jd)
break;
 
+   INIT_LIST_HEAD(&jd->extent_list);
jd->jd_inode = gfs2_lookupi(sdp->sd_jindex, name, 1, NULL);
if (!jd->jd_inode || IS_ERR(jd->jd_inode)) {
if (!jd->jd_inode)

[Cluster-devel] [GFS2] [Patch] gfs2_alloc_required performance

2008-01-11 Thread Bob Peterson

Hi,

This is a small I/O performance enhancement to gfs2.  (Actually, it is a
rework of an earlier version I got wrong.)  The idea here is to check whether
the write extends past the last block in the file.  If so, the function can
save itself a lot of time and trouble because it knows an allocation will be
required.  Benchmarks like iozone should see better performance.
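The early-exit test can be sketched in isolation.  This is a userspace illustration under stated assumptions, not the kernel function: write_extends_past_end() is a hypothetical name, and nblocks stands in for the inode's block count (di_blocks in the patch).

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when the write [offset, offset + len) ends beyond the
 * blocks the file already has, so an allocation is certainly needed
 * and the per-extent lookup can be skipped.
 * bsize_shift is log2 of the block size. */
static int write_extends_past_end(uint64_t offset, unsigned int len,
				  unsigned int bsize_shift, uint64_t nblocks)
{
	uint64_t bsize = (uint64_t)1 << bsize_shift;
	/* First block index past the write, rounded up. */
	uint64_t lblock_stop = (offset + len + bsize - 1) >> bsize_shift;

	return lblock_stop > nblocks;
}
```

When the function returns 0 the caller still has to do the full check; the shortcut only covers the case where the answer is already known.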

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson [EMAIL PROTECTED] 
--
 fs/gfs2/bmap.c   |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 73dfad7..4356cc2 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1224,6 +1224,11 @@ int gfs2_write_alloc_required(struct gfs2_inode *ip, u64 offset,
unsigned int shift = sdp->sd_sb.sb_bsize_shift;
lblock = offset >> shift;
lblock_stop = (offset + len + sdp->sd_sb.sb_bsize - 1) >> shift;
+   if (lblock_stop > ip->i_di.di_blocks) { /* writing past the
+  last block */
+   *alloc_required = 1;
+   return 0;
+   }
}
 
for (; lblock < lblock_stop; lblock += extlen) {

[Cluster-devel] [GFS2] [Patch] Remove unneeded i_spin

2008-01-11 Thread Bob Peterson

Hi,

This patch removes a vestigial variable i_spin from the gfs2_inode
structure.  This not only saves us memory (30 of these in memory
for the oom test) it also saves us time because we don't have to
spend time initializing it (i.e. slightly better performance).

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson [EMAIL PROTECTED] 
--
 fs/gfs2/incore.h |1 -
 fs/gfs2/main.c   |1 -
 2 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index bd92a6d..1339996 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -267,7 +267,6 @@ struct gfs2_inode {
struct gfs2_alloc *i_alloc;
u64 i_last_rg_alloc;
 
-   spinlock_t i_spin;
struct rw_semaphore i_rw_mutex;
 };
 
diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c
index 88686fc..9c7765c 100644
--- a/fs/gfs2/main.c
+++ b/fs/gfs2/main.c
@@ -29,7 +29,6 @@ static void gfs2_init_inode_once(struct kmem_cache *cachep, void *foo)
struct gfs2_inode *ip = foo;
 
inode_init_once(&ip->i_inode);
-   spin_lock_init(&ip->i_spin);
init_rwsem(&ip->i_rw_mutex);
ip->i_alloc = NULL;
 }

Re: [Cluster-devel] [PATCH] Clean up unused loop variable

2008-01-15 Thread Bob Peterson


On Sun, 2008-01-13 at 16:45 +, Andrew Price wrote:
> Reading through the gfs2_edit code I noticed that a variable is
> initialised and incremented in this for loop but is not used inside the
> loop and the final value of the variable is not used subsequently. This
> patch removes the initialisation and increment of the variable.
>
> --
> Andy Price

Hi Andy,

Thanks for your help.  I've applied your patch to the HEAD branch of
CVS.  I'll try to work it into the RHEL5 branch when I have to update
that next.  I think RHEL5 needs a bugzilla record, so that one may not
get done for a little while.

Regards,

Bob Peterson
Red Hat GFS

[Cluster-devel] [PATCH][GFS2] Lockup on error

2008-01-19 Thread Bob Peterson

Hi,

I spotted this bug while I was digging around.  Looks like it could cause
a lockup in some rare error condition.
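The one-line fix below follows the usual goto-unwinding rule: an error path has to jump to the label that releases whatever has been acquired so far.  A minimal sketch of that rule, assuming (from the label name only) that a quota lock is held at the point of failure; quota_lock(), quota_unlock() and link_entry() are hypothetical stand-ins, not GFS2 functions.

```c
#include <assert.h>

static int locks_held;	/* outstanding "locks" in this sketch */

static void quota_lock(void)   { locks_held++; }
static void quota_unlock(void) { locks_held--; }

/* check_result < 0 models a failing "is allocation required?" call
 * made while the lock is already held. */
static int link_entry(int check_result)
{
	int error;

	quota_lock();

	error = check_result;
	if (error < 0)
		goto fail_quota_locks;	/* must release the lock taken above */

	/* ... the real work would happen here ... */
	error = 0;

fail_quota_locks:
	quota_unlock();
	return error;
}
```

Jumping to a label past the unlock would leave locks_held at 1, and the next caller would block forever.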

Regards,

Bob Peterson
--
Signed-off-by: Bob Peterson [EMAIL PROTECTED] 
--
 fs/gfs2/inode.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index c84764a..728d316 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -860,7 +860,7 @@ static int link_dinode(struct gfs2_inode *dip, const struct qstr *name,
 
error = alloc_required = gfs2_diradd_alloc_required(&dip->i_inode, name);
if (alloc_required < 0)
-   goto fail;
+   goto fail_quota_locks;
if (alloc_required) {
error = gfs2_quota_check(dip, dip->i_inode.i_uid, dip->i_inode.i_gid);
if (error)

[Cluster-devel] [GFS2 PATCH]

2008-01-28 Thread Bob Peterson

Hi,

I noticed that the latest change to i_height got rid of the
value from the inode dump.  This patch adds it back.

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson [EMAIL PROTECTED] 
--
 fs/gfs2/inode.c   |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 084c741..11c5ce4 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1,6 +1,6 @@
 /*
  * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
- * Copyright (C) 2004-2006 Red Hat, Inc.  All rights reserved.
+ * Copyright (C) 2004-2008 Red Hat, Inc.  All rights reserved.
  *
  * This copyrighted material is made available to anyone wishing to use,
  * modify, copy, or redistribute it subject to the terms and conditions
@@ -1435,6 +1435,7 @@ void gfs2_dinode_print(const struct gfs2_inode *ip)
printk(KERN_INFO "  di_goal_data = %llu\n",
   (unsigned long long)di->di_goal_data);
printk(KERN_INFO "  di_flags = 0x%.8X\n", di->di_flags);
+   printk(KERN_INFO "  i_height = %u\n", ip->i_height);
printk(KERN_INFO "  di_depth = %u\n", di->di_depth);
printk(KERN_INFO "  di_entries = %u\n", di->di_entries);
printk(KERN_INFO "  di_eattr = %llu\n",

[Cluster-devel] [GFS2 PATCH] Only do lo_incore_commit once

2008-01-28 Thread Bob Peterson

Hi,

This patch is performance related.  When we're doing a log flush,
I noticed we were calling buf_lo_incore_commit twice: once for
data bufs and once for metadata bufs.  Since this is the same
function and does the same thing in both cases, there should be
no reason to call it twice.  Since we only need to call it once,
we can also make it faster by removing it from the generic lops
code and making it a stand-alone static function.

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson [EMAIL PROTECTED] 
--
 fs/gfs2/incore.h  |1 -
 fs/gfs2/log.c |   17 -
 fs/gfs2/lops.c|   17 -
 fs/gfs2/lops.h|   11 +--
 4 files changed, 17 insertions(+), 28 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 010eb70..c1518f0 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -44,7 +44,6 @@ struct gfs2_log_header_host {
 
 struct gfs2_log_operations {
void (*lo_add) (struct gfs2_sbd *sdp, struct gfs2_log_element *le);
-   void (*lo_incore_commit) (struct gfs2_sbd *sdp, struct gfs2_trans *tr);
void (*lo_before_commit) (struct gfs2_sbd *sdp);
void (*lo_after_commit) (struct gfs2_sbd *sdp, struct gfs2_ail *ai);
void (*lo_before_scan) (struct gfs2_jdesc *jd,
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 161ab6f..b335304 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -779,6 +779,21 @@ static void log_refund(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
gfs2_log_unlock(sdp);
 }
 
+static void buf_lo_incore_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
+{
+   struct list_head *head = &tr->tr_list_buf;
+   struct gfs2_bufdata *bd;
+
+   gfs2_log_lock(sdp);
+   while (!list_empty(head)) {
+   bd = list_entry(head->next, struct gfs2_bufdata, bd_list_tr);
+   list_del_init(&bd->bd_list_tr);
+   tr->tr_num_buf--;
+   }
+   gfs2_log_unlock(sdp);
+   gfs2_assert_warn(sdp, !tr->tr_num_buf);
+}
+
 /**
  * gfs2_log_commit - Commit a transaction to the log
  * @sdp: the filesystem
@@ -790,7 +805,7 @@ static void log_refund(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
 void gfs2_log_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
 {
log_refund(sdp, tr);
-   lops_incore_commit(sdp, tr);
+   buf_lo_incore_commit(sdp, tr);
 
sdp->sd_vfs->s_dirt = 1;
up_read(&sdp->sd_log_flush_lock);
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index fae59d6..7138737 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -152,21 +152,6 @@ out:
unlock_buffer(bd->bd_bh);
 }
 
-static void buf_lo_incore_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
-{
-   struct list_head *head = &tr->tr_list_buf;
-   struct gfs2_bufdata *bd;
-
-   gfs2_log_lock(sdp);
-   while (!list_empty(head)) {
-   bd = list_entry(head->next, struct gfs2_bufdata, bd_list_tr);
-   list_del_init(&bd->bd_list_tr);
-   tr->tr_num_buf--;
-   }
-   gfs2_log_unlock(sdp);
-   gfs2_assert_warn(sdp, !tr->tr_num_buf);
-}
-
 static void buf_lo_before_commit(struct gfs2_sbd *sdp)
 {
struct buffer_head *bh;
@@ -737,7 +722,6 @@ static void databuf_lo_after_commit(struct gfs2_sbd *sdp, struct gfs2_ail *ai)
 
 const struct gfs2_log_operations gfs2_buf_lops = {
.lo_add = buf_lo_add,
-   .lo_incore_commit = buf_lo_incore_commit,
.lo_before_commit = buf_lo_before_commit,
.lo_after_commit = buf_lo_after_commit,
.lo_before_scan = buf_lo_before_scan,
@@ -763,7 +747,6 @@ const struct gfs2_log_operations gfs2_rg_lops = {
 
 const struct gfs2_log_operations gfs2_databuf_lops = {
.lo_add = databuf_lo_add,
-   .lo_incore_commit = buf_lo_incore_commit,
.lo_before_commit = databuf_lo_before_commit,
.lo_after_commit = databuf_lo_after_commit,
.lo_scan_elements = databuf_lo_scan_elements,
diff --git a/fs/gfs2/lops.h b/fs/gfs2/lops.h
index 41a00df..3c0b273 100644
--- a/fs/gfs2/lops.h
+++ b/fs/gfs2/lops.h
@@ -1,6 +1,6 @@
 /*
  * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
- * Copyright (C) 2004-2006 Red Hat, Inc.  All rights reserved.
+ * Copyright (C) 2004-2008 Red Hat, Inc.  All rights reserved.
  *
  * This copyrighted material is made available to anyone wishing to use,
  * modify, copy, or redistribute it subject to the terms and conditions
@@ -57,15 +57,6 @@ static inline void lops_add(struct gfs2_sbd *sdp, struct gfs2_log_element *le)
le->le_ops->lo_add(sdp, le);
 }
 
-static inline void lops_incore_commit(struct gfs2_sbd *sdp,
- struct gfs2_trans *tr)
-{
-   int x;
-   for (x = 0; gfs2_log_ops[x]; x++)
-   if (gfs2_log_ops[x]->lo_incore_commit)
-   gfs2_log_ops[x]->lo_incore_commit(sdp, tr);
-}
-
 static inline void lops_before_commit(struct gfs2_sbd *sdp)
 {
int x;

[Cluster-devel] [GFS2 PATCH] Misc fixups

2008-01-28 Thread Bob Peterson

Hi,

This patch contains two small fixups that didn't fit elsewhere.
They are: (1) get rid of temp variable in find_metapath.
(2) Remove vestigial ret variable from gfs2_writepage_common.

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson [EMAIL PROTECTED] 
--
 fs/gfs2/bmap.c|3 +--
 fs/gfs2/ops_address.c |4 +---
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 2a90084..359231e 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -305,11 +305,10 @@ static void find_metapath(struct gfs2_inode *ip, u64 block,
  struct metapath *mp)
 {
struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
-   u64 b = block;
unsigned int i;
 
for (i = ip->i_height; i--;)
-   mp->mp_list[i] = do_div(b, sdp->sd_inptrs);
+   mp->mp_list[i] = do_div(block, sdp->sd_inptrs);
 
 }
 
diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c
index 38dbe99..e601016 100644
--- a/fs/gfs2/ops_address.c
+++ b/fs/gfs2/ops_address.c
@@ -1,6 +1,6 @@
 /*
  * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
- * Copyright (C) 2004-2007 Red Hat, Inc.  All rights reserved.
+ * Copyright (C) 2004-2008 Red Hat, Inc.  All rights reserved.
  *
  * This copyrighted material is made available to anyone wishing to use,
  * modify, copy, or redistribute it subject to the terms and conditions
@@ -104,11 +104,9 @@ static int gfs2_writepage_common(struct page *page,
loff_t i_size = i_size_read(inode);
pgoff_t end_index = i_size >> PAGE_CACHE_SHIFT;
unsigned offset;
-   int ret = -EIO;
 
if (gfs2_assert_withdraw(sdp, gfs2_glock_is_held_excl(ip->i_gl)))
goto out;
-   ret = 0;
if (current->journal_info)
goto redirty;
/* Is the page fully outside i_size? (truncate in progress) */

[Cluster-devel] [GFS2 PATCH] Plug an unlikely leak

2008-01-28 Thread Bob Peterson

--
 fs/gfs2/lops.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 7138737..4390f6f 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -404,8 +404,10 @@ static int revoke_lo_scan_elements(struct gfs2_jdesc *jd, unsigned int start,
blkno = be64_to_cpu(*(__be64 *)(bh->b_data + offset));
 
error = gfs2_revoke_add(sdp, blkno, start);
-   if (error < 0)
+   if (error < 0) {
+   brelse(bh);
return error;
+   }
else if (error)
sdp->sd_found_revokes++;

[Cluster-devel] [GFS2 PATCH] Allocate gfs2_rgrpd from slab memory

2008-01-28 Thread Bob Peterson

Hi,

This patch moves the gfs2_rgrpd structure to its own slab
memory.  This makes it easier to control and monitor, and 
yields less memory fragmentation.

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson [EMAIL PROTECTED] 
--
 fs/gfs2/main.c |   10 ++
 fs/gfs2/rgrp.c |4 ++--
 fs/gfs2/util.c |1 +
 fs/gfs2/util.h |1 +
 4 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c
index 9c7765c..053e2eb 100644
--- a/fs/gfs2/main.c
+++ b/fs/gfs2/main.c
@@ -89,6 +89,12 @@ static int __init init_gfs2_fs(void)
if (!gfs2_bufdata_cachep)
goto fail;
 
+   gfs2_rgrpd_cachep = kmem_cache_create("gfs2_rgrpd",
+ sizeof(struct gfs2_rgrpd),
+ 0, 0, NULL);
+   if (!gfs2_rgrpd_cachep)
+   goto fail;
+
error = register_filesystem(&gfs2_fs_type);
if (error)
goto fail;
@@ -108,6 +114,9 @@ fail_unregister:
 fail:
gfs2_glock_exit();
 
+   if (gfs2_rgrpd_cachep)
+   kmem_cache_destroy(gfs2_rgrpd_cachep);
+
if (gfs2_bufdata_cachep)
kmem_cache_destroy(gfs2_bufdata_cachep);
 
@@ -133,6 +142,7 @@ static void __exit exit_gfs2_fs(void)
unregister_filesystem(&gfs2_fs_type);
unregister_filesystem(&gfs2meta_fs_type);
 
+   kmem_cache_destroy(gfs2_rgrpd_cachep);
kmem_cache_destroy(gfs2_bufdata_cachep);
kmem_cache_destroy(gfs2_inode_cachep);
kmem_cache_destroy(gfs2_glock_cachep);
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 7b9d6f1..dc7e83e 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -353,7 +353,7 @@ static void clear_rgrpdi(struct gfs2_sbd *sdp)
}
 
kfree(rgd->rd_bits);
-   kfree(rgd);
+   kmem_cache_free(gfs2_rgrpd_cachep, rgd);
}
 }
 
@@ -516,7 +516,7 @@ static int read_rindex_entry(struct gfs2_inode *ip,
return error;
}
 
-   rgd = kzalloc(sizeof(struct gfs2_rgrpd), GFP_NOFS);
+   rgd = kmem_cache_zalloc(gfs2_rgrpd_cachep, GFP_NOFS);
error = -ENOMEM;
if (!rgd)
return error;
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index 424a077..fe9c28e 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -25,6 +25,7 @@
 struct kmem_cache *gfs2_glock_cachep __read_mostly;
 struct kmem_cache *gfs2_inode_cachep __read_mostly;
 struct kmem_cache *gfs2_bufdata_cachep __read_mostly;
+struct kmem_cache *gfs2_rgrpd_cachep __read_mostly;
 
 void gfs2_assert_i(struct gfs2_sbd *sdp)
 {
diff --git a/fs/gfs2/util.h b/fs/gfs2/util.h
index 28938a4..ac0c567 100644
--- a/fs/gfs2/util.h
+++ b/fs/gfs2/util.h
@@ -147,6 +147,7 @@ gfs2_io_error_bh_i((sdp), (bh), __FUNCTION__, __FILE__, __LINE__);
 extern struct kmem_cache *gfs2_glock_cachep;
 extern struct kmem_cache *gfs2_inode_cachep;
 extern struct kmem_cache *gfs2_bufdata_cachep;
+extern struct kmem_cache *gfs2_rgrpd_cachep;
 
 static inline unsigned int gfs2_tune_get_i(struct gfs2_tune *gt,
   unsigned int *p)

[Cluster-devel] [GFS2 PATCH] Get rid of gl_waiters2

2008-01-28 Thread Bob Peterson

Hi,

This patch reduces memory by replacing the int variable
gl_waiters2 by a single bit in the gl_flags.
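The idea of folding a boolean int into an existing flags word can be sketched as follows.  This is an illustrative userspace version with hypothetical names (struct lock_state, flag_set() and friends); unlike the kernel's set_bit(), clear_bit() and test_bit(), these helpers are not atomic.

```c
#include <assert.h>

/* Hypothetical bit numbers, in the style of the GLF_* enum. */
enum {
	FLAG_DEMOTE   = 3,
	FLAG_WAITERS2 = 8,
};

struct lock_state {
	unsigned long flags;	/* one bit per boolean condition */
};

/* Non-atomic stand-ins for set_bit/clear_bit/test_bit. */
static void flag_set(struct lock_state *ls, int nr)
{
	ls->flags |= 1UL << nr;
}

static void flag_clear(struct lock_state *ls, int nr)
{
	ls->flags &= ~(1UL << nr);
}

static int flag_test(const struct lock_state *ls, int nr)
{
	return (int)((ls->flags >> nr) & 1UL);
}
```

Each structure instance then no longer carries a separate int for the condition, since the flags word is already there.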

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson [EMAIL PROTECTED] 
--
 fs/gfs2/glock.c  |7 ---
 fs/gfs2/incore.h |4 ++--
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 894c70e..aa6f32e 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -594,11 +594,12 @@ static void run_queue(struct gfs2_glock *gl)
blocked = rq_mutex(gh);
} else if (test_bit(GLF_DEMOTE, &gl->gl_flags)) {
blocked = rq_demote(gl);
-   if (gl->gl_waiters2 && !blocked) {
+   if (test_bit(GLF_WAITERS2, &gl->gl_flags) &&
+!blocked) {
set_bit(GLF_DEMOTE, &gl->gl_flags);
gl->gl_demote_state = LM_ST_UNLOCKED;
}
-   gl->gl_waiters2 = 0;
+   clear_bit(GLF_WAITERS2, &gl->gl_flags);
} else if (!list_empty(&gl->gl_waiters3)) {
gh = list_entry(gl->gl_waiters3.next,
struct gfs2_holder, gh_list);
@@ -704,7 +705,7 @@ static void handle_callback(struct gfs2_glock *gl, unsigned int state,
} else if (gl->gl_demote_state != LM_ST_UNLOCKED &&
gl->gl_demote_state != state) {
if (test_bit(GLF_DEMOTE_IN_PROGRESS,  &gl->gl_flags)) 
-   gl->gl_waiters2 = 1;
+   set_bit(GLF_WAITERS2, &gl->gl_flags);
else 
gl->gl_demote_state = LM_ST_UNLOCKED;
}
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index ab682c0..217ecb0 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -1,6 +1,6 @@
 /*
  * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
- * Copyright (C) 2004-2007 Red Hat, Inc.  All rights reserved.
+ * Copyright (C) 2004-2008 Red Hat, Inc.  All rights reserved.
  *
  * This copyrighted material is made available to anyone wishing to use,
  * modify, copy, or redistribute it subject to the terms and conditions
@@ -167,6 +167,7 @@ enum {
GLF_DIRTY   = 5,
GLF_DEMOTE_IN_PROGRESS  = 6,
GLF_LFLUSH  = 7,
+   GLF_WAITERS2= 8,
 };
 
 struct gfs2_glock {
@@ -186,7 +187,6 @@ struct gfs2_glock {
struct list_head gl_holders;
struct list_head gl_waiters1;   /* HIF_MUTEX */
struct list_head gl_waiters3;   /* HIF_PROMOTE */
-   int gl_waiters2;/* GIF_DEMOTE */
 
const struct gfs2_glock_operations *gl_ops;

Re: [Cluster-devel] [GFS2] Add consts to various bits of rgrp.c

2008-01-29 Thread Bob Peterson

On Tue, 2008-01-29 at 14:32 +, Steven Whitehouse wrote:
> From 15792c4b1735f6859edb81a2038af39c298250d1 Mon Sep 17 00:00:00 2001
> From: Steven Whitehouse [EMAIL PROTECTED]
> Date: Tue, 29 Jan 2008 13:30:20 +
> Subject: [PATCH] [GFS2] Add consts to various bits of rgrp.c
>
> There are a couple of routines which scan bitmaps where we can
> mark the bitmaps const, plus a couple of call sites that can
> be updated too.
>
> Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]
>
> diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c

ACK fwiw

Regards,

Bob Peterson

Re: [Cluster-devel] [GFS2] Introduce array of buffers to struct metapath

2008-01-29 Thread Bob Peterson

On Tue, 2008-01-29 at 14:32 +, Steven Whitehouse wrote:
> From 9fdea9ec9922417b9c884edbf27db9f83600cf95 Mon Sep 17 00:00:00 2001
> From: Steven Whitehouse [EMAIL PROTECTED]
> Date: Tue, 29 Jan 2008 09:12:55 +
> Subject: [PATCH] [GFS2] Introduce array of buffers to struct metapath
>
> The reason for doing this is to allow all the block mapping code
> to share the same array. As a result we can remove two arguments
> from lookup_metapath since they are now returned via the array.
>
> We also add a function to drop all refs to buffer heads when we
> are done with the metapath. The build_height function shares the
> struct metapath, but currently still frees its own buffers, and
> this will change in a future patch.
>
> Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]
>
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c

ACK fwiw

Regards,

Bob Peterson

Re: [Cluster-devel] [GFS2] Move part of gfs2_block_map into a separate function

2008-01-29 Thread Bob Peterson

On Tue, 2008-01-29 at 14:31 +, Steven Whitehouse wrote:
> From cea26d2d43d69e88298154da7d023ab2af5eae7a Mon Sep 17 00:00:00 2001
> From: Steven Whitehouse [EMAIL PROTECTED]
> Date: Mon, 28 Jan 2008 15:10:29 +
> Subject: [PATCH] [GFS2] Move part of gfs2_block_map into a separate function
>
> This is required to enable future changes to the block
> mapping code.
>
> Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]
>
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c

ACK fwiw

Regards,

Bob Peterson

[Cluster-devel] [GFS2 PATCH] Eliminate gh_state

2008-01-29 Thread Bob Peterson

Hi,

This patch reduces the memory requirements of GFS2 by eliminating
the gh_state variable from the gfs2_holder structure.  With this
patch the state is now held in the upper two bits of the gh_flags.
There are new macros to fetch and/or set the new state flags.
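The packing scheme can be sketched like this.  It is only an illustration under stated assumptions: a 32-bit flags word, the state in bits 30 and 31, and made-up state and flag values.  gh_stflag is the name used in the diff below, but its real definition is not shown in this message, so the definition here, along with gh_state_of and gh_flags_of, is hypothetical.

```c
#include <assert.h>

/* Assumed layout: 2-bit state in the top two bits of a 32-bit word. */
#define GH_STATE_SHIFT	30u
#define GH_STATE_MASK	(3u << GH_STATE_SHIFT)

/* Move a state value into position so it can be OR-ed with flags. */
#define gh_stflag(state)	((unsigned int)(state) << GH_STATE_SHIFT)
/* Fetch the state back out of a combined word (mask, then shift). */
#define gh_state_of(word)	(((word) & GH_STATE_MASK) >> GH_STATE_SHIFT)
/* Fetch the ordinary flags, with the state bits cleared. */
#define gh_flags_of(word)	((word) & ~GH_STATE_MASK)

/* Illustrative values only. */
enum { ST_UNLOCKED = 0, ST_EXCLUSIVE = 1, ST_DEFERRED = 2, ST_SHARED = 3 };
#define FLAG_ANY	0x0008u
```

A caller that used to pass a state and a flags argument separately would pass gh_stflag(ST_SHARED) | FLAG_ANY as one argument, which is the shape of the substitutions in the diff.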

I hesitated sending this patch just because of its sheer size.
Since gfs2_holder is used almost everywhere, the fix is quite
pervasive.  Also, I don't know the performance impact of it
(I haven't taken the time to study the impact).  On the one hand,
it could be slower because some functions that used to just
fetch the state are now required to do masking and shifting.
On the other hand, it could also be faster because the unified
flags field means that about one hundred calls with four
parameters have been replaced by calls with three parameters.

The good news is that there is no tricky logic here.  If the
macros are correct, the rest of the code changes are mostly
just direct substitutions.

So take it or leave it as you see fit.

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson [EMAIL PROTECTED] 
--
 fs/gfs2/eattr.c   |   19 ++--
 fs/gfs2/glock.c   |   39 ---
 fs/gfs2/glock.h   |   28 
 fs/gfs2/glops.c   |2 +-
 fs/gfs2/incore.h  |   26 ++-
 fs/gfs2/inode.c   |   31 ---
 fs/gfs2/ops_address.c |   13 +++
 fs/gfs2/ops_dentry.c  |3 +-
 fs/gfs2/ops_export.c  |8 --
 fs/gfs2/ops_file.c|   27 ---
 fs/gfs2/ops_fstype.c  |   20 ++
 fs/gfs2/ops_inode.c   |   54 +
 fs/gfs2/ops_super.c   |7 +++--
 fs/gfs2/quota.c   |   16 --
 fs/gfs2/recovery.c|9 ---
 fs/gfs2/rgrp.c|   13 ++-
 fs/gfs2/super.c   |   31 +++
 fs/gfs2/trans.c   |5 ++-
 18 files changed, 196 insertions(+), 155 deletions(-)

diff --git a/fs/gfs2/eattr.c b/fs/gfs2/eattr.c
index 04febbc..deabf8a 100644
--- a/fs/gfs2/eattr.c
+++ b/fs/gfs2/eattr.c
@@ -1,6 +1,6 @@
 /*
  * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
- * Copyright (C) 2004-2006 Red Hat, Inc.  All rights reserved.
+ * Copyright (C) 2004-2008 Red Hat, Inc.  All rights reserved.
  *
  * This copyrighted material is made available to anyone wishing to use,
  * modify, copy, or redistribute it subject to the terms and conditions
@@ -250,7 +250,8 @@ static int ea_dealloc_unstuffed(struct gfs2_inode *ip, struct buffer_head *bh,
return -EIO;
}
 
-   error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, &rg_gh);
+   error = gfs2_glock_nq_init(rgd->rd_gl, gh_stflag(LM_ST_EXCLUSIVE),
+  &rg_gh);
if (error)
return error;
 
@@ -411,7 +412,8 @@ int gfs2_ea_list(struct gfs2_inode *ip, struct gfs2_ea_request *er)
er->er_data_len = 0;
}
 
-   error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, LM_FLAG_ANY, &i_gh);
+   error = gfs2_glock_nq_init(ip->i_gl, gh_stflag(LM_ST_SHARED) |
+  LM_FLAG_ANY, &i_gh);
if (error)
return error;
 
@@ -559,7 +561,8 @@ int gfs2_ea_get(struct gfs2_inode *ip, struct gfs2_ea_request *er)
er->er_data_len = 0;
}
 
-   error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, LM_FLAG_ANY, &i_gh);
+   error = gfs2_glock_nq_init(ip->i_gl, gh_stflag(LM_ST_SHARED) |
+  LM_FLAG_ANY, &i_gh);
if (error)
return error;
 
@@ -1093,7 +1096,8 @@ int gfs2_ea_set(struct gfs2_inode *ip, struct gfs2_ea_request *er)
if (error)
return error;
 
-   error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &i_gh);
+   error = gfs2_glock_nq_init(ip->i_gl, gh_stflag(LM_ST_EXCLUSIVE),
+  &i_gh);
if (error)
return error;
 
@@ -1185,7 +1189,8 @@ int gfs2_ea_remove(struct gfs2_inode *ip, struct gfs2_ea_request *er)
if (!er->er_name_len || er->er_name_len > GFS2_EA_MAX_NAME_LEN)
return -EINVAL;
 
-   error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &i_gh);
+   error = gfs2_glock_nq_init(ip->i_gl, gh_stflag(LM_ST_EXCLUSIVE),
+  &i_gh);
if (error)
return error;
 
@@ -1429,7 +1434,7 @@ static int ea_dealloc_block(struct gfs2_inode *ip)
return -EIO;
}
 
-   error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0,
+   error = gfs2_glock_nq_init(rgd->rd_gl, gh_stflag(LM_ST_EXCLUSIVE),
   &al->al_rgd_gh);
if (error)
return error;
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 78cd1cd..6e1f1ea 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -132,19 +132,20 @@ static inline rwlock_t

[Cluster-devel] [GFS2] Faster gfs2_bitfit algorithm

2008-03-07 Thread Bob Peterson

Hi,

This version of the gfs2_bitfit algorithm is up to four
times faster than its predecessor.

Regards,

Bob Peterson

Signed-off-by: Bob Peterson [EMAIL PROTECTED]
--
 fs/gfs2/rgrp.c |   79 +--
 1 files changed, 47 insertions(+), 32 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 4291375..4dee88b 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -33,6 +33,16 @@
 #define BFITNOENT ((u32)~0)
 #define NO_BLOCK ((u64)~0)
 
+#if BITS_PER_LONG == 32
+#define LBITMASK   (unsigned long)(0x55555555)
+#define LBITSKIP55 (unsigned long)(0x55555555)
+#define LBITSKIP00 (unsigned long)(0x00000000)
+#else
+#define LBITMASK   (unsigned long)(0x5555555555555555)
+#define LBITSKIP55 (unsigned long)(0x5555555555555555)
+#define LBITSKIP00 (unsigned long)(0x0000000000000000)
+#endif
+
 /*
  * These routines are used by the resource group routines (rgrp.c)
  * to keep track of block allocation.  Each block is represented by two
@@ -138,43 +148,48 @@ static inline unsigned char gfs2_testbit(struct gfs2_rgrpd *rgd,
 static u32 gfs2_bitfit(const u8 *buffer, unsigned int buflen, u32 goal,
   u8 old_state)
 {
-   const u8 *byte;
-   u32 blk = goal;
-   unsigned int bit, bitlong;
-   const unsigned long *plong;
-#if BITS_PER_LONG == 32
-   const unsigned long plong55 = 0x55555555;
-#else
-   const unsigned long plong55 = 0x5555555555555555;
-#endif
-
-   byte = buffer + (goal / GFS2_NBBY);
-   plong = (const unsigned long *)(buffer + (goal / GFS2_NBBY));
-   bit = (goal % GFS2_NBBY) * GFS2_BIT_SIZE;
-   bitlong = bit;
-
-   while (byte < buffer + buflen) {
-
-   if (bitlong == 0 && old_state == 0 && *plong == plong55) {
-   plong++;
-   byte += sizeof(unsigned long);
-   blk += sizeof(unsigned long) * GFS2_NBBY;
-   continue;
+   const u8 *byte, *start, *end;
+   int bit, startbit;
+   u32 g1, g2, misaligned;
+   unsigned long *plong, lskipval;
+   unsigned long lskipvals[4] = {LBITSKIP55, LBITSKIP00,
+ LBITSKIP55, LBITSKIP00};
+
+   lskipval = lskipvals[old_state];
+   g1 = (goal / GFS2_NBBY);
+   start = buffer + g1;
+   byte = start;
+end = buffer + buflen;
+   g2 = ALIGN(g1, sizeof(unsigned long));
+   plong = (unsigned long *)(buffer + g2);
+   startbit = bit = (goal % GFS2_NBBY) * GFS2_BIT_SIZE;
+   misaligned = g2 - g1;
+   while (byte < end) {
+
+   if (bit == 0 && !misaligned) {
+   if (((*plong) & LBITMASK) == lskipval) {
+   plong++;
+   byte += sizeof(unsigned long);
+   continue;
+   }
+   }
+   if (((*byte >> bit) & GFS2_BIT_MASK) == old_state) {
+   return goal +
+   (((byte - start) * GFS2_NBBY) +
+((bit - startbit) >> 1));
}
-   if (((*byte >> bit) & GFS2_BIT_MASK) == old_state)
-   return blk;
bit += GFS2_BIT_SIZE;
-   if (bit >= 8) {
+   if (bit >= GFS2_NBBY * GFS2_BIT_SIZE) {
bit = 0;
byte++;
+   /* If we were misaligned, adjust the counter */
+   if (misaligned)
+   misaligned--;
+   else { /* If we were aligned, we're not anymore. */
+   misaligned += sizeof(unsigned long) - 1;
+   plong++;
+   }
}
-   bitlong += GFS2_BIT_SIZE;
-   if (bitlong >= sizeof(unsigned long) * 8) {
-   bitlong = 0;
-   plong++;
-   }
-
-   blk++;
}
 
return BFITNOENT;

[Cluster-devel] [GFS2] Faster gfs2_bitfit algorithm

2008-03-10 Thread Bob Peterson

Hi,

This version of the gfs2_bitfit algorithm includes the latest
suggestions from Steve Whitehouse.  It is typically eight to
ten times faster than the version we're using today.  If there
is a lot of metadata mixed in (lots of small files) the
algorithm is often 15 times faster, and given the right
conditions, I've seen peaks of 20 times faster.
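The core trick, testing a whole unsigned long of bitmap at once and skipping it when no entry in it can match, can be sketched in userspace.  This is a simplified illustration, not the patch: it assumes 2 bits per block with the low bit of each pair distinguishing the states that can be skipped, uses memcpy() instead of tracking pointer alignment, and leaves out prefetch().  bitfit(), NOENT and LOWBITS are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NBBY		4u	/* blocks per byte: 2 bits each */
#define BIT_SIZE	2u
#define BIT_MASK	3u
#define NOENT		((uint32_t)~0u)
/* 0x5555...: the low bit of every 2-bit entry in an unsigned long. */
#define LOWBITS		((unsigned long)-1 / 3)

/* Return the first block at or after goal whose 2-bit state equals
 * state, or NOENT if there is none. */
static uint32_t bitfit(const uint8_t *buf, size_t buflen, uint32_t goal,
		       uint8_t state)
{
	/* A word holds no match if the low bit of every entry differs
	 * from the low bit of the wanted state. */
	const unsigned long skipval = (state & 1u) ? 0UL : LOWBITS;
	const uint32_t nblocks = (uint32_t)(buflen * NBBY);
	uint32_t blk = goal;

	while (blk < nblocks) {
		size_t byte = blk / NBBY;

		/* On a word boundary with a full word left: test it whole. */
		if (blk % (NBBY * sizeof(unsigned long)) == 0 &&
		    byte + sizeof(unsigned long) <= buflen) {
			unsigned long w;

			memcpy(&w, buf + byte, sizeof(w));
			if ((w & LOWBITS) == skipval) {
				blk += (uint32_t)(NBBY * sizeof(unsigned long));
				continue;
			}
		}
		/* Otherwise fall back to checking one entry at a time. */
		if (((buf[byte] >> ((blk % NBBY) * BIT_SIZE)) & BIT_MASK) == state)
			return blk;
		blk++;
	}
	return NOENT;
}
```

Because the mask is the same in every byte, the word test gives the same answer on little-endian and big-endian machines.  The patch gets further speed from aligned loads and prefetching, which this sketch does not attempt.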

Regards,

Bob Peterson

Signed-off-by: Bob Peterson [EMAIL PROTECTED]
--
 fs/gfs2/rgrp.c |   93 ---
 1 files changed, 61 insertions(+), 32 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 4291375..7e8f0b1 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -14,6 +14,7 @@
 #include <linux/fs.h>
 #include <linux/gfs2_ondisk.h>
 #include <linux/lm_interface.h>
+#include <linux/prefetch.h>
 
 #include "gfs2.h"
 #include "incore.h"
@@ -33,6 +34,16 @@
 #define BFITNOENT ((u32)~0)
 #define NO_BLOCK ((u64)~0)
 
+#if BITS_PER_LONG == 32
+#define LBITMASK   (0x55555555UL)
+#define LBITSKIP55 (0x55555555UL)
+#define LBITSKIP00 (0x00000000UL)
+#else
+#define LBITMASK   (0x5555555555555555UL)
+#define LBITSKIP55 (0x5555555555555555UL)
+#define LBITSKIP00 (0x0000000000000000UL)
+#endif
+
 /*
  * These routines are used by the resource group routines (rgrp.c)
  * to keep track of block allocation.  Each block is represented by two
@@ -138,45 +149,63 @@ static inline unsigned char gfs2_testbit(struct gfs2_rgrpd *rgd,
 static u32 gfs2_bitfit(const u8 *buffer, unsigned int buflen, u32 goal,
   u8 old_state)
 {
-   const u8 *byte;
-   u32 blk = goal;
-   unsigned int bit, bitlong;
-   const unsigned long *plong;
-#if BITS_PER_LONG == 32
-   const unsigned long plong55 = 0x55555555;
-#else
-   const unsigned long plong55 = 0x5555555555555555;
-#endif
-
-   byte = buffer + (goal / GFS2_NBBY);
-   plong = (const unsigned long *)(buffer + (goal / GFS2_NBBY));
-   bit = (goal % GFS2_NBBY) * GFS2_BIT_SIZE;
-   bitlong = bit;
-
-   while (byte < buffer + buflen) {
-
-   if (bitlong == 0 && old_state == 0 && *plong == plong55) {
-   plong++;
-   byte += sizeof(unsigned long);
-   blk += sizeof(unsigned long) * GFS2_NBBY;
-   continue;
+   const u8 *byte, *start, *end;
+   int bit, startbit;
+   u32 g1, g2, misaligned;
+   unsigned long *plong;
+   unsigned long lskipval;
+
+   lskipval = (old_state  GFS2_BLKST_USED) ? LBITSKIP00 : LBITSKIP55;
+   g1 = (goal / GFS2_NBBY);
+   start = buffer + g1;
+   byte = start;
+end = buffer + buflen;
+   g2 = ALIGN(g1, sizeof(unsigned long));
+   plong = (unsigned long *)(buffer + g2);
+   startbit = bit = (goal % GFS2_NBBY) * GFS2_BIT_SIZE;
+   misaligned = g2 - g1;
+   if (!misaligned)
+   goto ulong_aligned;
+/* parse the bitmap a byte at a time */
+misaligned:
+   while (byte < end) {
+   if (((*byte >> bit) & GFS2_BIT_MASK) == old_state) {
+   return goal +
+   (((byte - start) * GFS2_NBBY) +
+((bit - startbit) >> 1));
}
-   if (((*byte >> bit) & GFS2_BIT_MASK) == old_state)
-   return blk;
bit += GFS2_BIT_SIZE;
-   if (bit >= 8) {
+   if (bit >= GFS2_NBBY * GFS2_BIT_SIZE) {
bit = 0;
byte++;
+   misaligned--;
+   if (!misaligned) {
+   plong = (unsigned long *)byte;
+   goto ulong_aligned;
+   }
}
-   bitlong += GFS2_BIT_SIZE;
-   if (bitlong >= sizeof(unsigned long) * 8) {
-   bitlong = 0;
-   plong++;
-   }
-
-   blk++;
}
+   return BFITNOENT;
 
+/* parse the bitmap an unsigned long at a time */
+ulong_aligned:
+   /* Stop at "end - 1" or else prefetch can go past the end and segfault.
+  We could "if" it but we'd lose some of the performance gained.
+  This way will only slow down searching the very last 4/8 bytes
+  depending on architecture.  I've experimented with several ways
+  of writing this section such as using an else before the goto
+  but this one seems to be the fastest. */
+   while ((unsigned char *)plong < end - 1) {
+   prefetch(plong + 1);
+   if (((*plong) & LBITMASK) != lskipval)
+   break;
+   plong++;
+   }
+   if ((unsigned char *)plong < end) {
+   byte = (const u8 *)plong;
+   misaligned += sizeof(unsigned long) - 1;
+   goto misaligned;
+   }
return BFITNOENT;
 }
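
For readers following the patch above, the word-at-a-time skip can be sketched in user space roughly as follows. This is a simplified illustration under stated assumptions, not the kernel code: the names `bitfit_sketch` and `SK_*` are hypothetical, words are always 64 bits and are read with `memcpy` instead of an aligned pointer cast, alignment is counted from the start of the buffer, and the prefetch is left out. The idea it keeps is the one from the patch: each block uses two bits, so masking a word with 0x55... and comparing against all-zeros or all-ones tells you whether the word can contain the state you are looking for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_BIT_SIZE 2u          /* bits per block */
#define SK_BIT_MASK 0x3u        /* mask for one block state */
#define SK_NBBY     4u          /* blocks per byte */
#define SK_NOENT    UINT32_MAX  /* "nothing found" sentinel */

/* Low bit of every 2-bit field in a 64-bit word. */
#define SK_LOWBITS  UINT64_C(0x5555555555555555)

/*
 * Return the index of the first block at or after 'goal' whose 2-bit
 * state equals 'state', or SK_NOENT if there is none.
 */
static uint32_t bitfit_sketch(const uint8_t *buf, size_t len,
                              uint32_t goal, uint8_t state)
{
    /* If the wanted state has its low bit set, a word whose low bits are
     * all clear cannot match, and the other way around. */
    uint64_t skip = (state & 1u) ? 0 : SK_LOWBITS;
    size_t nblocks = len * SK_NBBY;
    size_t blk = goal;

    assert(state <= SK_BIT_MASK);
    assert(len <= UINT32_MAX / SK_NBBY);

    while (blk < nblocks) {
        size_t byte = blk / SK_NBBY;

        /* On a word boundary with a full word left: try to skip it. */
        if (blk % (sizeof(uint64_t) * SK_NBBY) == 0 &&
            byte + sizeof(uint64_t) <= len) {
            uint64_t w;

            memcpy(&w, buf + byte, sizeof w);
            if ((w & SK_LOWBITS) == skip) {
                blk += sizeof(uint64_t) * SK_NBBY;
                continue;
            }
        }

        /* Otherwise examine a single block. */
        unsigned int shift = (unsigned int)(blk % SK_NBBY) * SK_BIT_SIZE;
        if (((buf[byte] >> shift) & SK_BIT_MASK) == state)
            return (uint32_t)blk;
        blk++;
    }
    return SK_NOENT;
}
```

With a bitmap that is almost entirely in one state, most iterations take the skip branch and advance 32 blocks at a time, which is where a speedup of the kind described in the message would come from.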

Re: [Cluster-devel] Cluster Project tag, gfs_6_1_16, created. gfs-kernel_2_6_9_76-16-gc118d0c

2008-03-19 Thread Bob Peterson

On Wed, 2008-03-19 at 06:12 +0100, Fabio M. Di Nitto wrote:
> Is this commit suitable for master / STABLE2 branch?
> 
> Thanks
> Fabio

Hi Fabio,

Done for both fixes; sorry about that.  I'm still getting used 
to how we are supposed to do things in git.

Regards,

Bob Peterson

[Cluster-devel] RHEL4 Test Patch: bz 345401

2008-03-19 Thread Bob Peterson

Hi,

This is a RHEL5.x gfs2 patch for bug #345401.  I just thought I'd
toss it out here to see if this fix makes sense and get comments
from people.  My explanation will be longer than the patch itself.

It seems like there were a couple things going on here, so I'll
detail each part of the fix.  I'll work from the bottom up because
the code change at the bottom of the patch may be the most
important one.

In the failing scenario, millions (literally) of files are being
opened, written, truncated and closed.  In many cases, the block
numbers used for dinodes are being reused, resulting in glocks
and gfs2_inode structures that are nearly indistinguishable from
previous incarnations.  Some of these structures may still be in
memory and therein lies the problem.

(1) First change, at the bottom of the patch:

First, in inode_go_lock, when a glock was locked, it could call
gfs2_truncatei_resume on new gfs2_inodes in cases where the flags
were not yet set.  There is a GL_SKIP gh_flag to indicate whether
the gfs2_inode is new, so I added code to skip the truncate
resume path in that case.  So a reused gfs2_inode / glock should
no longer take the truncatei_resume code path.

(2) Second change, in the middle of the patch:

Second, in function scan_glock, the code was putting glocks on 
the reclaim list if there were no holders.  However, it may be
possible that there is still someone waiting for the glock (i.e.
on the waiters1 or waiters3 list, or flagged as waiters2).  There
might also be items on the active items list.  So I changed it so it would not
put glocks on the reclaim list if they have any waiters or
active items.  I'm not sure if the waiters part of this fix
is necessary because if there are waiters pending, they should
be immediately granted the glock and become holders themselves.
I don't know if scan_glock can get in there while this is happening.

(3) Third change, at the top of the patch:

In function search_bucket, it is looking for a glock to use,
searching for it by hash key.  If the dinode block was reused,
it can have the same hash key.  My theory is that the glock
was being accessed while it was on the reclaim list, and then
the reclaim daemon was nuking it part-way through some operations
(during times when the glock was released).  The reclaim daemon
just blasts all glocks on the reclaim list.  The code change is
to make the code not find glocks by hash-key if the glock in
question is on the reclaim list because you can't guarantee
the thing won't go away at an inconvenient time.
Previous instrumentation indicated this was happening.

Regards,

Bob Peterson
--
diff -pur a/fs/gfs2/glock.c b/fs/gfs2/glock.c
--- a/fs/gfs2/glock.c   2008-02-22 10:42:48.0 -0600
+++ b/fs/gfs2/glock.c   2008-03-19 10:40:23.0 -0500
@@ -251,6 +251,8 @@ static struct gfs2_glock *search_bucket(
continue;
if (gl->gl_sbd != sdp)
continue;
+   if (!list_empty(&gl->gl_reclaim))
+   continue;
 
atomic_inc(&gl->gl_ref);
 
@@ -1688,6 +1690,12 @@ static void scan_glock(struct gfs2_glock
if (gl->gl_ops == &gfs2_inode_glops && gl->gl_object)
return;
 
+   if (!list_empty(&gl->gl_waiters1) ||
+   !list_empty(&gl->gl_waiters3) ||
+   !list_empty(&gl->gl_ail_list) ||
+   test_bit(GLF_WAITERS2, &gl->gl_flags))
+   return;
+
if (gfs2_glmutex_trylock(gl)) {
if (list_empty(&gl->gl_holders) &&
gl->gl_state != LM_ST_UNLOCKED && demote_ok(gl))
diff -pur a/fs/gfs2/glops.c b/fs/gfs2/glops.c
--- a/fs/gfs2/glops.c   2008-02-22 10:42:48.0 -0600
+++ b/fs/gfs2/glops.c   2008-03-19 10:41:00.0 -0500
@@ -306,7 +306,7 @@ static int inode_go_lock(struct gfs2_hol
struct gfs2_inode *ip = gl->gl_object;
int error = 0;
 
-   if (!ip)
+   if (!ip || (gh->gh_flags & GL_SKIP))
return 0;
 
if (test_bit(GIF_INVALID, &ip->i_flags)) {
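
The third change described above (do not hand out a glock that is already queued for reclaim) can be illustrated with a small user-space sketch. Everything here is hypothetical and much simpler than the kernel structures: `lock_sketch`, `bucket_search` and `mark_for_reclaim` are made-up names, the bucket is a singly linked chain, and there is no locking at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct lock_sketch {
    uint64_t number;           /* lock name, e.g. a disk block number */
    int refs;                  /* reference count */
    bool on_reclaim;           /* queued for the reclaim daemon */
    struct lock_sketch *next;  /* hash-bucket chain */
};

/* Queue a lock for reclaim; only unreferenced locks may be queued. */
static void mark_for_reclaim(struct lock_sketch *l)
{
    assert(l->refs == 0);
    l->on_reclaim = true;
}

/* Find a usable lock with the given number in one hash bucket. */
static struct lock_sketch *bucket_search(struct lock_sketch *head,
                                         uint64_t number)
{
    for (struct lock_sketch *l = head; l != NULL; l = l->next) {
        if (l->number != number)
            continue;
        /* It may be torn down at any moment, so treat it as absent. */
        if (l->on_reclaim)
            continue;
        l->refs++;
        return l;
    }
    return NULL;
}
```

A caller that gets NULL back would then create a fresh entry instead of reviving one that is about to disappear, which is the behaviour the patch is aiming for.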

Re: [Cluster-devel] RHEL4 Test Patch: bz 345401

2008-03-19 Thread Bob Peterson

Obviously the subject should read RHEL5, not RHEL4, since
this is gfs2.  I'm making this distinction because I don't
want it to be mistaken for a patch to the upstream gfs2.

[Cluster-devel] [GFS2 patch] Make dump_glock a bit more friendly

2008-04-16 Thread Bob Peterson

Hi,

This patch makes the glock dump a little more user-friendly.
My primary goal was to get rid of the very-misleading report
of the glock being (unlocked) based on gl_flag, but it goes
a step further.  If it's too verbose, feel free to say no.

Regards,

Bob Peterson
--
 fs/gfs2/glock.c |   34 +++---
 1 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index d636b3e..53396e7 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -122,6 +122,20 @@ static inline rwlock_t *gl_lock_addr(unsigned int x)
 }
 #endif
 
+const char *gl_flags[] = {"",
+ "GLF_LOCK",
+ "GLF_STICKY",
+ "GLF_DEMOTE",
+ "GLF_PENDING_DEMOTE",
+ "GLF_DIRTY"
+};
+
+const char *gl_states[] = {"LM_ST_UNLOCKED",
+  "LM_ST_EXCLUSIVE",
+  "LM_ST_DEFERRED",
+  "LM_ST_SHARED"
+};
+
 /**
  * relaxed_state_ok - is a requested lock compatible with the current lock 
mode?
  * @actual: the current state of the lock
@@ -1903,6 +1917,7 @@ static int dump_glock(struct glock_iter *gi, struct 
gfs2_glock *gl)
unsigned int x;
int error = -ENOBUFS;
struct task_struct *gl_owner;
+   int first, count;
 
spin_lock(&gl->gl_spin);
 
@@ -1913,11 +1928,22 @@ static int dump_glock(struct glock_iter *gi, struct 
gfs2_glock *gl)
if (test_bit(x, &gl->gl_flags))
print_dbg(gi, " %u", x);
}
-   if (!test_bit(GLF_LOCK, &gl->gl_flags))
-   print_dbg(gi, " (unlocked)");
+   first = 1;
+   count = 0;
+   for (x = GLF_LOCK; x <= GLF_DIRTY; x++) {
+   if (test_bit(x, &gl->gl_flags)) {
+   print_dbg(gi, "%c", first ? '(' : '|');
+   print_dbg(gi, "%s", gl_flags[x]);
+   first = 0;
+   count++;
+   }
+   }
+   if (count)
+   print_dbg(gi, ")");
print_dbg(gi, " \n");
print_dbg(gi, "  gl_ref = %d\n", atomic_read(&gl->gl_ref));
-   print_dbg(gi, "  gl_state = %u\n", gl->gl_state);
+   print_dbg(gi, "  gl_state = %u (%s)\n", gl->gl_state,
+ gl_states[gl->gl_state & 0x3]);
if (gl->gl_owner_pid) {
gl_owner = pid_task(gl->gl_owner_pid, PIDTYPE_PID);
if (gl_owner)
@@ -1932,6 +1958,8 @@ static int dump_glock(struct glock_iter *gi, struct 
gfs2_glock *gl)
print_dbg(gi, "  req_gh = %s\n", (gl->gl_req_gh) ? "yes" : "no");
print_dbg(gi, "  lvb_count = %d\n", atomic_read(&gl->gl_lvb_count));
print_dbg(gi, "  object = %s\n", (gl->gl_object) ? "yes" : "no");
+   print_dbg(gi, "  le = %s\n",
+  (list_empty(&gl->gl_le.le_list)) ? "no" : "yes");
print_dbg(gi, "  reclaim = %s\n",
   (list_empty(&gl->gl_reclaim)) ? "no" : "yes");
if (gl->gl_aspace)
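
The flag formatting in the patch above, which prints set flags by name as "(A|B|C)", can be tried outside the kernel with a sketch like the one below. The function `format_flags` and the bit numbering (bit 1 for GLF_LOCK up to bit 5 for GLF_DIRTY, bit 0 unused) are assumptions made for the example; only the flag names come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Index 0 is a placeholder because the assumed numbering starts at 1. */
static const char *const flag_names[] = {
    "", "GLF_LOCK", "GLF_STICKY", "GLF_DEMOTE",
    "GLF_PENDING_DEMOTE", "GLF_DIRTY",
};
#define NFLAGS (sizeof flag_names / sizeof flag_names[0])

/*
 * Write the names of the set bits in 'flags' into 'out' as "(A|B|C)".
 * Leaves 'out' empty if no known flag is set.  Returns the number of
 * flags written, or -1 if the buffer is too small.
 */
static int format_flags(unsigned long flags, char *out, size_t outlen)
{
    size_t used = 0;
    int count = 0;

    assert(out != NULL && outlen > 0);
    out[0] = '\0';

    for (size_t x = 1; x < NFLAGS; x++) {
        if (!(flags & (1UL << x)))
            continue;
        /* '(' before the first name, '|' before every later one. */
        int n = snprintf(out + used, outlen - used, "%c%s",
                         count ? '|' : '(', flag_names[x]);
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;
        used += (size_t)n;
        count++;
    }
    if (count) {
        if (outlen - used < 2)
            return -1;
        out[used++] = ')';
        out[used] = '\0';
    }
    return count;
}
```

Building the string in one buffer, rather than issuing one print call per character as the patch does, is only a convenience for testing the output.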

[Cluster-devel] [GFS2 patch] Performance gain creating dinodes

2008-04-16 Thread Bob Peterson

Hi,

In function gfs2_inode_lookup, it calls gfs2_glock_get to fetch
the glock associated with the inode.  However, when a new inode
is being created, the calling function, gfs2_createi, already
has that information.  By passing the existing glock in as a
parameter, it can avoid looking up the glock in the hash table,
which involves locking the hash table and possible contention.

Regards,

Bob Peterson
Red Hat Clustering & GFS
--
 fs/gfs2/dir.c|3 ++-
 fs/gfs2/inode.c  |   18 --
 fs/gfs2/inode.h  |3 ++-
 fs/gfs2/ops_export.c |2 +-
 fs/gfs2/ops_fstype.c |2 +-
 fs/gfs2/rgrp.c   |2 +-
 7 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index eed040d..bd18ffc 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -1501,7 +1501,8 @@ struct inode *gfs2_dir_search(struct inode *dir, const 
struct qstr *name)
inode = gfs2_inode_lookup(dir->i_sb, 
be16_to_cpu(dent->de_type),
be64_to_cpu(dent->de_inum.no_addr),
-   be64_to_cpu(dent->de_inum.no_formal_ino), 0);
+   be64_to_cpu(dent->de_inum.no_formal_ino), 0,
+   NULL);
brelse(bh);
return inode;
}
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 3a9ef52..8e69d1f 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -166,10 +166,11 @@ void gfs2_set_iop(struct inode *inode)
  * Returns: A VFS inode, or an error
  */
 
-struct inode *gfs2_inode_lookup(struct super_block *sb, 
+struct inode *gfs2_inode_lookup(struct super_block *sb,
unsigned int type,
u64 no_addr,
-   u64 no_formal_ino, int skip_freeing)
+   u64 no_formal_ino, int skip_freeing,
+   struct gfs2_glock *existing_gl)
 {
struct inode *inode;
struct gfs2_inode *ip;
@@ -190,9 +191,14 @@ struct inode *gfs2_inode_lookup(struct super_block *sb,
inode->i_private = ip;
ip->i_no_formal_ino = no_formal_ino;
 
-   error = gfs2_glock_get(sdp, no_addr, &gfs2_inode_glops, CREATE, 
&ip->i_gl);
-   if (unlikely(error))
-   goto fail;
+   if (existing_gl) {
+   ip->i_gl = existing_gl;
+   atomic_inc(&ip->i_gl->gl_ref);
+   } else {
+   error = gfs2_glock_get(sdp, no_addr, &gfs2_inode_glops, 
CREATE, &ip->i_gl);
+   if (unlikely(error))
+   goto fail;
+   }
ip->i_gl->gl_object = ip;
 
error = gfs2_glock_get(sdp, no_addr, &gfs2_iopen_glops, CREATE, 
&io_gl);
@@ -1016,7 +1022,7 @@ struct inode *gfs2_createi(struct gfs2_holder *ghs, const 
struct qstr *name,
 
inode = gfs2_inode_lookup(dir->i_sb, IF2DT(mode),
inum.no_addr,
-   inum.no_formal_ino, 0);
+ inum.no_formal_ino, 0, ghs[1].gh_gl);
if (IS_ERR(inode))
goto fail_gunlock2;
 
diff --git a/fs/gfs2/inode.h b/fs/gfs2/inode.h
index 580da45..072dce3 100644
--- a/fs/gfs2/inode.h
+++ b/fs/gfs2/inode.h
@@ -76,7 +76,8 @@ void gfs2_inode_attr_in(struct gfs2_inode *ip);
 void gfs2_set_iop(struct inode *inode);
 struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned type, 
u64 no_addr, u64 no_formal_ino,
-   int skip_freeing);
+   int skip_freeing,
+   struct gfs2_glock *existing_gl);
 struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr);
 
 int gfs2_inode_refresh(struct gfs2_inode *ip);
diff --git a/fs/gfs2/ops_export.c b/fs/gfs2/ops_export.c
index 990d9f4..65d8630 100644
--- a/fs/gfs2/ops_export.c
+++ b/fs/gfs2/ops_export.c
@@ -203,7 +203,7 @@ static struct dentry *gfs2_get_dentry(struct super_block 
*sb,
 
inode = gfs2_inode_lookup(sb, DT_UNKNOWN,
inum->no_addr,
-   0, 0);
+ 0, 0, NULL);
if (IS_ERR(inode)) {
error = PTR_ERR(inode);
goto fail;
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index ef9c6c4..f1e70ef 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -228,7 +228,7 @@ fail:
 static inline struct inode *gfs2_lookup_root(struct super_block *sb,
 u64 no_addr)
 {
-   return gfs2_inode_lookup(sb, DT_DIR, no_addr, 0, 0);
+   return gfs2_inode_lookup(sb, DT_DIR, no_addr, 0, 0, NULL);
 }
 
 static int init_sb(struct gfs2_sbd *sdp, int silent, int undo)
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
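
The idea in this patch, letting a caller that already holds the handle pass it in so the shared table is not searched again, can be sketched in isolation as follows. All names (`handle_sketch`, `table_get`, `attach_handle`) are hypothetical, the table is direct-mapped with no collision handling, and there is no locking; it only shows the control flow and the extra reference taken on the fast path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct handle_sketch {
    uint64_t number;  /* what the handle protects, e.g. a block address */
    int refs;         /* reference count */
};

#define TABLE_SIZE 8
static struct handle_sketch table[TABLE_SIZE];
static unsigned int table_searches;  /* how often the shared table was used */

/* Slow path: find or create the handle for 'number' in the shared table. */
static struct handle_sketch *table_get(uint64_t number)
{
    struct handle_sketch *h = &table[number % TABLE_SIZE];

    table_searches++;
    /* The sketch has no collision handling; a real table would chain. */
    assert(h->refs == 0 || h->number == number);
    h->number = number;
    h->refs++;
    return h;
}

/* Attach a handle to a new object, reusing 'existing' when the caller has one. */
static struct handle_sketch *attach_handle(uint64_t number,
                                           struct handle_sketch *existing)
{
    if (existing != NULL) {
        assert(existing->number == number);
        existing->refs++;  /* the new user still needs its own reference */
        return existing;
    }
    return table_get(number);
}
```

Callers that have nothing to offer pass NULL and get the old behaviour, which mirrors how the other call sites in the diff were updated.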

[Cluster-devel] [GFS2 Patch] bz 450156 - kernel panic mounting volume

2008-06-09 Thread Bob Peterson

Hi,

This patch fixes bugzilla bug 450156.

This started with a not-too-improbable mount failure because the
locking protocol was never set back to its proper lock_dlm after the
system was rebooted in the middle of a gfs2_fsck.  That left a
(purposely) invalid locking protocol in the superblock, which caused an
error when the file system was mounted the next time.

When there's an error mounting, vfs calls DQUOT_OFF, which calls
vfs_quota_off which calls gfs2_sync_fs.  Next, gfs2_sync_fs calls
gfs2_log_flush passing s_fs_info.  But due to the error, s_fs_info
had been previously set to NULL, and so we have the kernel oops.

My solution in this patch is to test for the NULL value before passing
it.  I tested this patch and it fixes the problem.

I believe the problem was caused by changes in what the DQUOT_OFF
macro does in newer kernels.  I could not recreate the
problem on a RHEL kernel and don't believe this affects RHEL.

Regards,

Bob Peterson

Signed-off-by: Bob Peterson [EMAIL PROTECTED] 
--
 fs/gfs2/ops_super.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/ops_super.c b/fs/gfs2/ops_super.c
index 0b7cc92..6690792 100644
--- a/fs/gfs2/ops_super.c
+++ b/fs/gfs2/ops_super.c
@@ -155,7 +155,7 @@ static void gfs2_write_super(struct super_block *sb)
 static int gfs2_sync_fs(struct super_block *sb, int wait)
 {
sb->s_dirt = 0;
-   if (wait)
+   if (wait && sb->s_fs_info)
gfs2_log_flush(sb->s_fs_info, NULL);
return 0;
 }
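
As a stand-alone illustration of the guard added above, here is a minimal user-space sketch. The types and names (`sb_sketch`, `log_flush_sketch`, `sync_fs_sketch`) are invented for the example; the only point is that the sync path must tolerate a superblock whose private data pointer has already been cleared by a failed mount.

```c
#include <assert.h>
#include <stddef.h>

struct sb_sketch {
    void *fs_info;  /* NULL when mount setup failed or was torn down */
    int dirty;
};

static int flush_count;  /* counts flushes so the effect is observable */

static void log_flush_sketch(void *fs_info)
{
    assert(fs_info != NULL);  /* the real function would dereference it */
    flush_count++;
}

static int sync_fs_sketch(struct sb_sketch *sb, int wait)
{
    sb->dirty = 0;
    /* Only flush when asked to wait and there is something to flush. */
    if (wait && sb->fs_info)
        log_flush_sketch(sb->fs_info);
    return 0;
}
```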

Re: [Cluster-devel] Re: [Bug 10827] 2.6.26rc4 GFS2 oops.

2008-06-23 Thread Bob Peterson

On Sun, 2008-06-22 at 12:09 +0300, Adrian Bunk wrote:
> On Sat, Jun 14, 2008 at 10:12:03PM +0200, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.25.  Please verify if it still should be listed.
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=10827
> > Subject : 2.6.26rc4 GFS2 oops.
> > Submitter   : Dave Jones [EMAIL PROTECTED]
> > Date: 2008-05-27 15:44 (19 days old)
> > References  : http://lkml.org/lkml/2008/5/27/297
> 
> Dave, what is the status of this bug?
> 
> It's currently listed as a 2.6.26-rc regression.
> 
> Is it actually confirmed that 2.6.25 is fine?
> 
> According to the thread of the bug report there should now be a bug 
> report in the Red Hat Bugzilla for it. Bug number?
> 
> Thanks
> Adrian

Hi,

This appears to be a known bug.  There's a Fedora bugzilla record for
it here, which contains a patch to fix the problem:

https://bugzilla.redhat.com/show_bug.cgi?id=448866

The bug does not appear to be in 2.6.25; 2.6.25 is fine afaict.

Regards,

Bob Peterson
Red Hat GFS

Re: [Cluster-devel] Re: [Bug 10827] 2.6.26rc4 GFS2 oops.

2008-06-23 Thread Bob Peterson

> Yup, the patch in your Bugzilla is for code that is new in 2.6.26.
> 
> Can you push your patch for inclusion into 2.6.26 so that 2.6.26 won't 
> get released with this regression?
> 
> Thanks
> Adrian

Hi Adrian,

Unfortunately, I cannot.  All access to the gfs2 -nmw git tree is
controlled by Steve Whitehouse, and he is on vacation/holiday until
tomorrow.

I've submitted the patch to cluster-devel, so hopefully he'll push it
as soon as he returns tomorrow.

Regards,

Bob Peterson
Red Hat GFS

[Cluster-devel] [GFS2 Patch] bz458289: rm on multiple nodes causes panic

2008-08-11 Thread Bob Peterson

This patch fixes a problem whereby simultaneous delete operations
(e.g. rm -fR *) from multiple nodes on the same GFS2 file system
can cause kernel panics, hangs, and/or memory corruption.

Regards,

Bob Peterson

Signed-off-by: Bob Peterson [EMAIL PROTECTED]
--
 fs/gfs2/ops_inode.c |   24 ++--
 1 files changed, 6 insertions(+), 18 deletions(-)

diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c
index e2c62f7..a072c9a 100644
--- a/fs/gfs2/ops_inode.c
+++ b/fs/gfs2/ops_inode.c
@@ -288,25 +288,17 @@ static int gfs2_unlink(struct inode *dir, struct dentry 
*dentry)
gfs2_holder_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, ghs + 2);
 
 
-   error = gfs2_glock_nq(ghs); /* parent */
-   if (error)
-   goto out_parent;
-
-   error = gfs2_glock_nq(ghs + 1); /* child */
-   if (error)
-   goto out_child;
-
-   error = gfs2_glock_nq(ghs + 2); /* rgrp */
+   error = gfs2_glock_nq_m(3, ghs);
if (error)
-   goto out_rgrp;
+   goto out;
 
error = gfs2_unlink_ok(dip, &dentry->d_name, ip);
if (error)
-   goto out_rgrp;
+   goto out;
 
error = gfs2_trans_begin(sdp, 2*RES_DINODE + RES_LEAF + RES_RG_BIT, 0);
if (error)
-   goto out_rgrp;
+   goto out;
 
error = gfs2_dir_del(dip, &dentry->d_name);
 if (error)
@@ -316,14 +308,10 @@ static int gfs2_unlink(struct inode *dir, struct dentry 
*dentry)
 
 out_end_trans:
gfs2_trans_end(sdp);
-   gfs2_glock_dq(ghs + 2);
-out_rgrp:
+out:
+   gfs2_glock_dq_m(3, ghs);
gfs2_holder_uninit(ghs + 2);
-   gfs2_glock_dq(ghs + 1);
-out_child:
gfs2_holder_uninit(ghs + 1);
-   gfs2_glock_dq(ghs);
-out_parent:
gfs2_holder_uninit(ghs);
gfs2_glock_dq_uninit(&ri_gh);
return error;
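
The message does not say why taking the three locks as one group fixes the problem. One general technique that a grouped acquire makes possible, and which I assume is relevant here, is to take the locks in a single canonical order on every node, so that two nodes can never each hold a lock the other is waiting for. The sketch below illustrates that technique only; it is not the kernel's gfs2_glock_nq_m, and `req_sketch`, `acquire_all` and `MAX_REQS` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct req_sketch {
    uint64_t lock_no;  /* identifies the lock cluster-wide */
    int held;
};

#define MAX_REQS 8

/* Order requests by lock number. */
static int cmp_req(const void *a, const void *b)
{
    const struct req_sketch *ra = *(const struct req_sketch *const *)a;
    const struct req_sketch *rb = *(const struct req_sketch *const *)b;

    return (ra->lock_no > rb->lock_no) - (ra->lock_no < rb->lock_no);
}

/*
 * Acquire every request in ascending lock_no order, whatever order the
 * caller listed them in.  The order actually used is written to 'order'.
 */
static void acquire_all(struct req_sketch *reqs, size_t n, uint64_t *order)
{
    struct req_sketch *sorted[MAX_REQS];

    assert(n <= MAX_REQS);
    for (size_t i = 0; i < n; i++)
        sorted[i] = &reqs[i];
    qsort(sorted, n, sizeof sorted[0], cmp_req);

    for (size_t i = 0; i < n; i++) {
        sorted[i]->held = 1;  /* stands in for a blocking lock call */
        order[i] = sorted[i]->lock_no;
    }
}
```

Two callers that name the same three locks in different orders end up taking them in the same sequence, which removes the classic ABBA deadlock.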

[Cluster-devel] Re: [Linux-cluster] Feature Request: gfs_fsck has a yes to all response.

2009-02-23 Thread Bob Peterson

- Stewart Walters sp...@iinet.net.au wrote:
| I've just had a GFS volume massively corrupt itself.
(snip)
| Is there any way for the user to say Yes to all?
| 
| At least if the default choice was Yes when the Enter key was
| pressed, the user
| could hold down the Enter key until the entire list of blocks had been
| fixed.
| 
| Regards,
| 
| Stewart

Hi Stewart,

You should be able to do: gfs_fsck -y /dev/your/device and it will
answer 'y' to all the questions.  This is similar to other fscks.
You can also use -n to answer 'no' to all questions.

Regards,

Bob Peterson
Red Hat GFS

[Cluster-devel] Re: [Linux-cluster] Feature Request: gfs_fsck has a yes to all response.

2009-02-26 Thread Bob Peterson

- Stewart Walters sp...@iinet.net.au wrote:
| Thanks Bob, but I was sort of hoping for an option of doing yes to
| all once
| you've already started the fsck. I'm not exactly sure if there is any
| fallout to
| stopping gfs_fsck part way through it's repair operations on a broken
| GFS volume.
| 
| Not to worry - Does anyone know what's going on with that weird 'Fix
| bitmap for
| block' number continually changing when you press the Enter key?
| 
| I suppose this is a weird gfs_fsck display quirk when the response is
| not a 'y'
| or a 'n'.
| 
| Regards,
| 
| Stewart

Hi Stewart,

An "a" (all) answer to the yes/no questions is certainly easy to do.
I've thought about doing it, too, but AFAIK, none of the other fscks
allow this response.  I'll ask around and see if anyone objects.

Regards,

Bob Peterson
Red Hat GFS

Re: [Cluster-devel] GFS2: Fix panic in glock memory shrinker

2009-06-30 Thread Bob Peterson

- Benjamin Marzinski bmarz...@redhat.com wrote:
| To: cluster-devel@redhat.com
| Subject: GFS2: Fix panic in glock memory shrinker
| 
| It is possible for gfs2_shrink_glock_memory() to check a glock for
| demotion
| that's in the process of being freed by gfs2_glock_put().  In this
| case,
| gfs2_shrink_glock_memory() will acquire a new reference to this
| glock,
| and
| then try to free the glock itself when it drops the refernce.  To
| solve
| this, gfs2_shrink_glock_memory() just needs to check if the glock is
| in
| the process of being freed, and if so skip it without ever unlocking
| the
| lru_lock.
| 
| Signed-off-by: Benjamin Marzinski bmarz...@redhat.com
| ---
|  fs/gfs2/glock.c |4 
|  1 file changed, 4 insertions(+)
| 
| Index: kernel-upstream/fs/gfs2/glock.c
| ===
| --- kernel-upstream.orig/fs/gfs2/glock.c
| +++ kernel-upstream/fs/gfs2/glock.c
| @@ -1314,6 +1314,10 @@ static int gfs2_shrink_glock_memory(int 
|   list_del_init(&gl->gl_lru);
|   atomic_dec(&lru_count);
|  
| + /* Check if glock is about to be freed */
| + if (atomic_read(&gl->gl_ref) == 0)
| + continue;
| +
|   /* Test for being demotable */
|   if (!test_and_set_bit(GLF_LOCK, &gl->gl_flags)) {
|   gfs2_glock_hold(gl);

Hi,

ACKed by Bob Peterson rpete...@redhat.com

Regards,

Bob Peterson
Red Hat File Systems

Re: [Cluster-devel] waiting in init.d/cman

2009-08-05 Thread Bob Peterson

- Fabio M. Di Nitto fdini...@redhat.com wrote:
| automatically. A bit offtopic but perhaps it might be the case to
| revive
| the idea that mount.gfs2 should spawn gfs_controld if required and
| not
| running and libdlm would spawn dlm_controld.
| 
| Fabio

I seem to recall something about getting rid of the mount.gfs2 helper
altogether upstream.

Bob Peterson

Re: [Cluster-devel] GFS2 and uevents document

2009-08-13 Thread Bob Peterson

- Steven Whitehouse swhit...@redhat.com wrote:
| Hi,
| 
| Below is a first draft at a document explaining the uevents produced
| by
| GFS2. I'm intending to add it under
| linux-2.6/Documentation/filesystems/gfs-uevents.txt
| 
| Let me know if you spot anything thats wrong or could be better
| explained,
| 
| Steve.

Hi Steve,

Good writeup.  Here are some minor suggestions.

Bob Peterson
--
--- /home/msp/rpeterso/gfs_uevents.orig.txt 2009-08-13 10:02:25.0 
-0500
+++ /home/msp/rpeterso/gfs_uevents.bobs.txt 2009-08-13 10:10:33.0 
-0500
@@ -1,9 +1,9 @@
   uevents and GFS2
  ==
 
-During the lifetime of a GFS2 mount, a number of uevents are generated,
-this document explains what the events are and what they are used
-for (by gfs_controld in gfs2-utils)
+During the lifetime of a GFS2 mount, a number of uevents are generated.
+This document explains what the events are and what they are used
+for (by gfs_controld in gfs2-utils).
 
 A list of GFS2 uevents
 ---
@@ -12,26 +12,26 @@ A list of GFS2 uevents
 
 The ADD event occurs at mount time. It will always be the first
 uevent generated by the newly created filesystem. If the mount
-is successful, an ONLINE uevent will follow, if it is not successful
+is successful, an ONLINE uevent will follow.  If it is not successful
 then a REMOVE uevent will follow.
 
 The ADD uevent has two environment variables: SPECTATOR=[0|1]
-and RDONLY=[0|1] which specify the spectator (no journal assigned,
-implies a read-only mount) and read-only status of the filesystem
-respectively.
+and RDONLY=[0|1] that specify the spectator status (a read-only mount
+with no journal assigned), and read-only (with journal assigned) status
+of the filesystem respectively.
 
 2. ONLINE
 
 The ONLINE uevent is generated after a successful mount or remount. It
 has the same environment variables as the ADD uevent. The ONLINE
 uevent, along with the two environment variables for spectator and
-rdonly are a relatively recent addition (2.6.32-rc+) and will not
+RDONLY are a relatively recent addition (2.6.32-rc+) and will not
 be generated by older kernels.
 
 3. CHANGE
 
 The CHANGE uevent is used in two places. One is when reporting the
-sucessful mount of the filesystem by the first node (FIRSTMOUNT=Done).
+successful mount of the filesystem by the first node (FIRSTMOUNT=Done).
 This is used as a signal by gfs_controld that it is then ok for other
 nodes in the cluster to mount the filesystem.
 
@@ -40,7 +40,7 @@ of journal recovery for one of the files
 two environment variables, JID= which specifies the journal id which
 has just been recovered, and RECOVERY=[Done|Failed] to indicate the
 success (or otherwise) of the operation. These uevents are generated
-for every journal recovered whether it is during the initial mount
+for every journal recovered, whether it is during the initial mount
 process or as the result of gfs_controld requesting a specific journal
 recovery via the /sys/fs/gfs2/<fsname>/lock_module/recovery file.
 
@@ -79,7 +79,7 @@ able to join the cluster.
 
 2. LOCKPROTO=
 
-The LOCKPROTO is a string, again its value depends on that set
+The LOCKPROTO is a string, and its value depends on what is set
 on the mount command line, or via fstab. It will be either
 lock_nolock or lock_dlm. In the future other lock managers
 may be supported.
@@ -93,7 +93,7 @@ numeric journal id in all GFS2 uevents.
 4. UUID=
 
 With recent versions of gfs2-utils, mkfs.gfs2 writes a UUID
-into the filesystem superblock. If it exists, then this will
+into the filesystem superblock. If it exists, this will
 be included in every uevent relating to the filesystem.

Re: [Cluster-devel] [PATCH 1/4] libgfs2: Fix 'dubious one-bit signed bitfield' sparse errors

2009-08-19 Thread Bob Peterson

- Andrew Price a...@andrewprice.me.uk wrote:
| Fix these sparse errors:
| 
| libgfs2.h:575:11: error: dubious one-bit signed bitfield
| libgfs2.h:576:10: error: dubious one-bit signed bitfield
| libgfs2.h:577:13: error: dubious one-bit signed bitfield
| 
| Signed-off-by: Andrew Price a...@andrewprice.me.uk
| ---
|  gfs2/libgfs2/libgfs2.h |6 +++---
|  1 files changed, 3 insertions(+), 3 deletions(-)
| 
| diff --git a/gfs2/libgfs2/libgfs2.h b/gfs2/libgfs2/libgfs2.h
| index 558283d..b3c9483 100644
| --- a/gfs2/libgfs2/libgfs2.h
| +++ b/gfs2/libgfs2/libgfs2.h
| @@ -572,9 +572,9 @@ extern struct gfs2_inode *gfs_inode_get(struct
| gfs2_sbd *sdp,
|  /* gfs2_log.c */
|  struct gfs2_options {
|   char *device;
| - int yes:1;
| - int no:1;
| - int query:1;
| + unsigned int yes:1;
| + unsigned int no:1;
| + unsigned int query:1;
|  };
|  
|  #define MSG_DEBUG   7
| -- 
| 1.6.3.3
Hi,

ACKed by Bob Peterson rpete...@redhat.com

Regards,

Bob Peterson
Red Hat File Systems
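
For anyone wondering why sparse calls a one-bit signed bitfield "dubious": on the usual two's complement implementations such a field can only hold 0 and -1, so a comparison against 1 never succeeds. A small sketch, with made-up struct and function names, shows the difference between the two declarations in the patch:

```c
#include <assert.h>

struct opts_signed   { signed int yes:1; };    /* values: 0 and -1 */
struct opts_unsigned { unsigned int yes:1; };  /* values: 0 and 1 */

/* Store a boolean in the unsigned field and read it back. */
static int roundtrip_unsigned(int truth)
{
    struct opts_unsigned o = { 0 };

    assert(truth == 0 || truth == 1);
    o.yes = (unsigned int)truth;
    return (int)o.yes;
}

/* The only nonzero value a signed one-bit field can represent. */
static int signed_nonzero_value(void)
{
    struct opts_signed o = { 0 };

    o.yes = -1;
    return o.yes;
}
```

Code that only tests the field for truth (as in "if (opts.yes)") works either way, which is probably why the original declarations went unnoticed.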

Re: [Cluster-devel] [PATCH 2/4] gfs2_quota: Fix sparse error

2009-08-19 Thread Bob Peterson

- Andrew Price a...@andrewprice.me.uk wrote:
| Fix sparse error marked inline, but without a definition and make
| write_quota_internal static as it's only used in main.c
| 
| Signed-off-by: Andrew Price a...@andrewprice.me.uk
| ---
|  gfs2/quota/gfs2_quota.h |4 +---
|  gfs2/quota/main.c   |4 ++--
|  2 files changed, 3 insertions(+), 5 deletions(-)
| 
| diff --git a/gfs2/quota/gfs2_quota.h b/gfs2/quota/gfs2_quota.h
| index 462246f..f15c25b 100644
| --- a/gfs2/quota/gfs2_quota.h
| +++ b/gfs2/quota/gfs2_quota.h
| @@ -66,10 +66,8 @@ void cleanup(void);
|  void read_superblock(struct gfs2_sb *sb, struct gfs2_sbd *sdp);
|  void get_last_quota_id(int fd, uint32_t *max_id);
|  int is_valid_quota_list(int fd);
| -inline void read_quota_internal(int fd, unsigned int id, int id_type,
| 
| +void read_quota_internal(int fd, unsigned int id, int id_type,
|   struct gfs2_quota *q);
| -inline void write_quota_internal(int fd, unsigned int id, int
| id_type, 
| -  struct gfs2_quota *q);
|  void print_quota_list_warning(void);
|  
|  /*  check.c  */
| diff --git a/gfs2/quota/main.c b/gfs2/quota/main.c
| index 16692dd..f80313d 100644
| --- a/gfs2/quota/main.c
| +++ b/gfs2/quota/main.c
| @@ -308,7 +308,7 @@ read_superblock(struct gfs2_sb *sb, struct
| gfs2_sbd *sdp)
|   close(fd);
|  }
|  
| -inline void 
| +void
|  read_quota_internal(int fd, uint32_t id, int id_type, struct
| gfs2_quota *q)
|  {
|   /* seek to the appropriate offset in the quota file and read the 
| @@ -331,7 +331,7 @@ read_quota_internal(int fd, uint32_t id, int
| id_type, struct gfs2_quota *q)
|   gfs2_quota_in(q, buf);
|  }
|  
| -inline void 
| +static inline void
|  write_quota_internal(int fd, uint32_t id, int id_type, struct
| gfs2_quota *q)
|  {
|   /* seek to the appropriate offset in the quota file and read the
| -- 
| 1.6.3.3
Hi,

ACKed by Bob Peterson rpete...@redhat.com

Regards,

Bob Peterson
Red Hat File Systems

Re: [Cluster-devel] [PATCH 3/4] libgfs2: Fix Value stored is never read warnings

2009-08-19 Thread Bob Peterson

- Andrew Price a...@andrewprice.me.uk wrote:
| Building with CC=clang --analyze gave the following warnings:
| 
| - misc.c:126:8: warning: Although the value stored to 'ret' is used
| in
|   the enclosing expression, the value is never actually read from
| 'ret'
| - fs_geometry.c:82:3: warning: Value stored to 'rgsize_specified' is
|   never read
| - gfs1.c:308:3: warning: Value stored to 'f' is never read
| 
| This patch makes them go away.
| 
| Signed-off-by: Andrew Price a...@andrewprice.me.uk
| ---
(snip)
| + if (6 != ret)
Hi,

I prefer the form "if (ret != 6)" but this will work.
ACKed by Bob Peterson rpete...@redhat.com

Regards,

Bob Peterson
Red Hat File Systems

Re: [Cluster-devel] [PATCH 4/4] fsck.gfs2: Make block_mounters static

2009-08-19 Thread Bob Peterson

- Andrew Price a...@andrewprice.me.uk wrote:
| Make block_mounters static - it's only used in initialize.c
| 
| Signed-off-by: Andrew Price a...@andrewprice.me.uk
| ---
|  gfs2/fsck/initialize.c |2 +-
|  1 files changed, 1 insertions(+), 1 deletions(-)
| 
| diff --git a/gfs2/fsck/initialize.c b/gfs2/fsck/initialize.c
| index 8bf5782..789895e 100644
| --- a/gfs2/fsck/initialize.c
| +++ b/gfs2/fsck/initialize.c
| @@ -31,7 +31,7 @@
|   * Change the lock protocol so nobody can mount the fs
|   *
|   */
| -int block_mounters(struct gfs2_sbd *sbp, int block_em)
| +static int block_mounters(struct gfs2_sbd *sbp, int block_em)
|  {
|   if(block_em) {
|   /* verify it starts with lock_ */
| -- 
| 1.6.3.3
Hi,

ACKed by Bob Peterson rpete...@redhat.com

Regards,

Bob Peterson
Red Hat File Systems

[Cluster-devel] [GFS2 patch] Add -o errors=panic|withdraw mount options

2009-08-20 Thread Bob Peterson

Hi,

This patch adds -o errors=panic and -o errors=withdraw to the
gfs2 mount options.  The errors=withdraw option is the current
behaviour: gfs2 withdraws from the file system if a non-serious
gfs2 error occurs.  The new errors=panic option tells gfs2 to
force a kernel panic if a non-serious gfs2 file system error
occurs.  This may be useful, for example, where fabric-level
fencing is used that has no way to reboot (such as fence_scsi).

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson rpete...@redhat.com 
--
 fs/gfs2/incore.h |7 +++
 fs/gfs2/ops_fstype.c |1 +
 fs/gfs2/super.c  |   36 
 fs/gfs2/util.c   |   41 +++--
 4 files changed, 71 insertions(+), 14 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 61801ad..1d11e6e 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -406,6 +406,12 @@ struct gfs2_statfs_change_host {
 #define GFS2_DATA_WRITEBACK 1
 #define GFS2_DATA_ORDERED  2
 
+#define GFS2_ERRORS_DEFAULT GFS2_ERRORS_WITHDRAW
+#define GFS2_ERRORS_WITHDRAW 0
+#define GFS2_ERRORS_CONTINUE 1 /* place holder for future feature */
+#define GFS2_ERRORS_RO  2 /* place holder for future feature */
+#define GFS2_ERRORS_PANIC   3
+
 struct gfs2_args {
char ar_lockproto[GFS2_LOCKNAME_LEN];   /* Name of the Lock Protocol */
char ar_locktable[GFS2_LOCKNAME_LEN];   /* Name of the Lock Table */
@@ -422,6 +428,7 @@ struct gfs2_args {
unsigned int ar_data:2; /* ordered/writeback */
unsigned int ar_meta:1; /* mount metafs */
unsigned int ar_discard:1;  /* discard requests */
+   unsigned int ar_errors:2;   /* errors=withdraw | panic */
int ar_commit;  /* Commit interval */
 };
 
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 39021c0..165518a 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1168,6 +1168,7 @@ static int fill_super(struct super_block *sb, void *data, int silent)
 	sdp->sd_args.ar_quota = GFS2_QUOTA_DEFAULT;
 	sdp->sd_args.ar_data = GFS2_DATA_DEFAULT;
 	sdp->sd_args.ar_commit = 60;
+	sdp->sd_args.ar_errors = GFS2_ERRORS_DEFAULT;
 
 	error = gfs2_mount_args(sdp, &sdp->sd_args, data);
 	if (error) {
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 85bd2bc..7a5c128 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -68,6 +68,8 @@ enum {
Opt_discard,
Opt_nodiscard,
Opt_commit,
+   Opt_err_withdraw,
+   Opt_err_panic,
Opt_error,
 };
 
@@ -97,6 +99,8 @@ static const match_table_t tokens = {
 	{Opt_discard, "discard"},
 	{Opt_nodiscard, "nodiscard"},
 	{Opt_commit, "commit=%d"},
+	{Opt_err_withdraw, "errors=withdraw"},
+	{Opt_err_panic, "errors=panic"},
 	{Opt_error, NULL}
 };
 
@@ -152,6 +156,11 @@ int gfs2_mount_args(struct gfs2_sbd *sdp, struct gfs2_args *args, char *options)
 			args->ar_localcaching = 1;
 			break;
 		case Opt_debug:
+			if (args->ar_errors == GFS2_ERRORS_PANIC) {
+				fs_info(sdp, "-o debug and -o errors=panic "
+				       "are mutually exclusive.\n");
+				return -EINVAL;
+			}
 			args->ar_debug = 1;
 			break;
 		case Opt_nodebug:
@@ -205,6 +214,17 @@ int gfs2_mount_args(struct gfs2_sbd *sdp, struct gfs2_args *args, char *options)
 				return rv ? rv : -EINVAL;
 			}
 			break;
+		case Opt_err_withdraw:
+			args->ar_errors = GFS2_ERRORS_WITHDRAW;
+			break;
+		case Opt_err_panic:
+			if (args->ar_debug) {
+				fs_info(sdp, "-o debug and -o errors=panic "
+					"are mutually exclusive.\n");
+				return -EINVAL;
+			}
+			args->ar_errors = GFS2_ERRORS_PANIC;
+			break;
 		case Opt_error:
 		default:
 			fs_info(sdp, "invalid mount option: %s\n", o);
@@ -1226,6 +1246,22 @@ static int gfs2_show_options(struct seq_file *s, struct vfsmount *mnt)
 	lfsecs = sdp->sd_tune.gt_log_flush_secs;
 	if (lfsecs != 60)
 		seq_printf(s, ",commit=%d", lfsecs);
+	if (args->ar_errors != GFS2_ERRORS_DEFAULT) {
+		const char *state;
+
+		switch (args->ar_errors) {
+		case GFS2_ERRORS_WITHDRAW:
+			state = "withdraw";
+			break;
+		case GFS2_ERRORS_PANIC:
+   state = panic

[Cluster-devel] [PATCH GFS2] gfs2_stuffed_write_end modifying source buffer?

2009-09-23 Thread Bob Peterson

Hi,

Maybe I'm wrong, but this looks like a bug to me: It looks like
GFS2's function gfs2_stuffed_write_end is zeroing out portions
of the source buffer.  So if I created a character array, filled
it with "X", and then wrote only one byte to a very small file,
all the other X's in my buffer would get nuked.  This is just a
theory at this point, but perhaps Steve Whitehouse can tell.
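
To make the concern concrete, here is a small user-space model of the
copy-then-zero pattern that the patch below removes.  It is only a sketch:
copy_then_zero_source() is a hypothetical helper, plain arrays stand in for
the kmapped page and the dinode buffer, and whether the kernel's kaddr really
corresponds to the caller's data is exactly the open question above.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy `copied` bytes at offset `pos` from src to dst, then zero the
 * remainder of the len-byte range in src (not dst).  The memset is the
 * step in question: it modifies the buffer being copied from.
 */
static void copy_then_zero_source(char *dst, char *src, size_t pos,
				  size_t len, size_t copied)
{
	assert(copied <= len);
	memcpy(dst + pos, src + pos, copied);
	memset(src + pos + copied, 0, len - copied);
}
```

With an 8-byte source full of 'X', pos 0, len 8 and copied 1, the destination
receives one 'X' while bytes 1..7 of the source are overwritten with zeros.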

Regards,

Bob Peterson
Red Hat File Systems
--
 fs/gfs2/aops.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 7ebae9a..6a23ba2 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -801,7 +801,6 @@ static int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
 	BUG_ON((pos + len) > (dibh->b_size - sizeof(struct gfs2_dinode)));
 	kaddr = kmap_atomic(page, KM_USER0);
 	memcpy(buf + pos, kaddr + pos, copied);
-	memset(kaddr + pos + copied, 0, len - copied);
 	flush_dcache_page(page);
 	kunmap_atomic(kaddr, KM_USER0);

[Cluster-devel] [PATCH] GFS2: print glock numbers in hex

2010-02-23 Thread Bob Peterson

Hi,

This patch changes glock numbers from printing in decimal to hex.
Since DLM prints corresponding resource IDs in hex, it makes debugging
easier.
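
As a rough user-space illustration of the difference (format_glock_name() is
a hypothetical helper, not a gfs2 function; only the n:%u/%llu and n:%u/%llx
conversions are taken from the diff below):

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format the "n:type/number" part of a glock dump line, with the number
 * either in decimal (old behaviour) or in hex (new behaviour).
 * Returns the number of characters written, as snprintf() does.
 */
static int format_glock_name(char *out, size_t outlen, unsigned int type,
			     unsigned long long number, int hex)
{
	assert(out != NULL);
	if (hex)
		return snprintf(out, outlen, "n:%u/%llx", type, number);
	return snprintf(out, outlen, "n:%u/%llu", type, number);
}
```

For type 2 and number 0x18a3f, the decimal form gives n:2/100927 while the
hex form gives n:2/18a3f, which can be compared directly with a resource
name printed in hex.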

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson <rpete...@redhat.com>
--
 fs/gfs2/glock.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 4773f90..454d4b4 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1658,7 +1658,7 @@ static int __dump_glock(struct seq_file *seq, const struct gfs2_glock *gl)
 	dtime *= 100/HZ; /* demote time in uSec */
 	if (!test_bit(GLF_DEMOTE, &gl->gl_flags))
 		dtime = 0;
-	gfs2_print_dbg(seq, "G:  s:%s n:%u/%llu f:%s t:%s d:%s/%llu a:%d r:%d\n",
+	gfs2_print_dbg(seq, "G:  s:%s n:%u/%llx f:%s t:%s d:%s/%llu a:%d r:%d\n",
 		  state2str(gl->gl_state),
 		  gl->gl_name.ln_type,
 		  (unsigned long long)gl->gl_name.ln_number,

Re: [Cluster-devel] [Linux-cluster] Cluster 3.0.8 stable release

2010-02-24 Thread Bob Peterson

- Fabio M. Di Nitto fdini...@redhat.com wrote:
| -----BEGIN PGP SIGNED MESSAGE-----
| Hash: SHA1
| 
| The cluster team and its community are proud to announce the 3.0.8
| stable release from the STABLE3 branch.
| 
| This release contains several major bug fixes. We strongly recommend
| people to update your clusters.

Sorry, guys, but we've discovered a problem with mkfs.gfs2 that makes
it impossible to create gfs2 file systems.  The problem has been fixed
in the STABLE3 and master branches of the git repos, and 3.0.9 will be
released very soon.

Regards,

Bob Peterson
Red Hat File Systems

Re: [Cluster-devel] [PATCH] GFS2: Allow the number of committed revokes to temporarily be negative

2010-03-29 Thread Bob Peterson

- Benjamin Marzinski bmarz...@redhat.com wrote:
| GFS2 tracks the number of revokes and unrevokes that are part of
| committed transactions via sd_log_commited_revoke. It is possible
| for one process to add revokes during its transaction, while another
| process unrevokes them during its transaction. If the second process
| finishes its transaction first, sd_log_commited_revoke will be
| decremented by the number of unrevokes that the second process did,
| without first being incremented by the number of revokes the first
| process did. This is fine, since all started transactions must be
| completed before the journal can be flushed.  However,
| sd_log_commited_revoke is an unsigned integer, and log_refund()
| causes an assertion failure if it would go negative at the end of a
| transaction.  This patch makes sd_log_commited_revoke a signed
| integer and allows it to go negative.  __gfs2_log_flush() still
| checks that it matches the actual number of revokes.
| 
| Signed-off-by: Benjamin Marzinski bmarz...@redhat.com
| ---
|  fs/gfs2/incore.h |2 +-
|  fs/gfs2/log.c|3 +--
|  2 files changed, 2 insertions(+), 3 deletions(-)

Hi,

ACKed by Bob Peterson <rpete...@redhat.com>

Regards,

Bob Peterson
Red Hat File Systems

[Cluster-devel] [PATCH GFS2] livelock while reclaiming unlinked dinodes

2010-04-14 Thread Bob Peterson

Hi,

Here is a patch for bugzilla bug #570182.  Explanation in the
patch.

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson <rpete...@redhat.com>
--
Author: Bob Peterson b...@krishna.(none)
Date:   Tue Apr 13 08:49:33 2010 -0500

GFS2: glock livelock

This patch fixes three gfs2 problems with the reclaiming of
unlinked dinodes.  First, there were a couple of livelocks where
everything would come to a halt waiting for a glock that was
seemingly held by a process that no longer existed.  In fact, the
process did exist, it just had the wrong pid number in the holder
information.  Second, there was a lock ordering problem between
inode locking and glock locking.  Third, glock/inode contention
could sometimes cause inodes to be improperly marked invalid by
iget_failed.

rhbz#570182

diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index 297d7e5..5f1cc15 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -1507,7 +1507,7 @@ struct inode *gfs2_dir_search(struct inode *dir, const struct qstr *name)
 		inode = gfs2_inode_lookup(dir->i_sb,
 				be16_to_cpu(dent->de_type),
 				be64_to_cpu(dent->de_inum.no_addr),
-				be64_to_cpu(dent->de_inum.no_formal_ino), 0);
+				be64_to_cpu(dent->de_inum.no_formal_ino));
 		brelse(bh);
 		return inode;
 	}
diff --git a/fs/gfs2/export.c b/fs/gfs2/export.c
index d15876e..d81bc7e 100644
--- a/fs/gfs2/export.c
+++ b/fs/gfs2/export.c
@@ -169,7 +169,7 @@ static struct dentry *gfs2_get_dentry(struct super_block *sb,
 	if (error)
 		goto fail;
 
-	inode = gfs2_inode_lookup(sb, DT_UNKNOWN, inum->no_addr, 0, 0);
+	inode = gfs2_inode_lookup(sb, DT_UNKNOWN, inum->no_addr, 0);
 	if (IS_ERR(inode)) {
 		error = PTR_ERR(inode);
 		goto fail;
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index c69d5fd..847892b 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -857,6 +857,9 @@ void gfs2_holder_reinit(unsigned int state, unsigned flags, struct gfs2_holder *
 	gh->gh_flags = flags;
 	gh->gh_iflags = 0;
 	gh->gh_ip = (unsigned long)__builtin_return_address(0);
+	if (gh->gh_owner_pid)
+		put_pid(gh->gh_owner_pid);
+	gh->gh_owner_pid = get_pid(task_pid(current));
 }
 
 /**
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 6380cd9..40c1ed0 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -160,7 +160,6 @@ void gfs2_set_iop(struct inode *inode)
  * @sb: The super block
  * @no_addr: The inode number
  * @type: The type of the inode
- * @skip_freeing: set this not return an inode if it is currently being freed.
  *
  * Returns: A VFS inode, or an error
  */
@@ -168,17 +167,14 @@ void gfs2_set_iop(struct inode *inode)
 struct inode *gfs2_inode_lookup(struct super_block *sb,
unsigned int type,
u64 no_addr,
-   u64 no_formal_ino, int skip_freeing)
+   u64 no_formal_ino)
 {
struct inode *inode;
struct gfs2_inode *ip;
struct gfs2_glock *io_gl;
int error;
 
-   if (skip_freeing)
-   inode = gfs2_iget_skip(sb, no_addr);
-   else
-   inode = gfs2_iget(sb, no_addr);
+   inode = gfs2_iget(sb, no_addr);
ip = GFS2_I(inode);
 
if (!inode)
@@ -236,13 +232,100 @@ fail_glock:
 fail_iopen:
 	gfs2_glock_put(io_gl);
 fail_put:
-	ip->i_gl->gl_object = NULL;
+	if (inode->i_state & I_NEW)
+		ip->i_gl->gl_object = NULL;
 	gfs2_glock_put(ip->i_gl);
 fail:
-	iget_failed(inode);
+	if (inode->i_state & I_NEW)
+		iget_failed(inode);
+	else
+		iput(inode);
 	return ERR_PTR(error);
 }
 
+/**
+ * gfs2_unlinked_inode_lookup - Lookup an unlinked inode for reclamation
+ * @sb: The super block
+ * no_addr: The inode number
+ * @@inode: A pointer to the inode found, if any
+ *
+ * Returns: 0 and *inode if no errors occurred.  If an error occurs,
+ *  the resulting *inode may or may not be NULL.
+ */
+
+int gfs2_unlinked_inode_lookup(struct super_block *sb, u64 no_addr,
+  struct inode **inode)
+{
+   struct gfs2_sbd *sdp;
+   struct gfs2_inode *ip;
+   struct gfs2_glock *io_gl;
+   int error;
+   struct gfs2_holder gh;
+
+	*inode = gfs2_iget_skip(sb, no_addr);
+
+	if (!(*inode))
+		return -ENOBUFS;
+
+	if (!((*inode)->i_state & I_NEW))
+		return -ENOBUFS;
+
+	ip = GFS2_I(*inode);
+	sdp = GFS2_SB(*inode);
+	ip->i_no_formal_ino = -1;
+
+	error = gfs2_glock_get(sdp, no_addr, &gfs2_inode_glops, CREATE, &ip->i_gl);
+	if (unlikely(error))
+		goto fail;
+	ip->i_gl->gl_object = ip;
+
+   error

[Cluster-devel] [GFS2 Patch] Eliminate useless err variable

2010-05-11 Thread Bob Peterson

Hi,

This patch removes an unneeded err variable that is always
returned as zero.

Regards,

Bob Peterson
Red Hat File Systems
--
 fs/gfs2/meta_io.c |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index abafda1..18176d0 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -34,7 +34,6 @@
 
 static int gfs2_aspace_writepage(struct page *page, struct writeback_control *wbc)
 {
-	int err;
 	struct buffer_head *bh, *head;
 	int nr_underway = 0;
 	int write_op = (1 << BIO_RW_META) | ((wbc->sync_mode == WB_SYNC_ALL ?
@@ -86,11 +85,10 @@ static int gfs2_aspace_writepage(struct page *page, struct 
writeback_control *wb
} while (bh != head);
unlock_page(page);
 
-   err = 0;
if (nr_underway == 0)
end_page_writeback(page);
 
-   return err;
+   return 0;
 }
 
 const struct address_space_operations gfs2_meta_aops = {

[Cluster-devel] [GFS2 Patch] GFS2: stuck in inode wait, no glocks stuck

2010-05-11 Thread Bob Peterson

Hi,

This patch changes the lock ordering when gfs2 reclaims
unlinked dinodes, thereby avoiding a livelock.

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson <rpete...@redhat.com>
--
GFS2: stuck in inode wait, no glocks stuck

This patch changes the lock ordering when gfs2 reclaims
unlinked dinodes, thereby avoiding a livelock.

rhbz#583737

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 3739155..8bce73e 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -952,16 +952,14 @@ static int try_rgrp_fit(struct gfs2_rgrpd *rgd, struct gfs2_alloc *al)
  *  The inode, if one has been found, in inode.
  */
 
-static int try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked,
-			   u64 skip, struct inode **inode)
+static u64 try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked,
+			   u64 skip)
 {
 	u32 goal = 0, block;
 	u64 no_addr;
 	struct gfs2_sbd *sdp = rgd->rd_sbd;
 	unsigned int n;
-	int error = 0;
 
-	*inode = NULL;
 	for(;;) {
 		if (goal >= rgd->rd_data)
 			break;
@@ -981,10 +979,7 @@ static int try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked,
 		if (no_addr == skip)
 			continue;
 		*last_unlinked = no_addr;
-		error = gfs2_unlinked_inode_lookup(rgd->rd_sbd->sd_vfs,
-						   no_addr, inode);
-		if (*inode || error)
-			return error;
+		return no_addr;
 	}
 
 	rgd->rd_flags &= ~GFS2_RDF_CHECK;
@@ -1069,11 +1064,12 @@ static void forward_rgrp_set(struct gfs2_sbd *sdp, struct gfs2_rgrpd *rgd)
  * Try to acquire rgrp in way which avoids contending with others.
  *
  * Returns: errno
+ *  unlinked: the block address of an unlinked block to be reclaimed
  */
 
-static struct inode *get_local_rgrp(struct gfs2_inode *ip, u64 *last_unlinked)
+static int get_local_rgrp(struct gfs2_inode *ip, u64 *unlinked,
+			  u64 *last_unlinked)
 {
-	struct inode *inode = NULL;
 	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 	struct gfs2_rgrpd *rgd, *begin = NULL;
 	struct gfs2_alloc *al = ip->i_alloc;
@@ -1082,6 +1078,7 @@ static struct inode *get_local_rgrp(struct gfs2_inode *ip, u64 *last_unlinked)
 	int loops = 0;
 	int error, rg_locked;
 
+	*unlinked = 0;
 	rgd = gfs2_blk2rgrpd(sdp, ip->i_goal);
 
 	while (rgd) {
@@ -1103,29 +1100,19 @@ static struct inode *get_local_rgrp(struct gfs2_inode *ip, u64 *last_unlinked)
 			   because that would require an iput which can only
 			   happen after the rgrp is unlocked. */
 			if (!rg_locked && rgd->rd_flags & GFS2_RDF_CHECK)
-				error = try_rgrp_unlink(rgd, last_unlinked,
-							ip->i_no_addr, &inode);
+				*unlinked = try_rgrp_unlink(rgd, last_unlinked,
+							   ip->i_no_addr);
 			if (!rg_locked)
 				gfs2_glock_dq_uninit(&al->al_rgd_gh);
-			if (inode) {
-				if (error) {
-					if (inode->i_state & I_NEW)
-						iget_failed(inode);
-					else
-						iput(inode);
-					return ERR_PTR(error);
-				}
-				return inode;
-			}
-			if (error)
-				return ERR_PTR(error);
+			if (*unlinked)
+				return -EAGAIN;
 			/* fall through */
 		case GLR_TRYFAILED:
 			rgd = recent_rgrp_next(rgd);
 			break;
 
 		default:
-			return ERR_PTR(error);
+			return error;
 		}
 	}
 
@@ -1148,22 +1135,12 @@ static struct inode *get_local_rgrp(struct gfs2_inode *ip, u64 *last_unlinked)
 			if (try_rgrp_fit(rgd, al))
 				goto out;
 			if (!rg_locked && rgd->rd_flags & GFS2_RDF_CHECK)
-				error = try_rgrp_unlink(rgd, last_unlinked,
-							ip->i_no_addr, &inode);
+				*unlinked = try_rgrp_unlink(rgd, last_unlinked,
+							    ip->i_no_addr);
 			if (!rg_locked)
 				gfs2_glock_dq_uninit(&al->al_rgd_gh);
-   if (inode

[Cluster-devel] [GFS2 PATCH] Rework reclaiming unlinked dinodes

2010-05-20 Thread Bob Peterson

Hi,

The previous patch I wrote for reclaiming unlinked dinodes
had some shortcomings and did not prevent all hangs.
This version is much cleaner and more logical, and has
passed very difficult testing.  Sorry for the churn.

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson <rpete...@redhat.com>
--
 fs/gfs2/inode.c |   54 +-
 fs/gfs2/inode.h |3 +--
 fs/gfs2/log.c   |2 +-
 fs/gfs2/log.h   |   29 +++--
 fs/gfs2/rgrp.c  |   20 
 5 files changed, 54 insertions(+), 54 deletions(-)

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 51d8061..b5612cb 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -242,34 +242,38 @@ fail:
 }
 
 /**
- * gfs2_unlinked_inode_lookup - Lookup an unlinked inode for reclamation
+ * gfs2_process_unlinked_inode - Lookup an unlinked inode for reclamation
+ *   and try to reclaim it by doing iput.
+ *
+ * This function assumes no rgrp locks are currently held.
+ *
  * @sb: The super block
  * no_addr: The inode number
- * @@inode: A pointer to the inode found, if any
  *
- * Returns: 0 and *inode if no errors occurred.  If an error occurs,
- *  the resulting *inode may or may not be NULL.
  */
 
-int gfs2_unlinked_inode_lookup(struct super_block *sb, u64 no_addr,
-  struct inode **inode)
+void gfs2_process_unlinked_inode(struct super_block *sb, u64 no_addr)
 {
 	struct gfs2_sbd *sdp;
 	struct gfs2_inode *ip;
 	struct gfs2_glock *io_gl;
 	int error;
 	struct gfs2_holder gh;
+	struct inode *inode;
 
-	*inode = gfs2_iget_skip(sb, no_addr);
+	inode = gfs2_iget_skip(sb, no_addr);
 
-	if (!(*inode))
-		return -ENOBUFS;
+	if (!inode)
+		return;
 
-	if (!((*inode)->i_state & I_NEW))
-		return -ENOBUFS;
+	/* If it's not a new inode, someone's using it, so leave it alone. */
+	if (!(inode->i_state & I_NEW)) {
+		iput(inode);
+		return;
+	}
 
-	ip = GFS2_I(*inode);
-	sdp = GFS2_SB(*inode);
+	ip = GFS2_I(inode);
+	sdp = GFS2_SB(inode);
 	ip->i_no_formal_ino = -1;
 
 	error = gfs2_glock_get(sdp, no_addr, &gfs2_inode_glops, CREATE, &ip->i_gl);
@@ -284,15 +288,13 @@ int gfs2_unlinked_inode_lookup(struct super_block *sb, u64 no_addr,
 	set_bit(GIF_INVALID, &ip->i_flags);
 	error = gfs2_glock_nq_init(io_gl, LM_ST_SHARED, LM_FLAG_TRY | GL_EXACT,
 				   &ip->i_iopen_gh);
-	if (unlikely(error)) {
-		if (error == GLR_TRYFAILED)
-			error = 0;
+	if (unlikely(error))
 		goto fail_iopen;
-	}
+
 	ip->i_iopen_gh.gh_gl->gl_object = ip;
 	gfs2_glock_put(io_gl);
 
-	(*inode)->i_mode = DT2IF(DT_UNKNOWN);
+	inode->i_mode = DT2IF(DT_UNKNOWN);
 
 	/*
 	 * We must read the inode in order to work out its type in
@@ -303,16 +305,17 @@ int gfs2_unlinked_inode_lookup(struct super_block *sb, u64 no_addr,
 	 */
 	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, LM_FLAG_TRY,
 				   &gh);
-	if (unlikely(error)) {
-		if (error == GLR_TRYFAILED)
-			error = 0;
+	if (unlikely(error))
 		goto fail_glock;
-	}
+
 	/* Inode is now uptodate */
 	gfs2_glock_dq_uninit(&gh);
-	gfs2_set_iop(*inode);
+	gfs2_set_iop(inode);
+
+	/* The iput will cause it to be deleted. */
+	iput(inode);
+	return;
 
-	return 0;
 fail_glock:
 	gfs2_glock_dq(&ip->i_iopen_gh);
 fail_iopen:
@@ -321,7 +324,8 @@ fail_put:
 	ip->i_gl->gl_object = NULL;
 	gfs2_glock_put(ip->i_gl);
 fail:
-	return error;
+	iget_failed(inode);
+	return;
 }
 
 static int gfs2_dinode_in(struct gfs2_inode *ip, const void *buf)
diff --git a/fs/gfs2/inode.h b/fs/gfs2/inode.h
index e161461..300ada3 100644
--- a/fs/gfs2/inode.h
+++ b/fs/gfs2/inode.h
@@ -84,8 +84,7 @@ static inline void gfs2_inum_out(const struct gfs2_inode *ip,
 extern void gfs2_set_iop(struct inode *inode);
 extern struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned type, 
   u64 no_addr, u64 no_formal_ino);
-extern int gfs2_unlinked_inode_lookup(struct super_block *sb, u64 no_addr,
- struct inode **inode);
+extern void gfs2_process_unlinked_inode(struct super_block *sb, u64 no_addr);
 extern struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr);
 
 extern int gfs2_inode_refresh(struct gfs2_inode *ip);
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index b593f0e..6a857e2 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -696,7 +696,7 @@ static void gfs2_ordered_wait(struct gfs2_sbd *sdp)
  *
  */
 
-void __gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl

[Cluster-devel] [PATCH GFS2] Fix kernel NULL pointer dereference by dlm_astd

2010-06-15 Thread Bob Peterson

Hi,

This patch fixes a problem in an error path when looking
up dinodes.  There are two sister-functions, gfs2_inode_lookup
and gfs2_process_unlinked_inode.  Both functions acquire and
hold the i_iopen glock for the dinode being looked up. The last
thing they try to do is hold the i_gl glock for the dinode.
If that glock fails for some reason, the error path was
incorrectly calling gfs2_glock_put for the i_iopen glock twice.
This resulted in the glock being prematurely freed.  The
minimum hold time usually kept the glock in memory, but the
lock interface to dlm (aka lock_dlm) freed its memory for the
glock.  In some circumstances, it would cause dlm's dlm_astd daemon
to try to call the bast function for the freed lock_dlm memory,
which resulted in a NULL pointer dereference.
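
The double-put pattern described above can be modelled in a few lines of
user-space C.  This is only a sketch of the reference-counting mistake:
struct obj, obj_get(), obj_put() and lookup_fail_buggy() are hypothetical
names, not gfs2 or dlm functions.

```c
#include <assert.h>

/* Hypothetical reference-counted object standing in for a glock. */
struct obj {
	int refs;   /* outstanding references */
	int freed;  /* set once the last reference is dropped */
};

static void obj_get(struct obj *o)
{
	o->refs++;
}

static void obj_put(struct obj *o)
{
	assert(o->refs > 0);
	if (--o->refs == 0)
		o->freed = 1;  /* a real implementation would free memory here */
}

/* Lookup whose error path drops the reference once too often. */
static void lookup_fail_buggy(struct obj *o)
{
	obj_get(o);  /* reference taken for the lookup */
	obj_put(o);  /* put issued right after the first lock is held */
	obj_put(o);  /* second put in the error path: one too many */
}
```

If another holder owned the single reference that existed before the call,
the object is marked freed while that holder still believes it is valid,
which matches the premature free described above.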

This problem was discovered while testing bugzilla bug #595397.

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson <rpete...@redhat.com>
--
 fs/gfs2/inode.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index b5612cb..43e06ff 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -197,8 +197,6 @@ struct inode *gfs2_inode_lookup(struct super_block *sb,
 		goto fail_iopen;
 	ip->i_iopen_gh.gh_gl->gl_object = ip;
 
-	gfs2_glock_put(io_gl);
-
 	if ((type == DT_UNKNOWN) && (no_formal_ino == 0))
 		goto gfs2_nfsbypass;
 
@@ -224,6 +222,8 @@ struct inode *gfs2_inode_lookup(struct super_block *sb,
 	}
 
 gfs2_nfsbypass:
+	gfs2_glock_put(io_gl);
+
 	return inode;
 fail_glock:
 	gfs2_glock_dq(&ip->i_iopen_gh);
@@ -292,7 +292,6 @@ void gfs2_process_unlinked_inode(struct super_block *sb, u64 no_addr)
 		goto fail_iopen;
 
 	ip->i_iopen_gh.gh_gl->gl_object = ip;
-	gfs2_glock_put(io_gl);
 
 	inode->i_mode = DT2IF(DT_UNKNOWN);
 
@@ -310,6 +309,7 @@ void gfs2_process_unlinked_inode(struct super_block *sb, u64 no_addr)
 
 	/* Inode is now uptodate */
 	gfs2_glock_dq_uninit(&gh);
+	gfs2_glock_put(io_gl);
 	gfs2_set_iop(inode);
 
 	/* The iput will cause it to be deleted. */

Re: [Cluster-devel] [PATCH GFS2] Fix kernel NULL pointer dereference by dlm_astd

2010-06-16 Thread Bob Peterson

- Steven Whitehouse swhit...@redhat.com wrote:
| Hi,
| 
| Now in the -nmw GFS2 tree. Thanks,
| 
| Steve.
| 
| On Tue, 2010-06-15 at 12:07 -0400, Bob Peterson wrote:
|  Hi,
|  
|  This patch fixes a problem in an error path when looking
|  up dinodes.  There are two sister-functions, gfs2_inode_lookup
|  and gfs2_process_unlinked_inode.  Both functions acquire and
|  hold the i_iopen glock for the dinode being looked up. The last
|  thing they try to do is hold the i_gl glock for the dinode.
|  If that glock fails for some reason, the error path was
|  incorrectly calling gfs2_glock_put for the i_iopen glock twice.
|  This resulted in the glock being prematurely freed.  The
|  minimum hold time usually kept the glock in memory, but the
|  lock interface to dlm (aka lock_dlm) freed its memory for the
|  glock.  In some circumstances, it would cause dlm's dlm_astd daemon
|  to try to call the bast function for the freed lock_dlm memory,
|  which resulted in a NULL pointer dereference.
|  
|  This problem was discovered while testing bugzilla bug #595397.
|  
|  Regards,
|  
|  Bob Peterson
|  Red Hat GFS

Hi,

Actually, it's not yet in the -nmw git tree.  I think Steve W.
forgot to push it before he left on holiday.  At any rate, that's a
good thing because my testing has uncovered a possible problem
with this patch.  I'm planning to rework it and re-post when
I get a stable version.

Regards,

Bob Peterson
Red Hat File Systems

[Cluster-devel] [GFS2 Patch] GFS2: O_TRUNC not working on stuffed files across cluster

2010-06-24 Thread Bob Peterson

Hi,

This patch restores a statement that was dropped by accident.
Without the patch, truncates on stuffed (very small) files cause
those files to have an unpredictable size.

Regards,

Bob Peterson
Red Hat File Systems


Signed-off-by: Bob Peterson <rpete...@redhat.com>
--
 fs/gfs2/bmap.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 0db0cd9..a7b1c7c 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1042,6 +1042,7 @@ static int trunc_start(struct gfs2_inode *ip, u64 size)
 
 	if (gfs2_is_stuffed(ip)) {
 		u64 dsize = size + sizeof(struct gfs2_inode);
+		ip->i_disksize = size;
 		ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME;
 		gfs2_trans_add_bh(ip->i_gl, dibh, 1);
 		gfs2_dinode_out(ip, dibh->b_data);

[Cluster-devel] [GFS2 Patch] Simplify gfs2_write_alloc_required

2010-06-24 Thread Bob Peterson

Hi,

Here is a patch for a clean up I spotted:

Function gfs2_write_alloc_required always returned zero as its
return code.  Therefore, it doesn't need to return a return code
at all.  Given that, we can use the return value to return whether
or not the dinode needs block allocations rather than passing
that value in, which in turn simplifies a bunch of error checking.
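
A simplified before/after sketch of that interface change, covering only the
stuffed-file test.  The function names and the `room` parameter (block size
minus the on-disk dinode header) are hypothetical; the real function takes a
struct gfs2_inode and also handles the unstuffed case.

```c
#include <assert.h>
#include <stddef.h>

/* Old style: answer via out-parameter, return code is always 0. */
static int alloc_required_old(unsigned long long offset, unsigned int len,
			      unsigned int room, int *alloc_required)
{
	assert(alloc_required != NULL);
	*alloc_required = (len != 0 && offset + len > room);
	return 0;
}

/* New style: the return value is the answer itself (1 or 0). */
static int alloc_required_new(unsigned long long offset, unsigned int len,
			      unsigned int room)
{
	if (!len)
		return 0;
	return offset + len > room;
}
```

Callers of the new form can test the result directly in an if statement
instead of checking an error code that never fires and then a flag.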

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson <rpete...@redhat.com>
--
 fs/gfs2/aops.c  |4 +---
 fs/gfs2/bmap.c  |   15 +--
 fs/gfs2/bmap.h  |2 +-
 fs/gfs2/file.c  |4 +---
 fs/gfs2/quota.c |   15 +++
 fs/gfs2/super.c |9 +++--
 6 files changed, 14 insertions(+), 35 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 9485a88..5e96cbd 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -634,9 +634,7 @@ static int gfs2_write_begin(struct file *file, struct address_space *mapping,
 		}
 	}
 
-	error = gfs2_write_alloc_required(ip, pos, len, &alloc_required);
-	if (error)
-		goto out_unlock;
+	alloc_required = gfs2_write_alloc_required(ip, pos, len);
 
 	if (alloc_required || gfs2_is_jdata(ip))
 		gfs2_write_calc_reserv(ip, len, &data_blocks, &ind_blocks);
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 4a48c0f..11837fc 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1243,13 +1243,12 @@ int gfs2_file_dealloc(struct gfs2_inode *ip)
  * @ip: the file being written to
  * @offset: the offset to write to
  * @len: the number of bytes being written
- * @alloc_required: set to 1 if an alloc is required, 0 otherwise
  *
- * Returns: errno
+ * Returns: 1 if an alloc is required, 0 otherwise
  */
 
 int gfs2_write_alloc_required(struct gfs2_inode *ip, u64 offset,
-			      unsigned int len, int *alloc_required)
+			      unsigned int len)
 {
 	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 	struct buffer_head bh;
@@ -1257,26 +1256,23 @@ int gfs2_write_alloc_required(struct gfs2_inode *ip, u64 offset,
 	u64 lblock, lblock_stop, size;
 	u64 end_of_file;
 
-	*alloc_required = 0;
-
 	if (!len)
 		return 0;
 
 	if (gfs2_is_stuffed(ip)) {
 		if (offset + len >
 		    sdp->sd_sb.sb_bsize - sizeof(struct gfs2_dinode))
-			*alloc_required = 1;
+			return 1;
 		return 0;
 	}
 
-	*alloc_required = 1;
 	shift = sdp->sd_sb.sb_bsize_shift;
 	BUG_ON(gfs2_is_dir(ip));
 	end_of_file = (ip->i_disksize + sdp->sd_sb.sb_bsize - 1) >> shift;
 	lblock = offset >> shift;
 	lblock_stop = (offset + len + sdp->sd_sb.sb_bsize - 1) >> shift;
 	if (lblock_stop > end_of_file)
-		return 0;
+		return 1;
 
 	size = (lblock_stop - lblock) << shift;
 	do {
@@ -1284,12 +1280,11 @@ int gfs2_write_alloc_required(struct gfs2_inode *ip, u64 offset,
 		bh.b_size = size;
 		gfs2_block_map(&ip->i_inode, lblock, &bh, 0);
 		if (!buffer_mapped(&bh))
-			return 0;
+			return 1;
 		size -= bh.b_size;
 		lblock += (bh.b_size >> ip->i_inode.i_blkbits);
 	} while(size > 0);
 
-	*alloc_required = 0;
 	return 0;
 }
 
diff --git a/fs/gfs2/bmap.h b/fs/gfs2/bmap.h
index c983177..a20a521 100644
--- a/fs/gfs2/bmap.h
+++ b/fs/gfs2/bmap.h
@@ -52,6 +52,6 @@ int gfs2_truncatei(struct gfs2_inode *ip, u64 size);
 int gfs2_truncatei_resume(struct gfs2_inode *ip);
 int gfs2_file_dealloc(struct gfs2_inode *ip);
 int gfs2_write_alloc_required(struct gfs2_inode *ip, u64 offset,
- unsigned int len, int *alloc_required);
+ unsigned int len);
 
 #endif /* __BMAP_DOT_H__ */
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index ed9a94f..4edd662 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -351,7 +351,6 @@ static int gfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	unsigned long last_index;
 	u64 pos = page->index << PAGE_CACHE_SHIFT;
 	unsigned int data_blocks, ind_blocks, rblocks;
-	int alloc_required = 0;
 	struct gfs2_holder gh;
 	struct gfs2_alloc *al;
 	int ret;
@@ -364,8 +363,7 @@ static int gfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	set_bit(GLF_DIRTY, &ip->i_gl->gl_flags);
 	set_bit(GIF_SW_PAGED, &ip->i_flags);
 
-	ret = gfs2_write_alloc_required(ip, pos, PAGE_CACHE_SIZE, &alloc_required);
-	if (ret || !alloc_required)
+	if (!gfs2_write_alloc_required(ip, pos, PAGE_CACHE_SIZE))
 		goto out_unlock;
 	ret = -ENOMEM;
 	al = gfs2_alloc_get(ip);
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 49667d6..b0954ea 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -789,15 +789,9 @@ static int do_sync(unsigned int num_qd, struct

[Cluster-devel] [GFS2 Patch] GFS2: rename causes kernel Oops

2010-07-14 Thread Bob Peterson

Hi,

This patch fixes a kernel Oops in the GFS2 rename code.

The problem was in the way the gfs2 directory code was trying
to re-use sentinel directory entries.  

In the failing case, gfs2's rename function was renaming a
file to another name that had the same non-trivial length.
The file being renamed happened to be the first directory
entry on the leaf block.

First, the rename code (gfs2_rename in ops_inode.c) found the
original directory entry and decided it could do its job by
simply replacing the directory entry with another.  Therefore
it determined correctly that no block allocations were needed.

Next, the rename code deleted the old directory entry prior to
replacing it with the new name.  Therefore, the soon-to-be
replaced directory entry was temporarily made into a directory
entry sentinel or a place holder at the start of a leaf block.

Lastly, it went to re-add the replacement directory entry in
that leaf block.  However, when gfs2_dirent_find_space was
looking for space in the leaf block, it used the wrong value
for the sentinel.  That threw off its calculations, so it later
decided it couldn't really re-use the sentinel and therefore
had to allocate a new leaf block.  But because it had previously
decided to re-use the directory entry, it hadn't taken the time to
reserve a block allocation for the inode.  Therefore, the
inode's i_alloc pointer was still NULL and it crashed trying to
reference it.

In the case of sentinel directory entries, the entire dirent is
reused, not just the free space portion of it, and therefore
the function gfs2_dirent_find_space should use the value 0
rather than GFS2_DIRENT_SIZE(0) for the actual dirent size.

Fixing this calculation enables the reproducer programs to work
properly.
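
The arithmetic can be sketched in user space as follows.  This is an
illustration only: DIRENT_HDR (assumed here to be 40 bytes), dirent_size()
and find_space() are hypothetical stand-ins for the on-disk header size,
GFS2_DIRENT_SIZE() and gfs2_dirent_find_space(), and the `fixed` flag just
selects between the old and new sentinel handling.

```c
#include <assert.h>

#define DIRENT_HDR 40u  /* assumed header size; not taken from the patch */

/* Header plus name, rounded up to a multiple of 8 bytes. */
static unsigned int dirent_size(unsigned int name_len)
{
	return (DIRENT_HDR + name_len + 7) & ~7u;
}

/*
 * Can a new entry with a new_name_len-byte name fit in this dirent?
 * rec_len is the total record length and name_len the current name length.
 */
static int find_space(unsigned int rec_len, unsigned int name_len,
		      int sentinel, unsigned int new_name_len, int fixed)
{
	unsigned int required = dirent_size(new_name_len);
	unsigned int actual = dirent_size(name_len);

	if (sentinel)
		actual = fixed ? 0 : dirent_size(0);
	assert(rec_len >= actual);
	return rec_len - actual >= required;
}
```

With a 56-byte sentinel record and a 10-character replacement name (which
also needs 56 bytes), the old calculation leaves 56 - 40 = 16 bytes and
reports no space, while the fixed one leaves all 56 bytes and succeeds.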

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson <rpete...@redhat.com>
--
 fs/gfs2/dir.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index 8295c5b..26ca336 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -392,7 +392,7 @@ static int gfs2_dirent_find_space(const struct gfs2_dirent *dent,
 	unsigned totlen = be16_to_cpu(dent->de_rec_len);
 
 	if (gfs2_dirent_sentinel(dent))
-		actual = GFS2_DIRENT_SIZE(0);
+		actual = 0;
 	if (totlen - actual >= required)
 		return 1;
 	return 0;

[Cluster-devel] [GFS2 Patch] fsck.gfs2 reported statfs error after gfs2_grow

2010-12-07 Thread Bob Peterson

Hi,

When you ran gfs2_grow, it failed to take the very last
rgrp into account when adding up the new free space due
to an off-by-one error.  It was not reading the last
rgrp from the rindex because of a check for >= that
should have been >.  Therefore, fsck.gfs2 was finding
(and fixing) an error with the system statfs file.
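
A user-space sketch of the loop shape shows the effect of the comparison.
count_records() is a hypothetical helper, not the kernel's gfs2_ri_total(),
and the `fixed` flag only selects which comparison ends the loop.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count how many fixed-size records a loop of this shape visits in a
 * file of file_size bytes.  fixed == 0 uses the old ">=" test,
 * fixed != 0 uses the corrected ">" test.
 */
static unsigned int count_records(size_t file_size, size_t rec_size, int fixed)
{
	unsigned int n;

	assert(rec_size > 0);
	for (n = 0;; n++) {
		size_t pos = n * rec_size;

		if (fixed ? (pos + rec_size > file_size)
			  : (pos + rec_size >= file_size))
			break;
	}
	return n;
}
```

For a file holding exactly three 96-byte records (288 bytes), the old test
stops after two records because 192 + 96 >= 288, while the corrected test
visits all three.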

Regards,

Bob Peterson
Red Hat GFS

Signed-off-by: Bob Peterson <rpete...@redhat.com>
--
 fs/gfs2/rgrp.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 25dbe5c..7293ea2 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -500,7 +500,7 @@ u64 gfs2_ri_total(struct gfs2_sbd *sdp)
for (rgrps = 0;; rgrps++) {
loff_t pos = rgrps * sizeof(struct gfs2_rindex);
 
-		if (pos + sizeof(struct gfs2_rindex) >= i_size_read(inode))
+		if (pos + sizeof(struct gfs2_rindex) > i_size_read(inode))
break;
error = gfs2_internal_read(ip, ra_state, buf, pos,
   sizeof(struct gfs2_rindex));

[Cluster-devel] [PATCH][GFS2] Bouncing locks in a cluster is slow in GFS2

2011-01-26 Thread Bob Peterson

Hi,

This patch is a performance improvement for GFS2 in a clustered
environment.  It makes the glock hold time self-adjusting.

Regards,

Bob Peterson
Red Hat File Systems

Signed-off-by: Bob Peterson <rpete...@redhat.com>

Bouncing locks in a cluster is slow in GFS2
--
 fs/gfs2/glock.c  |   89 --
 fs/gfs2/glock.h  |6 
 fs/gfs2/glops.c  |2 -
 fs/gfs2/incore.h |2 +-
 4 files changed, 73 insertions(+), 26 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index c75d499..117d8e2 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -58,7 +58,6 @@ static int __dump_glock(struct seq_file *seq, const struct gfs2_glock *gl);
 static void do_xmote(struct gfs2_glock *gl, struct gfs2_holder *gh, unsigned int target);
 
 static struct dentry *gfs2_root;
-static struct workqueue_struct *glock_workqueue;
 struct workqueue_struct *gfs2_delete_workqueue;
 static LIST_HEAD(lru_list);
 static atomic_t lru_count = ATOMIC_INIT(0);
@@ -67,9 +66,23 @@ static DEFINE_SPINLOCK(lru_lock);
 #define GFS2_GL_HASH_SHIFT  15
 #define GFS2_GL_HASH_SIZE   (1 << GFS2_GL_HASH_SHIFT)
 #define GFS2_GL_HASH_MASK   (GFS2_GL_HASH_SIZE - 1)
+#define GL_WORKQUEUES       0x2
+#define GL_WQ_MASK          0x1
 
 static struct hlist_bl_head gl_hash_table[GFS2_GL_HASH_SIZE];
 static struct dentry *gfs2_root;
+static struct workqueue_struct *glock_workqueue[GL_WORKQUEUES];
+
+static inline int qwork(struct gfs2_glock *gl, unsigned long delay)
+{
+   struct workqueue_struct *wq;
+
+   wq = glock_workqueue[gl->gl_name.ln_type & GL_WQ_MASK];
+
+   if (gl->gl_name.ln_type != LM_TYPE_INODE)
+   delay = 0;
+   return queue_delayed_work(wq, &gl->gl_work, delay);
+}
 
 /**
  * gl_hash() - Turn glock number into hash bucket number
@@ -407,6 +420,10 @@ static void state_change(struct gfs2_glock *gl, unsigned int new_state)
if (held1 && held2 && list_empty(&gl->gl_holders))
clear_bit(GLF_QUEUED, &gl->gl_flags);
 
+   if (new_state != gl->gl_target)
+   /* shorten our minimum hold time */
+   gl->gl_hold_time = max(gl->gl_hold_time - GL_GLOCK_HOLD_DECR,
+  GL_GLOCK_MIN_HOLD);
gl->gl_state = new_state;
gl->gl_tchange = jiffies;
 }
@@ -550,7 +567,7 @@ __acquires(&gl->gl_spin)
GLOCK_BUG_ON(gl, ret);
} else { /* lock_nolock */
finish_xmote(gl, target);
-   if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
+   if (qwork(gl, 0) == 0)
gfs2_glock_put(gl);
}
 
@@ -623,7 +640,7 @@ out_sched:
clear_bit(GLF_LOCK, &gl->gl_flags);
smp_mb__after_clear_bit();
gfs2_glock_hold(gl);
-   if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
+   if (qwork(gl, 0) == 0)
gfs2_glock_put_nolock(gl);
return;
 
@@ -670,15 +687,14 @@ static void glock_work_func(struct work_struct *work)
gl->gl_state != LM_ST_UNLOCKED &&
gl->gl_demote_state != LM_ST_EXCLUSIVE) {
unsigned long holdtime, now = jiffies;
-   holdtime = gl->gl_tchange + gl->gl_ops->go_min_hold_time;
+   holdtime = gl->gl_tchange + gl->gl_hold_time;
if (time_before(now, holdtime))
delay = holdtime - now;
set_bit(delay ? GLF_PENDING_DEMOTE : GLF_DEMOTE, &gl->gl_flags);
}
run_queue(gl, 0);
spin_unlock(&gl->gl_spin);
-   if (!delay ||
-   queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
+   if (!delay || qwork(gl, delay) == 0)
gfs2_glock_put(gl);
if (drop_ref)
gfs2_glock_put(gl);
@@ -741,6 +757,7 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
gl->gl_tchange = jiffies;
gl->gl_object = NULL;
gl->gl_sbd = sdp;
+   gl->gl_hold_time = GL_GLOCK_DFT_HOLD;
INIT_DELAYED_WORK(&gl->gl_work, glock_work_func);
INIT_WORK(&gl->gl_delete, delete_work_func);
 
@@ -852,8 +869,15 @@ static int gfs2_glock_demote_wait(void *word)
 
 static void wait_on_holder(struct gfs2_holder *gh)
 {
+   unsigned long time1 = jiffies;
+
might_sleep();
wait_on_bit(&gh->gh_iflags, HIF_WAIT, gfs2_glock_holder_wait, TASK_UNINTERRUPTIBLE);
+   if (time_after(jiffies, time1 + HZ)) /* have we waited > a second? */
+   /* Lengthen the minimum hold time. */
+   gh->gh_gl->gl_hold_time = min(gh->gh_gl->gl_hold_time +
+ GL_GLOCK_HOLD_INCR,
+ GL_GLOCK_MAX_HOLD);
 }
 
 static void wait_on_demote(struct gfs2_glock *gl)
@@ -1087,8 +,8 @@ void gfs2_glock_dq(struct gfs2_holder *gh)
gfs2_glock_hold(gl);
if (test_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
!test_bit(GLF_DEMOTE, &gl->gl_flags))
-   delay = gl

Re: [Cluster-devel] [PATCH][GFS2] Bouncing locks in a cluster is slow in GFS2 - Try #2

2011-01-26 Thread Bob Peterson

- Original Message -
| Hi,
| 
| You shouldn't need to do the dual workqueue trick upstream since the
| workqueues will already start as many threads as required (so even
| better than just using the two here). If that isn't happening we
| should ask Tejun about it.
| 
| So I think we only need the min hold time bit here,
| 
| Steve.

Hi,

Based on your feedback, I reworked this patch a bit.  This is take two.
Rather than using a qwork function to police the delay based on
glock type, I went back to the original queue_delayed_work and
adjusted delay based on glock type.  So this version has fewer lines
changed, but still accomplishes the same thing.

Note that I had to tweak function glock_work_func in order to
preserve the rather odd logic that decides whether or not to
actually queue the work.

This patch is a performance improvement for GFS2 in a clustered
environment. It makes the glock hold time self-adjusting.

Regards,

Bob Peterson
Red Hat File Systems

Signed-off-by: Bob Peterson rpete...@redhat.com

Bouncing locks in a cluster is slow in GFS2
--
 fs/gfs2/glock.c  |   39 +--
 fs/gfs2/glock.h  |6 ++
 fs/gfs2/glops.c  |2 --
 fs/gfs2/incore.h |2 +-
 4 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index c75d499..0523b20 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -407,6 +407,10 @@ static void state_change(struct gfs2_glock *gl, unsigned int new_state)
if (held1 && held2 && list_empty(&gl->gl_holders))
clear_bit(GLF_QUEUED, &gl->gl_flags);
 
+   if (new_state != gl->gl_target)
+   /* shorten our minimum hold time */
+   gl->gl_hold_time = max(gl->gl_hold_time - GL_GLOCK_HOLD_DECR,
+  GL_GLOCK_MIN_HOLD);
gl->gl_state = new_state;
gl->gl_tchange = jiffies;
 }
@@ -670,16 +674,21 @@ static void glock_work_func(struct work_struct *work)
gl->gl_state != LM_ST_UNLOCKED &&
gl->gl_demote_state != LM_ST_EXCLUSIVE) {
unsigned long holdtime, now = jiffies;
-   holdtime = gl->gl_tchange + gl->gl_ops->go_min_hold_time;
+   holdtime = gl->gl_tchange + gl->gl_hold_time;
if (time_before(now, holdtime))
delay = holdtime - now;
set_bit(delay ? GLF_PENDING_DEMOTE : GLF_DEMOTE, &gl->gl_flags);
}
run_queue(gl, 0);
spin_unlock(&gl->gl_spin);
-   if (!delay ||
-   queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
+   if (!delay)
gfs2_glock_put(gl);
+   else {
+   if (gl->gl_name.ln_type != LM_TYPE_INODE)
+   delay = 0;
+   if (queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
+   gfs2_glock_put(gl);
+   }
if (drop_ref)
gfs2_glock_put(gl);
 }
@@ -741,6 +750,7 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
gl->gl_tchange = jiffies;
gl->gl_object = NULL;
gl->gl_sbd = sdp;
+   gl->gl_hold_time = GL_GLOCK_DFT_HOLD;
INIT_DELAYED_WORK(&gl->gl_work, glock_work_func);
INIT_WORK(&gl->gl_delete, delete_work_func);
 
@@ -852,8 +862,15 @@ static int gfs2_glock_demote_wait(void *word)
 
 static void wait_on_holder(struct gfs2_holder *gh)
 {
+   unsigned long time1 = jiffies;
+
might_sleep();
wait_on_bit(&gh->gh_iflags, HIF_WAIT, gfs2_glock_holder_wait, TASK_UNINTERRUPTIBLE);
+   if (time_after(jiffies, time1 + HZ)) /* have we waited > a second? */
+   /* Lengthen the minimum hold time. */
+   gh->gh_gl->gl_hold_time = min(gh->gh_gl->gl_hold_time +
+ GL_GLOCK_HOLD_INCR,
+ GL_GLOCK_MAX_HOLD);
 }
 
 static void wait_on_demote(struct gfs2_glock *gl)
@@ -1086,8 +1103,9 @@ void gfs2_glock_dq(struct gfs2_holder *gh)
 
gfs2_glock_hold(gl);
if (test_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
-   !test_bit(GLF_DEMOTE, &gl->gl_flags))
-   delay = gl->gl_ops->go_min_hold_time;
+   !test_bit(GLF_DEMOTE, &gl->gl_flags) &&
+   gl->gl_name.ln_type == LM_TYPE_INODE)
+   delay = gl->gl_hold_time;
if (queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
gfs2_glock_put(gl);
 }
@@ -1270,12 +1288,13 @@ void gfs2_glock_cb(struct gfs2_glock *gl, unsigned int state)
unsigned long now = jiffies;
 
gfs2_glock_hold(gl);
-   holdtime = gl->gl_tchange + gl->gl_ops->go_min_hold_time;
-   if (test_bit(GLF_QUEUED, &gl->gl_flags)) {
+   holdtime = gl->gl_tchange + gl->gl_hold_time;
+   if (test_bit(GLF_QUEUED, &gl->gl_flags) &&
+   gl->gl_name.ln_type == LM_TYPE_INODE) {
if (time_before(now, holdtime

Re: [Cluster-devel] [Linux-cluster] hi,question about gfs2

2011-02-22 Thread Bob Peterson

- Original Message -
| 1.if i can deploy gfs2 on fedora12. if it is ok to build from source
| code ?

Yes you can.  However, as Andrew said, it's probably a mistake.
You're better off using Fedora 14 where the code base is newer
and it will be supported longer.

You can build it from source, but finding a source that's compatible
with your kernel may be a challenge.  GFS2 has advanced to match
ongoing kernel development.

You can build the Fedora 12 kernel from source RPMS, but you're
likely going to encounter bugs that have already been fixed by
later revisions.

If you go with Fedora 14, it may be easier to compile the latest
source from the GFS2 kernel git repo.

| 2.the max node gfs2 can manger? i use san, if i have 100 machines,if
| gfs2 can work over those nodes?
| 
| thanks

GFS2 does not care how many nodes are in your cluster.
The only thing that cares is the rest of the cluster infrastructure.
However, we don't recommend that many nodes for various reasons.
For one thing, your network may be clogged with lots of traffic,
which may interfere with proper cluster communications.

Regards,

Bob Peterson
Red Hat File Systems

[Cluster-devel] [GFS2 PATCH] GFS2: unlink performance patch

2011-02-23 Thread Bob Peterson

Hi,

This patch is a performance improvement to GFS2's unlink code.
Rather than update the quota file and statfs file for every
single block that's stripped off in unlink function do_strip,
this patch keeps track and updates them once for every layer
that's stripped.  This is done entirely inside the existing
transaction, so there should be no risk of corruption.
The other functions that deallocate blocks will be unaffected
because they are using wrapper functions that do the same
thing that they do today.

I tested this code on my roth cluster by creating 200
files in a directory, each of which is 100MB, then on
four nodes, I simultaneously deleted the files, thus competing
for GFS2 resources (but different files).  The commands
I used were:

[root@roth-01]# time for i in `seq 1 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
[root@roth-02]# time for i in `seq 2 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
[root@roth-03]# time for i in `seq 3 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
[root@roth-05]# time for i in `seq 4 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done

The performance increase was significant:

             roth-01     roth-02     roth-03     roth-05
             -------     -------     -------     -------
old: real    0m34.027s   0m25.021s   0m23.906s   0m35.646s
new: real    0m22.379s   0m24.362s   0m24.133s   0m18.562s

Total time spent deleting:
old: 118.6s
new:  89.4s

For this particular case, this showed a 25% performance increase for
GFS2 unlinks.
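
The totals and the percentage can be re-derived from the per-node
times.  The following is only a worked check, not part of the patch;
it assumes the first "old" entry reads 0m34.027s, and the helper names
are made up for the example.

```c
#include <stddef.h>

/* Per-node wall-clock times from the table, in milliseconds. */
static const long old_ms[] = { 34027, 25021, 23906, 35646 };
static const long new_ms[] = { 22379, 24362, 24133, 18562 };

/* Add up n timings. */
static long sum_ms(const long *t, size_t n)
{
	long total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += t[i];
	return total;
}

/*
 * Time saved relative to the old total, in tenths of a percent,
 * truncated by integer division.
 */
static long saved_permille(long old_total, long new_total)
{
	return (old_total - new_total) * 1000 / old_total;
}
```

The sums come to 118600 ms and 89436 ms, and the saving works out to
just under 24.6%, which rounds to the 25% quoted above.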

Regards,

Bob Peterson
Red Hat File Systems

Signed-off-by: Bob Peterson rpete...@redhat.com 

--
 fs/gfs2/bmap.c |   20 +++-
 fs/gfs2/rgrp.c |   34 +++---
 fs/gfs2/rgrp.h |2 ++
 3 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 3c4039d..ef3dc4b 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -21,6 +21,7 @@
 #include "meta_io.h"
 #include "quota.h"
 #include "rgrp.h"
+#include "super.h"
 #include "trans.h"
 #include "dir.h"
 #include "util.h"
@@ -757,7 +758,7 @@ static int do_strip(struct gfs2_inode *ip, struct buffer_head *dibh,
struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
struct gfs2_rgrp_list rlist;
u64 bn, bstart;
-   u32 blen;
+   u32 blen, btotal;
__be64 *p;
unsigned int rg_blocks = 0;
int metadata;
@@ -839,6 +840,7 @@ static int do_strip(struct gfs2_inode *ip, struct buffer_head *dibh,
 
bstart = 0;
blen = 0;
+   btotal = 0;
 
for (p = top; p < bottom; p++) {
if (!*p)
@@ -851,9 +853,11 @@ static int do_strip(struct gfs2_inode *ip, struct buffer_head *dibh,
else {
if (bstart) {
if (metadata)
-   gfs2_free_meta(ip, bstart, blen);
+   __gfs2_free_meta(ip, bstart, blen);
else
-   gfs2_free_data(ip, bstart, blen);
+   __gfs2_free_data(ip, bstart, blen);
+
+   btotal += blen;
}
 
bstart = bn;
@@ -865,11 +869,17 @@ static int do_strip(struct gfs2_inode *ip, struct buffer_head *dibh,
}
if (bstart) {
if (metadata)
-   gfs2_free_meta(ip, bstart, blen);
+   __gfs2_free_meta(ip, bstart, blen);
else
-   gfs2_free_data(ip, bstart, blen);
+   __gfs2_free_data(ip, bstart, blen);
+
+   btotal += blen;
}
 
+   gfs2_statfs_change(sdp, 0, +btotal, 0);
+   gfs2_quota_change(ip, -(s64)btotal, ip->i_inode.i_uid,
+ ip->i_inode.i_gid);
+
ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME;
 
gfs2_dinode_out(ip, dibh->b_data);
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 7293ea2..cf930cd 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1602,7 +1602,7 @@ rgrp_error:
  *
  */
 
-void gfs2_free_data(struct gfs2_inode *ip, u64 bstart, u32 blen)
+void __gfs2_free_data(struct gfs2_inode *ip, u64 bstart, u32 blen)
 {
struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
struct gfs2_rgrpd *rgd;
@@ -1617,7 +1617,21 @@ void gfs2_free_data(struct gfs2_inode *ip, u64 bstart, u32 blen)
gfs2_rgrp_out(rgd, rgd->rd_bits[0].bi_bh->b_data);
 
gfs2_trans_add_rg(rgd);
+}
 
+/**
+ * gfs2_free_data - free a contiguous run of data block(s)
+ * @ip: the inode these blocks are being freed from
+ * @bstart: first block of a run of contiguous blocks
+ * @blen: the length of the block run
+ *
+ */
+
+void gfs2_free_data(struct gfs2_inode *ip, u64 bstart, u32 blen)
+{
+   struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
+
+   __gfs2_free_data(ip, bstart, blen);
gfs2_statfs_change(sdp, 0, +blen, 0);
gfs2_quota_change(ip, -(s64)blen

[Cluster-devel] [GFS2 PATCH] Optimize glock multiple-dequeue code

2011-03-10 Thread Bob Peterson

Hi,

This is a small patch that optimizes multiple glock dequeue
operations.  It changes the unlock order to be more efficient
and makes it easier for lock debugging tools to unravel.  It
also eliminates the need for the temp variable x, although
that would likely be optimized out.
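
As a stand-alone illustration of the loop shape (not the kernel code),
counting the length down visits the array from the last element to the
first without a separate index.  The function release_all and its
arguments are hypothetical names used only for this sketch.

```c
/*
 * Record the order in which a "while (num--)" loop visits the
 * elements of ids[]; order_out must have room for num entries.
 */
static void release_all(unsigned int num, const int *ids, int *order_out)
{
	unsigned int i = 0;

	/* num is tested, then decremented, so ids[num] is always in range. */
	while (num--)
		order_out[i++] = ids[num];
}
```

Given ids = {10, 20, 30}, the visiting order is 30, 20, 10, i.e. the
reverse of the order in which the entries were listed.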

Regards,

Bob Peterson
Red Hat File Systems

Signed-off-by: Bob Peterson rpete...@redhat.com 
--
 fs/gfs2/glock.c |   12 
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 3f45a14..8648409 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1248,10 +1248,8 @@ int gfs2_glock_nq_m(unsigned int num_gh, struct gfs2_holder *ghs)
 
 void gfs2_glock_dq_m(unsigned int num_gh, struct gfs2_holder *ghs)
 {
-   unsigned int x;
-
-   for (x = 0; x < num_gh; x++)
-   gfs2_glock_dq(&ghs[x]);
+   while (num_gh--)
+   gfs2_glock_dq(&ghs[num_gh]);
 }
 
 /**
@@ -1263,10 +1261,8 @@ void gfs2_glock_dq_m(unsigned int num_gh, struct gfs2_holder *ghs)
 
 void gfs2_glock_dq_uninit_m(unsigned int num_gh, struct gfs2_holder *ghs)
 {
-   unsigned int x;
-
-   for (x = 0; x < num_gh; x++)
-   gfs2_glock_dq_uninit(&ghs[x]);
+   while (num_gh--)
+   gfs2_glock_dq_uninit(&ghs[num_gh]);
 }
 
 void gfs2_glock_cb(struct gfs2_glock *gl, unsigned int state)

[Cluster-devel] [GFS2 Patch] GFS2 filesystem hang caused by incorrect lock order

2011-03-17 Thread Bob Peterson

Hi,

This patch fixes a deadlock in GFS2 where two processes are trying
to reclaim an unlinked dinode:
One holds the inode glock and calls gfs2_lookup_by_inum trying to look
up the inode, which it can't, due to I_FREEING.  The other has set
I_FREEING from vfs and is at the beginning of gfs2_delete_inode
waiting for the glock, which is held by the first.  The solution is to
add a new non_block parameter to the gfs2_iget function that causes it
to return -ENOENT if the inode is being freed.
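
The idea can be sketched in user space as follows.  This is a
simplified model, not the kernel code: struct fake_inode, its
being_freed field and the function names are assumptions for the
example, while the skipped/non_block fields follow the structure added
by the patch.

```c
#include <errno.h>

/* Hypothetical, simplified inode: an address plus a "being freed" marker. */
struct fake_inode {
	unsigned long long no_addr;
	int being_freed;
};

/* Lookup key plus bookkeeping, modelled on the patch's skip data. */
struct skip_data {
	unsigned long long no_addr;
	int skipped;
	int non_block;
};

/* Match callback: returns 1 on a usable match, 0 otherwise. */
static int test_inode(const struct fake_inode *inode, struct skip_data *data)
{
	if (inode->no_addr != data->no_addr)
		return 0;
	if (data->non_block && inode->being_freed) {
		/* Remember that a match existed but was deliberately skipped. */
		data->skipped = 1;
		return 0;
	}
	return 1;
}

/* Set callback: refuse to create a new inode if a dying one was skipped. */
static int set_inode(const struct skip_data *data)
{
	return data->skipped ? -ENOENT : 0;
}
```

A non-blocking lookup that meets an inode being freed reports -ENOENT
instead of waiting, while a blocking lookup still matches it as before.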

Regards,

Bob Peterson
Red Hat File Systems

Signed-off-by: Bob Peterson rpete...@redhat.com 
--
 fs/gfs2/dir.c|2 +-
 fs/gfs2/export.c |   10 +---
 fs/gfs2/inode.c  |   56 -
 fs/gfs2/inode.h  |3 +-
 fs/gfs2/ops_fstype.c |2 +-
 fs/gfs2/rgrp.c   |4 +-
 fs/gfs2/super.c  |9 +++-
 7 files changed, 61 insertions(+), 25 deletions(-)

diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index 5c356d0..f789c57 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -1506,7 +1506,7 @@ struct inode *gfs2_dir_search(struct inode *dir, const struct qstr *name)
inode = gfs2_inode_lookup(dir->i_sb,
be16_to_cpu(dent->de_type),
be64_to_cpu(dent->de_inum.no_addr),
-   be64_to_cpu(dent->de_inum.no_formal_ino));
+   be64_to_cpu(dent->de_inum.no_formal_ino), 0);
brelse(bh);
return inode;
}
diff --git a/fs/gfs2/export.c b/fs/gfs2/export.c
index b5a5e60..e0166f0 100644
--- a/fs/gfs2/export.c
+++ b/fs/gfs2/export.c
@@ -145,15 +145,17 @@ static struct dentry *gfs2_get_dentry(struct super_block *sb,
iput(inode);
return ERR_PTR(-ESTALE);
}
-   goto out_inode;
+   } else {
+   inode = gfs2_lookup_by_inum(sdp, inum->no_addr,
+   inum->no_formal_ino,
+   GFS2_BLKST_DINODE);
+   if (inode == ERR_PTR(-ENOENT))
+   inode = gfs2_ilookup(sb, inum->no_addr);
}
 
-   inode = gfs2_lookup_by_inum(sdp, inum->no_addr, inum->no_formal_ino,
-   GFS2_BLKST_DINODE);
if (IS_ERR(inode))
return ERR_CAST(inode);
 
-out_inode:
return d_obtain_alias(inode);
 }
 
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 97d54a2..9134dcb 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -40,37 +40,61 @@ struct gfs2_inum_range_host {
u64 ir_length;
 };
 
+struct gfs2_skip_data {
+   u64 no_addr;
+   int skipped;
+   int non_block;
+};
+
 static int iget_test(struct inode *inode, void *opaque)
 {
struct gfs2_inode *ip = GFS2_I(inode);
-   u64 *no_addr = opaque;
+   struct gfs2_skip_data *data = opaque;
 
-   if (ip->i_no_addr == *no_addr)
+   if (ip->i_no_addr == data->no_addr) {
+   if (data->non_block &&
+   inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE)) {
+   data->skipped = 1;
+   return 0;
+   }
return 1;
-
+   }
return 0;
 }
 
 static int iget_set(struct inode *inode, void *opaque)
 {
struct gfs2_inode *ip = GFS2_I(inode);
-   u64 *no_addr = opaque;
+   struct gfs2_skip_data *data = opaque;
 
-   inode->i_ino = (unsigned long)*no_addr;
-   ip->i_no_addr = *no_addr;
+   if (data->skipped)
+   return -ENOENT;
+   inode->i_ino = (unsigned long)(data->no_addr);
+   ip->i_no_addr = data->no_addr;
return 0;
 }
 
 struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr)
 {
unsigned long hash = (unsigned long)no_addr;
-   return ilookup5(sb, hash, iget_test, &no_addr);
+   struct gfs2_skip_data data;
+
+   data.no_addr = no_addr;
+   data.skipped = 0;
+   data.non_block = 0;
+   return ilookup5(sb, hash, iget_test, &data);
 }
 
-static struct inode *gfs2_iget(struct super_block *sb, u64 no_addr)
+static struct inode *gfs2_iget(struct super_block *sb, u64 no_addr,
+  int non_block)
 {
+   struct gfs2_skip_data data;
unsigned long hash = (unsigned long)no_addr;
-   return iget5_locked(sb, hash, iget_test, iget_set, &no_addr);
+
+   data.no_addr = no_addr;
+   data.skipped = 0;
+   data.non_block = non_block;
+   return iget5_locked(sb, hash, iget_test, iget_set, &data);
 }
 
 /**
@@ -111,19 +135,20 @@ static void gfs2_set_iop(struct inode *inode)
  * @sb: The super block
  * @no_addr: The inode number
  * @type: The type of the inode
+ * non_block: Can we block on inodes that are being freed?
  *
  * Returns: A VFS inode, or an error
  */
 
 struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned int type,
-   u64 no_addr, u64

[Cluster-devel] [GFS2 PATCH] gfs2: remove *leaf_call_t and simplify leaf_dealloc

2011-03-22 Thread Bob Peterson

Hi,

Since foreach_leaf is only called with leaf_dealloc as its only possible
call function, we can simplify the code by making it call leaf_dealloc
directly.  This simplifies the code and eliminates the need for
leaf_call_t, the generic call method.  This is a first small step in
simplifying the directory leaf deallocation code.

Regards,

Bob Peterson
Red Hat File Systems

Signed-off-by: Bob Peterson rpete...@redhat.com 
--
diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index 5c356d0..10a4bbe 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -82,11 +82,11 @@
 struct qstr gfs2_qdot __read_mostly;
 struct qstr gfs2_qdotdot __read_mostly;
 
-typedef int (*leaf_call_t) (struct gfs2_inode *dip, u32 index, u32 len,
-   u64 leaf_no, void *data);
 typedef int (*gfs2_dscan_t)(const struct gfs2_dirent *dent,
const struct qstr *name, void *opaque);
 
+static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
+   u64 leaf_no);
 
 int gfs2_dir_get_new_buffer(struct gfs2_inode *ip, u64 block,
struct buffer_head **bhp)
@@ -1770,13 +1770,11 @@ int gfs2_dir_mvino(struct gfs2_inode *dip, const struct qstr *filename,
 /**
  * foreach_leaf - call a function for each leaf in a directory
  * @dip: the directory
- * @lc: the function to call for each each
- * @data: private data to pass to it
  *
  * Returns: errno
  */
 
-static int foreach_leaf(struct gfs2_inode *dip, leaf_call_t lc, void *data)
+static int foreach_leaf(struct gfs2_inode *dip)
 {
struct gfs2_sbd *sdp = GFS2_SB(&dip->i_inode);
struct buffer_head *bh;
@@ -1823,7 +1821,7 @@ static int foreach_leaf(struct gfs2_inode *dip, leaf_call_t lc, void *data)
len = 1 << (dip->i_depth - be16_to_cpu(leaf->lf_depth));
brelse(bh);
 
-   error = lc(dip, index, len, leaf_no, data);
+   error = leaf_dealloc(dip, index, len, leaf_no);
if (error)
goto out;
 
@@ -1855,7 +1853,7 @@ out:
  */
 
 static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
-   u64 leaf_no, void *data)
+   u64 leaf_no)
 {
struct gfs2_sbd *sdp = GFS2_SB(&dip->i_inode);
struct gfs2_leaf *tmp_leaf;
@@ -1978,7 +1976,7 @@ int gfs2_dir_exhash_dealloc(struct gfs2_inode *dip)
int error;
 
/* Dealloc on-disk leaves to FREEMETA state */
-   error = foreach_leaf(dip, leaf_dealloc, NULL);
+   error = foreach_leaf(dip);
if (error)
return error;

[Cluster-devel] [GFS2 PATCH] gfs2: Combine transaction from gfs2_dir_exhash_dealloc

2011-03-22 Thread Bob Peterson

Hi,

At the end of function gfs2_dir_exhash_dealloc, it was setting the dinode
type to "file" to prevent directory corruption in case of a crash.
It was doing so in its own journal transaction.  This patch makes the
change occur when the last call is made to leaf_dealloc, since it needs
to rewrite the directory dinode at that time anyway.

Regards,

Bob Peterson
Red Hat File Systems

Signed-off-by: Bob Peterson rpete...@redhat.com 
--
diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index 10a4bbe..6791e49 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -86,7 +86,7 @@ typedef int (*gfs2_dscan_t)(const struct gfs2_dirent *dent,
const struct qstr *name, void *opaque);
 
 static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
-   u64 leaf_no);
+   u64 leaf_no, int last_dealloc);
 
 int gfs2_dir_get_new_buffer(struct gfs2_inode *ip, u64 block,
struct buffer_head **bhp)
@@ -1781,10 +1781,10 @@ static int foreach_leaf(struct gfs2_inode *dip)
struct gfs2_leaf *leaf;
u32 hsize, len;
u32 ht_offset, lp_offset, ht_offset_cur = -1;
-   u32 index = 0;
+   u32 index = 0, next_index;
__be64 *lp;
u64 leaf_no;
-   int error = 0;
+   int error = 0, last;
 
hsize = 1 << dip->i_depth;
if (hsize * sizeof(u64) != i_size_read(&dip->i_inode)) {
@@ -1819,13 +1819,13 @@ static int foreach_leaf(struct gfs2_inode *dip)
goto out;
leaf = (struct gfs2_leaf *)bh->b_data;
len = 1 << (dip->i_depth - be16_to_cpu(leaf->lf_depth));
+   next_index = (index & ~(len - 1)) + len;
+   last = ((next_index >= hsize) ? 1 : 0);
brelse(bh);
-
-   error = leaf_dealloc(dip, index, len, leaf_no);
+   error = leaf_dealloc(dip, index, len, leaf_no, last);
if (error)
goto out;
-
-   index = (index & ~(len - 1)) + len;
+   index = next_index;
} else
index++;
}
@@ -1847,13 +1847,13 @@ out:
  * @index: the hash table offset in the directory
  * @len: the number of pointers to this leaf
  * @leaf_no: the leaf number
- * @data: not used
+ * last_dealloc: 1 if this is the final dealloc for the leaf, else 0
  *
  * Returns: errno
  */
 
 static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
-   u64 leaf_no)
+   u64 leaf_no, int last_dealloc)
 {
struct gfs2_sbd *sdp = GFS2_SB(&dip->i_inode);
struct gfs2_leaf *tmp_leaf;
@@ -1940,6 +1940,10 @@ static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
goto out_end_trans;
 
gfs2_trans_add_bh(dip->i_gl, dibh, 1);
+   /* On the last dealloc, make this a regular file in case we crash.
+  (We don't want to free these blocks a second time.)  */
+   if (last_dealloc)
+   dip->i_inode.i_mode = S_IFREG;
gfs2_dinode_out(dip, dibh->b_data);
brelse(dibh);
 
@@ -1971,33 +1975,8 @@ out:
 
 int gfs2_dir_exhash_dealloc(struct gfs2_inode *dip)
 {
-   struct gfs2_sbd *sdp = GFS2_SB(&dip->i_inode);
-   struct buffer_head *bh;
-   int error;
-
/* Dealloc on-disk leaves to FREEMETA state */
-   error = foreach_leaf(dip);
-   if (error)
-   return error;
-
-   /* Make this a regular file in case we crash.
-  (We don't want to free these blocks a second time.)  */
-
-   error = gfs2_trans_begin(sdp, RES_DINODE, 0);
-   if (error)
-   return error;
-
-   error = gfs2_meta_inode_buffer(dip, &bh);
-   if (!error) {
-   gfs2_trans_add_bh(dip->i_gl, bh, 1);
-   ((struct gfs2_dinode *)bh->b_data)->di_mode =
-   cpu_to_be32(S_IFREG);
-   brelse(bh);
-   }
-
-   gfs2_trans_end(sdp);
-
-   return error;
+   return foreach_leaf(dip);
 }
 
 /**

[Cluster-devel] [GFS2 PATCH] gfs2: pass leaf_bh into leaf_dealloc

2011-03-22 Thread Bob Peterson

Hi,

Function foreach_leaf used to look up the leaf block address and get
a buffer_head.  Then it would call leaf_dealloc which did the same
lookup.  This patch combines the two operations by making foreach_leaf
pass the leaf bh to leaf_dealloc.

Regards,

Bob Peterson
Red Hat File Systems

Signed-off-by: Bob Peterson rpete...@redhat.com 
--
diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index 6791e49..4a925a7 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -86,7 +86,8 @@ typedef int (*gfs2_dscan_t)(const struct gfs2_dirent *dent,
const struct qstr *name, void *opaque);
 
 static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
-   u64 leaf_no, int last_dealloc);
+   u64 leaf_no, struct buffer_head *leaf_bh,
+   int last_dealloc);
 
 int gfs2_dir_get_new_buffer(struct gfs2_inode *ip, u64 block,
struct buffer_head **bhp)
@@ -1821,8 +1822,9 @@ static int foreach_leaf(struct gfs2_inode *dip)
len = 1 << (dip->i_depth - be16_to_cpu(leaf->lf_depth));
next_index = (index & ~(len - 1)) + len;
last = ((next_index >= hsize) ? 1 : 0);
+   error = leaf_dealloc(dip, index, len, leaf_no, bh,
+last);
brelse(bh);
-   error = leaf_dealloc(dip, index, len, leaf_no, last);
if (error)
goto out;
index = next_index;
@@ -1847,13 +1849,15 @@ out:
  * @index: the hash table offset in the directory
  * @len: the number of pointers to this leaf
  * @leaf_no: the leaf number
+ * @leaf_bh: buffer_head for the starting leaf
  * last_dealloc: 1 if this is the final dealloc for the leaf, else 0
  *
  * Returns: errno
  */
 
 static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
-   u64 leaf_no, int last_dealloc)
+   u64 leaf_no, struct buffer_head *leaf_bh,
+   int last_dealloc)
 {
struct gfs2_sbd *sdp = GFS2_SB(&dip->i_inode);
struct gfs2_leaf *tmp_leaf;
@@ -1885,14 +1889,18 @@ static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
goto out_qs;
 
/*  Count the number of leaves  */
+   bh = leaf_bh;
 
for (blk = leaf_no; blk; blk = nblk) {
-   error = get_leaf(dip, blk, &bh);
-   if (error)
-   goto out_rlist;
+   if (blk != leaf_no) {
+   error = get_leaf(dip, blk, &bh);
+   if (error)
+   goto out_rlist;
+   }
tmp_leaf = (struct gfs2_leaf *)bh->b_data;
nblk = be64_to_cpu(tmp_leaf->lf_next);
-   brelse(bh);
+   if (blk != leaf_no)
+   brelse(bh);
 
gfs2_rlist_add(sdp, &rlist, blk);
l_blocks++;
@@ -1916,13 +1924,18 @@ static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
if (error)
goto out_rg_gunlock;
 
+   bh = leaf_bh;
+
for (blk = leaf_no; blk; blk = nblk) {
-   error = get_leaf(dip, blk, &bh);
-   if (error)
-   goto out_end_trans;
+   if (blk != leaf_no) {
+   error = get_leaf(dip, blk, &bh);
+   if (error)
+   goto out_end_trans;
+   }
tmp_leaf = (struct gfs2_leaf *)bh->b_data;
nblk = be64_to_cpu(tmp_leaf->lf_next);
-   brelse(bh);
+   if (blk != leaf_no)
+   brelse(bh);
 
gfs2_free_meta(dip, blk, 1);
gfs2_add_inode_blocks(&dip->i_inode, -1);

[Cluster-devel] [GFS2 PATCH] GFS2: eliminate i_generation from memory

2011-03-23 Thread Bob Peterson

Hi,

Since GFS2 doesn't rely upon generation numbers like GFS1, we do not
need to copy the generation number into memory and out of memory.
This patch eliminates the variable from the in-core structure and
will reduce the memory requirements of GFS2.

Regards,

Bob Peterson
Red Hat File Systems

Signed-off-by: Bob Peterson rpete...@redhat.com 
--
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 870a89d..2329296 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -272,7 +272,6 @@ struct gfs2_inode {
struct inode i_inode;
u64 i_no_addr;
u64 i_no_formal_ino;
-   u64 i_generation;
u64 i_eattr;
unsigned long i_flags;  /* GIF_... */
struct gfs2_glock *i_gl; /* Move into i_gh? */
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 97d54a2..b4e416a 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -261,7 +261,6 @@ static int gfs2_dinode_in(struct gfs2_inode *ip, const void *buf)
ip->i_inode.i_ctime.tv_nsec = be32_to_cpu(str->di_ctime_nsec);
 
ip->i_goal = be64_to_cpu(str->di_goal_meta);
-   ip->i_generation = be64_to_cpu(str->di_generation);
 
ip->i_diskflags = be32_to_cpu(str->di_flags);
gfs2_set_inode_flags(&ip->i_inode);
@@ -626,7 +625,6 @@ static void init_dinode(struct gfs2_inode *dip, struct gfs2_glock *gl,
di->di_major = cpu_to_be32(MAJOR(dev));
di->di_minor = cpu_to_be32(MINOR(dev));
di->di_goal_meta = di->di_goal_data = cpu_to_be64(inum->no_addr);
-   di->di_generation = cpu_to_be64(*generation);
di->di_flags = 0;
 
if (S_ISREG(mode)) {
@@ -944,7 +942,6 @@ void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf)
 
str->di_goal_meta = cpu_to_be64(ip->i_goal);
str->di_goal_data = cpu_to_be64(ip->i_goal);
-   str->di_generation = cpu_to_be64(ip->i_generation);
 
str->di_flags = cpu_to_be32(ip->i_diskflags);
str->di_height = cpu_to_be16(ip->i_height);

[Cluster-devel] [GFS2 Patch] GFS2: Processes waiting on inode glock that no processes are holding

2011-05-24 Thread Bob Peterson

Hi,

This patch fixes a race in the GFS2 glock state machine that may
result in lockups.  The symptom is that all nodes but one will
hang, waiting for a particular glock.  All the holder records
will have the W (Waiting) bit set.  The other node will
typically have the glock stuck in Exclusive mode (EX) with no
holder records, but the dinode will be cached.  In other words,
an entry with I: will appear in the glock dump for that glock,
but nothing else.

The race has to do with the glock Pending Demote bit, which
can be set, then immediately reset, thus losing the fact that
another node needs the glock.  The sequence of events is:

1. Something schedules the glock workqueue (e.g. glock request from fs)
2. The glock workqueue gets to the point between the test of the reply pending
bit and the spin lock:

if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags)) {
finish_xmote(gl, gl->gl_reply);
drop_ref = 1;
}
down_read(&gfs2_umount_flush_sem);  <-- i.e. here
spin_lock(&gl->gl_spin);

3. In comes (a) the reply to our EX lock request setting GLF_REPLY_PENDING and
(b) the demote request which sets GLF_PENDING_DEMOTE

4. The following test is executed:

if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
gl->gl_state != LM_ST_UNLOCKED &&
gl->gl_demote_state != LM_ST_EXCLUSIVE) {

This resets the pending demote flag, and gl->gl_demote_state is not equal to
exclusive.  However, because the reply from the dlm arrived after we checked for
the GLF_REPLY_PENDING flag, gl->gl_state is still equal to unlocked, so
although we reset the GLF_PENDING_DEMOTE flag, we didn't then set the
GLF_DEMOTE flag or reinstate the GLF_PENDING_DEMOTE flag.

The patch closes the timing window by only transitioning the
Pending demote bit to the demote flag once we know the
other conditions (not unlocked and not exclusive) are met.
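
To illustrate the ordering problem, here is a minimal userspace sketch
(not the kernel code).  The demo_* names, the plain bitmask and the
three-value state enum are assumptions made for illustration only, and
the minimum hold time delay is left out.  It contrasts "clear the
pending bit, then test the other conditions" with "test everything,
then move the bit":

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GLF_* bits and LM_ST_* states. */
enum { DEMO_PENDING_DEMOTE = 1u << 0, DEMO_DEMOTE = 1u << 1 };
enum demo_state { DEMO_UNLOCKED, DEMO_SHARED, DEMO_EXCLUSIVE };

struct demo_glock {
	unsigned int flags;
	enum demo_state state;        /* current lock state */
	enum demo_state demote_state; /* state another node wants us to drop to */
};

/* The two extra conditions from the test in glock_work_func(). */
static bool demo_can_demote(const struct demo_glock *gl)
{
	return gl->state != DEMO_UNLOCKED &&
	       gl->demote_state != DEMO_EXCLUSIVE;
}

/* Old ordering: the pending bit is cleared before the other tests run,
 * so a request is forgotten if the glock still looks unlocked. */
static void demo_old_check(struct demo_glock *gl)
{
	bool pending = (gl->flags & DEMO_PENDING_DEMOTE) != 0;

	gl->flags &= ~DEMO_PENDING_DEMOTE;
	if (pending && demo_can_demote(gl))
		gl->flags |= DEMO_DEMOTE;
}

/* New ordering: the pending bit is only moved to the demote bit once
 * all conditions hold; otherwise it stays set for the next pass. */
static void demo_new_check(struct demo_glock *gl)
{
	if ((gl->flags & DEMO_PENDING_DEMOTE) && demo_can_demote(gl)) {
		gl->flags &= ~DEMO_PENDING_DEMOTE;
		gl->flags |= DEMO_DEMOTE;
	}
}
```

With a glock that is still unlocked, demo_old_check() leaves no flag
set at all, while demo_new_check() keeps the pending bit until the
state changes.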

Regards,

Bob Peterson
Red Hat File Systems

Signed-off-by: Bob Peterson rpete...@redhat.com 
--
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index a2a6abb..7137750 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -663,14 +663,19 @@ static void glock_work_func(struct work_struct *work)
 		drop_ref = 1;
 	}
 	spin_lock(&gl->gl_spin);
-	if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
+	if (test_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
 	    gl->gl_state != LM_ST_UNLOCKED &&
 	    gl->gl_demote_state != LM_ST_EXCLUSIVE) {
 		unsigned long holdtime, now = jiffies;
+
 		holdtime = gl->gl_tchange + gl->gl_ops->go_min_hold_time;
 		if (time_before(now, holdtime))
 			delay = holdtime - now;
-		set_bit(delay ? GLF_PENDING_DEMOTE : GLF_DEMOTE, &gl->gl_flags);
+
+		if (!delay) {
+			clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags);
+			set_bit(GLF_DEMOTE, &gl->gl_flags);
+		}
 	}
 	run_queue(gl, 0);
 	spin_unlock(&gl->gl_spin);

[Cluster-devel] [PATCH][GFS2] Bouncing locks in a cluster is slow in GFS2 - Try #3

2011-06-15 Thread Bob Peterson

Hi,

This is a rebase of a patch I sent on January 26, 2011.
Now that we've resolved the other pending issues blocking
this one, I'm submitting it again.

This patch is a performance improvement for GFS2 in a clustered
environment. It makes the glock hold time self-adjusting.
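
As a rough sketch of what "self-adjusting" means here: the hold time
is nudged down or up in fixed steps and clamped to a range.  The
numeric values below are placeholders I picked for illustration; the
real GL_GLOCK_* constants are defined in fs/gfs2/glock.h by the patch.
The demo_* names are likewise hypothetical.

```c
/* Placeholder tuning values, in jiffies (assumptions, not the patch's). */
#define DEMO_MIN_HOLD   10L
#define DEMO_MAX_HOLD   200L
#define DEMO_HOLD_INCR  50L
#define DEMO_HOLD_DECR  25L

/* Step the hold time down, but never below the minimum.  In the patch
 * this happens in state_change() when new_state != gl->gl_target. */
static long demo_shorten_hold(long hold)
{
	long v = hold - DEMO_HOLD_DECR;

	return v > DEMO_MIN_HOLD ? v : DEMO_MIN_HOLD;
}

/* Step the hold time up, but never above the maximum.  In the patch
 * this happens in wait_on_holder() after waiting more than a second. */
static long demo_lengthen_hold(long hold)
{
	long v = hold + DEMO_HOLD_INCR;

	return v < DEMO_MAX_HOLD ? v : DEMO_MAX_HOLD;
}
```

Using a signed type matters for the downward step: with an unsigned
value the subtraction could wrap before the clamp is applied.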

Regards,

Bob Peterson
Red Hat File Systems
--
 fs/gfs2/glock.c  |   39 +--
 fs/gfs2/glock.h  |6 ++
 fs/gfs2/glops.c  |2 --
 fs/gfs2/incore.h |2 +-
 4 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 1c1336e..88e8a23 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -409,6 +409,10 @@ static void state_change(struct gfs2_glock *gl, unsigned int new_state)
 	if (held1 && held2 && list_empty(&gl->gl_holders))
 		clear_bit(GLF_QUEUED, &gl->gl_flags);
 
+	if (new_state != gl->gl_target)
+		/* shorten our minimum hold time */
+		gl->gl_hold_time = max(gl->gl_hold_time - GL_GLOCK_HOLD_DECR,
+				       GL_GLOCK_MIN_HOLD);
 	gl->gl_state = new_state;
 	gl->gl_tchange = jiffies;
 }
@@ -668,7 +672,7 @@ static void glock_work_func(struct work_struct *work)
 	    gl->gl_demote_state != LM_ST_EXCLUSIVE) {
 		unsigned long holdtime, now = jiffies;
 
-		holdtime = gl->gl_tchange + gl->gl_ops->go_min_hold_time;
+		holdtime = gl->gl_tchange + gl->gl_hold_time;
 		if (time_before(now, holdtime))
 			delay = holdtime - now;
 
@@ -679,9 +683,14 @@ static void glock_work_func(struct work_struct *work)
 	}
 	run_queue(gl, 0);
 	spin_unlock(&gl->gl_spin);
-	if (!delay ||
-	    queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
+	if (!delay)
 		gfs2_glock_put(gl);
+	else {
+		if (gl->gl_name.ln_type != LM_TYPE_INODE)
+			delay = 0;
+		if (queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
+			gfs2_glock_put(gl);
+	}
 	if (drop_ref)
 		gfs2_glock_put(gl);
 }
@@ -743,6 +752,7 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
 	gl->gl_tchange = jiffies;
 	gl->gl_object = NULL;
 	gl->gl_sbd = sdp;
+	gl->gl_hold_time = GL_GLOCK_DFT_HOLD;
 	INIT_DELAYED_WORK(&gl->gl_work, glock_work_func);
 	INIT_WORK(&gl->gl_delete, delete_work_func);
 
@@ -855,8 +865,15 @@ static int gfs2_glock_demote_wait(void *word)
 
 static void wait_on_holder(struct gfs2_holder *gh)
 {
+	unsigned long time1 = jiffies;
+
 	might_sleep();
 	wait_on_bit(&gh->gh_iflags, HIF_WAIT, gfs2_glock_holder_wait, TASK_UNINTERRUPTIBLE);
+	if (time_after(jiffies, time1 + HZ)) /* have we waited > a second? */
+		/* Lengthen the minimum hold time. */
+		gh->gh_gl->gl_hold_time = min(gh->gh_gl->gl_hold_time +
+					      GL_GLOCK_HOLD_INCR,
+					      GL_GLOCK_MAX_HOLD);
 }
 
 static void wait_on_demote(struct gfs2_glock *gl)
@@ -1093,8 +1110,9 @@ void gfs2_glock_dq(struct gfs2_holder *gh)
 
 	gfs2_glock_hold(gl);
 	if (test_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
-	    !test_bit(GLF_DEMOTE, &gl->gl_flags))
-		delay = gl->gl_ops->go_min_hold_time;
+	    !test_bit(GLF_DEMOTE, &gl->gl_flags) &&
+	    gl->gl_name.ln_type == LM_TYPE_INODE)
+		delay = gl->gl_hold_time;
 	if (queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
 		gfs2_glock_put(gl);
 }
@@ -1273,12 +1291,13 @@ void gfs2_glock_cb(struct gfs2_glock *gl, unsigned int state)
 	unsigned long now = jiffies;
 
 	gfs2_glock_hold(gl);
-	holdtime = gl->gl_tchange + gl->gl_ops->go_min_hold_time;
-	if (test_bit(GLF_QUEUED, &gl->gl_flags)) {
+	holdtime = gl->gl_tchange + gl->gl_hold_time;
+	if (test_bit(GLF_QUEUED, &gl->gl_flags) &&
+	    gl->gl_name.ln_type == LM_TYPE_INODE) {
 		if (time_before(now, holdtime))
 			delay = holdtime - now;
 		if (test_bit(GLF_REPLY_PENDING, &gl->gl_flags))
-			delay = gl->gl_ops->go_min_hold_time;
+			delay = gl->gl_hold_time;
 	}
 
 	spin_lock(&gl->gl_spin);
@@ -1667,7 +1686,7 @@ static int __dump_glock(struct seq_file *seq, const struct gfs2_glock *gl)
 	dtime *= 100/HZ; /* demote time in uSec */
 	if (!test_bit(GLF_DEMOTE, &gl->gl_flags))
 		dtime = 0;
-	gfs2_print_dbg(seq, "G:  s:%s n:%u/%llx f:%s t:%s d:%s/%llu a:%d v:%d r:%d\n",
+	gfs2_print_dbg(seq, "G:  s:%s n:%u/%llx f:%s t:%s d:%s/%llu a:%d v:%d r:%d m:%ld\n",
 		  state2str(gl->gl_state),
 		  gl->gl_name.ln_type,
 		  (unsigned long long)gl->gl_name.ln_number,
@@ -1676,7 +1695,7 @@ static int __dump_glock(struct seq_file *seq, const 
struct

Re: [Cluster-devel] [PATCH RHEL6 1/2] tunegfs2: Ensure we don't try to open a null device

2011-07-25 Thread Bob Peterson

- Original Message -
| This is based on upstream commit
| a830c8747cf7dcc899dc92ad13c3a3b1a3738092
| 
| Return codes are now taken from sysexits.h rather than trying to pass
| negative
| errno values as per kernel code. There is not an exact error code for
| all
| potential events, but they are close enough I think.
| 
| The code also checks that there is exactly one non-option argument
| (i.e. the
| device) given before attempting to open it.
| 
| rhbz#719124
| 
| Signed-off-by: Andrew Price anpr...@redhat.com
| Signed-off-by: Steven Whitehouse swhit...@redhat.com
| Reported-by: Nathan Straz nst...@redhat.com

Hi,

Looks good.  ACK.

Bob Peterson
Red Hat File Systems

Re: [Cluster-devel] [PATCH RHEL6 2/2] tunegfs2: Fix usage message

2011-07-25 Thread Bob Peterson

- Original Message -
| This is based on upstream commit
| a830c8747cf7dcc899dc92ad13c3a3b1a3738092
| 
| The help message is updated to include all supported options.
| 
| rhbz#719126
| 
| Signed-off-by: Steven Whitehouse swhit...@redhat.com
| Signed-off-by: Andrew Price anpr...@redhat.com
| Reported-by: Nathan Straz nst...@redhat.com
| ---

Hi,

ACK,

Regards,

Bob Peterson
Red Hat File Systems

Re: [Cluster-devel] [PATCH] i18n: strings review

2011-07-26 Thread Bob Peterson

- Original Message -
| This patch is the first review of the gfs2-utils
| strings that will be translated. The changes in
| this patch are based on the current
| strings added to the .pot file.
| 
| Please, disregard the previous one since this
| contains the changes suggested by Steve
| ---

Hi Carlos,

Looks good.  ACK.

Bob Peterson
Red Hat File Systems

[Cluster-devel] [PATCH 00/44] fsck.gfs2: Support for checking gfs1 file systems

2011-08-11 Thread Bob Peterson

Hi,

For a long time now I've been working on a series of patches that
allow fsck.gfs2 to analyze and fix GFS (GFS1) file systems as well
as GFS2 file systems.  There are several reasons to do this:

1. There is no gfs_fsck in upstream, RHEL6, Fedora or newer, which
   means that anyone who has an existing GFS1 file system who
   upgrades their cluster to newer software will not be able to
   check their file system before porting it.  In other words,
   when gfs2_convert runs to convert a file system from GFS to GFS2
   it instructs the user to fsck their file system before the
   conversion.  Right now the fsck is impossible unless the gfs_fsck
   is done prior to the software upgrade.
2. The fsck.gfs2 tool has been debugged, maintained and tested much
   more thoroughly than gfs_fsck, so it has made many major advances:
   it makes better decisions and may be able to recover more data
   than gfs_fsck can.
3. The fsck.gfs2 tool is _much_ faster than gfs_fsck.  In some
   extreme cases the speed is orders of magnitude faster.  Some
   fscks that would take gfs_fsck 5 days to run now complete in
   about an hour's time.

In getting fsck.gfs2 to operate on GFS file systems, I've run
various prototypes through a series of GFS (and GFS2) metadata in
order to test how it behaves.  After the tests, I've analyzed the
output to determine if it's making sane decisions about the file
system.  In picking through all this output, I've found and fixed
many problems with today's fsck.gfs2.  My collection of GFS metadata
exposed many problems I would never find in my GFS2 metadata
collection, despite it being bigger. (My collection includes
about 36 GFS1 metadata sets and 84 GFS2 metadata sets.)

I've created a set of these 44 patches to address these problems
and add the capability to fix GFS1:

01/44 fsck.gfs2: Make functions consistently use sdp rather than sbp
02/44 fsck.gfs2: Change if( to if (
03/44 libgfs1: Add a centralized gfs1 variable to superblock variables
04/44 libgfs2: Make check_sb and read_sb operate on gfs1 file systems
05/44 libgfs2: add generic risize function, move gfs1 structures to libgfs2
06/44 fsck.gfs2: Check for blocks wrongly inside resource groups
07/44 fsck.gfs2: Rename function check_leaf to check_ealeaf_block
08/44 fsck.gfs2: eliminate vestigial buffer_head variable in check_leaf
09/44 fsck.gfs2: Rename the nlink functions to make them more intuitive
10/44 fsck.gfs2: Keep di_nlink in sync when adding links for lost+found

11/44 fsck.gfs2: directory entry count was only 16 bits in check_entries
12/44 fsck.gfs2: get rid of triple negative logic
13/44 dirent_repair needs to mark the buffer as modified
14/44 fsck.gfs2: Ask to reclaim unlinked meta on a per-rgrp basis only
15/44 fsck.gfs2: Factor out function to add .. entry when linking to lost+found
16/44 libgfs2: Use __FUNCTION__ rather than __FILE__ for debug messages
17/44 fsck.gfs2: Don't stop invalidating blocks if an invalid one is found
18/44 fsck.gfs2: Find and clear duplicate references that are leaf blocks
19/44 fsck.gfs2: Move function check_num_ptrs from metawalk.c to pass1.c
20/44 fsck.gfs2: Add duplicate reference processing for leaf blocks

21/44 fsck.gfs2: split function check_leaf_blks to make it more understandable
22/44 fsck.gfs2: Shorten output
23/44 fsck.gfs2: Make output messages more sensible
24/44 fsck.gfs2 pass2: Refactor function set_dotdor_dir
25/44 fsck.gfs2 pass2: When deleting an inode, delete its extended attributes
26/44 fsck.gfs2 pass2: Only delete metadata if bad (not invalid) inode
27/44 fsck.gfs2 pass3: Refactor mark_and_return_parent
28/44 fsck.gfs2: misc cosmetic changes
29/44 fsck.gfs2: check_leaf_blks: Don't use old_leaf if it was a duplicate
30/44 fsck.gfs2: Add find_remove_dup and free_block_if_notdup

31/44 fsck.gfs2: don't free previous rgrp list when trying to repair rgrps
32/44 libgfs2: eliminate gfs1_readi in favor of gfs2_readi
33/44 libgfs2: when adding a new GFS1 block, mark buffers modified
34/44 libgfs2: when mapping gfs1 dinode blocks, use dinode buffer
35/44 libgfs2: move block_map functions to fsck.gfs2
36/44 libgfs2: eliminate gfs1_rindex_read in favor of rindex_read
37/44 libgfs2: combine ri_update and gfs1_ri_update
38/44 libgfs2: combine gfs_inode_read and gfs_inode_get
39/44 libgfs2: move gfs1 functions from edit to libgfs2
40/44 gfs2_edit savemeta: save_inode_data backwards for gfs1

41/44 libgfs2: expand libgfs2's capabilities to operate on gfs1
42/44 fsck.gfs2: Combine block and char device inode types
43/44 fsck.gfs2: four-step duplicate elimination process
44/44 fsck.gfs2: Add ability to check gfs1 file systems

The patches will be posted shortly.  Testing is ongoing.
If you have GFS1 or GFS2 file systems and would like to contribute
metadata for testing, please use gfs2_edit savemeta, gzip the
output, and send me a link where I can download it.

Regards,

Bob Peterson
Red Hat File Systems

[Cluster-devel] [Patch 01/44] fsck.gfs2: Make functions consistently use sdp rather than sbp

2011-08-11 Thread Bob Peterson

From f4326def0744e6a01a5e7665eb4f8261d6996af7 Mon Sep 17 00:00:00 2001
From: Bob Peterson rpete...@redhat.com
Date: Mon, 8 Aug 2011 08:41:36 -0500
Subject: [PATCH 01/44] fsck.gfs2: Make functions consistently use sdp rather
 than sbp

For years, the fsck.gfs2 tool used two different variable names to refer to the
same structure: sdp and sbp.  This patch changes them all to sdp so they are now
consistent with the kernel code.

rhbz#675723
---
 gfs2/fsck/fsck.h   |   20 ++--
 gfs2/fsck/initialize.c |   84 
 gfs2/fsck/main.c   |   48 ++--
 gfs2/fsck/metawalk.c   |   24 +++---
 gfs2/fsck/metawalk.h   |4 +-
 gfs2/fsck/pass1.c  |   18 +-
 gfs2/fsck/pass1b.c |   34 ++--
 gfs2/fsck/pass1c.c |   26 +++---
 gfs2/fsck/pass2.c  |   58 
 gfs2/fsck/pass3.c  |   28 
 gfs2/fsck/pass4.c  |   14 
 gfs2/fsck/pass5.c  |   16 
 12 files changed, 187 insertions(+), 187 deletions(-)

diff --git a/gfs2/fsck/fsck.h b/gfs2/fsck/fsck.h
index bc14b88..25bc3b9 100644
--- a/gfs2/fsck/fsck.h
+++ b/gfs2/fsck/fsck.h
@@ -92,21 +92,21 @@ enum rgindex_trust_level { /* how far can we trust our RG index? */
   must have been converted from gfs2_convert. */
 };
 
-extern struct gfs2_inode *fsck_load_inode(struct gfs2_sbd *sbp, uint64_t block);
+extern struct gfs2_inode *fsck_load_inode(struct gfs2_sbd *sdp, uint64_t block);
 extern struct gfs2_inode *fsck_inode_get(struct gfs2_sbd *sdp,
  struct gfs2_buffer_head *bh);
 extern void fsck_inode_put(struct gfs2_inode **ip);
 
-extern int initialize(struct gfs2_sbd *sbp, int force_check, int preen,
+extern int initialize(struct gfs2_sbd *sdp, int force_check, int preen,
  int *all_clean);
-extern void destroy(struct gfs2_sbd *sbp);
-extern int pass1(struct gfs2_sbd *sbp);
-extern int pass1b(struct gfs2_sbd *sbp);
-extern int pass1c(struct gfs2_sbd *sbp);
-extern int pass2(struct gfs2_sbd *sbp);
-extern int pass3(struct gfs2_sbd *sbp);
-extern int pass4(struct gfs2_sbd *sbp);
-extern int pass5(struct gfs2_sbd *sbp);
+extern void destroy(struct gfs2_sbd *sdp);
+extern int pass1(struct gfs2_sbd *sdp);
+extern int pass1b(struct gfs2_sbd *sdp);
+extern int pass1c(struct gfs2_sbd *sdp);
+extern int pass2(struct gfs2_sbd *sdp);
+extern int pass3(struct gfs2_sbd *sdp);
+extern int pass4(struct gfs2_sbd *sdp);
+extern int pass5(struct gfs2_sbd *sdp);
 extern int rg_repair(struct gfs2_sbd *sdp, int trust_lvl, int *rg_count,
 int *sane);
 extern void gfs2_dup_free(void);
diff --git a/gfs2/fsck/initialize.c b/gfs2/fsck/initialize.c
index 0930ba6..55a4f19 100644
--- a/gfs2/fsck/initialize.c
+++ b/gfs2/fsck/initialize.c
@@ -38,26 +38,26 @@ static struct master_dir fix_md;
  * Change the lock protocol so nobody can mount the fs
  *
  */
-static int block_mounters(struct gfs2_sbd *sbp, int block_em)
+static int block_mounters(struct gfs2_sbd *sdp, int block_em)
 {
 	if(block_em) {
 		/* verify it starts with lock_ */
-		if(!strncmp(sbp->sd_sb.sb_lockproto, "lock_", 5)) {
+		if(!strncmp(sdp->sd_sb.sb_lockproto, "lock_", 5)) {
 			/* Change lock_ to fsck_ */
-			memcpy(sbp->sd_sb.sb_lockproto, "fsck_", 5);
+			memcpy(sdp->sd_sb.sb_lockproto, "fsck_", 5);
 		}
 		/* FIXME: Need to do other verification in the else
 		 * case */
 	} else {
 		/* verify it starts with fsck_ */
 		/* verify it starts with lock_ */
-		if(!strncmp(sbp->sd_sb.sb_lockproto, "fsck_", 5)) {
+		if(!strncmp(sdp->sd_sb.sb_lockproto, "fsck_", 5)) {
 			/* Change fsck_ to lock_ */
-			memcpy(sbp->sd_sb.sb_lockproto, "lock_", 5);
+			memcpy(sdp->sd_sb.sb_lockproto, "lock_", 5);
 		}
 	}
 
-	if(write_sb(sbp)) {
+	if(write_sb(sdp)) {
 		stack;
 		return -1;
 	}
@@ -1180,7 +1180,7 @@ static int fill_super_block(struct gfs2_sbd *sdp)
  * initialize - initialize superblock pointer
  *
  */
-int initialize(struct gfs2_sbd *sbp, int force_check, int preen,
+int initialize(struct gfs2_sbd *sdp, int force_check, int preen,
   int *all_clean)
 {
int clean_journals = 0, open_flag;
@@ -1192,8 +1192,8 @@ int initialize(struct gfs2_sbd *sbp, int force_check, int preen,
 	else
 		open_flag = O_RDWR | O_EXCL;
 
-	sbp->device_fd = open(opts.device, open_flag);
-	if (sbp->device_fd < 0) {
+	sdp->device_fd = open(opts.device, open_flag);
+	if (sdp->device_fd < 0) {
 		int is_mounted, ro;
 
 		if (open_flag == O_RDONLY || errno != EBUSY) {
@@ -1207,10 +1207,10 @@ int

[Cluster-devel] [Patch 04/44] libgfs2: Make check_sb and read_sb operate on gfs1 file systems

2011-08-11 Thread Bob Peterson

From e85543c0c03fcaeb4ada7ee7b4ecbef361b16ffc Mon Sep 17 00:00:00 2001
From: Bob Peterson rpete...@redhat.com
Date: Mon, 8 Aug 2011 10:48:35 -0500
Subject: [PATCH 04/44] libgfs2: Make check_sb and read_sb operate on gfs1
 file systems

This patch adds allow_gfs1 parameters to the read_sb and check_sb functions.
This will allow gfs2-utils to read and operate on gfs1 file systems in
follow-up patches.
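
For reviewers, a simplified sketch of the calling convention this
introduces (negative = invalid, 0 = gfs2, 1 = gfs1 when the caller
allows it).  This is not the libgfs2 code: the demo_* names and the
two-field superblock are made up for illustration.  The gfs1 format
numbers match GFS_FORMAT_FS and GFS_FORMAT_MULTI from this patch; the
gfs2 numbers are what I believe the on-disk header uses and should be
treated as an assumption here.

```c
#define DEMO_GFS1_FORMAT_FS     1309 /* GFS_FORMAT_FS in this patch */
#define DEMO_GFS1_FORMAT_MULTI  1401 /* GFS_FORMAT_MULTI in this patch */
#define DEMO_GFS2_FORMAT_FS     1801 /* assumed gfs2 value */
#define DEMO_GFS2_FORMAT_MULTI  1900 /* assumed gfs2 value */

/* Hypothetical, heavily reduced superblock. */
struct demo_sb {
	unsigned int fs_format;
	unsigned int multihost_format;
};

/* Returns -1 if the superblock is not acceptable, 0 for gfs2, and
 * 1 for gfs1 (only when allow_gfs1 is nonzero). */
static int demo_check_sb(const struct demo_sb *sb, int allow_gfs1)
{
	if (sb->fs_format == DEMO_GFS1_FORMAT_FS &&
	    sb->multihost_format == DEMO_GFS1_FORMAT_MULTI)
		return allow_gfs1 ? 1 : -1;
	if (sb->fs_format == DEMO_GFS2_FORMAT_FS &&
	    sb->multihost_format == DEMO_GFS2_FORMAT_MULTI)
		return 0;
	return -1;
}
```

A caller that understands both formats can store the nonnegative
result directly in its gfs1 flag, which is how savemeta uses read_sb
in the diff below.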

rhbz#675723
---
 gfs2/edit/savemeta.c   |   27 ++
 gfs2/fsck/initialize.c |4 +-
 gfs2/libgfs2/libgfs2.h |   28 +--
 gfs2/libgfs2/super.c   |   57 ---
 gfs2/mkfs/main_grow.c  |2 +-
 5 files changed, 69 insertions(+), 49 deletions(-)

diff --git a/gfs2/edit/savemeta.c b/gfs2/edit/savemeta.c
index e8bcf8f..1587438 100644
--- a/gfs2/edit/savemeta.c
+++ b/gfs2/edit/savemeta.c
@@ -668,26 +668,13 @@ void savemeta(char *out_fn, int saveoption, int gziplevel)
 			fprintf(stderr, "Bad constants (1)\n");
 			exit(-1);
 		}
-		if(sbd.gfs1) {
-			sbd.bsize = sbd.sd_sb.sb_bsize;
-			sbd.sd_inptrs = (sbd.bsize -
-					 sizeof(struct gfs_indirect)) /
-				sizeof(uint64_t);
-			sbd.sd_diptrs = (sbd.bsize -
-					  sizeof(struct gfs_dinode)) /
-				sizeof(uint64_t);
-		} else {
-			if (read_sb(&sbd) < 0)
-				slow = TRUE;
-			else {
-				sbd.sd_inptrs = (sbd.bsize -
-					 sizeof(struct gfs2_meta_header)) /
-					sizeof(uint64_t);
-				sbd.sd_diptrs = (sbd.bsize -
-					  sizeof(struct gfs2_dinode)) /
-					sizeof(uint64_t);
-			}
+		sbd.gfs1 = read_sb(&sbd, 1);
+		if (sbd.gfs1 < 0) {
+			slow = TRUE;
+			sbd.gfs1 = 0;
 		}
+		if (sbd.gfs1)
+			sbd.bsize = sbd.sd_sb.sb_bsize;
 	}
 	last_fs_block = lseek(sbd.device_fd, 0, SEEK_END) / sbd.bsize;
printf(There are %llu blocks of %u bytes in the destination 
@@ -923,7 +910,7 @@ static int restore_data(int fd, gzFile *gzin_fd, int printblocksonly,
 		    sbd1->sb_header.mh_format == GFS_FORMAT_SB &&
 		    sbd1->sb_multihost_format == GFS_FORMAT_MULTI) {
 			sbd.gfs1 = TRUE;
-		} else if (check_sb(&sbd.sd_sb)) {
+		} else if (check_sb(&sbd.sd_sb, 0)) {
 			fprintf(stderr,"Error: Invalid superblock data.\n");
 			return -1;
 		}
diff --git a/gfs2/fsck/initialize.c b/gfs2/fsck/initialize.c
index 2506108..18d13cc 100644
--- a/gfs2/fsck/initialize.c
+++ b/gfs2/fsck/initialize.c
@@ -1160,7 +1160,7 @@ static int fill_super_block(struct gfs2_sbd *sdp)
 		log_crit(_("Bad constants (1)\n"));
 		exit(-1);
 	}
-	if (read_sb(sdp) < 0) {
+	if (read_sb(sdp, 0) < 0) {
 		/* First, check for a gfs1 (not gfs2) file system */
 		if (sdp->sd_sb.sb_header.mh_magic == GFS2_MAGIC &&
 		    sdp->sd_sb.sb_header.mh_type == GFS2_METATYPE_SB)
@@ -1169,7 +1169,7 @@ static int fill_super_block(struct gfs2_sbd *sdp)
 		if (sb_repair(sdp) != 0)
 			return -1; /* unrepairable, so exit */
 		/* Now that we've tried to repair it, re-read it. */
-		if (read_sb(sdp) < 0)
+		if (read_sb(sdp, 0) < 0)
 			return -1;
 	}
 
diff --git a/gfs2/libgfs2/libgfs2.h b/gfs2/libgfs2/libgfs2.h
index 5f66312..82c39f1 100644
--- a/gfs2/libgfs2/libgfs2.h
+++ b/gfs2/libgfs2/libgfs2.h
@@ -493,7 +493,29 @@ extern int write_journal(struct gfs2_sbd *sdp, unsigned 
int j,
 
 extern int device_size(int fd, uint64_t *bytes);
 
-/* gfs1.c - GFS1 backward compatibility functions */
+/* gfs1.c - GFS1 backward compatibility structures and functions */
+
+#define GFS_FORMAT_SB   (100)  /* Super-Block */
+#define GFS_METATYPE_SB (1)/* Super-Block */
+#define GFS_FORMAT_FS   (1309) /* Filesystem (all-encompassing) */
+#define GFS_FORMAT_MULTI(1401) /* Multi-Host */
+/* GFS1 Dinode types  */
+#define GFS_FILE_NON(0)
+#define GFS_FILE_REG(1)/* regular file */
+#define GFS_FILE_DIR(2)/* directory */
+#define GFS_FILE_LNK(5)/* link */
+#define GFS_FILE_BLK(7)/* block device node */
+#define GFS_FILE_CHR(8)/* character device node

[Cluster-devel] [Patch 03/44] libgfs1: Add a centralized gfs1 variable to superblock variables

2011-08-11 Thread Bob Peterson

From 32a60225151e08ec6291e43dec2ab8ac7f24db21 Mon Sep 17 00:00:00 2001
From: Bob Peterson rpete...@redhat.com
Date: Mon, 8 Aug 2011 09:45:36 -0500
Subject: [PATCH 03/44] libgfs1: Add a centralized gfs1 variable to superblock
 variables

This patch adds a gfs1 variable to the in-core superblock structure
for utils that can operate on both gfs and gfs2 file systems and need
to determine which is which.

rhbz#675723
---
 gfs2/edit/extended.c   |   10 ++--
 gfs2/edit/gfs2hex.c|   13 +++---
 gfs2/edit/hexedit.c|  111 
 gfs2/edit/hexedit.h|3 +-
 gfs2/edit/savemeta.c   |   30 ++--
 gfs2/libgfs2/libgfs2.h |1 +
 6 files changed, 84 insertions(+), 84 deletions(-)

diff --git a/gfs2/edit/extended.c b/gfs2/edit/extended.c
index 3cf6f8b..575c387 100644
--- a/gfs2/edit/extended.c
+++ b/gfs2/edit/extended.c
@@ -66,7 +66,7 @@ static int _do_indirect_extended(char *diebuf, struct iinfo 
*iinf, int hgt)
iinf-ii[x].dirents = 0;
memset(iinf-ii[x].dirent, 0, sizeof(struct gfs2_dirents));
}
-   for (x = (gfs1 ? sizeof(struct gfs_indirect):
+   for (x = (sbd.gfs1 ? sizeof(struct gfs_indirect):
  sizeof(struct gfs2_meta_header)), y = 0;
 x  sbd.bsize;
 x += sizeof(uint64_t), y++) {
@@ -251,7 +251,7 @@ static int display_indirect(struct iinfo *ind, int 
indblocks, int level,
 /*  */
 static void print_inode_type(__be16 de_type)
 {
-   if (gfs1) {
+   if (sbd.gfs1) {
switch(de_type) {
case GFS_FILE_NON:
print_gfs2(Unknown);
@@ -515,7 +515,7 @@ static int parse_rindex(struct gfs2_inode *dip, int 
print_rindex)
 
roff = print_entry_ndx * risize();
 
-   if (gfs1)
+   if (sbd.gfs1)
error = gfs1_readi(dip, (void *)rbuf, roff, risize());
else
error = gfs2_readi(dip, (void *)rbuf, roff, risize());
@@ -543,7 +543,7 @@ static int parse_rindex(struct gfs2_inode *dip, int 
print_rindex)
struct gfs2_buffer_head *tmp_bh;
 
tmp_bh = bread(sbd, ri.ri_addr);
-   if (gfs1) {
+   if (sbd.gfs1) {
struct gfs_rgrp rg1;
gfs_rgrp_in(rg1, tmp_bh);
gfs_rgrp_print(rg1);
@@ -664,7 +664,7 @@ int display_extended(void)
else if (display_indirect(indirect, indirect_blocks, 0, 0) == 0)
return -1;
else if (block_is_rglist()) {
-   if (gfs1)
+   if (sbd.gfs1)
tmp_bh = bread(sbd, sbd1-sb_rindex_di.no_addr);
else
tmp_bh = bread(sbd, masterblock(rindex));
diff --git a/gfs2/edit/gfs2hex.c b/gfs2/edit/gfs2hex.c
index 8dbd7e5..40959c7 100644
--- a/gfs2/edit/gfs2hex.c
+++ b/gfs2/edit/gfs2hex.c
@@ -45,7 +45,6 @@ uint64_t block = 0;
 int blockhist = 0;
 struct iinfo *indirect;
 int indirect_blocks;
-int gfs1  = 0;
 uint64_t block_in_mem = -1;
 struct gfs2_sbd sbd;
 uint64_t starting_blk;
@@ -277,7 +276,7 @@ void do_dinode_extended(struct gfs2_dinode *dine, struct 
gfs2_buffer_head *lbh)
unsigned int x, y, ptroff = 0;
uint64_t p, last;
int isdir = !!(S_ISDIR(dine-di_mode)) || 
-   (gfs1  dine-__pad1 == GFS_FILE_DIR);
+   (sbd.gfs1  dine-__pad1 == GFS_FILE_DIR);
 
indirect_blocks = 0;
memset(indirect, 0, sizeof(indirect));
@@ -468,11 +467,11 @@ static void gfs2_sb_print2(struct gfs2_sb *sbp2)
pv(sbp2, sb_fs_format, %u, 0x%x);
pv(sbp2, sb_multihost_format, %u, 0x%x);
 
-   if (gfs1)
+   if (sbd.gfs1)
pv(sbd1, sb_flags, %u, 0x%x);
pv(sbp2, sb_bsize, %u, 0x%x);
pv(sbp2, sb_bsize_shift, %u, 0x%x);
-   if (gfs1) {
+   if (sbd.gfs1) {
pv(sbd1, sb_seg_size, %u, 0x%x);
gfs2_inum_print2(jindex ino, sbd1-sb_jindex_di);
gfs2_inum_print2(rindex ino, sbd1-sb_rindex_di);
@@ -483,7 +482,7 @@ static void gfs2_sb_print2(struct gfs2_sb *sbp2)
 
pv(sbp2, sb_lockproto, %s, NULL);
pv(sbp2, sb_locktable, %s, NULL);
-   if (gfs1) {
+   if (sbd.gfs1) {
gfs2_inum_print2(quota ino , gfs1_quota_di);
gfs2_inum_print2(license   , gfs1_license_di);
}
@@ -575,7 +574,7 @@ int display_gfs2(void)
break;
 
case GFS2_METATYPE_RG:
-   if (gfs1) {
+   if (sbd.gfs1) {
struct gfs1_rgrp rg1;
 
gfs1_rgrp_in(rg1, bh);
@@ -608,7 +607,7 @@ int display_gfs2(void

[Cluster-devel] [Patch 05/44] libgfs2: add generic risize function, move gfs1 structures to libgfs2

2011-08-11 Thread Bob Peterson

From e02f2523bfd3b3cc12b07cabdcc388b88c0ba769 Mon Sep 17 00:00:00 2001
From: Bob Peterson rpete...@redhat.com
Date: Mon, 8 Aug 2011 11:23:47 -0500
Subject: [PATCH 05/44] libgfs2: add generic risize function, move gfs1
 structures to libgfs2

This patch moves a number of gfs1-specific structures from gfs2_edit to
libgfs2 so other utils can reference them.  In addition, this moves the
risize function so that callers of libgfs2 can reference the proper rindex
entry size depending on whether they're operating on gfs1 or gfs2.
It also changes function rindex_read so it can operate on gfs1 or gfs2
rindex files.
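
The idea behind passing the superblock to risize can be sketched as
follows.  The demo_* structures are hypothetical and deliberately
smaller than the real on-disk rindex records; only the shape of the
dispatch is meant to carry over.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layouts of different sizes. */
struct demo_gfs1_rindex { uint64_t addr; uint32_t length; uint32_t pad; };
struct demo_gfs2_rindex { uint64_t addr; uint32_t length; uint32_t pad;
			  uint64_t data0; };

/* Hypothetical in-core superblock carrying the gfs1 flag. */
struct demo_sbd { int gfs1; };

/* Size of one rindex entry for the file system described by sdp. */
static size_t demo_risize(const struct demo_sbd *sdp)
{
	return sdp->gfs1 ? sizeof(struct demo_gfs1_rindex)
			 : sizeof(struct demo_gfs2_rindex);
}

/* Number of resource groups implied by the rindex file size. */
static uint64_t demo_rg_count(const struct demo_sbd *sdp,
			      uint64_t rindex_bytes)
{
	return rindex_bytes / demo_risize(sdp);
}
```

Callers then compute offsets and counts the same way for both formats,
as the "di_size / risize(...)" expressions in the hunks below do.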

rhbz#675723
---
 gfs2/edit/extended.c   |   11 +++--
 gfs2/edit/hexedit.c|   11 ++--
 gfs2/edit/hexedit.h|  126 
 gfs2/libgfs2/libgfs2.h |   95 
 gfs2/libgfs2/super.c   |   60 ---
 5 files changed, 149 insertions(+), 154 deletions(-)

diff --git a/gfs2/edit/extended.c b/gfs2/edit/extended.c
index 575c387..47938e4 100644
--- a/gfs2/edit/extended.c
+++ b/gfs2/edit/extended.c
@@ -505,7 +505,8 @@ static int parse_rindex(struct gfs2_inode *dip, int 
print_rindex)
 
start_line = line;
error = 0;
-   print_gfs2(RG index entries found: %d., dip-i_di.di_size / risize());
+   print_gfs2(RG index entries found: %d., dip-i_di.di_size /
+  risize(sbd));
eol(0);
lines_per_row[dmode] = 6;
memset(highlighted_addr, 0, sizeof(highlighted_addr));
@@ -513,12 +514,14 @@ static int parse_rindex(struct gfs2_inode *dip, int 
print_rindex)
for (print_entry_ndx=0; ; print_entry_ndx++) {
uint64_t roff;
 
-   roff = print_entry_ndx * risize();
+   roff = print_entry_ndx * risize(sbd);
 
if (sbd.gfs1)
-   error = gfs1_readi(dip, (void *)rbuf, roff, risize());
+   error = gfs1_readi(dip, (void *)rbuf, roff,
+  risize(sbd));
else
-   error = gfs2_readi(dip, (void *)rbuf, roff, risize());
+   error = gfs2_readi(dip, (void *)rbuf, roff,
+  risize(sbd));
if (!error) /* end of file */
break;
gfs2_rindex_in(ri, rbuf);
diff --git a/gfs2/edit/hexedit.c b/gfs2/edit/hexedit.c
index 651c6f6..366f515 100644
--- a/gfs2/edit/hexedit.c
+++ b/gfs2/edit/hexedit.c
@@ -1466,7 +1466,7 @@ uint64_t masterblock(const char *fn)
 static void rgcount(void)
 {
printf(%lld RGs in this file system.\n,
-  (unsigned long long)sbd.md.riinode-i_di.di_size / risize());
+  (unsigned long long)sbd.md.riinode-i_di.di_size / risize(sbd));
inode_put(sbd.md.riinode);
gfs2_rgrp_free(sbd.rglist);
exit(EXIT_SUCCESS);
@@ -1481,7 +1481,7 @@ static uint64_t find_rgrp_block(struct gfs2_inode *dif, 
int rg)
struct gfs2_rindex fbuf, ri;
uint64_t foffset, gfs1_adj = 0;
 
-   foffset = rg * risize();
+   foffset = rg * risize(sbd);
if (sbd.gfs1) {
uint64_t sd_jbsize =
(sbd.bsize - sizeof(struct gfs2_meta_header));
@@ -1490,7 +1490,7 @@ static uint64_t find_rgrp_block(struct gfs2_inode *dif, 
int rg)
sizeof(struct gfs2_meta_header);
gfs1_adj += sizeof(struct gfs2_meta_header);
}
-   amt = gfs2_readi(dif, (void *)fbuf, foffset + gfs1_adj, risize());
+   amt = gfs2_readi(dif, (void *)fbuf, foffset + gfs1_adj, risize(sbd));
if (!amt) /* end of file */
return 0;
gfs2_rindex_in(ri, (void *)fbuf);
@@ -1559,11 +1559,12 @@ static uint64_t get_rg_addr(int rgnum)
else
gblock = masterblock(rindex);
riinode = inode_read(sbd, gblock);
-   if (rgnum  riinode-i_di.di_size / risize())
+   if (rgnum  riinode-i_di.di_size / risize(sbd))
rgblk = find_rgrp_block(riinode, rgnum);
else
fprintf(stderr, Error: File system only has %lld RGs.\n,
-   (unsigned long long)riinode-i_di.di_size / risize());
+   (unsigned long long)riinode-i_di.di_size /
+   risize(sbd));
inode_put(riinode);
return rgblk;
 }
diff --git a/gfs2/edit/hexedit.h b/gfs2/edit/hexedit.h
index 8a3c615..f7b539e 100644
--- a/gfs2/edit/hexedit.h
+++ b/gfs2/edit/hexedit.h
@@ -21,27 +21,6 @@
 enum dsp_mode { HEX_MODE = 0, GFS2_MODE = 1, EXTENDED_MODE = 2, INIT_MODE = 3 
};
 #define BLOCK_STACK_SIZE 256
 
-#define GFS_FORMAT_SB   (100)  /* Super-Block */
-#define GFS_METATYPE_SB (1)/* Super-Block */
-#define GFS_FORMAT_FS   (1309) /* Filesystem (all-encompassing) */
-#define GFS_FORMAT_MULTI(1401) /* Multi-Host */
-/* GFS1 Dinode types  */
-#define GFS_FILE_NON(0

[Cluster-devel] [Patch 06/44] fsck.gfs2: Check for blocks wrongly inside resource groups

2011-08-11 Thread Bob Peterson

From 7bb269a5158f81c6c5d9190c4f76d73a83e3c9d7 Mon Sep 17 00:00:00 2001
From: Bob Peterson rpete...@redhat.com
Date: Mon, 8 Aug 2011 12:46:29 -0500
Subject: [PATCH 06/44] fsck.gfs2: Check for blocks wrongly inside resource
 groups

It's not enough to range_check blocks in order to call them valid.
We also need to check whether those blocks collide with resource groups.
We don't want a bitmap block to ever be referenced unless it's part of
the rgrp and rindex functions.  This patch changes most of the fsck code
from doing simple block range checks to doing range checks plus checks
for blocks inside the resource groups.
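
A minimal sketch of the combined check, assuming a flat array of
resource groups where each one reserves a run of header/bitmap blocks
starting at its address.  The demo_* types and the array lookup are
assumptions for illustration; the real valid_block works on the
libgfs2 structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rgrp: blocks [addr, addr + length) hold its bitmaps. */
struct demo_rgrp { uint64_t addr; uint32_t length; };

/* Hypothetical file system description. */
struct demo_fs {
	uint64_t sb_addr;              /* superblock address */
	uint64_t fssize;               /* highest usable block */
	const struct demo_rgrp *rgrps;
	size_t nrgrps;
};

/* Plain range check: 0 if in range, -1 otherwise. */
static int demo_check_range(const struct demo_fs *fs, uint64_t blk)
{
	return (blk > fs->fssize || blk <= fs->sb_addr) ? -1 : 0;
}

/* 1 = in range and not an rgrp bitmap block; 0 = not valid. */
static int demo_valid_block(const struct demo_fs *fs, uint64_t blk)
{
	size_t i;

	if (demo_check_range(fs, blk) != 0)
		return 0;
	for (i = 0; i < fs->nrgrps; i++) {
		const struct demo_rgrp *rg = &fs->rgrps[i];

		if (blk >= rg->addr && blk < rg->addr + rg->length)
			return 0;
	}
	return 1;
}
```

Note the return convention flips relative to the range check: the
range check reports 0 for success, while the combined check reports 1
for a valid block.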

rhbz#675723
---
 gfs2/fsck/lost_n_found.c |2 +-
 gfs2/fsck/metawalk.c |   20 +++---
 gfs2/fsck/pass1.c|   66 +++--
 gfs2/fsck/pass1b.c   |2 +-
 gfs2/fsck/pass1c.c   |   10 +++---
 gfs2/fsck/pass2.c|4 +-
 gfs2/fsck/rgrepair.c |9 +++---
 gfs2/fsck/util.c |2 +-
 gfs2/libgfs2/fs_bits.c   |   19 -
 gfs2/libgfs2/libgfs2.h   |1 +
 10 files changed, 77 insertions(+), 58 deletions(-)

diff --git a/gfs2/fsck/lost_n_found.c b/gfs2/fsck/lost_n_found.c
index 04aa90d..4eff83b 100644
--- a/gfs2/fsck/lost_n_found.c
+++ b/gfs2/fsck/lost_n_found.c
@@ -104,7 +104,7 @@ int add_inode_to_lf(struct gfs2_inode *ip){
/* If there's a pre-existing .. directory entry, we have to
   back out the links. */
di = dirtree_find(ip-i_di.di_num.no_addr);
-   if (di  gfs2_check_range(sdp, di-dotdot_parent) == 0) {
+   if (di  !valid_block(sdp, di-dotdot_parent) == 0) {
struct gfs2_inode *dip;
 
log_debug(_(Directory %lld (0x%llx) already had a 
diff --git a/gfs2/fsck/metawalk.c b/gfs2/fsck/metawalk.c
index 3cee0fd..5d0afa5 100644
--- a/gfs2/fsck/metawalk.c
+++ b/gfs2/fsck/metawalk.c
@@ -455,7 +455,7 @@ static void warn_and_patch(struct gfs2_inode *ip, uint64_t 
*leaf_no,
}
if (*leaf_no == *bad_leaf ||
query( _(Attempt to patch around it? (y/n) ))) {
-   if (gfs2_check_range(ip-i_sbd, old_leaf) == 0)
+   if (!valid_block(ip-i_sbd, old_leaf) == 0)
gfs2_put_leaf_nr(ip, pindex, old_leaf);
else
gfs2_put_leaf_nr(ip, pindex, first_ok_leaf);
@@ -605,7 +605,7 @@ static int check_leaf_blks(struct gfs2_inode *ip, struct 
metawalk_fxns *pass)
first_ok_leaf = leaf_no = -1;
for(lindex = 0; lindex  (1  ip-i_di.di_depth); lindex++) {
gfs2_get_leaf_nr(ip, lindex, leaf_no);
-   if (gfs2_check_range(ip-i_sbd, leaf_no) == 0) {
+   if (!valid_block(ip-i_sbd, leaf_no) == 0) {
lbh = bread(sdp, leaf_no);
/* Make sure it's really a valid leaf block. */
if (gfs2_check_meta(lbh, GFS2_METATYPE_LF) == 0) {
@@ -644,7 +644,7 @@ static int check_leaf_blks(struct gfs2_inode *ip, struct 
metawalk_fxns *pass)
}
 
do {
-   if (gfs2_check_range(ip-i_sbd, old_leaf) == 0) {
+   if (!valid_block(ip-i_sbd, old_leaf) == 0) {
error = check_num_ptrs(ip, old_leaf,
   ref_count, exp_count,
   lindex, oldleaf);
@@ -656,7 +656,7 @@ static int check_leaf_blks(struct gfs2_inode *ip, struct 
metawalk_fxns *pass)
if (fsck_abort)
break;
/* Make sure the block number is in range. */
-   if (gfs2_check_range(ip-i_sbd, leaf_no)){
+   if (!valid_block(ip-i_sbd, leaf_no)){
log_err( _(Leaf block #%llu (0x%llx) is out 
of range for directory #%llu (0x%llx
).\n), (unsigned long long)leaf_no,
@@ -909,7 +909,7 @@ int delete_block(struct gfs2_inode *ip, uint64_t block,
 struct gfs2_buffer_head **bh, const char *btype,
 void *private)
 {
-   if (gfs2_check_range(ip-i_sbd, block) == 0) {
+   if (!valid_block(ip-i_sbd, block) == 0) {
fsck_blockmap_set(ip, block, btype, gfs2_block_free);
return 0;
}
@@ -930,7 +930,7 @@ static int delete_block_if_notdup(struct gfs2_inode *ip, 
uint64_t block,
uint8_t q;
struct duptree *d;
 
-   if (gfs2_check_range(ip-i_sbd, block) != 0)
+   if (!valid_block(ip-i_sbd, block) != 0)
return -EFAULT;
 
q = block_type(block);
@@ -1190,7 +1190,7 @@ static int build_and_check_metalist(struct gfs2_inode *ip, osi_list_t *mlp,
   (unsigned long long)block

[Cluster-devel] [Patch 07/44] fsck.gfs2: Rename function check_leaf to check_ealeaf_block

2011-08-11 Thread Bob Peterson

From a2fc75cad03602f8582a744c6a14ddae3a85cffd Mon Sep 17 00:00:00 2001
From: Bob Peterson rpete...@redhat.com
Date: Mon, 8 Aug 2011 12:58:04 -0500
Subject: [PATCH 07/44] fsck.gfs2: Rename function check_leaf to
 check_ealeaf_block

This patch renames function check_leaf_block to check_ealeaf_block to
avoid confusion between directory leaf block handling and extended
attribute leaf block handling.

rhbz#675723
---
 gfs2/fsck/pass1.c |   11 +++++++----
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/gfs2/fsck/pass1.c b/gfs2/fsck/pass1.c
index e2fe73c..30d6b3c 100644
--- a/gfs2/fsck/pass1.c
+++ b/gfs2/fsck/pass1.c
@@ -636,8 +636,11 @@ static int finish_eattr_indir(struct gfs2_inode *ip, int leaf_pointers,
return 1;
 }
 
-static int check_leaf_block(struct gfs2_inode *ip, uint64_t block, int btype,
-   struct gfs2_buffer_head **bh, void *private)
+/* check_ealeaf_block
+ *  checks an extended attribute (not directory) leaf block
+ */
+static int check_ealeaf_block(struct gfs2_inode *ip, uint64_t block, int btype,
+ struct gfs2_buffer_head **bh, void *private)
 {
struct gfs2_buffer_head *leaf_bh = NULL;
struct gfs2_sbd *sdp = ip->i_sbd;
@@ -728,7 +731,7 @@ static int check_extended_leaf_eattr(struct gfs2_inode *ip, uint64_t *data_ptr,
  gfs2_bad_block);
return 1;
}
-   error = check_leaf_block(ip, el_blk, GFS2_METATYPE_ED, bh, private);
+   error = check_ealeaf_block(ip, el_blk, GFS2_METATYPE_ED, bh, private);
if (bh)
brelse(bh);
return error;
@@ -769,7 +772,7 @@ static int check_eattr_leaf(struct gfs2_inode *ip, uint64_t block,
"Attribute leaf"), gfs2_bad_block);
return 1;
}
-   return check_leaf_block(ip, block, GFS2_METATYPE_EA, bh, private);
+   return check_ealeaf_block(ip, block, GFS2_METATYPE_EA, bh, private);
 }
 
 static int check_eattr_entries(struct gfs2_inode *ip,
-- 
1.7.4.4

[Cluster-devel] [Patch 08/44] fsck.gfs2: eliminate vestigial buffer_head variable in check_leaf

2011-08-11 Thread Bob Peterson

From dfa63a3b56e71b8607098cb02e5162fc01aa8bab Mon Sep 17 00:00:00 2001
From: Bob Peterson rpete...@redhat.com
Date: Mon, 8 Aug 2011 13:28:08 -0500
Subject: [PATCH 08/44] fsck.gfs2: eliminate vestigial buffer_head variable in
 check_leaf

This patch eliminates the variable bh from all the check_leaf metawalk
functions because it is no longer referenced.

rhbz#675723
---
 gfs2/fsck/metawalk.c |   10 ++++------
 gfs2/fsck/metawalk.h |    5 ++---
 gfs2/fsck/pass1.c    |   14 ++++++--------
 3 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/gfs2/fsck/metawalk.c b/gfs2/fsck/metawalk.c
index 5d0afa5..ea1774a 100644
--- a/gfs2/fsck/metawalk.c
+++ b/gfs2/fsck/metawalk.c
@@ -686,7 +686,7 @@ static int check_leaf_blks(struct gfs2_inode *ip, struct metawalk_fxns *pass)
}
gfs2_leaf_in(&leaf, lbh);
if (pass->check_leaf)
-   error = pass->check_leaf(ip, leaf_no, lbh,
+   error = pass->check_leaf(ip, leaf_no,
 pass->private);
 
/*
@@ -1462,10 +1462,9 @@ int delete_metadata(struct gfs2_inode *ip, uint64_t block,
return delete_block_if_notdup(ip, block, bh, _("metadata"), private);
 }
 
-int delete_leaf(struct gfs2_inode *ip, uint64_t block,
-   struct gfs2_buffer_head *bh, void *private)
+int delete_leaf(struct gfs2_inode *ip, uint64_t block, void *private)
 {
-   return delete_block_if_notdup(ip, block, bh, _("leaf"), private);
+   return delete_block_if_notdup(ip, block, NULL, _("leaf"), private);
 }
 
 int delete_data(struct gfs2_inode *ip, uint64_t block, void *private)
@@ -1528,8 +1527,7 @@ static int alloc_data(struct gfs2_inode *ip, uint64_t block, void *private)
return 0;
 }
 
-static int alloc_leaf(struct gfs2_inode *ip, uint64_t block,
- struct gfs2_buffer_head *bh, void *private)
+static int alloc_leaf(struct gfs2_inode *ip, uint64_t block, void *private)
 {
uint8_t q;
 
diff --git a/gfs2/fsck/metawalk.h b/gfs2/fsck/metawalk.h
index c1e61fb..ea023b6 100644
--- a/gfs2/fsck/metawalk.h
+++ b/gfs2/fsck/metawalk.h
@@ -20,8 +20,7 @@ extern int delete_block(struct gfs2_inode *ip, uint64_t block,
 void *private);
 extern int delete_metadata(struct gfs2_inode *ip, uint64_t block,
   struct gfs2_buffer_head **bh, int h, void *private);
-extern int delete_leaf(struct gfs2_inode *ip, uint64_t block,
-   struct gfs2_buffer_head *bh, void *private);
+extern int delete_leaf(struct gfs2_inode *ip, uint64_t block, void *private);
 extern int delete_data(struct gfs2_inode *ip, uint64_t block, void *private);
 extern int delete_eattr_indir(struct gfs2_inode *ip, uint64_t block, uint64_t parent,
   struct gfs2_buffer_head **bh, void *private);
@@ -60,7 +59,7 @@ extern struct gfs2_inode *fsck_system_inode(struct gfs2_sbd *sdp,
 struct metawalk_fxns {
void *private;
int (*check_leaf) (struct gfs2_inode *ip, uint64_t block,
-  struct gfs2_buffer_head *bh, void *private);
+  void *private);
int (*check_metalist) (struct gfs2_inode *ip, uint64_t block,
   struct gfs2_buffer_head **bh, int h,
   void *private);
diff --git a/gfs2/fsck/pass1.c b/gfs2/fsck/pass1.c
index 30d6b3c..f0e7277 100644
--- a/gfs2/fsck/pass1.c
+++ b/gfs2/fsck/pass1.c
@@ -34,8 +34,7 @@ struct block_count {
uint64_t ea_count;
 };
 
-static int leaf(struct gfs2_inode *ip, uint64_t block,
-   struct gfs2_buffer_head *bh, void *private);
+static int leaf(struct gfs2_inode *ip, uint64_t block, void *private);
 static int check_metalist(struct gfs2_inode *ip, uint64_t block,
  struct gfs2_buffer_head **bh, int h, void *private);
 static int undo_check_metalist(struct gfs2_inode *ip, uint64_t block,
@@ -66,7 +65,7 @@ static int invalidate_metadata(struct gfs2_inode *ip, uint64_t block,
   struct gfs2_buffer_head **bh, int h,
   void *private);
 static int invalidate_leaf(struct gfs2_inode *ip, uint64_t block,
-  struct gfs2_buffer_head *bh, void *private);
+  void *private);
 static int invalidate_data(struct gfs2_inode *ip, uint64_t block,
   void *private);
 static int invalidate_eattr_indir(struct gfs2_inode *ip, uint64_t block,
@@ -200,8 +199,7 @@ struct metawalk_fxns sysdir_fxns = {
.check_dentry = resuscitate_dentry,
 };
 
-static int leaf(struct gfs2_inode *ip, uint64_t block,
-   struct gfs2_buffer_head *bh, void *private)
+static int leaf(struct gfs2_inode *ip, uint64_t block, void *private)
 {
struct block_count *bc = (struct block_count *) private;
 
@@ -856,7 +854,7 @@ static int

[Cluster-devel] [Patch 09/44] fsck.gfs2: Rename the nlink functions to make them more intuitive

2011-08-11 Thread Bob Peterson

From 55e442c79adec0fa7f6d4e7f6700f14a630d4e3e Mon Sep 17 00:00:00 2001
From: Bob Peterson rpete...@redhat.com
Date: Mon, 8 Aug 2011 14:01:18 -0500
Subject: [PATCH 09/44] fsck.gfs2: Rename the nlink functions to make them
 more intuitive

Part of fsck's job is to verify the link counts of directories, but the
variable and function names are too confusing to understand without
in-depth analysis of what the code is doing.

This patch renames the structure variable link_count to something that
makes more intuitive sense: di_nlink, which matches the variable name in
the dinode.  That distinguishes it from the number of links that fsck is
trying to count manually.  It also renames the link count functions to
make their purpose obvious: set_di_nlink sets di_nlink from the value in
the dinode, and incr_link_count and decr_link_count increment and
decrement the counted links, respectively.
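
The split between the two counters can be sketched as follows.  This is
only an illustration, not the fsck.gfs2 code: the struct and the _sketch
helpers are hypothetical, the real functions look the inode up in a tree
by block number, and the fields are 32 bits wide here purely for
simplicity.

```c
#include <stdint.h>

/* Hypothetical, simplified version of struct inode_info. */
struct inode_info_sketch {
	uint32_t di_nlink;      /* link count the dinode claims to have */
	uint32_t counted_links; /* links fsck has found so far */
};

/* Record the on-disk claim (the role of set_di_nlink). */
static void set_di_nlink_sketch(struct inode_info_sketch *ii, uint32_t on_disk)
{
	ii->di_nlink = on_disk;
}

/* One more directory entry was found pointing at this inode. */
static void incr_link_count_sketch(struct inode_info_sketch *ii)
{
	ii->counted_links++;
}

/* A directory entry pointing at this inode was removed. */
static void decr_link_count_sketch(struct inode_info_sketch *ii)
{
	if (ii->counted_links)
		ii->counted_links--;
}

/* Assumed final comparison: does the claim match what was counted? */
static int links_agree_sketch(const struct inode_info_sketch *ii)
{
	return ii->di_nlink == ii->counted_links;
}
```

With the two names kept apart, it is easier to see that only the counted
value changes while directories are walked, and that the claimed value
changes only when the dinode itself is rewritten.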

rhbz#675723
---
 gfs2/fsck/fsck.h |4 ++--
 gfs2/fsck/link.c |   16 ++--
 gfs2/fsck/link.h |   10 +-
 gfs2/fsck/lost_n_found.c |   29 ++---
 gfs2/fsck/pass1.c|2 +-
 gfs2/fsck/pass2.c|   20 ++--
 gfs2/fsck/pass3.c|4 ++--
 gfs2/fsck/pass4.c|   14 +++---
 8 files changed, 51 insertions(+), 48 deletions(-)

diff --git a/gfs2/fsck/fsck.h b/gfs2/fsck/fsck.h
index 25bc3b9..6353dfc 100644
--- a/gfs2/fsck/fsck.h
+++ b/gfs2/fsck/fsck.h
@@ -29,8 +29,8 @@ struct inode_info
 {
 struct osi_node node;
 uint64_t   inode;
-uint16_t   link_count;   /* the number of links the inode
-  * thinks it has */
+uint16_t   di_nlink;   /* the number of links the inode
+   * thinks it has */
 uint16_t   counted_links; /* the number of links we've found */
 };
 
diff --git a/gfs2/fsck/link.c b/gfs2/fsck/link.c
index 08ea94c..e49f3af 100644
--- a/gfs2/fsck/link.c
+++ b/gfs2/fsck/link.c
@@ -13,22 +13,26 @@
 #include "inode_hash.h"
 #include "link.h"
 
-int set_link_count(uint64_t inode_no, uint32_t count)
+int set_di_nlink(struct gfs2_inode *ip)
 {
struct inode_info *ii;
+   uint64_t inode_no = ip->i_di.di_num.no_addr;
+
+   /*log_debug( _("Setting link count to %u for %" PRIu64
+  " (0x%" PRIx64 ")\n"), count, inode_no, inode_no);*/
/* If the list has entries, look for one that matches inode_no */
ii = inodetree_find(inode_no);
if (!ii)
ii = inodetree_insert(inode_no);
if (ii)
-   ii->link_count = count;
+   ii->di_nlink = ip->i_di.di_nlink;
else
return -1;
return 0;
 }
 
-int increment_link(uint64_t inode_no, uint64_t referenced_from,
-  const char *why)
+int incr_link_count(uint64_t inode_no, uint64_t referenced_from,
+   const char *why)
 {
struct inode_info *ii = NULL;
 
@@ -61,8 +65,8 @@ int increment_link(uint64_t inode_no, uint64_t referenced_from,
return 0;
 }
 
-int decrement_link(uint64_t inode_no, uint64_t referenced_from,
-  const char *why)
+int decr_link_count(uint64_t inode_no, uint64_t referenced_from,
+   const char *why)
 {
struct inode_info *ii = NULL;
 
diff --git a/gfs2/fsck/link.h b/gfs2/fsck/link.h
index f890575..ad040e6 100644
--- a/gfs2/fsck/link.h
+++ b/gfs2/fsck/link.h
@@ -1,10 +1,10 @@
 #ifndef _LINK_H
 #define _LINK_H
 
-int set_link_count(uint64_t inode_no, uint32_t count);
-int increment_link(uint64_t inode_no, uint64_t referenced_from,
-  const char *why);
-int decrement_link(uint64_t inode_no, uint64_t referenced_from,
-  const char *why);
+int set_di_nlink(struct gfs2_inode *ip);
+int incr_link_count(uint64_t inode_no, uint64_t referenced_from,
+   const char *why);
+int decr_link_count(uint64_t inode_no, uint64_t referenced_from,
+   const char *why);
 
 #endif /* _LINK_H */
diff --git a/gfs2/fsck/lost_n_found.c b/gfs2/fsck/lost_n_found.c
index 4eff83b..32f3c5c 100644
--- a/gfs2/fsck/lost_n_found.c
+++ b/gfs2/fsck/lost_n_found.c
@@ -51,8 +51,7 @@ int add_inode_to_lf(struct gfs2_inode *ip){
   the root directory.  We must increment the nlink value
   in the hash table to keep them in sync so that pass4 can
   detect and fix any descrepancies. */
-   set_link_count(sdp->sd_sb.sb_root_dir.no_addr,
-  sdp->md.rooti->i_di.di_nlink);
+   set_di_nlink(sdp->md.rooti);
 
q = block_type(lf_dip->i_di.di_num.no_addr);
if (q != gfs2_inode_dir) {
@@ -68,15 +67,15 @@ int add_inode_to_lf(struct gfs2_inode *ip){
  _("lost+found dinode"),
  gfs2_inode_dir);
/* root inode links

[Cluster-devel] [Patch 10/44] fsck.gfs2: Keep di_nlink in sync when adding links for lost+found

2011-08-11 Thread Bob Peterson

From da57639e65b148bb4d2a3c6a9d98623b8ad18b04 Mon Sep 17 00:00:00 2001
From: Bob Peterson rpete...@redhat.com
Date: Mon, 8 Aug 2011 14:21:06 -0500
Subject: [PATCH 10/44] fsck.gfs2: Keep di_nlink in sync when adding links for
 lost+found

When adding a ".." entry to a directory newly linked to lost+found,
fsck.gfs2 needs to update its di_nlink value to account for the new
link.  If it does not, it can correct di_nlink to the wrong value and
not find the error until a second fsck.gfs2 run.  This only happens in
the rare case where there is no pre-existing ".." entry that may be
reused to re-link to lost+found.

rhbz#675723
---
 gfs2/fsck/lost_n_found.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/gfs2/fsck/lost_n_found.c b/gfs2/fsck/lost_n_found.c
index 32f3c5c..b6f02b9 100644
--- a/gfs2/fsck/lost_n_found.c
+++ b/gfs2/fsck/lost_n_found.c
@@ -118,6 +118,7 @@ int add_inode_to_lf(struct gfs2_inode *ip){
dip = fsck_load_inode(sdp, di->dotdot_parent);
if (dip->i_di.di_nlink > 0) {
dip->i_di.di_nlink--;
+   set_di_nlink(dip); /* keep inode tree in sync */
log_debug(_("Decrementing its links to %d\n"),
  dip->i_di.di_nlink);
bmodified(dip->i_bh);
@@ -128,6 +129,7 @@ int add_inode_to_lf(struct gfs2_inode *ip){
"Changing it to 0.\n"),
  dip->i_di.di_nlink);
dip->i_di.di_nlink = 0;
+   set_di_nlink(dip); /* keep inode tree in sync */
bmodified(dip->i_bh);
}
fsck_inode_put(dip);
-- 
1.7.4.4

[Cluster-devel] [Patch 11/44] fsck.gfs2: directory entry count was only 16 bits in check_entries

2011-08-11 Thread Bob Peterson

From 0dc5622515a2e888efd89cb33e7c2fe600f895a5 Mon Sep 17 00:00:00 2001
From: Bob Peterson rpete...@redhat.com
Date: Mon, 8 Aug 2011 14:38:19 -0500
Subject: [PATCH 11/44] fsck.gfs2: directory entry count was only 16 bits in
 check_entries

When counting directory links, fsck.gfs2 was using a 16-bit integer.
Therefore, if a directory had more than 65535 links, it would wrap to
zero and the counts would be damaged by fsck.gfs2.  Subsequent runs would
not find the corruption, but it was there nonetheless.  You would
encounter it if you tried to delete enough entries to cause the count
to become negative.
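
The wraparound itself is easy to reproduce outside of fsck.gfs2.  The
sketch below is only an illustration: the two helper functions are
hypothetical and simply count n entries with a 16-bit and a 32-bit
counter.

```c
#include <stdint.h>

/* Count n directory entries with a 16-bit counter, as the pre-patch
 * code did.  Unsigned arithmetic wraps modulo 65536. */
static uint16_t count_entries_16(uint32_t n)
{
	uint16_t count = 0;
	uint32_t i;

	for (i = 0; i < n; i++)
		count++;	/* 65535 + 1 wraps back to 0 */
	return count;
}

/* The same loop with the 32-bit counter the patch switches to. */
static uint32_t count_entries_32(uint32_t n)
{
	uint32_t count = 0;
	uint32_t i;

	for (i = 0; i < n; i++)
		count++;
	return count;
}
```

For a directory with 65536 entries the 16-bit version reports 0 while
the 32-bit version reports 65536, which is the kind of mismatch that
would lead fsck.gfs2 to "correct" a good count to a bad one.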

rhbz#675723
---
 gfs2/fsck/fsck.h     |    6 +++---
 gfs2/fsck/metawalk.c |    8 ++++----
 gfs2/fsck/metawalk.h |    2 +-
 gfs2/fsck/pass1.c    |    2 +-
 gfs2/fsck/pass1b.c   |    4 ++--
 gfs2/fsck/pass2.c    |    2 +-
 6 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/gfs2/fsck/fsck.h b/gfs2/fsck/fsck.h
index 6353dfc..0fed06b 100644
--- a/gfs2/fsck/fsck.h
+++ b/gfs2/fsck/fsck.h
@@ -29,9 +29,9 @@ struct inode_info
 {
 struct osi_node node;
 uint64_t   inode;
-uint16_t   di_nlink;   /* the number of links the inode
-   * thinks it has */
-uint16_t   counted_links; /* the number of links we've found */
+uint32_t   di_nlink;/* the number of links the inode
+* thinks it has */
+uint32_t   counted_links; /* the number of links we've found */
 };
 
 struct dir_info
diff --git a/gfs2/fsck/metawalk.c b/gfs2/fsck/metawalk.c
index ea1774a..f2cd938 100644
--- a/gfs2/fsck/metawalk.c
+++ b/gfs2/fsck/metawalk.c
@@ -300,7 +300,7 @@ static void dirblk_truncate(struct gfs2_inode *ip, struct gfs2_dirent *fixb,
  * -1 - error occurred
  */
 static int check_entries(struct gfs2_inode *ip, struct gfs2_buffer_head *bh,
- int type, uint16_t *count, struct metawalk_fxns *pass)
+ int type, uint32_t *count, struct metawalk_fxns *pass)
 {
struct gfs2_dirent *dent;
struct gfs2_dirent de, *prev;
@@ -596,7 +596,7 @@ static int check_leaf_blks(struct gfs2_inode *ip, struct metawalk_fxns *pass)
struct gfs2_buffer_head *lbh;
int lindex;
struct gfs2_sbd *sdp = ip->i_sbd;
-   uint16_t count;
+   uint32_t count;
int ref_count = 0, exp_count = 0;
 
/* Find the first valid leaf pointer in range and use it as our old
@@ -1373,7 +1373,7 @@ int check_linear_dir(struct gfs2_inode *ip, struct gfs2_buffer_head *bh,
 struct metawalk_fxns *pass)
 {
int error = 0;
-   uint16_t count = 0;
+   uint32_t count = 0;
 
error = check_entries(ip, bh, DIR_LINEAR, &count, pass);
if (error < 0) {
@@ -1406,7 +1406,7 @@ int check_dir(struct gfs2_sbd *sdp, uint64_t block, struct metawalk_fxns *pass)
 static int remove_dentry(struct gfs2_inode *ip, struct gfs2_dirent *dent,
 struct gfs2_dirent *prev_de,
 struct gfs2_buffer_head *bh,
-char *filename, uint16_t *count, void *private)
+char *filename, uint32_t *count, void *private)
 {
/* the metawalk_fxn's private field must be set to the dentry
 * block we want to clear */
diff --git a/gfs2/fsck/metawalk.h b/gfs2/fsck/metawalk.h
index ea023b6..c15d7b7 100644
--- a/gfs2/fsck/metawalk.h
+++ b/gfs2/fsck/metawalk.h
@@ -74,7 +74,7 @@ struct metawalk_fxns {
int (*check_dentry) (struct gfs2_inode *ip, struct gfs2_dirent *de,
 struct gfs2_dirent *prev,
 struct gfs2_buffer_head *bh,
-char *filename, uint16_t *count, void *private);
+char *filename, uint32_t *count, void *private);
int (*check_eattr_entry) (struct gfs2_inode *ip,
  struct gfs2_buffer_head *leaf_bh,
  struct gfs2_ea_header *ea_hdr,
diff --git a/gfs2/fsck/pass1.c b/gfs2/fsck/pass1.c
index 1bd8464..b9aa165 100644
--- a/gfs2/fsck/pass1.c
+++ b/gfs2/fsck/pass1.c
@@ -147,7 +147,7 @@ static int resuscitate_metalist(struct gfs2_inode *ip, uint64_t block,
 static int resuscitate_dentry(struct gfs2_inode *ip, struct gfs2_dirent *dent,
  struct gfs2_dirent *prev_de,
  struct gfs2_buffer_head *bh, char *filename,
- uint16_t *count, void *priv)
+ uint32_t *count, void *priv)
 {
struct gfs2_sbd *sdp = ip->i_sbd;
struct gfs2_dirent dentry, *de;
diff --git a/gfs2/fsck/pass1b.c b/gfs2/fsck/pass1b.c
index bbf33d2..9497c78 100644
--- a/gfs2/fsck/pass1b.c
+++ b/gfs2/fsck/pass1b.c
@@ -49,7 +49,7 @@ static int check_eattr_extentry(struct gfs2_inode *ip, uint64_t *ea_data_ptr,
void *private);
 static int find_dentry(struct gfs2_inode *ip, struct

[Cluster-devel] [Patch 12/44] fsck.gfs2: get rid of triple negative logic

2011-08-11 Thread Bob Peterson

From 62d7423184da8e291396cb54269612d501437006 Mon Sep 17 00:00:00 2001
From: Bob Peterson rpete...@redhat.com
Date: Mon, 8 Aug 2011 14:44:46 -0500
Subject: [PATCH 12/44] fsck.gfs2: get rid of triple negative logic

This patch changes the logic of the code from being triple-negative
to single-negative so it won't twist your brain into knots.
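
That the two forms are equivalent can be checked with a small sketch.
The range check below is a hypothetical stand-in for valid_block(), and
the parentheses in the old form only make explicit how the original
expression already parses.

```c
#include <stdint.h>

/* Hypothetical stand-in for valid_block(): nonzero means "in range". */
static int valid_block_sketch(uint64_t block, uint64_t last_block)
{
	return block != 0 && block <= last_block;
}

/* Pre-patch form: "not valid" is "not equal to zero". */
static int is_invalid_old(uint64_t block, uint64_t last_block)
{
	return (!valid_block_sketch(block, last_block)) != 0;
}

/* Post-patch form: a single negation, same truth table. */
static int is_invalid_new(uint64_t block, uint64_t last_block)
{
	return !valid_block_sketch(block, last_block);
}
```

Because the result of ! is already 0 or 1, comparing it against zero
adds nothing, so the patch is a pure readability change.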

rhbz#675723
---
 gfs2/fsck/metawalk.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gfs2/fsck/metawalk.c b/gfs2/fsck/metawalk.c
index f2cd938..a4d7d3e 100644
--- a/gfs2/fsck/metawalk.c
+++ b/gfs2/fsck/metawalk.c
@@ -930,7 +930,7 @@ static int delete_block_if_notdup(struct gfs2_inode *ip, uint64_t block,
uint8_t q;
struct duptree *d;
 
-   if (!valid_block(ip->i_sbd, block) != 0)
+   if (!valid_block(ip->i_sbd, block))
return -EFAULT;
 
q = block_type(block);
-- 
1.7.4.4

[Cluster-devel] [Patch 13/44] dirent_repair needs to mark the buffer as modified

2011-08-11 Thread Bob Peterson

From fa744b806ad8655c9ed3a18fcbec1c7992735be5 Mon Sep 17 00:00:00 2001
From: Bob Peterson rpete...@redhat.com
Date: Mon, 8 Aug 2011 14:47:49 -0500
Subject: [PATCH 13/44] dirent_repair needs to mark the buffer as modified

This patch adds a call to bmodified to function dirent_repair.  Without
setting the modified bit, directory repairs may be forgotten and never
written back to disk, leaving the damage in place.
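
Why a missing "modified" mark loses the repair can be shown with a
minimal sketch.  Everything here is hypothetical and much simpler than
libgfs2: a buffer carries a flag, and releasing the buffer writes it
back only when the flag is set.

```c
#include <string.h>

/* Hypothetical, much-simplified buffer; not the libgfs2 structure. */
struct buf_sketch {
	char data[16];
	int modified;	/* set when data must be written back */
};

static char disk_sketch[16];	/* stands in for the on-disk block */

/* Plays the role of bmodified() in this sketch. */
static void mark_modified(struct buf_sketch *b)
{
	b->modified = 1;
}

/* Release the buffer, writing it back only if it was marked. */
static void release(struct buf_sketch *b)
{
	if (b->modified) {
		memcpy(disk_sketch, b->data, sizeof(disk_sketch));
		b->modified = 0;
	}
}

/* Repair the in-memory copy; "mark" selects the patched behaviour. */
static void repair(struct buf_sketch *b, int mark)
{
	strcpy(b->data, "repaired");
	if (mark)
		mark_modified(b);
}
```

Calling repair() without the mark changes only the in-memory copy, so
the stand-in disk still holds the damaged contents after release().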

rhbz#675723
---
 gfs2/fsck/metawalk.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/gfs2/fsck/metawalk.c b/gfs2/fsck/metawalk.c
index a4d7d3e..6bdea5a 100644
--- a/gfs2/fsck/metawalk.c
+++ b/gfs2/fsck/metawalk.c
@@ -266,6 +266,7 @@ static int dirent_repair(struct gfs2_inode *ip, struct gfs2_buffer_head *bh,
de->de_rec_len = GFS2_DIRENT_SIZE(de->de_name_len);
}
gfs2_dirent_out(de, (char *)dent);
+   bmodified(bh);
return 0;
 }
 
-- 
1.7.4.4

[Cluster-devel] [Patch 14/44] fsck.gfs2: Ask to reclaim unlinked meta on a per-rgrp basis only

2011-08-11 Thread Bob Peterson

From 0307db694e7316ab93071239704428ba5e346fcb Mon Sep 17 00:00:00 2001
From: Bob Peterson rpete...@redhat.com
Date: Mon, 8 Aug 2011 15:16:01 -0500
Subject: [PATCH 14/44] fsck.gfs2: Ask to reclaim unlinked meta on a per-rgrp
 basis only

Before this patch, fsck.gfs2 would ask for every unlinked metadata bit
whether you wanted to reclaim it as free space.  This patch makes it
ask only once per resource group, and reports which resource group
so that the user doesn't think it's stuck in an infinite loop.
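
The ask-once pattern can be sketched as below.  This is an illustration
under simplifying assumptions: the query stub is hypothetical and always
answers yes, and one call to the function stands for one resource group.

```c
/* Hypothetical query stub: counts prompts and always answers yes. */
static int prompts;

static int query_sketch(void)
{
	prompts++;
	return 1;
}

/* Reclaim n unlinked blocks in one resource group, asking only once. */
static int reclaim_group(int n_unlinked)
{
	int asked = 0, fixit = 0, reclaimed = 0, i;

	for (i = 0; i < n_unlinked; i++) {
		if (!asked) {
			asked = 1;	/* remember that we already asked */
			if (query_sketch())
				fixit = 1;
		}
		if (!fixit)
			continue;
		reclaimed++;
	}
	return reclaimed;
}
```

Because the flag is local to the function, it resets for every resource
group, so the user sees one prompt per group rather than one per block.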

rhbz#675723
---
 gfs2/fsck/initialize.c |   16 ++++++++++++----
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/gfs2/fsck/initialize.c b/gfs2/fsck/initialize.c
index 18d13cc..c0e83a7 100644
--- a/gfs2/fsck/initialize.c
+++ b/gfs2/fsck/initialize.c
@@ -195,7 +195,7 @@ static void check_rgrp_integrity(struct gfs2_sbd *sdp, struct rgrp_list *rgd,
 int *this_rg_bad)
 {
uint32_t rg_free, rg_reclaimed;
-   int rgb, x, y, off, bytes_to_check, total_bytes_to_check;
+   int rgb, x, y, off, bytes_to_check, total_bytes_to_check, asked = 0;
unsigned int state;
 
rg_free = rg_reclaimed = 0;
@@ -234,9 +234,17 @@ static void check_rgrp_integrity(struct gfs2_sbd *sdp, struct rgrp_list *rgd,
}
/* GFS2_BLKST_UNLINKED */
*this_rg_bad = 1;
-   if (!(*fixit)) {
-   if (query(_("Okay to reclaim unlinked "
-   "inodes? (y/n)")))
+   if (!asked) {
+   char msg[256];
+
+   asked = 1;
+   sprintf(msg,
+   _("Okay to reclaim unlinked "
+ "inodes in resource group "
+ "%lld (0x%llx)? (y/n)"),
+   (unsigned long long)rgd->ri.ri_addr,
+   (unsigned long long)rgd->ri.ri_addr);
+   if (query("%s", msg))
*fixit = 1;
}
if (!(*fixit))
-- 
1.7.4.4

[Cluster-devel] [Patch 15/44] fsck.gfs2: Factor out function to add .. entry when linking to lost+found

2011-08-11 Thread Bob Peterson

From 37b96d287c82e81b5626948a80b52d62bb2b8612 Mon Sep 17 00:00:00 2001
From: Bob Peterson rpete...@redhat.com
Date: Mon, 8 Aug 2011 15:44:19 -0500
Subject: [PATCH 15/44] fsck.gfs2: Factor out function to add .. entry when
 linking to lost+found

This patch factors out a section of code from function add_inode_to_lf.
This makes it easier to read and gives it the ability to print better
messages regarding where the block was previously linked.

rhbz#675723
---
 gfs2/fsck/lost_n_found.c |  125 ++
 1 files changed, 70 insertions(+), 55 deletions(-)

diff --git a/gfs2/fsck/lost_n_found.c b/gfs2/fsck/lost_n_found.c
index b6f02b9..7ce5db5 100644
--- a/gfs2/fsck/lost_n_found.c
+++ b/gfs2/fsck/lost_n_found.c
@@ -17,6 +17,75 @@
 #include "metawalk.h"
 #include "util.h"
 
+static void add_dotdot(struct gfs2_inode *ip)
+{
+   struct dir_info *di;
+   struct gfs2_sbd *sdp = ip->i_sbd;
+   int err;
+
+   log_info( _("Adding .. entry to directory %llu (0x%llx) pointing back "
+   "to lost+found\n"),
+ (unsigned long long)ip->i_di.di_num.no_addr,
+ (unsigned long long)ip->i_di.di_num.no_addr);
+
+   /* If there's a pre-existing .. directory entry, we have to
+  back out the links. */
+   di = dirtree_find(ip->i_di.di_num.no_addr);
+   if (di && !valid_block(sdp, di->dotdot_parent) == 0) {
+   struct gfs2_inode *dip;
+
+   log_debug(_("Directory %lld (0x%llx) already had a "
+   "\"..\" link to %lld (0x%llx).\n"),
+ (unsigned long long)ip->i_di.di_num.no_addr,
+ (unsigned long long)ip->i_di.di_num.no_addr,
+ (unsigned long long)di->dotdot_parent,
+ (unsigned long long)di->dotdot_parent);
+   decr_link_count(di->dotdot_parent, ip->i_di.di_num.no_addr,
+   _(".. unlinked, moving to lost+found"));
+   dip = fsck_load_inode(sdp, di->dotdot_parent);
+   if (dip->i_di.di_nlink > 0) {
+   dip->i_di.di_nlink--;
+   set_di_nlink(dip); /* keep inode tree in sync */
+   log_debug(_("Decrementing its links to %d\n"),
+ dip->i_di.di_nlink);
+   bmodified(dip->i_bh);
+   } else if (!dip->i_di.di_nlink) {
+   log_debug(_("Its link count is zero.\n"));
+   } else {
+   log_debug(_("Its link count is %d!  Changing "
+   "it to 0.\n"), dip->i_di.di_nlink);
+   dip->i_di.di_nlink = 0;
+   set_di_nlink(dip); /* keep inode tree in sync */
+   bmodified(dip->i_bh);
+   }
+   fsck_inode_put(dip);
+   di = NULL;
+   } else {
+   if (di)
+   log_debug(_("Couldn't find a valid \"..\" entry "
+   "for orphan directory %lld (0x%llx): "
+   "'..' = 0x%llx\n"),
+ (unsigned long long)ip->i_di.di_num.no_addr,
+ (unsigned long long)ip->i_di.di_num.no_addr,
+ (unsigned long long)di->dotdot_parent);
+   else
+   log_debug(_("Couldn't find a valid \"..\" entry "
+   "for orphan directory %lld (0x%llx)\n"),
+ (unsigned long long)ip->i_di.di_num.no_addr,
+ (unsigned long long)ip->i_di.di_num.no_addr);
+   }
+   if (gfs2_dirent_del(ip, "..", 2))
+   log_warn( _("add_inode_to_lf:  Unable to remove "
+   "\"..\" directory entry.\n"));
+
+   err = dir_add(ip, "..", 2, &(lf_dip->i_di.di_num), DT_DIR);
+   if (err) {
+   log_crit(_("Error adding .. directory: %s\n"),
+strerror(errno));
+   exit(-1);
+   }
+}
+
 /* add_inode_to_lf - Add dir entry to lost+found for the inode
  * @ip: inode to add to lost + found
  *
@@ -95,61 +164,7 @@ int add_inode_to_lf(struct gfs2_inode *ip){
 
switch(ip->i_di.di_mode & S_IFMT){
case S_IFDIR:
-   log_info( _("Adding .. entry pointing to lost+found for "
-   "directory %llu (0x%llx)\n"),
- (unsigned long long)ip->i_di.di_num.no_addr,
- (unsigned long long)ip->i_di.di_num.no_addr);
-
-   /* If there's a pre-existing .. directory entry, we have to
-  back out the links. */
-   di = dirtree_find(ip->i_di.di_num.no_addr);
-   if (di && !valid_block(sdp, di->dotdot_parent) == 0) {
-   struct gfs2_inode *dip;
-
-   log_debug(_("Directory %lld (0x%llx) already had

[Cluster-devel] [Patch 16/44] libgfs2: Use __FUNCTION__ rather than __FILE__ for debug messages

2011-08-11 Thread Bob Peterson

From 97b0253e2347b87f29ecf5d5fefbb08655358bb2 Mon Sep 17 00:00:00 2001
From: Bob Peterson rpete...@redhat.com
Date: Mon, 8 Aug 2011 16:11:48 -0500
Subject: [PATCH 16/44] libgfs2: Use __FUNCTION__ rather than __FILE__ for
 debug messages

This patch changes the debug output of gfs2-utils to use __FUNCTION__
rather than __FILE__.  This makes the output much smaller: digging
through a 6.5GB output file is better and faster than digging through
a 9GB one.
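
The difference between the two tags can be seen in a small sketch.  The
logger and macros below are hypothetical, and the sketch uses the
standard C99 identifier __func__, which GCC treats as equivalent to
__FUNCTION__.

```c
#include <stdio.h>

/* Last tag produced; a stand-in for one line of the debug log. */
static char last_tag[256];

/* Hypothetical logger: records where the message came from. */
static void record_origin(const char *where, int line)
{
	snprintf(last_tag, sizeof(last_tag), "(%s:%d)", where, line);
}

/* Tag with the source file name, as before the patch. */
#define TAG_BY_FILE() record_origin(__FILE__, __LINE__)
/* Tag with the enclosing function name, as after the patch. */
#define TAG_BY_FUNC() record_origin(__func__, __LINE__)

static void pass1_by_file(void)
{
	TAG_BY_FILE();
}

static void pass1_by_func(void)
{
	TAG_BY_FUNC();
}
```

__FILE__ expands to whatever path the compiler was given, which may
include directories, while the function name is usually shorter and
says more about what the code was doing when it logged.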

rhbz#675723
---
 gfs2/fsck/metawalk.c   |   13 +++----------
 gfs2/fsck/metawalk.h   |    2 +-
 gfs2/libgfs2/libgfs2.h |    2 +-
 3 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/gfs2/fsck/metawalk.c b/gfs2/fsck/metawalk.c
index 6bdea5a..9abec79 100644
--- a/gfs2/fsck/metawalk.c
+++ b/gfs2/fsck/metawalk.c
@@ -100,17 +100,10 @@ int _fsck_blockmap_set(struct gfs2_inode *ip, uint64_t bblock,
int error;
 
if (print_level >= MSG_DEBUG) {
-   const char *p;
-
-   p = strrchr(caller, '/');
-   if (p)
-   p++;
-   else
-   p = caller;
/* I'm circumventing the log levels here on purpose to make the
   output easier to debug. */
if (ip->i_di.di_num.no_addr == bblock) {
-   print_fsck_log(MSG_DEBUG, p, fline,
+   print_fsck_log(MSG_DEBUG, caller, fline,
   _("%s inode found at block %lld "
 "(0x%llx): marking as '%s'\n"),
   btype, (unsigned long long)
@@ -119,7 +112,7 @@ int _fsck_blockmap_set(struct gfs2_inode *ip, uint64_t bblock,
   ip->i_di.di_num.no_addr,
   block_type_string(mark));
} else if (mark == gfs2_bad_block || mark == gfs2_meta_inval) {
-   print_fsck_log(MSG_DEBUG, p, fline,
+   print_fsck_log(MSG_DEBUG, caller, fline,
   _("inode %lld (0x%llx) references "
 "%s block %lld (0x%llx): "
 "marking as '%s'\n"),
@@ -131,7 +124,7 @@ int _fsck_blockmap_set(struct gfs2_inode *ip, uint64_t bblock,
   (unsigned long long)bblock,
   block_type_string(mark));
} else {
-   print_fsck_log(MSG_DEBUG, p, fline,
+   print_fsck_log(MSG_DEBUG, caller, fline,
   _("inode %lld (0x%llx) references "
 "%s block %lld (0x%llx): "
 "marking as '%s'\n"),
diff --git a/gfs2/fsck/metawalk.h b/gfs2/fsck/metawalk.h
index c15d7b7..d705726 100644
--- a/gfs2/fsck/metawalk.h
+++ b/gfs2/fsck/metawalk.h
@@ -39,7 +39,7 @@ extern struct gfs2_inode *fsck_system_inode(struct gfs2_sbd *sdp,
 #define is_duplicate(dblock) ((dupfind(dblock)) ? 1 : 0)
 
 #define fsck_blockmap_set(ip, b, bt, m) _fsck_blockmap_set(ip, b, bt, m, \
-  __FILE__, __LINE__)
+  __FUNCTION__, __LINE__)
 
 /* metawalk_fxns: function pointers to check various parts of the fs
  *
diff --git a/gfs2/libgfs2/libgfs2.h b/gfs2/libgfs2/libgfs2.h
index 8f2ac89..d418d2f 100644
--- a/gfs2/libgfs2/libgfs2.h
+++ b/gfs2/libgfs2/libgfs2.h
@@ -697,7 +697,7 @@ extern int print_level;
 #define MSG_NULL        1
 
 #define print_log(priority, format...) \
-   do { print_fsck_log(priority, __FILE__, __LINE__, ## format); } while(0)
+   do { print_fsck_log(priority, __FUNCTION__, __LINE__, ## format); } while(0)
 
 #define log_debug(format...) \
do { if(print_level >= MSG_DEBUG) print_log(MSG_DEBUG, format); } while(0)
-- 
1.7.4.4

[Cluster-devel] [Patch 17/44] fsck.gfs2: Don't stop invalidating blocks if an invalid one is found

2011-08-11 Thread Bob Peterson

From 0f424f6c6a2b4fda8c5b9b2bc1cb246d868d3fec Mon Sep 17 00:00:00 2001
From: Bob Peterson rpete...@redhat.com
Date: Mon, 8 Aug 2011 16:20:14 -0500
Subject: [PATCH 17/44] fsck.gfs2: Don't stop invalidating blocks if an
 invalid one is found

When fsck found a duplicate reference to a block, it invalidated the
dinode's metadata.  But if it encountered an invalid block (for example,
one that is out of range), the invalidating would stop.  If we encounter
a block that isn't valid, we obviously can't invalidate it.  However, if
we return an error, all future invalidating will stop for that dinode.
That's wrong because we need it to continue to invalidate the other valid
blocks.  If we don't do this, block references that follow the bad one and
are also referenced elsewhere (duplicates) won't be flagged as such.  As a
result, they'll be freed when this corrupt dinode is deleted, despite being
used by another dinode as a valid block.  This patch makes the function
return success for an invalid block so the invalidating continues.
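
The effect of the return code can be sketched with a toy walker.  This
is an assumption-laden illustration, not the metawalk code: the walker,
the range check, and the device size are hypothetical, and -1 stands in
for the -EFAULT the old code returned.

```c
#include <stddef.h>
#include <stdint.h>

#define LAST_BLOCK 100u	/* hypothetical device size */

static int in_range(uint64_t b)
{
	return b > 0 && b <= LAST_BLOCK;
}

/* Pre-patch style callback: an invalid block is reported as an error. */
static int mark_old(uint64_t b, int *marked)
{
	if (!in_range(b))
		return -1;
	(*marked)++;
	return 0;
}

/* Post-patch style callback: skip the invalid block, report success. */
static int mark_new(uint64_t b, int *marked)
{
	if (!in_range(b))
		return 0;
	(*marked)++;
	return 0;
}

/* Walker that gives up on the first error; returns blocks marked. */
static int walk(const uint64_t *blocks, size_t n,
		int (*fn)(uint64_t, int *))
{
	int marked = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (fn(blocks[i], &marked))
			break;
	return marked;
}
```

For the block list {10, 5000, 20, 30}, the old callback marks only the
first block before the walk stops, while the new one marks all three
valid blocks.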

rhbz#675723
---
 gfs2/fsck/pass1.c |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gfs2/fsck/pass1.c b/gfs2/fsck/pass1.c
index b9aa165..2b04227 100644
--- a/gfs2/fsck/pass1.c
+++ b/gfs2/fsck/pass1.c
@@ -827,8 +827,15 @@ static int mark_block_invalid(struct gfs2_inode *ip, uint64_t block,
 {
uint8_t q;

-   if (!valid_block(ip->i_sbd, block) != 0)
-   return -EFAULT;
+   /* If the block isn't valid, we obviously can't invalidate it.
+* However, if we return an error, invalidating will stop, and
+* we want it to continue to invalidate the valid blocks.  If we
+* don't do this, block references that follow that are also
+* referenced elsewhere (duplicates) won't be flagged as such,
+* and as a result, they'll be freed when this dinode is deleted,
+* despite being used by another dinode as a valid block. */
+   if (!valid_block(ip->i_sbd, block))
+   return 0;

q = block_type(block);
if (q != gfs2_block_free) {
-- 
1.7.4.4

[Cluster-devel] [Patch 18/44] fsck.gfs2: Find and clear duplicate references that are leaf blocks

2011-08-11 Thread Bob Peterson

From 50e59f2cd489e7b0200bb098f1c11c30977d5e28 Mon Sep 17 00:00:00 2001
From: Bob Peterson rpete...@redhat.com
Date: Mon, 8 Aug 2011 16:44:47 -0500
Subject: [PATCH 18/44] fsck.gfs2: Find and clear duplicate references that
 are leaf blocks

Duplicate references that were in leaf blocks were never found or
cleared.  This patch adds that capability.
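
The reason a NULL check_leaf entry meant leaf blocks were skipped can be
sketched as follows.  The struct and walker are hypothetical and reduced
to a single callback; the only thing taken from the real code is the
pattern of testing the function pointer before calling it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, one-callback version of struct metawalk_fxns. */
struct walk_fxns_sketch {
	int (*check_leaf)(uint64_t block, void *private);
};

/* Example callback: count every leaf block it is shown. */
static int count_ref(uint64_t block, void *private)
{
	(void)block;
	(*(int *)private)++;
	return 0;
}

/* Visit each leaf, calling the callback only if one is installed. */
static int walk_leaves(const uint64_t *leaves, size_t n,
		       const struct walk_fxns_sketch *pass, void *private)
{
	size_t i;
	int error = 0;

	for (i = 0; i < n && !error; i++)
		if (pass->check_leaf)
			error = pass->check_leaf(leaves[i], private);
	return error;
}
```

With a NULL callback the walk visits every leaf but records nothing,
which is why installing a handler in both tables is enough to make the
duplicate handling see leaf blocks.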

rhbz#675723
---
 gfs2/fsck/pass1b.c |   15 +++++++++++++--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/gfs2/fsck/pass1b.c b/gfs2/fsck/pass1b.c
index 9497c78..7c6ae3d 100644
--- a/gfs2/fsck/pass1b.c
+++ b/gfs2/fsck/pass1b.c
@@ -28,6 +28,7 @@ struct dup_handler {
int ref_count;
 };
 
+static int check_leaf(struct gfs2_inode *ip, uint64_t block, void *private);
 static int check_metalist(struct gfs2_inode *ip, uint64_t block,
  struct gfs2_buffer_head **bh, int h, void *private);
 static int check_data(struct gfs2_inode *ip, uint64_t block, void *private);
@@ -53,7 +54,7 @@ static int find_dentry(struct gfs2_inode *ip, struct gfs2_dirent *de,
 
 struct metawalk_fxns find_refs = {
.private = NULL,
-   .check_leaf = NULL,
+   .check_leaf = check_leaf,
.check_metalist = check_metalist,
.check_data = check_data,
.check_eattr_indir = check_eattr_indir,
@@ -75,6 +76,11 @@ struct metawalk_fxns find_dirents = {
.check_eattr_extentry = NULL,
 };
 
+static int check_leaf(struct gfs2_inode *ip, uint64_t block, void *private)
+{
+   return add_duplicate_ref(ip, block, ref_as_meta, 1, INODE_VALID);
+}
+
 static int check_metalist(struct gfs2_inode *ip, uint64_t block,
  struct gfs2_buffer_head **bh, int h, void *private)
 {
@@ -253,6 +259,11 @@ static int clear_dup_data(struct gfs2_inode *ip, uint64_t block, void *private)
return clear_dup_metalist(ip, block, NULL, 0, private);
 }
 
+static int clear_leaf(struct gfs2_inode *ip, uint64_t block, void *private)
+{
+   return clear_dup_metalist(ip, block, NULL, 0, private);
+}
+
 static int clear_dup_eattr_indir(struct gfs2_inode *ip, uint64_t block,
 uint64_t parent, struct gfs2_buffer_head **bh,
 void *private)
@@ -395,7 +406,7 @@ static int clear_a_reference(struct gfs2_sbd *sdp, struct duptree *b,
osi_list_t *tmp, *x;
struct metawalk_fxns clear_dup_fxns = {
.private = NULL,
-   .check_leaf = NULL,
+   .check_leaf = clear_leaf,
.check_metalist = clear_dup_metalist,
.check_data = clear_dup_data,
.check_eattr_indir = clear_dup_eattr_indir,
-- 
1.7.4.4
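
The effect of the patch above can be pictured with a small stand-alone
sketch.  Nothing here is taken from fsck.gfs2: struct walk_fxns,
struct ref_counter, count_ref() and walk() are hypothetical, cut-down
stand-ins for struct metawalk_fxns and the metawalk code.  The sketch only
assumes that the walker skips any hook left as NULL, which is why a NULL
.check_leaf meant leaf blocks were never examined for duplicates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct metawalk_fxns. */
struct walk_fxns {
    void *private;
    int (*check_leaf)(uint64_t block, void *private);
    int (*check_data)(uint64_t block, void *private);
};

/* Hypothetical bookkeeping: how many block references were recorded. */
struct ref_counter {
    unsigned int refs;
};

/* Hook that records one reference; the block number is unused here. */
int count_ref(uint64_t block, void *private)
{
    struct ref_counter *rc = private;

    (void)block;
    rc->refs++;
    return 0;
}

/* Visit one leaf block and one data block, calling only the hooks
 * that are set.  A NULL hook means that block type is skipped. */
int walk(const struct walk_fxns *pass, uint64_t leaf, uint64_t data)
{
    int error = 0;

    assert(pass != NULL);
    if (pass->check_leaf)
        error = pass->check_leaf(leaf, pass->private);
    if (!error && pass->check_data)
        error = pass->check_data(data, pass->private);
    return error;
}
```

With .check_leaf left as NULL, walk() records only the data block; once a
hook is supplied, the leaf block is recorded as well, which mirrors the
change to find_refs and clear_dup_fxns.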

[Cluster-devel] [Patch 21/44] fsck.gfs2: split function check_leaf_blks to make it more understandable

2011-08-11 Thread Bob Peterson

From ce30310b12df03b34d1e430e69e483bf1ee10b64 Mon Sep 17 00:00:00 2001
From: Bob Peterson rpete...@redhat.com
Date: Tue, 9 Aug 2011 10:08:21 -0500
Subject: [PATCH 21/44] fsck.gfs2: split function check_leaf_blks to make it
 more understandable

This patch splits function check_leaf_blks into two functions to make it
more understandable.  Before, it had way too many levels of indentation and
spanned multiple screens.  This makes it a lot more clear.

rhbz#675723
---
 gfs2/fsck/metawalk.c |  255 +++---
 1 files changed, 138 insertions(+), 117 deletions(-)

diff --git a/gfs2/fsck/metawalk.c b/gfs2/fsck/metawalk.c
index 622f9d7..51d456c 100644
--- a/gfs2/fsck/metawalk.c
+++ b/gfs2/fsck/metawalk.c
@@ -435,10 +435,12 @@ static int check_entries(struct gfs2_inode *ip, struct gfs2_buffer_head *bh,
 /* so that they replace the bad ones.  We have to hack up the old*/
 /* leaf a bit, but it's better than deleting the whole directory,*/
 /* which is what used to happen before.  */
-static void warn_and_patch(struct gfs2_inode *ip, uint64_t *leaf_no, 
-  uint64_t *bad_leaf, uint64_t old_leaf,
-  uint64_t first_ok_leaf, int pindex, const char *msg)
+static int warn_and_patch(struct gfs2_inode *ip, uint64_t *leaf_no, 
+ uint64_t *bad_leaf, uint64_t old_leaf,
+ uint64_t first_ok_leaf, int pindex, const char *msg)
 {
+	int okay_to_fix = 0;
+
 	if (*bad_leaf != *leaf_no) {
 		log_err( _("Directory Inode %llu (0x%llx) points to leaf %llu"
 			" (0x%llx) %s.\n"),
@@ -448,7 +450,7 @@ static void warn_and_patch(struct gfs2_inode *ip, uint64_t *leaf_no,
 			(unsigned long long)*leaf_no, msg);
 	}
 	if (*leaf_no == *bad_leaf ||
-	    query( _("Attempt to patch around it? (y/n) "))) {
+	    (okay_to_fix = query( _("Attempt to patch around it? (y/n) ")))) {
 		if (!valid_block(ip->i_sbd, old_leaf) == 0)
 			gfs2_put_leaf_nr(ip, pindex, old_leaf);
 		else
@@ -461,6 +463,133 @@ static void warn_and_patch(struct gfs2_inode *ip, uint64_t *leaf_no,
 		log_err( _("Bad leaf left in place.\n"));
 	*bad_leaf = *leaf_no;
 	*leaf_no = old_leaf;
+   return okay_to_fix;
+}
+
+/**
+ * check_leaf - check a leaf block for errors
+ */
+static int check_leaf(struct gfs2_inode *ip, int lindex,
+ struct metawalk_fxns *pass, int *ref_count,
+ uint64_t *leaf_no, uint64_t old_leaf, uint64_t *bad_leaf,
+ uint64_t first_ok_leaf, struct gfs2_leaf *leaf,
+ struct gfs2_leaf *oldleaf)
+{
+	int error = 0, fix;
+	struct gfs2_buffer_head *lbh = NULL;
+	uint32_t count = 0;
+	struct gfs2_sbd *sdp = ip->i_sbd;
+	const char *msg;
+
+	*ref_count = 1;
+	/* Make sure the block number is in range. */
+	if (!valid_block(ip->i_sbd, *leaf_no)) {
+		log_err( _("Leaf block #%llu (0x%llx) is out of range for "
+			   "directory #%llu (0x%llx).\n"),
+			 (unsigned long long)*leaf_no,
+			 (unsigned long long)*leaf_no,
+			 (unsigned long long)ip->i_di.di_num.no_addr,
+			 (unsigned long long)ip->i_di.di_num.no_addr);
+		msg = _("that is out of range");
+		goto out_copy_old_leaf;
+	}
+
+	/* Try to read in the leaf block. */
+	lbh = bread(sdp, *leaf_no);
+	/* Make sure it's really a valid leaf block. */
+	if (gfs2_check_meta(lbh, GFS2_METATYPE_LF)) {
+		msg = _("that is not really a leaf");
+		goto out_copy_old_leaf;
+	}
+	if (pass->check_leaf) {
+		error = pass->check_leaf(ip, *leaf_no, pass->private);
+		if (error) {
+			log_info(_("Previous reference to leaf %lld (0x%llx) "
+				   "has already checked it; skipping.\n"),
+				 (unsigned long long)*leaf_no,
+				 (unsigned long long)*leaf_no);
+			brelse(lbh);
+			return error;
+		}
+	}
+	/* Early versions of GFS2 had an endianness bug in the kernel that set
+	   lf_dirent_format to cpu_to_be16(GFS2_FORMAT_DE).  This was fixed
+	   to use cpu_to_be32(), but we should check for incorrect values and
+	   replace them with the correct value. */
+
+	gfs2_leaf_in(leaf, lbh);
+	if (leaf->lf_dirent_format == (GFS2_FORMAT_DE << 16)) {
+		log_debug( _("incorrect lf_dirent_format at leaf #%" PRIu64
+			     "\n"), *leaf_no);
+		leaf->lf_dirent_format = GFS2_FORMAT_DE;
+		gfs2_leaf_out(leaf, lbh);
+   log_debug( _(Fixing
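
The lf_dirent_format comment in check_leaf() above explains why the bad
value is the format constant shifted left by 16 bits.  The following is a
minimal, stand-alone sketch of that arithmetic, not code from fsck.gfs2 or
the kernel.  It assumes the buggy kernel ran on a little-endian host and
that GFS2_FORMAT_DE is 1200 (FORMAT_DE below); swab16(), store_le32(),
load_be32(), buggy_on_disk_value() and fix_dirent_format() are all
hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define FORMAT_DE 1200u   /* assumed value of GFS2_FORMAT_DE */

/* Byte swap of a 16-bit value: what cpu_to_be16() amounts to on a
 * little-endian host. */
uint16_t swab16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Store a 32-bit value the way a little-endian host lays it out. */
void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read four bytes as a big-endian 32-bit value, as the on-disk
 * format expects. */
uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Simulate the bug: a 16-bit swap result is widened into the 32-bit
 * field, written by a little-endian host, then read back as be32. */
uint32_t buggy_on_disk_value(void)
{
    uint8_t disk[4];
    uint32_t field = swab16((uint16_t)FORMAT_DE);

    store_le32(disk, field);
    return load_be32(disk);
}

/* The repair: replace the shifted value, leave anything else alone. */
uint32_t fix_dirent_format(uint32_t fmt)
{
    return fmt == (FORMAT_DE << 16) ? FORMAT_DE : fmt;
}
```

Under these assumptions the two payload bytes land in the upper half of the
32-bit field, so buggy_on_disk_value() equals FORMAT_DE << 16, which is the
value check_leaf() tests for before rewriting the leaf.  On a big-endian
host cpu_to_be16() is a no-op, so the stored value would already be correct.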
