from:"Russell Cattelan"

Re: [PATCH] prune_icache_sb

2006-12-04 Thread Russell Cattelan


Wendy Cheng wrote:


Andrew Morton wrote:


On Thu, 30 Nov 2006 11:05:32 -0500
Wendy Cheng <[EMAIL PROTECTED]> wrote:

 



The idea is, instead of unconditionally dropping every buffer 
associated with the particular mount point (that defeats the purpose 
of page caching), base kernel exports the "drop_pagecache_sb()" call 
that allows page cache to be trimmed. More importantly, it is 
changed to offer the choice of not randomly purging any buffer but 
the ones that seem to be unused (i_state is NULL and i_count is 
zero). This will encourage filesystem(s) to pro actively response to 
vm memory shortage if they choose so.
  



argh.
 

I read this as "It is ok to give system admin(s) commands (that this 
"drop_pagecache_sb() call" is all about) to drop page cache. It is, 
however, not ok to give filesystem developer(s) this very same 
function to trim their own page cache if the filesystems choose to do 
so" ?



In Linux a filesystem is a dumb layer which sits between the VFS and the
I/O layer and provides dumb services such as reading/writing inodes,
reading/writing directory entries, mapping pagecache offsets to disk
blocks, etc.  (This model is to varying degrees incorrect for every
post-ext2 filesystem, but that's the way it is).
 

Linux kernel, particularly the VFS layer, is starting to show signs of 
inadequacy as the software components built upon it keep growing. I 
have doubts that it can keep up and handle this complexity with a 
development policy like you just described (filesystem is a dumb layer 
?). Aren't these DIO_xxx_LOCKING flags inside __blockdev_direct_IO() a 
perfect example why trying to do too many things inside vfs layer for 
so many filesystems is a bad idea ? By the way, since we're on this 
subject, could we discuss a little bit about vfs rename call (or I can 
start another new discussion thread) ?


Note that linux do_rename() starts with the usual lookup logic, 
followed by "lock_rename", then a final round of dentry lookup, and 
finally comes to filesystem's i_op->rename call. Since lock_rename() 
only calls for vfs layer locks that are local to this particular 
machine, for a cluster filesystem, there exists a huge window between 
the final lookup and filesystem's i_op->rename calls such that the 
file could get deleted from another node before fs can do anything 
about it. Is it possible that we could get a new function pointer 
(lock_rename) in inode_operations structure so a cluster filesystem 
can do proper locking ?


It looks like the ocfs2 guys have the similar problem?

http://ftp.kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/ocfs2_git_patches/ocfs2-upstream-linus-20060924/0009-PATCH-Allow-file-systems-to-manually-d_move-inside-of-rename.txt

Does this change help fix gfs lock ordering problem as well?


-Russell Cattelan
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] prune_icache_sb

2006-12-04 Thread Russell Cattelan


Wendy Cheng wrote:


Andrew Morton wrote:


On Thu, 30 Nov 2006 11:05:32 -0500
Wendy Cheng [EMAIL PROTECTED] wrote:

 



The idea is, instead of unconditionally dropping every buffer 
associated with the particular mount point (that defeats the purpose 
of page caching), base kernel exports the drop_pagecache_sb() call 
that allows page cache to be trimmed. More importantly, it is 
changed to offer the choice of not randomly purging any buffer but 
the ones that seem to be unused (i_state is NULL and i_count is 
zero). This will encourage filesystem(s) to pro actively response to 
vm memory shortage if they choose so.
  



argh.
 

I read this as It is ok to give system admin(s) commands (that this 
drop_pagecache_sb() call is all about) to drop page cache. It is, 
however, not ok to give filesystem developer(s) this very same 
function to trim their own page cache if the filesystems choose to do 
so ?



In Linux a filesystem is a dumb layer which sits between the VFS and the
I/O layer and provides dumb services such as reading/writing inodes,
reading/writing directory entries, mapping pagecache offsets to disk
blocks, etc.  (This model is to varying degrees incorrect for every
post-ext2 filesystem, but that's the way it is).
 

Linux kernel, particularly the VFS layer, is starting to show signs of 
inadequacy as the software components built upon it keep growing. I 
have doubts that it can keep up and handle this complexity with a 
development policy like you just described (filesystem is a dumb layer 
?). Aren't these DIO_xxx_LOCKING flags inside __blockdev_direct_IO() a 
perfect example why trying to do too many things inside vfs layer for 
so many filesystems is a bad idea ? By the way, since we're on this 
subject, could we discuss a little bit about vfs rename call (or I can 
start another new discussion thread) ?


Note that linux do_rename() starts with the usual lookup logic, 
followed by lock_rename, then a final round of dentry lookup, and 
finally comes to filesystem's i_op-rename call. Since lock_rename() 
only calls for vfs layer locks that are local to this particular 
machine, for a cluster filesystem, there exists a huge window between 
the final lookup and filesystem's i_op-rename calls such that the 
file could get deleted from another node before fs can do anything 
about it. Is it possible that we could get a new function pointer 
(lock_rename) in inode_operations structure so a cluster filesystem 
can do proper locking ?


It looks like the ocfs2 guys have the similar problem?

http://ftp.kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/ocfs2_git_patches/ocfs2-upstream-linus-20060924/0009-PATCH-Allow-file-systems-to-manually-d_move-inside-of-rename.txt

Does this change help fix gfs lock ordering problem as well?


-Russell Cattelan
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Cluster-devel] Re: [GFS2] Change argument of gfs2_dinode_out [17/70]

2006-12-03 Thread Russell Cattelan


Al Viro wrote:


On Fri, Dec 01, 2006 at 05:29:46PM -0600, Russell Cattelan wrote:

 


gfs2 is supposed to be stabilized and use-able for the up coming rhel5
release, not pretty up for somebody to print out and hang on their wall.
   



Your insight, sir, is truly stunning.  That is to say, it reminds of
a sudden and unpleasant contact with something dense and misplaced.
May I direct your attention to the fact that rhel5 is quite unlikely
to be based on 2.6.20?
 


Ok your right rhel5 has no bearing on what goes into 2.6.20.
And again I not taking issues with any of the cleanups you sent
in they were complete and addressed real potential problems.

Just trying to point out stablizing and bug fixing  over partial 
cleanups would be more helpful

for gfs2 in general,  for whatever kernel it is running in.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Cluster-devel] Re: [GFS2] Change argument of gfs2_dinode_out [17/70]

2006-12-03 Thread Russell Cattelan


Al Viro wrote:


On Fri, Dec 01, 2006 at 05:29:46PM -0600, Russell Cattelan wrote:

 


gfs2 is supposed to be stabilized and use-able for the up coming rhel5
release, not pretty up for somebody to print out and hang on their wall.
   



Your insight, sir, is truly stunning.  That is to say, it reminds of
a sudden and unpleasant contact with something dense and misplaced.
May I direct your attention to the fact that rhel5 is quite unlikely
to be based on 2.6.20?
 


Ok your right rhel5 has no bearing on what goes into 2.6.20.
And again I not taking issues with any of the cleanups you sent
in they were complete and addressed real potential problems.

Just trying to point out stablizing and bug fixing  over partial 
cleanups would be more helpful

for gfs2 in general,  for whatever kernel it is running in.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Cluster-devel] Re: [GFS2] Change argument of gfs2_dinode_out [17/70]

2006-12-01 Thread Russell Cattelan

On Fri, 2006-12-01 at 21:08 +, Al Viro wrote:
> On Fri, Dec 01, 2006 at 02:52:11PM -0600, Russell Cattelan wrote:
> > code clean up are not without risk and with no regression test suite to
> > verify
> > that a "cleanup" has not broken something. Cleanups are very much a
> > hindrance to stabilization. With no know working points in a code
> > history it becomes difficult
> > to bisect changes and figure out when bugs were introduced
> > Especially when cleanups are mixed in with bug fixes.
> > 
> > Pretty code does not equal correct code.
> 
> No, but convoluted and unreadable code ends up being crappier due
> to lack of review.  And that's aside of the memory footprint,
> likeliness of bugs introduced by code modifications (having in-core
> and on-disk data structures with different contents and the same C
> type => trouble that won't be caught by compiler), etc.

Nothing makes up for the complete lack of GFS2 testing.
reviewed code does not equal correct code either.

Honestly tell me what test suite do you run on GFS2?

Sure is it possible to make an educated guess that some
cleanups will not destabilize the code. Indeed the stuff
you have done is quite useful to ensure that endian bugs are
being caught by the compiler/sparse.
But no amount of "it shouldn't break anything" assertions
can replace testing.

But there is a large quantity of the 70 or so patches that were
sent out were to enable "future" cleanup's. Putting in partial cleanups
do nothing core code readability and I many cases is more confusing.
Unless you meticulously keep up with the partial cleanups looking
at the code is now a jumbled mess of inconsistencies.

gfs2 is supposed to be stabilized and use-able for the up coming rhel5
release, not pretty up for somebody to print out and hang on their wall.

-- 
Russell Cattelan <[EMAIL PROTECTED]>

signature.asc
Description: This is a digitally signed message part

Re: [Cluster-devel] Re: [GFS2] Change argument of gfs2_dinode_out [17/70]

2006-12-01 Thread Russell Cattelan

On Fri, 2006-12-01 at 19:25 +, Al Viro wrote:
> On Fri, Dec 01, 2006 at 01:19:04PM -0600, Russell Cattelan wrote:
> > On Thu, 2006-11-30 at 12:15 +, Steven Whitehouse wrote:
> > > >From 539e5d6b7ae8612c0393fe940d2da5b591318d3d Mon Sep 17 00:00:00 2001
> > > From: Steven Whitehouse <[EMAIL PROTECTED]>
> > > Date: Tue, 31 Oct 2006 15:07:05 -0500
> > > Subject: [PATCH] [GFS2] Change argument of gfs2_dinode_out
> > > 
> > > Everywhere this was called, a struct gfs2_inode was available,
> > > but despite that, it was always called with a struct gfs2_dinode
> > > as an argument. By making this change it paves the way to start
> > > eliminating fields duplicated between the kernel's struct inode
> > > and the struct gfs2_dinode.
> > More pointless code churn.
> > 
> > This only makes sense once the file system is working 
> > and we have time to do this type of cleanup on against
> > a stable and TESTED code base.
> 
> Bzzert.  Cleaner code is easier to _get_ stable.  "Keep it ucking fugly
> until everyone stops looking at it out of sheer disgust" is a bad idea.'
code clean up are not without risk and with no regression test suite to
verify
that a "cleanup" has not broken something. Cleanups are very much a
hindrance to stabilization. With no know working points in a code
history it becomes difficult
to bisect changes and figure out when bugs were introduced
Especially when cleanups are mixed in with bug fixes.

Pretty code does not equal correct code.


-- 
Russell Cattelan <[EMAIL PROTECTED]>


signature.asc
Description: This is a digitally signed message part

Re: [GFS2] Change argument of gfs2_dinode_out [17/70]

2006-12-01 Thread Russell Cattelan

_out(>i_di, dibh->b_data);
> + gfs2_dinode_out(ip, dibh->b_data);
>   brelse(dibh);
>   }
>  
> @@ -762,7 +762,7 @@ static int gfs2_rename(struct inode *odi
>   goto out_end_trans;
>   ip->i_di.di_ctime = get_seconds();
>   gfs2_trans_add_bh(ip->i_gl, dibh, 1);
> - gfs2_dinode_out(>i_di, dibh->b_data);
> + gfs2_dinode_out(ip, dibh->b_data);
>   brelse(dibh);
>   }
>  
> @@ -949,7 +949,7 @@ static int setattr_chown(struct inode *i
>   gfs2_inode_attr_out(ip);
>  
>   gfs2_trans_add_bh(ip->i_gl, dibh, 1);
> - gfs2_dinode_out(>i_di, dibh->b_data);
> + gfs2_dinode_out(ip, dibh->b_data);
>   brelse(dibh);
>  
>   if (ouid != NO_QUOTA_CHANGE || ogid != NO_QUOTA_CHANGE) {
> diff --git a/include/linux/gfs2_ondisk.h b/include/linux/gfs2_ondisk.h
> index 10a507d..550effa 100644
> --- a/include/linux/gfs2_ondisk.h
> +++ b/include/linux/gfs2_ondisk.h
> @@ -535,7 +535,8 @@ extern void gfs2_rgrp_in(struct gfs2_rgr
>  extern void gfs2_rgrp_out(const struct gfs2_rgrp_host *rg, void *buf);
>  extern void gfs2_quota_in(struct gfs2_quota_host *qu, const void *buf);
>  extern void gfs2_dinode_in(struct gfs2_dinode_host *di, const void *buf);
> -extern void gfs2_dinode_out(const struct gfs2_dinode_host *di, void *buf);
> +struct gfs2_inode;
> +extern void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf);
>  extern void gfs2_ea_header_in(struct gfs2_ea_header *ea, const void *buf);
>  extern void gfs2_ea_header_out(const struct gfs2_ea_header *ea, void *buf);
>  extern void gfs2_log_header_in(struct gfs2_log_header_host *lh, const void 
> *buf);
-- 
Russell Cattelan <[EMAIL PROTECTED]>


signature.asc
Description: This is a digitally signed message part

Re: [GFS2] Move gfs2_meta_syncfs() into log.c [57/70]

2006-12-01 Thread Russell Cattelan

On Thu, 2006-11-30 at 12:22 +, Steven Whitehouse wrote:
> >From a25311c8e0b7071b129ca9a9e49e22eeaf620864 Mon Sep 17 00:00:00 2001
> From: Steven Whitehouse <[EMAIL PROTECTED]>
> Date: Thu, 23 Nov 2006 11:06:35 -0500
> Subject: [PATCH] [GFS2] Move gfs2_meta_syncfs() into log.c
> 
> By moving gfs2_meta_syncfs() into log.c, gfs2_ail1_start()
> can be made static.
While this is not a incorrect change as it stands.
This kind of pointless code curn is making it harder
to stabilize GFS2.

> 
> Signed-off-by: Steven Whitehouse <[EMAIL PROTECTED]>
> ---
>  fs/gfs2/log.c |   21 -
>  fs/gfs2/log.h |2 +-
>  fs/gfs2/meta_io.c |   17 -
>  fs/gfs2/meta_io.h |1 -
>  4 files changed, 21 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
> index 6456fc3..7713d59 100644
> --- a/fs/gfs2/log.c
> +++ b/fs/gfs2/log.c
> @@ -15,6 +15,7 @@ #include 
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "gfs2.h"
>  #include "incore.h"
> @@ -142,7 +143,7 @@ static int gfs2_ail1_empty_one(struct gf
>   return list_empty(>ai_ail1_list);
>  }
>  
> -void gfs2_ail1_start(struct gfs2_sbd *sdp, int flags)
> +static void gfs2_ail1_start(struct gfs2_sbd *sdp, int flags)
>  {
>   struct list_head *head = >sd_ail1_list;
>   u64 sync_gen;
> @@ -689,3 +690,21 @@ void gfs2_log_shutdown(struct gfs2_sbd *
>   up_write(>sd_log_flush_lock);
>  }
>  
> +
> +/**
> + * gfs2_meta_syncfs - sync all the buffers in a filesystem
> + * @sdp: the filesystem
> + *
> + */
> +
> +void gfs2_meta_syncfs(struct gfs2_sbd *sdp)
> +{
> + gfs2_log_flush(sdp, NULL);
> + for (;;) {
> + gfs2_ail1_start(sdp, DIO_ALL);
> + if (gfs2_ail1_empty(sdp, DIO_ALL))
> + break;
> + msleep(10);
> + }
> +}
> +
> diff --git a/fs/gfs2/log.h b/fs/gfs2/log.h
> index 7f5737d..8e7aa0f 100644
> --- a/fs/gfs2/log.h
> +++ b/fs/gfs2/log.h
> @@ -48,7 +48,6 @@ static inline void gfs2_log_pointers_ini
>  unsigned int gfs2_struct2blk(struct gfs2_sbd *sdp, unsigned int nstruct,
>   unsigned int ssize);
>  
> -void gfs2_ail1_start(struct gfs2_sbd *sdp, int flags);
>  int gfs2_ail1_empty(struct gfs2_sbd *sdp, int flags);
>  
>  int gfs2_log_reserve(struct gfs2_sbd *sdp, unsigned int blks);
> @@ -61,5 +60,6 @@ void gfs2_log_flush(struct gfs2_sbd *sdp
>  void gfs2_log_commit(struct gfs2_sbd *sdp, struct gfs2_trans *trans);
>  
>  void gfs2_log_shutdown(struct gfs2_sbd *sdp);
> +void gfs2_meta_syncfs(struct gfs2_sbd *sdp);
>  
>  #endif /* __LOG_DOT_H__ */
> diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
> index 939a09f..fbeba81 100644
> --- a/fs/gfs2/meta_io.c
> +++ b/fs/gfs2/meta_io.c
> @@ -574,20 +574,3 @@ out:
>   return first_bh;
>  }
>  
> -/**
> - * gfs2_meta_syncfs - sync all the buffers in a filesystem
> - * @sdp: the filesystem
> - *
> - */
> -
> -void gfs2_meta_syncfs(struct gfs2_sbd *sdp)
> -{
> - gfs2_log_flush(sdp, NULL);
> - for (;;) {
> - gfs2_ail1_start(sdp, DIO_ALL);
> - if (gfs2_ail1_empty(sdp, DIO_ALL))
> - break;
> - msleep(10);
> - }
> -}
> -
> diff --git a/fs/gfs2/meta_io.h b/fs/gfs2/meta_io.h
> index 3ec939e..e037425 100644
> --- a/fs/gfs2/meta_io.h
> +++ b/fs/gfs2/meta_io.h
> @@ -67,7 +67,6 @@ static inline int gfs2_meta_inode_buffer
>  }
>  
>  struct buffer_head *gfs2_meta_ra(struct gfs2_glock *gl, u64 dblock, u32 
> extlen);
> -void gfs2_meta_syncfs(struct gfs2_sbd *sdp);
>  
>  #define buffer_busy(bh) \
>  ((bh)->b_state & ((1ul << BH_Dirty) | (1ul << BH_Lock) | (1ul << BH_Pinned)))
-- 
Russell Cattelan <[EMAIL PROTECTED]>


signature.asc
Description: This is a digitally signed message part

Re: [GFS2] Fix page lock/glock deadlock [32/70]

2006-12-01 Thread Russell Cattelan

On Thu, 2006-11-30 at 12:17 +, Steven Whitehouse wrote:
> >From 2ca99501fa5422e84f18333918a503433449e2b5 Mon Sep 17 00:00:00 2001
> From: Steven Whitehouse <[EMAIL PROTECTED]>
> Date: Wed, 8 Nov 2006 10:26:54 -0500
> Subject: [PATCH] [GFS2] Fix page lock/glock deadlock
> 
> This fixes a race between the glock and the page lock encountered
> during truncate in gfs2_readpage and gfs2_prepare_write. The gfs2_readpages
> function doesn't need the same fix since it only uses a try lock anyway, so
> it will fail back to gfs2_readpage in the case of a potential deadlock.
> 
> This bug was spotted by Russell Cattelan.\
This bug was fixed correctly by Russell Cattelan
and this is an incomplete version of my patch.


As I keep trying to point out readpages is still wrong in that
the stuffed case is not correct and results in a panic
if a file is being truncated down.
If nr_pages is calculated prior a truncate the stuffed inode
case will try to operate on pages that are no longer there.


The correct fix is to not try and handled the stuffed inode 
case at all in readpages and simply return AOP_TRUNCATED_PAGE
and let things things fall back to readpage.


> 
> Cc: Russell Cattelan <[EMAIL PROTECTED]>
> Signed-off-by: Steven Whitehouse <[EMAIL PROTECTED]>
> ---
>  fs/gfs2/glock.h   |1 -
>  fs/gfs2/ops_address.c |   13 +
>  2 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
> index b985627..a331bf8 100644
> --- a/fs/gfs2/glock.h
> +++ b/fs/gfs2/glock.h
> @@ -27,7 +27,6 @@ #define GL_SKIP 0x0100
>  #define GL_ATIME 0x0200
>  #define GL_NOCACHE   0x0400
>  #define GL_NOCANCEL  0x1000
> -#define GL_AOP   0x4000
>  
>  #define GLR_TRYFAILED13
>  #define GLR_CANCELED 14
> diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c
> index 5c3962c..3822189 100644
> --- a/fs/gfs2/ops_address.c
> +++ b/fs/gfs2/ops_address.c
> @@ -230,7 +230,7 @@ static int gfs2_readpage(struct file *fi
>   /* gfs2_sharewrite_nopage has grabbed the 
> ip->i_gl already */
>   goto skip_lock;
>   }
> - gfs2_holder_init(ip->i_gl, LM_ST_SHARED, GL_ATIME|GL_AOP, );
> + gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 
> GL_ATIME|LM_FLAG_TRY_1CB, );
>   do_unlock = 1;
>   error = gfs2_glock_nq_m_atime(1, );
>   if (unlikely(error))
> @@ -254,6 +254,8 @@ skip_lock:
>  out:
>   return error;
>  out_unlock:
> + if (error == GLR_TRYFAILED)
> + error = AOP_TRUNCATED_PAGE;
>   unlock_page(page);
>   if (do_unlock)
>   gfs2_holder_uninit();
> @@ -293,7 +295,7 @@ static int gfs2_readpages(struct file *f
>   goto skip_lock;
>   }
>   gfs2_holder_init(ip->i_gl, LM_ST_SHARED,
> -  LM_FLAG_TRY_1CB|GL_ATIME|GL_AOP, );
> +  LM_FLAG_TRY_1CB|GL_ATIME, );
>   do_unlock = 1;
>   ret = gfs2_glock_nq_m_atime(1, );
>   if (ret == GLR_TRYFAILED)
> @@ -366,10 +368,13 @@ static int gfs2_prepare_write(struct fil
>   unsigned int write_len = to - from;
>  
> 
> - gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_ATIME|GL_AOP, >i_gh);
> + gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_ATIME|LM_FLAG_TRY_1CB, 
> >i_gh);
>   error = gfs2_glock_nq_m_atime(1, >i_gh);
> -     if (error)
> + if (unlikely(error)) {
> + if (error == GLR_TRYFAILED)
> + error = AOP_TRUNCATED_PAGE;
>   goto out_uninit;
> + }
>  
>   gfs2_write_calc_reserv(ip, write_len, _blocks, _blocks);
>  
-- 
Russell Cattelan <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [GFS2] Tidy up bmap & fix boundary bug [48/70]

2006-12-01 Thread Russell Cattelan

ock, new, );
>   if (error)
> - return error;
> + goto out_fail;
>   }
>  
> - boundary = lookup_block(ip, bh, end_of_metadata, mp, create, , 
> );
> - clear_buffer_mapped(bh_map);
> - clear_buffer_new(bh_map);
> - clear_buffer_boundary(bh_map);
> -
> + boundary = lookup_block(ip, bh, end_of_metadata, , create, , 
> );
>   if (dblock) {
>   map_bh(bh_map, inode->i_sb, dblock);
>   if (boundary)
> - set_buffer_boundary(bh);
> + set_buffer_boundary(bh_map);
>   if (new) {
>   struct buffer_head *dibh;
>   error = gfs2_meta_inode_buffer(ip, );
> @@ -510,8 +531,8 @@ static int gfs2_block_pointers(struct in
>   while(--maxlen && !buffer_boundary(bh_map)) {
>   u64 eblock;
>  
> - mp->mp_list[end_of_metadata]++;
> - boundary = lookup_block(ip, bh, end_of_metadata, mp, 0, 
> , );
> + mp.mp_list[end_of_metadata]++;
> + boundary = lookup_block(ip, bh, end_of_metadata, , 
> 0, , );
>   if (eblock != ++dblock)
>   break;
>   bh_map->b_size += (1 << inode->i_blkbits);
> @@ -521,43 +542,15 @@ static int gfs2_block_pointers(struct in
>   }
>  out_brelse:
>   brelse(bh);
> - return 0;
> -}
> -
> -
> -static inline void bmap_lock(struct inode *inode, int create)
> -{
> - struct gfs2_inode *ip = GFS2_I(inode);
> - if (create)
> - down_write(>i_rw_mutex);
> - else
> - down_read(>i_rw_mutex);
> -}
> -
> -static inline void bmap_unlock(struct inode *inode, int create)
> -{
> - struct gfs2_inode *ip = GFS2_I(inode);
> - if (create)
> - up_write(>i_rw_mutex);
> - else
> - up_read(>i_rw_mutex);
> -}
> -
> -int gfs2_block_map(struct inode *inode, u64 lblock, int create,
> -struct buffer_head *bh)
> -{
> - struct metapath mp;
> - int ret;
> -
> - bmap_lock(inode, create);
> - ret = gfs2_block_pointers(inode, lblock, create, bh, );
> +out_ok:
> + error = 0;
> +out_fail:
>   bmap_unlock(inode, create);
> - return ret;
> + return error;
>  }
>  
>  int gfs2_extent_map(struct inode *inode, u64 lblock, int *new, u64 *dblock, 
> unsigned *extlen)
>  {
> - struct metapath mp;
>   struct buffer_head bh = { .b_state = 0, .b_blocknr = 0 };
>   int ret;
>   int create = *new;
> @@ -567,9 +560,7 @@ int gfs2_extent_map(struct inode *inode,
>   BUG_ON(!new);
>  
>   bh.b_size = 1 << (inode->i_blkbits + 5);
> - bmap_lock(inode, create);
> - ret = gfs2_block_pointers(inode, lblock, create, , );
> - bmap_unlock(inode, create);
> + ret = gfs2_block_map(inode, lblock, create, );
>   *extlen = bh.b_size >> inode->i_blkbits;
>   *dblock = bh.b_blocknr;
>   if (buffer_new())
-- 
Russell Cattelan <[EMAIL PROTECTED]>


signature.asc
Description: This is a digitally signed message part

Re: [GFS2] Change argument to gfs2_dinode_in [18/70]

2006-12-01 Thread Russell Cattelan

On Thu, 2006-11-30 at 12:15 +, Steven Whitehouse wrote:
> >From 891ea14712da68e282de8583e5fa14f0d3f3731e Mon Sep 17 00:00:00 2001
> From: Steven Whitehouse <[EMAIL PROTECTED]>
> Date: Tue, 31 Oct 2006 15:22:10 -0500
> Subject: [PATCH] [GFS2] Change argument to gfs2_dinode_in
> 
> This is a preliminary patch to enable the removal of fields
> in gfs2_dinode_host which are duplicated in struct inode.
Deferred till the complete "cleanup" work is done?


> 
> Signed-off-by: Steven Whitehouse <[EMAIL PROTECTED]>
> ---
>  fs/gfs2/inode.c |2 +-
>  fs/gfs2/ondisk.c|4 ++--
>  include/linux/gfs2_ondisk.h |2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
> index b861ddb..9875e93 100644
> --- a/fs/gfs2/inode.c
> +++ b/fs/gfs2/inode.c
> @@ -229,7 +229,7 @@ int gfs2_inode_refresh(struct gfs2_inode
>   return -EIO;
>   }
>  
> - gfs2_dinode_in(>i_di, dibh->b_data);
> + gfs2_dinode_in(ip, dibh->b_data);
>  
>   brelse(dibh);
>  
> diff --git a/fs/gfs2/ondisk.c b/fs/gfs2/ondisk.c
> index 2c50fa0..edf8756 100644
> --- a/fs/gfs2/ondisk.c
> +++ b/fs/gfs2/ondisk.c
> @@ -155,8 +155,9 @@ void gfs2_quota_in(struct gfs2_quota_hos
>   qu->qu_value = be64_to_cpu(str->qu_value);
>  }
>  
> -void gfs2_dinode_in(struct gfs2_dinode_host *di, const void *buf)
> +void gfs2_dinode_in(struct gfs2_inode *ip, const void *buf)
>  {
> + struct gfs2_dinode_host *di = >i_di;
>   const struct gfs2_dinode *str = buf;
>  
>   gfs2_meta_header_in(>di_header, buf);
> @@ -186,7 +187,6 @@ void gfs2_dinode_in(struct gfs2_dinode_h
>   di->di_entries = be32_to_cpu(str->di_entries);
>  
>   di->di_eattr = be64_to_cpu(str->di_eattr);
> -
>  }
>  
>  void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf)
> diff --git a/include/linux/gfs2_ondisk.h b/include/linux/gfs2_ondisk.h
> index 550effa..08d8240 100644
> --- a/include/linux/gfs2_ondisk.h
> +++ b/include/linux/gfs2_ondisk.h
> @@ -534,8 +534,8 @@ extern void gfs2_rindex_out(const struct
>  extern void gfs2_rgrp_in(struct gfs2_rgrp_host *rg, const void *buf);
>  extern void gfs2_rgrp_out(const struct gfs2_rgrp_host *rg, void *buf);
>  extern void gfs2_quota_in(struct gfs2_quota_host *qu, const void *buf);
> -extern void gfs2_dinode_in(struct gfs2_dinode_host *di, const void *buf);
>  struct gfs2_inode;
> +extern void gfs2_dinode_in(struct gfs2_inode *ip, const void *buf);
>  extern void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf);
>  extern void gfs2_ea_header_in(struct gfs2_ea_header *ea, const void *buf);
>  extern void gfs2_ea_header_out(const struct gfs2_ea_header *ea, void *buf);
-- 
Russell Cattelan <[EMAIL PROTECTED]>


signature.asc
Description: This is a digitally signed message part

Re: [GFS2] Fix journal flush problem [56/70]

2006-12-01 Thread Russell Cattelan

> @@ -274,7 +280,7 @@ int gfs2_log_reserve(struct gfs2_sbd *sd
>  
>   mutex_lock(>sd_log_reserve_mutex);
>   gfs2_log_lock(sdp);
> - while(sdp->sd_log_blks_free <= blks) {
> + while(sdp->sd_log_blks_free <= (blks + 6)) {
>   gfs2_log_unlock(sdp);
>   gfs2_ail1_empty(sdp, 0);
>   gfs2_log_flush(sdp, NULL);
> @@ -643,12 +649,9 @@ void gfs2_log_commit(struct gfs2_sbd *sd
>   up_read(>sd_log_flush_lock);
>  
>   gfs2_log_lock(sdp);
> - if (sdp->sd_log_num_buf > gfs2_tune_get(sdp, gt_incore_log_blocks)) {
> - gfs2_log_unlock(sdp);
> - gfs2_log_flush(sdp, NULL);
> - } else {
> - gfs2_log_unlock(sdp);
> - }
> + if (sdp->sd_log_num_buf > gfs2_tune_get(sdp, gt_incore_log_blocks))
> + wake_up_process(sdp->sd_logd_process);
> + gfs2_log_unlock(sdp);
>  }
>  
>  /**
> diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
> index 3912d6a..939a09f 100644
> --- a/fs/gfs2/meta_io.c
> +++ b/fs/gfs2/meta_io.c
> @@ -472,6 +472,9 @@ int gfs2_meta_indirect_buffer(struct gfs
>   struct buffer_head *bh = NULL, **bh_slot = ip->i_cache + height;
>   int in_cache = 0;
>  
> + BUG_ON(!gl);
> + BUG_ON(!sdp);
> +
>   spin_lock(>i_spin);
>   if (*bh_slot && (*bh_slot)->b_blocknr == num) {
>   bh = *bh_slot;
> diff --git a/fs/gfs2/ops_super.c b/fs/gfs2/ops_super.c
> index 8635175..7685b46 100644
> --- a/fs/gfs2/ops_super.c
> +++ b/fs/gfs2/ops_super.c
> @@ -157,7 +157,8 @@ static void gfs2_write_super(struct supe
>  static int gfs2_sync_fs(struct super_block *sb, int wait)
>  {
>   sb->s_dirt = 0;
> - gfs2_log_flush(sb->s_fs_info, NULL);
> + if (wait)
> + gfs2_log_flush(sb->s_fs_info, NULL);
>   return 0;
>  }
>  
> @@ -293,8 +294,6 @@ static void gfs2_clear_inode(struct inod
>*/
>   if (inode->i_private) {
>   struct gfs2_inode *ip = GFS2_I(inode);
> - gfs2_glock_inode_squish(inode);
> - gfs2_assert(inode->i_sb->s_fs_info, ip->i_gl->gl_state == 
> LM_ST_UNLOCKED);
>   ip->i_gl->gl_object = NULL;
>   gfs2_glock_schedule_for_reclaim(ip->i_gl);
>   gfs2_glock_put(ip->i_gl);
> @@ -395,7 +394,7 @@ static void gfs2_delete_inode(struct ino
>   if (!inode->i_private)
>   goto out;
>  
> - error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB | 
> GL_NOCACHE, );
> + error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB, 
> );
>   if (unlikely(error)) {
>   gfs2_glock_dq_uninit(>i_iopen_gh);
>   goto out;
-- 
Russell Cattelan <[EMAIL PROTECTED]>


signature.asc
Description: This is a digitally signed message part

Re: [GFS2] Reduce number of arguments to meta_io.c:getbuf() [58/70]

2006-12-01 Thread Russell Cattelan

pace, dblock, CREATE);
> + first_bh = getbuf(gl, dblock, CREATE);
>  
>   if (buffer_uptodate(first_bh))
>   goto out;
> @@ -558,7 +556,7 @@ struct buffer_head *gfs2_meta_ra(struct 
>   extlen--;
>  
>   while (extlen) {
> - bh = getbuf(sdp, aspace, dblock, CREATE);
> + bh = getbuf(gl, dblock, CREATE);
>  
>   if (!buffer_uptodate(bh) && !buffer_locked(bh))
>   ll_rw_block(READA, 1, );
> -- 
> 1.4.1
> 
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-- 
Russell Cattelan <[EMAIL PROTECTED]>


signature.asc
Description: This is a digitally signed message part

Re: [GFS2] Simplify glops functions [53/70]

2006-12-01 Thread Russell Cattelan

gt;  {
> - int meta = (flags & DIO_METADATA);
> - int data = (flags & DIO_DATA);
> -
>   if (test_bit(GLF_DIRTY, >gl_flags)) {
> - if (meta && data) {
> - gfs2_page_writeback(gl);
> - gfs2_log_flush(gl->gl_sbd, gl);
> - gfs2_meta_sync(gl);
> - gfs2_page_wait(gl);
> - clear_bit(GLF_DIRTY, >gl_flags);
> - } else if (meta) {
> - gfs2_log_flush(gl->gl_sbd, gl);
> - gfs2_meta_sync(gl);
> - } else if (data) {
> - gfs2_page_writeback(gl);
> - gfs2_page_wait(gl);
> - }
> - if (flags & DIO_RELEASE)
> - gfs2_ail_empty_gl(gl);
> + gfs2_page_writeback(gl);
> + gfs2_log_flush(gl->gl_sbd, gl);
> + gfs2_meta_sync(gl);
> + gfs2_page_wait(gl);
> + clear_bit(GLF_DIRTY, >gl_flags);
> + gfs2_ail_empty_gl(gl);
>   }
>  }
>  
> @@ -302,15 +284,13 @@ static void inode_go_sync(struct gfs2_gl
>  static void inode_go_inval(struct gfs2_glock *gl, int flags)
>  {
>   int meta = (flags & DIO_METADATA);
> - int data = (flags & DIO_DATA);
>  
>   if (meta) {
>   struct gfs2_inode *ip = gl->gl_object;
>   gfs2_meta_inval(gl);
>   set_bit(GIF_INVALID, >i_flags);
>   }
> - if (data)
> - gfs2_page_inval(gl);
> + gfs2_page_inval(gl);
>  }
>  
>  /**
> @@ -494,7 +474,7 @@ static void trans_go_xmote_bh(struct gfs
>   if (gl->gl_state != LM_ST_UNLOCKED &&
>   test_bit(SDF_JOURNAL_LIVE, >sd_flags)) {
>   gfs2_meta_cache_flush(GFS2_I(sdp->sd_jdesc->jd_inode));
> - j_gl->gl_ops->go_inval(j_gl, DIO_METADATA | DIO_DATA);
> + j_gl->gl_ops->go_inval(j_gl, DIO_METADATA);
>  
>   error = gfs2_find_jhead(sdp->sd_jdesc, );
>   if (error)
> diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
> index 227a74d..734421e 100644
> --- a/fs/gfs2/incore.h
> +++ b/fs/gfs2/incore.h
> @@ -14,8 +14,6 @@ #include 
>  
>  #define DIO_WAIT 0x0010
>  #define DIO_METADATA 0x0020
> -#define DIO_DATA 0x0040
> -#define DIO_RELEASE  0x0080
>  #define DIO_ALL  0x0100
>  
>  struct gfs2_log_operations;
> @@ -103,18 +101,17 @@ struct gfs2_bufdata {
>  };
>  
>  struct gfs2_glock_operations {
> - void (*go_xmote_th) (struct gfs2_glock * gl, unsigned int state,
> -  int flags);
> - void (*go_xmote_bh) (struct gfs2_glock * gl);
> - void (*go_drop_th) (struct gfs2_glock * gl);
> - void (*go_drop_bh) (struct gfs2_glock * gl);
> - void (*go_sync) (struct gfs2_glock * gl, int flags);
> - void (*go_inval) (struct gfs2_glock * gl, int flags);
> - int (*go_demote_ok) (struct gfs2_glock * gl);
> - int (*go_lock) (struct gfs2_holder * gh);
> - void (*go_unlock) (struct gfs2_holder * gh);
> - void (*go_callback) (struct gfs2_glock * gl, unsigned int state);
> - void (*go_greedy) (struct gfs2_glock * gl);
> + void (*go_xmote_th) (struct gfs2_glock *gl, unsigned int state, int 
> flags);
> + void (*go_xmote_bh) (struct gfs2_glock *gl);
> + void (*go_drop_th) (struct gfs2_glock *gl);
> + void (*go_drop_bh) (struct gfs2_glock *gl);
> + void (*go_sync) (struct gfs2_glock *gl);
> + void (*go_inval) (struct gfs2_glock *gl, int flags);
> + int (*go_demote_ok) (struct gfs2_glock *gl);
> + int (*go_lock) (struct gfs2_holder *gh);
> + void (*go_unlock) (struct gfs2_holder *gh);
> + void (*go_callback) (struct gfs2_glock *gl, unsigned int state);
> + void (*go_greedy) (struct gfs2_glock *gl);
>   const int go_type;
>  };
>  
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index 0ef8317..1408c5f 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -517,7 +517,7 @@ int gfs2_make_fs_rw(struct gfs2_sbd *sdp
>   return error;
>  
>   gfs2_meta_cache_flush(ip);
> - j_gl->gl_ops->go_inval(j_gl, DIO_METADATA | DIO_DATA);
> + j_gl->gl_ops->go_inval(j_gl, DIO_METADATA);
>  
>   error = gfs2_find_jhead(sdp->sd_jdesc, );
>   if (error)
-- 
Russell Cattelan <[EMAIL PROTECTED]>


signature.asc
Description: This is a digitally signed message part

Re: [GFS2 & DLM] Guide to -nmw tree patches

2006-12-01 Thread Russell Cattelan

On Thu, 2006-11-30 at 13:21 +0100, Arjan van de Ven wrote:
> On Thu, 2006-11-30 at 12:12 +, Steven Whitehouse wrote:
> > Hi,
> > 
> > Below is a summary diffstat of all the changes in the GFS2 & DLM -nmw
> > (next merge window) git tree. Since merge time is once again upon us,
> > the following patches are the current complete content of the tree.
> 
> 
> Hi,
> 
> can you please make sure your patch series is in reply to your first
> message, so that your mails properly thread in mail clients?
> 
> (everyone else seems to manage this, I bet the git or quilt patch series
> send stuff has this built in somehow..)

I agree.

I would also continue to urge for internal reviews (since this is
the first I have seen any of these changes) before spamming lkml
with 70 loosely related patches.

> 
> Greetings,
>   Arjan van de Ven
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-- 
Russell Cattelan <[EMAIL PROTECTED]>


signature.asc
Description: This is a digitally signed message part

Re: [GFS2] Remove unused function from inode.c [50/70]

2006-12-01 Thread Russell Cattelan

;  #endif /* __INODE_DOT_H__ */
> diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c
> index 2f7ef98..8676c39 100644
> --- a/fs/gfs2/ops_address.c
> +++ b/fs/gfs2/ops_address.c
> @@ -217,7 +217,7 @@ static int gfs2_readpage(struct file *fi
>   }
>   gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 
> GL_ATIME|LM_FLAG_TRY_1CB, );
>   do_unlock = 1;
> - error = gfs2_glock_nq_m_atime(1, );
> + error = gfs2_glock_nq_atime();
>   if (unlikely(error))
>   goto out_unlock;
>   }
> @@ -282,7 +282,7 @@ static int gfs2_readpages(struct file *f
>   gfs2_holder_init(ip->i_gl, LM_ST_SHARED,
>LM_FLAG_TRY_1CB|GL_ATIME, );
>   do_unlock = 1;
> - ret = gfs2_glock_nq_m_atime(1, );
> + ret = gfs2_glock_nq_atime();
>   if (ret == GLR_TRYFAILED)
>   goto out_noerror;
>   if (unlikely(ret))
> @@ -354,7 +354,7 @@ static int gfs2_prepare_write(struct fil
>  
> 
>   gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_ATIME|LM_FLAG_TRY_1CB, 
> >i_gh);
> - error = gfs2_glock_nq_m_atime(1, >i_gh);
> + error = gfs2_glock_nq_atime(>i_gh);
>   if (unlikely(error)) {
>   if (error == GLR_TRYFAILED)
>   error = AOP_TRUNCATED_PAGE;
> @@ -609,7 +609,7 @@ static ssize_t gfs2_direct_IO(int rw, st
>* on this path. All we need change is atime.
>*/
>   gfs2_holder_init(ip->i_gl, LM_ST_SHARED, GL_ATIME, );
> - rv = gfs2_glock_nq_m_atime(1, );
> + rv = gfs2_glock_nq_atime();
>   if (rv)
>   goto out;
>  
> diff --git a/fs/gfs2/ops_file.c b/fs/gfs2/ops_file.c
> index eabf6c6..c2be216 100644
> --- a/fs/gfs2/ops_file.c
> +++ b/fs/gfs2/ops_file.c
> @@ -253,7 +253,7 @@ static int gfs2_get_flags(struct file *f
>   u32 fsflags;
>  
>   gfs2_holder_init(ip->i_gl, LM_ST_SHARED, GL_ATIME, );
> - error = gfs2_glock_nq_m_atime(1, );
> + error = gfs2_glock_nq_atime();
>   if (error)
>   return error;
>  
-- 
Russell Cattelan <[EMAIL PROTECTED]>


signature.asc
Description: This is a digitally signed message part

Re: [GFS2] Shrink gfs2_inode (8) - i_vn [28/70]

2006-12-01 Thread Russell Cattelan

On Thu, 2006-11-30 at 12:16 +, Steven Whitehouse wrote:
> >From bfded27ba010d1c3b0aa3843f97dc9b80de751be Mon Sep 17 00:00:00 2001
> From: Steven Whitehouse <[EMAIL PROTECTED]>
> Date: Wed, 1 Nov 2006 16:05:38 -0500
> Subject: [PATCH] [GFS2] Shrink gfs2_inode (8) - i_vn
> 
> This shrinks the size of the gfs2_inode by 8 bytes by
> replacing the version counter with a one bit valid/invalid
> flag.
What is the version number used for?
It seems like anything that was specifically carving a 64 container
has a more specific reason that just ON/OFF?



> 
> Signed-off-by: Steven Whitehouse <[EMAIL PROTECTED]>
> ---
>  fs/gfs2/glops.c |5 +++--
>  fs/gfs2/incore.h|2 +-
>  fs/gfs2/inode.c |4 ++--
>  fs/gfs2/ops_inode.c |2 +-
>  4 files changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
> index aad45b7..9c20337 100644
> --- a/fs/gfs2/glops.c
> +++ b/fs/gfs2/glops.c
> @@ -305,8 +305,9 @@ static void inode_go_inval(struct gfs2_g
>   int data = (flags & DIO_DATA);
>  
>   if (meta) {
> + struct gfs2_inode *ip = gl->gl_object;
>   gfs2_meta_inval(gl);
> - gl->gl_vn++;
> + set_bit(GIF_INVALID, >i_flags);
>   }
>   if (data)
>   gfs2_page_inval(gl);
> @@ -351,7 +352,7 @@ static int inode_go_lock(struct gfs2_hol
>   if (!ip)
>   return 0;
>  
> - if (ip->i_vn != gl->gl_vn) {
> + if (test_bit(GIF_INVALID, >i_flags)) {
>   error = gfs2_inode_refresh(ip);
>   if (error)
>   return error;
> diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
> index c0a8c3b..227a74d 100644
> --- a/fs/gfs2/incore.h
> +++ b/fs/gfs2/incore.h
> @@ -217,6 +217,7 @@ struct gfs2_alloc {
>  };
>  
>  enum {
> + GIF_INVALID = 0,
>   GIF_QD_LOCKED   = 1,
>   GIF_PAGED   = 2,
>   GIF_SW_PAGED= 3,
> @@ -228,7 +229,6 @@ struct gfs2_inode {
>  
>   unsigned long i_flags;  /* GIF_... */
>  
> - u64 i_vn;
>   struct gfs2_dinode_host i_di; /* To be replaced by ref to block */
>  
>   struct gfs2_glock *i_gl; /* Move into i_gh? */
> diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
> index f6177fc..e467780 100644
> --- a/fs/gfs2/inode.c
> +++ b/fs/gfs2/inode.c
> @@ -145,7 +145,7 @@ struct inode *gfs2_inode_lookup(struct s
>   if (unlikely(error))
>   goto fail_put;
>  
> - ip->i_vn = ip->i_gl->gl_vn - 1;
> + set_bit(GIF_INVALID, >i_flags);
>   error = gfs2_glock_nq_init(io_gl, LM_ST_SHARED, GL_EXACT, 
> >i_iopen_gh);
>   if (unlikely(error))
>   goto fail_iopen;
> @@ -242,7 +242,7 @@ int gfs2_inode_refresh(struct gfs2_inode
>  
>   error = gfs2_dinode_in(ip, dibh->b_data);
>   brelse(dibh);
> - ip->i_vn = ip->i_gl->gl_vn;
> + clear_bit(GIF_INVALID, >i_flags);
>  
>   return error;
>  }
> diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c
> index 0e4eade..b247f25 100644
> --- a/fs/gfs2/ops_inode.c
> +++ b/fs/gfs2/ops_inode.c
> @@ -844,7 +844,7 @@ static int gfs2_permission(struct inode 
>   struct gfs2_holder i_gh;
>   int error;
>  
> - if (ip->i_vn == ip->i_gl->gl_vn)
> + if (!test_bit(GIF_INVALID, >i_flags))
>   return generic_permission(inode, mask, gfs2_check_acl);
>  
>   error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, LM_FLAG_ANY, _gh);
-- 
Russell Cattelan <[EMAIL PROTECTED]>


signature.asc
Description: This is a digitally signed message part

Re: [GFS2] Shrink gfs2_inode (6) - di_atime/di_mtime/di_ctime [26/70]

2006-12-01 Thread Russell Cattelan

  printk(KERN_INFO "  di_size = %llu\n", (unsigned long long)di->di_size);
>   printk(KERN_INFO "  di_blocks = %llu\n", (unsigned long 
> long)di->di_blocks);
> - printk(KERN_INFO "  di_atime = %lld\n", (long long)di->di_atime);
> - printk(KERN_INFO "  di_mtime = %lld\n", (long long)di->di_mtime);
> - printk(KERN_INFO "  di_ctime = %lld\n", (long long)di->di_ctime);
> -
>   printk(KERN_INFO "  di_goal_meta = %llu\n", (unsigned long 
> long)di->di_goal_meta);
>   printk(KERN_INFO "  di_goal_data = %llu\n", (unsigned long 
> long)di->di_goal_data);
>  
> diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c
> index 38b702a..5c3962c 100644
> --- a/fs/gfs2/ops_address.c
> +++ b/fs/gfs2/ops_address.c
> @@ -498,10 +498,6 @@ static int gfs2_commit_write(struct file
>   di->di_size = cpu_to_be64(inode->i_size);
>   }
>  
> - di->di_atime = cpu_to_be64(inode->i_atime.tv_sec);
> - di->di_mtime = cpu_to_be64(inode->i_mtime.tv_sec);
> - di->di_ctime = cpu_to_be64(inode->i_ctime.tv_sec);
> -
>   brelse(dibh);
>   gfs2_trans_end(sdp);
>   if (al->al_requested) {
> diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c
> index 06176de..585b43a 100644
> --- a/fs/gfs2/ops_inode.c
> +++ b/fs/gfs2/ops_inode.c
> @@ -729,7 +729,7 @@ static int gfs2_rename(struct inode *odi
>   error = gfs2_meta_inode_buffer(ip, );
>   if (error)
>   goto out_end_trans;
> - ip->i_di.di_ctime = get_seconds();
> + ip->i_inode.i_ctime.tv_sec = get_seconds();
>   gfs2_trans_add_bh(ip->i_gl, dibh, 1);
>   gfs2_dinode_out(ip, dibh->b_data);
>   brelse(dibh);
> @@ -915,7 +915,6 @@ static int setattr_chown(struct inode *i
>  
>   error = inode_setattr(inode, attr);
>   gfs2_assert_warn(sdp, !error);
> - gfs2_inode_attr_out(ip);
>  
>   gfs2_trans_add_bh(ip->i_gl, dibh, 1);
>   gfs2_dinode_out(ip, dibh->b_data);
> diff --git a/include/linux/gfs2_ondisk.h b/include/linux/gfs2_ondisk.h
> index c61517b..7f5a4a1 100644
> --- a/include/linux/gfs2_ondisk.h
> +++ b/include/linux/gfs2_ondisk.h
> @@ -324,9 +324,6 @@ struct gfs2_dinode {
>  struct gfs2_dinode_host {
>   __u64 di_size;  /* number of bytes in file */
>   __u64 di_blocks;/* number of blocks in file */
> - __u64 di_atime; /* time last accessed */
> - __u64 di_mtime; /* time last modified */
> - __u64 di_ctime; /* time last changed */
>  
>   /* This section varies from gfs1. Padding added to align with
>   * remainder of dinode
-- 
Russell Cattelan <[EMAIL PROTECTED]>


signature.asc
Description: This is a digitally signed message part

Re: [GFS2] Shrink gfs2_inode (5) - di_nlink [25/70]

2006-12-01 Thread Russell Cattelan

  return -EMLINK;
>  
>   return 0;
> @@ -808,7 +814,7 @@ static int link_dinode(struct gfs2_inode
>   error = gfs2_meta_inode_buffer(ip, );
>   if (error)
>   goto fail_end_trans;
> - ip->i_di.di_nlink = 1;
> + ip->i_inode.i_nlink = 1;
>   gfs2_trans_add_bh(ip->i_gl, dibh, 1);
>   gfs2_dinode_out(ip, dibh->b_data);
>   brelse(dibh);
> @@ -1016,7 +1022,12 @@ int gfs2_rmdiri(struct gfs2_inode *dip, 
>   if (error)
>   return error;
>  
> - error = gfs2_change_nlink(ip, -2);
> + /* It looks odd, but it really should be done twice */
> + error = gfs2_change_nlink(ip, -1);
> + if (error)
> + return error;
> +
> + error = gfs2_change_nlink(ip, -1);
>   if (error)
>   return error;
>  
> diff --git a/fs/gfs2/ondisk.c b/fs/gfs2/ondisk.c
> index e224f6a..b4e354b 100644
> --- a/fs/gfs2/ondisk.c
> +++ b/fs/gfs2/ondisk.c
> @@ -164,7 +164,7 @@ void gfs2_dinode_out(const struct gfs2_i
>   str->di_mode = cpu_to_be32(ip->i_inode.i_mode);
>   str->di_uid = cpu_to_be32(ip->i_inode.i_uid);
>   str->di_gid = cpu_to_be32(ip->i_inode.i_gid);
> - str->di_nlink = cpu_to_be32(di->di_nlink);
> + str->di_nlink = cpu_to_be32(ip->i_inode.i_nlink);
>   str->di_size = cpu_to_be64(di->di_size);
>   str->di_blocks = cpu_to_be64(di->di_blocks);
>   str->di_atime = cpu_to_be64(di->di_atime);
> @@ -191,7 +191,6 @@ void gfs2_dinode_print(const struct gfs2
>  
>   gfs2_inum_print(>i_num);
>  
> - pv(di, di_nlink, "%u");
>   printk(KERN_INFO "  di_size = %llu\n", (unsigned long long)di->di_size);
>   printk(KERN_INFO "  di_blocks = %llu\n", (unsigned long 
> long)di->di_blocks);
>   printk(KERN_INFO "  di_atime = %lld\n", (long long)di->di_atime);
> diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c
> index efbcec3..06176de 100644
> --- a/fs/gfs2/ops_inode.c
> +++ b/fs/gfs2/ops_inode.c
> @@ -169,7 +169,7 @@ static int gfs2_link(struct dentry *old_
>   }
>  
>   error = -EINVAL;
> - if (!dip->i_di.di_nlink)
> + if (!dip->i_inode.i_nlink)
>   goto out_gunlock;
>   error = -EFBIG;
>   if (dip->i_di.di_entries == (u32)-1)
> @@ -178,10 +178,10 @@ static int gfs2_link(struct dentry *old_
>   if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
>   goto out_gunlock;
>   error = -EINVAL;
> - if (!ip->i_di.di_nlink)
> + if (!ip->i_inode.i_nlink)
>   goto out_gunlock;
>   error = -EMLINK;
> - if (ip->i_di.di_nlink == (u32)-1)
> + if (ip->i_inode.i_nlink == (u32)-1)
>   goto out_gunlock;
>  
>   alloc_required = error = gfs2_diradd_alloc_required(dir, 
> >d_name);
> @@ -386,7 +386,7 @@ static int gfs2_mkdir(struct inode *dir,
>  
>   ip = ghs[1].gh_gl->gl_object;
>  
> - ip->i_di.di_nlink = 2;
> + ip->i_inode.i_nlink = 2;
>   ip->i_di.di_size = sdp->sd_sb.sb_bsize - sizeof(struct gfs2_dinode);
>   ip->i_di.di_flags |= GFS2_DIF_JDATA;
>   ip->i_di.di_payload_format = GFS2_FORMAT_DE;
> @@ -636,7 +636,7 @@ static int gfs2_rename(struct inode *odi
>   };
>  
>   if (odip != ndip) {
> - if (!ndip->i_di.di_nlink) {
> + if (!ndip->i_inode.i_nlink) {
>   error = -EINVAL;
>   goto out_gunlock;
>   }
> @@ -645,7 +645,7 @@ static int gfs2_rename(struct inode *odi
>   goto out_gunlock;
>   }
>   if (S_ISDIR(ip->i_inode.i_mode) &&
> - ndip->i_di.di_nlink == (u32)-1) {
> + ndip->i_inode.i_nlink == (u32)-1) {
>   error = -EMLINK;
>   goto out_gunlock;
>   }
> diff --git a/include/linux/gfs2_ondisk.h b/include/linux/gfs2_ondisk.h
> index 896c7f8..c61517b 100644
> --- a/include/linux/gfs2_ondisk.h
> +++ b/include/linux/gfs2_ondisk.h
> @@ -322,7 +322,6 @@ struct gfs2_dinode {
>  };
>  
>  struct gfs2_dinode_host {
> - __u32 di_nlink; /* number of links to this file */
>   __u64 di_size;  /* number of bytes in file */
>   __u64 di_blocks;/* number of blocks in file */
>   __u64 di_atime; /* time last accessed */
-- 
Russell Cattelan <[EMAIL PROTECTED]>


signature.asc
Description: This is a digitally signed message part

Re: [GFS2] Fix incorrect fs sync behaviour [2/5]

2006-12-01 Thread Russell Cattelan

On Mon, 2006-11-06 at 11:07 +, Steven Whitehouse wrote:
> >From 4a221953ed121692aa25998451a57c7f4be8b4f6 Mon Sep 17 00:00:00 2001
> From: Steven Whitehouse <[EMAIL PROTECTED]>
> Date: Wed, 1 Nov 2006 09:57:57 -0500
> Subject: [PATCH] [GFS2] Fix incorrect fs sync behaviour.
> 
> This adds a sync_fs superblock operation for GFS2 and removes
> the journal flush from write_super in favour of sync_fs where it
> ought to be. This is more or less identical to the way in which ext3
> does this.
> 
> This bug was pointed out by Russell Cattelan <[EMAIL PROTECTED]>
> 
> Cc: Russell Cattelan <[EMAIL PROTECTED]>
> Signed-off-by: Steven Whitehouse <[EMAIL PROTECTED]>
> ---
>  fs/gfs2/ops_super.c |   44 
>  1 files changed, 28 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/gfs2/ops_super.c b/fs/gfs2/ops_super.c
> index 06f06f7..b47d959 100644
> --- a/fs/gfs2/ops_super.c
> +++ b/fs/gfs2/ops_super.c
> @@ -138,16 +138,27 @@ static void gfs2_put_super(struct super_
>  }
>  
>  /**
> - * gfs2_write_super - disk commit all incore transactions
> - * @sb: the filesystem
> + * gfs2_write_super
> + * @sb: the superblock
>   *
> - * This function is called every time sync(2) is called.
> - * After this exits, all dirty buffers are synced.
>   */
>  
>  static void gfs2_write_super(struct super_block *sb)
>  {
> + sb->s_dirt = 0;
This is a bit different than my original patch? 
Are you sure we don't need the s_lock here?


> +}
> +
> +/**
> + * gfs2_sync_fs - sync the filesystem
> + * @sb: the superblock
> + *
> + * Flushes the log to disk.
> + */
> +static int gfs2_sync_fs(struct super_block *sb, int wait)
> +{
> + sb->s_dirt = 0;
>   gfs2_log_flush(sb->s_fs_info, NULL);
> + return 0;
>  }
>  
>  /**
> @@ -452,17 +463,18 @@ static void gfs2_destroy_inode(struct in
>  }
>  
>  struct super_operations gfs2_super_ops = {
> - .alloc_inode = gfs2_alloc_inode,
> - .destroy_inode = gfs2_destroy_inode,
> - .write_inode = gfs2_write_inode,
> - .delete_inode = gfs2_delete_inode,
> - .put_super = gfs2_put_super,
> - .write_super = gfs2_write_super,
> - .write_super_lockfs = gfs2_write_super_lockfs,
> - .unlockfs = gfs2_unlockfs,
> - .statfs = gfs2_statfs,
> - .remount_fs = gfs2_remount_fs,
> - .clear_inode = gfs2_clear_inode,
> - .show_options = gfs2_show_options,
> + .alloc_inode= gfs2_alloc_inode,
> + .destroy_inode  = gfs2_destroy_inode,
> + .write_inode= gfs2_write_inode,
> + .delete_inode   = gfs2_delete_inode,
> + .put_super  = gfs2_put_super,
> + .write_super= gfs2_write_super,
> + .sync_fs= gfs2_sync_fs,
> + .write_super_lockfs = gfs2_write_super_lockfs,
> + .unlockfs   = gfs2_unlockfs,
> + .statfs = gfs2_statfs,
> + .remount_fs = gfs2_remount_fs,
> + .clear_inode= gfs2_clear_inode,
> + .show_options   = gfs2_show_options,
>  };
>  
-- 
Russell Cattelan <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [GFS2] Fix incorrect fs sync behaviour [2/5]

2006-12-01 Thread Russell Cattelan

On Mon, 2006-11-06 at 11:07 +, Steven Whitehouse wrote:
 From 4a221953ed121692aa25998451a57c7f4be8b4f6 Mon Sep 17 00:00:00 2001
 From: Steven Whitehouse [EMAIL PROTECTED]
 Date: Wed, 1 Nov 2006 09:57:57 -0500
 Subject: [PATCH] [GFS2] Fix incorrect fs sync behaviour.
 
 This adds a sync_fs superblock operation for GFS2 and removes
 the journal flush from write_super in favour of sync_fs where it
 ought to be. This is more or less identical to the way in which ext3
 does this.
 
 This bug was pointed out by Russell Cattelan [EMAIL PROTECTED]
 
 Cc: Russell Cattelan [EMAIL PROTECTED]
 Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]
 ---
  fs/gfs2/ops_super.c |   44 
  1 files changed, 28 insertions(+), 16 deletions(-)
 
 diff --git a/fs/gfs2/ops_super.c b/fs/gfs2/ops_super.c
 index 06f06f7..b47d959 100644
 --- a/fs/gfs2/ops_super.c
 +++ b/fs/gfs2/ops_super.c
 @@ -138,16 +138,27 @@ static void gfs2_put_super(struct super_
  }
  
  /**
 - * gfs2_write_super - disk commit all incore transactions
 - * @sb: the filesystem
 + * gfs2_write_super
 + * @sb: the superblock
   *
 - * This function is called every time sync(2) is called.
 - * After this exits, all dirty buffers are synced.
   */
  
  static void gfs2_write_super(struct super_block *sb)
  {
 + sb-s_dirt = 0;
This is a bit different than my original patch? 
Are you sure we don't need the s_lock here?


 +}
 +
 +/**
 + * gfs2_sync_fs - sync the filesystem
 + * @sb: the superblock
 + *
 + * Flushes the log to disk.
 + */
 +static int gfs2_sync_fs(struct super_block *sb, int wait)
 +{
 + sb-s_dirt = 0;
   gfs2_log_flush(sb-s_fs_info, NULL);
 + return 0;
  }
  
  /**
 @@ -452,17 +463,18 @@ static void gfs2_destroy_inode(struct in
  }
  
  struct super_operations gfs2_super_ops = {
 - .alloc_inode = gfs2_alloc_inode,
 - .destroy_inode = gfs2_destroy_inode,
 - .write_inode = gfs2_write_inode,
 - .delete_inode = gfs2_delete_inode,
 - .put_super = gfs2_put_super,
 - .write_super = gfs2_write_super,
 - .write_super_lockfs = gfs2_write_super_lockfs,
 - .unlockfs = gfs2_unlockfs,
 - .statfs = gfs2_statfs,
 - .remount_fs = gfs2_remount_fs,
 - .clear_inode = gfs2_clear_inode,
 - .show_options = gfs2_show_options,
 + .alloc_inode= gfs2_alloc_inode,
 + .destroy_inode  = gfs2_destroy_inode,
 + .write_inode= gfs2_write_inode,
 + .delete_inode   = gfs2_delete_inode,
 + .put_super  = gfs2_put_super,
 + .write_super= gfs2_write_super,
 + .sync_fs= gfs2_sync_fs,
 + .write_super_lockfs = gfs2_write_super_lockfs,
 + .unlockfs   = gfs2_unlockfs,
 + .statfs = gfs2_statfs,
 + .remount_fs = gfs2_remount_fs,
 + .clear_inode= gfs2_clear_inode,
 + .show_options   = gfs2_show_options,
  };
  
-- 
Russell Cattelan [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [GFS2] Shrink gfs2_inode (5) - di_nlink [25/70]

2006-12-01 Thread Russell Cattelan

;
 +
 + error = gfs2_change_nlink(ip, -1);
   if (error)
   return error;
  
 diff --git a/fs/gfs2/ondisk.c b/fs/gfs2/ondisk.c
 index e224f6a..b4e354b 100644
 --- a/fs/gfs2/ondisk.c
 +++ b/fs/gfs2/ondisk.c
 @@ -164,7 +164,7 @@ void gfs2_dinode_out(const struct gfs2_i
   str-di_mode = cpu_to_be32(ip-i_inode.i_mode);
   str-di_uid = cpu_to_be32(ip-i_inode.i_uid);
   str-di_gid = cpu_to_be32(ip-i_inode.i_gid);
 - str-di_nlink = cpu_to_be32(di-di_nlink);
 + str-di_nlink = cpu_to_be32(ip-i_inode.i_nlink);
   str-di_size = cpu_to_be64(di-di_size);
   str-di_blocks = cpu_to_be64(di-di_blocks);
   str-di_atime = cpu_to_be64(di-di_atime);
 @@ -191,7 +191,6 @@ void gfs2_dinode_print(const struct gfs2
  
   gfs2_inum_print(ip-i_num);
  
 - pv(di, di_nlink, %u);
   printk(KERN_INFO   di_size = %llu\n, (unsigned long long)di-di_size);
   printk(KERN_INFO   di_blocks = %llu\n, (unsigned long 
 long)di-di_blocks);
   printk(KERN_INFO   di_atime = %lld\n, (long long)di-di_atime);
 diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c
 index efbcec3..06176de 100644
 --- a/fs/gfs2/ops_inode.c
 +++ b/fs/gfs2/ops_inode.c
 @@ -169,7 +169,7 @@ static int gfs2_link(struct dentry *old_
   }
  
   error = -EINVAL;
 - if (!dip-i_di.di_nlink)
 + if (!dip-i_inode.i_nlink)
   goto out_gunlock;
   error = -EFBIG;
   if (dip-i_di.di_entries == (u32)-1)
 @@ -178,10 +178,10 @@ static int gfs2_link(struct dentry *old_
   if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
   goto out_gunlock;
   error = -EINVAL;
 - if (!ip-i_di.di_nlink)
 + if (!ip-i_inode.i_nlink)
   goto out_gunlock;
   error = -EMLINK;
 - if (ip-i_di.di_nlink == (u32)-1)
 + if (ip-i_inode.i_nlink == (u32)-1)
   goto out_gunlock;
  
   alloc_required = error = gfs2_diradd_alloc_required(dir, 
 dentry-d_name);
 @@ -386,7 +386,7 @@ static int gfs2_mkdir(struct inode *dir,
  
   ip = ghs[1].gh_gl-gl_object;
  
 - ip-i_di.di_nlink = 2;
 + ip-i_inode.i_nlink = 2;
   ip-i_di.di_size = sdp-sd_sb.sb_bsize - sizeof(struct gfs2_dinode);
   ip-i_di.di_flags |= GFS2_DIF_JDATA;
   ip-i_di.di_payload_format = GFS2_FORMAT_DE;
 @@ -636,7 +636,7 @@ static int gfs2_rename(struct inode *odi
   };
  
   if (odip != ndip) {
 - if (!ndip-i_di.di_nlink) {
 + if (!ndip-i_inode.i_nlink) {
   error = -EINVAL;
   goto out_gunlock;
   }
 @@ -645,7 +645,7 @@ static int gfs2_rename(struct inode *odi
   goto out_gunlock;
   }
   if (S_ISDIR(ip-i_inode.i_mode) 
 - ndip-i_di.di_nlink == (u32)-1) {
 + ndip-i_inode.i_nlink == (u32)-1) {
   error = -EMLINK;
   goto out_gunlock;
   }
 diff --git a/include/linux/gfs2_ondisk.h b/include/linux/gfs2_ondisk.h
 index 896c7f8..c61517b 100644
 --- a/include/linux/gfs2_ondisk.h
 +++ b/include/linux/gfs2_ondisk.h
 @@ -322,7 +322,6 @@ struct gfs2_dinode {
  };
  
  struct gfs2_dinode_host {
 - __u32 di_nlink; /* number of links to this file */
   __u64 di_size;  /* number of bytes in file */
   __u64 di_blocks;/* number of blocks in file */
   __u64 di_atime; /* time last accessed */
-- 
Russell Cattelan [EMAIL PROTECTED]


signature.asc
Description: This is a digitally signed message part

Re: [GFS2] Shrink gfs2_inode (8) - i_vn [28/70]

2006-12-01 Thread Russell Cattelan

On Thu, 2006-11-30 at 12:16 +, Steven Whitehouse wrote:
 From bfded27ba010d1c3b0aa3843f97dc9b80de751be Mon Sep 17 00:00:00 2001
 From: Steven Whitehouse [EMAIL PROTECTED]
 Date: Wed, 1 Nov 2006 16:05:38 -0500
 Subject: [PATCH] [GFS2] Shrink gfs2_inode (8) - i_vn
 
 This shrinks the size of the gfs2_inode by 8 bytes by
 replacing the version counter with a one bit valid/invalid
 flag.
What is the version number used for?
It seems like anything that was specifically carving a 64 container
has a more specific reason that just ON/OFF?



 
 Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]
 ---
  fs/gfs2/glops.c |5 +++--
  fs/gfs2/incore.h|2 +-
  fs/gfs2/inode.c |4 ++--
  fs/gfs2/ops_inode.c |2 +-
  4 files changed, 7 insertions(+), 6 deletions(-)
 
 diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
 index aad45b7..9c20337 100644
 --- a/fs/gfs2/glops.c
 +++ b/fs/gfs2/glops.c
 @@ -305,8 +305,9 @@ static void inode_go_inval(struct gfs2_g
   int data = (flags  DIO_DATA);
  
   if (meta) {
 + struct gfs2_inode *ip = gl-gl_object;
   gfs2_meta_inval(gl);
 - gl-gl_vn++;
 + set_bit(GIF_INVALID, ip-i_flags);
   }
   if (data)
   gfs2_page_inval(gl);
 @@ -351,7 +352,7 @@ static int inode_go_lock(struct gfs2_hol
   if (!ip)
   return 0;
  
 - if (ip-i_vn != gl-gl_vn) {
 + if (test_bit(GIF_INVALID, ip-i_flags)) {
   error = gfs2_inode_refresh(ip);
   if (error)
   return error;
 diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
 index c0a8c3b..227a74d 100644
 --- a/fs/gfs2/incore.h
 +++ b/fs/gfs2/incore.h
 @@ -217,6 +217,7 @@ struct gfs2_alloc {
  };
  
  enum {
 + GIF_INVALID = 0,
   GIF_QD_LOCKED   = 1,
   GIF_PAGED   = 2,
   GIF_SW_PAGED= 3,
 @@ -228,7 +229,6 @@ struct gfs2_inode {
  
   unsigned long i_flags;  /* GIF_... */
  
 - u64 i_vn;
   struct gfs2_dinode_host i_di; /* To be replaced by ref to block */
  
   struct gfs2_glock *i_gl; /* Move into i_gh? */
 diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
 index f6177fc..e467780 100644
 --- a/fs/gfs2/inode.c
 +++ b/fs/gfs2/inode.c
 @@ -145,7 +145,7 @@ struct inode *gfs2_inode_lookup(struct s
   if (unlikely(error))
   goto fail_put;
  
 - ip-i_vn = ip-i_gl-gl_vn - 1;
 + set_bit(GIF_INVALID, ip-i_flags);
   error = gfs2_glock_nq_init(io_gl, LM_ST_SHARED, GL_EXACT, 
 ip-i_iopen_gh);
   if (unlikely(error))
   goto fail_iopen;
 @@ -242,7 +242,7 @@ int gfs2_inode_refresh(struct gfs2_inode
  
   error = gfs2_dinode_in(ip, dibh-b_data);
   brelse(dibh);
 - ip-i_vn = ip-i_gl-gl_vn;
 + clear_bit(GIF_INVALID, ip-i_flags);
  
   return error;
  }
 diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c
 index 0e4eade..b247f25 100644
 --- a/fs/gfs2/ops_inode.c
 +++ b/fs/gfs2/ops_inode.c
 @@ -844,7 +844,7 @@ static int gfs2_permission(struct inode 
   struct gfs2_holder i_gh;
   int error;
  
 - if (ip-i_vn == ip-i_gl-gl_vn)
 + if (!test_bit(GIF_INVALID, ip-i_flags))
   return generic_permission(inode, mask, gfs2_check_acl);
  
   error = gfs2_glock_nq_init(ip-i_gl, LM_ST_SHARED, LM_FLAG_ANY, i_gh);
-- 
Russell Cattelan [EMAIL PROTECTED]


signature.asc
Description: This is a digitally signed message part

Re: [GFS2] Remove unused function from inode.c [50/70]

2006-12-01 Thread Russell Cattelan

, LM_ST_SHARED,
LM_FLAG_TRY_1CB|GL_ATIME, gh);
   do_unlock = 1;
 - ret = gfs2_glock_nq_m_atime(1, gh);
 + ret = gfs2_glock_nq_atime(gh);
   if (ret == GLR_TRYFAILED)
   goto out_noerror;
   if (unlikely(ret))
 @@ -354,7 +354,7 @@ static int gfs2_prepare_write(struct fil
  
 
   gfs2_holder_init(ip-i_gl, LM_ST_EXCLUSIVE, GL_ATIME|LM_FLAG_TRY_1CB, 
 ip-i_gh);
 - error = gfs2_glock_nq_m_atime(1, ip-i_gh);
 + error = gfs2_glock_nq_atime(ip-i_gh);
   if (unlikely(error)) {
   if (error == GLR_TRYFAILED)
   error = AOP_TRUNCATED_PAGE;
 @@ -609,7 +609,7 @@ static ssize_t gfs2_direct_IO(int rw, st
* on this path. All we need change is atime.
*/
   gfs2_holder_init(ip-i_gl, LM_ST_SHARED, GL_ATIME, gh);
 - rv = gfs2_glock_nq_m_atime(1, gh);
 + rv = gfs2_glock_nq_atime(gh);
   if (rv)
   goto out;
  
 diff --git a/fs/gfs2/ops_file.c b/fs/gfs2/ops_file.c
 index eabf6c6..c2be216 100644
 --- a/fs/gfs2/ops_file.c
 +++ b/fs/gfs2/ops_file.c
 @@ -253,7 +253,7 @@ static int gfs2_get_flags(struct file *f
   u32 fsflags;
  
   gfs2_holder_init(ip-i_gl, LM_ST_SHARED, GL_ATIME, gh);
 - error = gfs2_glock_nq_m_atime(1, gh);
 + error = gfs2_glock_nq_atime(gh);
   if (error)
   return error;
  
-- 
Russell Cattelan [EMAIL PROTECTED]


signature.asc
Description: This is a digitally signed message part

Re: [GFS2 DLM] Guide to -nmw tree patches

2006-12-01 Thread Russell Cattelan

On Thu, 2006-11-30 at 13:21 +0100, Arjan van de Ven wrote:
 On Thu, 2006-11-30 at 12:12 +, Steven Whitehouse wrote:
  Hi,
  
  Below is a summary diffstat of all the changes in the GFS2  DLM -nmw
  (next merge window) git tree. Since merge time is once again upon us,
  the following patches are the current complete content of the tree.
 
 
 Hi,
 
 can you please make sure your patch series is in reply to your first
 message, so that your mails properly thread in mail clients?
 
 (everyone else seems to manage this, I bet the git or quilt patch series
 send stuff has this built in somehow..)

I agree.

I would also continue to urge for internal reviews (since this is
the first I have seen any of these changes) before spamming lkml
with 70 loosely related patches.

 
 Greetings,
   Arjan van de Ven
 
 -
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
-- 
Russell Cattelan [EMAIL PROTECTED]


signature.asc
Description: This is a digitally signed message part

Re: [GFS2] Simplify glops functions [53/70]

2006-12-01 Thread Russell Cattelan

);
 - gfs2_page_wait(gl);
 - }
 - if (flags  DIO_RELEASE)
 - gfs2_ail_empty_gl(gl);
 + gfs2_page_writeback(gl);
 + gfs2_log_flush(gl-gl_sbd, gl);
 + gfs2_meta_sync(gl);
 + gfs2_page_wait(gl);
 + clear_bit(GLF_DIRTY, gl-gl_flags);
 + gfs2_ail_empty_gl(gl);
   }
  }
  
 @@ -302,15 +284,13 @@ static void inode_go_sync(struct gfs2_gl
  static void inode_go_inval(struct gfs2_glock *gl, int flags)
  {
   int meta = (flags  DIO_METADATA);
 - int data = (flags  DIO_DATA);
  
   if (meta) {
   struct gfs2_inode *ip = gl-gl_object;
   gfs2_meta_inval(gl);
   set_bit(GIF_INVALID, ip-i_flags);
   }
 - if (data)
 - gfs2_page_inval(gl);
 + gfs2_page_inval(gl);
  }
  
  /**
 @@ -494,7 +474,7 @@ static void trans_go_xmote_bh(struct gfs
   if (gl-gl_state != LM_ST_UNLOCKED 
   test_bit(SDF_JOURNAL_LIVE, sdp-sd_flags)) {
   gfs2_meta_cache_flush(GFS2_I(sdp-sd_jdesc-jd_inode));
 - j_gl-gl_ops-go_inval(j_gl, DIO_METADATA | DIO_DATA);
 + j_gl-gl_ops-go_inval(j_gl, DIO_METADATA);
  
   error = gfs2_find_jhead(sdp-sd_jdesc, head);
   if (error)
 diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
 index 227a74d..734421e 100644
 --- a/fs/gfs2/incore.h
 +++ b/fs/gfs2/incore.h
 @@ -14,8 +14,6 @@ #include linux/fs.h
  
  #define DIO_WAIT 0x0010
  #define DIO_METADATA 0x0020
 -#define DIO_DATA 0x0040
 -#define DIO_RELEASE  0x0080
  #define DIO_ALL  0x0100
  
  struct gfs2_log_operations;
 @@ -103,18 +101,17 @@ struct gfs2_bufdata {
  };
  
  struct gfs2_glock_operations {
 - void (*go_xmote_th) (struct gfs2_glock * gl, unsigned int state,
 -  int flags);
 - void (*go_xmote_bh) (struct gfs2_glock * gl);
 - void (*go_drop_th) (struct gfs2_glock * gl);
 - void (*go_drop_bh) (struct gfs2_glock * gl);
 - void (*go_sync) (struct gfs2_glock * gl, int flags);
 - void (*go_inval) (struct gfs2_glock * gl, int flags);
 - int (*go_demote_ok) (struct gfs2_glock * gl);
 - int (*go_lock) (struct gfs2_holder * gh);
 - void (*go_unlock) (struct gfs2_holder * gh);
 - void (*go_callback) (struct gfs2_glock * gl, unsigned int state);
 - void (*go_greedy) (struct gfs2_glock * gl);
 + void (*go_xmote_th) (struct gfs2_glock *gl, unsigned int state, int 
 flags);
 + void (*go_xmote_bh) (struct gfs2_glock *gl);
 + void (*go_drop_th) (struct gfs2_glock *gl);
 + void (*go_drop_bh) (struct gfs2_glock *gl);
 + void (*go_sync) (struct gfs2_glock *gl);
 + void (*go_inval) (struct gfs2_glock *gl, int flags);
 + int (*go_demote_ok) (struct gfs2_glock *gl);
 + int (*go_lock) (struct gfs2_holder *gh);
 + void (*go_unlock) (struct gfs2_holder *gh);
 + void (*go_callback) (struct gfs2_glock *gl, unsigned int state);
 + void (*go_greedy) (struct gfs2_glock *gl);
   const int go_type;
  };
  
 diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
 index 0ef8317..1408c5f 100644
 --- a/fs/gfs2/super.c
 +++ b/fs/gfs2/super.c
 @@ -517,7 +517,7 @@ int gfs2_make_fs_rw(struct gfs2_sbd *sdp
   return error;
  
   gfs2_meta_cache_flush(ip);
 - j_gl-gl_ops-go_inval(j_gl, DIO_METADATA | DIO_DATA);
 + j_gl-gl_ops-go_inval(j_gl, DIO_METADATA);
  
   error = gfs2_find_jhead(sdp-sd_jdesc, head);
   if (error)
-- 
Russell Cattelan [EMAIL PROTECTED]


signature.asc
Description: This is a digitally signed message part

Re: [GFS2] Reduce number of arguments to meta_io.c:getbuf() [58/70]

2006-12-01 Thread Russell Cattelan

 of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
-- 
Russell Cattelan [EMAIL PROTECTED]


signature.asc
Description: This is a digitally signed message part

Re: [GFS2] Fix journal flush problem [56/70]

2006-12-01 Thread Russell Cattelan

;
  
 + BUG_ON(!gl);
 + BUG_ON(!sdp);
 +
   spin_lock(ip-i_spin);
   if (*bh_slot  (*bh_slot)-b_blocknr == num) {
   bh = *bh_slot;
 diff --git a/fs/gfs2/ops_super.c b/fs/gfs2/ops_super.c
 index 8635175..7685b46 100644
 --- a/fs/gfs2/ops_super.c
 +++ b/fs/gfs2/ops_super.c
 @@ -157,7 +157,8 @@ static void gfs2_write_super(struct supe
  static int gfs2_sync_fs(struct super_block *sb, int wait)
  {
   sb-s_dirt = 0;
 - gfs2_log_flush(sb-s_fs_info, NULL);
 + if (wait)
 + gfs2_log_flush(sb-s_fs_info, NULL);
   return 0;
  }
  
 @@ -293,8 +294,6 @@ static void gfs2_clear_inode(struct inod
*/
   if (inode-i_private) {
   struct gfs2_inode *ip = GFS2_I(inode);
 - gfs2_glock_inode_squish(inode);
 - gfs2_assert(inode-i_sb-s_fs_info, ip-i_gl-gl_state == 
 LM_ST_UNLOCKED);
   ip-i_gl-gl_object = NULL;
   gfs2_glock_schedule_for_reclaim(ip-i_gl);
   gfs2_glock_put(ip-i_gl);
 @@ -395,7 +394,7 @@ static void gfs2_delete_inode(struct ino
   if (!inode-i_private)
   goto out;
  
 - error = gfs2_glock_nq_init(ip-i_gl, LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB | 
 GL_NOCACHE, gh);
 + error = gfs2_glock_nq_init(ip-i_gl, LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB, 
 gh);
   if (unlikely(error)) {
   gfs2_glock_dq_uninit(ip-i_iopen_gh);
   goto out;
-- 
Russell Cattelan [EMAIL PROTECTED]


signature.asc
Description: This is a digitally signed message part

Re: [GFS2] Change argument to gfs2_dinode_in [18/70]

2006-12-01 Thread Russell Cattelan

On Thu, 2006-11-30 at 12:15 +, Steven Whitehouse wrote:
 From 891ea14712da68e282de8583e5fa14f0d3f3731e Mon Sep 17 00:00:00 2001
 From: Steven Whitehouse [EMAIL PROTECTED]
 Date: Tue, 31 Oct 2006 15:22:10 -0500
 Subject: [PATCH] [GFS2] Change argument to gfs2_dinode_in
 
 This is a preliminary patch to enable the removal of fields
 in gfs2_dinode_host which are duplicated in struct inode.
Deferred till the complete cleanup work is done?


 
 Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]
 ---
  fs/gfs2/inode.c |2 +-
  fs/gfs2/ondisk.c|4 ++--
  include/linux/gfs2_ondisk.h |2 +-
  3 files changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
 index b861ddb..9875e93 100644
 --- a/fs/gfs2/inode.c
 +++ b/fs/gfs2/inode.c
 @@ -229,7 +229,7 @@ int gfs2_inode_refresh(struct gfs2_inode
   return -EIO;
   }
  
 - gfs2_dinode_in(ip-i_di, dibh-b_data);
 + gfs2_dinode_in(ip, dibh-b_data);
  
   brelse(dibh);
  
 diff --git a/fs/gfs2/ondisk.c b/fs/gfs2/ondisk.c
 index 2c50fa0..edf8756 100644
 --- a/fs/gfs2/ondisk.c
 +++ b/fs/gfs2/ondisk.c
 @@ -155,8 +155,9 @@ void gfs2_quota_in(struct gfs2_quota_hos
   qu-qu_value = be64_to_cpu(str-qu_value);
  }
  
 -void gfs2_dinode_in(struct gfs2_dinode_host *di, const void *buf)
 +void gfs2_dinode_in(struct gfs2_inode *ip, const void *buf)
  {
 + struct gfs2_dinode_host *di = ip-i_di;
   const struct gfs2_dinode *str = buf;
  
   gfs2_meta_header_in(di-di_header, buf);
 @@ -186,7 +187,6 @@ void gfs2_dinode_in(struct gfs2_dinode_h
   di-di_entries = be32_to_cpu(str-di_entries);
  
   di-di_eattr = be64_to_cpu(str-di_eattr);
 -
  }
  
  void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf)
 diff --git a/include/linux/gfs2_ondisk.h b/include/linux/gfs2_ondisk.h
 index 550effa..08d8240 100644
 --- a/include/linux/gfs2_ondisk.h
 +++ b/include/linux/gfs2_ondisk.h
 @@ -534,8 +534,8 @@ extern void gfs2_rindex_out(const struct
  extern void gfs2_rgrp_in(struct gfs2_rgrp_host *rg, const void *buf);
  extern void gfs2_rgrp_out(const struct gfs2_rgrp_host *rg, void *buf);
  extern void gfs2_quota_in(struct gfs2_quota_host *qu, const void *buf);
 -extern void gfs2_dinode_in(struct gfs2_dinode_host *di, const void *buf);
  struct gfs2_inode;
 +extern void gfs2_dinode_in(struct gfs2_inode *ip, const void *buf);
  extern void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf);
  extern void gfs2_ea_header_in(struct gfs2_ea_header *ea, const void *buf);
  extern void gfs2_ea_header_out(const struct gfs2_ea_header *ea, void *buf);
-- 
Russell Cattelan [EMAIL PROTECTED]


signature.asc
Description: This is a digitally signed message part

Re: [GFS2] Tidy up bmap fix boundary bug [48/70]

2006-12-01 Thread Russell Cattelan

);
 + set_buffer_boundary(bh_map);
   if (new) {
   struct buffer_head *dibh;
   error = gfs2_meta_inode_buffer(ip, dibh);
 @@ -510,8 +531,8 @@ static int gfs2_block_pointers(struct in
   while(--maxlen  !buffer_boundary(bh_map)) {
   u64 eblock;
  
 - mp-mp_list[end_of_metadata]++;
 - boundary = lookup_block(ip, bh, end_of_metadata, mp, 0, 
 new, eblock);
 + mp.mp_list[end_of_metadata]++;
 + boundary = lookup_block(ip, bh, end_of_metadata, mp, 
 0, new, eblock);
   if (eblock != ++dblock)
   break;
   bh_map-b_size += (1  inode-i_blkbits);
 @@ -521,43 +542,15 @@ static int gfs2_block_pointers(struct in
   }
  out_brelse:
   brelse(bh);
 - return 0;
 -}
 -
 -
 -static inline void bmap_lock(struct inode *inode, int create)
 -{
 - struct gfs2_inode *ip = GFS2_I(inode);
 - if (create)
 - down_write(ip-i_rw_mutex);
 - else
 - down_read(ip-i_rw_mutex);
 -}
 -
 -static inline void bmap_unlock(struct inode *inode, int create)
 -{
 - struct gfs2_inode *ip = GFS2_I(inode);
 - if (create)
 - up_write(ip-i_rw_mutex);
 - else
 - up_read(ip-i_rw_mutex);
 -}
 -
 -int gfs2_block_map(struct inode *inode, u64 lblock, int create,
 -struct buffer_head *bh)
 -{
 - struct metapath mp;
 - int ret;
 -
 - bmap_lock(inode, create);
 - ret = gfs2_block_pointers(inode, lblock, create, bh, mp);
 +out_ok:
 + error = 0;
 +out_fail:
   bmap_unlock(inode, create);
 - return ret;
 + return error;
  }
  
  int gfs2_extent_map(struct inode *inode, u64 lblock, int *new, u64 *dblock, 
 unsigned *extlen)
  {
 - struct metapath mp;
   struct buffer_head bh = { .b_state = 0, .b_blocknr = 0 };
   int ret;
   int create = *new;
 @@ -567,9 +560,7 @@ int gfs2_extent_map(struct inode *inode,
   BUG_ON(!new);
  
   bh.b_size = 1  (inode-i_blkbits + 5);
 - bmap_lock(inode, create);
 - ret = gfs2_block_pointers(inode, lblock, create, bh, mp);
 - bmap_unlock(inode, create);
 + ret = gfs2_block_map(inode, lblock, create, bh);
   *extlen = bh.b_size  inode-i_blkbits;
   *dblock = bh.b_blocknr;
   if (buffer_new(bh))
-- 
Russell Cattelan [EMAIL PROTECTED]


signature.asc
Description: This is a digitally signed message part

Re: [GFS2] Fix page lock/glock deadlock [32/70]

2006-12-01 Thread Russell Cattelan

On Thu, 2006-11-30 at 12:17 +, Steven Whitehouse wrote:
 From 2ca99501fa5422e84f18333918a503433449e2b5 Mon Sep 17 00:00:00 2001
 From: Steven Whitehouse [EMAIL PROTECTED]
 Date: Wed, 8 Nov 2006 10:26:54 -0500
 Subject: [PATCH] [GFS2] Fix page lock/glock deadlock
 
 This fixes a race between the glock and the page lock encountered
 during truncate in gfs2_readpage and gfs2_prepare_write. The gfs2_readpages
 function doesn't need the same fix since it only uses a try lock anyway, so
 it will fail back to gfs2_readpage in the case of a potential deadlock.
 
 This bug was spotted by Russell Cattelan.\
This bug was fixed correctly by Russell Cattelan
and this is an incomplete version of my patch.


As I keep trying to point out readpages is still wrong in that
the stuffed case is not correct and results in a panic
if a file is being truncated down.
If nr_pages is calculated prior a truncate the stuffed inode
case will try to operate on pages that are no longer there.


The correct fix is to not try and handled the stuffed inode 
case at all in readpages and simply return AOP_TRUNCATED_PAGE
and let things things fall back to readpage.


 
 Cc: Russell Cattelan [EMAIL PROTECTED]
 Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]
 ---
  fs/gfs2/glock.h   |1 -
  fs/gfs2/ops_address.c |   13 +
  2 files changed, 9 insertions(+), 5 deletions(-)
 
 diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
 index b985627..a331bf8 100644
 --- a/fs/gfs2/glock.h
 +++ b/fs/gfs2/glock.h
 @@ -27,7 +27,6 @@ #define GL_SKIP 0x0100
  #define GL_ATIME 0x0200
  #define GL_NOCACHE   0x0400
  #define GL_NOCANCEL  0x1000
 -#define GL_AOP   0x4000
  
  #define GLR_TRYFAILED13
  #define GLR_CANCELED 14
 diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c
 index 5c3962c..3822189 100644
 --- a/fs/gfs2/ops_address.c
 +++ b/fs/gfs2/ops_address.c
 @@ -230,7 +230,7 @@ static int gfs2_readpage(struct file *fi
   /* gfs2_sharewrite_nopage has grabbed the 
 ip-i_gl already */
   goto skip_lock;
   }
 - gfs2_holder_init(ip-i_gl, LM_ST_SHARED, GL_ATIME|GL_AOP, gh);
 + gfs2_holder_init(ip-i_gl, LM_ST_SHARED, 
 GL_ATIME|LM_FLAG_TRY_1CB, gh);
   do_unlock = 1;
   error = gfs2_glock_nq_m_atime(1, gh);
   if (unlikely(error))
 @@ -254,6 +254,8 @@ skip_lock:
  out:
   return error;
  out_unlock:
 + if (error == GLR_TRYFAILED)
 + error = AOP_TRUNCATED_PAGE;
   unlock_page(page);
   if (do_unlock)
   gfs2_holder_uninit(gh);
 @@ -293,7 +295,7 @@ static int gfs2_readpages(struct file *f
   goto skip_lock;
   }
   gfs2_holder_init(ip-i_gl, LM_ST_SHARED,
 -  LM_FLAG_TRY_1CB|GL_ATIME|GL_AOP, gh);
 +  LM_FLAG_TRY_1CB|GL_ATIME, gh);
   do_unlock = 1;
   ret = gfs2_glock_nq_m_atime(1, gh);
   if (ret == GLR_TRYFAILED)
 @@ -366,10 +368,13 @@ static int gfs2_prepare_write(struct fil
   unsigned int write_len = to - from;
  
 
 - gfs2_holder_init(ip-i_gl, LM_ST_EXCLUSIVE, GL_ATIME|GL_AOP, ip-i_gh);
 + gfs2_holder_init(ip-i_gl, LM_ST_EXCLUSIVE, GL_ATIME|LM_FLAG_TRY_1CB, 
 ip-i_gh);
   error = gfs2_glock_nq_m_atime(1, ip-i_gh);
 - if (error)
 + if (unlikely(error)) {
 + if (error == GLR_TRYFAILED)
 + error = AOP_TRUNCATED_PAGE;
   goto out_uninit;
 + }
  
   gfs2_write_calc_reserv(ip, write_len, data_blocks, ind_blocks);
  
-- 
Russell Cattelan [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [GFS2] Change argument of gfs2_dinode_out [17/70]

2006-12-01 Thread Russell Cattelan

;
  
   gfs2_trans_add_bh(ip-i_gl, dibh, 1);
 - gfs2_dinode_out(ip-i_di, dibh-b_data);
 + gfs2_dinode_out(ip, dibh-b_data);
   brelse(dibh);
   mark_inode_dirty(ip-i_inode);
  
 @@ -792,7 +792,7 @@ static int link_dinode(struct gfs2_inode
   goto fail_end_trans;
   ip-i_di.di_nlink = 1;
   gfs2_trans_add_bh(ip-i_gl, dibh, 1);
 - gfs2_dinode_out(ip-i_di, dibh-b_data);
 + gfs2_dinode_out(ip, dibh-b_data);
   brelse(dibh);
   return 0;
  
 @@ -1349,7 +1349,7 @@ __gfs2_setattr_simple(struct gfs2_inode 
   gfs2_inode_attr_out(ip);
  
   gfs2_trans_add_bh(ip-i_gl, dibh, 1);
 - gfs2_dinode_out(ip-i_di, dibh-b_data);
 + gfs2_dinode_out(ip, dibh-b_data);
   brelse(dibh);
   }
   return error;
 diff --git a/fs/gfs2/ondisk.c b/fs/gfs2/ondisk.c
 index 2d1682d..2c50fa0 100644
 --- a/fs/gfs2/ondisk.c
 +++ b/fs/gfs2/ondisk.c
 @@ -15,6 +15,8 @@ #include linux/buffer_head.h
  
  #include gfs2.h
  #include linux/gfs2_ondisk.h
 +#include linux/lm_interface.h
 +#include incore.h
  
  #define pv(struct, member, fmt) printk(KERN_INFO   #member = fmt\n, \
  struct-member);
 @@ -187,8 +189,9 @@ void gfs2_dinode_in(struct gfs2_dinode_h
  
  }
  
 -void gfs2_dinode_out(const struct gfs2_dinode_host *di, void *buf)
 +void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf)
  {
 + const struct gfs2_dinode_host *di = ip-i_di;
   struct gfs2_dinode *str = buf;
  
   gfs2_meta_header_out(di-di_header, buf);
 diff --git a/fs/gfs2/ops_file.c b/fs/gfs2/ops_file.c
 index 2fc8868..7ea4175 100644
 --- a/fs/gfs2/ops_file.c
 +++ b/fs/gfs2/ops_file.c
 @@ -336,7 +336,7 @@ static int do_gfs2_set_flags(struct file
   goto out_trans_end;
   gfs2_trans_add_bh(ip-i_gl, bh, 1);
   ip-i_di.di_flags = new_flags;
 - gfs2_dinode_out(ip-i_di, bh-b_data);
 + gfs2_dinode_out(ip, bh-b_data);
   brelse(bh);
  out_trans_end:
   gfs2_trans_end(sdp);
 diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c
 index ef6e5ed..bd26885 100644
 --- a/fs/gfs2/ops_inode.c
 +++ b/fs/gfs2/ops_inode.c
 @@ -339,7 +339,7 @@ static int gfs2_symlink(struct inode *di
   error = gfs2_meta_inode_buffer(ip, dibh);
  
   if (!gfs2_assert_withdraw(sdp, !error)) {
 - gfs2_dinode_out(ip-i_di, dibh-b_data);
 + gfs2_dinode_out(ip, dibh-b_data);
   memcpy(dibh-b_data + sizeof(struct gfs2_dinode), symname,
  size);
   brelse(dibh);
 @@ -414,7 +414,7 @@ static int gfs2_mkdir(struct inode *dir,
   gfs2_inum_out(dip-i_num, dent-de_inum);
   dent-de_type = cpu_to_be16(DT_DIR);
  
 - gfs2_dinode_out(ip-i_di, di);
 + gfs2_dinode_out(ip, di);
  
   brelse(dibh);
   }
 @@ -541,7 +541,7 @@ static int gfs2_mknod(struct inode *dir,
   error = gfs2_meta_inode_buffer(ip, dibh);
  
   if (!gfs2_assert_withdraw(sdp, !error)) {
 - gfs2_dinode_out(ip-i_di, dibh-b_data);
 + gfs2_dinode_out(ip, dibh-b_data);
   brelse(dibh);
   }
  
 @@ -762,7 +762,7 @@ static int gfs2_rename(struct inode *odi
   goto out_end_trans;
   ip-i_di.di_ctime = get_seconds();
   gfs2_trans_add_bh(ip-i_gl, dibh, 1);
 - gfs2_dinode_out(ip-i_di, dibh-b_data);
 + gfs2_dinode_out(ip, dibh-b_data);
   brelse(dibh);
   }
  
 @@ -949,7 +949,7 @@ static int setattr_chown(struct inode *i
   gfs2_inode_attr_out(ip);
  
   gfs2_trans_add_bh(ip-i_gl, dibh, 1);
 - gfs2_dinode_out(ip-i_di, dibh-b_data);
 + gfs2_dinode_out(ip, dibh-b_data);
   brelse(dibh);
  
   if (ouid != NO_QUOTA_CHANGE || ogid != NO_QUOTA_CHANGE) {
 diff --git a/include/linux/gfs2_ondisk.h b/include/linux/gfs2_ondisk.h
 index 10a507d..550effa 100644
 --- a/include/linux/gfs2_ondisk.h
 +++ b/include/linux/gfs2_ondisk.h
 @@ -535,7 +535,8 @@ extern void gfs2_rgrp_in(struct gfs2_rgr
  extern void gfs2_rgrp_out(const struct gfs2_rgrp_host *rg, void *buf);
  extern void gfs2_quota_in(struct gfs2_quota_host *qu, const void *buf);
  extern void gfs2_dinode_in(struct gfs2_dinode_host *di, const void *buf);
 -extern void gfs2_dinode_out(const struct gfs2_dinode_host *di, void *buf);
 +struct gfs2_inode;
 +extern void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf);
  extern void gfs2_ea_header_in(struct gfs2_ea_header *ea, const void *buf);
  extern void gfs2_ea_header_out(const struct gfs2_ea_header *ea, void *buf);
  extern void gfs2_log_header_in(struct gfs2_log_header_host *lh, const void 
 *buf);
-- 
Russell Cattelan [EMAIL PROTECTED]


signature.asc
Description: This is a digitally signed message part

Re: [GFS2] Move gfs2_meta_syncfs() into log.c [57/70]

2006-12-01 Thread Russell Cattelan

On Thu, 2006-11-30 at 12:22 +, Steven Whitehouse wrote:
 From a25311c8e0b7071b129ca9a9e49e22eeaf620864 Mon Sep 17 00:00:00 2001
 From: Steven Whitehouse [EMAIL PROTECTED]
 Date: Thu, 23 Nov 2006 11:06:35 -0500
 Subject: [PATCH] [GFS2] Move gfs2_meta_syncfs() into log.c
 
 By moving gfs2_meta_syncfs() into log.c, gfs2_ail1_start()
 can be made static.
While this is not a incorrect change as it stands.
This kind of pointless code curn is making it harder
to stabilize GFS2.

 
 Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]
 ---
  fs/gfs2/log.c |   21 -
  fs/gfs2/log.h |2 +-
  fs/gfs2/meta_io.c |   17 -
  fs/gfs2/meta_io.h |1 -
  4 files changed, 21 insertions(+), 20 deletions(-)
 
 diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
 index 6456fc3..7713d59 100644
 --- a/fs/gfs2/log.c
 +++ b/fs/gfs2/log.c
 @@ -15,6 +15,7 @@ #include linux/buffer_head.h
  #include linux/gfs2_ondisk.h
  #include linux/crc32.h
  #include linux/lm_interface.h
 +#include linux/delay.h
  
  #include gfs2.h
  #include incore.h
 @@ -142,7 +143,7 @@ static int gfs2_ail1_empty_one(struct gf
   return list_empty(ai-ai_ail1_list);
  }
  
 -void gfs2_ail1_start(struct gfs2_sbd *sdp, int flags)
 +static void gfs2_ail1_start(struct gfs2_sbd *sdp, int flags)
  {
   struct list_head *head = sdp-sd_ail1_list;
   u64 sync_gen;
 @@ -689,3 +690,21 @@ void gfs2_log_shutdown(struct gfs2_sbd *
   up_write(sdp-sd_log_flush_lock);
  }
  
 +
 +/**
 + * gfs2_meta_syncfs - sync all the buffers in a filesystem
 + * @sdp: the filesystem
 + *
 + */
 +
 +void gfs2_meta_syncfs(struct gfs2_sbd *sdp)
 +{
 + gfs2_log_flush(sdp, NULL);
 + for (;;) {
 + gfs2_ail1_start(sdp, DIO_ALL);
 + if (gfs2_ail1_empty(sdp, DIO_ALL))
 + break;
 + msleep(10);
 + }
 +}
 +
 diff --git a/fs/gfs2/log.h b/fs/gfs2/log.h
 index 7f5737d..8e7aa0f 100644
 --- a/fs/gfs2/log.h
 +++ b/fs/gfs2/log.h
 @@ -48,7 +48,6 @@ static inline void gfs2_log_pointers_ini
  unsigned int gfs2_struct2blk(struct gfs2_sbd *sdp, unsigned int nstruct,
   unsigned int ssize);
  
 -void gfs2_ail1_start(struct gfs2_sbd *sdp, int flags);
  int gfs2_ail1_empty(struct gfs2_sbd *sdp, int flags);
  
  int gfs2_log_reserve(struct gfs2_sbd *sdp, unsigned int blks);
 @@ -61,5 +60,6 @@ void gfs2_log_flush(struct gfs2_sbd *sdp
  void gfs2_log_commit(struct gfs2_sbd *sdp, struct gfs2_trans *trans);
  
  void gfs2_log_shutdown(struct gfs2_sbd *sdp);
 +void gfs2_meta_syncfs(struct gfs2_sbd *sdp);
  
  #endif /* __LOG_DOT_H__ */
 diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
 index 939a09f..fbeba81 100644
 --- a/fs/gfs2/meta_io.c
 +++ b/fs/gfs2/meta_io.c
 @@ -574,20 +574,3 @@ out:
   return first_bh;
  }
  
 -/**
 - * gfs2_meta_syncfs - sync all the buffers in a filesystem
 - * @sdp: the filesystem
 - *
 - */
 -
 -void gfs2_meta_syncfs(struct gfs2_sbd *sdp)
 -{
 - gfs2_log_flush(sdp, NULL);
 - for (;;) {
 - gfs2_ail1_start(sdp, DIO_ALL);
 - if (gfs2_ail1_empty(sdp, DIO_ALL))
 - break;
 - msleep(10);
 - }
 -}
 -
 diff --git a/fs/gfs2/meta_io.h b/fs/gfs2/meta_io.h
 index 3ec939e..e037425 100644
 --- a/fs/gfs2/meta_io.h
 +++ b/fs/gfs2/meta_io.h
 @@ -67,7 +67,6 @@ static inline int gfs2_meta_inode_buffer
  }
  
  struct buffer_head *gfs2_meta_ra(struct gfs2_glock *gl, u64 dblock, u32 
 extlen);
 -void gfs2_meta_syncfs(struct gfs2_sbd *sdp);
  
  #define buffer_busy(bh) \
  ((bh)-b_state  ((1ul  BH_Dirty) | (1ul  BH_Lock) | (1ul  BH_Pinned)))
-- 
Russell Cattelan [EMAIL PROTECTED]


signature.asc
Description: This is a digitally signed message part

Re: [Cluster-devel] Re: [GFS2] Change argument of gfs2_dinode_out [17/70]

2006-12-01 Thread Russell Cattelan

On Fri, 2006-12-01 at 19:25 +, Al Viro wrote:
 On Fri, Dec 01, 2006 at 01:19:04PM -0600, Russell Cattelan wrote:
  On Thu, 2006-11-30 at 12:15 +, Steven Whitehouse wrote:
   From 539e5d6b7ae8612c0393fe940d2da5b591318d3d Mon Sep 17 00:00:00 2001
   From: Steven Whitehouse [EMAIL PROTECTED]
   Date: Tue, 31 Oct 2006 15:07:05 -0500
   Subject: [PATCH] [GFS2] Change argument of gfs2_dinode_out

   Everywhere this was called, a struct gfs2_inode was available,
   but despite that, it was always called with a struct gfs2_dinode
   as an argument. By making this change it paves the way to start
   eliminating fields duplicated between the kernel's struct inode
   and the struct gfs2_dinode.
  More pointless code churn.

  This only makes sense once the file system is working 
  and we have time to do this type of cleanup on against
  a stable and TESTED code base.

 Bzzert.  Cleaner code is easier to _get_ stable.  Keep it ucking fugly
 until everyone stops looking at it out of sheer disgust is a bad idea.'
code clean up are not without risk and with no regression test suite to
verify
that a cleanup has not broken something. Cleanups are very much a
hindrance to stabilization. With no know working points in a code
history it becomes difficult
to bisect changes and figure out when bugs were introduced
Especially when cleanups are mixed in with bug fixes.

Pretty code does not equal correct code.

-- 
Russell Cattelan [EMAIL PROTECTED]

signature.asc
Description: This is a digitally signed message part

Re: [Cluster-devel] Re: [GFS2] Change argument of gfs2_dinode_out [17/70]

2006-12-01 Thread Russell Cattelan

On Fri, 2006-12-01 at 21:08 +, Al Viro wrote:
 On Fri, Dec 01, 2006 at 02:52:11PM -0600, Russell Cattelan wrote:
  code clean up are not without risk and with no regression test suite to
  verify
  that a cleanup has not broken something. Cleanups are very much a
  hindrance to stabilization. With no know working points in a code
  history it becomes difficult
  to bisect changes and figure out when bugs were introduced
  Especially when cleanups are mixed in with bug fixes.
  
  Pretty code does not equal correct code.
 
 No, but convoluted and unreadable code ends up being crappier due
 to lack of review.  And that's aside of the memory footprint,
 likeliness of bugs introduced by code modifications (having in-core
 and on-disk data structures with different contents and the same C
 type = trouble that won't be caught by compiler), etc.

Nothing makes up for the complete lack of GFS2 testing.
reviewed code does not equal correct code either.

Honestly tell me what test suite do you run on GFS2?

Sure is it possible to make an educated guess that some
cleanups will not destabilize the code. Indeed the stuff
you have done is quite useful to ensure that endian bugs are
being caught by the compiler/sparse.
But no amount of it shouldn't break anything assertions
can replace testing.


But there is a large quantity of the 70 or so patches that were
sent out were to enable future cleanup's. Putting in partial cleanups
do nothing core code readability and I many cases is more confusing.
Unless you meticulously keep up with the partial cleanups looking
at the code is now a jumbled mess of inconsistencies.

gfs2 is supposed to be stabilized and use-able for the up coming rhel5
release, not pretty up for somebody to print out and hang on their wall.


-- 
Russell Cattelan [EMAIL PROTECTED]


signature.asc
Description: This is a digitally signed message part

Re: [xfs-masters] Re: [PATCH] pull XFS support out of Kconfig submenu

2005-08-18 Thread Russell Cattelan

On Thu, 2005-08-18 at 06:53 -0700, Chris Wedgwood wrote:
> On Wed, Aug 17, 2005 at 10:45:48PM +0200, Jesper Juhl wrote:
> 
> > It seems slightly odd to me that XFS support should be in a separate
> > submenu, when all the other filesystems are not using submenus but
> > are directly selectable from the Filesystems menu.
> 
> XFS also has an out-of-tree version.  Making it a submenu is probably
> to make maintenance easier (ie. replace files, not merge).
> 
That is why the Kconfig options for xfs moved from fs/Kconfig to
fs/xfs/Kconfig but using submenu was simply a convince thing 
to group all the XFS options together.

If the submenu is really causing people distress go ahead and 
remove it. Since it's a cosmetic change it's not going to impact
anything.

-Russell

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [xfs-masters] Re: [PATCH] pull XFS support out of Kconfig submenu

2005-08-18 Thread Russell Cattelan

On Thu, 2005-08-18 at 06:53 -0700, Chris Wedgwood wrote:
 On Wed, Aug 17, 2005 at 10:45:48PM +0200, Jesper Juhl wrote:
 
  It seems slightly odd to me that XFS support should be in a separate
  submenu, when all the other filesystems are not using submenus but
  are directly selectable from the Filesystems menu.
 
 XFS also has an out-of-tree version.  Making it a submenu is probably
 to make maintenance easier (ie. replace files, not merge).
 
That is why the Kconfig options for xfs moved from fs/Kconfig to
fs/xfs/Kconfig but using submenu was simply a convince thing 
to group all the XFS options together.

If the submenu is really causing people distress go ahead and 
remove it. Since it's a cosmetic change it's not going to impact
anything.

-Russell

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: kernel merge issues

2001-04-19 Thread Russell Cattelan

james rich wrote:

> Hi folks,
> I'm sure many here have read the discussion on lkml about lvm and
> the problems that team is having.  As part of that discussion it was said:

I'm not entirely familiar with the issues surrounding lvm development, I know
things
are not in good shape right now.

But I would say the following statement is unrealistic.
Trying to manage a large software project without source code control is
an even bigger nightmare than with source code control.
patch is NOT a source code control tool.
In fact a good source code control system would allow patch incremental
generation,
This is actually one glaring limitation of CVS, the ability group a set of
changes
together, aka "mod"
(Bitkeeper has addressed a lot of these issues; it can generate or accept
incremental
patch sets)

The lvm problems seems to be more of wetware issue than a source code
control tool issue.

Hopefully XFS will be able to keep ahead the problem of dramatically
diverging
code bases by staying active with Linus's releases.

> Dan Kegel <[EMAIL PROTECTED]>:
> >I know very little about LVM, but from watching earlier projects
> >in the same situation you're in now, the path you need to follow
> >seems clear:
> >   Stop using CVS internally for development.
> >   It makes checking in changes without submitting them to
> >   Linus too easy.
>
> >To get sync'd back up, *start with the standard kernel*,
> >and start generating clean, human-understandable patches one
> >at a time that bring it up to where you want.
>
> I am wondering how the XFS team plans on avoiding the same problems once
> XFS becomes part of the kernel.  Is there potential for problems with SGI
> "losing control" over the source or direction of XFS once Linus puts it in
> his tree?
>
> How does the above comment relate to the XFS team's plans on patches to
> XFS and related areas once XFS is in?
>
> Just want to get these issues into the air before rancor and ill will
> spread...
>
> P.S. XFS has been extremely solid and has saved me a lot of time waiting
> for fscks.  I am really impressed by the professionalism of the XFS team.
> Hopefully I can contribute soon - working on a slackware boot/modules/root
> disk set for XFS.
>
> James Rich
> [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: kernel merge issues

2001-04-19 Thread Russell Cattelan


james rich wrote:

 Hi folks,
 I'm sure many here have read the discussion on lkml about lvm and
 the problems that team is having.  As part of that discussion it was said:

I'm not entirely familiar with the issues surrounding lvm development, I know
things
are not in good shape right now.

But I would say the following statement is unrealistic.
Trying to manage a large software project without source code control is
an even bigger nightmare than with source code control.
patch is NOT a source code control tool.
In fact a good source code control system would allow patch incremental
generation,
This is actually one glaring limitation of CVS, the ability group a set of
changes
together, aka "mod"
(Bitkeeper has addressed a lot of these issues; it can generate or accept
incremental
patch sets)

The lvm problems seems to be more of wetware issue than a source code
control tool issue.

Hopefully XFS will be able to keep ahead the problem of dramatically
diverging
code bases by staying active with Linus's releases.

 Dan Kegel [EMAIL PROTECTED]:
 I know very little about LVM, but from watching earlier projects
 in the same situation you're in now, the path you need to follow
 seems clear:
Stop using CVS internally for development.
It makes checking in changes without submitting them to
Linus too easy.

 To get sync'd back up, *start with the standard kernel*,
 and start generating clean, human-understandable patches one
 at a time that bring it up to where you want.

 I am wondering how the XFS team plans on avoiding the same problems once
 XFS becomes part of the kernel.  Is there potential for problems with SGI
 "losing control" over the source or direction of XFS once Linus puts it in
 his tree?

 How does the above comment relate to the XFS team's plans on patches to
 XFS and related areas once XFS is in?

 Just want to get these issues into the air before rancor and ill will
 spread...

 P.S. XFS has been extremely solid and has saved me a lot of time waiting
 for fscks.  I am really impressed by the professionalism of the XFS team.
 Hopefully I can contribute soon - working on a slackware boot/modules/root
 disk set for XFS.

 James Rich
 [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Test12 ll_rw_block error.

2000-12-16 Thread Russell Cattelan


"Stephen C. Tweedie" wrote:

> Hi,
>
> On Fri, Dec 15, 2000 at 02:00:19AM -0500, Alexander Viro wrote:
> > On Thu, 14 Dec 2000, Linus Torvalds wrote:
> >
> > Just one: any fs that really cares about completion callback is very likely
> > to be picky about the requests ordering. So sync_buffers() is very unlikely
> > to be useful anyway.
> >
> > In that sense we really don't have anonymous buffers here. I seriously
> > suspect that "unrealistic" assumption is not unrealistic at all. I'm
> > not sufficiently familiar with XFS code to say for sure, but...
>
> Right.  ext3 and reiserfs just want to submit their own IOs when it
> comes to the journal.  (At least in ext3, already-journaled buffers
> can be written back by the VM freely.)  It's a matter of telling the
> fs when that should start.
>
> > What we really need is a way for VFS/VM to pass the pressure on filesystem.
> > That's it. If fs wants unusual completions for requests - let it have its
> > own queueing mechanism and submit these requests when it finds that convenient.
>
> There is a very clean way of doing this with address spaces.  It's
> something I would like to see done properly for 2.5: eliminate all
> knowledge of buffer_heads from the VM layer.  It would be pretty
> simple to remove page->buffers completely and replace it with a
> page->private pointer, owned by whatever address_space controlled the
> page.  Instead of trying to unmap and flush buffers on the page
> directly, these operations would become address_space operations.

Yes this is a lot of what page buf would like to do eventually.
Have the VM system pressure page_buf for pages which would
then be able to intelligently call the file system to free up cached pages.
A big part of getting Delay Alloc to not completely consume all the
system pages, is being told when it's time to start really allocating disk
space and push pages out.


>
>
> We could still provide the standard try_to_free_buffers() and
> unmap_underlying_metadata() functions to operate on the per-page
> buffer_head lists, and existing filesystems would only have to point
> their address_space "private metadata" operations at the generic
> functions.  However, a filesystem which had additional ordering
> constraints could then intercept the flush or writeback calls from the
> VM and decide on its own how best to honour the VM pressure.
>
> This even works both for hashed and unhashed anonymous buffers, *if*
> you allow the filesystem to attach all of its hashed buffers buffers
> to an address_space of its own rather than having the buffer cache
> pool all such buffers together.
>
> --Stephen

--
Russell Cattelan
[EMAIL PROTECTED]



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Test12 ll_rw_block error.

2000-12-16 Thread Russell Cattelan


Chris Mason wrote:

> On Fri, 15 Dec 2000, Alexander Viro wrote:
>
> > Just one: any fs that really cares about completion callback is very likely
> > to be picky about the requests ordering. So sync_buffers() is very unlikely
> > to be useful anyway.
> >
> Somewhat.  I guess there are at least two ways to do it.  First flush the
> buffers where ordering matters (log blocks), then send the others onto the
> dirty list (general metadata).  You might have your own end_io for those, and
> sync_buffers would lose it.
>
> Second way (reiserfs recently changed to this method) is to do all the
> flushing yourself, and remove the need for an end_io call back.
>
I'm curious about this.
Does the mean reiserFS is doing all of it's own buffer management?

This would seem a little redundant with what is already in the kernel?

>
>
> > In that sense we really don't have anonymous buffers here. I seriously
> > suspect that "unrealistic" assumption is not unrealistic at all. I'm
> > not sufficiently familiar with XFS code to say for sure, but...
> >
> > What we really need is a way for VFS/VM to pass the pressure on filesystem.
> > That's it. If fs wants unusual completions for requests - let it have its
> > own queueing mechanism and submit these requests when it finds that convenient.
> >
> Yes, this is exactly what we've discussed.
>
> -chris

--
Russell Cattelan
[EMAIL PROTECTED]



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Test12 ll_rw_block error.

2000-12-16 Thread Russell Cattelan

Alexander Viro wrote:

> On Thu, 14 Dec 2000, Linus Torvalds wrote:
>
> > Good point.
> >
> > This actually looks fairly nasty to fix. The obvious fix would be to not
> > put such buffers on the dirty list at all, and instead rely on the VM
> > layer calling "writepage()" when it wants to push out the pages.
> > That would be the nice behaviour from a VM standpoint.
> >
> > However, that assumes that you don't have any "anonymous" buffers, which
> > is probably an unrealistic assumption.
> >
> > The problem is that we don't have any per-buffer "writebuffer()" function,
> > the way we have them per-page. It was never needed for any of the normal
> > filesystems, and XFS just happened to be able to take advantage of the
> > b_end_io behaviour.
> >
> > Suggestions welcome.
>
> Just one: any fs that really cares about completion callback is very likely
> to be picky about the requests ordering. So sync_buffers() is very unlikely
> to be useful anyway.

Actually no,  that's not the issue.

The XFS log uses a LSN (Log Sequence Number) to keep track of log write ordering.
Sync IO on each log buffer isn't realistic; the performance hit would be to great.

I wasn't around when  most of XFS was developed, but  from I what I understand it
was discovered early on that firing off writes in a particular order doesn't
guarantee that
they will finish in that order.  Thus the implantation of a sequence number for
each log write.

One of the obstacles we ran into early on in the linux port was the fact that
linux used fixed size IO requests to any given device.
But most of XFS's meta data structures vary in size in multiples of 512 bytes.

We were also implementing a page caching / clustering layer called
page_buf which understands  primarily  pages and not necessary
disk blocks. If your FS block size happens to match your page size then things
are good,  but it doesn't
So we added a  bit map field to the pages structure.
Each bit then represents one BASIC BLOCK eg 512 for all practical purposes

The end_io functions XFS defines updates the correct bit or the whole bit array
if the whole page is valid, thus signaling the rest of the page_buf that the io
has
completed.

Ok there is a lot more to it than what I've just described but you probably get
the idea.

>
>
> In that sense we really don't have anonymous buffers here. I seriously
> suspect that "unrealistic" assumption is not unrealistic at all. I'm
> not sufficiently familiar with XFS code to say for sure, but...
>
> What we really need is a way for VFS/VM to pass the pressure on filesystem.
> That's it. If fs wants unusual completions for requests - let it have its
> own queueing mechanism and submit these requests when it finds that convenient.
>
> Stephen, you probably already thought about that area. Could you comment on
> that?
> Cheers,
> Al

--
Russell Cattelan
[EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Test12 ll_rw_block error.

2000-12-16 Thread Russell Cattelan

Linus Torvalds wrote:

> On Thu, 14 Dec 2000, Russell Cattelan wrote:
> >
> > Ok one more wrinkle.
> > sync_buffers calls ll_rw_block, this is going to have the same problem as
> > calling ll_rw_block directly.
>
> Good point.
>
> This actually looks fairly nasty to fix. The obvious fix would be to not
> put such buffers on the dirty list at all, and instead rely on the VM
> layer calling "writepage()" when it wants to push out the pages.
> That would be the nice behaviour from a VM standpoint.
>
> However, that assumes that you don't have any "anonymous" buffers, which
> is probably an unrealistic assumption.
>
> The problem is that we don't have any per-buffer "writebuffer()" function,
> the way we have them per-page. It was never needed for any of the normal
> filesystems, and XFS just happened to be able to take advantage of the
> b_end_io behaviour.
>
> Suggestions welcome.
>
> Linus

Ok after a bit of trial and error I do have something working.
I wouldn't call it the most elegant solution but it does work
and it isn't very intrusive.

#define BH_End_io  7/* End io function defined don't remap it */

/*  don't change the callback if somebody explicitly set it */

if(!test_bit(BH_End_io, >b_state)){
  bh->b_end_io = end_buffer_io_sync;
}
What I've done is in the XFS set buffer_head setup functions is
set the initial value of b_state to BH_Locked  and BH_End_io
set the callback function and the rest of the relevant fields and then unlock
the
buffer.

The only other quick fix that comes to mind is to change sync_buffers to use
submit_bh rather than ll_rw_block.

--
Russell Cattelan
[EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Test12 ll_rw_block error.

2000-12-16 Thread Russell Cattelan


Linus Torvalds wrote:

 On Thu, 14 Dec 2000, Russell Cattelan wrote:
 
  Ok one more wrinkle.
  sync_buffers calls ll_rw_block, this is going to have the same problem as
  calling ll_rw_block directly.

 Good point.

 This actually looks fairly nasty to fix. The obvious fix would be to not
 put such buffers on the dirty list at all, and instead rely on the VM
 layer calling "writepage()" when it wants to push out the pages.
 That would be the nice behaviour from a VM standpoint.

 However, that assumes that you don't have any "anonymous" buffers, which
 is probably an unrealistic assumption.

 The problem is that we don't have any per-buffer "writebuffer()" function,
 the way we have them per-page. It was never needed for any of the normal
 filesystems, and XFS just happened to be able to take advantage of the
 b_end_io behaviour.

 Suggestions welcome.

 Linus

Ok after a bit of trial and error I do have something working.
I wouldn't call it the most elegant solution but it does work
and it isn't very intrusive.

#define BH_End_io  7/* End io function defined don't remap it */

/*  don't change the callback if somebody explicitly set it */

if(!test_bit(BH_End_io, bh-b_state)){
  bh-b_end_io = end_buffer_io_sync;
}
What I've done is in the XFS set buffer_head setup functions is
set the initial value of b_state to BH_Locked  and BH_End_io
set the callback function and the rest of the relevant fields and then unlock
the
buffer.

The only other quick fix that comes to mind is to change sync_buffers to use
submit_bh rather than ll_rw_block.

--
Russell Cattelan
[EMAIL PROTECTED]



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Test12 ll_rw_block error.

2000-12-16 Thread Russell Cattelan


Alexander Viro wrote:

 On Thu, 14 Dec 2000, Linus Torvalds wrote:

  Good point.
 
  This actually looks fairly nasty to fix. The obvious fix would be to not
  put such buffers on the dirty list at all, and instead rely on the VM
  layer calling "writepage()" when it wants to push out the pages.
  That would be the nice behaviour from a VM standpoint.
 
  However, that assumes that you don't have any "anonymous" buffers, which
  is probably an unrealistic assumption.
 
  The problem is that we don't have any per-buffer "writebuffer()" function,
  the way we have them per-page. It was never needed for any of the normal
  filesystems, and XFS just happened to be able to take advantage of the
  b_end_io behaviour.
 
  Suggestions welcome.

 Just one: any fs that really cares about completion callback is very likely
 to be picky about the requests ordering. So sync_buffers() is very unlikely
 to be useful anyway.

Actually no,  that's not the issue.

The XFS log uses a LSN (Log Sequence Number) to keep track of log write ordering.
Sync IO on each log buffer isn't realistic; the performance hit would be to great.

I wasn't around when  most of XFS was developed, but  from I what I understand it
was discovered early on that firing off writes in a particular order doesn't
guarantee that
they will finish in that order.  Thus the implantation of a sequence number for
each log write.


One of the obstacles we ran into early on in the linux port was the fact that
linux used fixed size IO requests to any given device.
But most of XFS's meta data structures vary in size in multiples of 512 bytes.

We were also implementing a page caching / clustering layer called
page_buf which understands  primarily  pages and not necessary
disk blocks. If your FS block size happens to match your page size then things
are good,  but it doesn't
So we added a  bit map field to the pages structure.
Each bit then represents one BASIC BLOCK eg 512 for all practical purposes

The end_io functions XFS defines updates the correct bit or the whole bit array
if the whole page is valid, thus signaling the rest of the page_buf that the io
has
completed.

Ok there is a lot more to it than what I've just described but you probably get
the idea.




 In that sense we really don't have anonymous buffers here. I seriously
 suspect that "unrealistic" assumption is not unrealistic at all. I'm
 not sufficiently familiar with XFS code to say for sure, but...

 What we really need is a way for VFS/VM to pass the pressure on filesystem.
 That's it. If fs wants unusual completions for requests - let it have its
 own queueing mechanism and submit these requests when it finds that convenient.

 Stephen, you probably already thought about that area. Could you comment on
 that?
 Cheers,
     Al

--
Russell Cattelan
[EMAIL PROTECTED]



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Test12 ll_rw_block error.

2000-12-16 Thread Russell Cattelan


Chris Mason wrote:

 On Fri, 15 Dec 2000, Alexander Viro wrote:

  Just one: any fs that really cares about completion callback is very likely
  to be picky about the requests ordering. So sync_buffers() is very unlikely
  to be useful anyway.
 
 Somewhat.  I guess there are at least two ways to do it.  First flush the
 buffers where ordering matters (log blocks), then send the others onto the
 dirty list (general metadata).  You might have your own end_io for those, and
 sync_buffers would lose it.

 Second way (reiserfs recently changed to this method) is to do all the
 flushing yourself, and remove the need for an end_io call back.

I'm curious about this.
Does the mean reiserFS is doing all of it's own buffer management?

This would seem a little redundant with what is already in the kernel?



  In that sense we really don't have anonymous buffers here. I seriously
  suspect that "unrealistic" assumption is not unrealistic at all. I'm
  not sufficiently familiar with XFS code to say for sure, but...
 
  What we really need is a way for VFS/VM to pass the pressure on filesystem.
  That's it. If fs wants unusual completions for requests - let it have its
  own queueing mechanism and submit these requests when it finds that convenient.
 
 Yes, this is exactly what we've discussed.

 -chris

--
Russell Cattelan
[EMAIL PROTECTED]



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Test12 ll_rw_block error.

2000-12-16 Thread Russell Cattelan


"Stephen C. Tweedie" wrote:

 Hi,

 On Fri, Dec 15, 2000 at 02:00:19AM -0500, Alexander Viro wrote:
  On Thu, 14 Dec 2000, Linus Torvalds wrote:
 
  Just one: any fs that really cares about completion callback is very likely
  to be picky about the requests ordering. So sync_buffers() is very unlikely
  to be useful anyway.
 
  In that sense we really don't have anonymous buffers here. I seriously
  suspect that "unrealistic" assumption is not unrealistic at all. I'm
  not sufficiently familiar with XFS code to say for sure, but...

 Right.  ext3 and reiserfs just want to submit their own IOs when it
 comes to the journal.  (At least in ext3, already-journaled buffers
 can be written back by the VM freely.)  It's a matter of telling the
 fs when that should start.

  What we really need is a way for VFS/VM to pass the pressure on filesystem.
  That's it. If fs wants unusual completions for requests - let it have its
  own queueing mechanism and submit these requests when it finds that convenient.

 There is a very clean way of doing this with address spaces.  It's
 something I would like to see done properly for 2.5: eliminate all
 knowledge of buffer_heads from the VM layer.  It would be pretty
 simple to remove page-buffers completely and replace it with a
 page-private pointer, owned by whatever address_space controlled the
 page.  Instead of trying to unmap and flush buffers on the page
 directly, these operations would become address_space operations.

Yes this is a lot of what page buf would like to do eventually.
Have the VM system pressure page_buf for pages which would
then be able to intelligently call the file system to free up cached pages.
A big part of getting Delay Alloc to not completely consume all the
system pages, is being told when it's time to start really allocating disk
space and push pages out.




 We could still provide the standard try_to_free_buffers() and
 unmap_underlying_metadata() functions to operate on the per-page
 buffer_head lists, and existing filesystems would only have to point
 their address_space "private metadata" operations at the generic
 functions.  However, a filesystem which had additional ordering
 constraints could then intercept the flush or writeback calls from the
 VM and decide on its own how best to honour the VM pressure.

 This even works both for hashed and unhashed anonymous buffers, *if*
 you allow the filesystem to attach all of its hashed buffers buffers
 to an address_space of its own rather than having the buffer cache
 pool all such buffers together.

 --Stephen

--
Russell Cattelan
[EMAIL PROTECTED]



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Test12 ll_rw_block error.

2000-12-14 Thread Russell Cattelan


This would seem to be an error on the part of ll_rw_block.
Setting b_end_io to a default handler without checking to see
a callback has already been defined defeats the purpose of having
a function op.

void ll_rw_block(int rw, int nr, struct buffer_head * bhs[])
 {
@@ -928,7 +1046,8 @@
if (test_and_set_bit(BH_Lock, >b_state))
continue;

-   set_bit(BH_Req, >b_state);
+   /* We have the buffer lock */
+   bh->b_end_io = end_buffer_io_sync;

switch(rw) {
case WRITE:


-Russell Cattelan


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Test12 ll_rw_block error.

2000-12-14 Thread Russell Cattelan


This would seem to be an error on the part of ll_rw_block.
Setting b_end_io to a default handler without checking to see
a callback has already been defined defeats the purpose of having
a function op.

void ll_rw_block(int rw, int nr, struct buffer_head * bhs[])
 {
@@ -928,7 +1046,8 @@
if (test_and_set_bit(BH_Lock, bh-b_state))
continue;

-   set_bit(BH_Req, bh-b_state);
+   /* We have the buffer lock */
+   bh-b_end_io = end_buffer_io_sync;

switch(rw) {
case WRITE:


-Russell Cattelan


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] livelock in elevator scheduling

2000-12-05 Thread Russell Cattelan


Jens Axboe wrote:

> On Mon, Dec 04 2000, Russell Cattelan wrote:
> > I'm going to take a closer look at the scsi_back_merge_fn.
> > This may  have more to due with our/Chait's kiobuf modifications than
> > anything else.
> >
> >
> >
> > XFS (dev: 8/20) mounting with KIOBUFIO
> > Start mounting filesystem: sd(8,20)
> > Ending clean XFS mount for filesystem: sd(8,20)
> > kmem_alloc doing a vmalloc 262144 size & PAGE_SIZE 0 rval=0xe0a1
> > Unable to handle kernel NULL pointer dereference at virtual address
> > 0008
> >  printing eip:
> > c019f8b5
> > *pde = 
> >
> > Entering kdb (current=0xc191, pid 5) on processor 1 Panic: Oops
> > due to panic @ 0xc019f8b5
> > eax = 0x0002 ebx = 0x0001 ecx = 0x00081478 edx = 0x
> > esi = 0xc1957da0 edi = 0xc1923ac8 esp = 0xc1911e94 eip = 0xc019f8b5
> > ebp = 0xc1911e9c xss = 0x0018 xcs = 0x0010 eflags = 0x00010046
> > xds = 0x0018 xes = 0x0018 origeax = 0x  = 0xc1911e60
> > [1]kdb> bt
> > EBP   EIP Function(args)
> > 0xc1911e9c 0xc019f8b5 scsi_back_merge_fn_c+0x15 (0xc1923a98,
> > 0xc1957da0, 0xcfb05780, 0x80)
> >kernel .text 0xc010 0xc019f8a0
>
> Ah, I see what it is now. The elevator is attempting to merge a buffer
> head into a kio based request, poof. The attached diff should take
> care of that in your tree.

Hmm..  Yup... that is actually the mods made for kio in our base XFS tree.
I wonder why the patch dropped them?
I should have caught that.

Thanks.
I'll let you know how things go.


>
>
> --
> * Jens Axboe <[EMAIL PROTECTED]>
> * SuSE Labs
>
>   
>
>xfs-elv-1Name: xfs-elv-1
> Type: Plain Text (text/plain)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] livelock in elevator scheduling

2000-12-05 Thread Russell Cattelan


Jens Axboe wrote:

 On Mon, Dec 04 2000, Russell Cattelan wrote:
  I'm going to take a closer look at the scsi_back_merge_fn.
  This may  have more to due with our/Chait's kiobuf modifications than
  anything else.
 
 
 
  XFS (dev: 8/20) mounting with KIOBUFIO
  Start mounting filesystem: sd(8,20)
  Ending clean XFS mount for filesystem: sd(8,20)
  kmem_alloc doing a vmalloc 262144 size  PAGE_SIZE 0 rval=0xe0a1
  Unable to handle kernel NULL pointer dereference at virtual address
  0008
   printing eip:
  c019f8b5
  *pde = 
 
  Entering kdb (current=0xc191, pid 5) on processor 1 Panic: Oops
  due to panic @ 0xc019f8b5
  eax = 0x0002 ebx = 0x0001 ecx = 0x00081478 edx = 0x
  esi = 0xc1957da0 edi = 0xc1923ac8 esp = 0xc1911e94 eip = 0xc019f8b5
  ebp = 0xc1911e9c xss = 0x0018 xcs = 0x0010 eflags = 0x00010046
  xds = 0x0018 xes = 0x0018 origeax = 0x regs = 0xc1911e60
  [1]kdb bt
  EBP   EIP Function(args)
  0xc1911e9c 0xc019f8b5 scsi_back_merge_fn_c+0x15 (0xc1923a98,
  0xc1957da0, 0xcfb05780, 0x80)
 kernel .text 0xc010 0xc019f8a0

 Ah, I see what it is now. The elevator is attempting to merge a buffer
 head into a kio based request, poof. The attached diff should take
 care of that in your tree.

Hmm..  Yup... that is actually the mods made for kio in our base XFS tree.
I wonder why the patch dropped them?
I should have caught that.

Thanks.
I'll let you know how things go.




 --
 * Jens Axboe [EMAIL PROTECTED]
 * SuSE Labs

   

xfs-elv-1Name: xfs-elv-1
 Type: Plain Text (text/plain)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] livelock in elevator scheduling

2000-12-04 Thread Russell Cattelan


Jens Axboe wrote:

> On Fri, Dec 01 2000, Russell Cattelan wrote:
> > > If performance is down, then that problem is most likely elsewhere.
> > > I/O limited benchmarking typically thrives on lots of request
> > > latency -- with that comes better throughput for individual threads.
> > >
> > > > Anyway, I'll try your patch.
> >
> > Well this patch does help with the request starvation problem.
> > Unfortunately it has introduced another problem.
> > Running 4 doio programs, on and XFS partion with KIO buf IO turned on.
>
> This looks like a generic aic7xxx problem, and not block related. Since
> you are doing such nice traces, what is the other CPU doing? CPU1
> seems to be stuck grabbing the io_request_lock (for reasons not entirely
> clear from reading the aic7xxx source...)
>

Ok Keith gave me a quick hack to help with the race condition.

Here is the latest set up back traces...
The actually accuracy of these back traces... well?  who knows, but it does
give us something to go on.
It doesn't make much sense to me right now, but I'm guessing the problem is
starting with that do_div error.

I'm going to take a closer look at the scsi_back_merge_fn.
This may  have more to due with our/Chait's kiobuf modifications than
anything else.



XFS (dev: 8/20) mounting with KIOBUFIO
Start mounting filesystem: sd(8,20)
Ending clean XFS mount for filesystem: sd(8,20)
kmem_alloc doing a vmalloc 262144 size & PAGE_SIZE 0 rval=0xe0a1
Unable to handle kernel NULL pointer dereference at virtual address
0008
 printing eip:
c019f8b5
*pde = 

Entering kdb (current=0xc191, pid 5) on processor 1 Panic: Oops
due to panic @ 0xc019f8b5
eax = 0x0002 ebx = 0x0001 ecx = 0x00081478 edx = 0x
esi = 0xc1957da0 edi = 0xc1923ac8 esp = 0xc1911e94 eip = 0xc019f8b5
ebp = 0xc1911e9c xss = 0x0018 xcs = 0x0010 eflags = 0x00010046
xds = 0x0018 xes = 0x0018 origeax = 0x  = 0xc1911e60
[1]kdb> bt
EBP   EIP Function(args)
0xc1911e9c 0xc019f8b5 scsi_back_merge_fn_c+0x15 (0xc1923a98,
0xc1957da0, 0xcfb05780, 0x80)
   kernel .text 0xc010 0xc019f8a0
0xc019f98c
0xc1911f2c 0xc016a0df __make_request+0x1af (0xc1923a98, 0x1,
0xcfb05780, 0x0, 0x814)
   kernel .text 0xc010 0xc0169f30
0xc016a8a4
0xc1911f70 0xc016a9c8 generic_make_request+0x124 (0x1, 0xcfb05780,
0x0, 0x0, 0x0)
   kernel .text 0xc010 0xc016a8a4
0xc016aa50
0xc1911fac 0xc016abde ll_rw_block+0x18e (0x1, 0x1, 0xc1911fd0, 0x0)

   kernel .text 0xc010 0xc016aa50
0xc016ac58
0xc1911fd4 0xc0138ed7 flush_dirty_buffers+0x97 (0x0, 0x10f00)
   kernel .text 0xc010 0xc0138e40
0xc0138f24
0xc1911fec 0xc01391ab bdflush+0x8f
   kernel .text 0xc010 0xc013911c
0xc0139260
   0xc0108c9b kernel_thread+0x23
   kernel .text 0xc010 0xc0108c78
0xc0108cb0
[1]kdb> go
Oops: 
CPU:1
EIP:0010:[]
EFLAGS: 00010046
eax: 0002   ebx: 0001   ecx: 00081478   edx: 
esi: c1957da0   edi: c1923ac8   ebp: c1911e9c   esp: c1911e94
ds: 0018   es: 0018   ss: 0018
Process kflushd (pid: 5, stackpage=c1911000)
Stack: cfb05780 c1923a98 c1911f2c c016a0df c1923a98 c1957da0 cfb05780
0080
   0814 00081478 cfb05780 0008 0001 0200 
c1923ac0
    000e c191 c1911efc c010c77e 0246 0814
def0e800
Call Trace: [] [] [] []
[] [] []
   []
Code: 66 81 7a 08 00 10 0f 47 d8 8b 4a 2c 85 c9 74 19 0f b7 42 08
NMI Watchdog detected LOCKUP on CPU0, registers:
CPU:0
EIP:0010:[]
EFLAGS: 0086
eax: c01b21ac   ebx: c197b078   ecx:    edx: 0012
esi: 0286   edi: c02f5f94   ebp: c02f5f44   esp: c02f5f38
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c02f5000)
Stack: c190fb20 0401  c02f5f64 c010c539 0012 c197b078
c02f5f94
   0240 c0331a40 0012 c02f5f8c c010c73d 0012 c02f5f94
c190fb20
   c0108960 c02f4000 c0108960 c190fb20  c02f5fc8 c010a8c8
c0108960
Call Trace: [] [] [] []
[] [] []
   [] [] [] [] []
Code: 80 3d 64 47 2e c0 00 f3 90 7e f5 e9 1b a7 f9 ff 80 3d 64 e3

Entering kdb (current=0xc02f4000, pid 0) on processor 0 due to WatchDog
Interrupt @ 0xc0217a98
eax = 0xc01b21ac ebx = 0xc197b078 ecx = 0x edx = 0x0012
esi = 0x0286 edi = 0xc02f5f94 esp = 0xc02f5f38 eip = 0xc0217a98
ebp = 0xc02f5f44 xss = 0x0018 xcs = 0x0010 eflags = 0x0086
xds = 0x0018 xes = 0xc02f0018 origeax = 0xc01b21ac  = 0xc02f5f04
[0]kdb> bt
EBP   EIP Function(args)
   0xc0217a98 stext_lock+0x43a8
   kernel .text.lock 0xc02136f0 0xc02136f0
0xc

Re: [PATCH] livelock in elevator scheduling

2000-12-04 Thread Russell Cattelan


Jens Axboe wrote:

> On Fri, Dec 01 2000, Russell Cattelan wrote:
> > > If performance is down, then that problem is most likely elsewhere.
> > > I/O limited benchmarking typically thrives on lots of request
> > > latency -- with that comes better throughput for individual threads.
> > >
> > > > Anyway, I'll try your patch.
> >
> > Well this patch does help with the request starvation problem.
> > Unfortunately it has introduced another problem.
> > Running 4 doio programs, on and XFS partion with KIO buf IO turned on.
>
> This looks like a generic aic7xxx problem, and not block related. Since
> you are doing such nice traces, what is the other CPU doing? CPU1
> seems to be stuck grabbing the io_request_lock (for reasons not entirely
> clear from reading the aic7xxx source...)
>

Sorry I haven't been able to get a decent backtrace of the other processor.

According to Keith Owens the maintainer of kdb there is a race condition in

kbd and the NMI loop detection stuff that resulting in not being able to
switch cpus.


I'll keep try to dig up some more info.

I'm also seeing various other panics in XFS (well pagebuf to be specific)
with this patch.
Nothing seems to be very consistent and this point.

Ok I did manage to switch processors.
Entering kdb (current=0xd7c0a000, pid 645) on processor 1 due to cpu switch

[1]kdb> bt
EBP   EIP Function(args)
   0xc0216594 stext_lock+0x2ea4
   kernel .text.lock 0xc02136f0 0xc02136f0
0xc02197c0
0xd7c0bf98 0xc0155964 ext2_sync_file+0x2c (0xd8257560, 0xd7348220,
0x0, 0xd7c0a000)
   kernel .text 0xc010 0xc0155938
0xc0155a40
0xd7c0bfbc 0xc0136064 sys_fsync+0x54 (0x1, 0xb020, 0x0,
0xb048, 0x8051738)
   kernel .text 0xc010 0xc0136010
0xc0136088
   0xc010a807 system_call+0x33
   kernel .text 0xc010 0xc010a7d4
0xc010a80c
[1]kdb>


>
> --
> * Jens Axboe <[EMAIL PROTECTED]>
> * SuSE Labs

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] livelock in elevator scheduling

2000-12-04 Thread Russell Cattelan


Jens Axboe wrote:

 On Fri, Dec 01 2000, Russell Cattelan wrote:
   If performance is down, then that problem is most likely elsewhere.
   I/O limited benchmarking typically thrives on lots of request
   latency -- with that comes better throughput for individual threads.
  
Anyway, I'll try your patch.
 
  Well this patch does help with the request starvation problem.
  Unfortunately it has introduced another problem.
  Running 4 doio programs, on and XFS partion with KIO buf IO turned on.

 This looks like a generic aic7xxx problem, and not block related. Since
 you are doing such nice traces, what is the other CPU doing? CPU1
 seems to be stuck grabbing the io_request_lock (for reasons not entirely
 clear from reading the aic7xxx source...)


Ok Keith gave me a quick hack to help with the race condition.

Here is the latest set up back traces...
The actually accuracy of these back traces... well?  who knows, but it does
give us something to go on.
It doesn't make much sense to me right now, but I'm guessing the problem is
starting with that do_div error.

I'm going to take a closer look at the scsi_back_merge_fn.
This may  have more to due with our/Chait's kiobuf modifications than
anything else.



XFS (dev: 8/20) mounting with KIOBUFIO
Start mounting filesystem: sd(8,20)
Ending clean XFS mount for filesystem: sd(8,20)
kmem_alloc doing a vmalloc 262144 size  PAGE_SIZE 0 rval=0xe0a1
Unable to handle kernel NULL pointer dereference at virtual address
0008
 printing eip:
c019f8b5
*pde = 

Entering kdb (current=0xc191, pid 5) on processor 1 Panic: Oops
due to panic @ 0xc019f8b5
eax = 0x0002 ebx = 0x0001 ecx = 0x00081478 edx = 0x
esi = 0xc1957da0 edi = 0xc1923ac8 esp = 0xc1911e94 eip = 0xc019f8b5
ebp = 0xc1911e9c xss = 0x0018 xcs = 0x0010 eflags = 0x00010046
xds = 0x0018 xes = 0x0018 origeax = 0x regs = 0xc1911e60
[1]kdb bt
EBP   EIP Function(args)
0xc1911e9c 0xc019f8b5 scsi_back_merge_fn_c+0x15 (0xc1923a98,
0xc1957da0, 0xcfb05780, 0x80)
   kernel .text 0xc010 0xc019f8a0
0xc019f98c
0xc1911f2c 0xc016a0df __make_request+0x1af (0xc1923a98, 0x1,
0xcfb05780, 0x0, 0x814)
   kernel .text 0xc010 0xc0169f30
0xc016a8a4
0xc1911f70 0xc016a9c8 generic_make_request+0x124 (0x1, 0xcfb05780,
0x0, 0x0, 0x0)
   kernel .text 0xc010 0xc016a8a4
0xc016aa50
0xc1911fac 0xc016abde ll_rw_block+0x18e (0x1, 0x1, 0xc1911fd0, 0x0)

   kernel .text 0xc010 0xc016aa50
0xc016ac58
0xc1911fd4 0xc0138ed7 flush_dirty_buffers+0x97 (0x0, 0x10f00)
   kernel .text 0xc010 0xc0138e40
0xc0138f24
0xc1911fec 0xc01391ab bdflush+0x8f
   kernel .text 0xc010 0xc013911c
0xc0139260
   0xc0108c9b kernel_thread+0x23
   kernel .text 0xc010 0xc0108c78
0xc0108cb0
[1]kdb go
Oops: 
CPU:1
EIP:0010:[c019f8b5]
EFLAGS: 00010046
eax: 0002   ebx: 0001   ecx: 00081478   edx: 
esi: c1957da0   edi: c1923ac8   ebp: c1911e9c   esp: c1911e94
ds: 0018   es: 0018   ss: 0018
Process kflushd (pid: 5, stackpage=c1911000)
Stack: cfb05780 c1923a98 c1911f2c c016a0df c1923a98 c1957da0 cfb05780
0080
   0814 00081478 cfb05780 0008 0001 0200 
c1923ac0
    000e c191 c1911efc c010c77e 0246 0814
def0e800
Call Trace: [c016a0df] [c010c77e] [c010a8c8] [c016a9c8]
[c016abde] [c0138ed7] [c01391ab]
   [c0108c9b]
Code: 66 81 7a 08 00 10 0f 47 d8 8b 4a 2c 85 c9 74 19 0f b7 42 08
NMI Watchdog detected LOCKUP on CPU0, registers:
CPU:0
EIP:0010:[c0217a98]
EFLAGS: 0086
eax: c01b21ac   ebx: c197b078   ecx:    edx: 0012
esi: 0286   edi: c02f5f94   ebp: c02f5f44   esp: c02f5f38
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c02f5000)
Stack: c190fb20 0401  c02f5f64 c010c539 0012 c197b078
c02f5f94
   0240 c0331a40 0012 c02f5f8c c010c73d 0012 c02f5f94
c190fb20
   c0108960 c02f4000 c0108960 c190fb20  c02f5fc8 c010a8c8
c0108960
Call Trace: [c010c539] [c010c73d] [c0108960] [c0108960]
[c010a8c8] [c0108960] [c0108960]
   [c0100018] [c010898f] [c0108a02] [c0105000] [c01001d0]
Code: 80 3d 64 47 2e c0 00 f3 90 7e f5 e9 1b a7 f9 ff 80 3d 64 e3

Entering kdb (current=0xc02f4000, pid 0) on processor 0 due to WatchDog
Interrupt @ 0xc0217a98
eax = 0xc01b21ac ebx = 0xc197b078 ecx = 0x edx = 0x0012
esi = 0x0286 edi = 0xc02f5f94 esp = 0xc02f5f38 eip = 0xc0217a98
ebp = 0xc02f5f44 xss = 0x0018 xcs = 0x0010 eflags = 0x0086
xds = 0x0018 xes = 0xc02f0018 origeax = 0xc01b21ac regs = 0xc02f5f04
[0]kdb bt
EBP   EIP Function(args)
   0xc0217a98 stext_lock+0x43a8
   kernel .text.lock 0xc02136f0

Re: [PATCH] livelock in elevator scheduling

2000-12-01 Thread Russell Cattelan


Jens Axboe wrote:

> On Tue, Nov 21 2000, [EMAIL PROTECTED] wrote:
> >  > Believe it or not, but this is intentional. In that regard, the
> >  > function name is a misnomer -- call it i/o scheduler instead :-)
> >
> > I never believe it intentional.  If it is true, the current kernel
> > will be suffered from a kind of DOS attack.  Yes, actually I'm a
> > victim of it.
>
> The problem is caused by the too high sequence numbers in stock
> kernel, as I said. Plus, the sequence decrementing doesn't take
> request/buffer size into account. So the starvation _is_ limited,
> the limit is just too high.
>
> > By Running ZD's ServerBench, not only the performance down, but my
> > machine blocks all commands execution including /bin/ps, /bin/ls... ,
> > and those are not ^C able unless the benchmark is stopped. Those
> > commands are read from disks but the requests are wating at the end of
> > I/O queue, those won't be executed.
>
> If performance is down, then that problem is most likely elsewhere.
> I/O limited benchmarking typically thrives on lots of request
> latency -- with that comes better throughput for individual threads.
>
> > Anyway, I'll try your patch.

Well this patch does help with the request starvation problem.
Unfortunately it has introduced another problem.
Running 4 doio programs, on and XFS partion with KIO buf IO turned on.

I did see something about problems with aic7xxx driver in test11, so this may

not be related to your patch.

I'm going to run without kiobuf  to see if the problem still occurs.


XFS (dev: 8/17) mounting with KIOBUFIO
Start mounting filesystem: sd(8,17)
Ending clean XFS mount for filesystem: sd(8,17)
NMI Watchdog detected LOCKUP on CPU1, registers:
CPU:1
EIP:0010:[]
EFLAGS: 0082
eax: c01b21ac   ebx: c197b078   ecx:    edx: 0012
esi: 0286   edi: dfff7f70   ebp: dfff7f20   esp: dfff7f14
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=dfff7000)
Stack: c190fb20 0401 0020 dfff7f40 c010c539 0012 c197b078
dfff7f70
   0240 c0331a40 0012 dfff7f68 c010c73d 0012 dfff7f70
c190fb20
   c0108960 dfff6000 c0108960 c190fb20 0001 dfff7fa4 c010a8c8
c0108960
Call Trace: [] [] [] [] []
[] []
   [] [] [] []
Code: f3 90 7e f5 e9 1b a7 f9 ff 80 3d e4 e4 2e c0 00 f3 90 7e f5

Entering kdb (current=0xdfff6000, pid 0) on processor 1 due to WatchDog
Interrupt @ 0xc0217a9f
eax = 0xc01b21ac ebx = 0xc197b078 ecx = 0x edx = 0x0012
esi = 0x0286 edi = 0xdfff7f70 esp = 0xdfff7f14 eip = 0xc0217a9f
ebp = 0xdfff7f20 xss = 0x0018 xcs = 0x0010 eflags = 0x0082
xds = 0xc1970018 xes = 0xdfff0018 origeax = 0xc01b21ac  = 0xdfff7ee0
[1]kdb> bt
EBP   EIP Function(args)
   0xc0217a9f stext_lock+0x43af
   kernel .text.lock 0xc02136f0 0xc02136f0
0xc02197c0
0xdfff7f20 0xc01b21c3 do_aic7xxx_isr+0x17 (0x12, 0xc197b078,
0xdfff7f70, 0x240, 0xc0331a40)
   kernel .text 0xc010 0xc01b21ac 0xc01b225c
0xdfff7f40 0xc010c539 handle_IRQ_event+0x4d (0x12, 0xdfff7f70,
0xc190fb20, 0xc0108960, 0xdfff6000)
   kernel .text 0xc010 0xc010c4ec 0xc010c568
0xdfff7f68 0xc010c73d do_IRQ+0x99 (0xc0108960, 0x0, 0xdfff6000,
0xdfff6000, 0xc0108960)
   kernel .text 0xc010 0xc010c6a4 0xc010c790
   0xc010a8c8 ret_from_intr
   kernel .text 0xc010 0xc010a8c8 0xc010a8e8
Interrupt registers:
eax = 0x ebx = 0xc0108960 ecx = 0x edx = 0xdfff6000
esi = 0xdfff6000 edi = 0xc0108960 esp = 0xdfff7fa4 eip = 0xc010898f
ebp = 0xdfff7fa4 xss = 0x0018 xcs = 0x0010 eflags = 0x0246
xds = 0xc0100018 xes = 0xdfff0018 origeax = 0xff12  = 0xdfff7f70
   0xc010898f default_idle+0x2f
   kernel .text 0xc010 0xc0108960 0xc0108998
0xdfff7fb8 0xc0108a02 cpu_idle+0x42
   kernel .text 0xc010 0xc01089c0 0xc0108a18
0xdfff7fc0 0xc02fb5b9 start_secondary+0x25
   kernel .text.init 0xc02f6000 0xc02fb594
0xc02fb5c0


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] livelock in elevator scheduling

2000-12-01 Thread Russell Cattelan


Jens Axboe wrote:

 On Tue, Nov 21 2000, [EMAIL PROTECTED] wrote:
Believe it or not, but this is intentional. In that regard, the
function name is a misnomer -- call it i/o scheduler instead :-)
 
  I never believe it intentional.  If it is true, the current kernel
  will be suffered from a kind of DOS attack.  Yes, actually I'm a
  victim of it.

 The problem is caused by the too high sequence numbers in stock
 kernel, as I said. Plus, the sequence decrementing doesn't take
 request/buffer size into account. So the starvation _is_ limited,
 the limit is just too high.

  By Running ZD's ServerBench, not only the performance down, but my
  machine blocks all commands execution including /bin/ps, /bin/ls... ,
  and those are not ^C able unless the benchmark is stopped. Those
  commands are read from disks but the requests are wating at the end of
  I/O queue, those won't be executed.

 If performance is down, then that problem is most likely elsewhere.
 I/O limited benchmarking typically thrives on lots of request
 latency -- with that comes better throughput for individual threads.

  Anyway, I'll try your patch.

Well this patch does help with the request starvation problem.
Unfortunately it has introduced another problem.
Running 4 doio programs, on and XFS partion with KIO buf IO turned on.

I did see something about problems with aic7xxx driver in test11, so this may

not be related to your patch.

I'm going to run without kiobuf  to see if the problem still occurs.


XFS (dev: 8/17) mounting with KIOBUFIO
Start mounting filesystem: sd(8,17)
Ending clean XFS mount for filesystem: sd(8,17)
NMI Watchdog detected LOCKUP on CPU1, registers:
CPU:1
EIP:0010:[c0217a9f]
EFLAGS: 0082
eax: c01b21ac   ebx: c197b078   ecx:    edx: 0012
esi: 0286   edi: dfff7f70   ebp: dfff7f20   esp: dfff7f14
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=dfff7000)
Stack: c190fb20 0401 0020 dfff7f40 c010c539 0012 c197b078
dfff7f70
   0240 c0331a40 0012 dfff7f68 c010c73d 0012 dfff7f70
c190fb20
   c0108960 dfff6000 c0108960 c190fb20 0001 dfff7fa4 c010a8c8
c0108960
Call Trace: [c010c539] [c010c73d] [c0108960] [c0108960] [c010a8c8]
[c0108960] [c0108960]
   [c0100018] [c010898f] [c0108a02] [c010a9be]
Code: f3 90 7e f5 e9 1b a7 f9 ff 80 3d e4 e4 2e c0 00 f3 90 7e f5

Entering kdb (current=0xdfff6000, pid 0) on processor 1 due to WatchDog
Interrupt @ 0xc0217a9f
eax = 0xc01b21ac ebx = 0xc197b078 ecx = 0x edx = 0x0012
esi = 0x0286 edi = 0xdfff7f70 esp = 0xdfff7f14 eip = 0xc0217a9f
ebp = 0xdfff7f20 xss = 0x0018 xcs = 0x0010 eflags = 0x0082
xds = 0xc1970018 xes = 0xdfff0018 origeax = 0xc01b21ac regs = 0xdfff7ee0
[1]kdb bt
EBP   EIP Function(args)
   0xc0217a9f stext_lock+0x43af
   kernel .text.lock 0xc02136f0 0xc02136f0
0xc02197c0
0xdfff7f20 0xc01b21c3 do_aic7xxx_isr+0x17 (0x12, 0xc197b078,
0xdfff7f70, 0x240, 0xc0331a40)
   kernel .text 0xc010 0xc01b21ac 0xc01b225c
0xdfff7f40 0xc010c539 handle_IRQ_event+0x4d (0x12, 0xdfff7f70,
0xc190fb20, 0xc0108960, 0xdfff6000)
   kernel .text 0xc010 0xc010c4ec 0xc010c568
0xdfff7f68 0xc010c73d do_IRQ+0x99 (0xc0108960, 0x0, 0xdfff6000,
0xdfff6000, 0xc0108960)
   kernel .text 0xc010 0xc010c6a4 0xc010c790
   0xc010a8c8 ret_from_intr
   kernel .text 0xc010 0xc010a8c8 0xc010a8e8
Interrupt registers:
eax = 0x ebx = 0xc0108960 ecx = 0x edx = 0xdfff6000
esi = 0xdfff6000 edi = 0xc0108960 esp = 0xdfff7fa4 eip = 0xc010898f
ebp = 0xdfff7fa4 xss = 0x0018 xcs = 0x0010 eflags = 0x0246
xds = 0xc0100018 xes = 0xdfff0018 origeax = 0xff12 regs = 0xdfff7f70
   0xc010898f default_idle+0x2f
   kernel .text 0xc010 0xc0108960 0xc0108998
0xdfff7fb8 0xc0108a02 cpu_idle+0x42
   kernel .text 0xc010 0xc01089c0 0xc0108a18
0xdfff7fc0 0xc02fb5b9 start_secondary+0x25
   kernel .text.init 0xc02f6000 0xc02fb594
0xc02fb5c0


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

56 matches

Mail list logo