Re: [PATCH] prune_icache_sb
Wendy Cheng wrote: Andrew Morton wrote: On Thu, 30 Nov 2006 11:05:32 -0500 Wendy Cheng <[EMAIL PROTECTED]> wrote: The idea is, instead of unconditionally dropping every buffer associated with the particular mount point (that defeats the purpose of page caching), base kernel exports the "drop_pagecache_sb()" call that allows page cache to be trimmed. More importantly, it is changed to offer the choice of not randomly purging any buffer but the ones that seem to be unused (i_state is NULL and i_count is zero). This will encourage filesystem(s) to pro actively response to vm memory shortage if they choose so. argh. I read this as "It is ok to give system admin(s) commands (that this "drop_pagecache_sb() call" is all about) to drop page cache. It is, however, not ok to give filesystem developer(s) this very same function to trim their own page cache if the filesystems choose to do so" ? In Linux a filesystem is a dumb layer which sits between the VFS and the I/O layer and provides dumb services such as reading/writing inodes, reading/writing directory entries, mapping pagecache offsets to disk blocks, etc. (This model is to varying degrees incorrect for every post-ext2 filesystem, but that's the way it is). Linux kernel, particularly the VFS layer, is starting to show signs of inadequacy as the software components built upon it keep growing. I have doubts that it can keep up and handle this complexity with a development policy like you just described (filesystem is a dumb layer ?). Aren't these DIO_xxx_LOCKING flags inside __blockdev_direct_IO() a perfect example why trying to do too many things inside vfs layer for so many filesystems is a bad idea ? By the way, since we're on this subject, could we discuss a little bit about vfs rename call (or I can start another new discussion thread) ? Note that linux do_rename() starts with the usual lookup logic, followed by "lock_rename", then a final round of dentry lookup, and finally comes to filesystem's i_op->rename call. Since lock_rename() only calls for vfs layer locks that are local to this particular machine, for a cluster filesystem, there exists a huge window between the final lookup and filesystem's i_op->rename calls such that the file could get deleted from another node before fs can do anything about it. Is it possible that we could get a new function pointer (lock_rename) in inode_operations structure so a cluster filesystem can do proper locking ? It looks like the ocfs2 guys have the similar problem? http://ftp.kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/ocfs2_git_patches/ocfs2-upstream-linus-20060924/0009-PATCH-Allow-file-systems-to-manually-d_move-inside-of-rename.txt Does this change help fix gfs lock ordering problem as well? -Russell Cattelan [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] prune_icache_sb
Wendy Cheng wrote: Andrew Morton wrote: On Thu, 30 Nov 2006 11:05:32 -0500 Wendy Cheng [EMAIL PROTECTED] wrote: The idea is, instead of unconditionally dropping every buffer associated with the particular mount point (that defeats the purpose of page caching), base kernel exports the drop_pagecache_sb() call that allows page cache to be trimmed. More importantly, it is changed to offer the choice of not randomly purging any buffer but the ones that seem to be unused (i_state is NULL and i_count is zero). This will encourage filesystem(s) to pro actively response to vm memory shortage if they choose so. argh. I read this as It is ok to give system admin(s) commands (that this drop_pagecache_sb() call is all about) to drop page cache. It is, however, not ok to give filesystem developer(s) this very same function to trim their own page cache if the filesystems choose to do so ? In Linux a filesystem is a dumb layer which sits between the VFS and the I/O layer and provides dumb services such as reading/writing inodes, reading/writing directory entries, mapping pagecache offsets to disk blocks, etc. (This model is to varying degrees incorrect for every post-ext2 filesystem, but that's the way it is). Linux kernel, particularly the VFS layer, is starting to show signs of inadequacy as the software components built upon it keep growing. I have doubts that it can keep up and handle this complexity with a development policy like you just described (filesystem is a dumb layer ?). Aren't these DIO_xxx_LOCKING flags inside __blockdev_direct_IO() a perfect example why trying to do too many things inside vfs layer for so many filesystems is a bad idea ? By the way, since we're on this subject, could we discuss a little bit about vfs rename call (or I can start another new discussion thread) ? Note that linux do_rename() starts with the usual lookup logic, followed by lock_rename, then a final round of dentry lookup, and finally comes to filesystem's i_op-rename call. Since lock_rename() only calls for vfs layer locks that are local to this particular machine, for a cluster filesystem, there exists a huge window between the final lookup and filesystem's i_op-rename calls such that the file could get deleted from another node before fs can do anything about it. Is it possible that we could get a new function pointer (lock_rename) in inode_operations structure so a cluster filesystem can do proper locking ? It looks like the ocfs2 guys have the similar problem? http://ftp.kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/ocfs2_git_patches/ocfs2-upstream-linus-20060924/0009-PATCH-Allow-file-systems-to-manually-d_move-inside-of-rename.txt Does this change help fix gfs lock ordering problem as well? -Russell Cattelan [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Cluster-devel] Re: [GFS2] Change argument of gfs2_dinode_out [17/70]
Al Viro wrote: On Fri, Dec 01, 2006 at 05:29:46PM -0600, Russell Cattelan wrote: gfs2 is supposed to be stabilized and use-able for the up coming rhel5 release, not pretty up for somebody to print out and hang on their wall. Your insight, sir, is truly stunning. That is to say, it reminds of a sudden and unpleasant contact with something dense and misplaced. May I direct your attention to the fact that rhel5 is quite unlikely to be based on 2.6.20? Ok your right rhel5 has no bearing on what goes into 2.6.20. And again I not taking issues with any of the cleanups you sent in they were complete and addressed real potential problems. Just trying to point out stablizing and bug fixing over partial cleanups would be more helpful for gfs2 in general, for whatever kernel it is running in. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Cluster-devel] Re: [GFS2] Change argument of gfs2_dinode_out [17/70]
Al Viro wrote: On Fri, Dec 01, 2006 at 05:29:46PM -0600, Russell Cattelan wrote: gfs2 is supposed to be stabilized and use-able for the up coming rhel5 release, not pretty up for somebody to print out and hang on their wall. Your insight, sir, is truly stunning. That is to say, it reminds of a sudden and unpleasant contact with something dense and misplaced. May I direct your attention to the fact that rhel5 is quite unlikely to be based on 2.6.20? Ok your right rhel5 has no bearing on what goes into 2.6.20. And again I not taking issues with any of the cleanups you sent in they were complete and addressed real potential problems. Just trying to point out stablizing and bug fixing over partial cleanups would be more helpful for gfs2 in general, for whatever kernel it is running in. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Cluster-devel] Re: [GFS2] Change argument of gfs2_dinode_out [17/70]
On Fri, 2006-12-01 at 21:08 +, Al Viro wrote: > On Fri, Dec 01, 2006 at 02:52:11PM -0600, Russell Cattelan wrote: > > code clean up are not without risk and with no regression test suite to > > verify > > that a "cleanup" has not broken something. Cleanups are very much a > > hindrance to stabilization. With no know working points in a code > > history it becomes difficult > > to bisect changes and figure out when bugs were introduced > > Especially when cleanups are mixed in with bug fixes. > > > > Pretty code does not equal correct code. > > No, but convoluted and unreadable code ends up being crappier due > to lack of review. And that's aside of the memory footprint, > likeliness of bugs introduced by code modifications (having in-core > and on-disk data structures with different contents and the same C > type => trouble that won't be caught by compiler), etc. Nothing makes up for the complete lack of GFS2 testing. reviewed code does not equal correct code either. Honestly tell me what test suite do you run on GFS2? Sure is it possible to make an educated guess that some cleanups will not destabilize the code. Indeed the stuff you have done is quite useful to ensure that endian bugs are being caught by the compiler/sparse. But no amount of "it shouldn't break anything" assertions can replace testing. But there is a large quantity of the 70 or so patches that were sent out were to enable "future" cleanup's. Putting in partial cleanups do nothing core code readability and I many cases is more confusing. Unless you meticulously keep up with the partial cleanups looking at the code is now a jumbled mess of inconsistencies. gfs2 is supposed to be stabilized and use-able for the up coming rhel5 release, not pretty up for somebody to print out and hang on their wall. -- Russell Cattelan <[EMAIL PROTECTED]> signature.asc Description: This is a digitally signed message part
Re: [Cluster-devel] Re: [GFS2] Change argument of gfs2_dinode_out [17/70]
On Fri, 2006-12-01 at 19:25 +, Al Viro wrote: > On Fri, Dec 01, 2006 at 01:19:04PM -0600, Russell Cattelan wrote: > > On Thu, 2006-11-30 at 12:15 +, Steven Whitehouse wrote: > > > >From 539e5d6b7ae8612c0393fe940d2da5b591318d3d Mon Sep 17 00:00:00 2001 > > > From: Steven Whitehouse <[EMAIL PROTECTED]> > > > Date: Tue, 31 Oct 2006 15:07:05 -0500 > > > Subject: [PATCH] [GFS2] Change argument of gfs2_dinode_out > > > > > > Everywhere this was called, a struct gfs2_inode was available, > > > but despite that, it was always called with a struct gfs2_dinode > > > as an argument. By making this change it paves the way to start > > > eliminating fields duplicated between the kernel's struct inode > > > and the struct gfs2_dinode. > > More pointless code churn. > > > > This only makes sense once the file system is working > > and we have time to do this type of cleanup on against > > a stable and TESTED code base. > > Bzzert. Cleaner code is easier to _get_ stable. "Keep it ucking fugly > until everyone stops looking at it out of sheer disgust" is a bad idea.' code clean up are not without risk and with no regression test suite to verify that a "cleanup" has not broken something. Cleanups are very much a hindrance to stabilization. With no know working points in a code history it becomes difficult to bisect changes and figure out when bugs were introduced Especially when cleanups are mixed in with bug fixes. Pretty code does not equal correct code. -- Russell Cattelan <[EMAIL PROTECTED]> signature.asc Description: This is a digitally signed message part
Re: [GFS2] Change argument of gfs2_dinode_out [17/70]
_out(>i_di, dibh->b_data); > + gfs2_dinode_out(ip, dibh->b_data); > brelse(dibh); > } > > @@ -762,7 +762,7 @@ static int gfs2_rename(struct inode *odi > goto out_end_trans; > ip->i_di.di_ctime = get_seconds(); > gfs2_trans_add_bh(ip->i_gl, dibh, 1); > - gfs2_dinode_out(>i_di, dibh->b_data); > + gfs2_dinode_out(ip, dibh->b_data); > brelse(dibh); > } > > @@ -949,7 +949,7 @@ static int setattr_chown(struct inode *i > gfs2_inode_attr_out(ip); > > gfs2_trans_add_bh(ip->i_gl, dibh, 1); > - gfs2_dinode_out(>i_di, dibh->b_data); > + gfs2_dinode_out(ip, dibh->b_data); > brelse(dibh); > > if (ouid != NO_QUOTA_CHANGE || ogid != NO_QUOTA_CHANGE) { > diff --git a/include/linux/gfs2_ondisk.h b/include/linux/gfs2_ondisk.h > index 10a507d..550effa 100644 > --- a/include/linux/gfs2_ondisk.h > +++ b/include/linux/gfs2_ondisk.h > @@ -535,7 +535,8 @@ extern void gfs2_rgrp_in(struct gfs2_rgr > extern void gfs2_rgrp_out(const struct gfs2_rgrp_host *rg, void *buf); > extern void gfs2_quota_in(struct gfs2_quota_host *qu, const void *buf); > extern void gfs2_dinode_in(struct gfs2_dinode_host *di, const void *buf); > -extern void gfs2_dinode_out(const struct gfs2_dinode_host *di, void *buf); > +struct gfs2_inode; > +extern void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf); > extern void gfs2_ea_header_in(struct gfs2_ea_header *ea, const void *buf); > extern void gfs2_ea_header_out(const struct gfs2_ea_header *ea, void *buf); > extern void gfs2_log_header_in(struct gfs2_log_header_host *lh, const void > *buf); -- Russell Cattelan <[EMAIL PROTECTED]> signature.asc Description: This is a digitally signed message part
Re: [GFS2] Move gfs2_meta_syncfs() into log.c [57/70]
On Thu, 2006-11-30 at 12:22 +, Steven Whitehouse wrote: > >From a25311c8e0b7071b129ca9a9e49e22eeaf620864 Mon Sep 17 00:00:00 2001 > From: Steven Whitehouse <[EMAIL PROTECTED]> > Date: Thu, 23 Nov 2006 11:06:35 -0500 > Subject: [PATCH] [GFS2] Move gfs2_meta_syncfs() into log.c > > By moving gfs2_meta_syncfs() into log.c, gfs2_ail1_start() > can be made static. While this is not a incorrect change as it stands. This kind of pointless code curn is making it harder to stabilize GFS2. > > Signed-off-by: Steven Whitehouse <[EMAIL PROTECTED]> > --- > fs/gfs2/log.c | 21 - > fs/gfs2/log.h |2 +- > fs/gfs2/meta_io.c | 17 - > fs/gfs2/meta_io.h |1 - > 4 files changed, 21 insertions(+), 20 deletions(-) > > diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c > index 6456fc3..7713d59 100644 > --- a/fs/gfs2/log.c > +++ b/fs/gfs2/log.c > @@ -15,6 +15,7 @@ #include > #include > #include > #include > +#include > > #include "gfs2.h" > #include "incore.h" > @@ -142,7 +143,7 @@ static int gfs2_ail1_empty_one(struct gf > return list_empty(>ai_ail1_list); > } > > -void gfs2_ail1_start(struct gfs2_sbd *sdp, int flags) > +static void gfs2_ail1_start(struct gfs2_sbd *sdp, int flags) > { > struct list_head *head = >sd_ail1_list; > u64 sync_gen; > @@ -689,3 +690,21 @@ void gfs2_log_shutdown(struct gfs2_sbd * > up_write(>sd_log_flush_lock); > } > > + > +/** > + * gfs2_meta_syncfs - sync all the buffers in a filesystem > + * @sdp: the filesystem > + * > + */ > + > +void gfs2_meta_syncfs(struct gfs2_sbd *sdp) > +{ > + gfs2_log_flush(sdp, NULL); > + for (;;) { > + gfs2_ail1_start(sdp, DIO_ALL); > + if (gfs2_ail1_empty(sdp, DIO_ALL)) > + break; > + msleep(10); > + } > +} > + > diff --git a/fs/gfs2/log.h b/fs/gfs2/log.h > index 7f5737d..8e7aa0f 100644 > --- a/fs/gfs2/log.h > +++ b/fs/gfs2/log.h > @@ -48,7 +48,6 @@ static inline void gfs2_log_pointers_ini > unsigned int gfs2_struct2blk(struct gfs2_sbd *sdp, unsigned int nstruct, > unsigned int ssize); > > -void gfs2_ail1_start(struct gfs2_sbd *sdp, int flags); > int gfs2_ail1_empty(struct gfs2_sbd *sdp, int flags); > > int gfs2_log_reserve(struct gfs2_sbd *sdp, unsigned int blks); > @@ -61,5 +60,6 @@ void gfs2_log_flush(struct gfs2_sbd *sdp > void gfs2_log_commit(struct gfs2_sbd *sdp, struct gfs2_trans *trans); > > void gfs2_log_shutdown(struct gfs2_sbd *sdp); > +void gfs2_meta_syncfs(struct gfs2_sbd *sdp); > > #endif /* __LOG_DOT_H__ */ > diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c > index 939a09f..fbeba81 100644 > --- a/fs/gfs2/meta_io.c > +++ b/fs/gfs2/meta_io.c > @@ -574,20 +574,3 @@ out: > return first_bh; > } > > -/** > - * gfs2_meta_syncfs - sync all the buffers in a filesystem > - * @sdp: the filesystem > - * > - */ > - > -void gfs2_meta_syncfs(struct gfs2_sbd *sdp) > -{ > - gfs2_log_flush(sdp, NULL); > - for (;;) { > - gfs2_ail1_start(sdp, DIO_ALL); > - if (gfs2_ail1_empty(sdp, DIO_ALL)) > - break; > - msleep(10); > - } > -} > - > diff --git a/fs/gfs2/meta_io.h b/fs/gfs2/meta_io.h > index 3ec939e..e037425 100644 > --- a/fs/gfs2/meta_io.h > +++ b/fs/gfs2/meta_io.h > @@ -67,7 +67,6 @@ static inline int gfs2_meta_inode_buffer > } > > struct buffer_head *gfs2_meta_ra(struct gfs2_glock *gl, u64 dblock, u32 > extlen); > -void gfs2_meta_syncfs(struct gfs2_sbd *sdp); > > #define buffer_busy(bh) \ > ((bh)->b_state & ((1ul << BH_Dirty) | (1ul << BH_Lock) | (1ul << BH_Pinned))) -- Russell Cattelan <[EMAIL PROTECTED]> signature.asc Description: This is a digitally signed message part
Re: [GFS2] Fix page lock/glock deadlock [32/70]
On Thu, 2006-11-30 at 12:17 +, Steven Whitehouse wrote: > >From 2ca99501fa5422e84f18333918a503433449e2b5 Mon Sep 17 00:00:00 2001 > From: Steven Whitehouse <[EMAIL PROTECTED]> > Date: Wed, 8 Nov 2006 10:26:54 -0500 > Subject: [PATCH] [GFS2] Fix page lock/glock deadlock > > This fixes a race between the glock and the page lock encountered > during truncate in gfs2_readpage and gfs2_prepare_write. The gfs2_readpages > function doesn't need the same fix since it only uses a try lock anyway, so > it will fail back to gfs2_readpage in the case of a potential deadlock. > > This bug was spotted by Russell Cattelan.\ This bug was fixed correctly by Russell Cattelan and this is an incomplete version of my patch. As I keep trying to point out readpages is still wrong in that the stuffed case is not correct and results in a panic if a file is being truncated down. If nr_pages is calculated prior a truncate the stuffed inode case will try to operate on pages that are no longer there. The correct fix is to not try and handled the stuffed inode case at all in readpages and simply return AOP_TRUNCATED_PAGE and let things things fall back to readpage. > > Cc: Russell Cattelan <[EMAIL PROTECTED]> > Signed-off-by: Steven Whitehouse <[EMAIL PROTECTED]> > --- > fs/gfs2/glock.h |1 - > fs/gfs2/ops_address.c | 13 + > 2 files changed, 9 insertions(+), 5 deletions(-) > > diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h > index b985627..a331bf8 100644 > --- a/fs/gfs2/glock.h > +++ b/fs/gfs2/glock.h > @@ -27,7 +27,6 @@ #define GL_SKIP 0x0100 > #define GL_ATIME 0x0200 > #define GL_NOCACHE 0x0400 > #define GL_NOCANCEL 0x1000 > -#define GL_AOP 0x4000 > > #define GLR_TRYFAILED13 > #define GLR_CANCELED 14 > diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c > index 5c3962c..3822189 100644 > --- a/fs/gfs2/ops_address.c > +++ b/fs/gfs2/ops_address.c > @@ -230,7 +230,7 @@ static int gfs2_readpage(struct file *fi > /* gfs2_sharewrite_nopage has grabbed the > ip->i_gl already */ > goto skip_lock; > } > - gfs2_holder_init(ip->i_gl, LM_ST_SHARED, GL_ATIME|GL_AOP, ); > + gfs2_holder_init(ip->i_gl, LM_ST_SHARED, > GL_ATIME|LM_FLAG_TRY_1CB, ); > do_unlock = 1; > error = gfs2_glock_nq_m_atime(1, ); > if (unlikely(error)) > @@ -254,6 +254,8 @@ skip_lock: > out: > return error; > out_unlock: > + if (error == GLR_TRYFAILED) > + error = AOP_TRUNCATED_PAGE; > unlock_page(page); > if (do_unlock) > gfs2_holder_uninit(); > @@ -293,7 +295,7 @@ static int gfs2_readpages(struct file *f > goto skip_lock; > } > gfs2_holder_init(ip->i_gl, LM_ST_SHARED, > - LM_FLAG_TRY_1CB|GL_ATIME|GL_AOP, ); > + LM_FLAG_TRY_1CB|GL_ATIME, ); > do_unlock = 1; > ret = gfs2_glock_nq_m_atime(1, ); > if (ret == GLR_TRYFAILED) > @@ -366,10 +368,13 @@ static int gfs2_prepare_write(struct fil > unsigned int write_len = to - from; > > > - gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_ATIME|GL_AOP, >i_gh); > + gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_ATIME|LM_FLAG_TRY_1CB, > >i_gh); > error = gfs2_glock_nq_m_atime(1, >i_gh); > - if (error) > + if (unlikely(error)) { > + if (error == GLR_TRYFAILED) > + error = AOP_TRUNCATED_PAGE; > goto out_uninit; > + } > > gfs2_write_calc_reserv(ip, write_len, _blocks, _blocks); > -- Russell Cattelan <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GFS2] Tidy up bmap & fix boundary bug [48/70]
ock, new, ); > if (error) > - return error; > + goto out_fail; > } > > - boundary = lookup_block(ip, bh, end_of_metadata, mp, create, , > ); > - clear_buffer_mapped(bh_map); > - clear_buffer_new(bh_map); > - clear_buffer_boundary(bh_map); > - > + boundary = lookup_block(ip, bh, end_of_metadata, , create, , > ); > if (dblock) { > map_bh(bh_map, inode->i_sb, dblock); > if (boundary) > - set_buffer_boundary(bh); > + set_buffer_boundary(bh_map); > if (new) { > struct buffer_head *dibh; > error = gfs2_meta_inode_buffer(ip, ); > @@ -510,8 +531,8 @@ static int gfs2_block_pointers(struct in > while(--maxlen && !buffer_boundary(bh_map)) { > u64 eblock; > > - mp->mp_list[end_of_metadata]++; > - boundary = lookup_block(ip, bh, end_of_metadata, mp, 0, > , ); > + mp.mp_list[end_of_metadata]++; > + boundary = lookup_block(ip, bh, end_of_metadata, , > 0, , ); > if (eblock != ++dblock) > break; > bh_map->b_size += (1 << inode->i_blkbits); > @@ -521,43 +542,15 @@ static int gfs2_block_pointers(struct in > } > out_brelse: > brelse(bh); > - return 0; > -} > - > - > -static inline void bmap_lock(struct inode *inode, int create) > -{ > - struct gfs2_inode *ip = GFS2_I(inode); > - if (create) > - down_write(>i_rw_mutex); > - else > - down_read(>i_rw_mutex); > -} > - > -static inline void bmap_unlock(struct inode *inode, int create) > -{ > - struct gfs2_inode *ip = GFS2_I(inode); > - if (create) > - up_write(>i_rw_mutex); > - else > - up_read(>i_rw_mutex); > -} > - > -int gfs2_block_map(struct inode *inode, u64 lblock, int create, > -struct buffer_head *bh) > -{ > - struct metapath mp; > - int ret; > - > - bmap_lock(inode, create); > - ret = gfs2_block_pointers(inode, lblock, create, bh, ); > +out_ok: > + error = 0; > +out_fail: > bmap_unlock(inode, create); > - return ret; > + return error; > } > > int gfs2_extent_map(struct inode *inode, u64 lblock, int *new, u64 *dblock, > unsigned *extlen) > { > - struct metapath mp; > struct buffer_head bh = { .b_state = 0, .b_blocknr = 0 }; > int ret; > int create = *new; > @@ -567,9 +560,7 @@ int gfs2_extent_map(struct inode *inode, > BUG_ON(!new); > > bh.b_size = 1 << (inode->i_blkbits + 5); > - bmap_lock(inode, create); > - ret = gfs2_block_pointers(inode, lblock, create, , ); > - bmap_unlock(inode, create); > + ret = gfs2_block_map(inode, lblock, create, ); > *extlen = bh.b_size >> inode->i_blkbits; > *dblock = bh.b_blocknr; > if (buffer_new()) -- Russell Cattelan <[EMAIL PROTECTED]> signature.asc Description: This is a digitally signed message part
Re: [GFS2] Change argument to gfs2_dinode_in [18/70]
On Thu, 2006-11-30 at 12:15 +, Steven Whitehouse wrote: > >From 891ea14712da68e282de8583e5fa14f0d3f3731e Mon Sep 17 00:00:00 2001 > From: Steven Whitehouse <[EMAIL PROTECTED]> > Date: Tue, 31 Oct 2006 15:22:10 -0500 > Subject: [PATCH] [GFS2] Change argument to gfs2_dinode_in > > This is a preliminary patch to enable the removal of fields > in gfs2_dinode_host which are duplicated in struct inode. Deferred till the complete "cleanup" work is done? > > Signed-off-by: Steven Whitehouse <[EMAIL PROTECTED]> > --- > fs/gfs2/inode.c |2 +- > fs/gfs2/ondisk.c|4 ++-- > include/linux/gfs2_ondisk.h |2 +- > 3 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c > index b861ddb..9875e93 100644 > --- a/fs/gfs2/inode.c > +++ b/fs/gfs2/inode.c > @@ -229,7 +229,7 @@ int gfs2_inode_refresh(struct gfs2_inode > return -EIO; > } > > - gfs2_dinode_in(>i_di, dibh->b_data); > + gfs2_dinode_in(ip, dibh->b_data); > > brelse(dibh); > > diff --git a/fs/gfs2/ondisk.c b/fs/gfs2/ondisk.c > index 2c50fa0..edf8756 100644 > --- a/fs/gfs2/ondisk.c > +++ b/fs/gfs2/ondisk.c > @@ -155,8 +155,9 @@ void gfs2_quota_in(struct gfs2_quota_hos > qu->qu_value = be64_to_cpu(str->qu_value); > } > > -void gfs2_dinode_in(struct gfs2_dinode_host *di, const void *buf) > +void gfs2_dinode_in(struct gfs2_inode *ip, const void *buf) > { > + struct gfs2_dinode_host *di = >i_di; > const struct gfs2_dinode *str = buf; > > gfs2_meta_header_in(>di_header, buf); > @@ -186,7 +187,6 @@ void gfs2_dinode_in(struct gfs2_dinode_h > di->di_entries = be32_to_cpu(str->di_entries); > > di->di_eattr = be64_to_cpu(str->di_eattr); > - > } > > void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf) > diff --git a/include/linux/gfs2_ondisk.h b/include/linux/gfs2_ondisk.h > index 550effa..08d8240 100644 > --- a/include/linux/gfs2_ondisk.h > +++ b/include/linux/gfs2_ondisk.h > @@ -534,8 +534,8 @@ extern void gfs2_rindex_out(const struct > extern void gfs2_rgrp_in(struct gfs2_rgrp_host *rg, const void *buf); > extern void gfs2_rgrp_out(const struct gfs2_rgrp_host *rg, void *buf); > extern void gfs2_quota_in(struct gfs2_quota_host *qu, const void *buf); > -extern void gfs2_dinode_in(struct gfs2_dinode_host *di, const void *buf); > struct gfs2_inode; > +extern void gfs2_dinode_in(struct gfs2_inode *ip, const void *buf); > extern void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf); > extern void gfs2_ea_header_in(struct gfs2_ea_header *ea, const void *buf); > extern void gfs2_ea_header_out(const struct gfs2_ea_header *ea, void *buf); -- Russell Cattelan <[EMAIL PROTECTED]> signature.asc Description: This is a digitally signed message part
Re: [GFS2] Fix journal flush problem [56/70]
> @@ -274,7 +280,7 @@ int gfs2_log_reserve(struct gfs2_sbd *sd > > mutex_lock(>sd_log_reserve_mutex); > gfs2_log_lock(sdp); > - while(sdp->sd_log_blks_free <= blks) { > + while(sdp->sd_log_blks_free <= (blks + 6)) { > gfs2_log_unlock(sdp); > gfs2_ail1_empty(sdp, 0); > gfs2_log_flush(sdp, NULL); > @@ -643,12 +649,9 @@ void gfs2_log_commit(struct gfs2_sbd *sd > up_read(>sd_log_flush_lock); > > gfs2_log_lock(sdp); > - if (sdp->sd_log_num_buf > gfs2_tune_get(sdp, gt_incore_log_blocks)) { > - gfs2_log_unlock(sdp); > - gfs2_log_flush(sdp, NULL); > - } else { > - gfs2_log_unlock(sdp); > - } > + if (sdp->sd_log_num_buf > gfs2_tune_get(sdp, gt_incore_log_blocks)) > + wake_up_process(sdp->sd_logd_process); > + gfs2_log_unlock(sdp); > } > > /** > diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c > index 3912d6a..939a09f 100644 > --- a/fs/gfs2/meta_io.c > +++ b/fs/gfs2/meta_io.c > @@ -472,6 +472,9 @@ int gfs2_meta_indirect_buffer(struct gfs > struct buffer_head *bh = NULL, **bh_slot = ip->i_cache + height; > int in_cache = 0; > > + BUG_ON(!gl); > + BUG_ON(!sdp); > + > spin_lock(>i_spin); > if (*bh_slot && (*bh_slot)->b_blocknr == num) { > bh = *bh_slot; > diff --git a/fs/gfs2/ops_super.c b/fs/gfs2/ops_super.c > index 8635175..7685b46 100644 > --- a/fs/gfs2/ops_super.c > +++ b/fs/gfs2/ops_super.c > @@ -157,7 +157,8 @@ static void gfs2_write_super(struct supe > static int gfs2_sync_fs(struct super_block *sb, int wait) > { > sb->s_dirt = 0; > - gfs2_log_flush(sb->s_fs_info, NULL); > + if (wait) > + gfs2_log_flush(sb->s_fs_info, NULL); > return 0; > } > > @@ -293,8 +294,6 @@ static void gfs2_clear_inode(struct inod >*/ > if (inode->i_private) { > struct gfs2_inode *ip = GFS2_I(inode); > - gfs2_glock_inode_squish(inode); > - gfs2_assert(inode->i_sb->s_fs_info, ip->i_gl->gl_state == > LM_ST_UNLOCKED); > ip->i_gl->gl_object = NULL; > gfs2_glock_schedule_for_reclaim(ip->i_gl); > gfs2_glock_put(ip->i_gl); > @@ -395,7 +394,7 @@ static void gfs2_delete_inode(struct ino > if (!inode->i_private) > goto out; > > - error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB | > GL_NOCACHE, ); > + error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB, > ); > if (unlikely(error)) { > gfs2_glock_dq_uninit(>i_iopen_gh); > goto out; -- Russell Cattelan <[EMAIL PROTECTED]> signature.asc Description: This is a digitally signed message part
Re: [GFS2] Reduce number of arguments to meta_io.c:getbuf() [58/70]
pace, dblock, CREATE); > + first_bh = getbuf(gl, dblock, CREATE); > > if (buffer_uptodate(first_bh)) > goto out; > @@ -558,7 +556,7 @@ struct buffer_head *gfs2_meta_ra(struct > extlen--; > > while (extlen) { > - bh = getbuf(sdp, aspace, dblock, CREATE); > + bh = getbuf(gl, dblock, CREATE); > > if (!buffer_uptodate(bh) && !buffer_locked(bh)) > ll_rw_block(READA, 1, ); > -- > 1.4.1 > > > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Russell Cattelan <[EMAIL PROTECTED]> signature.asc Description: This is a digitally signed message part
Re: [GFS2] Simplify glops functions [53/70]
gt; { > - int meta = (flags & DIO_METADATA); > - int data = (flags & DIO_DATA); > - > if (test_bit(GLF_DIRTY, >gl_flags)) { > - if (meta && data) { > - gfs2_page_writeback(gl); > - gfs2_log_flush(gl->gl_sbd, gl); > - gfs2_meta_sync(gl); > - gfs2_page_wait(gl); > - clear_bit(GLF_DIRTY, >gl_flags); > - } else if (meta) { > - gfs2_log_flush(gl->gl_sbd, gl); > - gfs2_meta_sync(gl); > - } else if (data) { > - gfs2_page_writeback(gl); > - gfs2_page_wait(gl); > - } > - if (flags & DIO_RELEASE) > - gfs2_ail_empty_gl(gl); > + gfs2_page_writeback(gl); > + gfs2_log_flush(gl->gl_sbd, gl); > + gfs2_meta_sync(gl); > + gfs2_page_wait(gl); > + clear_bit(GLF_DIRTY, >gl_flags); > + gfs2_ail_empty_gl(gl); > } > } > > @@ -302,15 +284,13 @@ static void inode_go_sync(struct gfs2_gl > static void inode_go_inval(struct gfs2_glock *gl, int flags) > { > int meta = (flags & DIO_METADATA); > - int data = (flags & DIO_DATA); > > if (meta) { > struct gfs2_inode *ip = gl->gl_object; > gfs2_meta_inval(gl); > set_bit(GIF_INVALID, >i_flags); > } > - if (data) > - gfs2_page_inval(gl); > + gfs2_page_inval(gl); > } > > /** > @@ -494,7 +474,7 @@ static void trans_go_xmote_bh(struct gfs > if (gl->gl_state != LM_ST_UNLOCKED && > test_bit(SDF_JOURNAL_LIVE, >sd_flags)) { > gfs2_meta_cache_flush(GFS2_I(sdp->sd_jdesc->jd_inode)); > - j_gl->gl_ops->go_inval(j_gl, DIO_METADATA | DIO_DATA); > + j_gl->gl_ops->go_inval(j_gl, DIO_METADATA); > > error = gfs2_find_jhead(sdp->sd_jdesc, ); > if (error) > diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h > index 227a74d..734421e 100644 > --- a/fs/gfs2/incore.h > +++ b/fs/gfs2/incore.h > @@ -14,8 +14,6 @@ #include > > #define DIO_WAIT 0x0010 > #define DIO_METADATA 0x0020 > -#define DIO_DATA 0x0040 > -#define DIO_RELEASE 0x0080 > #define DIO_ALL 0x0100 > > struct gfs2_log_operations; > @@ -103,18 +101,17 @@ struct gfs2_bufdata { > }; > > struct gfs2_glock_operations { > - void (*go_xmote_th) (struct gfs2_glock * gl, unsigned int state, > - int flags); > - void (*go_xmote_bh) (struct gfs2_glock * gl); > - void (*go_drop_th) (struct gfs2_glock * gl); > - void (*go_drop_bh) (struct gfs2_glock * gl); > - void (*go_sync) (struct gfs2_glock * gl, int flags); > - void (*go_inval) (struct gfs2_glock * gl, int flags); > - int (*go_demote_ok) (struct gfs2_glock * gl); > - int (*go_lock) (struct gfs2_holder * gh); > - void (*go_unlock) (struct gfs2_holder * gh); > - void (*go_callback) (struct gfs2_glock * gl, unsigned int state); > - void (*go_greedy) (struct gfs2_glock * gl); > + void (*go_xmote_th) (struct gfs2_glock *gl, unsigned int state, int > flags); > + void (*go_xmote_bh) (struct gfs2_glock *gl); > + void (*go_drop_th) (struct gfs2_glock *gl); > + void (*go_drop_bh) (struct gfs2_glock *gl); > + void (*go_sync) (struct gfs2_glock *gl); > + void (*go_inval) (struct gfs2_glock *gl, int flags); > + int (*go_demote_ok) (struct gfs2_glock *gl); > + int (*go_lock) (struct gfs2_holder *gh); > + void (*go_unlock) (struct gfs2_holder *gh); > + void (*go_callback) (struct gfs2_glock *gl, unsigned int state); > + void (*go_greedy) (struct gfs2_glock *gl); > const int go_type; > }; > > diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c > index 0ef8317..1408c5f 100644 > --- a/fs/gfs2/super.c > +++ b/fs/gfs2/super.c > @@ -517,7 +517,7 @@ int gfs2_make_fs_rw(struct gfs2_sbd *sdp > return error; > > gfs2_meta_cache_flush(ip); > - j_gl->gl_ops->go_inval(j_gl, DIO_METADATA | DIO_DATA); > + j_gl->gl_ops->go_inval(j_gl, DIO_METADATA); > > error = gfs2_find_jhead(sdp->sd_jdesc, ); > if (error) -- Russell Cattelan <[EMAIL PROTECTED]> signature.asc Description: This is a digitally signed message part
Re: [GFS2 & DLM] Guide to -nmw tree patches
On Thu, 2006-11-30 at 13:21 +0100, Arjan van de Ven wrote: > On Thu, 2006-11-30 at 12:12 +, Steven Whitehouse wrote: > > Hi, > > > > Below is a summary diffstat of all the changes in the GFS2 & DLM -nmw > > (next merge window) git tree. Since merge time is once again upon us, > > the following patches are the current complete content of the tree. > > > Hi, > > can you please make sure your patch series is in reply to your first > message, so that your mails properly thread in mail clients? > > (everyone else seems to manage this, I bet the git or quilt patch series > send stuff has this built in somehow..) I agree. I would also continue to urge for internal reviews (since this is the first I have seen any of these changes) before spamming lkml with 70 loosely related patches. > > Greetings, > Arjan van de Ven > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Russell Cattelan <[EMAIL PROTECTED]> signature.asc Description: This is a digitally signed message part
Re: [GFS2] Remove unused function from inode.c [50/70]
; #endif /* __INODE_DOT_H__ */ > diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c > index 2f7ef98..8676c39 100644 > --- a/fs/gfs2/ops_address.c > +++ b/fs/gfs2/ops_address.c > @@ -217,7 +217,7 @@ static int gfs2_readpage(struct file *fi > } > gfs2_holder_init(ip->i_gl, LM_ST_SHARED, > GL_ATIME|LM_FLAG_TRY_1CB, ); > do_unlock = 1; > - error = gfs2_glock_nq_m_atime(1, ); > + error = gfs2_glock_nq_atime(); > if (unlikely(error)) > goto out_unlock; > } > @@ -282,7 +282,7 @@ static int gfs2_readpages(struct file *f > gfs2_holder_init(ip->i_gl, LM_ST_SHARED, >LM_FLAG_TRY_1CB|GL_ATIME, ); > do_unlock = 1; > - ret = gfs2_glock_nq_m_atime(1, ); > + ret = gfs2_glock_nq_atime(); > if (ret == GLR_TRYFAILED) > goto out_noerror; > if (unlikely(ret)) > @@ -354,7 +354,7 @@ static int gfs2_prepare_write(struct fil > > > gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_ATIME|LM_FLAG_TRY_1CB, > >i_gh); > - error = gfs2_glock_nq_m_atime(1, >i_gh); > + error = gfs2_glock_nq_atime(>i_gh); > if (unlikely(error)) { > if (error == GLR_TRYFAILED) > error = AOP_TRUNCATED_PAGE; > @@ -609,7 +609,7 @@ static ssize_t gfs2_direct_IO(int rw, st >* on this path. All we need change is atime. >*/ > gfs2_holder_init(ip->i_gl, LM_ST_SHARED, GL_ATIME, ); > - rv = gfs2_glock_nq_m_atime(1, ); > + rv = gfs2_glock_nq_atime(); > if (rv) > goto out; > > diff --git a/fs/gfs2/ops_file.c b/fs/gfs2/ops_file.c > index eabf6c6..c2be216 100644 > --- a/fs/gfs2/ops_file.c > +++ b/fs/gfs2/ops_file.c > @@ -253,7 +253,7 @@ static int gfs2_get_flags(struct file *f > u32 fsflags; > > gfs2_holder_init(ip->i_gl, LM_ST_SHARED, GL_ATIME, ); > - error = gfs2_glock_nq_m_atime(1, ); > + error = gfs2_glock_nq_atime(); > if (error) > return error; > -- Russell Cattelan <[EMAIL PROTECTED]> signature.asc Description: This is a digitally signed message part
Re: [GFS2] Shrink gfs2_inode (8) - i_vn [28/70]
On Thu, 2006-11-30 at 12:16 +, Steven Whitehouse wrote: > >From bfded27ba010d1c3b0aa3843f97dc9b80de751be Mon Sep 17 00:00:00 2001 > From: Steven Whitehouse <[EMAIL PROTECTED]> > Date: Wed, 1 Nov 2006 16:05:38 -0500 > Subject: [PATCH] [GFS2] Shrink gfs2_inode (8) - i_vn > > This shrinks the size of the gfs2_inode by 8 bytes by > replacing the version counter with a one bit valid/invalid > flag. What is the version number used for? It seems like anything that was specifically carving a 64 container has a more specific reason that just ON/OFF? > > Signed-off-by: Steven Whitehouse <[EMAIL PROTECTED]> > --- > fs/gfs2/glops.c |5 +++-- > fs/gfs2/incore.h|2 +- > fs/gfs2/inode.c |4 ++-- > fs/gfs2/ops_inode.c |2 +- > 4 files changed, 7 insertions(+), 6 deletions(-) > > diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c > index aad45b7..9c20337 100644 > --- a/fs/gfs2/glops.c > +++ b/fs/gfs2/glops.c > @@ -305,8 +305,9 @@ static void inode_go_inval(struct gfs2_g > int data = (flags & DIO_DATA); > > if (meta) { > + struct gfs2_inode *ip = gl->gl_object; > gfs2_meta_inval(gl); > - gl->gl_vn++; > + set_bit(GIF_INVALID, >i_flags); > } > if (data) > gfs2_page_inval(gl); > @@ -351,7 +352,7 @@ static int inode_go_lock(struct gfs2_hol > if (!ip) > return 0; > > - if (ip->i_vn != gl->gl_vn) { > + if (test_bit(GIF_INVALID, >i_flags)) { > error = gfs2_inode_refresh(ip); > if (error) > return error; > diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h > index c0a8c3b..227a74d 100644 > --- a/fs/gfs2/incore.h > +++ b/fs/gfs2/incore.h > @@ -217,6 +217,7 @@ struct gfs2_alloc { > }; > > enum { > + GIF_INVALID = 0, > GIF_QD_LOCKED = 1, > GIF_PAGED = 2, > GIF_SW_PAGED= 3, > @@ -228,7 +229,6 @@ struct gfs2_inode { > > unsigned long i_flags; /* GIF_... */ > > - u64 i_vn; > struct gfs2_dinode_host i_di; /* To be replaced by ref to block */ > > struct gfs2_glock *i_gl; /* Move into i_gh? */ > diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c > index f6177fc..e467780 100644 > --- a/fs/gfs2/inode.c > +++ b/fs/gfs2/inode.c > @@ -145,7 +145,7 @@ struct inode *gfs2_inode_lookup(struct s > if (unlikely(error)) > goto fail_put; > > - ip->i_vn = ip->i_gl->gl_vn - 1; > + set_bit(GIF_INVALID, >i_flags); > error = gfs2_glock_nq_init(io_gl, LM_ST_SHARED, GL_EXACT, > >i_iopen_gh); > if (unlikely(error)) > goto fail_iopen; > @@ -242,7 +242,7 @@ int gfs2_inode_refresh(struct gfs2_inode > > error = gfs2_dinode_in(ip, dibh->b_data); > brelse(dibh); > - ip->i_vn = ip->i_gl->gl_vn; > + clear_bit(GIF_INVALID, >i_flags); > > return error; > } > diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c > index 0e4eade..b247f25 100644 > --- a/fs/gfs2/ops_inode.c > +++ b/fs/gfs2/ops_inode.c > @@ -844,7 +844,7 @@ static int gfs2_permission(struct inode > struct gfs2_holder i_gh; > int error; > > - if (ip->i_vn == ip->i_gl->gl_vn) > + if (!test_bit(GIF_INVALID, >i_flags)) > return generic_permission(inode, mask, gfs2_check_acl); > > error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, LM_FLAG_ANY, _gh); -- Russell Cattelan <[EMAIL PROTECTED]> signature.asc Description: This is a digitally signed message part
Re: [GFS2] Shrink gfs2_inode (6) - di_atime/di_mtime/di_ctime [26/70]
printk(KERN_INFO " di_size = %llu\n", (unsigned long long)di->di_size); > printk(KERN_INFO " di_blocks = %llu\n", (unsigned long > long)di->di_blocks); > - printk(KERN_INFO " di_atime = %lld\n", (long long)di->di_atime); > - printk(KERN_INFO " di_mtime = %lld\n", (long long)di->di_mtime); > - printk(KERN_INFO " di_ctime = %lld\n", (long long)di->di_ctime); > - > printk(KERN_INFO " di_goal_meta = %llu\n", (unsigned long > long)di->di_goal_meta); > printk(KERN_INFO " di_goal_data = %llu\n", (unsigned long > long)di->di_goal_data); > > diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c > index 38b702a..5c3962c 100644 > --- a/fs/gfs2/ops_address.c > +++ b/fs/gfs2/ops_address.c > @@ -498,10 +498,6 @@ static int gfs2_commit_write(struct file > di->di_size = cpu_to_be64(inode->i_size); > } > > - di->di_atime = cpu_to_be64(inode->i_atime.tv_sec); > - di->di_mtime = cpu_to_be64(inode->i_mtime.tv_sec); > - di->di_ctime = cpu_to_be64(inode->i_ctime.tv_sec); > - > brelse(dibh); > gfs2_trans_end(sdp); > if (al->al_requested) { > diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c > index 06176de..585b43a 100644 > --- a/fs/gfs2/ops_inode.c > +++ b/fs/gfs2/ops_inode.c > @@ -729,7 +729,7 @@ static int gfs2_rename(struct inode *odi > error = gfs2_meta_inode_buffer(ip, ); > if (error) > goto out_end_trans; > - ip->i_di.di_ctime = get_seconds(); > + ip->i_inode.i_ctime.tv_sec = get_seconds(); > gfs2_trans_add_bh(ip->i_gl, dibh, 1); > gfs2_dinode_out(ip, dibh->b_data); > brelse(dibh); > @@ -915,7 +915,6 @@ static int setattr_chown(struct inode *i > > error = inode_setattr(inode, attr); > gfs2_assert_warn(sdp, !error); > - gfs2_inode_attr_out(ip); > > gfs2_trans_add_bh(ip->i_gl, dibh, 1); > gfs2_dinode_out(ip, dibh->b_data); > diff --git a/include/linux/gfs2_ondisk.h b/include/linux/gfs2_ondisk.h > index c61517b..7f5a4a1 100644 > --- a/include/linux/gfs2_ondisk.h > +++ b/include/linux/gfs2_ondisk.h > @@ -324,9 +324,6 @@ struct gfs2_dinode { > struct gfs2_dinode_host { > __u64 di_size; /* number of bytes in file */ > __u64 di_blocks;/* number of blocks in file */ > - __u64 di_atime; /* time last accessed */ > - __u64 di_mtime; /* time last modified */ > - __u64 di_ctime; /* time last changed */ > > /* This section varies from gfs1. Padding added to align with > * remainder of dinode -- Russell Cattelan <[EMAIL PROTECTED]> signature.asc Description: This is a digitally signed message part
Re: [GFS2] Shrink gfs2_inode (5) - di_nlink [25/70]
return -EMLINK; > > return 0; > @@ -808,7 +814,7 @@ static int link_dinode(struct gfs2_inode > error = gfs2_meta_inode_buffer(ip, ); > if (error) > goto fail_end_trans; > - ip->i_di.di_nlink = 1; > + ip->i_inode.i_nlink = 1; > gfs2_trans_add_bh(ip->i_gl, dibh, 1); > gfs2_dinode_out(ip, dibh->b_data); > brelse(dibh); > @@ -1016,7 +1022,12 @@ int gfs2_rmdiri(struct gfs2_inode *dip, > if (error) > return error; > > - error = gfs2_change_nlink(ip, -2); > + /* It looks odd, but it really should be done twice */ > + error = gfs2_change_nlink(ip, -1); > + if (error) > + return error; > + > + error = gfs2_change_nlink(ip, -1); > if (error) > return error; > > diff --git a/fs/gfs2/ondisk.c b/fs/gfs2/ondisk.c > index e224f6a..b4e354b 100644 > --- a/fs/gfs2/ondisk.c > +++ b/fs/gfs2/ondisk.c > @@ -164,7 +164,7 @@ void gfs2_dinode_out(const struct gfs2_i > str->di_mode = cpu_to_be32(ip->i_inode.i_mode); > str->di_uid = cpu_to_be32(ip->i_inode.i_uid); > str->di_gid = cpu_to_be32(ip->i_inode.i_gid); > - str->di_nlink = cpu_to_be32(di->di_nlink); > + str->di_nlink = cpu_to_be32(ip->i_inode.i_nlink); > str->di_size = cpu_to_be64(di->di_size); > str->di_blocks = cpu_to_be64(di->di_blocks); > str->di_atime = cpu_to_be64(di->di_atime); > @@ -191,7 +191,6 @@ void gfs2_dinode_print(const struct gfs2 > > gfs2_inum_print(>i_num); > > - pv(di, di_nlink, "%u"); > printk(KERN_INFO " di_size = %llu\n", (unsigned long long)di->di_size); > printk(KERN_INFO " di_blocks = %llu\n", (unsigned long > long)di->di_blocks); > printk(KERN_INFO " di_atime = %lld\n", (long long)di->di_atime); > diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c > index efbcec3..06176de 100644 > --- a/fs/gfs2/ops_inode.c > +++ b/fs/gfs2/ops_inode.c > @@ -169,7 +169,7 @@ static int gfs2_link(struct dentry *old_ > } > > error = -EINVAL; > - if (!dip->i_di.di_nlink) > + if (!dip->i_inode.i_nlink) > goto out_gunlock; > error = -EFBIG; > if (dip->i_di.di_entries == (u32)-1) > @@ -178,10 +178,10 @@ static int gfs2_link(struct dentry *old_ > if (IS_IMMUTABLE(inode) || IS_APPEND(inode)) > goto out_gunlock; > error = -EINVAL; > - if (!ip->i_di.di_nlink) > + if (!ip->i_inode.i_nlink) > goto out_gunlock; > error = -EMLINK; > - if (ip->i_di.di_nlink == (u32)-1) > + if (ip->i_inode.i_nlink == (u32)-1) > goto out_gunlock; > > alloc_required = error = gfs2_diradd_alloc_required(dir, > >d_name); > @@ -386,7 +386,7 @@ static int gfs2_mkdir(struct inode *dir, > > ip = ghs[1].gh_gl->gl_object; > > - ip->i_di.di_nlink = 2; > + ip->i_inode.i_nlink = 2; > ip->i_di.di_size = sdp->sd_sb.sb_bsize - sizeof(struct gfs2_dinode); > ip->i_di.di_flags |= GFS2_DIF_JDATA; > ip->i_di.di_payload_format = GFS2_FORMAT_DE; > @@ -636,7 +636,7 @@ static int gfs2_rename(struct inode *odi > }; > > if (odip != ndip) { > - if (!ndip->i_di.di_nlink) { > + if (!ndip->i_inode.i_nlink) { > error = -EINVAL; > goto out_gunlock; > } > @@ -645,7 +645,7 @@ static int gfs2_rename(struct inode *odi > goto out_gunlock; > } > if (S_ISDIR(ip->i_inode.i_mode) && > - ndip->i_di.di_nlink == (u32)-1) { > + ndip->i_inode.i_nlink == (u32)-1) { > error = -EMLINK; > goto out_gunlock; > } > diff --git a/include/linux/gfs2_ondisk.h b/include/linux/gfs2_ondisk.h > index 896c7f8..c61517b 100644 > --- a/include/linux/gfs2_ondisk.h > +++ b/include/linux/gfs2_ondisk.h > @@ -322,7 +322,6 @@ struct gfs2_dinode { > }; > > struct gfs2_dinode_host { > - __u32 di_nlink; /* number of links to this file */ > __u64 di_size; /* number of bytes in file */ > __u64 di_blocks;/* number of blocks in file */ > __u64 di_atime; /* time last accessed */ -- Russell Cattelan <[EMAIL PROTECTED]> signature.asc Description: This is a digitally signed message part
Re: [GFS2] Fix incorrect fs sync behaviour [2/5]
On Mon, 2006-11-06 at 11:07 +, Steven Whitehouse wrote: > >From 4a221953ed121692aa25998451a57c7f4be8b4f6 Mon Sep 17 00:00:00 2001 > From: Steven Whitehouse <[EMAIL PROTECTED]> > Date: Wed, 1 Nov 2006 09:57:57 -0500 > Subject: [PATCH] [GFS2] Fix incorrect fs sync behaviour. > > This adds a sync_fs superblock operation for GFS2 and removes > the journal flush from write_super in favour of sync_fs where it > ought to be. This is more or less identical to the way in which ext3 > does this. > > This bug was pointed out by Russell Cattelan <[EMAIL PROTECTED]> > > Cc: Russell Cattelan <[EMAIL PROTECTED]> > Signed-off-by: Steven Whitehouse <[EMAIL PROTECTED]> > --- > fs/gfs2/ops_super.c | 44 > 1 files changed, 28 insertions(+), 16 deletions(-) > > diff --git a/fs/gfs2/ops_super.c b/fs/gfs2/ops_super.c > index 06f06f7..b47d959 100644 > --- a/fs/gfs2/ops_super.c > +++ b/fs/gfs2/ops_super.c > @@ -138,16 +138,27 @@ static void gfs2_put_super(struct super_ > } > > /** > - * gfs2_write_super - disk commit all incore transactions > - * @sb: the filesystem > + * gfs2_write_super > + * @sb: the superblock > * > - * This function is called every time sync(2) is called. > - * After this exits, all dirty buffers are synced. > */ > > static void gfs2_write_super(struct super_block *sb) > { > + sb->s_dirt = 0; This is a bit different than my original patch? Are you sure we don't need the s_lock here? > +} > + > +/** > + * gfs2_sync_fs - sync the filesystem > + * @sb: the superblock > + * > + * Flushes the log to disk. > + */ > +static int gfs2_sync_fs(struct super_block *sb, int wait) > +{ > + sb->s_dirt = 0; > gfs2_log_flush(sb->s_fs_info, NULL); > + return 0; > } > > /** > @@ -452,17 +463,18 @@ static void gfs2_destroy_inode(struct in > } > > struct super_operations gfs2_super_ops = { > - .alloc_inode = gfs2_alloc_inode, > - .destroy_inode = gfs2_destroy_inode, > - .write_inode = gfs2_write_inode, > - .delete_inode = gfs2_delete_inode, > - .put_super = gfs2_put_super, > - .write_super = gfs2_write_super, > - .write_super_lockfs = gfs2_write_super_lockfs, > - .unlockfs = gfs2_unlockfs, > - .statfs = gfs2_statfs, > - .remount_fs = gfs2_remount_fs, > - .clear_inode = gfs2_clear_inode, > - .show_options = gfs2_show_options, > + .alloc_inode= gfs2_alloc_inode, > + .destroy_inode = gfs2_destroy_inode, > + .write_inode= gfs2_write_inode, > + .delete_inode = gfs2_delete_inode, > + .put_super = gfs2_put_super, > + .write_super= gfs2_write_super, > + .sync_fs= gfs2_sync_fs, > + .write_super_lockfs = gfs2_write_super_lockfs, > + .unlockfs = gfs2_unlockfs, > + .statfs = gfs2_statfs, > + .remount_fs = gfs2_remount_fs, > + .clear_inode= gfs2_clear_inode, > + .show_options = gfs2_show_options, > }; > -- Russell Cattelan <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GFS2] Fix incorrect fs sync behaviour [2/5]
On Mon, 2006-11-06 at 11:07 +, Steven Whitehouse wrote: From 4a221953ed121692aa25998451a57c7f4be8b4f6 Mon Sep 17 00:00:00 2001 From: Steven Whitehouse [EMAIL PROTECTED] Date: Wed, 1 Nov 2006 09:57:57 -0500 Subject: [PATCH] [GFS2] Fix incorrect fs sync behaviour. This adds a sync_fs superblock operation for GFS2 and removes the journal flush from write_super in favour of sync_fs where it ought to be. This is more or less identical to the way in which ext3 does this. This bug was pointed out by Russell Cattelan [EMAIL PROTECTED] Cc: Russell Cattelan [EMAIL PROTECTED] Signed-off-by: Steven Whitehouse [EMAIL PROTECTED] --- fs/gfs2/ops_super.c | 44 1 files changed, 28 insertions(+), 16 deletions(-) diff --git a/fs/gfs2/ops_super.c b/fs/gfs2/ops_super.c index 06f06f7..b47d959 100644 --- a/fs/gfs2/ops_super.c +++ b/fs/gfs2/ops_super.c @@ -138,16 +138,27 @@ static void gfs2_put_super(struct super_ } /** - * gfs2_write_super - disk commit all incore transactions - * @sb: the filesystem + * gfs2_write_super + * @sb: the superblock * - * This function is called every time sync(2) is called. - * After this exits, all dirty buffers are synced. */ static void gfs2_write_super(struct super_block *sb) { + sb-s_dirt = 0; This is a bit different than my original patch? Are you sure we don't need the s_lock here? +} + +/** + * gfs2_sync_fs - sync the filesystem + * @sb: the superblock + * + * Flushes the log to disk. + */ +static int gfs2_sync_fs(struct super_block *sb, int wait) +{ + sb-s_dirt = 0; gfs2_log_flush(sb-s_fs_info, NULL); + return 0; } /** @@ -452,17 +463,18 @@ static void gfs2_destroy_inode(struct in } struct super_operations gfs2_super_ops = { - .alloc_inode = gfs2_alloc_inode, - .destroy_inode = gfs2_destroy_inode, - .write_inode = gfs2_write_inode, - .delete_inode = gfs2_delete_inode, - .put_super = gfs2_put_super, - .write_super = gfs2_write_super, - .write_super_lockfs = gfs2_write_super_lockfs, - .unlockfs = gfs2_unlockfs, - .statfs = gfs2_statfs, - .remount_fs = gfs2_remount_fs, - .clear_inode = gfs2_clear_inode, - .show_options = gfs2_show_options, + .alloc_inode= gfs2_alloc_inode, + .destroy_inode = gfs2_destroy_inode, + .write_inode= gfs2_write_inode, + .delete_inode = gfs2_delete_inode, + .put_super = gfs2_put_super, + .write_super= gfs2_write_super, + .sync_fs= gfs2_sync_fs, + .write_super_lockfs = gfs2_write_super_lockfs, + .unlockfs = gfs2_unlockfs, + .statfs = gfs2_statfs, + .remount_fs = gfs2_remount_fs, + .clear_inode= gfs2_clear_inode, + .show_options = gfs2_show_options, }; -- Russell Cattelan [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GFS2] Shrink gfs2_inode (5) - di_nlink [25/70]
; + + error = gfs2_change_nlink(ip, -1); if (error) return error; diff --git a/fs/gfs2/ondisk.c b/fs/gfs2/ondisk.c index e224f6a..b4e354b 100644 --- a/fs/gfs2/ondisk.c +++ b/fs/gfs2/ondisk.c @@ -164,7 +164,7 @@ void gfs2_dinode_out(const struct gfs2_i str-di_mode = cpu_to_be32(ip-i_inode.i_mode); str-di_uid = cpu_to_be32(ip-i_inode.i_uid); str-di_gid = cpu_to_be32(ip-i_inode.i_gid); - str-di_nlink = cpu_to_be32(di-di_nlink); + str-di_nlink = cpu_to_be32(ip-i_inode.i_nlink); str-di_size = cpu_to_be64(di-di_size); str-di_blocks = cpu_to_be64(di-di_blocks); str-di_atime = cpu_to_be64(di-di_atime); @@ -191,7 +191,6 @@ void gfs2_dinode_print(const struct gfs2 gfs2_inum_print(ip-i_num); - pv(di, di_nlink, %u); printk(KERN_INFO di_size = %llu\n, (unsigned long long)di-di_size); printk(KERN_INFO di_blocks = %llu\n, (unsigned long long)di-di_blocks); printk(KERN_INFO di_atime = %lld\n, (long long)di-di_atime); diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c index efbcec3..06176de 100644 --- a/fs/gfs2/ops_inode.c +++ b/fs/gfs2/ops_inode.c @@ -169,7 +169,7 @@ static int gfs2_link(struct dentry *old_ } error = -EINVAL; - if (!dip-i_di.di_nlink) + if (!dip-i_inode.i_nlink) goto out_gunlock; error = -EFBIG; if (dip-i_di.di_entries == (u32)-1) @@ -178,10 +178,10 @@ static int gfs2_link(struct dentry *old_ if (IS_IMMUTABLE(inode) || IS_APPEND(inode)) goto out_gunlock; error = -EINVAL; - if (!ip-i_di.di_nlink) + if (!ip-i_inode.i_nlink) goto out_gunlock; error = -EMLINK; - if (ip-i_di.di_nlink == (u32)-1) + if (ip-i_inode.i_nlink == (u32)-1) goto out_gunlock; alloc_required = error = gfs2_diradd_alloc_required(dir, dentry-d_name); @@ -386,7 +386,7 @@ static int gfs2_mkdir(struct inode *dir, ip = ghs[1].gh_gl-gl_object; - ip-i_di.di_nlink = 2; + ip-i_inode.i_nlink = 2; ip-i_di.di_size = sdp-sd_sb.sb_bsize - sizeof(struct gfs2_dinode); ip-i_di.di_flags |= GFS2_DIF_JDATA; ip-i_di.di_payload_format = GFS2_FORMAT_DE; @@ -636,7 +636,7 @@ static int gfs2_rename(struct inode *odi }; if (odip != ndip) { - if (!ndip-i_di.di_nlink) { + if (!ndip-i_inode.i_nlink) { error = -EINVAL; goto out_gunlock; } @@ -645,7 +645,7 @@ static int gfs2_rename(struct inode *odi goto out_gunlock; } if (S_ISDIR(ip-i_inode.i_mode) - ndip-i_di.di_nlink == (u32)-1) { + ndip-i_inode.i_nlink == (u32)-1) { error = -EMLINK; goto out_gunlock; } diff --git a/include/linux/gfs2_ondisk.h b/include/linux/gfs2_ondisk.h index 896c7f8..c61517b 100644 --- a/include/linux/gfs2_ondisk.h +++ b/include/linux/gfs2_ondisk.h @@ -322,7 +322,6 @@ struct gfs2_dinode { }; struct gfs2_dinode_host { - __u32 di_nlink; /* number of links to this file */ __u64 di_size; /* number of bytes in file */ __u64 di_blocks;/* number of blocks in file */ __u64 di_atime; /* time last accessed */ -- Russell Cattelan [EMAIL PROTECTED] signature.asc Description: This is a digitally signed message part
Re: [GFS2] Shrink gfs2_inode (8) - i_vn [28/70]
On Thu, 2006-11-30 at 12:16 +, Steven Whitehouse wrote: From bfded27ba010d1c3b0aa3843f97dc9b80de751be Mon Sep 17 00:00:00 2001 From: Steven Whitehouse [EMAIL PROTECTED] Date: Wed, 1 Nov 2006 16:05:38 -0500 Subject: [PATCH] [GFS2] Shrink gfs2_inode (8) - i_vn This shrinks the size of the gfs2_inode by 8 bytes by replacing the version counter with a one bit valid/invalid flag. What is the version number used for? It seems like anything that was specifically carving a 64 container has a more specific reason that just ON/OFF? Signed-off-by: Steven Whitehouse [EMAIL PROTECTED] --- fs/gfs2/glops.c |5 +++-- fs/gfs2/incore.h|2 +- fs/gfs2/inode.c |4 ++-- fs/gfs2/ops_inode.c |2 +- 4 files changed, 7 insertions(+), 6 deletions(-) diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c index aad45b7..9c20337 100644 --- a/fs/gfs2/glops.c +++ b/fs/gfs2/glops.c @@ -305,8 +305,9 @@ static void inode_go_inval(struct gfs2_g int data = (flags DIO_DATA); if (meta) { + struct gfs2_inode *ip = gl-gl_object; gfs2_meta_inval(gl); - gl-gl_vn++; + set_bit(GIF_INVALID, ip-i_flags); } if (data) gfs2_page_inval(gl); @@ -351,7 +352,7 @@ static int inode_go_lock(struct gfs2_hol if (!ip) return 0; - if (ip-i_vn != gl-gl_vn) { + if (test_bit(GIF_INVALID, ip-i_flags)) { error = gfs2_inode_refresh(ip); if (error) return error; diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index c0a8c3b..227a74d 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -217,6 +217,7 @@ struct gfs2_alloc { }; enum { + GIF_INVALID = 0, GIF_QD_LOCKED = 1, GIF_PAGED = 2, GIF_SW_PAGED= 3, @@ -228,7 +229,6 @@ struct gfs2_inode { unsigned long i_flags; /* GIF_... */ - u64 i_vn; struct gfs2_dinode_host i_di; /* To be replaced by ref to block */ struct gfs2_glock *i_gl; /* Move into i_gh? */ diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c index f6177fc..e467780 100644 --- a/fs/gfs2/inode.c +++ b/fs/gfs2/inode.c @@ -145,7 +145,7 @@ struct inode *gfs2_inode_lookup(struct s if (unlikely(error)) goto fail_put; - ip-i_vn = ip-i_gl-gl_vn - 1; + set_bit(GIF_INVALID, ip-i_flags); error = gfs2_glock_nq_init(io_gl, LM_ST_SHARED, GL_EXACT, ip-i_iopen_gh); if (unlikely(error)) goto fail_iopen; @@ -242,7 +242,7 @@ int gfs2_inode_refresh(struct gfs2_inode error = gfs2_dinode_in(ip, dibh-b_data); brelse(dibh); - ip-i_vn = ip-i_gl-gl_vn; + clear_bit(GIF_INVALID, ip-i_flags); return error; } diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c index 0e4eade..b247f25 100644 --- a/fs/gfs2/ops_inode.c +++ b/fs/gfs2/ops_inode.c @@ -844,7 +844,7 @@ static int gfs2_permission(struct inode struct gfs2_holder i_gh; int error; - if (ip-i_vn == ip-i_gl-gl_vn) + if (!test_bit(GIF_INVALID, ip-i_flags)) return generic_permission(inode, mask, gfs2_check_acl); error = gfs2_glock_nq_init(ip-i_gl, LM_ST_SHARED, LM_FLAG_ANY, i_gh); -- Russell Cattelan [EMAIL PROTECTED] signature.asc Description: This is a digitally signed message part
Re: [GFS2] Remove unused function from inode.c [50/70]
, LM_ST_SHARED, LM_FLAG_TRY_1CB|GL_ATIME, gh); do_unlock = 1; - ret = gfs2_glock_nq_m_atime(1, gh); + ret = gfs2_glock_nq_atime(gh); if (ret == GLR_TRYFAILED) goto out_noerror; if (unlikely(ret)) @@ -354,7 +354,7 @@ static int gfs2_prepare_write(struct fil gfs2_holder_init(ip-i_gl, LM_ST_EXCLUSIVE, GL_ATIME|LM_FLAG_TRY_1CB, ip-i_gh); - error = gfs2_glock_nq_m_atime(1, ip-i_gh); + error = gfs2_glock_nq_atime(ip-i_gh); if (unlikely(error)) { if (error == GLR_TRYFAILED) error = AOP_TRUNCATED_PAGE; @@ -609,7 +609,7 @@ static ssize_t gfs2_direct_IO(int rw, st * on this path. All we need change is atime. */ gfs2_holder_init(ip-i_gl, LM_ST_SHARED, GL_ATIME, gh); - rv = gfs2_glock_nq_m_atime(1, gh); + rv = gfs2_glock_nq_atime(gh); if (rv) goto out; diff --git a/fs/gfs2/ops_file.c b/fs/gfs2/ops_file.c index eabf6c6..c2be216 100644 --- a/fs/gfs2/ops_file.c +++ b/fs/gfs2/ops_file.c @@ -253,7 +253,7 @@ static int gfs2_get_flags(struct file *f u32 fsflags; gfs2_holder_init(ip-i_gl, LM_ST_SHARED, GL_ATIME, gh); - error = gfs2_glock_nq_m_atime(1, gh); + error = gfs2_glock_nq_atime(gh); if (error) return error; -- Russell Cattelan [EMAIL PROTECTED] signature.asc Description: This is a digitally signed message part
Re: [GFS2 DLM] Guide to -nmw tree patches
On Thu, 2006-11-30 at 13:21 +0100, Arjan van de Ven wrote: On Thu, 2006-11-30 at 12:12 +, Steven Whitehouse wrote: Hi, Below is a summary diffstat of all the changes in the GFS2 DLM -nmw (next merge window) git tree. Since merge time is once again upon us, the following patches are the current complete content of the tree. Hi, can you please make sure your patch series is in reply to your first message, so that your mails properly thread in mail clients? (everyone else seems to manage this, I bet the git or quilt patch series send stuff has this built in somehow..) I agree. I would also continue to urge for internal reviews (since this is the first I have seen any of these changes) before spamming lkml with 70 loosely related patches. Greetings, Arjan van de Ven - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- Russell Cattelan [EMAIL PROTECTED] signature.asc Description: This is a digitally signed message part
Re: [GFS2] Simplify glops functions [53/70]
); - gfs2_page_wait(gl); - } - if (flags DIO_RELEASE) - gfs2_ail_empty_gl(gl); + gfs2_page_writeback(gl); + gfs2_log_flush(gl-gl_sbd, gl); + gfs2_meta_sync(gl); + gfs2_page_wait(gl); + clear_bit(GLF_DIRTY, gl-gl_flags); + gfs2_ail_empty_gl(gl); } } @@ -302,15 +284,13 @@ static void inode_go_sync(struct gfs2_gl static void inode_go_inval(struct gfs2_glock *gl, int flags) { int meta = (flags DIO_METADATA); - int data = (flags DIO_DATA); if (meta) { struct gfs2_inode *ip = gl-gl_object; gfs2_meta_inval(gl); set_bit(GIF_INVALID, ip-i_flags); } - if (data) - gfs2_page_inval(gl); + gfs2_page_inval(gl); } /** @@ -494,7 +474,7 @@ static void trans_go_xmote_bh(struct gfs if (gl-gl_state != LM_ST_UNLOCKED test_bit(SDF_JOURNAL_LIVE, sdp-sd_flags)) { gfs2_meta_cache_flush(GFS2_I(sdp-sd_jdesc-jd_inode)); - j_gl-gl_ops-go_inval(j_gl, DIO_METADATA | DIO_DATA); + j_gl-gl_ops-go_inval(j_gl, DIO_METADATA); error = gfs2_find_jhead(sdp-sd_jdesc, head); if (error) diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index 227a74d..734421e 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -14,8 +14,6 @@ #include linux/fs.h #define DIO_WAIT 0x0010 #define DIO_METADATA 0x0020 -#define DIO_DATA 0x0040 -#define DIO_RELEASE 0x0080 #define DIO_ALL 0x0100 struct gfs2_log_operations; @@ -103,18 +101,17 @@ struct gfs2_bufdata { }; struct gfs2_glock_operations { - void (*go_xmote_th) (struct gfs2_glock * gl, unsigned int state, - int flags); - void (*go_xmote_bh) (struct gfs2_glock * gl); - void (*go_drop_th) (struct gfs2_glock * gl); - void (*go_drop_bh) (struct gfs2_glock * gl); - void (*go_sync) (struct gfs2_glock * gl, int flags); - void (*go_inval) (struct gfs2_glock * gl, int flags); - int (*go_demote_ok) (struct gfs2_glock * gl); - int (*go_lock) (struct gfs2_holder * gh); - void (*go_unlock) (struct gfs2_holder * gh); - void (*go_callback) (struct gfs2_glock * gl, unsigned int state); - void (*go_greedy) (struct gfs2_glock * gl); + void (*go_xmote_th) (struct gfs2_glock *gl, unsigned int state, int flags); + void (*go_xmote_bh) (struct gfs2_glock *gl); + void (*go_drop_th) (struct gfs2_glock *gl); + void (*go_drop_bh) (struct gfs2_glock *gl); + void (*go_sync) (struct gfs2_glock *gl); + void (*go_inval) (struct gfs2_glock *gl, int flags); + int (*go_demote_ok) (struct gfs2_glock *gl); + int (*go_lock) (struct gfs2_holder *gh); + void (*go_unlock) (struct gfs2_holder *gh); + void (*go_callback) (struct gfs2_glock *gl, unsigned int state); + void (*go_greedy) (struct gfs2_glock *gl); const int go_type; }; diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index 0ef8317..1408c5f 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -517,7 +517,7 @@ int gfs2_make_fs_rw(struct gfs2_sbd *sdp return error; gfs2_meta_cache_flush(ip); - j_gl-gl_ops-go_inval(j_gl, DIO_METADATA | DIO_DATA); + j_gl-gl_ops-go_inval(j_gl, DIO_METADATA); error = gfs2_find_jhead(sdp-sd_jdesc, head); if (error) -- Russell Cattelan [EMAIL PROTECTED] signature.asc Description: This is a digitally signed message part
Re: [GFS2] Reduce number of arguments to meta_io.c:getbuf() [58/70]
of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- Russell Cattelan [EMAIL PROTECTED] signature.asc Description: This is a digitally signed message part
Re: [GFS2] Fix journal flush problem [56/70]
; + BUG_ON(!gl); + BUG_ON(!sdp); + spin_lock(ip-i_spin); if (*bh_slot (*bh_slot)-b_blocknr == num) { bh = *bh_slot; diff --git a/fs/gfs2/ops_super.c b/fs/gfs2/ops_super.c index 8635175..7685b46 100644 --- a/fs/gfs2/ops_super.c +++ b/fs/gfs2/ops_super.c @@ -157,7 +157,8 @@ static void gfs2_write_super(struct supe static int gfs2_sync_fs(struct super_block *sb, int wait) { sb-s_dirt = 0; - gfs2_log_flush(sb-s_fs_info, NULL); + if (wait) + gfs2_log_flush(sb-s_fs_info, NULL); return 0; } @@ -293,8 +294,6 @@ static void gfs2_clear_inode(struct inod */ if (inode-i_private) { struct gfs2_inode *ip = GFS2_I(inode); - gfs2_glock_inode_squish(inode); - gfs2_assert(inode-i_sb-s_fs_info, ip-i_gl-gl_state == LM_ST_UNLOCKED); ip-i_gl-gl_object = NULL; gfs2_glock_schedule_for_reclaim(ip-i_gl); gfs2_glock_put(ip-i_gl); @@ -395,7 +394,7 @@ static void gfs2_delete_inode(struct ino if (!inode-i_private) goto out; - error = gfs2_glock_nq_init(ip-i_gl, LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB | GL_NOCACHE, gh); + error = gfs2_glock_nq_init(ip-i_gl, LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB, gh); if (unlikely(error)) { gfs2_glock_dq_uninit(ip-i_iopen_gh); goto out; -- Russell Cattelan [EMAIL PROTECTED] signature.asc Description: This is a digitally signed message part
Re: [GFS2] Change argument to gfs2_dinode_in [18/70]
On Thu, 2006-11-30 at 12:15 +, Steven Whitehouse wrote: From 891ea14712da68e282de8583e5fa14f0d3f3731e Mon Sep 17 00:00:00 2001 From: Steven Whitehouse [EMAIL PROTECTED] Date: Tue, 31 Oct 2006 15:22:10 -0500 Subject: [PATCH] [GFS2] Change argument to gfs2_dinode_in This is a preliminary patch to enable the removal of fields in gfs2_dinode_host which are duplicated in struct inode. Deferred till the complete cleanup work is done? Signed-off-by: Steven Whitehouse [EMAIL PROTECTED] --- fs/gfs2/inode.c |2 +- fs/gfs2/ondisk.c|4 ++-- include/linux/gfs2_ondisk.h |2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c index b861ddb..9875e93 100644 --- a/fs/gfs2/inode.c +++ b/fs/gfs2/inode.c @@ -229,7 +229,7 @@ int gfs2_inode_refresh(struct gfs2_inode return -EIO; } - gfs2_dinode_in(ip-i_di, dibh-b_data); + gfs2_dinode_in(ip, dibh-b_data); brelse(dibh); diff --git a/fs/gfs2/ondisk.c b/fs/gfs2/ondisk.c index 2c50fa0..edf8756 100644 --- a/fs/gfs2/ondisk.c +++ b/fs/gfs2/ondisk.c @@ -155,8 +155,9 @@ void gfs2_quota_in(struct gfs2_quota_hos qu-qu_value = be64_to_cpu(str-qu_value); } -void gfs2_dinode_in(struct gfs2_dinode_host *di, const void *buf) +void gfs2_dinode_in(struct gfs2_inode *ip, const void *buf) { + struct gfs2_dinode_host *di = ip-i_di; const struct gfs2_dinode *str = buf; gfs2_meta_header_in(di-di_header, buf); @@ -186,7 +187,6 @@ void gfs2_dinode_in(struct gfs2_dinode_h di-di_entries = be32_to_cpu(str-di_entries); di-di_eattr = be64_to_cpu(str-di_eattr); - } void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf) diff --git a/include/linux/gfs2_ondisk.h b/include/linux/gfs2_ondisk.h index 550effa..08d8240 100644 --- a/include/linux/gfs2_ondisk.h +++ b/include/linux/gfs2_ondisk.h @@ -534,8 +534,8 @@ extern void gfs2_rindex_out(const struct extern void gfs2_rgrp_in(struct gfs2_rgrp_host *rg, const void *buf); extern void gfs2_rgrp_out(const struct gfs2_rgrp_host *rg, void *buf); extern void gfs2_quota_in(struct gfs2_quota_host *qu, const void *buf); -extern void gfs2_dinode_in(struct gfs2_dinode_host *di, const void *buf); struct gfs2_inode; +extern void gfs2_dinode_in(struct gfs2_inode *ip, const void *buf); extern void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf); extern void gfs2_ea_header_in(struct gfs2_ea_header *ea, const void *buf); extern void gfs2_ea_header_out(const struct gfs2_ea_header *ea, void *buf); -- Russell Cattelan [EMAIL PROTECTED] signature.asc Description: This is a digitally signed message part
Re: [GFS2] Tidy up bmap fix boundary bug [48/70]
); + set_buffer_boundary(bh_map); if (new) { struct buffer_head *dibh; error = gfs2_meta_inode_buffer(ip, dibh); @@ -510,8 +531,8 @@ static int gfs2_block_pointers(struct in while(--maxlen !buffer_boundary(bh_map)) { u64 eblock; - mp-mp_list[end_of_metadata]++; - boundary = lookup_block(ip, bh, end_of_metadata, mp, 0, new, eblock); + mp.mp_list[end_of_metadata]++; + boundary = lookup_block(ip, bh, end_of_metadata, mp, 0, new, eblock); if (eblock != ++dblock) break; bh_map-b_size += (1 inode-i_blkbits); @@ -521,43 +542,15 @@ static int gfs2_block_pointers(struct in } out_brelse: brelse(bh); - return 0; -} - - -static inline void bmap_lock(struct inode *inode, int create) -{ - struct gfs2_inode *ip = GFS2_I(inode); - if (create) - down_write(ip-i_rw_mutex); - else - down_read(ip-i_rw_mutex); -} - -static inline void bmap_unlock(struct inode *inode, int create) -{ - struct gfs2_inode *ip = GFS2_I(inode); - if (create) - up_write(ip-i_rw_mutex); - else - up_read(ip-i_rw_mutex); -} - -int gfs2_block_map(struct inode *inode, u64 lblock, int create, -struct buffer_head *bh) -{ - struct metapath mp; - int ret; - - bmap_lock(inode, create); - ret = gfs2_block_pointers(inode, lblock, create, bh, mp); +out_ok: + error = 0; +out_fail: bmap_unlock(inode, create); - return ret; + return error; } int gfs2_extent_map(struct inode *inode, u64 lblock, int *new, u64 *dblock, unsigned *extlen) { - struct metapath mp; struct buffer_head bh = { .b_state = 0, .b_blocknr = 0 }; int ret; int create = *new; @@ -567,9 +560,7 @@ int gfs2_extent_map(struct inode *inode, BUG_ON(!new); bh.b_size = 1 (inode-i_blkbits + 5); - bmap_lock(inode, create); - ret = gfs2_block_pointers(inode, lblock, create, bh, mp); - bmap_unlock(inode, create); + ret = gfs2_block_map(inode, lblock, create, bh); *extlen = bh.b_size inode-i_blkbits; *dblock = bh.b_blocknr; if (buffer_new(bh)) -- Russell Cattelan [EMAIL PROTECTED] signature.asc Description: This is a digitally signed message part
Re: [GFS2] Fix page lock/glock deadlock [32/70]
On Thu, 2006-11-30 at 12:17 +, Steven Whitehouse wrote: From 2ca99501fa5422e84f18333918a503433449e2b5 Mon Sep 17 00:00:00 2001 From: Steven Whitehouse [EMAIL PROTECTED] Date: Wed, 8 Nov 2006 10:26:54 -0500 Subject: [PATCH] [GFS2] Fix page lock/glock deadlock This fixes a race between the glock and the page lock encountered during truncate in gfs2_readpage and gfs2_prepare_write. The gfs2_readpages function doesn't need the same fix since it only uses a try lock anyway, so it will fail back to gfs2_readpage in the case of a potential deadlock. This bug was spotted by Russell Cattelan.\ This bug was fixed correctly by Russell Cattelan and this is an incomplete version of my patch. As I keep trying to point out readpages is still wrong in that the stuffed case is not correct and results in a panic if a file is being truncated down. If nr_pages is calculated prior a truncate the stuffed inode case will try to operate on pages that are no longer there. The correct fix is to not try and handled the stuffed inode case at all in readpages and simply return AOP_TRUNCATED_PAGE and let things things fall back to readpage. Cc: Russell Cattelan [EMAIL PROTECTED] Signed-off-by: Steven Whitehouse [EMAIL PROTECTED] --- fs/gfs2/glock.h |1 - fs/gfs2/ops_address.c | 13 + 2 files changed, 9 insertions(+), 5 deletions(-) diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h index b985627..a331bf8 100644 --- a/fs/gfs2/glock.h +++ b/fs/gfs2/glock.h @@ -27,7 +27,6 @@ #define GL_SKIP 0x0100 #define GL_ATIME 0x0200 #define GL_NOCACHE 0x0400 #define GL_NOCANCEL 0x1000 -#define GL_AOP 0x4000 #define GLR_TRYFAILED13 #define GLR_CANCELED 14 diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c index 5c3962c..3822189 100644 --- a/fs/gfs2/ops_address.c +++ b/fs/gfs2/ops_address.c @@ -230,7 +230,7 @@ static int gfs2_readpage(struct file *fi /* gfs2_sharewrite_nopage has grabbed the ip-i_gl already */ goto skip_lock; } - gfs2_holder_init(ip-i_gl, LM_ST_SHARED, GL_ATIME|GL_AOP, gh); + gfs2_holder_init(ip-i_gl, LM_ST_SHARED, GL_ATIME|LM_FLAG_TRY_1CB, gh); do_unlock = 1; error = gfs2_glock_nq_m_atime(1, gh); if (unlikely(error)) @@ -254,6 +254,8 @@ skip_lock: out: return error; out_unlock: + if (error == GLR_TRYFAILED) + error = AOP_TRUNCATED_PAGE; unlock_page(page); if (do_unlock) gfs2_holder_uninit(gh); @@ -293,7 +295,7 @@ static int gfs2_readpages(struct file *f goto skip_lock; } gfs2_holder_init(ip-i_gl, LM_ST_SHARED, - LM_FLAG_TRY_1CB|GL_ATIME|GL_AOP, gh); + LM_FLAG_TRY_1CB|GL_ATIME, gh); do_unlock = 1; ret = gfs2_glock_nq_m_atime(1, gh); if (ret == GLR_TRYFAILED) @@ -366,10 +368,13 @@ static int gfs2_prepare_write(struct fil unsigned int write_len = to - from; - gfs2_holder_init(ip-i_gl, LM_ST_EXCLUSIVE, GL_ATIME|GL_AOP, ip-i_gh); + gfs2_holder_init(ip-i_gl, LM_ST_EXCLUSIVE, GL_ATIME|LM_FLAG_TRY_1CB, ip-i_gh); error = gfs2_glock_nq_m_atime(1, ip-i_gh); - if (error) + if (unlikely(error)) { + if (error == GLR_TRYFAILED) + error = AOP_TRUNCATED_PAGE; goto out_uninit; + } gfs2_write_calc_reserv(ip, write_len, data_blocks, ind_blocks); -- Russell Cattelan [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GFS2] Change argument of gfs2_dinode_out [17/70]
; gfs2_trans_add_bh(ip-i_gl, dibh, 1); - gfs2_dinode_out(ip-i_di, dibh-b_data); + gfs2_dinode_out(ip, dibh-b_data); brelse(dibh); mark_inode_dirty(ip-i_inode); @@ -792,7 +792,7 @@ static int link_dinode(struct gfs2_inode goto fail_end_trans; ip-i_di.di_nlink = 1; gfs2_trans_add_bh(ip-i_gl, dibh, 1); - gfs2_dinode_out(ip-i_di, dibh-b_data); + gfs2_dinode_out(ip, dibh-b_data); brelse(dibh); return 0; @@ -1349,7 +1349,7 @@ __gfs2_setattr_simple(struct gfs2_inode gfs2_inode_attr_out(ip); gfs2_trans_add_bh(ip-i_gl, dibh, 1); - gfs2_dinode_out(ip-i_di, dibh-b_data); + gfs2_dinode_out(ip, dibh-b_data); brelse(dibh); } return error; diff --git a/fs/gfs2/ondisk.c b/fs/gfs2/ondisk.c index 2d1682d..2c50fa0 100644 --- a/fs/gfs2/ondisk.c +++ b/fs/gfs2/ondisk.c @@ -15,6 +15,8 @@ #include linux/buffer_head.h #include gfs2.h #include linux/gfs2_ondisk.h +#include linux/lm_interface.h +#include incore.h #define pv(struct, member, fmt) printk(KERN_INFO #member = fmt\n, \ struct-member); @@ -187,8 +189,9 @@ void gfs2_dinode_in(struct gfs2_dinode_h } -void gfs2_dinode_out(const struct gfs2_dinode_host *di, void *buf) +void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf) { + const struct gfs2_dinode_host *di = ip-i_di; struct gfs2_dinode *str = buf; gfs2_meta_header_out(di-di_header, buf); diff --git a/fs/gfs2/ops_file.c b/fs/gfs2/ops_file.c index 2fc8868..7ea4175 100644 --- a/fs/gfs2/ops_file.c +++ b/fs/gfs2/ops_file.c @@ -336,7 +336,7 @@ static int do_gfs2_set_flags(struct file goto out_trans_end; gfs2_trans_add_bh(ip-i_gl, bh, 1); ip-i_di.di_flags = new_flags; - gfs2_dinode_out(ip-i_di, bh-b_data); + gfs2_dinode_out(ip, bh-b_data); brelse(bh); out_trans_end: gfs2_trans_end(sdp); diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c index ef6e5ed..bd26885 100644 --- a/fs/gfs2/ops_inode.c +++ b/fs/gfs2/ops_inode.c @@ -339,7 +339,7 @@ static int gfs2_symlink(struct inode *di error = gfs2_meta_inode_buffer(ip, dibh); if (!gfs2_assert_withdraw(sdp, !error)) { - gfs2_dinode_out(ip-i_di, dibh-b_data); + gfs2_dinode_out(ip, dibh-b_data); memcpy(dibh-b_data + sizeof(struct gfs2_dinode), symname, size); brelse(dibh); @@ -414,7 +414,7 @@ static int gfs2_mkdir(struct inode *dir, gfs2_inum_out(dip-i_num, dent-de_inum); dent-de_type = cpu_to_be16(DT_DIR); - gfs2_dinode_out(ip-i_di, di); + gfs2_dinode_out(ip, di); brelse(dibh); } @@ -541,7 +541,7 @@ static int gfs2_mknod(struct inode *dir, error = gfs2_meta_inode_buffer(ip, dibh); if (!gfs2_assert_withdraw(sdp, !error)) { - gfs2_dinode_out(ip-i_di, dibh-b_data); + gfs2_dinode_out(ip, dibh-b_data); brelse(dibh); } @@ -762,7 +762,7 @@ static int gfs2_rename(struct inode *odi goto out_end_trans; ip-i_di.di_ctime = get_seconds(); gfs2_trans_add_bh(ip-i_gl, dibh, 1); - gfs2_dinode_out(ip-i_di, dibh-b_data); + gfs2_dinode_out(ip, dibh-b_data); brelse(dibh); } @@ -949,7 +949,7 @@ static int setattr_chown(struct inode *i gfs2_inode_attr_out(ip); gfs2_trans_add_bh(ip-i_gl, dibh, 1); - gfs2_dinode_out(ip-i_di, dibh-b_data); + gfs2_dinode_out(ip, dibh-b_data); brelse(dibh); if (ouid != NO_QUOTA_CHANGE || ogid != NO_QUOTA_CHANGE) { diff --git a/include/linux/gfs2_ondisk.h b/include/linux/gfs2_ondisk.h index 10a507d..550effa 100644 --- a/include/linux/gfs2_ondisk.h +++ b/include/linux/gfs2_ondisk.h @@ -535,7 +535,8 @@ extern void gfs2_rgrp_in(struct gfs2_rgr extern void gfs2_rgrp_out(const struct gfs2_rgrp_host *rg, void *buf); extern void gfs2_quota_in(struct gfs2_quota_host *qu, const void *buf); extern void gfs2_dinode_in(struct gfs2_dinode_host *di, const void *buf); -extern void gfs2_dinode_out(const struct gfs2_dinode_host *di, void *buf); +struct gfs2_inode; +extern void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf); extern void gfs2_ea_header_in(struct gfs2_ea_header *ea, const void *buf); extern void gfs2_ea_header_out(const struct gfs2_ea_header *ea, void *buf); extern void gfs2_log_header_in(struct gfs2_log_header_host *lh, const void *buf); -- Russell Cattelan [EMAIL PROTECTED] signature.asc Description: This is a digitally signed message part
Re: [GFS2] Move gfs2_meta_syncfs() into log.c [57/70]
On Thu, 2006-11-30 at 12:22 +, Steven Whitehouse wrote: From a25311c8e0b7071b129ca9a9e49e22eeaf620864 Mon Sep 17 00:00:00 2001 From: Steven Whitehouse [EMAIL PROTECTED] Date: Thu, 23 Nov 2006 11:06:35 -0500 Subject: [PATCH] [GFS2] Move gfs2_meta_syncfs() into log.c By moving gfs2_meta_syncfs() into log.c, gfs2_ail1_start() can be made static. While this is not a incorrect change as it stands. This kind of pointless code curn is making it harder to stabilize GFS2. Signed-off-by: Steven Whitehouse [EMAIL PROTECTED] --- fs/gfs2/log.c | 21 - fs/gfs2/log.h |2 +- fs/gfs2/meta_io.c | 17 - fs/gfs2/meta_io.h |1 - 4 files changed, 21 insertions(+), 20 deletions(-) diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c index 6456fc3..7713d59 100644 --- a/fs/gfs2/log.c +++ b/fs/gfs2/log.c @@ -15,6 +15,7 @@ #include linux/buffer_head.h #include linux/gfs2_ondisk.h #include linux/crc32.h #include linux/lm_interface.h +#include linux/delay.h #include gfs2.h #include incore.h @@ -142,7 +143,7 @@ static int gfs2_ail1_empty_one(struct gf return list_empty(ai-ai_ail1_list); } -void gfs2_ail1_start(struct gfs2_sbd *sdp, int flags) +static void gfs2_ail1_start(struct gfs2_sbd *sdp, int flags) { struct list_head *head = sdp-sd_ail1_list; u64 sync_gen; @@ -689,3 +690,21 @@ void gfs2_log_shutdown(struct gfs2_sbd * up_write(sdp-sd_log_flush_lock); } + +/** + * gfs2_meta_syncfs - sync all the buffers in a filesystem + * @sdp: the filesystem + * + */ + +void gfs2_meta_syncfs(struct gfs2_sbd *sdp) +{ + gfs2_log_flush(sdp, NULL); + for (;;) { + gfs2_ail1_start(sdp, DIO_ALL); + if (gfs2_ail1_empty(sdp, DIO_ALL)) + break; + msleep(10); + } +} + diff --git a/fs/gfs2/log.h b/fs/gfs2/log.h index 7f5737d..8e7aa0f 100644 --- a/fs/gfs2/log.h +++ b/fs/gfs2/log.h @@ -48,7 +48,6 @@ static inline void gfs2_log_pointers_ini unsigned int gfs2_struct2blk(struct gfs2_sbd *sdp, unsigned int nstruct, unsigned int ssize); -void gfs2_ail1_start(struct gfs2_sbd *sdp, int flags); int gfs2_ail1_empty(struct gfs2_sbd *sdp, int flags); int gfs2_log_reserve(struct gfs2_sbd *sdp, unsigned int blks); @@ -61,5 +60,6 @@ void gfs2_log_flush(struct gfs2_sbd *sdp void gfs2_log_commit(struct gfs2_sbd *sdp, struct gfs2_trans *trans); void gfs2_log_shutdown(struct gfs2_sbd *sdp); +void gfs2_meta_syncfs(struct gfs2_sbd *sdp); #endif /* __LOG_DOT_H__ */ diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c index 939a09f..fbeba81 100644 --- a/fs/gfs2/meta_io.c +++ b/fs/gfs2/meta_io.c @@ -574,20 +574,3 @@ out: return first_bh; } -/** - * gfs2_meta_syncfs - sync all the buffers in a filesystem - * @sdp: the filesystem - * - */ - -void gfs2_meta_syncfs(struct gfs2_sbd *sdp) -{ - gfs2_log_flush(sdp, NULL); - for (;;) { - gfs2_ail1_start(sdp, DIO_ALL); - if (gfs2_ail1_empty(sdp, DIO_ALL)) - break; - msleep(10); - } -} - diff --git a/fs/gfs2/meta_io.h b/fs/gfs2/meta_io.h index 3ec939e..e037425 100644 --- a/fs/gfs2/meta_io.h +++ b/fs/gfs2/meta_io.h @@ -67,7 +67,6 @@ static inline int gfs2_meta_inode_buffer } struct buffer_head *gfs2_meta_ra(struct gfs2_glock *gl, u64 dblock, u32 extlen); -void gfs2_meta_syncfs(struct gfs2_sbd *sdp); #define buffer_busy(bh) \ ((bh)-b_state ((1ul BH_Dirty) | (1ul BH_Lock) | (1ul BH_Pinned))) -- Russell Cattelan [EMAIL PROTECTED] signature.asc Description: This is a digitally signed message part
Re: [Cluster-devel] Re: [GFS2] Change argument of gfs2_dinode_out [17/70]
On Fri, 2006-12-01 at 19:25 +, Al Viro wrote: On Fri, Dec 01, 2006 at 01:19:04PM -0600, Russell Cattelan wrote: On Thu, 2006-11-30 at 12:15 +, Steven Whitehouse wrote: From 539e5d6b7ae8612c0393fe940d2da5b591318d3d Mon Sep 17 00:00:00 2001 From: Steven Whitehouse [EMAIL PROTECTED] Date: Tue, 31 Oct 2006 15:07:05 -0500 Subject: [PATCH] [GFS2] Change argument of gfs2_dinode_out Everywhere this was called, a struct gfs2_inode was available, but despite that, it was always called with a struct gfs2_dinode as an argument. By making this change it paves the way to start eliminating fields duplicated between the kernel's struct inode and the struct gfs2_dinode. More pointless code churn. This only makes sense once the file system is working and we have time to do this type of cleanup on against a stable and TESTED code base. Bzzert. Cleaner code is easier to _get_ stable. Keep it ucking fugly until everyone stops looking at it out of sheer disgust is a bad idea.' code clean up are not without risk and with no regression test suite to verify that a cleanup has not broken something. Cleanups are very much a hindrance to stabilization. With no know working points in a code history it becomes difficult to bisect changes and figure out when bugs were introduced Especially when cleanups are mixed in with bug fixes. Pretty code does not equal correct code. -- Russell Cattelan [EMAIL PROTECTED] signature.asc Description: This is a digitally signed message part
Re: [Cluster-devel] Re: [GFS2] Change argument of gfs2_dinode_out [17/70]
On Fri, 2006-12-01 at 21:08 +, Al Viro wrote: On Fri, Dec 01, 2006 at 02:52:11PM -0600, Russell Cattelan wrote: code clean up are not without risk and with no regression test suite to verify that a cleanup has not broken something. Cleanups are very much a hindrance to stabilization. With no know working points in a code history it becomes difficult to bisect changes and figure out when bugs were introduced Especially when cleanups are mixed in with bug fixes. Pretty code does not equal correct code. No, but convoluted and unreadable code ends up being crappier due to lack of review. And that's aside of the memory footprint, likeliness of bugs introduced by code modifications (having in-core and on-disk data structures with different contents and the same C type = trouble that won't be caught by compiler), etc. Nothing makes up for the complete lack of GFS2 testing. reviewed code does not equal correct code either. Honestly tell me what test suite do you run on GFS2? Sure is it possible to make an educated guess that some cleanups will not destabilize the code. Indeed the stuff you have done is quite useful to ensure that endian bugs are being caught by the compiler/sparse. But no amount of it shouldn't break anything assertions can replace testing. But there is a large quantity of the 70 or so patches that were sent out were to enable future cleanup's. Putting in partial cleanups do nothing core code readability and I many cases is more confusing. Unless you meticulously keep up with the partial cleanups looking at the code is now a jumbled mess of inconsistencies. gfs2 is supposed to be stabilized and use-able for the up coming rhel5 release, not pretty up for somebody to print out and hang on their wall. -- Russell Cattelan [EMAIL PROTECTED] signature.asc Description: This is a digitally signed message part
Re: [xfs-masters] Re: [PATCH] pull XFS support out of Kconfig submenu
On Thu, 2005-08-18 at 06:53 -0700, Chris Wedgwood wrote: > On Wed, Aug 17, 2005 at 10:45:48PM +0200, Jesper Juhl wrote: > > > It seems slightly odd to me that XFS support should be in a separate > > submenu, when all the other filesystems are not using submenus but > > are directly selectable from the Filesystems menu. > > XFS also has an out-of-tree version. Making it a submenu is probably > to make maintenance easier (ie. replace files, not merge). > That is why the Kconfig options for xfs moved from fs/Kconfig to fs/xfs/Kconfig but using submenu was simply a convince thing to group all the XFS options together. If the submenu is really causing people distress go ahead and remove it. Since it's a cosmetic change it's not going to impact anything. -Russell - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [xfs-masters] Re: [PATCH] pull XFS support out of Kconfig submenu
On Thu, 2005-08-18 at 06:53 -0700, Chris Wedgwood wrote: On Wed, Aug 17, 2005 at 10:45:48PM +0200, Jesper Juhl wrote: It seems slightly odd to me that XFS support should be in a separate submenu, when all the other filesystems are not using submenus but are directly selectable from the Filesystems menu. XFS also has an out-of-tree version. Making it a submenu is probably to make maintenance easier (ie. replace files, not merge). That is why the Kconfig options for xfs moved from fs/Kconfig to fs/xfs/Kconfig but using submenu was simply a convince thing to group all the XFS options together. If the submenu is really causing people distress go ahead and remove it. Since it's a cosmetic change it's not going to impact anything. -Russell - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kernel merge issues
james rich wrote: > Hi folks, > I'm sure many here have read the discussion on lkml about lvm and > the problems that team is having. As part of that discussion it was said: I'm not entirely familiar with the issues surrounding lvm development, I know things are not in good shape right now. But I would say the following statement is unrealistic. Trying to manage a large software project without source code control is an even bigger nightmare than with source code control. patch is NOT a source code control tool. In fact a good source code control system would allow patch incremental generation, This is actually one glaring limitation of CVS, the ability group a set of changes together, aka "mod" (Bitkeeper has addressed a lot of these issues; it can generate or accept incremental patch sets) The lvm problems seems to be more of wetware issue than a source code control tool issue. Hopefully XFS will be able to keep ahead the problem of dramatically diverging code bases by staying active with Linus's releases. > Dan Kegel <[EMAIL PROTECTED]>: > >I know very little about LVM, but from watching earlier projects > >in the same situation you're in now, the path you need to follow > >seems clear: > > Stop using CVS internally for development. > > It makes checking in changes without submitting them to > > Linus too easy. > > >To get sync'd back up, *start with the standard kernel*, > >and start generating clean, human-understandable patches one > >at a time that bring it up to where you want. > > I am wondering how the XFS team plans on avoiding the same problems once > XFS becomes part of the kernel. Is there potential for problems with SGI > "losing control" over the source or direction of XFS once Linus puts it in > his tree? > > How does the above comment relate to the XFS team's plans on patches to > XFS and related areas once XFS is in? > > Just want to get these issues into the air before rancor and ill will > spread... > > P.S. XFS has been extremely solid and has saved me a lot of time waiting > for fscks. I am really impressed by the professionalism of the XFS team. > Hopefully I can contribute soon - working on a slackware boot/modules/root > disk set for XFS. > > James Rich > [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kernel merge issues
james rich wrote: Hi folks, I'm sure many here have read the discussion on lkml about lvm and the problems that team is having. As part of that discussion it was said: I'm not entirely familiar with the issues surrounding lvm development, I know things are not in good shape right now. But I would say the following statement is unrealistic. Trying to manage a large software project without source code control is an even bigger nightmare than with source code control. patch is NOT a source code control tool. In fact a good source code control system would allow patch incremental generation, This is actually one glaring limitation of CVS, the ability group a set of changes together, aka "mod" (Bitkeeper has addressed a lot of these issues; it can generate or accept incremental patch sets) The lvm problems seems to be more of wetware issue than a source code control tool issue. Hopefully XFS will be able to keep ahead the problem of dramatically diverging code bases by staying active with Linus's releases. Dan Kegel [EMAIL PROTECTED]: I know very little about LVM, but from watching earlier projects in the same situation you're in now, the path you need to follow seems clear: Stop using CVS internally for development. It makes checking in changes without submitting them to Linus too easy. To get sync'd back up, *start with the standard kernel*, and start generating clean, human-understandable patches one at a time that bring it up to where you want. I am wondering how the XFS team plans on avoiding the same problems once XFS becomes part of the kernel. Is there potential for problems with SGI "losing control" over the source or direction of XFS once Linus puts it in his tree? How does the above comment relate to the XFS team's plans on patches to XFS and related areas once XFS is in? Just want to get these issues into the air before rancor and ill will spread... P.S. XFS has been extremely solid and has saved me a lot of time waiting for fscks. I am really impressed by the professionalism of the XFS team. Hopefully I can contribute soon - working on a slackware boot/modules/root disk set for XFS. James Rich [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
"Stephen C. Tweedie" wrote: > Hi, > > On Fri, Dec 15, 2000 at 02:00:19AM -0500, Alexander Viro wrote: > > On Thu, 14 Dec 2000, Linus Torvalds wrote: > > > > Just one: any fs that really cares about completion callback is very likely > > to be picky about the requests ordering. So sync_buffers() is very unlikely > > to be useful anyway. > > > > In that sense we really don't have anonymous buffers here. I seriously > > suspect that "unrealistic" assumption is not unrealistic at all. I'm > > not sufficiently familiar with XFS code to say for sure, but... > > Right. ext3 and reiserfs just want to submit their own IOs when it > comes to the journal. (At least in ext3, already-journaled buffers > can be written back by the VM freely.) It's a matter of telling the > fs when that should start. > > > What we really need is a way for VFS/VM to pass the pressure on filesystem. > > That's it. If fs wants unusual completions for requests - let it have its > > own queueing mechanism and submit these requests when it finds that convenient. > > There is a very clean way of doing this with address spaces. It's > something I would like to see done properly for 2.5: eliminate all > knowledge of buffer_heads from the VM layer. It would be pretty > simple to remove page->buffers completely and replace it with a > page->private pointer, owned by whatever address_space controlled the > page. Instead of trying to unmap and flush buffers on the page > directly, these operations would become address_space operations. Yes this is a lot of what page buf would like to do eventually. Have the VM system pressure page_buf for pages which would then be able to intelligently call the file system to free up cached pages. A big part of getting Delay Alloc to not completely consume all the system pages, is being told when it's time to start really allocating disk space and push pages out. > > > We could still provide the standard try_to_free_buffers() and > unmap_underlying_metadata() functions to operate on the per-page > buffer_head lists, and existing filesystems would only have to point > their address_space "private metadata" operations at the generic > functions. However, a filesystem which had additional ordering > constraints could then intercept the flush or writeback calls from the > VM and decide on its own how best to honour the VM pressure. > > This even works both for hashed and unhashed anonymous buffers, *if* > you allow the filesystem to attach all of its hashed buffers buffers > to an address_space of its own rather than having the buffer cache > pool all such buffers together. > > --Stephen -- Russell Cattelan [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
Chris Mason wrote: > On Fri, 15 Dec 2000, Alexander Viro wrote: > > > Just one: any fs that really cares about completion callback is very likely > > to be picky about the requests ordering. So sync_buffers() is very unlikely > > to be useful anyway. > > > Somewhat. I guess there are at least two ways to do it. First flush the > buffers where ordering matters (log blocks), then send the others onto the > dirty list (general metadata). You might have your own end_io for those, and > sync_buffers would lose it. > > Second way (reiserfs recently changed to this method) is to do all the > flushing yourself, and remove the need for an end_io call back. > I'm curious about this. Does the mean reiserFS is doing all of it's own buffer management? This would seem a little redundant with what is already in the kernel? > > > > In that sense we really don't have anonymous buffers here. I seriously > > suspect that "unrealistic" assumption is not unrealistic at all. I'm > > not sufficiently familiar with XFS code to say for sure, but... > > > > What we really need is a way for VFS/VM to pass the pressure on filesystem. > > That's it. If fs wants unusual completions for requests - let it have its > > own queueing mechanism and submit these requests when it finds that convenient. > > > Yes, this is exactly what we've discussed. > > -chris -- Russell Cattelan [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
Alexander Viro wrote: > On Thu, 14 Dec 2000, Linus Torvalds wrote: > > > Good point. > > > > This actually looks fairly nasty to fix. The obvious fix would be to not > > put such buffers on the dirty list at all, and instead rely on the VM > > layer calling "writepage()" when it wants to push out the pages. > > That would be the nice behaviour from a VM standpoint. > > > > However, that assumes that you don't have any "anonymous" buffers, which > > is probably an unrealistic assumption. > > > > The problem is that we don't have any per-buffer "writebuffer()" function, > > the way we have them per-page. It was never needed for any of the normal > > filesystems, and XFS just happened to be able to take advantage of the > > b_end_io behaviour. > > > > Suggestions welcome. > > Just one: any fs that really cares about completion callback is very likely > to be picky about the requests ordering. So sync_buffers() is very unlikely > to be useful anyway. Actually no, that's not the issue. The XFS log uses a LSN (Log Sequence Number) to keep track of log write ordering. Sync IO on each log buffer isn't realistic; the performance hit would be to great. I wasn't around when most of XFS was developed, but from I what I understand it was discovered early on that firing off writes in a particular order doesn't guarantee that they will finish in that order. Thus the implantation of a sequence number for each log write. One of the obstacles we ran into early on in the linux port was the fact that linux used fixed size IO requests to any given device. But most of XFS's meta data structures vary in size in multiples of 512 bytes. We were also implementing a page caching / clustering layer called page_buf which understands primarily pages and not necessary disk blocks. If your FS block size happens to match your page size then things are good, but it doesn't So we added a bit map field to the pages structure. Each bit then represents one BASIC BLOCK eg 512 for all practical purposes The end_io functions XFS defines updates the correct bit or the whole bit array if the whole page is valid, thus signaling the rest of the page_buf that the io has completed. Ok there is a lot more to it than what I've just described but you probably get the idea. > > > In that sense we really don't have anonymous buffers here. I seriously > suspect that "unrealistic" assumption is not unrealistic at all. I'm > not sufficiently familiar with XFS code to say for sure, but... > > What we really need is a way for VFS/VM to pass the pressure on filesystem. > That's it. If fs wants unusual completions for requests - let it have its > own queueing mechanism and submit these requests when it finds that convenient. > > Stephen, you probably already thought about that area. Could you comment on > that? > Cheers, > Al -- Russell Cattelan [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
Linus Torvalds wrote: > On Thu, 14 Dec 2000, Russell Cattelan wrote: > > > > Ok one more wrinkle. > > sync_buffers calls ll_rw_block, this is going to have the same problem as > > calling ll_rw_block directly. > > Good point. > > This actually looks fairly nasty to fix. The obvious fix would be to not > put such buffers on the dirty list at all, and instead rely on the VM > layer calling "writepage()" when it wants to push out the pages. > That would be the nice behaviour from a VM standpoint. > > However, that assumes that you don't have any "anonymous" buffers, which > is probably an unrealistic assumption. > > The problem is that we don't have any per-buffer "writebuffer()" function, > the way we have them per-page. It was never needed for any of the normal > filesystems, and XFS just happened to be able to take advantage of the > b_end_io behaviour. > > Suggestions welcome. > > Linus Ok after a bit of trial and error I do have something working. I wouldn't call it the most elegant solution but it does work and it isn't very intrusive. #define BH_End_io 7/* End io function defined don't remap it */ /* don't change the callback if somebody explicitly set it */ if(!test_bit(BH_End_io, >b_state)){ bh->b_end_io = end_buffer_io_sync; } What I've done is in the XFS set buffer_head setup functions is set the initial value of b_state to BH_Locked and BH_End_io set the callback function and the rest of the relevant fields and then unlock the buffer. The only other quick fix that comes to mind is to change sync_buffers to use submit_bh rather than ll_rw_block. -- Russell Cattelan [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
Linus Torvalds wrote: On Thu, 14 Dec 2000, Russell Cattelan wrote: Ok one more wrinkle. sync_buffers calls ll_rw_block, this is going to have the same problem as calling ll_rw_block directly. Good point. This actually looks fairly nasty to fix. The obvious fix would be to not put such buffers on the dirty list at all, and instead rely on the VM layer calling "writepage()" when it wants to push out the pages. That would be the nice behaviour from a VM standpoint. However, that assumes that you don't have any "anonymous" buffers, which is probably an unrealistic assumption. The problem is that we don't have any per-buffer "writebuffer()" function, the way we have them per-page. It was never needed for any of the normal filesystems, and XFS just happened to be able to take advantage of the b_end_io behaviour. Suggestions welcome. Linus Ok after a bit of trial and error I do have something working. I wouldn't call it the most elegant solution but it does work and it isn't very intrusive. #define BH_End_io 7/* End io function defined don't remap it */ /* don't change the callback if somebody explicitly set it */ if(!test_bit(BH_End_io, bh-b_state)){ bh-b_end_io = end_buffer_io_sync; } What I've done is in the XFS set buffer_head setup functions is set the initial value of b_state to BH_Locked and BH_End_io set the callback function and the rest of the relevant fields and then unlock the buffer. The only other quick fix that comes to mind is to change sync_buffers to use submit_bh rather than ll_rw_block. -- Russell Cattelan [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
Alexander Viro wrote: On Thu, 14 Dec 2000, Linus Torvalds wrote: Good point. This actually looks fairly nasty to fix. The obvious fix would be to not put such buffers on the dirty list at all, and instead rely on the VM layer calling "writepage()" when it wants to push out the pages. That would be the nice behaviour from a VM standpoint. However, that assumes that you don't have any "anonymous" buffers, which is probably an unrealistic assumption. The problem is that we don't have any per-buffer "writebuffer()" function, the way we have them per-page. It was never needed for any of the normal filesystems, and XFS just happened to be able to take advantage of the b_end_io behaviour. Suggestions welcome. Just one: any fs that really cares about completion callback is very likely to be picky about the requests ordering. So sync_buffers() is very unlikely to be useful anyway. Actually no, that's not the issue. The XFS log uses a LSN (Log Sequence Number) to keep track of log write ordering. Sync IO on each log buffer isn't realistic; the performance hit would be to great. I wasn't around when most of XFS was developed, but from I what I understand it was discovered early on that firing off writes in a particular order doesn't guarantee that they will finish in that order. Thus the implantation of a sequence number for each log write. One of the obstacles we ran into early on in the linux port was the fact that linux used fixed size IO requests to any given device. But most of XFS's meta data structures vary in size in multiples of 512 bytes. We were also implementing a page caching / clustering layer called page_buf which understands primarily pages and not necessary disk blocks. If your FS block size happens to match your page size then things are good, but it doesn't So we added a bit map field to the pages structure. Each bit then represents one BASIC BLOCK eg 512 for all practical purposes The end_io functions XFS defines updates the correct bit or the whole bit array if the whole page is valid, thus signaling the rest of the page_buf that the io has completed. Ok there is a lot more to it than what I've just described but you probably get the idea. In that sense we really don't have anonymous buffers here. I seriously suspect that "unrealistic" assumption is not unrealistic at all. I'm not sufficiently familiar with XFS code to say for sure, but... What we really need is a way for VFS/VM to pass the pressure on filesystem. That's it. If fs wants unusual completions for requests - let it have its own queueing mechanism and submit these requests when it finds that convenient. Stephen, you probably already thought about that area. Could you comment on that? Cheers, Al -- Russell Cattelan [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
Chris Mason wrote: On Fri, 15 Dec 2000, Alexander Viro wrote: Just one: any fs that really cares about completion callback is very likely to be picky about the requests ordering. So sync_buffers() is very unlikely to be useful anyway. Somewhat. I guess there are at least two ways to do it. First flush the buffers where ordering matters (log blocks), then send the others onto the dirty list (general metadata). You might have your own end_io for those, and sync_buffers would lose it. Second way (reiserfs recently changed to this method) is to do all the flushing yourself, and remove the need for an end_io call back. I'm curious about this. Does the mean reiserFS is doing all of it's own buffer management? This would seem a little redundant with what is already in the kernel? In that sense we really don't have anonymous buffers here. I seriously suspect that "unrealistic" assumption is not unrealistic at all. I'm not sufficiently familiar with XFS code to say for sure, but... What we really need is a way for VFS/VM to pass the pressure on filesystem. That's it. If fs wants unusual completions for requests - let it have its own queueing mechanism and submit these requests when it finds that convenient. Yes, this is exactly what we've discussed. -chris -- Russell Cattelan [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
"Stephen C. Tweedie" wrote: Hi, On Fri, Dec 15, 2000 at 02:00:19AM -0500, Alexander Viro wrote: On Thu, 14 Dec 2000, Linus Torvalds wrote: Just one: any fs that really cares about completion callback is very likely to be picky about the requests ordering. So sync_buffers() is very unlikely to be useful anyway. In that sense we really don't have anonymous buffers here. I seriously suspect that "unrealistic" assumption is not unrealistic at all. I'm not sufficiently familiar with XFS code to say for sure, but... Right. ext3 and reiserfs just want to submit their own IOs when it comes to the journal. (At least in ext3, already-journaled buffers can be written back by the VM freely.) It's a matter of telling the fs when that should start. What we really need is a way for VFS/VM to pass the pressure on filesystem. That's it. If fs wants unusual completions for requests - let it have its own queueing mechanism and submit these requests when it finds that convenient. There is a very clean way of doing this with address spaces. It's something I would like to see done properly for 2.5: eliminate all knowledge of buffer_heads from the VM layer. It would be pretty simple to remove page-buffers completely and replace it with a page-private pointer, owned by whatever address_space controlled the page. Instead of trying to unmap and flush buffers on the page directly, these operations would become address_space operations. Yes this is a lot of what page buf would like to do eventually. Have the VM system pressure page_buf for pages which would then be able to intelligently call the file system to free up cached pages. A big part of getting Delay Alloc to not completely consume all the system pages, is being told when it's time to start really allocating disk space and push pages out. We could still provide the standard try_to_free_buffers() and unmap_underlying_metadata() functions to operate on the per-page buffer_head lists, and existing filesystems would only have to point their address_space "private metadata" operations at the generic functions. However, a filesystem which had additional ordering constraints could then intercept the flush or writeback calls from the VM and decide on its own how best to honour the VM pressure. This even works both for hashed and unhashed anonymous buffers, *if* you allow the filesystem to attach all of its hashed buffers buffers to an address_space of its own rather than having the buffer cache pool all such buffers together. --Stephen -- Russell Cattelan [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Test12 ll_rw_block error.
This would seem to be an error on the part of ll_rw_block. Setting b_end_io to a default handler without checking to see a callback has already been defined defeats the purpose of having a function op. void ll_rw_block(int rw, int nr, struct buffer_head * bhs[]) { @@ -928,7 +1046,8 @@ if (test_and_set_bit(BH_Lock, >b_state)) continue; - set_bit(BH_Req, >b_state); + /* We have the buffer lock */ + bh->b_end_io = end_buffer_io_sync; switch(rw) { case WRITE: -Russell Cattelan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Test12 ll_rw_block error.
This would seem to be an error on the part of ll_rw_block. Setting b_end_io to a default handler without checking to see a callback has already been defined defeats the purpose of having a function op. void ll_rw_block(int rw, int nr, struct buffer_head * bhs[]) { @@ -928,7 +1046,8 @@ if (test_and_set_bit(BH_Lock, bh-b_state)) continue; - set_bit(BH_Req, bh-b_state); + /* We have the buffer lock */ + bh-b_end_io = end_buffer_io_sync; switch(rw) { case WRITE: -Russell Cattelan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] livelock in elevator scheduling
Jens Axboe wrote: > On Mon, Dec 04 2000, Russell Cattelan wrote: > > I'm going to take a closer look at the scsi_back_merge_fn. > > This may have more to due with our/Chait's kiobuf modifications than > > anything else. > > > > > > > > XFS (dev: 8/20) mounting with KIOBUFIO > > Start mounting filesystem: sd(8,20) > > Ending clean XFS mount for filesystem: sd(8,20) > > kmem_alloc doing a vmalloc 262144 size & PAGE_SIZE 0 rval=0xe0a1 > > Unable to handle kernel NULL pointer dereference at virtual address > > 0008 > > printing eip: > > c019f8b5 > > *pde = > > > > Entering kdb (current=0xc191, pid 5) on processor 1 Panic: Oops > > due to panic @ 0xc019f8b5 > > eax = 0x0002 ebx = 0x0001 ecx = 0x00081478 edx = 0x > > esi = 0xc1957da0 edi = 0xc1923ac8 esp = 0xc1911e94 eip = 0xc019f8b5 > > ebp = 0xc1911e9c xss = 0x0018 xcs = 0x0010 eflags = 0x00010046 > > xds = 0x0018 xes = 0x0018 origeax = 0x = 0xc1911e60 > > [1]kdb> bt > > EBP EIP Function(args) > > 0xc1911e9c 0xc019f8b5 scsi_back_merge_fn_c+0x15 (0xc1923a98, > > 0xc1957da0, 0xcfb05780, 0x80) > >kernel .text 0xc010 0xc019f8a0 > > Ah, I see what it is now. The elevator is attempting to merge a buffer > head into a kio based request, poof. The attached diff should take > care of that in your tree. Hmm.. Yup... that is actually the mods made for kio in our base XFS tree. I wonder why the patch dropped them? I should have caught that. Thanks. I'll let you know how things go. > > > -- > * Jens Axboe <[EMAIL PROTECTED]> > * SuSE Labs > > > >xfs-elv-1Name: xfs-elv-1 > Type: Plain Text (text/plain) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] livelock in elevator scheduling
Jens Axboe wrote: On Mon, Dec 04 2000, Russell Cattelan wrote: I'm going to take a closer look at the scsi_back_merge_fn. This may have more to due with our/Chait's kiobuf modifications than anything else. XFS (dev: 8/20) mounting with KIOBUFIO Start mounting filesystem: sd(8,20) Ending clean XFS mount for filesystem: sd(8,20) kmem_alloc doing a vmalloc 262144 size PAGE_SIZE 0 rval=0xe0a1 Unable to handle kernel NULL pointer dereference at virtual address 0008 printing eip: c019f8b5 *pde = Entering kdb (current=0xc191, pid 5) on processor 1 Panic: Oops due to panic @ 0xc019f8b5 eax = 0x0002 ebx = 0x0001 ecx = 0x00081478 edx = 0x esi = 0xc1957da0 edi = 0xc1923ac8 esp = 0xc1911e94 eip = 0xc019f8b5 ebp = 0xc1911e9c xss = 0x0018 xcs = 0x0010 eflags = 0x00010046 xds = 0x0018 xes = 0x0018 origeax = 0x regs = 0xc1911e60 [1]kdb bt EBP EIP Function(args) 0xc1911e9c 0xc019f8b5 scsi_back_merge_fn_c+0x15 (0xc1923a98, 0xc1957da0, 0xcfb05780, 0x80) kernel .text 0xc010 0xc019f8a0 Ah, I see what it is now. The elevator is attempting to merge a buffer head into a kio based request, poof. The attached diff should take care of that in your tree. Hmm.. Yup... that is actually the mods made for kio in our base XFS tree. I wonder why the patch dropped them? I should have caught that. Thanks. I'll let you know how things go. -- * Jens Axboe [EMAIL PROTECTED] * SuSE Labs xfs-elv-1Name: xfs-elv-1 Type: Plain Text (text/plain) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] livelock in elevator scheduling
Jens Axboe wrote: > On Fri, Dec 01 2000, Russell Cattelan wrote: > > > If performance is down, then that problem is most likely elsewhere. > > > I/O limited benchmarking typically thrives on lots of request > > > latency -- with that comes better throughput for individual threads. > > > > > > > Anyway, I'll try your patch. > > > > Well this patch does help with the request starvation problem. > > Unfortunately it has introduced another problem. > > Running 4 doio programs, on and XFS partion with KIO buf IO turned on. > > This looks like a generic aic7xxx problem, and not block related. Since > you are doing such nice traces, what is the other CPU doing? CPU1 > seems to be stuck grabbing the io_request_lock (for reasons not entirely > clear from reading the aic7xxx source...) > Ok Keith gave me a quick hack to help with the race condition. Here is the latest set up back traces... The actually accuracy of these back traces... well? who knows, but it does give us something to go on. It doesn't make much sense to me right now, but I'm guessing the problem is starting with that do_div error. I'm going to take a closer look at the scsi_back_merge_fn. This may have more to due with our/Chait's kiobuf modifications than anything else. XFS (dev: 8/20) mounting with KIOBUFIO Start mounting filesystem: sd(8,20) Ending clean XFS mount for filesystem: sd(8,20) kmem_alloc doing a vmalloc 262144 size & PAGE_SIZE 0 rval=0xe0a1 Unable to handle kernel NULL pointer dereference at virtual address 0008 printing eip: c019f8b5 *pde = Entering kdb (current=0xc191, pid 5) on processor 1 Panic: Oops due to panic @ 0xc019f8b5 eax = 0x0002 ebx = 0x0001 ecx = 0x00081478 edx = 0x esi = 0xc1957da0 edi = 0xc1923ac8 esp = 0xc1911e94 eip = 0xc019f8b5 ebp = 0xc1911e9c xss = 0x0018 xcs = 0x0010 eflags = 0x00010046 xds = 0x0018 xes = 0x0018 origeax = 0x = 0xc1911e60 [1]kdb> bt EBP EIP Function(args) 0xc1911e9c 0xc019f8b5 scsi_back_merge_fn_c+0x15 (0xc1923a98, 0xc1957da0, 0xcfb05780, 0x80) kernel .text 0xc010 0xc019f8a0 0xc019f98c 0xc1911f2c 0xc016a0df __make_request+0x1af (0xc1923a98, 0x1, 0xcfb05780, 0x0, 0x814) kernel .text 0xc010 0xc0169f30 0xc016a8a4 0xc1911f70 0xc016a9c8 generic_make_request+0x124 (0x1, 0xcfb05780, 0x0, 0x0, 0x0) kernel .text 0xc010 0xc016a8a4 0xc016aa50 0xc1911fac 0xc016abde ll_rw_block+0x18e (0x1, 0x1, 0xc1911fd0, 0x0) kernel .text 0xc010 0xc016aa50 0xc016ac58 0xc1911fd4 0xc0138ed7 flush_dirty_buffers+0x97 (0x0, 0x10f00) kernel .text 0xc010 0xc0138e40 0xc0138f24 0xc1911fec 0xc01391ab bdflush+0x8f kernel .text 0xc010 0xc013911c 0xc0139260 0xc0108c9b kernel_thread+0x23 kernel .text 0xc010 0xc0108c78 0xc0108cb0 [1]kdb> go Oops: CPU:1 EIP:0010:[] EFLAGS: 00010046 eax: 0002 ebx: 0001 ecx: 00081478 edx: esi: c1957da0 edi: c1923ac8 ebp: c1911e9c esp: c1911e94 ds: 0018 es: 0018 ss: 0018 Process kflushd (pid: 5, stackpage=c1911000) Stack: cfb05780 c1923a98 c1911f2c c016a0df c1923a98 c1957da0 cfb05780 0080 0814 00081478 cfb05780 0008 0001 0200 c1923ac0 000e c191 c1911efc c010c77e 0246 0814 def0e800 Call Trace: [] [] [] [] [] [] [] [] Code: 66 81 7a 08 00 10 0f 47 d8 8b 4a 2c 85 c9 74 19 0f b7 42 08 NMI Watchdog detected LOCKUP on CPU0, registers: CPU:0 EIP:0010:[] EFLAGS: 0086 eax: c01b21ac ebx: c197b078 ecx: edx: 0012 esi: 0286 edi: c02f5f94 ebp: c02f5f44 esp: c02f5f38 ds: 0018 es: 0018 ss: 0018 Process swapper (pid: 0, stackpage=c02f5000) Stack: c190fb20 0401 c02f5f64 c010c539 0012 c197b078 c02f5f94 0240 c0331a40 0012 c02f5f8c c010c73d 0012 c02f5f94 c190fb20 c0108960 c02f4000 c0108960 c190fb20 c02f5fc8 c010a8c8 c0108960 Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] Code: 80 3d 64 47 2e c0 00 f3 90 7e f5 e9 1b a7 f9 ff 80 3d 64 e3 Entering kdb (current=0xc02f4000, pid 0) on processor 0 due to WatchDog Interrupt @ 0xc0217a98 eax = 0xc01b21ac ebx = 0xc197b078 ecx = 0x edx = 0x0012 esi = 0x0286 edi = 0xc02f5f94 esp = 0xc02f5f38 eip = 0xc0217a98 ebp = 0xc02f5f44 xss = 0x0018 xcs = 0x0010 eflags = 0x0086 xds = 0x0018 xes = 0xc02f0018 origeax = 0xc01b21ac = 0xc02f5f04 [0]kdb> bt EBP EIP Function(args) 0xc0217a98 stext_lock+0x43a8 kernel .text.lock 0xc02136f0 0xc02136f0 0xc
Re: [PATCH] livelock in elevator scheduling
Jens Axboe wrote: > On Fri, Dec 01 2000, Russell Cattelan wrote: > > > If performance is down, then that problem is most likely elsewhere. > > > I/O limited benchmarking typically thrives on lots of request > > > latency -- with that comes better throughput for individual threads. > > > > > > > Anyway, I'll try your patch. > > > > Well this patch does help with the request starvation problem. > > Unfortunately it has introduced another problem. > > Running 4 doio programs, on and XFS partion with KIO buf IO turned on. > > This looks like a generic aic7xxx problem, and not block related. Since > you are doing such nice traces, what is the other CPU doing? CPU1 > seems to be stuck grabbing the io_request_lock (for reasons not entirely > clear from reading the aic7xxx source...) > Sorry I haven't been able to get a decent backtrace of the other processor. According to Keith Owens the maintainer of kdb there is a race condition in kbd and the NMI loop detection stuff that resulting in not being able to switch cpus. I'll keep try to dig up some more info. I'm also seeing various other panics in XFS (well pagebuf to be specific) with this patch. Nothing seems to be very consistent and this point. Ok I did manage to switch processors. Entering kdb (current=0xd7c0a000, pid 645) on processor 1 due to cpu switch [1]kdb> bt EBP EIP Function(args) 0xc0216594 stext_lock+0x2ea4 kernel .text.lock 0xc02136f0 0xc02136f0 0xc02197c0 0xd7c0bf98 0xc0155964 ext2_sync_file+0x2c (0xd8257560, 0xd7348220, 0x0, 0xd7c0a000) kernel .text 0xc010 0xc0155938 0xc0155a40 0xd7c0bfbc 0xc0136064 sys_fsync+0x54 (0x1, 0xb020, 0x0, 0xb048, 0x8051738) kernel .text 0xc010 0xc0136010 0xc0136088 0xc010a807 system_call+0x33 kernel .text 0xc010 0xc010a7d4 0xc010a80c [1]kdb> > > -- > * Jens Axboe <[EMAIL PROTECTED]> > * SuSE Labs - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] livelock in elevator scheduling
Jens Axboe wrote: On Fri, Dec 01 2000, Russell Cattelan wrote: If performance is down, then that problem is most likely elsewhere. I/O limited benchmarking typically thrives on lots of request latency -- with that comes better throughput for individual threads. Anyway, I'll try your patch. Well this patch does help with the request starvation problem. Unfortunately it has introduced another problem. Running 4 doio programs, on and XFS partion with KIO buf IO turned on. This looks like a generic aic7xxx problem, and not block related. Since you are doing such nice traces, what is the other CPU doing? CPU1 seems to be stuck grabbing the io_request_lock (for reasons not entirely clear from reading the aic7xxx source...) Ok Keith gave me a quick hack to help with the race condition. Here is the latest set up back traces... The actually accuracy of these back traces... well? who knows, but it does give us something to go on. It doesn't make much sense to me right now, but I'm guessing the problem is starting with that do_div error. I'm going to take a closer look at the scsi_back_merge_fn. This may have more to due with our/Chait's kiobuf modifications than anything else. XFS (dev: 8/20) mounting with KIOBUFIO Start mounting filesystem: sd(8,20) Ending clean XFS mount for filesystem: sd(8,20) kmem_alloc doing a vmalloc 262144 size PAGE_SIZE 0 rval=0xe0a1 Unable to handle kernel NULL pointer dereference at virtual address 0008 printing eip: c019f8b5 *pde = Entering kdb (current=0xc191, pid 5) on processor 1 Panic: Oops due to panic @ 0xc019f8b5 eax = 0x0002 ebx = 0x0001 ecx = 0x00081478 edx = 0x esi = 0xc1957da0 edi = 0xc1923ac8 esp = 0xc1911e94 eip = 0xc019f8b5 ebp = 0xc1911e9c xss = 0x0018 xcs = 0x0010 eflags = 0x00010046 xds = 0x0018 xes = 0x0018 origeax = 0x regs = 0xc1911e60 [1]kdb bt EBP EIP Function(args) 0xc1911e9c 0xc019f8b5 scsi_back_merge_fn_c+0x15 (0xc1923a98, 0xc1957da0, 0xcfb05780, 0x80) kernel .text 0xc010 0xc019f8a0 0xc019f98c 0xc1911f2c 0xc016a0df __make_request+0x1af (0xc1923a98, 0x1, 0xcfb05780, 0x0, 0x814) kernel .text 0xc010 0xc0169f30 0xc016a8a4 0xc1911f70 0xc016a9c8 generic_make_request+0x124 (0x1, 0xcfb05780, 0x0, 0x0, 0x0) kernel .text 0xc010 0xc016a8a4 0xc016aa50 0xc1911fac 0xc016abde ll_rw_block+0x18e (0x1, 0x1, 0xc1911fd0, 0x0) kernel .text 0xc010 0xc016aa50 0xc016ac58 0xc1911fd4 0xc0138ed7 flush_dirty_buffers+0x97 (0x0, 0x10f00) kernel .text 0xc010 0xc0138e40 0xc0138f24 0xc1911fec 0xc01391ab bdflush+0x8f kernel .text 0xc010 0xc013911c 0xc0139260 0xc0108c9b kernel_thread+0x23 kernel .text 0xc010 0xc0108c78 0xc0108cb0 [1]kdb go Oops: CPU:1 EIP:0010:[c019f8b5] EFLAGS: 00010046 eax: 0002 ebx: 0001 ecx: 00081478 edx: esi: c1957da0 edi: c1923ac8 ebp: c1911e9c esp: c1911e94 ds: 0018 es: 0018 ss: 0018 Process kflushd (pid: 5, stackpage=c1911000) Stack: cfb05780 c1923a98 c1911f2c c016a0df c1923a98 c1957da0 cfb05780 0080 0814 00081478 cfb05780 0008 0001 0200 c1923ac0 000e c191 c1911efc c010c77e 0246 0814 def0e800 Call Trace: [c016a0df] [c010c77e] [c010a8c8] [c016a9c8] [c016abde] [c0138ed7] [c01391ab] [c0108c9b] Code: 66 81 7a 08 00 10 0f 47 d8 8b 4a 2c 85 c9 74 19 0f b7 42 08 NMI Watchdog detected LOCKUP on CPU0, registers: CPU:0 EIP:0010:[c0217a98] EFLAGS: 0086 eax: c01b21ac ebx: c197b078 ecx: edx: 0012 esi: 0286 edi: c02f5f94 ebp: c02f5f44 esp: c02f5f38 ds: 0018 es: 0018 ss: 0018 Process swapper (pid: 0, stackpage=c02f5000) Stack: c190fb20 0401 c02f5f64 c010c539 0012 c197b078 c02f5f94 0240 c0331a40 0012 c02f5f8c c010c73d 0012 c02f5f94 c190fb20 c0108960 c02f4000 c0108960 c190fb20 c02f5fc8 c010a8c8 c0108960 Call Trace: [c010c539] [c010c73d] [c0108960] [c0108960] [c010a8c8] [c0108960] [c0108960] [c0100018] [c010898f] [c0108a02] [c0105000] [c01001d0] Code: 80 3d 64 47 2e c0 00 f3 90 7e f5 e9 1b a7 f9 ff 80 3d 64 e3 Entering kdb (current=0xc02f4000, pid 0) on processor 0 due to WatchDog Interrupt @ 0xc0217a98 eax = 0xc01b21ac ebx = 0xc197b078 ecx = 0x edx = 0x0012 esi = 0x0286 edi = 0xc02f5f94 esp = 0xc02f5f38 eip = 0xc0217a98 ebp = 0xc02f5f44 xss = 0x0018 xcs = 0x0010 eflags = 0x0086 xds = 0x0018 xes = 0xc02f0018 origeax = 0xc01b21ac regs = 0xc02f5f04 [0]kdb bt EBP EIP Function(args) 0xc0217a98 stext_lock+0x43a8 kernel .text.lock 0xc02136f0
Re: [PATCH] livelock in elevator scheduling
Jens Axboe wrote: > On Tue, Nov 21 2000, [EMAIL PROTECTED] wrote: > > > Believe it or not, but this is intentional. In that regard, the > > > function name is a misnomer -- call it i/o scheduler instead :-) > > > > I never believe it intentional. If it is true, the current kernel > > will be suffered from a kind of DOS attack. Yes, actually I'm a > > victim of it. > > The problem is caused by the too high sequence numbers in stock > kernel, as I said. Plus, the sequence decrementing doesn't take > request/buffer size into account. So the starvation _is_ limited, > the limit is just too high. > > > By Running ZD's ServerBench, not only the performance down, but my > > machine blocks all commands execution including /bin/ps, /bin/ls... , > > and those are not ^C able unless the benchmark is stopped. Those > > commands are read from disks but the requests are wating at the end of > > I/O queue, those won't be executed. > > If performance is down, then that problem is most likely elsewhere. > I/O limited benchmarking typically thrives on lots of request > latency -- with that comes better throughput for individual threads. > > > Anyway, I'll try your patch. Well this patch does help with the request starvation problem. Unfortunately it has introduced another problem. Running 4 doio programs, on and XFS partion with KIO buf IO turned on. I did see something about problems with aic7xxx driver in test11, so this may not be related to your patch. I'm going to run without kiobuf to see if the problem still occurs. XFS (dev: 8/17) mounting with KIOBUFIO Start mounting filesystem: sd(8,17) Ending clean XFS mount for filesystem: sd(8,17) NMI Watchdog detected LOCKUP on CPU1, registers: CPU:1 EIP:0010:[] EFLAGS: 0082 eax: c01b21ac ebx: c197b078 ecx: edx: 0012 esi: 0286 edi: dfff7f70 ebp: dfff7f20 esp: dfff7f14 ds: 0018 es: 0018 ss: 0018 Process swapper (pid: 0, stackpage=dfff7000) Stack: c190fb20 0401 0020 dfff7f40 c010c539 0012 c197b078 dfff7f70 0240 c0331a40 0012 dfff7f68 c010c73d 0012 dfff7f70 c190fb20 c0108960 dfff6000 c0108960 c190fb20 0001 dfff7fa4 c010a8c8 c0108960 Call Trace: [] [] [] [] [] [] [] [] [] [] [] Code: f3 90 7e f5 e9 1b a7 f9 ff 80 3d e4 e4 2e c0 00 f3 90 7e f5 Entering kdb (current=0xdfff6000, pid 0) on processor 1 due to WatchDog Interrupt @ 0xc0217a9f eax = 0xc01b21ac ebx = 0xc197b078 ecx = 0x edx = 0x0012 esi = 0x0286 edi = 0xdfff7f70 esp = 0xdfff7f14 eip = 0xc0217a9f ebp = 0xdfff7f20 xss = 0x0018 xcs = 0x0010 eflags = 0x0082 xds = 0xc1970018 xes = 0xdfff0018 origeax = 0xc01b21ac = 0xdfff7ee0 [1]kdb> bt EBP EIP Function(args) 0xc0217a9f stext_lock+0x43af kernel .text.lock 0xc02136f0 0xc02136f0 0xc02197c0 0xdfff7f20 0xc01b21c3 do_aic7xxx_isr+0x17 (0x12, 0xc197b078, 0xdfff7f70, 0x240, 0xc0331a40) kernel .text 0xc010 0xc01b21ac 0xc01b225c 0xdfff7f40 0xc010c539 handle_IRQ_event+0x4d (0x12, 0xdfff7f70, 0xc190fb20, 0xc0108960, 0xdfff6000) kernel .text 0xc010 0xc010c4ec 0xc010c568 0xdfff7f68 0xc010c73d do_IRQ+0x99 (0xc0108960, 0x0, 0xdfff6000, 0xdfff6000, 0xc0108960) kernel .text 0xc010 0xc010c6a4 0xc010c790 0xc010a8c8 ret_from_intr kernel .text 0xc010 0xc010a8c8 0xc010a8e8 Interrupt registers: eax = 0x ebx = 0xc0108960 ecx = 0x edx = 0xdfff6000 esi = 0xdfff6000 edi = 0xc0108960 esp = 0xdfff7fa4 eip = 0xc010898f ebp = 0xdfff7fa4 xss = 0x0018 xcs = 0x0010 eflags = 0x0246 xds = 0xc0100018 xes = 0xdfff0018 origeax = 0xff12 = 0xdfff7f70 0xc010898f default_idle+0x2f kernel .text 0xc010 0xc0108960 0xc0108998 0xdfff7fb8 0xc0108a02 cpu_idle+0x42 kernel .text 0xc010 0xc01089c0 0xc0108a18 0xdfff7fc0 0xc02fb5b9 start_secondary+0x25 kernel .text.init 0xc02f6000 0xc02fb594 0xc02fb5c0 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] livelock in elevator scheduling
Jens Axboe wrote: On Tue, Nov 21 2000, [EMAIL PROTECTED] wrote: Believe it or not, but this is intentional. In that regard, the function name is a misnomer -- call it i/o scheduler instead :-) I never believe it intentional. If it is true, the current kernel will be suffered from a kind of DOS attack. Yes, actually I'm a victim of it. The problem is caused by the too high sequence numbers in stock kernel, as I said. Plus, the sequence decrementing doesn't take request/buffer size into account. So the starvation _is_ limited, the limit is just too high. By Running ZD's ServerBench, not only the performance down, but my machine blocks all commands execution including /bin/ps, /bin/ls... , and those are not ^C able unless the benchmark is stopped. Those commands are read from disks but the requests are wating at the end of I/O queue, those won't be executed. If performance is down, then that problem is most likely elsewhere. I/O limited benchmarking typically thrives on lots of request latency -- with that comes better throughput for individual threads. Anyway, I'll try your patch. Well this patch does help with the request starvation problem. Unfortunately it has introduced another problem. Running 4 doio programs, on and XFS partion with KIO buf IO turned on. I did see something about problems with aic7xxx driver in test11, so this may not be related to your patch. I'm going to run without kiobuf to see if the problem still occurs. XFS (dev: 8/17) mounting with KIOBUFIO Start mounting filesystem: sd(8,17) Ending clean XFS mount for filesystem: sd(8,17) NMI Watchdog detected LOCKUP on CPU1, registers: CPU:1 EIP:0010:[c0217a9f] EFLAGS: 0082 eax: c01b21ac ebx: c197b078 ecx: edx: 0012 esi: 0286 edi: dfff7f70 ebp: dfff7f20 esp: dfff7f14 ds: 0018 es: 0018 ss: 0018 Process swapper (pid: 0, stackpage=dfff7000) Stack: c190fb20 0401 0020 dfff7f40 c010c539 0012 c197b078 dfff7f70 0240 c0331a40 0012 dfff7f68 c010c73d 0012 dfff7f70 c190fb20 c0108960 dfff6000 c0108960 c190fb20 0001 dfff7fa4 c010a8c8 c0108960 Call Trace: [c010c539] [c010c73d] [c0108960] [c0108960] [c010a8c8] [c0108960] [c0108960] [c0100018] [c010898f] [c0108a02] [c010a9be] Code: f3 90 7e f5 e9 1b a7 f9 ff 80 3d e4 e4 2e c0 00 f3 90 7e f5 Entering kdb (current=0xdfff6000, pid 0) on processor 1 due to WatchDog Interrupt @ 0xc0217a9f eax = 0xc01b21ac ebx = 0xc197b078 ecx = 0x edx = 0x0012 esi = 0x0286 edi = 0xdfff7f70 esp = 0xdfff7f14 eip = 0xc0217a9f ebp = 0xdfff7f20 xss = 0x0018 xcs = 0x0010 eflags = 0x0082 xds = 0xc1970018 xes = 0xdfff0018 origeax = 0xc01b21ac regs = 0xdfff7ee0 [1]kdb bt EBP EIP Function(args) 0xc0217a9f stext_lock+0x43af kernel .text.lock 0xc02136f0 0xc02136f0 0xc02197c0 0xdfff7f20 0xc01b21c3 do_aic7xxx_isr+0x17 (0x12, 0xc197b078, 0xdfff7f70, 0x240, 0xc0331a40) kernel .text 0xc010 0xc01b21ac 0xc01b225c 0xdfff7f40 0xc010c539 handle_IRQ_event+0x4d (0x12, 0xdfff7f70, 0xc190fb20, 0xc0108960, 0xdfff6000) kernel .text 0xc010 0xc010c4ec 0xc010c568 0xdfff7f68 0xc010c73d do_IRQ+0x99 (0xc0108960, 0x0, 0xdfff6000, 0xdfff6000, 0xc0108960) kernel .text 0xc010 0xc010c6a4 0xc010c790 0xc010a8c8 ret_from_intr kernel .text 0xc010 0xc010a8c8 0xc010a8e8 Interrupt registers: eax = 0x ebx = 0xc0108960 ecx = 0x edx = 0xdfff6000 esi = 0xdfff6000 edi = 0xc0108960 esp = 0xdfff7fa4 eip = 0xc010898f ebp = 0xdfff7fa4 xss = 0x0018 xcs = 0x0010 eflags = 0x0246 xds = 0xc0100018 xes = 0xdfff0018 origeax = 0xff12 regs = 0xdfff7f70 0xc010898f default_idle+0x2f kernel .text 0xc010 0xc0108960 0xc0108998 0xdfff7fb8 0xc0108a02 cpu_idle+0x42 kernel .text 0xc010 0xc01089c0 0xc0108a18 0xdfff7fc0 0xc02fb5b9 start_secondary+0x25 kernel .text.init 0xc02f6000 0xc02fb594 0xc02fb5c0 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/