Re: [PATCH] Ext3 online resizing locking issue
Hi,

On Wed, 2005-08-31 at 12:35, Glauber de Oliveira Costa wrote:

> At a first look, I thought about locking gdt-related data. But in a
> closer one, it seemed to me that we're in fact modifying a little bit
> more than that in the resize code. But all these modifications seem to
> be somehow related to the ext3 super block specific data in
> ext3_sb_info. My first naive approach would be adding a lock to that
> struct

I took great care when making that code SMP-safe to avoid such locks,
for performance reasons. See the comments at

	 * We need to protect s_groups_count against other CPUs seeing
	 * inconsistent state in the superblock.

in fs/ext3/resize.c for the rules. But basically the way it works is
that we only usually modify data that cannot be in use by other parts of
the kernel --- and that's fairly easy to guarantee, since by definition
extending the fs is something that is touching bits that aren't already
in use. Only once all the new data is safely installed do we atomically
update the s_groups_count field, which instantly makes the new data
visible. We enforce this ordering via smp read barriers before reading
s_groups_count and write barriers after modifying it, but we don't
actually have locks as such.

The only use of locking in the resize is hence the superblock lock,
which is not really there to protect the resize from the rest of the fs
--- the s_groups_count barriers do that. All the sb lock is needed for
is to prevent two resizes from progressing at the same time; and that
could easily be abstracted into a separate resize lock.

Cheers,
 Stephen

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
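The "install the data first, then publish it by updating a count" scheme described in the message above can be modelled in userspace. What follows is only a minimal sketch, not the ext3 code: `struct fs_model`, `model_init()`, `model_add_group()` and `model_sum_groups()` are hypothetical names, and C11 release/acquire atomics stand in for the kernel's smp_wmb()/smp_rmb(). The ordering point sits between installing the new data and storing the count on the writer side, and between loading the count and reading the data on the reader side. A single writer at a time is assumed, which is the job the message leaves to the superblock (or a dedicated resize) lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_GROUPS 8

/* Hypothetical stand-in for the per-group data that a resize installs. */
struct fs_model {
	int group_desc[MAX_GROUPS];	/* the "dependent data" */
	atomic_uint groups_count;	/* plays the role of s_groups_count */
};

static void model_init(struct fs_model *fs)
{
	atomic_init(&fs->groups_count, 0);
}

/*
 * Writer: fill in the new slot, then publish it by bumping the count.
 * Assumes callers are already serialised against each other.
 */
static int model_add_group(struct fs_model *fs, int desc)
{
	unsigned int n = atomic_load_explicit(&fs->groups_count,
					      memory_order_relaxed);

	if (n >= MAX_GROUPS)
		return -1;

	/* Slot n is beyond the published count, so no reader looks at it. */
	fs->group_desc[n] = desc;

	/* Release store: the slot write is ordered before the new count. */
	atomic_store_explicit(&fs->groups_count, n + 1, memory_order_release);
	return 0;
}

/* Reader: sample the count once, then touch only the slots below it. */
static int model_sum_groups(struct fs_model *fs)
{
	/* Acquire load: slot reads below cannot be hoisted above it. */
	unsigned int n = atomic_load_explicit(&fs->groups_count,
					      memory_order_acquire);
	int sum = 0;

	for (unsigned int i = 0; i < n; i++)
		sum += fs->group_desc[i];
	return sum;
}
```

Readers never take a lock here; they either see the old count and ignore the new slot, or see the new count together with a fully written slot.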
Re: [PATCH] Ext3 online resizing locking issue
> The two different uses of the superblock lock are really quite
> different; I don't see any particular problem with using two different
> locks for the two different things. Mount and the namespace code are
> not locking the same thing --- the fact that the resize code uses the
> superblock lock is really a historical side-effect of the fact that we
> used to use the same overloaded superblock lock in the ext2/ext3 block
> allocation layers to guard bitmap access.

At a first look, I thought about locking gdt-related data. But in a
closer one, it seemed to me that we're in fact modifying a little bit
more than that in the resize code. But all these modifications seem to
be somehow related to the ext3 super block specific data in
ext3_sb_info. My first naive approach would be adding a lock to that
struct.

Besides that, by doing that, we become pretty much independent of vfs
locking decisions to handle ext3 data. Do you think it all makes sense?

--
=
Glauber de Oliveira Costa
IBM Linux Technology Center - Brazil
[EMAIL PROTECTED]
=
Re: [PATCH] Ext3 online resizing locking issue
Hi,

On Thu, 2005-08-25 at 21:43, Glauber de Oliveira Costa wrote:

> Just a question here. With s_lock held by the remount code, we're
> altering the struct super_block, and believing we're safe. We try to
> acquire it inside the resize functions, because we're trying to modify
> this same data. Thus, if we rely on another lock, aren't we probably
> messing up something ?

The two different uses of the superblock lock are really quite
different; I don't see any particular problem with using two different
locks for the two different things. Mount and the namespace code are
not locking the same thing --- the fact that the resize code uses the
superblock lock is really a historical side-effect of the fact that we
used to use the same overloaded superblock lock in the ext2/ext3 block
allocation layers to guard bitmap access.

--Stephen
Re: [PATCH] Ext3 online resizing locking issue
> NAK, this is wrong:
>
> > +		lock_super(sb);
> >  		err = ext3_group_extend(sb, EXT3_SB(sb)->s_es, n_blocks_count);
> > +		unlock_super(sb);
>
> This basically reverses the order of locking between lock_super() and
> journal_start() (the latter acts like a lock because it can block on a
> resource if the journal is too full for the new transaction.) That's
> the opposite order to normal, and will result in a potential deadlock.
>

Ooops! Missed that. But I agree with the point.

> But the _right_ fix, if you really want to keep that code, is probably
> to move all the resize locking to a separate lock that ranks outside the
> journal_start. The easy workaround is to drop the superblock lock and
> reacquire it around the journal_start(); it would be pretty easy to make
> that work robustly as far as ext3 is concerned, but I suspect there may
> be VFS-layer problems if we start dropping the superblock lock in the
> middle of the s_ops->remount() call --- Al?
>

Just a question here. With s_lock held by the remount code, we're
altering the struct super_block, and believing we're safe. We try to
acquire it inside the resize functions, because we're trying to modify
this same data. Thus, if we rely on another lock, aren't we probably
messing up something ?

(for example, both group_extend and remount code potentially modify
s_flags field. If we ioctl and remount at the same time, each one with a
different lock, something could go wrong).

Am I missing something here ?

glauber
Re: [PATCH] Ext3 online resizing locking issue
Hi,

On Wed, 2005-08-24 at 22:03, Glauber de Oliveira Costa wrote:

> This simple patch provides a fix for a locking issue found in the online
> resizing code. The problem actually happened while trying to resize the
> filesystem through the resize=xxx option in a remount.

NAK, this is wrong:

> +		lock_super(sb);
>  		err = ext3_group_extend(sb, EXT3_SB(sb)->s_es, n_blocks_count);
> +		unlock_super(sb);

This basically reverses the order of locking between lock_super() and
journal_start() (the latter acts like a lock because it can block on a
resource if the journal is too full for the new transaction.) That's
the opposite order to normal, and will result in a potential deadlock.

> +	{Opt_resize, "resize=%u"},
>  	{Opt_err, NULL},
> -	{Opt_resize, "resize"},

Right, that's disabled for now. I guess the easy fix here is just to
remove the code entirely, given that we have locking problems with
trying to fix it!

But the _right_ fix, if you really want to keep that code, is probably
to move all the resize locking to a separate lock that ranks outside the
journal_start. The easy workaround is to drop the superblock lock and
reacquire it around the journal_start(); it would be pretty easy to make
that work robustly as far as ext3 is concerned, but I suspect there may
be VFS-layer problems if we start dropping the superblock lock in the
middle of the s_ops->remount() call --- Al?

--Stephen
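The lock-ranking idea in the message above (a dedicated resize lock taken outside journal_start(), with the superblock lock kept innermost) can be illustrated with a small userspace model. This is a sketch under stated assumptions rather than a proposed kernel patch: `resize_lock`, `journal_slot`, `sb_lock`, `model_group_extend()` and `model_blocks_count()` are hypothetical names, and a plain pthread mutex stands in for the way journal_start() can block. The only point is that every path takes the locks in the same order, so no two paths can wait on each other in a cycle.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical lock hierarchy, outermost first. */
static pthread_mutex_t resize_lock  = PTHREAD_MUTEX_INITIALIZER; /* rank 1 */
static pthread_mutex_t journal_slot = PTHREAD_MUTEX_INITIALIZER; /* rank 2: models journal_start() */
static pthread_mutex_t sb_lock      = PTHREAD_MUTEX_INITIALIZER; /* rank 3: models lock_super() */

static unsigned long blocks_count = 100;

static unsigned long model_blocks_count(void)
{
	unsigned long n;

	pthread_mutex_lock(&sb_lock);
	n = blocks_count;
	pthread_mutex_unlock(&sb_lock);
	return n;
}

/*
 * Extend the model filesystem by 'add' blocks, provided nobody else
 * resized it since the caller sampled 'expected_old'.
 */
static int model_group_extend(unsigned long expected_old, unsigned long add)
{
	int err = 0;

	pthread_mutex_lock(&resize_lock);	/* serialise resizers first */
	pthread_mutex_lock(&journal_slot);	/* then "start a transaction" */
	pthread_mutex_lock(&sb_lock);		/* innermost: short sb update */

	if (blocks_count != expected_old)
		err = -1;			/* multiple resizers raced */
	else
		blocks_count += add;

	/* Release in reverse order of acquisition. */
	pthread_mutex_unlock(&sb_lock);
	pthread_mutex_unlock(&journal_slot);
	pthread_mutex_unlock(&resize_lock);
	return err;
}
```

The rejected patch corresponds to taking `sb_lock` before `journal_slot` on one path while other paths take them the other way round, which is the ordering inversion the NAK describes.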
[PATCH] Ext3 online resizing locking issue
This simple patch provides a fix for a locking issue found in the online
resizing code. The problem actually happened while trying to resize the
filesystem through the resize=xxx option in a remount.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>

diff -up linux-2.6.13-rc6-orig/fs/ext3/ioctl.c linux/fs/ext3/ioctl.c
--- linux-2.6.13-rc6-orig/fs/ext3/ioctl.c	2005-08-24 17:48:22.0 -0300
+++ linux/fs/ext3/ioctl.c	2005-08-24 15:12:48.0 -0300
@@ -206,7 +206,9 @@ flags_err:
 		if (get_user(n_blocks_count, (__u32 __user *)arg))
 			return -EFAULT;
 
+		lock_super(sb);
 		err = ext3_group_extend(sb, EXT3_SB(sb)->s_es, n_blocks_count);
+		unlock_super(sb);
 		journal_lock_updates(EXT3_SB(sb)->s_journal);
 		journal_flush(EXT3_SB(sb)->s_journal);
 		journal_unlock_updates(EXT3_SB(sb)->s_journal);
Only in linux/fs/ext3: patch-mnt_resize
diff -up linux-2.6.13-rc6-orig/fs/ext3/resize.c linux/fs/ext3/resize.c
--- linux-2.6.13-rc6-orig/fs/ext3/resize.c	2005-08-24 17:48:22.0 -0300
+++ linux/fs/ext3/resize.c	2005-08-24 15:15:28.0 -0300
@@ -884,7 +884,9 @@ exit_put:
 /* Extend the filesystem to the new number of blocks specified.  This entry
  * point is only used to extend the current filesystem to the end of the last
  * existing group.  It can be accessed via ioctl, or by "remount,resize=<size>"
- * for emergencies (because it has no dependencies on reserved blocks).
+ * for emergencies (because it has no dependencies on reserved blocks). 
+ *
+ * It should be called with sb->s_lock held
  *
  * If we _really_ wanted, we could use default values to call ext3_group_add()
  * allow the "remount" trick to work for arbitrary resizing, assuming enough
@@ -959,7 +961,6 @@ int ext3_group_extend(struct super_block
 		goto exit_put;
 	}
 
-	lock_super(sb);
 	if (o_blocks_count != le32_to_cpu(es->s_blocks_count)) {
 		ext3_warning(sb, __FUNCTION__,
 			     "multiple resizers run on filesystem!\n");
@@ -978,7 +979,6 @@ int ext3_group_extend(struct super_block
 	es->s_blocks_count = cpu_to_le32(o_blocks_count + add);
 	ext3_journal_dirty_metadata(handle, EXT3_SB(sb)->s_sbh);
 	sb->s_dirt = 1;
-	unlock_super(sb);
 	ext3_debug("freeing blocks %ld through %ld\n", o_blocks_count,
 		   o_blocks_count + add);
 	ext3_free_blocks_sb(handle, sb, o_blocks_count, add, &freed_blocks);
diff -up linux-2.6.13-rc6-orig/fs/ext3/super.c linux/fs/ext3/super.c
--- linux-2.6.13-rc6-orig/fs/ext3/super.c	2005-08-24 17:48:22.0 -0300
+++ linux/fs/ext3/super.c	2005-08-24 15:13:16.0 -0300
@@ -639,8 +639,8 @@ static match_table_t tokens = {
 	{Opt_quota, "quota"},
 	{Opt_quota, "usrquota"},
 	{Opt_barrier, "barrier=%u"},
+	{Opt_resize, "resize=%u"},
 	{Opt_err, NULL},
-	{Opt_resize, "resize"},
 };
 
 static unsigned long get_sb_block(void **data)
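The super.c hunk moves the resize entry from below the `{Opt_err, NULL}` terminator to above it. A sentinel-terminated table only yields entries that precede the sentinel, which appears to be why the option could not be matched before the patch. The sketch below is a simplified, hypothetical stand-in for the kernel's token matching: `lookup()` compares whole strings and does not parse `%u` arguments, and the two tables are cut-down illustrations rather than the real `tokens[]`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { Opt_barrier, Opt_resize, Opt_err };

struct token {
	int id;
	const char *pattern;	/* NULL marks the end of the table */
};

/*
 * Walk the table until a pattern matches or the NULL sentinel is hit.
 * Entries placed after the sentinel are never examined.
 */
static int lookup(const struct token *table, const char *s)
{
	const struct token *p;

	for (p = table; p->pattern != NULL; p++)
		if (strcmp(p->pattern, s) == 0)
			return p->id;
	return p->id;		/* the sentinel's id, i.e. Opt_err */
}

/* Layout before the patch: "resize" sits behind the terminator. */
static const struct token before_patch[] = {
	{Opt_barrier, "barrier"},
	{Opt_err, NULL},
	{Opt_resize, "resize"},
};

/* Layout after the patch: "resize" is reachable again. */
static const struct token after_patch[] = {
	{Opt_barrier, "barrier"},
	{Opt_resize, "resize"},
	{Opt_err, NULL},
};
```

With these tables, looking up "resize" in `before_patch` falls through to the sentinel, while the same lookup in `after_patch` finds the entry.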