On Sat, 05 Mar 2011 22:03:02 +0000 Ben Hutchings <[email protected]> wrote:
> On Fri, 2011-03-04 at 16:55 -0800, Greg KH wrote: > > 2.6.37-stable review patch. If anyone has any objections, please let us > > know. > > > > ------------------ > > > > From: NeilBrown <[email protected]> > > > > commit 93b270f76e7ef3b81001576860c2701931cdc78b upstream. > > > > There are two cases when we call flush_disk. > > In one, the device has disappeared (check_disk_change) so any > > data will hold becomes irrelevant. > > In the oter, the device has changed size (check_disk_size_change) > > so data we hold may be irrelevant. > [...] > > This regression was introduced by commit 608aeef17a which causes > > check_disk_size_change to call flush_disk, so it is suitable for any > > kernel since 2.6.27. > [...] > > Unfortunately this fix appears to depend on at least these changes in > 2.6.37, and so is not immediately applicable to earlier versions: .... > Are you working on or have you submitted a backport yet? > The following would be that equivalent patch backported to 2.6.36. However given the complexity, it might be better to just revert 608aeef17a as has been suggested in a separate thread... No definite answer yet :-( NeilBrown From 93b270f76e7ef3b81001576860c2701931cdc78b Mon Sep 17 00:00:00 2001 From: NeilBrown <[email protected]> Date: Thu, 24 Feb 2011 17:25:47 +1100 Subject: [PATCH] Fix over-zealous flush_disk when changing device size. There are two cases when we call flush_disk. In one, the device has disappeared (check_disk_change) so any data will hold becomes irrelevant. In the oter, the device has changed size (check_disk_size_change) so data we hold may be irrelevant. In both cases it makes sense to discard any 'clean' buffers, so they will be read back from the device if needed. In the former case it makes sense to discard 'dirty' buffers as there will never be anywhere safe to write the data. In the second case it *does*not* make sense to discard dirty buffers as that will lead to file system corruption when you simply enlarge the containing devices. flush_disk calls __invalidate_devices. __invalidate_device calls both invalidate_inodes and invalidate_bdev. invalidate_inodes *does* discard I_DIRTY inodes and this does lead to fs corruption. invalidate_bev *does*not* discard dirty pages, but I don't really care about that at present. So this patch adds a flag to __invalidate_device (calling it __invalidate_device2) to indicate whether dirty buffers should be killed, and this is passed to invalidate_inodes which can choose to skip dirty inodes. flusk_disk then passes true from check_disk_change and false from check_disk_size_change. dm avoids tripping over this problem by calling i_size_write directly rathher than using check_disk_size_change. md does use check_disk_size_change and so is affected. This regression was introduced by commit 608aeef17a which causes check_disk_size_change to call flush_disk, so it is suitable for any kernel since 2.6.27. Cc: [email protected] Acked-by: Jeff Moyer <[email protected]> Cc: Andrew Patterson <[email protected]> Cc: Jens Axboe <[email protected]> Signed-off-by: NeilBrown <[email protected]> --- diff --git a/block/genhd.c b/block/genhd.c index 59a2db6..bcdb534 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -1279,7 +1279,7 @@ int invalidate_partition(struct gendisk *disk, int partno) struct block_device *bdev = bdget_disk(disk, partno); if (bdev) { fsync_bdev(bdev); - res = __invalidate_device(bdev); + res = __invalidate_device(bdev, true); bdput(bdev); } return res; diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c index cf04c1b..e36c92d 100644 --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -3251,7 +3251,7 @@ static int set_geometry(unsigned int cmd, struct floppy_struct *g, struct block_device *bdev = opened_bdev[cnt]; if (!bdev || ITYPE(drive_state[cnt].fd_device) != type) continue; - __invalidate_device(bdev); + __invalidate_device(bdev, true); } mutex_unlock(&open_lock); } else { diff --git a/fs/block_dev.c b/fs/block_dev.c index 50e8c85..b419fe2 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -1198,14 +1198,15 @@ EXPORT_SYMBOL(open_by_devnum); * flush_disk - invalidates all buffer-cache entries on a disk * * @bdev: struct block device to be flushed + * @kill_dirty: flag to guide handling of dirty inodes * * Invalidates all buffer-cache entries on a disk. It should be called * when a disk has been changed -- either by a media change or online * resize. */ -static void flush_disk(struct block_device *bdev) +static void flush_disk(struct block_device *bdev, bool kill_dirty) { - if (__invalidate_device(bdev)) { + if (__invalidate_device(bdev, kill_dirty)) { char name[BDEVNAME_SIZE] = ""; if (bdev->bd_disk) @@ -1242,7 +1243,7 @@ void check_disk_size_change(struct gendisk *disk, struct block_device *bdev) "%s: detected capacity change from %lld to %lld\n", name, bdev_size, disk_size); i_size_write(bdev->bd_inode, disk_size); - flush_disk(bdev); + flush_disk(bdev, false); } } EXPORT_SYMBOL(check_disk_size_change); @@ -1294,7 +1295,7 @@ int check_disk_change(struct block_device *bdev) if (!bdops->media_changed(bdev->bd_disk)) return 0; - flush_disk(bdev); + flush_disk(bdev, true); if (bdops->revalidate_disk) bdops->revalidate_disk(bdev->bd_disk); return 1; @@ -1761,7 +1762,7 @@ void close_bdev_exclusive(struct block_device *bdev, fmode_t mode) EXPORT_SYMBOL(close_bdev_exclusive); -int __invalidate_device(struct block_device *bdev) +int __invalidate_device(struct block_device *bdev, bool kill_dirty) { struct super_block *sb = get_super(bdev); int res = 0; @@ -1774,7 +1775,7 @@ int __invalidate_device(struct block_device *bdev) * hold). */ shrink_dcache_sb(sb); - res = invalidate_inodes(sb); + res = invalidate_inodes(sb, kill_dirty); drop_super(sb); } invalidate_bdev(bdev); diff --git a/fs/ext2/super.c b/fs/ext2/super.c index 1ec6026..0e8e235 100644 --- a/fs/ext2/super.c +++ b/fs/ext2/super.c @@ -1221,7 +1221,7 @@ static int ext2_remount (struct super_block * sb, int * flags, char * data) es = sbi->s_es; if (((sbi->s_mount_opt & EXT2_MOUNT_XIP) != (old_mount_opt & EXT2_MOUNT_XIP)) && - invalidate_inodes(sb)) { + invalidate_inodes(sb, false)) { ext2_msg(sb, KERN_WARNING, "warning: refusing change of " "xip flag with busy inodes while remounting"); sbi->s_mount_opt &= ~EXT2_MOUNT_XIP; diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index 4d4b1e8..8bd86d6 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -1264,7 +1264,7 @@ fail_sb: fail_locking: init_locking(sdp, &mount_gh, UNDO); fail_lm: - invalidate_inodes(sb); + invalidate_inodes(sb,true); gfs2_gl_hash_clear(sdp); gfs2_lm_unmount(sdp); fail_sys: diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index 77cb9f8..a43b399 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -857,7 +857,7 @@ restart: gfs2_clear_rgrpd(sdp); gfs2_jindex_free(sdp); /* Take apart glock structures and buffer lists */ - invalidate_inodes(sdp->sd_vfs); + invalidate_inodes(sdp->sd_vfs, true); gfs2_gl_hash_clear(sdp); /* Unmount the locking protocol */ gfs2_lm_unmount(sdp); diff --git a/fs/inode.c b/fs/inode.c index 8646433..567f097 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -354,7 +354,8 @@ static void dispose_list(struct list_head *head) /* * Invalidate all inodes for a device. */ -static int invalidate_list(struct list_head *head, struct list_head *dispose) +static int invalidate_list(struct list_head *head, struct list_head *dispose, + bool kill_dirty) { struct list_head *next; int busy = 0, count = 0; @@ -378,6 +379,10 @@ static int invalidate_list(struct list_head *head, struct list_head *dispose) inode = list_entry(tmp, struct inode, i_sb_list); if (inode->i_state & I_NEW) continue; + if (inode->i_state & I_DIRTY && !kill_dirty) { + busy = 1; + continue; + } invalidate_inode_buffers(inode); if (!atomic_read(&inode->i_count)) { list_move(&inode->i_list, dispose); @@ -396,12 +401,13 @@ static int invalidate_list(struct list_head *head, struct list_head *dispose) /** * invalidate_inodes - discard the inodes on a device * @sb: superblock + * @kill_dirty: flag to guide handling of dirty inodes * * Discard all of the inodes for a given superblock. If the discard * fails because there are busy inodes then a non zero value is returned. * If the discard is successful all the inodes have been discarded. */ -int invalidate_inodes(struct super_block *sb) +int invalidate_inodes(struct super_block *sb, bool kill_dirty) { int busy; LIST_HEAD(throw_away); @@ -409,7 +415,7 @@ int invalidate_inodes(struct super_block *sb) down_write(&iprune_sem); spin_lock(&inode_lock); fsnotify_unmount_inodes(&sb->s_inodes); - busy = invalidate_list(&sb->s_inodes, &throw_away); + busy = invalidate_list(&sb->s_inodes, &throw_away, kill_dirty); spin_unlock(&inode_lock); dispose_list(&throw_away); diff --git a/fs/ntfs/super.c b/fs/ntfs/super.c index 5128061..99892dc 100644 --- a/fs/ntfs/super.c +++ b/fs/ntfs/super.c @@ -3048,7 +3048,7 @@ iput_tmp_ino_err_out_now: * method again... FIXME: Do we need to do this twice now because of * attribute inodes? I think not, so leave as is for now... (AIA) */ - if (invalidate_inodes(sb)) { + if (invalidate_inodes(sb, true)) { ntfs_error(sb, "Busy inodes left. This is most likely a NTFS " "driver bug."); /* Copied from fs/super.c. I just love this message. (-; */ diff --git a/fs/smbfs/inode.c b/fs/smbfs/inode.c index 450c919..bde9543 100644 --- a/fs/smbfs/inode.c +++ b/fs/smbfs/inode.c @@ -229,7 +229,7 @@ smb_invalidate_inodes(struct smb_sb_info *server) { VERBOSE("\n"); shrink_dcache_sb(SB_of(server)); - invalidate_inodes(SB_of(server)); + invalidate_inodes(SB_of(server), true); } /* diff --git a/fs/super.c b/fs/super.c index 8819e3a..bc56d70 100644 --- a/fs/super.c +++ b/fs/super.c @@ -274,13 +274,13 @@ void generic_shutdown_super(struct super_block *sb) sb->s_flags &= ~MS_ACTIVE; /* bad name - it should be evict_inodes() */ - invalidate_inodes(sb); + invalidate_inodes(sb, true); if (sop->put_super) sop->put_super(sb); /* Forget any remaining inodes */ - if (invalidate_inodes(sb)) { + if (invalidate_inodes(sb, true)) { printk("VFS: Busy inodes after unmount of %s. " "Self-destruct in 5 seconds. Have a nice day...\n", sb->s_id); diff --git a/include/linux/fs.h b/include/linux/fs.h index 63d069b..3819625 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2073,10 +2073,10 @@ extern void check_disk_size_change(struct gendisk *disk, struct block_device *bdev); extern int revalidate_disk(struct gendisk *); extern int check_disk_change(struct block_device *); -extern int __invalidate_device(struct block_device *); +extern int __invalidate_device(struct block_device *, bool); extern int invalidate_partition(struct gendisk *, int); #endif -extern int invalidate_inodes(struct super_block *); +extern int invalidate_inodes(struct super_block *, bool); unsigned long invalidate_mapping_pages(struct address_space *mapping, pgoff_t start, pgoff_t end);
signature.asc
Description: PGP signature
_______________________________________________ stable mailing list [email protected] http://linux.kernel.org/mailman/listinfo/stable
