Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
On Fri, 23 Jan 2015 17:59:49 +0100, David Sterba wrote: On Wed, Jan 21, 2015 at 03:04:02PM +0800, Miao Xie wrote: Pending changes are *not* only mount options. Feature change and label change are also pending changes if using sysfs. My miss, I don't notice feature and label change by sysfs. But the implementation of feature and label change by sysfs is wrong, we can not change them without write permission. Label change does not happen if the fs is readonly. If the filesystem is RW and label is changed through sysfs, then remount to RO will sync the filesystem and the new label will be saved. The sysfs features write handler is missing that protection though, I'll send a patch. First, the R/O protection is so cheap, there is a race between R/O remount and label/feature change, please consider the following case: Remount R/O taskLabel/Attr Change Task Check R/O remount ro R/O change Label/feature Second, it forgets to handle the freezing event. For freeze, it's not the same problem since the fs will be unfreeze sooner or later and transaction will be initiated. You can not assume the operations of the users, they might freeze the fs and then shutdown the machine. The semantics of freezing should make the on-device image consistent, but still keep some changes in memory. For example, if we change the features/label through sysfs, and then umount the fs, It is different from pending change. No, now features/label changing using sysfs both use pending changes to do the commit. See BTRFS_PENDING_COMMIT bit. So freeze - change features/label - sync will still cause the deadlock in the same way, and you can try it yourself. As I said above, the implementation of sysfs feature and label change is wrong, it is better to separate them from the pending mount option change, make the sysfs feature and label change be done in the context of transaction after getting the write permission. If so, we needn't do anything special when sync the fs. That would mean to drop the write support of sysfs files that change global filesystem state (label and features right now). This would leave only the ioctl way to do that. I'd like to keep the sysfs write support though for ease of use from scripts and languages not ioctl-friendly. . not drop the write support of sysfs, just fix the bug and make it change the label and features under the writable context. Thanks Miao -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
On Wed, Jan 21, 2015 at 03:47:54PM +0800, Qu Wenruo wrote: To David: I'm a little curious about why inode_cache needs to be delayed to next transaction. In btrfs_remount() we have s_umount mutex, and we synced the whole filesystem already, so there should be no running transaction and we can just set any mount option into fs_info. See our discussion under the noinode_cache option: http://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg30075.html http://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg30414.html What do you think about reverting the whole patchset and rework the sysfs interface? IMO reverting should be the last option, we have a minimal fix to the sync deadlock and you've proposed the per-trasaction mount options to replace the pending inode_change. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
On Wed, Jan 21, 2015 at 03:04:02PM +0800, Miao Xie wrote: Pending changes are *not* only mount options. Feature change and label change are also pending changes if using sysfs. My miss, I don't notice feature and label change by sysfs. But the implementation of feature and label change by sysfs is wrong, we can not change them without write permission. Label change does not happen if the fs is readonly. If the filesystem is RW and label is changed through sysfs, then remount to RO will sync the filesystem and the new label will be saved. The sysfs features write handler is missing that protection though, I'll send a patch. For freeze, it's not the same problem since the fs will be unfreeze sooner or later and transaction will be initiated. You can not assume the operations of the users, they might freeze the fs and then shutdown the machine. The semantics of freezing should make the on-device image consistent, but still keep some changes in memory. For example, if we change the features/label through sysfs, and then umount the fs, It is different from pending change. No, now features/label changing using sysfs both use pending changes to do the commit. See BTRFS_PENDING_COMMIT bit. So freeze - change features/label - sync will still cause the deadlock in the same way, and you can try it yourself. As I said above, the implementation of sysfs feature and label change is wrong, it is better to separate them from the pending mount option change, make the sysfs feature and label change be done in the context of transaction after getting the write permission. If so, we needn't do anything special when sync the fs. That would mean to drop the write support of sysfs files that change global filesystem state (label and features right now). This would leave only the ioctl way to do that. I'd like to keep the sysfs write support though for ease of use from scripts and languages not ioctl-friendly. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
On Wed, 21 Jan 2015 15:47:54 +0800, Qu Wenruo wrote: On Wed, 21 Jan 2015 11:53:34 +0800, Qu Wenruo wrote: [snipped] This will cause another problem, nobody can ensure there will be next transaction and the change may never to written into disk. First, the pending changes is mount option, that is in-memory data. Second, the same problem would happen after you freeze fs. Pending changes are *not* only mount options. Feature change and label change are also pending changes if using sysfs. My miss, I don't notice feature and label change by sysfs. But the implementation of feature and label change by sysfs is wrong, we can not change them without write permission. Normal ioctl label changing is not affected. For freeze, it's not the same problem since the fs will be unfreeze sooner or later and transaction will be initiated. You can not assume the operations of the users, they might freeze the fs and then shutdown the machine. For example, if we change the features/label through sysfs, and then umount the fs, It is different from pending change. No, now features/label changing using sysfs both use pending changes to do the commit. See BTRFS_PENDING_COMMIT bit. So freeze - change features/label - sync will still cause the deadlock in the same way, and you can try it yourself. As I said above, the implementation of sysfs feature and label change is wrong, it is better to separate them from the pending mount option change, make the sysfs feature and label change be done in the context of transaction after getting the write permission. If so, we needn't do anything special when sync the fs. In short, changing the sysfs feature and label change implementation and removing the unnecessary btrfs_start_transaction in sync_fs can fix the deadlock. Your method will only fix the deadlock, but will introduce the risk like pending inode_cache will never be written to disk as I already explained. (If still using the fs_info-pending_changes mechanism) To ensure pending changes written to disk sync_fs() should start a transaction if needed, or there will be chance that no transaction can handle it. We are sure that writting down the inode cache need transaction. But INODE_CACHE is not a forcible flag. Sometimes though you set it, it is very likely that the inode cache files are not created and the data is not written down because the fs might still be reading inode usage information, and this operation might span several transactions. So I think what you worried is not a problem. Thanks Miao But I don't see the necessity to pending current work(inode_cache, feature/label changes) to next transaction. To David: I'm a little curious about why inode_cache needs to be delayed to next transaction. In btrfs_remount() we have s_umount mutex, and we synced the whole filesystem already, so there should be no running transaction and we can just set any mount option into fs_info. Or even in worst case, there is a racing window, we can still start a transaction and do the commit, a little overhead in such minor case won't impact the overall performance. For sysfs change, I prefer attach or start transaction method, and for mount option change, and such sysfs tuning is also minor case for a filesystem. What do you think about reverting the whole patchset and rework the sysfs interface? Thanks, Qu Thanks Miao Thanks, Qu If you want to change features/label, you should get write permission and make sure the fs is not be freezed because those are on-disk data. So the problem doesn't exist, or there is a bug. Thanks Miao since there is no write, there is no running transaction and if we don't start a new transaction, it won't be flushed to disk. Thanks, Qu the reason is: - Make the behavior of the fs be consistent(both freezed fs and unfreezed fs) - Data on the disk is right and integrated Thanks Miao . . . -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote: --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait) */ if (fs_info-pending_changes == 0) return 0; + /* + * Test if the fs is frozen, or start_trasaction + * will deadlock on itself. + */ + if (__sb_start_write(sb, SB_FREEZE_FS, false)) + __sb_end_write(sb, SB_FREEZE_FS); + else + return 0; The more I look into that the more I think that the first fix is the right one. Has been pointed out in this thread, it is ok to skip processing the pending changes if the filesystem is frozen. The pending changes have to flushed from sync (by design), we cannot use mnt_want_write or the sb_start* protections that. The btrfs_freeze callback can safely do the last commit, that's under s_umount held by vfs::freeze_super. Then any other new transaction would block. Any other call to btrfs_sync_fs will not find any active transaction and with this patch will not start one. Sounds safe to me. I think the right level to check is SB_FREEZE_WRITE though, to stop any potential writes as soon as possible and when the s_umount lock is still held in vfs::freeze_super. I'll collect the relevant patches and will send it for review. trans = btrfs_start_transaction(root, 0); } else { return PTR_ERR(trans); -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: David Sterba dste...@suse.cz To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 01:13 On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote: --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait) */ if (fs_info-pending_changes == 0) return 0; + /* +* Test if the fs is frozen, or start_trasaction +* will deadlock on itself. +*/ + if (__sb_start_write(sb, SB_FREEZE_FS, false)) + __sb_end_write(sb, SB_FREEZE_FS); + else + return 0; But what if someone freezes the FS after __sb_end_write() and before btrfs_start_transaction()? I don't see what keeps new freezers from coming in. -chris -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: Chris Mason c...@fb.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 09:05 On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: David Sterba dste...@suse.cz To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 01:13 On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote: --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait) */ if (fs_info-pending_changes == 0) return 0; +/* + * Test if the fs is frozen, or start_trasaction + * will deadlock on itself. + */ +if (__sb_start_write(sb, SB_FREEZE_FS, false)) +__sb_end_write(sb, SB_FREEZE_FS); +else +return 0; But what if someone freezes the FS after __sb_end_write() and before btrfs_start_transaction()? I don't see what keeps new freezers from coming in. -chris Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount mutex, so freeze will not happen during sync. Thanks, Qu -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
On Tue, 20 Jan 2015 20:10:56 -0500, Chris Mason wrote: On Tue, Jan 20, 2015 at 8:09 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: Chris Mason c...@fb.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 09:05 On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: David Sterba dste...@suse.cz To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 01:13 On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote: --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait) */ if (fs_info-pending_changes == 0) return 0; +/* + * Test if the fs is frozen, or start_trasaction + * will deadlock on itself. + */ +if (__sb_start_write(sb, SB_FREEZE_FS, false)) +__sb_end_write(sb, SB_FREEZE_FS); +else +return 0; But what if someone freezes the FS after __sb_end_write() and before btrfs_start_transaction()? I don't see what keeps new freezers from coming in. -chris Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount mutex, so freeze will not happen during sync. You're right. I was worried about the sync ioctl, but the mutex won't be held there to deadlock against. We'll be fine. There is another problem which is introduced by pending change. That is we will start and commmit a transaction by changing pending mount option after we set the fs to be R/O. I think it is better that we don't start a new transaction for pending changes which are set after the transaction is committed, just make them be handled by the next transaction, the reason is: - Make the behavior of the fs be consistent(both freezed fs and unfreezed fs) - Data on the disk is right and integrated Thanks Miao -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
On Tue, Jan 20, 2015 at 8:09 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: Chris Mason c...@fb.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 09:05 On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: David Sterba dste...@suse.cz To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 01:13 On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote: --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait) */ if (fs_info-pending_changes == 0) return 0; +/* + * Test if the fs is frozen, or start_trasaction + * will deadlock on itself. + */ +if (__sb_start_write(sb, SB_FREEZE_FS, false)) +__sb_end_write(sb, SB_FREEZE_FS); +else +return 0; But what if someone freezes the FS after __sb_end_write() and before btrfs_start_transaction()? I don't see what keeps new freezers from coming in. -chris Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount mutex, so freeze will not happen during sync. You're right. I was worried about the sync ioctl, but the mutex won't be held there to deadlock against. We'll be fine. -chris -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: David Sterba dste...@suse.cz To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 01:13 On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote: --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait) */ if (fs_info-pending_changes == 0) return 0; + /* +* Test if the fs is frozen, or start_trasaction +* will deadlock on itself. +*/ + if (__sb_start_write(sb, SB_FREEZE_FS, false)) + __sb_end_write(sb, SB_FREEZE_FS); + else + return 0; The more I look into that the more I think that the first fix is the right one. Has been pointed out in this thread, it is ok to skip processing the pending changes if the filesystem is frozen. That's good, for me, either this patch or the patch 2~5 in the patchset will solve the sync_fs() problem on frozen fs. Just different timing to start the new transaction. But the patchset one has the problem, which needs to deal with the sysfs interface changes, or sync_fs() will still cause deadlock. So I tried to revert the sysfs related patches, but it seems overkilled, needing extra btrfs_start_transaction* things. As you already picked this one, I'm completely OK with this. The pending changes have to flushed from sync (by design), we cannot use mnt_want_write or the sb_start* protections that. The btrfs_freeze callback can safely do the last commit, that's under s_umount held by vfs::freeze_super. Then any other new transaction would block. Any other call to btrfs_sync_fs will not find any active transaction and with this patch will not start one. Sounds safe to me. I think the right level to check is SB_FREEZE_WRITE though, to stop any potential writes as soon as possible and when the s_umount lock is still held in vfs::freeze_super. SB_FREEZE_WRITE seems good for me. But I didn't catch the difference between SB_FREEZE_FS(WRITE/PAGEFAULT/COMPLETE), since freeze() conflicts with sync_fs(), when we comes to btrfs_sync_fs(), the fs is either totally frozen or unfrozen and frozen level won't change during the protection of s_umount. Although SB_FREEZE_WRITE seems better in its meaning and makes it more readable. I'll collect the relevant patches and will send it for review. Thanks for collecting them and sending them out. Thanks, Qu trans = btrfs_start_transaction(root, 0); } else { return PTR_ERR(trans); -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: Miao Xie miao...@huawei.com To: Qu Wenruo quwen...@cn.fujitsu.com, Chris Mason c...@fb.com Date: 2015年01月21日 11:26 On Wed, 21 Jan 2015 11:15:41 +0800, Qu Wenruo wrote: Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: Miao Xie miao...@huawei.com To: Chris Mason c...@fb.com, Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 11:10 On Tue, 20 Jan 2015 20:10:56 -0500, Chris Mason wrote: On Tue, Jan 20, 2015 at 8:09 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: Chris Mason c...@fb.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 09:05 On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: David Sterba dste...@suse.cz To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 01:13 On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote: --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait) */ if (fs_info-pending_changes == 0) return 0; +/* + * Test if the fs is frozen, or start_trasaction + * will deadlock on itself. + */ +if (__sb_start_write(sb, SB_FREEZE_FS, false)) +__sb_end_write(sb, SB_FREEZE_FS); +else +return 0; But what if someone freezes the FS after __sb_end_write() and before btrfs_start_transaction()? I don't see what keeps new freezers from coming in. -chris Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount mutex, so freeze will not happen during sync. You're right. I was worried about the sync ioctl, but the mutex won't be held there to deadlock against. We'll be fine. There is another problem which is introduced by pending change. That is we will start and commmit a transaction by changing pending mount option after we set the fs to be R/O. Oh, I missed this problem. I think it is better that we don't start a new transaction for pending changes which are set after the transaction is committed, just make them be handled by the next transaction, This will cause another problem, nobody can ensure there will be next transaction and the change may never to written into disk. First, the pending changes is mount option, that is in-memory data. Second, the same problem would happen after you freeze fs. Pending changes are *not* only mount options. Feature change and label change are also pending changes if using sysfs. Normal ioctl label changing is not affected. For freeze, it's not the same problem since the fs will be unfreeze sooner or later and transaction will be initiated. For example, if we change the features/label through sysfs, and then umount the fs, It is different from pending change. No, now features/label changing using sysfs both use pending changes to do the commit. See BTRFS_PENDING_COMMIT bit. So freeze - change features/label - sync will still cause the deadlock in the same way, and you can try it yourself. Thanks, Qu If you want to change features/label, you should get write permission and make sure the fs is not be freezed because those are on-disk data. So the problem doesn't exist, or there is a bug. Thanks Miao since there is no write, there is no running transaction and if we don't start a new transaction, it won't be flushed to disk. Thanks, Qu the reason is: - Make the behavior of the fs be consistent(both freezed fs and unfreezed fs) - Data on the disk is right and integrated Thanks Miao . -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
On Wed, 21 Jan 2015 11:53:34 +0800, Qu Wenruo wrote: +/* + * Test if the fs is frozen, or start_trasaction + * will deadlock on itself. + */ +if (__sb_start_write(sb, SB_FREEZE_FS, false)) +__sb_end_write(sb, SB_FREEZE_FS); +else +return 0; But what if someone freezes the FS after __sb_end_write() and before btrfs_start_transaction()? I don't see what keeps new freezers from coming in. -chris Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount mutex, so freeze will not happen during sync. You're right. I was worried about the sync ioctl, but the mutex won't be held there to deadlock against. We'll be fine. There is another problem which is introduced by pending change. That is we will start and commmit a transaction by changing pending mount option after we set the fs to be R/O. Oh, I missed this problem. I think it is better that we don't start a new transaction for pending changes which are set after the transaction is committed, just make them be handled by the next transaction, This will cause another problem, nobody can ensure there will be next transaction and the change may never to written into disk. First, the pending changes is mount option, that is in-memory data. Second, the same problem would happen after you freeze fs. Pending changes are *not* only mount options. Feature change and label change are also pending changes if using sysfs. My miss, I don't notice feature and label change by sysfs. But the implementation of feature and label change by sysfs is wrong, we can not change them without write permission. Normal ioctl label changing is not affected. For freeze, it's not the same problem since the fs will be unfreeze sooner or later and transaction will be initiated. You can not assume the operations of the users, they might freeze the fs and then shutdown the machine. For example, if we change the features/label through sysfs, and then umount the fs, It is different from pending change. No, now features/label changing using sysfs both use pending changes to do the commit. See BTRFS_PENDING_COMMIT bit. So freeze - change features/label - sync will still cause the deadlock in the same way, and you can try it yourself. As I said above, the implementation of sysfs feature and label change is wrong, it is better to separate them from the pending mount option change, make the sysfs feature and label change be done in the context of transaction after getting the write permission. If so, we needn't do anything special when sync the fs. In short, changing the sysfs feature and label change implementation and removing the unnecessary btrfs_start_transaction in sync_fs can fix the deadlock. Thanks Miao Thanks, Qu If you want to change features/label, you should get write permission and make sure the fs is not be freezed because those are on-disk data. So the problem doesn't exist, or there is a bug. Thanks Miao since there is no write, there is no running transaction and if we don't start a new transaction, it won't be flushed to disk. Thanks, Qu the reason is: - Make the behavior of the fs be consistent(both freezed fs and unfreezed fs) - Data on the disk is right and integrated Thanks Miao . . -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: Miao Xie miao...@huawei.com To: Qu Wenruo quwen...@cn.fujitsu.com, Chris Mason c...@fb.com Date: 2015年01月21日 15:04 On Wed, 21 Jan 2015 11:53:34 +0800, Qu Wenruo wrote: [snipped] This will cause another problem, nobody can ensure there will be next transaction and the change may never to written into disk. First, the pending changes is mount option, that is in-memory data. Second, the same problem would happen after you freeze fs. Pending changes are *not* only mount options. Feature change and label change are also pending changes if using sysfs. My miss, I don't notice feature and label change by sysfs. But the implementation of feature and label change by sysfs is wrong, we can not change them without write permission. Normal ioctl label changing is not affected. For freeze, it's not the same problem since the fs will be unfreeze sooner or later and transaction will be initiated. You can not assume the operations of the users, they might freeze the fs and then shutdown the machine. For example, if we change the features/label through sysfs, and then umount the fs, It is different from pending change. No, now features/label changing using sysfs both use pending changes to do the commit. See BTRFS_PENDING_COMMIT bit. So freeze - change features/label - sync will still cause the deadlock in the same way, and you can try it yourself. As I said above, the implementation of sysfs feature and label change is wrong, it is better to separate them from the pending mount option change, make the sysfs feature and label change be done in the context of transaction after getting the write permission. If so, we needn't do anything special when sync the fs. In short, changing the sysfs feature and label change implementation and removing the unnecessary btrfs_start_transaction in sync_fs can fix the deadlock. Your method will only fix the deadlock, but will introduce the risk like pending inode_cache will never be written to disk as I already explained. (If still using the fs_info-pending_changes mechanism) To ensure pending changes written to disk sync_fs() should start a transaction if needed, or there will be chance that no transaction can handle it. But I don't see the necessity to pending current work(inode_cache, feature/label changes) to next transaction. To David: I'm a little curious about why inode_cache needs to be delayed to next transaction. In btrfs_remount() we have s_umount mutex, and we synced the whole filesystem already, so there should be no running transaction and we can just set any mount option into fs_info. Or even in worst case, there is a racing window, we can still start a transaction and do the commit, a little overhead in such minor case won't impact the overall performance. For sysfs change, I prefer attach or start transaction method, and for mount option change, and such sysfs tuning is also minor case for a filesystem. What do you think about reverting the whole patchset and rework the sysfs interface? Thanks, Qu Thanks Miao Thanks, Qu If you want to change features/label, you should get write permission and make sure the fs is not be freezed because those are on-disk data. So the problem doesn't exist, or there is a bug. Thanks Miao since there is no write, there is no running transaction and if we don't start a new transaction, it won't be flushed to disk. Thanks, Qu the reason is: - Make the behavior of the fs be consistent(both freezed fs and unfreezed fs) - Data on the disk is right and integrated Thanks Miao . . -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: David Sterba dste...@suse.cz To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月19日 22:06 On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote: The fix is to check if the fs is frozen, if the fs is frozen, just return and waiting for the next transaction. --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait) */ if (fs_info-pending_changes == 0) return 0; + /* +* Test if the fs is frozen, or start_trasaction +* will deadlock on itself. +*/ + if (__sb_start_write(sb, SB_FREEZE_FS, false)) + __sb_end_write(sb, SB_FREEZE_FS); + else + return 0; I'm not sure this is the right fix. We should use either mnt_want_write_file or sb_start_write around the start/commit functions. The fs may be frozen already, but we also have to catch transition to that state, or RO remount. But the deadlock between s_umount and frozen level is a larger problem... Even Miao mentioned that we can start a transaction in btrfs_freeze(), but there is still possibility that we try to change the feature of the frozen btrfs and do sync, again the deadlock will happen. Although handling in btrfs_freeze() is also needed, but can't resolve all the problem. IMHO the fix is still needed, or at least as a workaround until we find a real root solution for it (If nobody want to revert the patchset) BTW, what about put the pending changes to a workqueue? If we don't start transaction under s_umount context like sync_fs() Thanks, Qu Also, returning 0 is not right, the ioctl actually skipped the expected work. trans = btrfs_start_transaction(root, 0); } else { return PTR_ERR(trans); -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
On Mon, 19 Jan 2015 15:42:41 +0800, Qu Wenruo wrote: Commit 6b5fe46dfa52 (btrfs: do commit in sync_fs if there are pending changes) will call btrfs_start_transaction() in sync_fs(), to handle some operations needed to be done in next transaction. However this can cause deadlock if the filesystem is frozen, with the following sys_r+w output: [ 143.255932] Call Trace: [ 143.255936] [816c0e09] schedule+0x29/0x70 [ 143.255939] [811cb7f3] __sb_start_write+0xb3/0x100 [ 143.255971] [a040ec06] start_transaction+0x2e6/0x5a0 [btrfs] [ 143.255992] [a040f1eb] btrfs_start_transaction+0x1b/0x20 [btrfs] [ 143.256003] [a03dc0ba] btrfs_sync_fs+0xca/0xd0 [btrfs] [ 143.256007] [811f7be0] sync_fs_one_sb+0x20/0x30 [ 143.256011] [811cbd01] iterate_supers+0xe1/0xf0 [ 143.256014] [811f7d75] sys_sync+0x55/0x90 [ 143.256017] [816c49d2] system_call_fastpath+0x12/0x17 [ 143.256111] Call Trace: [ 143.256114] [816c0e09] schedule+0x29/0x70 [ 143.256119] [816c3405] rwsem_down_write_failed+0x1c5/0x2d0 [ 143.256123] [8133f013] call_rwsem_down_write_failed+0x13/0x20 [ 143.256131] [811caae8] thaw_super+0x28/0xc0 [ 143.256135] [811db3e5] do_vfs_ioctl+0x3f5/0x540 [ 143.256187] [811db5c1] SyS_ioctl+0x91/0xb0 [ 143.256213] [816c49d2] system_call_fastpath+0x12/0x17 The reason is like the following: (Holding s_umount) VFS sync_fs staff: |- btrfs_sync_fs() |- btrfs_start_transaction() |- sb_start_intwrite() (Waiting thaw_fs to unfreeze) VFS thaw_fs staff: thaw_fs() (Waiting sync_fs to release s_umount) So deadlock happens. This can be easily triggered by fstest/generic/068 with inode_cache mount option. The fix is to check if the fs is frozen, if the fs is frozen, just return and waiting for the next transaction. Cc: David Sterba dste...@suse.cz Reported-by: Gui Hecheng guihc.f...@cn.fujitsu.com Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- fs/btrfs/super.c | 8 1 file changed, 8 insertions(+) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 60f7cbe..1d9f1e6 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait) */ if (fs_info-pending_changes == 0) return 0; I think the problem is here -- why -pending_changes is not 0 when the filesystem is frozen? so I think the reason of this problem is btrfs_freeze forget to deal with the pending changes, and the correct fix is to correct the behavior of btrfs_freeze(). Thanks Miao + /* + * Test if the fs is frozen, or start_trasaction + * will deadlock on itself. + */ + if (__sb_start_write(sb, SB_FREEZE_FS, false)) + __sb_end_write(sb, SB_FREEZE_FS); + else + return 0; trans = btrfs_start_transaction(root, 0); } else { return PTR_ERR(trans); -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
Add CC to Miao Xie miao...@huawei.com Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: Qu Wenruo quwen...@cn.fujitsu.com To: dste...@suse.cz, linux-btrfs@vger.kernel.org, Miao Xie mi...@cn.fujitsu.com Date: 2015年01月20日 10:51 Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: David Sterba dste...@suse.cz To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月19日 22:06 On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote: The fix is to check if the fs is frozen, if the fs is frozen, just return and waiting for the next transaction. --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait) */ if (fs_info-pending_changes == 0) return 0; +/* + * Test if the fs is frozen, or start_trasaction + * will deadlock on itself. + */ +if (__sb_start_write(sb, SB_FREEZE_FS, false)) +__sb_end_write(sb, SB_FREEZE_FS); +else +return 0; I'm not sure this is the right fix. We should use either mnt_want_write_file or sb_start_write around the start/commit functions. The fs may be frozen already, but we also have to catch transition to that state, or RO remount. But the deadlock between s_umount and frozen level is a larger problem... Even Miao mentioned that we can start a transaction in btrfs_freeze(), but there is still possibility that we try to change the feature of the frozen btrfs and do sync, again the deadlock will happen. Although handling in btrfs_freeze() is also needed, but can't resolve all the problem. IMHO the fix is still needed, or at least as a workaround until we find a real root solution for it (If nobody want to revert the patchset) BTW, what about put the pending changes to a workqueue? If we don't start transaction under s_umount context like sync_fs() Thanks, Qu Also, returning 0 is not right, the ioctl actually skipped the expected work. trans = btrfs_start_transaction(root, 0); } else { return PTR_ERR(trans); -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: Miao Xie miao...@huawei.com To: Qu Wenruo quwen...@cn.fujitsu.com, linux-btrfs@vger.kernel.org Date: 2015年01月20日 08:19 On Mon, 19 Jan 2015 15:42:41 +0800, Qu Wenruo wrote: Commit 6b5fe46dfa52 (btrfs: do commit in sync_fs if there are pending changes) will call btrfs_start_transaction() in sync_fs(), to handle some operations needed to be done in next transaction. However this can cause deadlock if the filesystem is frozen, with the following sys_r+w output: [ 143.255932] Call Trace: [ 143.255936] [816c0e09] schedule+0x29/0x70 [ 143.255939] [811cb7f3] __sb_start_write+0xb3/0x100 [ 143.255971] [a040ec06] start_transaction+0x2e6/0x5a0 [btrfs] [ 143.255992] [a040f1eb] btrfs_start_transaction+0x1b/0x20 [btrfs] [ 143.256003] [a03dc0ba] btrfs_sync_fs+0xca/0xd0 [btrfs] [ 143.256007] [811f7be0] sync_fs_one_sb+0x20/0x30 [ 143.256011] [811cbd01] iterate_supers+0xe1/0xf0 [ 143.256014] [811f7d75] sys_sync+0x55/0x90 [ 143.256017] [816c49d2] system_call_fastpath+0x12/0x17 [ 143.256111] Call Trace: [ 143.256114] [816c0e09] schedule+0x29/0x70 [ 143.256119] [816c3405] rwsem_down_write_failed+0x1c5/0x2d0 [ 143.256123] [8133f013] call_rwsem_down_write_failed+0x13/0x20 [ 143.256131] [811caae8] thaw_super+0x28/0xc0 [ 143.256135] [811db3e5] do_vfs_ioctl+0x3f5/0x540 [ 143.256187] [811db5c1] SyS_ioctl+0x91/0xb0 [ 143.256213] [816c49d2] system_call_fastpath+0x12/0x17 The reason is like the following: (Holding s_umount) VFS sync_fs staff: |- btrfs_sync_fs() |- btrfs_start_transaction() |- sb_start_intwrite() (Waiting thaw_fs to unfreeze) VFS thaw_fs staff: thaw_fs() (Waiting sync_fs to release s_umount) So deadlock happens. This can be easily triggered by fstest/generic/068 with inode_cache mount option. The fix is to check if the fs is frozen, if the fs is frozen, just return and waiting for the next transaction. Cc: David Sterba dste...@suse.cz Reported-by: Gui Hecheng guihc.f...@cn.fujitsu.com Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- fs/btrfs/super.c | 8 1 file changed, 8 insertions(+) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 60f7cbe..1d9f1e6 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait) */ if (fs_info-pending_changes == 0) return 0; I think the problem is here -- why -pending_changes is not 0 when the filesystem is frozen? This happens when already no transaction is running but some one set inode_cache or things needs pending. And the freeze follows. so I think the reason of this problem is btrfs_freeze forget to deal with the pending changes, and the correct fix is to correct the behavior of btrfs_freeze(). Great! Thanks for pointing this! Starting a transaction in btrfs_freeze() seems to be the silver bullet for such case. Thanks Qu Thanks Miao + /* +* Test if the fs is frozen, or start_trasaction +* will deadlock on itself. +*/ + if (__sb_start_write(sb, SB_FREEZE_FS, false)) + __sb_end_write(sb, SB_FREEZE_FS); + else + return 0; trans = btrfs_start_transaction(root, 0); } else { return PTR_ERR(trans); -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
Commit 6b5fe46dfa52 (btrfs: do commit in sync_fs if there are pending changes) will call btrfs_start_transaction() in sync_fs(), to handle some operations needed to be done in next transaction. However this can cause deadlock if the filesystem is frozen, with the following sys_r+w output: [ 143.255932] Call Trace: [ 143.255936] [816c0e09] schedule+0x29/0x70 [ 143.255939] [811cb7f3] __sb_start_write+0xb3/0x100 [ 143.255971] [a040ec06] start_transaction+0x2e6/0x5a0 [btrfs] [ 143.255992] [a040f1eb] btrfs_start_transaction+0x1b/0x20 [btrfs] [ 143.256003] [a03dc0ba] btrfs_sync_fs+0xca/0xd0 [btrfs] [ 143.256007] [811f7be0] sync_fs_one_sb+0x20/0x30 [ 143.256011] [811cbd01] iterate_supers+0xe1/0xf0 [ 143.256014] [811f7d75] sys_sync+0x55/0x90 [ 143.256017] [816c49d2] system_call_fastpath+0x12/0x17 [ 143.256111] Call Trace: [ 143.256114] [816c0e09] schedule+0x29/0x70 [ 143.256119] [816c3405] rwsem_down_write_failed+0x1c5/0x2d0 [ 143.256123] [8133f013] call_rwsem_down_write_failed+0x13/0x20 [ 143.256131] [811caae8] thaw_super+0x28/0xc0 [ 143.256135] [811db3e5] do_vfs_ioctl+0x3f5/0x540 [ 143.256187] [811db5c1] SyS_ioctl+0x91/0xb0 [ 143.256213] [816c49d2] system_call_fastpath+0x12/0x17 The reason is like the following: (Holding s_umount) VFS sync_fs staff: |- btrfs_sync_fs() |- btrfs_start_transaction() |- sb_start_intwrite() (Waiting thaw_fs to unfreeze) VFS thaw_fs staff: thaw_fs() (Waiting sync_fs to release s_umount) So deadlock happens. This can be easily triggered by fstest/generic/068 with inode_cache mount option. The fix is to check if the fs is frozen, if the fs is frozen, just return and waiting for the next transaction. Cc: David Sterba dste...@suse.cz Reported-by: Gui Hecheng guihc.f...@cn.fujitsu.com Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- fs/btrfs/super.c | 8 1 file changed, 8 insertions(+) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 60f7cbe..1d9f1e6 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait) */ if (fs_info-pending_changes == 0) return 0; + /* +* Test if the fs is frozen, or start_trasaction +* will deadlock on itself. +*/ + if (__sb_start_write(sb, SB_FREEZE_FS, false)) + __sb_end_write(sb, SB_FREEZE_FS); + else + return 0; trans = btrfs_start_transaction(root, 0); } else { return PTR_ERR(trans); -- 2.2.2 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html