Re: [PATCH 2/2] btrfs: fix false enospc for compression
hi,

I have rebased these 2 patches against Linux 4.9-rc2, sorry for being late.
After applying these patches, Stefan does not see any ENOSPC error :)
If you have free time, please have a check, thanks.

Regards,
Xiaoguang Wang

On 11/01/2016 06:18 PM, Wang Xiaoguang wrote:

When testing btrfs compression, we sometimes got an ENOSPC error even
though the fs still had plenty of free space; xfstests generic/171,
generic/172, generic/173, generic/174 and generic/175 can reveal this bug
in my test environment when compression is enabled.

After some debugging work, we found that it is
btrfs_delalloc_reserve_metadata() which sometimes tries to reserve far too
much metadata space, even for a very small data range. In
btrfs_delalloc_reserve_metadata(), the number of metadata bytes to reserve
is calculated from the difference between outstanding_extents and
reserved_extents. See the case below for how ENOSPC occurs:

1. Buffered write of 128MB of data in units of 128KB; in the end the inode
has 1 outstanding extent and 1024 reserved_extents. Note that it is
btrfs_merge_extent_hook() that merges these 128KB units into one big
outstanding extent, but it does not change reserved_extents.

2. When writing dirty pages, for compression, cow_file_range_async() will
split the big extent above into units of 128KB (the compression extent
size is 128KB). When the first split operation finishes, we have 2
outstanding extents and 1024 reserved extents. If right at that moment the
just-generated ordered extent is dispatched to run and completes,
btrfs_delalloc_release_metadata() (see btrfs_finish_ordered_io()) is
called to release metadata; after that we have 1 outstanding extent and 1
reserved extent (also see the logic in drop_outstanding_extent()). Later
cow_file_range_async() continues to handle the remaining data range
[128KB, 128MB), and if no other ordered extent is dispatched to run, we
end up with 1023 outstanding extents and 1 reserved extent.

3. Now if another buffered write to this file comes in,
btrfs_delalloc_reserve_metadata() will try to reserve metadata for at
least 1023 outstanding extents. For a 16KB node size that is
1023*16384*2*8, about 255MB; for a 64KB node size it is 1023*65536*2*8,
about 1GB of metadata. That is clearly not sane and can easily result in
an ENOSPC error.

The root cause is that for compression the max extent size is no longer
BTRFS_MAX_EXTENT_SIZE (128MB) but 128KB, so the current metadata
reservation method in btrfs is not correct. Here we introduce:

enum btrfs_metadata_reserve_type {
	BTRFS_RESERVE_NORMAL,
	BTRFS_RESERVE_COMPRESS,
};

and extend btrfs_delalloc_reserve_metadata() and
btrfs_delalloc_reserve_space() with a new enum btrfs_metadata_reserve_type
argument. When a data range will go through compression, we use
BTRFS_RESERVE_COMPRESS to reserve metadata. We also introduce the
EXTENT_COMPRESS flag to mark a data range that will go through the
compression path.

With this patch, these false ENOSPC errors for compression are fixed.
Signed-off-by: Wang Xiaoguang
Tested-by: Holger Hoffstätte
Tested-by: Stefan Priebe
---
 fs/btrfs/ctree.h             |  31 ++--
 fs/btrfs/extent-tree.c       |  55 +
 fs/btrfs/extent_io.c         |  59 +-
 fs/btrfs/extent_io.h         |   2 +
 fs/btrfs/file.c              |  28 +--
 fs/btrfs/free-space-cache.c  |   6 +-
 fs/btrfs/inode-map.c         |   5 +-
 fs/btrfs/inode.c             | 185 ---
 fs/btrfs/ioctl.c             |  12 ++-
 fs/btrfs/relocation.c        |  14 +++-
 fs/btrfs/tests/inode-tests.c |  15 ++--
 11 files changed, 315 insertions(+), 97 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 52a8535..457b106 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -98,6 +98,19 @@ static const int btrfs_csum_sizes[] = { 4 };

 #define BTRFS_DIRTY_METADATA_THRESH	SZ_32M

+/*
+ * for compression, max file extent size would be limited to 128K, so when
+ * reserving metadata for such delalloc writes, pass BTRFS_RESERVE_COMPRESS to
+ * btrfs_delalloc_reserve_metadata() or btrfs_delalloc_reserve_space() to
+ * calculate metadata, for none-compression, use BTRFS_RESERVE_NORMAL.
+ */
+enum btrfs_metadata_reserve_type {
+	BTRFS_RESERVE_NORMAL,
+	BTRFS_RESERVE_COMPRESS,
+};
+int inode_need_compress(struct inode *inode);
+u64 btrfs_max_extent_size(enum btrfs_metadata_reserve_type reserve_type);
+
 #define BTRFS_MAX_EXTENT_SIZE	SZ_128M

 struct btrfs_mapping_tree {
@@ -2693,10 +2706,14 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
 void btrfs_subvolume_release_metadata(struct btrfs_root *root,
				      struct btrfs_block_rsv *rsv,
				      u64 qgroup_reserved);
-int btrfs_delalloc_reserve_metadata(struct
[PATCH 2/2] btrfs: fix false enospc for compression
When testing btrfs compression, we sometimes got an ENOSPC error even
though the fs still had plenty of free space; xfstests generic/171,
generic/172, generic/173, generic/174 and generic/175 can reveal this bug
in my test environment when compression is enabled.

After some debugging work, we found that it is
btrfs_delalloc_reserve_metadata() which sometimes tries to reserve far too
much metadata space, even for a very small data range. In
btrfs_delalloc_reserve_metadata(), the number of metadata bytes to reserve
is calculated from the difference between outstanding_extents and
reserved_extents. See the case below for how ENOSPC occurs:

1. Buffered write of 128MB of data in units of 128KB; in the end the inode
has 1 outstanding extent and 1024 reserved_extents. Note that it is
btrfs_merge_extent_hook() that merges these 128KB units into one big
outstanding extent, but it does not change reserved_extents.

2. When writing dirty pages, for compression, cow_file_range_async() will
split the big extent above into units of 128KB (the compression extent
size is 128KB). When the first split operation finishes, we have 2
outstanding extents and 1024 reserved extents. If right at that moment the
just-generated ordered extent is dispatched to run and completes,
btrfs_delalloc_release_metadata() (see btrfs_finish_ordered_io()) is
called to release metadata; after that we have 1 outstanding extent and 1
reserved extent (also see the logic in drop_outstanding_extent()). Later
cow_file_range_async() continues to handle the remaining data range
[128KB, 128MB), and if no other ordered extent is dispatched to run, we
end up with 1023 outstanding extents and 1 reserved extent.

3. Now if another buffered write to this file comes in,
btrfs_delalloc_reserve_metadata() will try to reserve metadata for at
least 1023 outstanding extents. For a 16KB node size that is
1023*16384*2*8, about 255MB; for a 64KB node size it is 1023*65536*2*8,
about 1GB of metadata. That is clearly not sane and can easily result in
an ENOSPC error.

The root cause is that for compression the max extent size is no longer
BTRFS_MAX_EXTENT_SIZE (128MB) but 128KB, so the current metadata
reservation method in btrfs is not correct. Here we introduce:

enum btrfs_metadata_reserve_type {
	BTRFS_RESERVE_NORMAL,
	BTRFS_RESERVE_COMPRESS,
};

and extend btrfs_delalloc_reserve_metadata() and
btrfs_delalloc_reserve_space() with a new enum btrfs_metadata_reserve_type
argument. When a data range will go through compression, we use
BTRFS_RESERVE_COMPRESS to reserve metadata. We also introduce the
EXTENT_COMPRESS flag to mark a data range that will go through the
compression path.

With this patch, these false ENOSPC errors for compression are fixed.
Signed-off-by: Wang Xiaoguang
Tested-by: Holger Hoffstätte
Tested-by: Stefan Priebe
---
 fs/btrfs/ctree.h             |  31 ++--
 fs/btrfs/extent-tree.c       |  55 +
 fs/btrfs/extent_io.c         |  59 +-
 fs/btrfs/extent_io.h         |   2 +
 fs/btrfs/file.c              |  28 +--
 fs/btrfs/free-space-cache.c  |   6 +-
 fs/btrfs/inode-map.c         |   5 +-
 fs/btrfs/inode.c             | 185 ---
 fs/btrfs/ioctl.c             |  12 ++-
 fs/btrfs/relocation.c        |  14 +++-
 fs/btrfs/tests/inode-tests.c |  15 ++--
 11 files changed, 315 insertions(+), 97 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 52a8535..457b106 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -98,6 +98,19 @@ static const int btrfs_csum_sizes[] = { 4 };

 #define BTRFS_DIRTY_METADATA_THRESH	SZ_32M

+/*
+ * for compression, max file extent size would be limited to 128K, so when
+ * reserving metadata for such delalloc writes, pass BTRFS_RESERVE_COMPRESS to
+ * btrfs_delalloc_reserve_metadata() or btrfs_delalloc_reserve_space() to
+ * calculate metadata, for none-compression, use BTRFS_RESERVE_NORMAL.
+ */
+enum btrfs_metadata_reserve_type {
+	BTRFS_RESERVE_NORMAL,
+	BTRFS_RESERVE_COMPRESS,
+};
+int inode_need_compress(struct inode *inode);
+u64 btrfs_max_extent_size(enum btrfs_metadata_reserve_type reserve_type);
+
 #define BTRFS_MAX_EXTENT_SIZE	SZ_128M

 struct btrfs_mapping_tree {
@@ -2693,10 +2706,14 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
 void btrfs_subvolume_release_metadata(struct btrfs_root *root,
				      struct btrfs_block_rsv *rsv,
				      u64 qgroup_reserved);
-int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes);
-void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes);
-int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len);
-void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len);
+int btrfs_delalloc_reserve_metadata(struct
Re: [PATCH 2/2] btrfs: fix false enospc for compression
hi,

On 10/19/2016 10:23 PM, David Sterba wrote:
On Mon, Oct 17, 2016 at 05:01:46PM +0800, Wang Xiaoguang wrote:
[..]
int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
-			      struct extent_state **cached_state);
+			      struct extent_state **cached_state, int flag);
int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
-			    struct extent_state **cached_state);
+			    struct extent_state **cached_state, int flag);
[..]
int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
		      struct page **pages, size_t num_pages,
		      loff_t pos, size_t write_bytes,
-		      struct extent_state **cached);
+		      struct extent_state **cached, int flag);

Instead of adding "int flag" why not use the already defined
btrfs_metadata_reserve_type enum? I know it's just an int at the end of the
day, but the dedupe support already added another "int dedupe" argument and
it's probably easy to cause confusion. Maybe later it would be beneficial
to consolidate the flags into a consistent set of enum values to prevent
more "int flag" inflation and better declare the intent of the extent state
change. Not sure if that makes sense.

Yes, agree. I'll rebase them later, thanks.

Would be great. I won't manually merge the patch now as it's not a conflict
against the current state, btrfs_set_extent_delalloc has the extra
parameter already. Please consolidate them before this patch is supposed to
be merged. Thanks.

Sorry for being late, I have just finished the rebase work now. I'll run
some fstests jobs; if there are no regressions, I'll send the two patches
tomorrow :)

Regards,
Xiaoguang Wang

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] btrfs: fix false enospc for compression
On Mon, Oct 17, 2016 at 05:01:46PM +0800, Wang Xiaoguang wrote:
> > [..]
> >> int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
> >> -			      struct extent_state **cached_state);
> >> +			      struct extent_state **cached_state, int flag);
> >> int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
> >> -			    struct extent_state **cached_state);
> >> +			    struct extent_state **cached_state, int flag);
> > [..]
> >> int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
> >> 		      struct page **pages, size_t num_pages,
> >> 		      loff_t pos, size_t write_bytes,
> >> -		      struct extent_state **cached);
> >> +		      struct extent_state **cached, int flag);
> > Instead of adding "int flag" why not use the already defined
> > btrfs_metadata_reserve_type enum? I know it's just an int at the end of
> > the day, but the dedupe support already added another "int dedupe"
> > argument and it's probably easy to cause confusion.
> > Maybe later it would be beneficial to consolidate the flags into a
> > consistent set of enum values to prevent more "int flag" inflation and
> > better declare the intent of the extent state change. Not sure if that
> > makes sense.
> Yes, agree.
> I'll rebase them later, thanks.

Would be great. I won't manually merge the patch now as it's not a conflict
against the current state, btrfs_set_extent_delalloc has the extra
parameter already. Please consolidate them before this patch is supposed to
be merged. Thanks.
Re: [PATCH 2/2] btrfs: fix false enospc for compression
On Wed, Oct 12, 2016 at 11:12:42AM +0800, Wang Xiaoguang wrote:
> hi,
>
> Stefan often reports enospc error in his servers when having btrfs
> compression enabled. Now he has applied these 2 patches to run and no
> enospc error occurs for more than 6 days, it seems they are useful :)
>
> And these 2 patches are somewhat big, please check it, thanks.

It is. As per the testing results from Stefan and Holger, I'll add them to
for-next, but won't queue them for merging until they get Josef's blessing.
Re: [PATCH 2/2] btrfs: fix false enospc for compression
hi,

On 10/14/2016 09:59 PM, Holger Hoffstätte wrote:
On 10/06/16 04:51, Wang Xiaoguang wrote:

When testing btrfs compression, we sometimes got an ENOSPC error even
though the fs still had plenty of free space; xfstests generic/171,
generic/172, generic/173, generic/174 and generic/175 can reveal this bug
in my test environment when compression is enabled.

After some debugging work, we found that it is
btrfs_delalloc_reserve_metadata() which sometimes tries to reserve far too
much metadata space, even for a very small data range. In
btrfs_delalloc_reserve_metadata(), the number of metadata bytes to reserve
is calculated from the difference between outstanding_extents and
reserved_extents. See the case below for how ENOSPC occurs:

1. Buffered write of 128MB of data in units of 128KB; in the end the inode
has 1 outstanding extent and 1024 reserved_extents. Note that it is
btrfs_merge_extent_hook() that merges these 128KB units into one big
outstanding extent, but it does not change reserved_extents.

2. When writing dirty pages, for compression, cow_file_range_async() will
split the big extent above into units of 128KB (the compression extent
size is 128KB). When the first split operation finishes, we have 2
outstanding extents and 1024 reserved extents. If right at that moment the
just-generated ordered extent is dispatched to run and completes,
btrfs_delalloc_release_metadata() (see btrfs_finish_ordered_io()) is
called to release metadata; after that we have 1 outstanding extent and 1
reserved extent (also see the logic in drop_outstanding_extent()). Later
cow_file_range_async() continues to handle the remaining data range
[128KB, 128MB), and if no other ordered extent is dispatched to run, we
end up with 1023 outstanding extents and 1 reserved extent.

3. Now if another buffered write to this file comes in,
btrfs_delalloc_reserve_metadata() will try to reserve metadata for at
least 1023 outstanding extents. For a 16KB node size that is
1023*16384*2*8, about 255MB; for a 64KB node size it is 1023*65536*2*8,
about 1GB of metadata. That is clearly not sane and can easily result in
an ENOSPC error.

The root cause is that for compression the max extent size is no longer
BTRFS_MAX_EXTENT_SIZE (128MB) but 128KB, so the current metadata
reservation method in btrfs is not correct. Here we introduce:

enum btrfs_metadata_reserve_type {
	BTRFS_RESERVE_NORMAL,
	BTRFS_RESERVE_COMPRESS,
};

and extend btrfs_delalloc_reserve_metadata() and
btrfs_delalloc_reserve_space() with a new enum btrfs_metadata_reserve_type
argument. When a data range will go through compression, we use
BTRFS_RESERVE_COMPRESS to reserve metadata. We also introduce the
EXTENT_COMPRESS flag to mark a data range that will go through the
compression path.

With this patch, these false ENOSPC errors for compression are fixed.

Signed-off-by: Wang Xiaoguang

I took some time again to get this into my tree on top of what's in
btrfs-4.9rc1 and managed to merge it after all. Both this and patch #1
seem to work fine, and they don't seem to cause any regressions; I ran a
couple of both full and incremental rsync backups with >100GB on a new and
now compressed subvolume without problem. Also Stefan just reported that
his ENOSPC seems to be gone as well, so it seems to be good. \o/

So both this and patch #1 have a careful:

Tested-by: Holger Hoffstätte

Also a comment about something I found while resolving conflicts caused by
the preliminary dedupe support:

[..]
int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
-			      struct extent_state **cached_state);
+			      struct extent_state **cached_state, int flag);
int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
-			    struct extent_state **cached_state);
+			    struct extent_state **cached_state, int flag);
[..]
int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
		      struct page **pages, size_t num_pages,
		      loff_t pos, size_t write_bytes,
-		      struct extent_state **cached);
+		      struct extent_state **cached, int flag);

Instead of adding "int flag" why not use the already defined
btrfs_metadata_reserve_type enum? I know it's just an int at the end of the
day, but the dedupe support already added another "int dedupe" argument and
it's probably easy to cause confusion. Maybe later it would be beneficial
to consolidate the flags into a consistent set of enum values to prevent
more "int flag" inflation and better declare the intent of the extent state
change. Not sure if that makes sense.

Yes, agree. I'll rebase them later, thanks.

Regards,
Xiaoguang Wang

thanks,
Holger
Re: [PATCH 2/2] btrfs: fix false enospc for compression
On 10/06/16 04:51, Wang Xiaoguang wrote:
> When testing btrfs compression, we sometimes got an ENOSPC error even
> though the fs still had plenty of free space; xfstests generic/171,
> generic/172, generic/173, generic/174 and generic/175 can reveal this bug
> in my test environment when compression is enabled.
>
> After some debugging work, we found that it is
> btrfs_delalloc_reserve_metadata() which sometimes tries to reserve far
> too much metadata space, even for a very small data range. In
> btrfs_delalloc_reserve_metadata(), the number of metadata bytes to
> reserve is calculated from the difference between outstanding_extents
> and reserved_extents. See the case below for how ENOSPC occurs:
>
> 1. Buffered write of 128MB of data in units of 128KB; in the end the
> inode has 1 outstanding extent and 1024 reserved_extents. Note that it
> is btrfs_merge_extent_hook() that merges these 128KB units into one big
> outstanding extent, but it does not change reserved_extents.
>
> 2. When writing dirty pages, for compression, cow_file_range_async()
> will split the big extent above into units of 128KB (the compression
> extent size is 128KB). When the first split operation finishes, we have
> 2 outstanding extents and 1024 reserved extents. If right at that moment
> the just-generated ordered extent is dispatched to run and completes,
> btrfs_delalloc_release_metadata() (see btrfs_finish_ordered_io()) is
> called to release metadata; after that we have 1 outstanding extent and
> 1 reserved extent (also see the logic in drop_outstanding_extent()).
> Later cow_file_range_async() continues to handle the remaining data
> range [128KB, 128MB), and if no other ordered extent is dispatched to
> run, we end up with 1023 outstanding extents and 1 reserved extent.
>
> 3. Now if another buffered write to this file comes in,
> btrfs_delalloc_reserve_metadata() will try to reserve metadata for at
> least 1023 outstanding extents. For a 16KB node size that is
> 1023*16384*2*8, about 255MB; for a 64KB node size it is 1023*65536*2*8,
> about 1GB of metadata. That is clearly not sane and can easily result in
> an ENOSPC error.
>
> The root cause is that for compression the max extent size is no longer
> BTRFS_MAX_EXTENT_SIZE (128MB) but 128KB, so the current metadata
> reservation method in btrfs is not correct. Here we introduce:
>
> enum btrfs_metadata_reserve_type {
> 	BTRFS_RESERVE_NORMAL,
> 	BTRFS_RESERVE_COMPRESS,
> };
>
> and extend btrfs_delalloc_reserve_metadata() and
> btrfs_delalloc_reserve_space() with a new enum
> btrfs_metadata_reserve_type argument. When a data range will go through
> compression, we use BTRFS_RESERVE_COMPRESS to reserve metadata. We also
> introduce the EXTENT_COMPRESS flag to mark a data range that will go
> through the compression path.
>
> With this patch, these false ENOSPC errors for compression are fixed.
>
> Signed-off-by: Wang Xiaoguang

I took some time again to get this into my tree on top of what's in
btrfs-4.9rc1 and managed to merge it after all. Both this and patch #1
seem to work fine, and they don't seem to cause any regressions; I ran a
couple of both full and incremental rsync backups with >100GB on a new and
now compressed subvolume without problem. Also Stefan just reported that
his ENOSPC seems to be gone as well, so it seems to be good. \o/

So both this and patch #1 have a careful:

Tested-by: Holger Hoffstätte

Also a comment about something I found while resolving conflicts caused by
the preliminary dedupe support:

[..]
> int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
> -			      struct extent_state **cached_state);
> +			      struct extent_state **cached_state, int flag);
> int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
> -			    struct extent_state **cached_state);
> +			    struct extent_state **cached_state, int flag);
[..]
> int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
> 		      struct page **pages, size_t num_pages,
> 		      loff_t pos, size_t write_bytes,
> -		      struct extent_state **cached);
> +		      struct extent_state **cached, int flag);

Instead of adding "int flag" why not use the already defined
btrfs_metadata_reserve_type enum? I know it's just an int at the end of the
day, but the dedupe support already added another "int dedupe" argument and
it's probably easy to cause confusion.

Maybe later it would be beneficial to consolidate the flags into a
consistent set of enum values to prevent more "int flag" inflation and
better declare the intent of the extent state change. Not sure if that
makes sense.

thanks,
Holger
Re: [PATCH 2/2] btrfs: fix false enospc for compression
Am 06.10.2016 um 04:51 schrieb Wang Xiaoguang:
> When testing btrfs compression, we sometimes got an ENOSPC error even
> though the fs still had plenty of free space; xfstests generic/171,
> generic/172, generic/173, generic/174 and generic/175 can reveal this bug
> in my test environment when compression is enabled.
>
> After some debugging work, we found that it is
> btrfs_delalloc_reserve_metadata() which sometimes tries to reserve far
> too much metadata space, even for a very small data range. In
> btrfs_delalloc_reserve_metadata(), the number of metadata bytes to
> reserve is calculated from the difference between outstanding_extents
> and reserved_extents. See the case below for how ENOSPC occurs:
>
> 1. Buffered write of 128MB of data in units of 128KB; in the end the
> inode has 1 outstanding extent and 1024 reserved_extents. Note that it
> is btrfs_merge_extent_hook() that merges these 128KB units into one big
> outstanding extent, but it does not change reserved_extents.
>
> 2. When writing dirty pages, for compression, cow_file_range_async()
> will split the big extent above into units of 128KB (the compression
> extent size is 128KB). When the first split operation finishes, we have
> 2 outstanding extents and 1024 reserved extents. If right at that moment
> the just-generated ordered extent is dispatched to run and completes,
> btrfs_delalloc_release_metadata() (see btrfs_finish_ordered_io()) is
> called to release metadata; after that we have 1 outstanding extent and
> 1 reserved extent (also see the logic in drop_outstanding_extent()).
> Later cow_file_range_async() continues to handle the remaining data
> range [128KB, 128MB), and if no other ordered extent is dispatched to
> run, we end up with 1023 outstanding extents and 1 reserved extent.
>
> 3. Now if another buffered write to this file comes in,
> btrfs_delalloc_reserve_metadata() will try to reserve metadata for at
> least 1023 outstanding extents. For a 16KB node size that is
> 1023*16384*2*8, about 255MB; for a 64KB node size it is 1023*65536*2*8,
> about 1GB of metadata. That is clearly not sane and can easily result in
> an ENOSPC error.
>
> The root cause is that for compression the max extent size is no longer
> BTRFS_MAX_EXTENT_SIZE (128MB) but 128KB, so the current metadata
> reservation method in btrfs is not correct. Here we introduce:
>
> enum btrfs_metadata_reserve_type {
> 	BTRFS_RESERVE_NORMAL,
> 	BTRFS_RESERVE_COMPRESS,
> };
>
> and extend btrfs_delalloc_reserve_metadata() and
> btrfs_delalloc_reserve_space() with a new enum
> btrfs_metadata_reserve_type argument. When a data range will go through
> compression, we use BTRFS_RESERVE_COMPRESS to reserve metadata. We also
> introduce the EXTENT_COMPRESS flag to mark a data range that will go
> through the compression path.
>
> With this patch, these false ENOSPC errors for compression are fixed.
>
> Signed-off-by: Wang Xiaoguang

Tested-by: Stefan Priebe

Works fine for 8 days now - no ENOSPC errors anymore.

Greets,
Stefan

> ---
>  fs/btrfs/ctree.h             |  31 ++--
>  fs/btrfs/extent-tree.c       |  55 +
>  fs/btrfs/extent_io.c         |  59 +-
>  fs/btrfs/extent_io.h         |   2 +
>  fs/btrfs/file.c              |  26 +--
>  fs/btrfs/free-space-cache.c  |   6 +-
>  fs/btrfs/inode-map.c         |   5 +-
>  fs/btrfs/inode.c             | 181 ---
>  fs/btrfs/ioctl.c             |  12 ++-
>  fs/btrfs/relocation.c        |  14 +++-
>  fs/btrfs/tests/inode-tests.c |  15 ++--
>  11 files changed, 309 insertions(+), 97 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 16885f6..fa6a19a 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -97,6 +97,19 @@ static const int btrfs_csum_sizes[] = { 4 };
>
>  #define BTRFS_DIRTY_METADATA_THRESH	SZ_32M
>
> +/*
> + * for compression, max file extent size would be limited to 128K, so when
> + * reserving metadata for such delalloc writes, pass BTRFS_RESERVE_COMPRESS to
> + * btrfs_delalloc_reserve_metadata() or btrfs_delalloc_reserve_space() to
> + * calculate metadata, for none-compression, use BTRFS_RESERVE_NORMAL.
> + */
> +enum btrfs_metadata_reserve_type {
> +	BTRFS_RESERVE_NORMAL,
> +	BTRFS_RESERVE_COMPRESS,
> +};
> +int inode_need_compress(struct inode *inode);
> +u64 btrfs_max_extent_size(enum btrfs_metadata_reserve_type reserve_type);
> +
>  #define BTRFS_MAX_EXTENT_SIZE	SZ_128M
>
>  struct btrfs_mapping_tree {
> @@ -2677,10 +2690,14 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
>  void btrfs_subvolume_release_metadata(struct btrfs_root *root,
> 				      struct btrfs_block_rsv *rsv,
> 				      u64 qgroup_reserved);
> -int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes);
> -void
Re: [PATCH 2/2] btrfs: fix false enospc for compression
hi,

Stefan often reports ENOSPC errors on his servers when btrfs compression
is enabled. He has now run with these 2 patches applied and no ENOSPC
error has occurred for more than 6 days, so it seems they are useful :)

And these 2 patches are somewhat big, please check them, thanks.

Regards,
Xiaoguang Wang

On 10/06/2016 10:51 AM, Wang Xiaoguang wrote:

When testing btrfs compression, we sometimes got an ENOSPC error even
though the fs still had plenty of free space; xfstests generic/171,
generic/172, generic/173, generic/174 and generic/175 can reveal this bug
in my test environment when compression is enabled.

After some debugging work, we found that it is
btrfs_delalloc_reserve_metadata() which sometimes tries to reserve far too
much metadata space, even for a very small data range. In
btrfs_delalloc_reserve_metadata(), the number of metadata bytes to reserve
is calculated from the difference between outstanding_extents and
reserved_extents. See the case below for how ENOSPC occurs:

1. Buffered write of 128MB of data in units of 128KB; in the end the inode
has 1 outstanding extent and 1024 reserved_extents. Note that it is
btrfs_merge_extent_hook() that merges these 128KB units into one big
outstanding extent, but it does not change reserved_extents.

2. When writing dirty pages, for compression, cow_file_range_async() will
split the big extent above into units of 128KB (the compression extent
size is 128KB). When the first split operation finishes, we have 2
outstanding extents and 1024 reserved extents. If right at that moment the
just-generated ordered extent is dispatched to run and completes,
btrfs_delalloc_release_metadata() (see btrfs_finish_ordered_io()) is
called to release metadata; after that we have 1 outstanding extent and 1
reserved extent (also see the logic in drop_outstanding_extent()). Later
cow_file_range_async() continues to handle the remaining data range
[128KB, 128MB), and if no other ordered extent is dispatched to run, we
end up with 1023 outstanding extents and 1 reserved extent.

3. Now if another buffered write to this file comes in,
btrfs_delalloc_reserve_metadata() will try to reserve metadata for at
least 1023 outstanding extents. For a 16KB node size that is
1023*16384*2*8, about 255MB; for a 64KB node size it is 1023*65536*2*8,
about 1GB of metadata. That is clearly not sane and can easily result in
an ENOSPC error.

The root cause is that for compression the max extent size is no longer
BTRFS_MAX_EXTENT_SIZE (128MB) but 128KB, so the current metadata
reservation method in btrfs is not correct. Here we introduce:

enum btrfs_metadata_reserve_type {
	BTRFS_RESERVE_NORMAL,
	BTRFS_RESERVE_COMPRESS,
};

and extend btrfs_delalloc_reserve_metadata() and
btrfs_delalloc_reserve_space() with a new enum btrfs_metadata_reserve_type
argument. When a data range will go through compression, we use
BTRFS_RESERVE_COMPRESS to reserve metadata. We also introduce the
EXTENT_COMPRESS flag to mark a data range that will go through the
compression path.

With this patch, these false ENOSPC errors for compression are fixed.

Signed-off-by: Wang Xiaoguang
---
 fs/btrfs/ctree.h             |  31 ++--
 fs/btrfs/extent-tree.c       |  55 +
 fs/btrfs/extent_io.c         |  59 +-
 fs/btrfs/extent_io.h         |   2 +
 fs/btrfs/file.c              |  26 +--
 fs/btrfs/free-space-cache.c  |   6 +-
 fs/btrfs/inode-map.c         |   5 +-
 fs/btrfs/inode.c             | 181 ---
 fs/btrfs/ioctl.c             |  12 ++-
 fs/btrfs/relocation.c        |  14 +++-
 fs/btrfs/tests/inode-tests.c |  15 ++--
 11 files changed, 309 insertions(+), 97 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 16885f6..fa6a19a 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -97,6 +97,19 @@ static const int btrfs_csum_sizes[] = { 4 };

 #define BTRFS_DIRTY_METADATA_THRESH	SZ_32M

+/*
+ * for compression, max file extent size would be limited to 128K, so when
+ * reserving metadata for such delalloc writes, pass BTRFS_RESERVE_COMPRESS to
+ * btrfs_delalloc_reserve_metadata() or btrfs_delalloc_reserve_space() to
+ * calculate metadata, for none-compression, use BTRFS_RESERVE_NORMAL.
+ */
+enum btrfs_metadata_reserve_type {
+	BTRFS_RESERVE_NORMAL,
+	BTRFS_RESERVE_COMPRESS,
+};
+int inode_need_compress(struct inode *inode);
+u64 btrfs_max_extent_size(enum btrfs_metadata_reserve_type reserve_type);
+
 #define BTRFS_MAX_EXTENT_SIZE	SZ_128M

 struct btrfs_mapping_tree {
@@ -2677,10 +2690,14 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
 void btrfs_subvolume_release_metadata(struct btrfs_root *root,
				      struct btrfs_block_rsv *rsv,
				      u64 qgroup_reserved);
-int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes);
-void
Re: [PATCH 2/2] btrfs: fix false enospc for compression
Hi,

On 10/06/2016 10:51 AM, Wang Xiaoguang wrote:
> When testing btrfs compression, sometimes we got an ENOSPC error, though
> the fs still had much free space; xfstests generic/171, generic/172,
> generic/173, generic/174 and generic/175 can reveal this bug in my test
> environment when compression is enabled.

Sorry, the xfstests generic/171, generic/172, generic/173, generic/174 and
generic/175 here carry some modifications of mine. The original tests use
"nr_free=$(stat -f -c '%f' $testdir)" to count the fs free blocks and then
write that many blocks, but if the fs has compression enabled this will not
fill the fs, so I use "dd if=/dev/zero of=testfile" to fill it instead; that
is the only modification :)

You can also execute the steps below to check:

# sudo mkfs.btrfs -f -n 64K -b 320M /dev/sdc6
# sudo mount -o compress-force=lzo /dev/sdc6 mntpoint/
# cd mntpoint
# sudo dd if=/dev/zero of=testfile

In my test environment this hits ENOSPC quickly, but the fs is not full.

My virtual machine settings:

[lege@localhost mntpoint]$ free -m
              total        used        free      shared  buff/cache   available
Mem:           1949         221        1141          21         587        1597
Swap:          1019           7        1012

[lege@localhost mntpoint]$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             4
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 42
Model name:            Intel Xeon E312xx (Sandy Bridge)
Stepping:              1
CPU MHz:               3192.816
BogoMIPS:              6445.42
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
NUMA node0 CPU(s):     0-3

Regards,
Xiaoguang Wang

> After some debugging work, we found that it's btrfs_delalloc_reserve_metadata()
> which sometimes tries to reserve plenty of metadata space, even for a very
> small data range. In btrfs_delalloc_reserve_metadata(), the number of metadata
> bytes we try to reserve is calculated from the difference between
> outstanding_extents and reserved_extents.
[PATCH 2/2] btrfs: fix false enospc for compression
When testing btrfs compression, we sometimes got an ENOSPC error even though the fs still had much free space; xfstests generic/171, generic/172, generic/173, generic/174 and generic/175 can reveal this bug in my test environment when compression is enabled.

After some debugging work, we found that it's btrfs_delalloc_reserve_metadata() which sometimes tries to reserve plenty of metadata space, even for a very small data range. In btrfs_delalloc_reserve_metadata(), the number of metadata bytes we try to reserve is calculated from the difference between outstanding_extents and reserved_extents.

Please see the case below for how ENOSPC occurs:

1. Buffered-write 128MB of data in units of 128KB; we finally have 1 outstanding extent and 1024 reserved_extents on the inode. Note it's btrfs_merge_extent_hook() that merges these 128KB units into one big outstanding extent, but it does not change reserved_extents.

2. When writing dirty pages, for compression, cow_file_range_async() splits the big extent above into units of 128KB (the compression extent size is 128KB). When the first split operation finishes, we have 2 outstanding extents and 1024 reserved extents. If just at this moment the currently generated ordered extent is dispatched to run and completes, btrfs_delalloc_release_metadata() (see btrfs_finish_ordered_io()) is called to release metadata; after that we have 1 outstanding extent and 1 reserved extent (see also the logic in drop_outstanding_extent()). Later cow_file_range_async() continues to handle the remaining data range [128KB, 128MB), and if no other ordered extent is dispatched to run, we end up with 1023 outstanding extents and 1 reserved extent.
3. Now if another buffered write to this file comes in, btrfs_delalloc_reserve_metadata() will try to reserve metadata for at least 1023 outstanding extents. For a 16KB node size that is 1023*16384*2*8 bytes, about 255MB; for a 64KB node size it is 1023*65536*2*8 bytes, about 1GB of metadata. That is obviously not sane and can easily result in an ENOSPC error.

The root cause is that for compression the max extent size is no longer BTRFS_MAX_EXTENT_SIZE (128MB) but 128KB, so the current metadata reservation method in btrfs is not appropriate. Here we introduce:

enum btrfs_metadata_reserve_type {
	BTRFS_RESERVE_NORMAL,
	BTRFS_RESERVE_COMPRESS,
};

and extend btrfs_delalloc_reserve_metadata() and btrfs_delalloc_reserve_space() with a new enum btrfs_metadata_reserve_type argument. When a data range will go through compression, we use BTRFS_RESERVE_COMPRESS to reserve metadata. Meanwhile we introduce the EXTENT_COMPRESS flag to mark data ranges that will go through the compression path.

With this patch, these false ENOSPC errors for compression are fixed.
Signed-off-by: Wang Xiaoguang
---
 fs/btrfs/ctree.h             |  31 ++--
 fs/btrfs/extent-tree.c       |  55 +
 fs/btrfs/extent_io.c         |  59 +-
 fs/btrfs/extent_io.h         |   2 +
 fs/btrfs/file.c              |  26 +--
 fs/btrfs/free-space-cache.c  |   6 +-
 fs/btrfs/inode-map.c         |   5 +-
 fs/btrfs/inode.c             | 181 ---
 fs/btrfs/ioctl.c             |  12 ++-
 fs/btrfs/relocation.c        |  14 +++-
 fs/btrfs/tests/inode-tests.c |  15 ++--
 11 files changed, 309 insertions(+), 97 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 16885f6..fa6a19a 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -97,6 +97,19 @@ static const int btrfs_csum_sizes[] = { 4 };

 #define BTRFS_DIRTY_METADATA_THRESH	SZ_32M

+/*
+ * for compression, max file extent size would be limited to 128K, so when
+ * reserving metadata for such delalloc writes, pass BTRFS_RESERVE_COMPRESS to
+ * btrfs_delalloc_reserve_metadata() or btrfs_delalloc_reserve_space() to
+ * calculate metadata; for non-compression, use BTRFS_RESERVE_NORMAL.
+ */
+enum btrfs_metadata_reserve_type {
+	BTRFS_RESERVE_NORMAL,
+	BTRFS_RESERVE_COMPRESS,
+};
+int inode_need_compress(struct inode *inode);
+u64 btrfs_max_extent_size(enum btrfs_metadata_reserve_type reserve_type);
+
 #define BTRFS_MAX_EXTENT_SIZE	SZ_128M

 struct btrfs_mapping_tree {
@@ -2677,10 +2690,14 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
 void btrfs_subvolume_release_metadata(struct btrfs_root *root,
				       struct btrfs_block_rsv *rsv,
				       u64 qgroup_reserved);
-int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes);
-void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes);
-int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len);
-void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len);
+int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes,
+				    enum btrfs_metadata_reserve_type reserve_type);
+void