Re: [PATCH 2/2] btrfs: fix false enospc for compression

2016-11-01 Thread Wang Xiaoguang

hi,

I have rebased these 2 patches against Linux 4.9-rc2, sorry for being late.
After applying these patches, Stefan no longer sees any ENOSPC error :)
Please take a look when you have free time, thanks.

Regards,
Xiaoguang Wang

On 11/01/2016 06:18 PM, Wang Xiaoguang wrote:

When testing btrfs compression, we sometimes hit an ENOSPC error even though
the fs still had plenty of free space. xfstests generic/171, generic/172,
generic/173, generic/174 and generic/175 can reveal this bug in my test
environment when compression is enabled.

After some debugging we found that it is btrfs_delalloc_reserve_metadata()
that sometimes tries to reserve a huge amount of metadata space, even for a
very small data range. In btrfs_delalloc_reserve_metadata(), the number of
metadata bytes to reserve is calculated from the difference between
outstanding_extents and reserved_extents. The following case shows how the
ENOSPC occurs:

   1. Buffered-write 128MB of data in 128KB units; we finally end up with 1
outstanding extent and 1024 reserved_extents for the inode. Note that it is
btrfs_merge_extent_hook() that merges these 128KB units into one big
outstanding extent, but it does not change reserved_extents.

   2. When writing dirty pages, for compression, cow_file_range_async() will
split the big extent above into 128KB units (the compression extent size is
128KB). When the first split operation finishes, we have 2 outstanding
extents and 1024 reserved extents. If right at this moment the newly
generated ordered extent is dispatched and completes,
btrfs_delalloc_release_metadata() (see btrfs_finish_ordered_io()) is called
to release metadata, after which we have 1 outstanding extent and 1 reserved
extent (see the logic in drop_outstanding_extent()). Later
cow_file_range_async() continues to handle the remaining data range
[128KB, 128MB), and if no other ordered extent is dispatched to run, we end
up with 1023 outstanding extents and 1 reserved extent.

   3. Now if another buffered write to this file comes in,
btrfs_delalloc_reserve_metadata() will try to reserve metadata for at least
1023 outstanding extents. For a 16KB node size that is 1023*16384*2*8, about
255MB; for a 64KB node size it is 1023*65536*2*8, about 1GB of metadata.
This is obviously not sane and can easily result in an ENOSPC error.

The root cause is that for compression the max extent size is no longer
BTRFS_MAX_EXTENT_SIZE (128MB) but 128KB, so the current metadata reservation
method in btrfs is not correct. Here we introduce:

enum btrfs_metadata_reserve_type {
	BTRFS_RESERVE_NORMAL,
	BTRFS_RESERVE_COMPRESS,
};

and extend btrfs_delalloc_reserve_metadata() and
btrfs_delalloc_reserve_space() with a new enum btrfs_metadata_reserve_type
argument. When a data range will go through compression, we use
BTRFS_RESERVE_COMPRESS to reserve metadata. We also introduce the
EXTENT_COMPRESS flag to mark a data range that will go through the
compression path.

With this patch, these false ENOSPC errors for compression are fixed.

Signed-off-by: Wang Xiaoguang 
Tested-by: Holger Hoffstätte 
Tested-by: Stefan Priebe 
---
  fs/btrfs/ctree.h |  31 ++--
  fs/btrfs/extent-tree.c   |  55 +
  fs/btrfs/extent_io.c |  59 +-
  fs/btrfs/extent_io.h |   2 +
  fs/btrfs/file.c  |  28 +--
  fs/btrfs/free-space-cache.c  |   6 +-
  fs/btrfs/inode-map.c |   5 +-
  fs/btrfs/inode.c | 185 ---
  fs/btrfs/ioctl.c |  12 ++-
  fs/btrfs/relocation.c|  14 +++-
  fs/btrfs/tests/inode-tests.c |  15 ++--
  11 files changed, 315 insertions(+), 97 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 52a8535..457b106 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -98,6 +98,19 @@ static const int btrfs_csum_sizes[] = { 4 };
  
  #define BTRFS_DIRTY_METADATA_THRESH	SZ_32M
  
+/*
+ * for compression, max file extent size would be limited to 128K, so when
+ * reserving metadata for such delalloc writes, pass BTRFS_RESERVE_COMPRESS to
+ * btrfs_delalloc_reserve_metadata() or btrfs_delalloc_reserve_space() to
+ * calculate metadata; for non-compression, use BTRFS_RESERVE_NORMAL.
+ */
+enum btrfs_metadata_reserve_type {
+   BTRFS_RESERVE_NORMAL,
+   BTRFS_RESERVE_COMPRESS,
+};
+int inode_need_compress(struct inode *inode);
+u64 btrfs_max_extent_size(enum btrfs_metadata_reserve_type reserve_type);
+
  #define BTRFS_MAX_EXTENT_SIZE SZ_128M
  
  struct btrfs_mapping_tree {

@@ -2693,10 +2706,14 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
  void btrfs_subvolume_release_metadata(struct btrfs_root *root,
  struct btrfs_block_rsv *rsv,
  u64 qgroup_reserved);
-int btrfs_delalloc_reserve_metadata(struct 

[PATCH 2/2] btrfs: fix false enospc for compression

2016-11-01 Thread Wang Xiaoguang
[..]

Re: [PATCH 2/2] btrfs: fix false enospc for compression

2016-10-25 Thread Wang Xiaoguang

hi,

On 10/19/2016 10:23 PM, David Sterba wrote:

On Mon, Oct 17, 2016 at 05:01:46PM +0800, Wang Xiaoguang wrote:

[..]

   int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
- struct extent_state **cached_state);
+ struct extent_state **cached_state, int flag);
   int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
-   struct extent_state **cached_state);
+   struct extent_state **cached_state, int flag);

[..]

   int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
  struct page **pages, size_t num_pages,
  loff_t pos, size_t write_bytes,
- struct extent_state **cached);
+ struct extent_state **cached, int flag);

Instead of adding "int flag" why not use the already defined
btrfs_metadata_reserve_type enum? I know it's just an int at the end of
the day, but the dedupe support already added another "int dedupe" argument
and it's probably easy to cause confusion.
Maybe later it would be beneficial to consolidate the flags into a consistent
set of enum values to prevent more "int flag" inflation and better declare the
intent of the extent state change. Not sure if that makes sense.

Yes, agree.
I'll rebase them later, thanks.

Would be great. I won't manually merge the patch now as it's not a
conflict against the current state, btrfs_set_extent_delalloc has the
extra parameter already. Please consolidate them before this patch is
supposed to be merged. Thanks.

Sorry for being late, I have just finished the rebase work now.
I'll run some fstests jobs; if there are no regressions, I'll send the two
patches tomorrow :)


Regards,
Xiaoguang Wang

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] btrfs: fix false enospc for compression

2016-10-19 Thread David Sterba
On Mon, Oct 17, 2016 at 05:01:46PM +0800, Wang Xiaoguang wrote:
> > [..]
> Yes, agree.
> I'll rebase them later, thanks.

Would be great. I won't manually merge the patch now as it's not a
conflict against the current state, btrfs_set_extent_delalloc has the
extra parameter already. Please consolidate them before this patch is
supposed to be merged. Thanks.


Re: [PATCH 2/2] btrfs: fix false enospc for compression

2016-10-17 Thread David Sterba
On Wed, Oct 12, 2016 at 11:12:42AM +0800, Wang Xiaoguang wrote:
> hi,
> 
> Stefan often reports enospc error in his servers when having btrfs 
> compression
> enabled. Now he has applied these 2 patches to run and no enospc error 
> occurs
> for more than 6 days, it seems they are useful :)
> 
> And these 2 patches are somewhat big, please check it, thanks.

It is. As per testing results from Stefan and Holger, I'll add them to
for-next, but won't queue them for merging until they get Josef's
blessing.


Re: [PATCH 2/2] btrfs: fix false enospc for compression

2016-10-17 Thread Wang Xiaoguang

hi,

On 10/14/2016 09:59 PM, Holger Hoffstätte wrote:

On 10/06/16 04:51, Wang Xiaoguang wrote:

[..]

I took some time again to get this into my tree on top of what's in
btrfs-4.9rc1 and managed to merge it after all.

Both this and patch #1 seem to work fine, and they don't seem to cause any
regressions; ran a couple of both full and incremental rsync backups with
>100GB on a new and now compressed subvolume without problem. Also Stefan
just reported that his ENOSPC seems to be gone as well, so it seems to be
good. \o/

So for both this and patch #1 have a careful:

Tested-by: Holger Hoffstätte 

Also a comment about something I found while resolving conflicts caused
by the preliminary dedupe support:

[..]

  int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
- struct extent_state **cached_state);
+ struct extent_state **cached_state, int flag);
  int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
-   struct extent_state **cached_state);
+   struct extent_state **cached_state, int flag);

[..]

  int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
  struct page **pages, size_t num_pages,
  loff_t pos, size_t write_bytes,
- struct extent_state **cached);
+ struct extent_state **cached, int flag);

Instead of adding "int flag" why not use the already defined
btrfs_metadata_reserve_type enum? I know it's just an int at the end of
the day, but the dedupe support already added another "int dedupe" argument
and it's probably easy to cause confusion.
Maybe later it would be beneficial to consolidate the flags into a consistent
set of enum values to prevent more "int flag" inflation and better declare the
intent of the extent state change. Not sure if that makes sense.

Yes, agree.
I'll rebase them later, thanks.

Regards,
Xiaoguang Wang



thanks,
Holger

Re: [PATCH 2/2] btrfs: fix false enospc for compression

2016-10-14 Thread Holger Hoffstätte
On 10/06/16 04:51, Wang Xiaoguang wrote:
> [..]

I took some time again to get this into my tree on top of what's in
btrfs-4.9rc1 and managed to merge it after all.

Both this and patch #1 seem to work fine, and they don't seem to cause any
regressions; ran a couple of both full and incremental rsync backups with
>100GB on a new and now compressed subvolume without problem. Also Stefan
just reported that his ENOSPC seems to be gone as well, so it seems to be
good. \o/

So for both this and patch #1 have a careful:

Tested-by: Holger Hoffstätte 

Also a comment about something I found while resolving conflicts caused
by the preliminary dedupe support:

[..]
>  int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
> -   struct extent_state **cached_state);
> +   struct extent_state **cached_state, int flag);
>  int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
> - struct extent_state **cached_state);
> + struct extent_state **cached_state, int flag);
[..]
>  int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
> struct page **pages, size_t num_pages,
> loff_t pos, size_t write_bytes,
> -   struct extent_state **cached);
> +   struct extent_state **cached, int flag);

Instead of adding "int flag" why not use the already defined
btrfs_metadata_reserve_type enum? I know it's just an int at the end of
the day, but the dedupe support already added another "int dedupe" argument
and it's probably easy to cause confusion. 
Maybe later it would be beneficial to consolidate the flags into a consistent
set of enum values to prevent more "int flag" inflation and better declare the
intent of the extent state change. Not sure if that makes sense.

thanks,
Holger


Re: [PATCH 2/2] btrfs: fix false enospc for compression

2016-10-14 Thread Stefan Priebe - Profihost AG

On 06.10.2016 at 04:51, Wang Xiaoguang wrote:
> [..]

Tested-by: Stefan Priebe 

Works fine since 8 days - no ENOSPC errors anymore.

Greets,
Stefan


> ---
>  fs/btrfs/ctree.h |  31 ++--
>  fs/btrfs/extent-tree.c   |  55 +
>  fs/btrfs/extent_io.c |  59 +-
>  fs/btrfs/extent_io.h |   2 +
>  fs/btrfs/file.c  |  26 +--
>  fs/btrfs/free-space-cache.c  |   6 +-
>  fs/btrfs/inode-map.c |   5 +-
>  fs/btrfs/inode.c | 181 ---
>  fs/btrfs/ioctl.c |  12 ++-
>  fs/btrfs/relocation.c|  14 +++-
>  fs/btrfs/tests/inode-tests.c |  15 ++--
>  11 files changed, 309 insertions(+), 97 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 16885f6..fa6a19a 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -97,6 +97,19 @@ static const int btrfs_csum_sizes[] = { 4 };
>  
>  #define BTRFS_DIRTY_METADATA_THRESH  SZ_32M
>  
> +/*
> + * for compression, max file extent size would be limited to 128K, so when
> + * reserving metadata for such delalloc writes, pass BTRFS_RESERVE_COMPRESS 
> to
> + * btrfs_delalloc_reserve_metadata() or btrfs_delalloc_reserve_space() to
> + * calculate metadata, for none-compression, use BTRFS_RESERVE_NORMAL.
> + */
> +enum btrfs_metadata_reserve_type {
> + BTRFS_RESERVE_NORMAL,
> + BTRFS_RESERVE_COMPRESS,
> +};
> +int inode_need_compress(struct inode *inode);
> +u64 btrfs_max_extent_size(enum btrfs_metadata_reserve_type reserve_type);
> +
>  #define BTRFS_MAX_EXTENT_SIZE SZ_128M
>  
>  struct btrfs_mapping_tree {
> @@ -2677,10 +2690,14 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
>  void btrfs_subvolume_release_metadata(struct btrfs_root *root,
> struct btrfs_block_rsv *rsv,
> u64 qgroup_reserved);
> -int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes);
> -void 

Re: [PATCH 2/2] btrfs: fix false enospc for compression

2016-10-11 Thread Wang Xiaoguang

hi,

Stefan often reported ENOSPC errors on his servers with btrfs compression
enabled. Now he has been running with these 2 patches applied and no ENOSPC
error has occurred for more than 6 days, so it seems they are useful :)

And these 2 patches are somewhat big, please check them, thanks.

Regards,
Xiaoguang Wang
On 10/06/2016 10:51 AM, Wang Xiaoguang wrote:

When testing btrfs compression, sometimes we got ENOSPC error, though fs
still has much free space, xfstests generic/171, generic/172, generic/173,
generic/174, generic/175 can reveal this bug in my test environment when
compression is enabled.

After some debuging work, we found that it's btrfs_delalloc_reserve_metadata()
which sometimes tries to reserve plenty of metadata space, even for very small
data range. In btrfs_delalloc_reserve_metadata(), the number of metadata bytes
we try to reserve is calculated by the difference between outstanding_extents
and reserved_extents. Please see below case for how ENOSPC occurs:

   1, Buffered write 128MB data in unit of 128KB, so finially we'll have inode
outstanding extents be 1, and reserved_extents be 1024. Note it's
btrfs_merge_extent_hook() that merges these 128KB units into one big
outstanding extent, but do not change reserved_extents.

   2, When writing dirty pages, for compression, cow_file_range_async() will
split above big extent in unit of 128KB(compression extent size is 128KB).
When first split opeartion finishes, we'll have 2 outstanding extents and 1024
reserved extents, and just right now the currently generated ordered extent is
dispatched to run and complete, then btrfs_delalloc_release_metadata()(see
btrfs_finish_ordered_io()) will be called to release metadata, after that we
will have 1 outstanding extents and 1 reserved extents(also see logic in
drop_outstanding_extent()). Later cow_file_range_async() continues to handles
left data range[128KB, 128MB), and if no other ordered extent was dispatched
to run, there will be 1023 outstanding extents and 1 reserved extent.

   3, Now if another bufferd write for this file enters, then
btrfs_delalloc_reserve_metadata() will at least try to reserve metadata
for 1023 outstanding extents' metadata, for 16KB node size, it'll be 
1023*16384*2*8,
about 255MB, for 64K node size, it'll be 1023*65536*8*2, about 1GB metadata, so
obviously it's not sane and can easily result in enospc error.

The root cause is that for compression, its max extent size will no longer be
BTRFS_MAX_EXTENT_SIZE(128MB), it'll be 128KB, so current metadata reservation
method in btrfs is not appropriate or correct, here we introduce:
enum btrfs_metadata_reserve_type {
BTRFS_RESERVE_NORMAL,
BTRFS_RESERVE_COMPRESS,
};
and expand btrfs_delalloc_reserve_metadata() and btrfs_delalloc_reserve_space()
by adding a new enum btrfs_metadata_reserve_type argument. When a data range 
will
go through compression, we use BTRFS_RESERVE_COMPRESS to reserve metatata.
Meanwhile we introduce EXTENT_COMPRESS flag to mark a data range that will go
through compression path.

With this patch, we can fix these false enospc error for compression.

Signed-off-by: Wang Xiaoguang 
---
  fs/btrfs/ctree.h |  31 ++--
  fs/btrfs/extent-tree.c   |  55 +
  fs/btrfs/extent_io.c |  59 +-
  fs/btrfs/extent_io.h |   2 +
  fs/btrfs/file.c  |  26 +--
  fs/btrfs/free-space-cache.c  |   6 +-
  fs/btrfs/inode-map.c |   5 +-
  fs/btrfs/inode.c | 181 ---
  fs/btrfs/ioctl.c |  12 ++-
  fs/btrfs/relocation.c|  14 +++-
  fs/btrfs/tests/inode-tests.c |  15 ++--
  11 files changed, 309 insertions(+), 97 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 16885f6..fa6a19a 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -97,6 +97,19 @@ static const int btrfs_csum_sizes[] = { 4 };
  
  #define BTRFS_DIRTY_METADATA_THRESH	SZ_32M
  
+/*
+ * For compression, the max file extent size is limited to 128K, so when
+ * reserving metadata for such delalloc writes, pass BTRFS_RESERVE_COMPRESS to
+ * btrfs_delalloc_reserve_metadata() or btrfs_delalloc_reserve_space() to
+ * calculate metadata; for non-compression, use BTRFS_RESERVE_NORMAL.
+ */
+enum btrfs_metadata_reserve_type {
+   BTRFS_RESERVE_NORMAL,
+   BTRFS_RESERVE_COMPRESS,
+};
+int inode_need_compress(struct inode *inode);
+u64 btrfs_max_extent_size(enum btrfs_metadata_reserve_type reserve_type);
+
  #define BTRFS_MAX_EXTENT_SIZE SZ_128M
  
  struct btrfs_mapping_tree {
@@ -2677,10 +2690,14 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
  void btrfs_subvolume_release_metadata(struct btrfs_root *root,
  struct btrfs_block_rsv *rsv,
  u64 qgroup_reserved);
-int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes);
-void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes);
-int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len);
-void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len);
+int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes,
+   enum btrfs_metadata_reserve_type reserve_type);
+void

Re: [PATCH 2/2] btrfs: fix false enospc for compression

2016-10-05 Thread Wang Xiaoguang

Hi,

On 10/06/2016 10:51 AM, Wang Xiaoguang wrote:

When testing btrfs compression, sometimes we got ENOSPC error, though fs
still has much free space, xfstests generic/171, generic/172, generic/173,
generic/174, generic/175 can reveal this bug in my test environment when
compression is enabled.

Sorry, the xfstests generic/171, generic/172, generic/173, generic/174 and
generic/175 used here carry some modifications of mine. The original tests use
"nr_free=$(stat -f -c '%f' $testdir)" to count the free blocks of the fs and
then write nr_free blocks, but if the fs has compression enabled this does not
fill the fs, so I use "dd if=/dev/zero of=testfile" to fill the fs instead;
just this modification :)


Also you can execute the steps below to check:
# sudo mkfs.btrfs -f -n 64K -b 320M /dev/sdc6
# sudo mount -o compress-force=lzo /dev/sdc6 mntpoint/
# cd mntpoint
# sudo dd if=/dev/zero of=testfile

In my test environment, dd hits ENOSPC quickly, but the fs is not full.
My virtual machine settings:
[lege@localhost mntpoint]$ free -m
              total        used        free      shared  buff/cache   available
Mem:           1949         221        1141          21         587        1597
Swap:          1019           7        1012

[lege@localhost mntpoint]$ lscpu
Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):4
On-line CPU(s) list:   0-3
Thread(s) per core:1
Core(s) per socket:1
Socket(s): 4
NUMA node(s):  1
Vendor ID: GenuineIntel
CPU family:6
Model: 42
Model name:Intel Xeon E312xx (Sandy Bridge)
Stepping:  1
CPU MHz:   3192.816
BogoMIPS:  6445.42
Hypervisor vendor: KVM
Virtualization type:   full
L1d cache: 32K
L1i cache: 32K
L2 cache:  4096K
NUMA node0 CPU(s): 0-3


Regards,
Xiaoguang Wang

