[PATCH] btrfs: Remove root argument from cow_file_range_inline

2018-03-01 Thread Nikolay Borisov
This argument is always set to the root of the inode, which is also
passed. So let's get a reference inside the function and simplify
the arg list.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/inode.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d71421ce90c1..346736e84e3f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -276,12 +276,12 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans,
  * does the checks required to make sure the data is small enough
  * to fit as an inline extent.
  */
-static noinline int cow_file_range_inline(struct btrfs_root *root,
- struct inode *inode, u64 start,
+static noinline int cow_file_range_inline(struct inode *inode, u64 start,
  u64 end, size_t compressed_size,
  int compress_type,
  struct page **compressed_pages)
 {
+   struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_fs_info *fs_info = root->fs_info;
struct btrfs_trans_handle *trans;
u64 isize = i_size_read(inode);
@@ -457,7 +457,6 @@ static noinline void compress_file_range(struct inode *inode,
int *num_added)
 {
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-   struct btrfs_root *root = BTRFS_I(inode)->root;
u64 blocksize = fs_info->sectorsize;
u64 actual_end;
u64 isize = i_size_read(inode);
@@ -579,11 +578,11 @@ static noinline void compress_file_range(struct inode *inode,
/* we didn't compress the entire range, try
 * to make an uncompressed inline extent.
 */
-   ret = cow_file_range_inline(root, inode, start, end,
-   0, BTRFS_COMPRESS_NONE, NULL);
+   ret = cow_file_range_inline(inode, start, end, 0,
+   BTRFS_COMPRESS_NONE, NULL);
} else {
/* try making a compressed inline extent */
-   ret = cow_file_range_inline(root, inode, start, end,
+   ret = cow_file_range_inline(inode, start, end,
total_compressed,
compress_type, pages);
}
@@ -983,8 +982,8 @@ static noinline int cow_file_range(struct inode *inode,
 
if (start == 0) {
/* lets try to make an inline extent */
-   ret = cow_file_range_inline(root, inode, start, end, 0,
-   BTRFS_COMPRESS_NONE, NULL);
+   ret = cow_file_range_inline(inode, start, end, 0,
+   BTRFS_COMPRESS_NONE, NULL);
if (ret == 0) {
/*
 * We use DO_ACCOUNTING here because we need the
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] Fix long standing -EOPNOTSUPP problem caused by

2018-03-01 Thread Qu Wenruo


On 2018-03-01 22:53, David Sterba wrote:
> On Thu, Mar 01, 2018 at 10:47:43AM +0800, Qu Wenruo wrote:
>> Kernel doesn't support dropping range inside inline extent, and prevents
>> such thing happening by limiting max inline extent size to
>> min(max_inline, sectorsize - 1) in cow_file_range_inline().
>>
>> However btrfs-progs only inherit the BTRFS_MAX_INLINE_DATA_SIZE() macro,
>> which doesn't have sectorsize check.
>> And since btrfs-progs defaults to 16K nodesize, above macro allows large
>> inline extent over 15K size.
>>
>> This leads to unexpected kernel behavior.
>>
>> The bug exists from the very beginning of btrfs-convert, dating back to
>> 2008 when btrfs-convert is first introduced.
>>
>> Qu Wenruo (4):
>>   btrfs-progs: Limit inline extent below page size
>>   btrfs-progs: check/original mode: Check inline extent size
>>   btrfs-progs: check/lowmem mode: Check inline extent size
>>   btrfs-progs: test/convert: Add test case for invalid large inline data
>> extent
> 
> Thanks, added to devel. Fixes will be added to 4.15.2.

Just to mention: since we now check inline extent size, and the kernel can
still create such oversized inline extents via symlinks, I'm afraid we may
get some false alerts.
(Although that should be unlikely, as a symlink target over 4K in size
is a little crazy.)

Thanks,
Qu





signature.asc
Description: OpenPGP digital signature


[PATCH 5/5] btrfs: Show more accurate max_inline

2018-03-01 Thread Qu Wenruo
Btrfs prints the max_inline option in the kernel log, but with
max_inline=4096 it will not actually inline 4096 bytes of uncompressed
data.

Since the behavior is now unified and BTRFS_MAX_INLINE_DATA_SIZE()
handles most of the condition checks, just limit fs_info->max_inline to
BTRFS_MAX_INLINE_DATA_SIZE() so that the reported max_inline value is
more accurate.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/super.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 3a4dce153645..6685016bc0ec 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -618,8 +618,8 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 
if (info->max_inline) {
info->max_inline = min_t(u64,
-   info->max_inline,
-   info->sectorsize);
+   info->max_inline,
+   BTRFS_MAX_INLINE_DATA_SIZE(info));
}
btrfs_info(info, "max_inline at %llu",
   info->max_inline);
-- 
2.16.2



[PATCH 1/5] btrfs: Parse options after node/sector size initialized

2018-03-01 Thread Qu Wenruo
This provides the basis for later max_inline enhancement, which needs to
access fs_info->nodesize.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/disk-io.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a8ecccfc36de..f7f985ed5af9 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2644,12 +2644,6 @@ int open_ctree(struct super_block *sb,
 */
fs_info->compress_type = BTRFS_COMPRESS_ZLIB;
 
-   ret = btrfs_parse_options(fs_info, options, sb->s_flags);
-   if (ret) {
-   err = ret;
-   goto fail_alloc;
-   }
-
features = btrfs_super_incompat_flags(disk_super) &
~BTRFS_FEATURE_INCOMPAT_SUPP;
if (features) {
@@ -2692,6 +2686,13 @@ int open_ctree(struct super_block *sb,
fs_info->sectorsize = sectorsize;
fs_info->stripesize = stripesize;
 
+   /* Only parse options after node/sector size initialized */
+   ret = btrfs_parse_options(fs_info, options, sb->s_flags);
+   if (ret) {
+   err = ret;
+   goto fail_alloc;
+   }
+
/*
 * mixed block groups end up with duplicate but slightly offset
 * extent buffers for the same range.  It leads to corruptions
-- 
2.16.2
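
The ordering problem this patch addresses can be illustrated with a small
user-space sketch. It is not the kernel code: struct fs_info_sketch, the
helper names and the 147-byte overhead constant are all hypothetical, and
the sketch assumes the size fields are still zero when options are parsed
too early.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct btrfs_fs_info. */
struct fs_info_sketch {
	uint32_t nodesize;
	uint32_t sectorsize;
	uint64_t max_inline;
};

/* Illustrative per-leaf overhead; the real value comes from kernel structs. */
#define SKETCH_LEAF_OVERHEAD 147u

/* Size-derived limit: only meaningful once nodesize/sectorsize are set. */
static uint32_t sketch_inline_limit(const struct fs_info_sketch *info)
{
	uint32_t by_sector = info->sectorsize - 1;	/* wraps if sectorsize == 0 */
	uint32_t by_node = info->nodesize - SKETCH_LEAF_OVERHEAD; /* wraps if nodesize == 0 */

	return by_sector < by_node ? by_sector : by_node;
}

/* Stand-in for the max_inline clamp performed while parsing mount options. */
static void sketch_parse_options(struct fs_info_sketch *info, uint64_t requested)
{
	uint64_t limit = sketch_inline_limit(info);

	info->max_inline = requested < limit ? requested : limit;
}
```

With both sizes still zero, the unsigned subtractions wrap around and the
clamp silently does nothing; once the sizes are filled in, a request of 4096
is reduced to 4095 for a 4K sector size.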



[PATCH 2/5] btrfs: Always limit inline extent size by uncompressed size

2018-03-01 Thread Qu Wenruo
When specifying max_inline, we should limit it by the uncompressed
extent size, as that is the only thing the user can control.
(Controlling the algorithm and the resulting compressed data size is
almost impossible.)

Since btrfs provides *TRANSPARENT* compression, max_inline should
behave the same for both plain and compressed data.

So this patch uses @inline_len instead of @data_len in
cow_file_range_inline(), so users know their max_inline mount option
works exactly the same for both plain and compressed data extents.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e1a7f3cb5be9..48472509239b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -303,7 +303,7 @@ static noinline int cow_file_range_inline(struct btrfs_root *root,
(!compressed_size &&
(actual_end & (fs_info->sectorsize - 1)) == 0) ||
end + 1 < isize ||
-   data_len > fs_info->max_inline) {
+   inline_len > fs_info->max_inline) {
return 1;
}
 
-- 
2.16.2
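
To make the effect of this one-line change concrete, here is a minimal
sketch of which length is compared against max_inline before and after.
The helper names are hypothetical, and it assumes (as the commit message
suggests) that @data_len is the compressed size whenever the data was
compressed and the plain length otherwise.

```c
#include <stdint.h>

/* Before: compressed data is checked by its compressed size. */
static uint64_t checked_len_before(uint64_t inline_len, uint64_t compressed_size)
{
	return compressed_size ? compressed_size : inline_len;
}

/* After: the uncompressed length is always the one that is checked. */
static uint64_t checked_len_after(uint64_t inline_len, uint64_t compressed_size)
{
	(void)compressed_size;	/* no longer influences the max_inline check */
	return inline_len;
}

/* Non-zero when the checked length rules out an inline extent. */
static int exceeds_max_inline(uint64_t checked_len, uint64_t max_inline)
{
	return checked_len > max_inline;
}
```

For example, with max_inline=2048, a 3000-byte file that compresses to 1000
bytes passes the old check but is rejected by the new one, which matches
what a user would expect from looking at the file size alone.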



[PATCH 4/5] btrfs: Unify inline extent creation condition for plain and compressed data

2018-03-01 Thread Qu Wenruo
cow_file_range_inline() used different conditions for plain and
compressed data.

For compressed data, an inline extent was allowed to be equal to
sectorsize, while for plain data it was not.

However, now that BTRFS_MAX_INLINE_DATA_SIZE() is limited to
(sectorsize - 1) and the inline extent condition is unified, that
difference no longer exists, so just remove the extra check.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/inode.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index fe2991eeb337..1e9f4ff46b25 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -299,8 +299,6 @@ static noinline int cow_file_range_inline(struct btrfs_root *root,
 
if (start > 0 ||
actual_end > fs_info->sectorsize ||
-   (!compressed_size &&
-   (actual_end & (fs_info->sectorsize - 1)) == 0) ||
end + 1 < isize ||
inline_len > fs_info->max_inline) {
return 1;
-- 
2.16.2



[PATCH 3/5] btrfs: Embed sector size check into BTRFS_MAX_INLINE_DATA_SIZE()

2018-03-01 Thread Qu Wenruo
cow_file_range_inline() has an extra sector size check, but
BTRFS_MAX_INLINE_DATA_SIZE() does not implement it.

The biggest reason is that btrfs_symlink() also uses this macro to check
the name length.

In fact, this behavior makes the max_inline calculation quite confusing
and allows unexpectedly large inline extents for symlinks.

Here we embed the sector size check into BTRFS_MAX_INLINE_DATA_SIZE() so
that it never exceeds the sector size.

The downside is that the maximum symlink target length drops from just
under 16K to 4095 bytes. This does not affect existing symlinks with
such long targets; it only prevents creating new ones.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/ctree.h | 5 +++--
 fs/btrfs/inode.c | 1 -
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 13c260b525a1..90948096c00f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1297,8 +1297,9 @@ static inline u32 BTRFS_NODEPTRS_PER_BLOCK(const struct btrfs_fs_info *info)
(offsetof(struct btrfs_file_extent_item, disk_bytenr))
 static inline u32 BTRFS_MAX_INLINE_DATA_SIZE(const struct btrfs_fs_info *info)
 {
-   return BTRFS_MAX_ITEM_SIZE(info) -
-  BTRFS_FILE_EXTENT_INLINE_DATA_START;
+   return min_t(u32, info->sectorsize - 1,
+BTRFS_MAX_ITEM_SIZE(info) -
+BTRFS_FILE_EXTENT_INLINE_DATA_START);
 }
 
 static inline u32 BTRFS_MAX_XATTR_SIZE(const struct btrfs_fs_info *info)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 48472509239b..fe2991eeb337 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -299,7 +299,6 @@ static noinline int cow_file_range_inline(struct btrfs_root *root,
 
if (start > 0 ||
actual_end > fs_info->sectorsize ||
-   data_len > BTRFS_MAX_INLINE_DATA_SIZE(fs_info) ||
(!compressed_size &&
(actual_end & (fs_info->sectorsize - 1)) == 0) ||
end + 1 < isize ||
-- 
2.16.2
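
The arithmetic behind the new clamp can be sketched outside the kernel as
follows. The function names are hypothetical, and the item-size and
header-offset values are passed in as parameters instead of being taken
from the real BTRFS_MAX_ITEM_SIZE() and
BTRFS_FILE_EXTENT_INLINE_DATA_START definitions.

```c
#include <stdint.h>

/* Old behavior: limited only by what fits in a leaf item. */
static uint32_t max_inline_data_size_old(uint32_t max_item_size,
					 uint32_t inline_data_start)
{
	return max_item_size - inline_data_start;
}

/* New behavior: additionally kept strictly below the sector size. */
static uint32_t max_inline_data_size_new(uint32_t sectorsize,
					 uint32_t max_item_size,
					 uint32_t inline_data_start)
{
	uint32_t by_sector = sectorsize - 1;
	uint32_t by_item = max_item_size - inline_data_start;

	return by_sector < by_item ? by_sector : by_item;
}
```

With illustrative inputs for a 16K node (max item size 16258, inline data
start 21), the old limit is 16237 bytes while the new one is 4095 for a 4K
sector size. When the item-size limit is already the smaller value, as with
small nodes, the result is unchanged.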



[PATCH 0/5] max_inline related enhancement

2018-03-01 Thread Qu Wenruo
This patchset intends to reduce confusion about the "max_inline" mount
option.

The max_inline mount option has the following problems:

1) Different behavior for plain and compressed data extents
   For a plain data extent, it limits the extent data size, which can
   never reach sector size.
   For a compressed data extent, it limits the compressed data size,
   which can reach sector size.

   The compressed behavior is very confusing for normal users, as it's
   almost impossible for an end user to know whether their operation will
   end up inlined or not.

2) Inaccurate max_inline output
   When max_inline=4096 is passed, the kernel reports max_inline as 4096,
   but we still don't allow an inline plain data extent to reach 4096.

3) Symlinks can exceed sector size for their inlined data
   This is because btrfs_symlink() calls BTRFS_MAX_INLINE_DATA_SIZE()
   directly without extra truncation.

This patchset fixes these problems by:

1) Limiting both plain and compressed inline extent size by uncompressed
   data size
   So users know exactly what will end up on-disk, just by checking the
   data size.

2) Limiting the reported max inline size to BTRFS_MAX_INLINE_DATA_SIZE()
   rather than sector size.

3) Embedding the sector size check into BTRFS_MAX_INLINE_DATA_SIZE()
   So now btrfs_symlink() won't create any inline extent of sector size
   or larger.
   (This only affects later operations; existing symlinks of that size
   can still be read.)

Qu Wenruo (5):
  btrfs: Parse options after node/sector size initialized
  btrfs: Always limit inline extent size by uncompressed size
  btrfs: Embed sector size check into BTRFS_MAX_INLINE_DATA_SIZE()
  btrfs: Unify inline extent creation condition for plain and compressed
data
  btrfs: Show more accurate max_inline

 fs/btrfs/ctree.h   |  5 +++--
 fs/btrfs/disk-io.c | 13 +++--
 fs/btrfs/inode.c   |  5 +
 fs/btrfs/super.c   |  4 ++--
 4 files changed, 13 insertions(+), 14 deletions(-)

-- 
2.16.2
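
Taken together, the series leaves a single inline-eligibility condition for
plain and compressed data. The following is a rough user-space sketch of
that condition, not the kernel function: can_inline_sketch() is a
hypothetical name, @end is treated as an inclusive range end, and the
derivation of actual_end and inline_len is an assumption about how
cow_file_range_inline() computes them.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when the range [start, end] of a file of size isize could
 * become an inline extent under the unified condition.
 */
static bool can_inline_sketch(uint64_t start, uint64_t end, uint64_t isize,
			      uint32_t sectorsize, uint64_t max_inline)
{
	uint64_t actual_end = (end + 1 < isize) ? end + 1 : isize;
	uint64_t inline_len = actual_end - start;	/* uncompressed length */

	if (start > 0 ||			/* must start at file offset 0 */
	    actual_end > sectorsize ||		/* must fit in one sector */
	    end + 1 < isize ||			/* must cover the whole file */
	    inline_len > max_inline)		/* must respect the mount option */
		return false;
	return true;
}
```

Assuming max_inline has already been clamped to 4095 on a 4K-sector
filesystem, a 4095-byte file qualifies and a 4096-byte file does not,
whether or not the data is compressed.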



Re: Ongoing Btrfs stability issues

2018-03-01 Thread Qu Wenruo


On 2018-03-02 03:04, Alex Adriaanse wrote:
> On Feb 16, 2018, at 1:44 PM, Austin S. Hemmelgarn  
> wrote:
>> I would suggest changing this to eliminate the balance with '-dusage=10' 
>> (it's redundant with the '-dusage=20' one unless your filesystem is in 
>> pathologically bad shape), and adding equivalent filters for balancing 
>> metadata (which generally goes pretty fast).
>>
>> Unless you've got a huge filesystem, you can also cut down on that limit 
>> filter.  100 data chunks that are 40% full is up to 40GB of data to move on 
>> a normally sized filesystem, or potentially up to 200GB if you've got a 
>> really big filesystem (I forget what point BTRFS starts scaling up chunk 
>> sizes at, but I'm pretty sure it's in the TB range).
> 
> Thanks so much for the suggestions so far, everyone. I wanted to report back 
> on this. Last Friday I made the following changes per suggestions from this 
> thread:
> 
> 1. Change the nightly balance to the following:
> 
> btrfs balance start -dusage=20 
> btrfs balance start -dusage=40,limit=10 
> btrfs balance start -musage=30 
> 
> 2. Upgrade kernels for all VMs to 4.14.13-1~bpo9+1, which contains the SSD 
> space allocation fix.
> 
> 3. Boot Linux with the elevator=noop option
> 
> 4. Change /sys/block/xvd*/queue/scheduler to "none"
> 
> 5. Mount all our Btrfs filesystems with the "enospc_debug" option.
> 
> 6. I did NOT add the "nossd" flag because I didn't think it'd make much of a 
> difference after that SSD space allocation fix.
> 
> 7. After applying the above changes, ran a full balance on all the Btrfs 
> filesystems. I also have not experimented with autodefrag yet.
> 
> 
> Despite the changes above, we just experienced another crash this morning. 
> Kernel message (with enospc_debug turned on for the given mountpoint):

Would you please try to use "btrfs check" to check the filesystem offline?
I'm wondering if the extent tree or the free space cache got corrupted and
is confusing the kernel about its space allocation.


I'm not completely sure, but it may also be something wrong with the
space cache.

So either mounting with the nospace_cache option or running "btrfs check
--clear-space-cache v1" may help.

Thanks,
Qu

> 
> [496003.170278] use_block_rsv: 46 callbacks suppressed
> [496003.170279] BTRFS: block rsv returned -28
> [496003.173875] [ cut here ]
> [496003.177186] WARNING: CPU: 2 PID: 362 at 
> /build/linux-3RM5ap/linux-4.14.13/fs/btrfs/extent-tree.c:8458 
> btrfs_alloc_tree_block+0x39b/0x4c0 [btrfs]
> [496003.185369] Modules linked in: xt_nat xt_tcpudp veth ipt_MASQUERADE 
> nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo 
> iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype 
> iptable_filter xt_conntrack nf_nat nf_conntrack libcrc32c crc32c_generic 
> br_netfilter bridge stp llc intel_rapl sb_edac crct10dif_pclmul crc32_pclmul 
> ghash_clmulni_intel ppdev intel_rapl_perf serio_raw parport_pc parport evdev 
> ip_tables x_tables autofs4 btrfs xor zstd_decompress zstd_compress xxhash 
> raid6_pq ata_generic crc32c_intel ata_piix libata xen_blkfront cirrus ttm 
> aesni_intel aes_x86_64 crypto_simd drm_kms_helper cryptd glue_helper ena 
> psmouse drm scsi_mod i2c_piix4 button
> [496003.218484] CPU: 2 PID: 362 Comm: btrfs-transacti Tainted: GW 
>   4.14.0-0.bpo.3-amd64 #1 Debian 4.14.13-1~bpo9+1
> [496003.224618] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
> [496003.228702] task: 8fc0fb6bd0c0 task.stack: 9e81c3ac
> [496003.233081] RIP: 0010:btrfs_alloc_tree_block+0x39b/0x4c0 [btrfs]
> [496003.237220] RSP: 0018:9e81c3ac3958 EFLAGS: 00010282
> [496003.241404] RAX: 001d RBX: 8fc0fbeac128 RCX: 
> 
> [496003.248004] RDX:  RSI: 8fc100a966f8 RDI: 
> 8fc100a966f8
> [496003.253896] RBP: 4000 R08: 0001 R09: 
> 0001667b
> [496003.258508] R10: 0001 R11: 0001667b R12: 
> 8fc0fbeac000
> [496003.264759] R13: 8fc0fac22800 R14: 0001 R15: 
> ffe4
> [496003.271203] FS:  () GS:8fc100a8() 
> knlGS:
> [496003.278169] CS:  0010 DS:  ES:  CR0: 80050033
> [496003.283917] CR2: 7efe00f36000 CR3: 000102a0a001 CR4: 
> 001606e0
> [496003.290309] DR0:  DR1:  DR2: 
> 
> [496003.296985] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [496003.303335] Call Trace:
> [496003.307113]  ? __pagevec_lru_add_fn+0x270/0x270
> [496003.312126]  __btrfs_cow_block+0x125/0x5c0 [btrfs]
> [496003.316995]  btrfs_cow_block+0xcb/0x1b0 [btrfs]
> [496003.321568]  btrfs_search_slot+0x1fd/0x9e0 [btrfs]
> [496003.326684]  lookup_inline_extent_backref+0x105/0x610 [btrfs]
> [496003.332724]  ? set_extent_bit+0x19/0x20 [btrfs]
> [496003.337991]  __btrfs_free_extent.isra.61+0xf5/0xd30 [btrfs]
> [496003.343436]  ? 

Re: [PATCH 1/3] btrfs: Embed sector size check into BTRFS_MAX_INLINE_DATA_SIZE()

2018-03-01 Thread Qu Wenruo
This patch, along with anything else that uses the sector/node size in
btrfs_parse_options(), should only use them after they are initialized.

Unfortunately, at this point nodesize is not yet initialized in
open_ctree().

So this patch needs extra work to move btrfs_parse_options() after the
basic fs_info initialization; please ignore this version.

Thanks,
Qu

On 2018-03-02 10:09, Qu Wenruo wrote:
> We have extra sector size check in cow_file_range_inline(), but doesn't
> implement it in BTRFS_MAX_INLINE_DATA_SIZE().
> 
> The biggest reason is that btrfs_symlink() also uses this macro to check
> name length.
> 
> In fact such behavior makes max_inline calculation quite confusing, and
> cause unexpected large extent for symbol link.
> 
> Here we embed sector size check into BTRFS_MAX_INLINE_DATA_SIZE() so
> that it will never exceed sector size.
> 
> The downside is, for symbol link, we will reduce max symbol link length
> from 16K- to 4095, but it won't affect current system using that long
> name, but only prevent later creation.
> 
> Signed-off-by: Qu Wenruo 
> ---
>  fs/btrfs/ctree.h | 5 +++--
>  fs/btrfs/inode.c | 1 -
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 13c260b525a1..90948096c00f 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1297,8 +1297,9 @@ static inline u32 BTRFS_NODEPTRS_PER_BLOCK(const struct 
> btrfs_fs_info *info)
>   (offsetof(struct btrfs_file_extent_item, disk_bytenr))
>  static inline u32 BTRFS_MAX_INLINE_DATA_SIZE(const struct btrfs_fs_info 
> *info)
>  {
> - return BTRFS_MAX_ITEM_SIZE(info) -
> -BTRFS_FILE_EXTENT_INLINE_DATA_START;
> + return min_t(u32, info->sectorsize - 1,
> +  BTRFS_MAX_ITEM_SIZE(info) -
> +  BTRFS_FILE_EXTENT_INLINE_DATA_START);
>  }
>  
>  static inline u32 BTRFS_MAX_XATTR_SIZE(const struct btrfs_fs_info *info)
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index e1a7f3cb5be9..0f7041e10c67 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -299,7 +299,6 @@ static noinline int cow_file_range_inline(struct 
> btrfs_root *root,
>  
>   if (start > 0 ||
>   actual_end > fs_info->sectorsize ||
> - data_len > BTRFS_MAX_INLINE_DATA_SIZE(fs_info) ||
>   (!compressed_size &&
>   (actual_end & (fs_info->sectorsize - 1)) == 0) ||
>   end + 1 < isize ||
> 





[PATCH 1/3] btrfs: Embed sector size check into BTRFS_MAX_INLINE_DATA_SIZE()

2018-03-01 Thread Qu Wenruo
cow_file_range_inline() has an extra sector size check, but
BTRFS_MAX_INLINE_DATA_SIZE() does not implement it.

The biggest reason is that btrfs_symlink() also uses this macro to check
the name length.

In fact, this behavior makes the max_inline calculation quite confusing
and allows unexpectedly large inline extents for symlinks.

Here we embed the sector size check into BTRFS_MAX_INLINE_DATA_SIZE() so
that it never exceeds the sector size.

The downside is that the maximum symlink target length drops from just
under 16K to 4095 bytes. This does not affect existing symlinks with
such long targets; it only prevents creating new ones.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/ctree.h | 5 +++--
 fs/btrfs/inode.c | 1 -
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 13c260b525a1..90948096c00f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1297,8 +1297,9 @@ static inline u32 BTRFS_NODEPTRS_PER_BLOCK(const struct btrfs_fs_info *info)
(offsetof(struct btrfs_file_extent_item, disk_bytenr))
 static inline u32 BTRFS_MAX_INLINE_DATA_SIZE(const struct btrfs_fs_info *info)
 {
-   return BTRFS_MAX_ITEM_SIZE(info) -
-  BTRFS_FILE_EXTENT_INLINE_DATA_START;
+   return min_t(u32, info->sectorsize - 1,
+BTRFS_MAX_ITEM_SIZE(info) -
+BTRFS_FILE_EXTENT_INLINE_DATA_START);
 }
 
 static inline u32 BTRFS_MAX_XATTR_SIZE(const struct btrfs_fs_info *info)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e1a7f3cb5be9..0f7041e10c67 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -299,7 +299,6 @@ static noinline int cow_file_range_inline(struct btrfs_root *root,
 
if (start > 0 ||
actual_end > fs_info->sectorsize ||
-   data_len > BTRFS_MAX_INLINE_DATA_SIZE(fs_info) ||
(!compressed_size &&
(actual_end & (fs_info->sectorsize - 1)) == 0) ||
end + 1 < isize ||
-- 
2.16.2



[PATCH 3/3] btrfs: Show more accurate max_inline

2018-03-01 Thread Qu Wenruo
Btrfs prints the max_inline option in the kernel log, but with
max_inline=4096 it will not actually inline 4096 bytes of uncompressed
data.

Since the behavior is now unified and BTRFS_MAX_INLINE_DATA_SIZE()
handles most of the condition checks, just limit fs_info->max_inline to
BTRFS_MAX_INLINE_DATA_SIZE() so that the reported max_inline value is
more accurate.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/super.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 3a4dce153645..6685016bc0ec 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -618,8 +618,8 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 
if (info->max_inline) {
info->max_inline = min_t(u64,
-   info->max_inline,
-   info->sectorsize);
+   info->max_inline,
+   BTRFS_MAX_INLINE_DATA_SIZE(info));
}
btrfs_info(info, "max_inline at %llu",
   info->max_inline);
-- 
2.16.2



[PATCH 2/3] btrfs: Unify inline extent creation condition for plain and compressed data

2018-03-01 Thread Qu Wenruo
cow_file_range_inline() used different conditions for plain and
compressed data.

For compressed data, an inline extent was allowed to be equal to
sectorsize, while for plain data it was not.

But since we limit BTRFS_MAX_INLINE_DATA_SIZE() to (sectorsize - 1),
that difference no longer exists, so just remove the extra check.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/inode.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0f7041e10c67..8c5e69bdbfb5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -299,8 +299,6 @@ static noinline int cow_file_range_inline(struct btrfs_root *root,
 
if (start > 0 ||
actual_end > fs_info->sectorsize ||
-   (!compressed_size &&
-   (actual_end & (fs_info->sectorsize - 1)) == 0) ||
end + 1 < isize ||
data_len > fs_info->max_inline) {
return 1;
-- 
2.16.2



Re: [PATCH 1/4] btrfs-progs: Limit inline extent below page size

2018-03-01 Thread Qu Wenruo


On 2018-03-02 01:47, Nikolay Borisov wrote:
> 
> 
> On  1.03.2018 04:47, Qu Wenruo wrote:
>> Kernel doesn't support to drop extent inside an inlined extent.
>> And kernel tends to limit inline extent just below sectorsize, so also
>> limit it in btrfs-progs.
>>
>> This fixes unexpected -EOPNOTSUPP error from __btrfs_drop_extents() on
>> converted btrfs.
>>
>> Fixes: 806528b8755f ("Add Yan Zheng's ext3->btrfs conversion program")
>> Reported-by: Peter Y. Chuang 
>> Signed-off-by: Qu Wenruo 
>> ---
>>  ctree.h | 11 +--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/ctree.h b/ctree.h
>> index 17cdac76c58c..0282deef339b 100644
>> --- a/ctree.h
>> +++ b/ctree.h
>> @@ -20,6 +20,7 @@
>>  #define __BTRFS_CTREE_H__
>>  
>>  #include 
>> +#include "internal.h"
>>  
>>  #if BTRFS_FLAT_INCLUDES
>>  #include "list.h"
>> @@ -1195,8 +1196,14 @@ static inline u32 BTRFS_NODEPTRS_PER_BLOCK(const 
>> struct btrfs_fs_info *info)
>>  (offsetof(struct btrfs_file_extent_item, disk_bytenr))
>>  static inline u32 BTRFS_MAX_INLINE_DATA_SIZE(const struct btrfs_fs_info 
>> *info)
>>  {
>> -return BTRFS_MAX_ITEM_SIZE(info) -
>> -BTRFS_FILE_EXTENT_INLINE_DATA_START;
>> +/*
>> + * Inline extent larger than pagesize could lead to kernel unexpected
>> + * error when dropping extents, so we need to limit the inline extent
>> + * size to less than sectorsize.
>> + */
>> +return min_t(u32, info->sectorsize - 1,
>> + BTRFS_MAX_ITEM_SIZE(info) -
>> + BTRFS_FILE_EXTENT_INLINE_DATA_START);
>>  }
> 
> Isn't the same change required in the kernel as well ?

Yep, a kernel patch is underway.

The kernel puts a lot of extra checks into cow_file_range_inline()
instead of using the macro directly, so I would also like to enhance the
check in parse_options() so that we get a correct max_inline prompt.
(Currently we can pass max_inline=4096 and the kernel prompt also shows
4096, while we can only inline 4095 bytes.)

Thanks,
Qu

> 
>>  
>>  static inline u32 BTRFS_MAX_XATTR_SIZE(const struct btrfs_fs_info *info)
>>





Re: Ongoing Btrfs stability issues

2018-03-01 Thread Nikolay Borisov


On  1.03.2018 21:04, Alex Adriaanse wrote:
> On Feb 16, 2018, at 1:44 PM, Austin S. Hemmelgarn  
> wrote:
>> I would suggest changing this to eliminate the balance with '-dusage=10' 
>> (it's redundant with the '-dusage=20' one unless your filesystem is in 
>> pathologically bad shape), and adding equivalent filters for balancing 
>> metadata (which generally goes pretty fast).
>>
>> Unless you've got a huge filesystem, you can also cut down on that limit 
>> filter.  100 data chunks that are 40% full is up to 40GB of data to move on 
>> a normally sized filesystem, or potentially up to 200GB if you've got a 
>> really big filesystem (I forget what point BTRFS starts scaling up chunk 
>> sizes at, but I'm pretty sure it's in the TB range).
> 
> Thanks so much for the suggestions so far, everyone. I wanted to report back 
> on this. Last Friday I made the following changes per suggestions from this 
> thread:
> 
> 1. Change the nightly balance to the following:
> 
> btrfs balance start -dusage=20 
> btrfs balance start -dusage=40,limit=10 
> btrfs balance start -musage=30 
> 
> 2. Upgrade kernels for all VMs to 4.14.13-1~bpo9+1, which contains the SSD 
> space allocation fix.
> 
> 3. Boot Linux with the elevator=noop option
> 
> 4. Change /sys/block/xvd*/queue/scheduler to "none"
> 
> 5. Mount all our Btrfs filesystems with the "enospc_debug" option.

So that's good; however, you didn't apply the out-of-tree patch I
pointed you at (it has already been merged into for-next, so it will
likely land in 4.17). As a result, when you hit your ENOSPC error no
extra information is printed, so we can't really reason about what might
be going wrong in the metadata flushing algorithms.




> [496003.641729] BTRFS: error (device xvdc) in __btrfs_free_extent:7076: 
> errno=-28 No space left
> [496003.641994] BTRFS: error (device xvdc) in btrfs_drop_snapshot:9332: 
> errno=-28 No space left
> [496003.641996] BTRFS info (device xvdc): forced readonly
> [496003.641998] BTRFS: error (device xvdc) in merge_reloc_roots:2470: 
> errno=-28 No space left
> [496003.642060] BUG: unable to handle kernel NULL pointer dereference at  
>  (null)
> [496003.642086] IP: __del_reloc_root+0x3c/0x100 [btrfs]
> [496003.642087] PGD 8005fe08c067 P4D 8005fe08c067 PUD 3bd2f4067 PMD 0
> [496003.642091] Oops:  [#1] SMP PTI
> [496003.642093] Modules linked in: xt_nat xt_tcpudp veth ipt_MASQUERADE 
> nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo 
> iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype 
> iptable_filter xt_conntrack nf_nat nf_conntrack libcrc32c crc32c_generic 
> br_netfilter bridge stp llc intel_rapl sb_edac crct10dif_pclmul crc32_pclmul 
> ghash_clmulni_intel ppdev intel_rapl_perf serio_raw parport_pc parport evdev 
> ip_tables x_tables autofs4 btrfs xor zstd_decompress zstd_compress xxhash 
> raid6_pq ata_generic crc32c_intel ata_piix libata xen_blkfront cirrus ttm 
> aesni_intel aes_x86_64 crypto_simd drm_kms_helper cryptd glue_helper ena 
> psmouse drm scsi_mod i2c_piix4 button
> [496003.642128] CPU: 1 PID: 25327 Comm: btrfs Tainted: GW   
> 4.14.0-0.bpo.3-amd64 #1 Debian 4.14.13-1~bpo9+1
> [496003.642129] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
> [496003.642130] task: 8fbffb8dd080 task.stack: 9e81c7b8c000
> [496003.642149] RIP: 0010:__del_reloc_root+0x3c/0x100 [btrfs]


If you happen to have the vmlinux of that kernel, can you run the
following from the kernel source directory:

./scripts/faddr2line vmlinux __del_reloc_root+0x3c/0x100


> [496003.642151] RSP: 0018:9e81c7b8fab0 EFLAGS: 00010286
> [496003.642153] RAX:  RBX: 8fb90a10a3c0 RCX: 
> ca5d1fda5a5f
> [496003.642154] RDX: 0001 RSI: 8fc05eae62c0 RDI: 
> 8fbc4fd87d70
> [496003.642154] RBP: 8fbbb5139000 R08:  R09: 
> 
> [496003.642155] R10: 8fc05eae62c0 R11: 01bc R12: 
> 8fc0fbeac000
> [496003.642156] R13: 8fbc4fd87d70 R14: 8fbc4fd87800 R15: 
> ffe4
> [496003.642157] FS:  7f64196708c0() GS:8fc100a4() 
> knlGS:
> [496003.642159] CS:  0010 DS:  ES:  CR0: 80050033
> [496003.642160] CR2:  CR3: 00069b972004 CR4: 
> 001606e0
> [496003.642162] DR0:  DR1:  DR2: 
> 
> [496003.642163] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [496003.642164] Call Trace:
> [496003.642185]  free_reloc_roots+0x22/0x60 [btrfs]
> [496003.642202]  merge_reloc_roots+0x184/0x260 [btrfs]
> [496003.642217]  relocate_block_group+0x29a/0x610 [btrfs]
> [496003.642232]  btrfs_relocate_block_group+0x17b/0x230 [btrfs]
> [496003.642254]  btrfs_relocate_chunk+0x38/0xb0 [btrfs]
> [496003.642272]  btrfs_balance+0xa15/0x1250 [btrfs]
> [496003.642292]  btrfs_ioctl_balance+0x368/0x380 [btrfs]
> [496003.642309]  btrfs_ioctl+0x1170/0x2

Re: Ongoing Btrfs stability issues

2018-03-01 Thread Alex Adriaanse
On Feb 16, 2018, at 1:44 PM, Austin S. Hemmelgarn  wrote:
> I would suggest changing this to eliminate the balance with '-dusage=10' 
> (it's redundant with the '-dusage=20' one unless your filesystem is in 
> pathologically bad shape), and adding equivalent filters for balancing 
> metadata (which generally goes pretty fast).
> 
> Unless you've got a huge filesystem, you can also cut down on that limit 
> filter.  100 data chunks that are 40% full is up to 40GB of data to move on a 
> normally sized filesystem, or potentially up to 200GB if you've got a really 
> big filesystem (I forget what point BTRFS starts scaling up chunk sizes at, 
> but I'm pretty sure it's in the TB range).

Thanks so much for the suggestions so far, everyone. I wanted to report back on 
this. Last Friday I made the following changes per suggestions from this thread:

1. Change the nightly balance to the following:

btrfs balance start -dusage=20 
btrfs balance start -dusage=40,limit=10 
btrfs balance start -musage=30 

2. Upgrade kernels for all VMs to 4.14.13-1~bpo9+1, which contains the SSD 
space allocation fix.

3. Boot Linux with the elevator=noop option

4. Change /sys/block/xvd*/queue/scheduler to "none"

5. Mount all our Btrfs filesystems with the "enospc_debug" option.

6. I did NOT add the "nossd" flag because I didn't think it'd make much of a 
difference after that SSD space allocation fix.

7. After applying the above changes, ran a full balance on all the Btrfs 
filesystems. I also have not experimented with autodefrag yet.
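
The size estimate quoted at the top of this message can be reproduced with a
short calculation. The following is only an illustrative sketch in C: the
function name is hypothetical, and the 1 GiB data chunk size used in the note
after it is an assumption inferred from the "100 chunks at 40% is up to 40GB"
figure, not something read from the filesystem.

```c
#include <stdint.h>

/*
 * Upper bound on the bytes a usage-filtered balance may need to relocate.
 *
 * chunk_limit: value of the "limit=" filter (number of chunks processed)
 * chunk_bytes: size of one data chunk (assumed, e.g. 1 GiB)
 * usage_pct:   value of the "usage=" filter; selected chunks are at most
 *              this full, so each one holds at most this share of data
 */
static uint64_t balance_max_bytes_moved(uint64_t chunk_limit,
					uint64_t chunk_bytes,
					unsigned int usage_pct)
{
	return chunk_limit * chunk_bytes * usage_pct / 100;
}
```

Under the 1 GiB assumption, usage=40 with limit=100 gives a bound of 40 GiB,
while the limit=10 used in the second command above gives about 4 GiB per
nightly run.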


Despite the changes above, we just experienced another crash this morning. 
Kernel message (with enospc_debug turned on for the given mountpoint):

[496003.170278] use_block_rsv: 46 callbacks suppressed
[496003.170279] BTRFS: block rsv returned -28
[496003.173875] [ cut here ]
[496003.177186] WARNING: CPU: 2 PID: 362 at 
/build/linux-3RM5ap/linux-4.14.13/fs/btrfs/extent-tree.c:8458 
btrfs_alloc_tree_block+0x39b/0x4c0 [btrfs]
[496003.185369] Modules linked in: xt_nat xt_tcpudp veth ipt_MASQUERADE 
nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo 
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype 
iptable_filter xt_conntrack nf_nat nf_conntrack libcrc32c crc32c_generic 
br_netfilter bridge stp llc intel_rapl sb_edac crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel ppdev intel_rapl_perf serio_raw parport_pc parport evdev 
ip_tables x_tables autofs4 btrfs xor zstd_decompress zstd_compress xxhash 
raid6_pq ata_generic crc32c_intel ata_piix libata xen_blkfront cirrus ttm 
aesni_intel aes_x86_64 crypto_simd drm_kms_helper cryptd glue_helper ena 
psmouse drm scsi_mod i2c_piix4 button
[496003.218484] CPU: 2 PID: 362 Comm: btrfs-transacti Tainted: GW   
4.14.0-0.bpo.3-amd64 #1 Debian 4.14.13-1~bpo9+1
[496003.224618] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[496003.228702] task: 8fc0fb6bd0c0 task.stack: 9e81c3ac
[496003.233081] RIP: 0010:btrfs_alloc_tree_block+0x39b/0x4c0 [btrfs]
[496003.237220] RSP: 0018:9e81c3ac3958 EFLAGS: 00010282
[496003.241404] RAX: 001d RBX: 8fc0fbeac128 RCX: 

[496003.248004] RDX:  RSI: 8fc100a966f8 RDI: 
8fc100a966f8
[496003.253896] RBP: 4000 R08: 0001 R09: 
0001667b
[496003.258508] R10: 0001 R11: 0001667b R12: 
8fc0fbeac000
[496003.264759] R13: 8fc0fac22800 R14: 0001 R15: 
ffe4
[496003.271203] FS:  () GS:8fc100a8() 
knlGS:
[496003.278169] CS:  0010 DS:  ES:  CR0: 80050033
[496003.283917] CR2: 7efe00f36000 CR3: 000102a0a001 CR4: 
001606e0
[496003.290309] DR0:  DR1:  DR2: 

[496003.296985] DR3:  DR6: fffe0ff0 DR7: 
0400
[496003.303335] Call Trace:
[496003.307113]  ? __pagevec_lru_add_fn+0x270/0x270
[496003.312126]  __btrfs_cow_block+0x125/0x5c0 [btrfs]
[496003.316995]  btrfs_cow_block+0xcb/0x1b0 [btrfs]
[496003.321568]  btrfs_search_slot+0x1fd/0x9e0 [btrfs]
[496003.326684]  lookup_inline_extent_backref+0x105/0x610 [btrfs]
[496003.332724]  ? set_extent_bit+0x19/0x20 [btrfs]
[496003.337991]  __btrfs_free_extent.isra.61+0xf5/0xd30 [btrfs]
[496003.343436]  ? btrfs_merge_delayed_refs+0x8f/0x560 [btrfs]
[496003.349322]  __btrfs_run_delayed_refs+0x516/0x12a0 [btrfs]
[496003.355157]  btrfs_run_delayed_refs+0x7a/0x270 [btrfs]
[496003.360707]  btrfs_commit_transaction+0x3e1/0x950 [btrfs]
[496003.366022]  ? remove_wait_queue+0x60/0x60
[496003.370898]  transaction_kthread+0x195/0x1b0 [btrfs]
[496003.376411]  kthread+0xfc/0x130
[496003.380741]  ? btrfs_cleanup_transaction+0x580/0x580 [btrfs]
[496003.386404]  ? kthread_create_on_node+0x70/0x70
[496003.391287]  ? do_group_exit+0x3a/0xa0
[496003.396201]  ret_from_fork+0x1f/0x30
[496003.400779] Code: ff 48 c7 c6 28 b7 4c c0 4

Re: [PATCH 1/4] btrfs-progs: Limit inline extent below page size

2018-03-01 Thread Nikolay Borisov


On  1.03.2018 04:47, Qu Wenruo wrote:
> Kernel doesn't support to drop extent inside an inlined extent.
> And kernel tends to limit inline extent just below sectorsize, so also
> limit it in btrfs-progs.
> 
> This fixes unexpected -EOPNOTSUPP error from __btrfs_drop_extents() on
> converted btrfs.
> 
> Fixes: 806528b8755f ("Add Yan Zheng's ext3->btrfs conversion program")
> Reported-by: Peter Y. Chuang 
> Signed-off-by: Qu Wenruo 
> ---
>  ctree.h | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/ctree.h b/ctree.h
> index 17cdac76c58c..0282deef339b 100644
> --- a/ctree.h
> +++ b/ctree.h
> @@ -20,6 +20,7 @@
>  #define __BTRFS_CTREE_H__
>  
>  #include 
> +#include "internal.h"
>  
>  #if BTRFS_FLAT_INCLUDES
>  #include "list.h"
> @@ -1195,8 +1196,14 @@ static inline u32 BTRFS_NODEPTRS_PER_BLOCK(const 
> struct btrfs_fs_info *info)
>   (offsetof(struct btrfs_file_extent_item, disk_bytenr))
>  static inline u32 BTRFS_MAX_INLINE_DATA_SIZE(const struct btrfs_fs_info 
> *info)
>  {
> - return BTRFS_MAX_ITEM_SIZE(info) -
> - BTRFS_FILE_EXTENT_INLINE_DATA_START;
> + /*
> +  * Inline extent larger than pagesize could lead to kernel unexpected
> +  * error when dropping extents, so we need to limit the inline extent
> +  * size to less than sectorsize.
> +  */
> + return min_t(u32, info->sectorsize - 1,
> +  BTRFS_MAX_ITEM_SIZE(info) -
> +  BTRFS_FILE_EXTENT_INLINE_DATA_START);
>  }

Isn't the same change required in the kernel as well ?

>  
>  static inline u32 BTRFS_MAX_XATTR_SIZE(const struct btrfs_fs_info *info)
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] Fix long standing -EOPNOTSUPP problem caused by

2018-03-01 Thread David Sterba
On Thu, Mar 01, 2018 at 10:47:43AM +0800, Qu Wenruo wrote:
> Kernel doesn't support dropping range inside inline extent, and prevents
> such thing happening by limiting max inline extent size to
> min(max_inline, sectorsize - 1) in cow_file_range_inline().
> 
> However btrfs-progs only inherit the BTRFS_MAX_INLINE_DATA_SIZE() macro,
> which doesn't have sectorsize check.
> And since btrfs-progs defaults to 16K nodesize, above macro allows large
> inline extent over 15K size.
> 
> This leads to unexpected kernel behavior.
> 
> The bug exists from the very beginning of btrfs-convert, dating back to
> 2008 when btrfs-convert is first introduced.
> 
> Qu Wenruo (4):
>   btrfs-progs: Limit inline extent below page size
>   btrfs-progs: check/original mode: Check inline extent size
>   btrfs-progs: check/lowmem mode: Check inline extent size
>   btrfs-progs: test/convert: Add test case for invalid large inline data
> extent

Thanks, added to devel. Fixes will be added to 4.15.2.
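
For reference, the limit described in the cover letter can be sketched as a
standalone C function. This is a minimal illustration, not the btrfs-progs
code: the function and parameter names are hypothetical stand-ins for
info->sectorsize, BTRFS_MAX_ITEM_SIZE(info) and
BTRFS_FILE_EXTENT_INLINE_DATA_START, and any numbers passed in are
placeholders rather than values taken from the headers.

```c
#include <stdint.h>

/* Smaller of two u32 values, standing in for the kernel's min_t(u32, ...). */
static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/*
 * Largest inline extent payload allowed under the rule from the series:
 * never a full sector, and never more than fits in one leaf item.
 */
static uint32_t max_inline_data_size(uint32_t sectorsize,
				     uint32_t max_item_size,
				     uint32_t inline_data_start)
{
	return min_u32(sectorsize - 1, max_item_size - inline_data_start);
}
```

With a 4K sectorsize and a 16K nodesize the sector term is the smaller one, so
the result is 4095 bytes instead of the "over 15K" the old macro allowed.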


Re: [PATCH] Btrfs: dev-replace: skip prealloc extents when copy nocow pages

2018-03-01 Thread David Sterba
On Thu, Mar 01, 2018 at 12:52:11PM +, Filipe Manana wrote:
> On Wed, Feb 28, 2018 at 1:10 AM, Liu Bo  wrote:
> > It doesn't make sense to process prealloc extents as pages will be
> > filled with zero when reading prealloc extents.
> >
> > Signed-off-by: Liu Bo 
> Reviewed-by: Filipe Manana 
> 
> Makes sense.

Added to next, thanks.


Re: [PATCH] Btrfs: dev-replace: skip prealloc extents when copy nocow pages

2018-03-01 Thread Filipe Manana
On Wed, Feb 28, 2018 at 1:10 AM, Liu Bo  wrote:
> It doesn't make sense to process prealloc extents as pages will be
> filled with zero when reading prealloc extents.
>
> Signed-off-by: Liu Bo 
Reviewed-by: Filipe Manana 

Makes sense.

> ---
>  fs/btrfs/scrub.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index ec56f33..9882513 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -4480,7 +4480,8 @@ static int check_extent_to_block(struct btrfs_inode 
> *inode, u64 start, u64 len,
>  * move on to the next inode.
>  */
> if (em->block_start > logical ||
> -   em->block_start + em->block_len < logical + len) {
> +   em->block_start + em->block_len < logical + len ||
> +   test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) {
> free_extent_map(em);
> ret = 1;
> goto out_unlock;
> --
> 2.9.4
>



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


Re: btrfs space used issue

2018-03-01 Thread Austin S. Hemmelgarn

On 2018-03-01 05:18, Andrei Borzenkov wrote:

On Thu, Mar 1, 2018 at 12:26 PM, vinayak hegde  wrote:

No, there is no opened file which is deleted, I did umount and mounted
again and reboot also.

I think I am hitting the below issue, lot of random writes were
happening and the file is not fully written and its sparse file.
Let me try with disabling COW.


file offset 0   offset 302g
[-prealloced 302g extent--]

(man it's impressive I got all that lined up right)

On disk you have 2 things. First, your file, which has a file extent that says

inode 256, file offset 0, size 302g, offset 0, disk bytenr 123, disklen 302g

and then the extent tree, which keeps track of actually allocated space, has this

extent bytenr 123, len 302g, refs 1

Now say you boot up your virt image and it writes 1 4k block to offset
0. Now you have this

[4k][302g-4k--]

And for your inode you now have this

inode 256, file offset 0, size 4k, offset 0, diskbytenr (123+302g), disklen 4k
inode 256, file offset 4k, size 302g-4k, offset 4k, diskbytenr 123, disklen 302g

and in your extent tree you have

extent bytenr 123, len 302g, refs 1
extent bytenr whatever, len 4k, refs 1

See that? Your file is still the same size, it is still 302g. If you
cp'ed it right now it would copy 302g of information. But what have you
actually allocated on disk? Well, that's now 302g + 4k. Now let's
say your virt thing decides to write to the middle, let's say at offset
12k. Now you have this

inode 256, file offset 0, size 4k, offset 0, diskbytenr (123+302g), disklen 4k
inode 256, file offset 4k, size 8k, offset 4k, diskbytenr 123, disklen 302g
inode 256, file offset 12k, size 4k, offset 0, diskbytenr whatever, disklen 4k
inode 256, file offset 16k, size 302g-16k, offset 16k, diskbytenr 123, disklen 302g

and in the extent tree you have this

extent bytenr 123, len 302g, refs 2
extent bytenr whatever, len 4k, refs 1
extent bytenr notimportant, len 4k, refs 1

See that refs 2 change? We split the original extent, so we have 2
file extents pointing to the same physical extents, so we bumped the
ref count. This will happen over and over again until we have
completely overwritten the original extent, at which point your space
usage will go back down to ~302g.


Sure, I just mentioned the same in another thread. But you said you
performed full defragmentation and I expect it to "fix" this condition
by relocating data and freeing original big extent. If this did not
happen, I wonder what are conditions when defragment decides to (not)
move data.

While I'm not certain exactly how it works, defragmentation tries to 
make all extents at least as large as a target extent size.  By default, 
this target size is 32MB (I believe it used to be 20, but I'm not 100% 
certain about that).  For files less than that size, they will always be 
fully defragmented if there is any fragmentation.  For files larger than 
that size, defrag may ignore extents larger than that size.  The `-t` 
option for the defrag command can be used to control this aspect.  It 
may also avoid given extents for other more complicated reasons 
involving free space fragmentation, but the primary one is the target 
extent size.
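
As a rough illustration of the size rule just described, here is a simplified
model in C. It is an assumption-laden sketch, not the kernel's defrag logic:
the function and macro names are hypothetical, the 32MB default is the value
mentioned above, and the free-space considerations are ignored entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default target extent size as described above (32MB). */
#define DEFAULT_TARGET_EXTENT_BYTES (32ULL * 1024 * 1024)

/*
 * Simplified model: an extent is a candidate for defragmentation only if
 * it is smaller than the target size; larger extents may be left alone.
 */
static bool defrag_would_consider(uint64_t extent_bytes, uint64_t target_bytes)
{
	return extent_bytes < target_bytes;
}
```

Under this model a 4k extent is a candidate, while the 302g preallocated
extent from the example is far above the default target and may be skipped
unless a larger value is given with `-t`.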



Re: [PATCH 1/1] btrfs: introduce feature to forget a btrfs device

2018-03-01 Thread Anand Jain



On 02/28/2018 08:57 AM, Liu Bo wrote:

On Thu, Jan 11, 2018 at 09:25:50AM +0800, Anand Jain wrote:

Support for a new command 'btrfs dev forget [dev]' is proposed here,
to undo the effects of 'btrfs dev scan [dev]'. For this purpose,
this patch proposes to use ioctl #5 as it was empty.
IOW(BTRFS_IOCTL_MAGIC, 5, ..)
This patch adds new ioctl BTRFS_IOC_FORGET_DEV which can be sent from
the /dev/btrfs-control to forget one or all devices, (devices which are
not mounted) from the btrfs kernel.



To me this seems to offer a debugging ability, could you please
elaborate the use case where we need to forget a particular device
instead of just wiping the uuid?



Right, debugging is one use case; the other is recovering from
the split-brain scenario [1].
When that happens, we need to bring it to the user's attention so they
can decide which disk is good, and we need a way to un-scan/forget the
other device so that the FS can be mounted with a disk missing.

[1]
 https://patchwork.kernel.org/patch/10055145/

Thanks, Anand



Thanks,

-liubo

The argument it takes is struct btrfs_ioctl_vol_args_v2, and ::name can be
set to specify the device path. And all unmounted devices can be removed
from the kernel using the BTRFS_DEVICE_SPEC_ALL_DEV flag. Remove all
devices functionality would override remove one device when both are
specified in an IOCTL call.
Again, the devices are removed only if the relevant fsid aren't mounted.

Signed-off-by: Anand Jain 
---
  fs/btrfs/super.c   | 27 +++
  fs/btrfs/volumes.c |  9 +
  fs/btrfs/volumes.h |  1 +
  include/uapi/linux/btrfs.h |  6 +-
  4 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 559fc53ff59e..6a9a5ce8af3b 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2219,21 +2219,37 @@ static long btrfs_control_ioctl(struct file *file, 
unsigned int cmd,
unsigned long arg)
  {
struct btrfs_ioctl_vol_args *vol;
+   struct btrfs_ioctl_vol_args_v2 *vol2;
struct btrfs_fs_devices *fs_devices;
int ret = -ENOTTY;
  
  	if (!capable(CAP_SYS_ADMIN))

return -EPERM;
  
-	vol = memdup_user((void __user *)arg, sizeof(*vol));

-   if (IS_ERR(vol))
-   return PTR_ERR(vol);
+   if (cmd == BTRFS_IOC_FORGET_DEV) {
+   vol2 = memdup_user((void __user *)arg, sizeof(*vol2));
+   if (IS_ERR(vol2))
+   return PTR_ERR(vol2);
+
+   if (vol2->flags & ~BTRFS_VOL_ARG_V2_FLAGS_SUPPORTED)
+   return -EOPNOTSUPP;
+   } else {
+   vol = memdup_user((void __user *)arg, sizeof(*vol));
+   if (IS_ERR(vol))
+   return PTR_ERR(vol);
+   }
  
  	switch (cmd) {

case BTRFS_IOC_SCAN_DEV:
ret = btrfs_scan_one_device(vol->name, FMODE_READ,
&btrfs_fs_type, &fs_devices);
break;
+   case BTRFS_IOC_FORGET_DEV:
+   if (vol2->flags & BTRFS_DEVICE_SPEC_ALL_DEV)
+   ret = btrfs_forget_devices(NULL);
+   else
+   ret = btrfs_forget_devices(vol2->name);
+   break;
case BTRFS_IOC_DEVICES_READY:
ret = btrfs_scan_one_device(vol->name, FMODE_READ,
&btrfs_fs_type, &fs_devices);
@@ -2246,7 +2262,10 @@ static long btrfs_control_ioctl(struct file *file, 
unsigned int cmd,
break;
}
  
-	kfree(vol);

+   if (cmd == BTRFS_IOC_FORGET_DEV)
+   kfree(vol2);
+   else
+   kfree(vol);
return ret;
  }
  
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c

index e947e47f8fff..b0c9948baf9a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1171,6 +1171,15 @@ static int btrfs_read_disk_super(struct block_device 
*bdev, u64 bytenr,
return 0;
  }
  
+int btrfs_forget_devices(const char *path)

+{
+   mutex_lock(&uuid_mutex);
+   btrfs_free_stale_devices(path, NULL);
+   mutex_unlock(&uuid_mutex);
+
+   return 0;
+}
+
  /*
   * Look for a btrfs signature on a device. This may be called out of the 
mount path
   * and we are not allowed to call set_blocksize during the scan. The 
superblock
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 15216fed918b..b954ca3b79a9 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -422,6 +422,7 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
   fmode_t flags, void *holder);
  int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
  struct btrfs_fs_devices **fs_devices_ret);
+int btrfs_forget_devices(const char *path);
  int btrfs_close_devices(struct btrfs_fs_devices *fs_devices);
  void btrfs_close_extra_devices(struct btrfs_fs_devices *fs_devices, int step);
  void btrfs_as

Re: [PATCH v3] btrfs: verify max_inline mount parameter

2018-03-01 Thread Anand Jain



On 02/27/2018 11:45 PM, David Sterba wrote:

On Mon, Feb 26, 2018 at 10:47:04AM +0800, Anand Jain wrote:

We aren't verifying the parameter passed to the max_inline mount option.
So we won't fail the mount if a junk value is specified, for example,
-o max_inline=abc. This patch checks if input is valid.

Signed-off-by: Anand Jain 
---
v2->v3: Handle parameter with unit, such as 4K. Use memparse() 2nd arg.
v1->v2: use match_int ret value if error
 use %u instead of %d for parser

  fs/btrfs/super.c | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 77e0537e1db5..76b58da8d56d 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -605,7 +605,14 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char 
*options,
case Opt_max_inline:
num = match_strdup(&args[0]);
if (num) {
-   info->max_inline = memparse(num, NULL);
+   char *retptr;
+
+   info->max_inline = memparse(num, &retptr);


I missed it in the patch that changed max_inline to u32: memparse
returns unsigned long long, so this is not entirely correct and requires
a temporary variable.

We should also report if the user-specified value is larger than
BTRFS_MAX_METADATA_BLOCKSIZE .


(Got diverted into something else. Sorry for the delay.)

Currently -o max_inline can only be up to sectorsize.

We have MAX_INLINE_EXTENT_BUFFER_SIZE, which is 64K and is equal to
BTRFS_MAX_METADATA_BLOCKSIZE (also 64K).


I don't see why max_inline is limited by the sector size in the
current design. Any idea?


Thanks, Anand

---
#define BTRFS_MAX_METADATA_BLOCKSIZE 65536
#define INLINE_EXTENT_BUFFER_PAGES 16
#define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES * 
PAGE_SIZE)

-
static struct extent_buffer *
__alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
  unsigned long len)
{
::
   /*
   * Sanity checks, currently the maximum is 64k covered by 16x 4k pages
   */
BUILD_BUG_ON(BTRFS_MAX_METADATA_BLOCKSIZE
> MAX_INLINE_EXTENT_BUFFER_SIZE);
BUG_ON(len > MAX_INLINE_EXTENT_BUFFER_SIZE);
-
int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
unsigned long new_flags)
{
::
if (info->max_inline) {
info->max_inline = min_t(u64,
info->max_inline,
info->sectorsize);
}
-




This is not a trivial fix to the existing patches, so I'll remove "btrfs:
declare max_inline as u32".

To sum it up:

1. add check and return EINVAL with a message if max_inline is larger
than the metadata block size
2. switch max_inline to u32 and add a temporary value to read from
memparse
3. add change from this patch that catches the junk
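
The three steps above can be sketched in userspace C as follows. This is a
hedged illustration only: parse_max_inline() is a hypothetical name,
strtoull() plus a small suffix switch stands in for the kernel's memparse(),
and the limit is passed in as a parameter because the thread has not settled
on which bound applies.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a max_inline style argument such as "2048" or "4K".
 * limit must not exceed UINT32_MAX so the final narrowing is safe.
 * Returns 0 and stores the value in *out, or -EINVAL on bad input.
 */
static int parse_max_inline(const char *arg, uint64_t limit, uint32_t *out)
{
	char *end;
	/* Step 2: read into a wide temporary, not directly into the u32. */
	unsigned long long tmp = strtoull(arg, &end, 0);

	if (end == arg)
		return -EINVAL;		/* no digits at all, e.g. "abc" */

	switch (*end) {			/* optional K/M/G suffix */
	case 'G': case 'g':
		tmp <<= 10;		/* fall through */
	case 'M': case 'm':
		tmp <<= 10;		/* fall through */
	case 'K': case 'k':
		tmp <<= 10;
		end++;
		break;
	default:
		break;
	}

	/* Step 3: anything left after the number and suffix is junk. */
	if (*end != '\0')
		return -EINVAL;

	/* Step 1: reject values above the supplied limit. */
	if (tmp > limit)
		return -EINVAL;

	*out = (uint32_t)tmp;
	return 0;
}
```

With a 64K limit, "4K" is accepted as 4096, while "abc", "4Kx" and "128K" are
all rejected with -EINVAL.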


+   if (*retptr != '\0') {
+   ret = -EINVAL;
+   kfree(num);


The kfree can be moved before the check, we don't need 'num'.


+   goto out;
+   }
kfree(num);
  
  if (info->max_inline) {

--
2.15.0



Re: btrfs space used issue

2018-03-01 Thread Andrei Borzenkov
On Thu, Mar 1, 2018 at 12:26 PM, vinayak hegde  wrote:
> No, there is no opened file which is deleted, I did umount and mounted
> again and reboot also.
>
> I think I am hitting the below issue, lot of random writes were
> happening and the file is not fully written and its sparse file.
> Let me try with disabling COW.
>
>
> file offset 0   offset 302g
> [-prealloced 302g extent--]
>
> (man it's impressive I got all that lined up right)
>
> On disk you have 2 things. First, your file, which has a file extent that says
>
> inode 256, file offset 0, size 302g, offset 0, disk bytenr 123, disklen 302g
>
> and then the extent tree, which keeps track of actually allocated space, has
> this
>
> extent bytenr 123, len 302g, refs 1
>
> Now say you boot up your virt image and it writes 1 4k block to offset
> 0. Now you have this
>
> [4k][302g-4k--]
>
> And for your inode you now have this
>
> inode 256, file offset 0, size 4k, offset 0, diskbytenr (123+302g), disklen 4k
> inode 256, file offset 4k, size 302g-4k, offset 4k, diskbytenr 123, disklen 302g
>
> and in your extent tree you have
>
> extent bytenr 123, len 302g, refs 1
> extent bytenr whatever, len 4k, refs 1
>
> See that? Your file is still the same size, it is still 302g. If you
> cp'ed it right now it would copy 302g of information. But what have you
> actually allocated on disk? Well, that's now 302g + 4k. Now let's
> say your virt thing decides to write to the middle, let's say at offset
> 12k. Now you have this
>
> inode 256, file offset 0, size 4k, offset 0, diskbytenr (123+302g), disklen 4k
> inode 256, file offset 4k, size 8k, offset 4k, diskbytenr 123, disklen 302g
> inode 256, file offset 12k, size 4k, offset 0, diskbytenr whatever, disklen 4k
> inode 256, file offset 16k, size 302g-16k, offset 16k, diskbytenr 123, disklen 302g
>
> and in the extent tree you have this
>
> extent bytenr 123, len 302g, refs 2
> extent bytenr whatever, len 4k, refs 1
> extent bytenr notimportant, len 4k, refs 1
>
> See that refs 2 change? We split the original extent, so we have 2
> file extents pointing to the same physical extents, so we bumped the
> ref count. This will happen over and over again until we have
> completely overwritten the original extent, at which point your space
> usage will go back down to ~302g.

Sure, I just mentioned the same in another thread. But you said you
performed full defragmentation and I expect it to "fix" this condition
by relocating data and freeing original big extent. If this did not
happen, I wonder what are conditions when defragment decides to (not)
move data.


Re: btrfs space used issue

2018-03-01 Thread vinayak hegde
No, there is no opened file which is deleted, I did umount and mounted
again and reboot also.

I think I am hitting the below issue, lot of random writes were
happening and the file is not fully written and its sparse file.
Let me try with disabling COW.


file offset 0   offset 302g
[-prealloced 302g extent--]

(man it's impressive I got all that lined up right)

On disk you have 2 things. First, your file, which has a file extent that says

inode 256, file offset 0, size 302g, offset 0, disk bytenr 123, disklen 302g

and then the extent tree, which keeps track of actually allocated space, has this

extent bytenr 123, len 302g, refs 1

Now say you boot up your virt image and it writes 1 4k block to offset
0. Now you have this

[4k][302g-4k--]

And for your inode you now have this

inode 256, file offset 0, size 4k, offset 0, diskbytenr (123+302g), disklen 4k
inode 256, file offset 4k, size 302g-4k, offset 4k, diskbytenr 123, disklen 302g

and in your extent tree you have

extent bytenr 123, len 302g, refs 1
extent bytenr whatever, len 4k, refs 1

See that? Your file is still the same size, it is still 302g. If you
cp'ed it right now it would copy 302g of information. But what have you
actually allocated on disk? Well, that's now 302g + 4k. Now let's
say your virt thing decides to write to the middle, let's say at offset
12k. Now you have this

inode 256, file offset 0, size 4k, offset 0, diskbytenr (123+302g), disklen 4k
inode 256, file offset 4k, size 8k, offset 4k, diskbytenr 123, disklen 302g
inode 256, file offset 12k, size 4k, offset 0, diskbytenr whatever, disklen 4k
inode 256, file offset 16k, size 302g-16k, offset 16k, diskbytenr 123, disklen 302g

and in the extent tree you have this

extent bytenr 123, len 302g, refs 2
extent bytenr whatever, len 4k, refs 1
extent bytenr notimportant, len 4k, refs 1

See that refs 2 change? We split the original extent, so we have 2
file extents pointing to the same physical extents, so we bumped the
ref count. This will happen over and over again until we have
completely overwritten the original extent, at which point your space
usage will go back down to ~302g.

We split big extents with cow, so unless you've got lots of space to
spare or are going to use nodatacow you should probably not
pre-allocate virt images.

Vinayak
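
The accounting in the explanation above can be condensed into a small model.
The C sketch below is illustrative only: the function name is hypothetical,
and it assumes every CoW write lands on a block of the preallocated extent
that has not been overwritten before, so the overwritten byte count never
double-counts a block.

```c
#include <stdint.h>

/*
 * Space allocated on disk for a file that started as one preallocated
 * extent and has since had overwritten_bytes of it rewritten via CoW.
 *
 * Each write allocates a new extent, but the original extent stays fully
 * allocated while any file extent still references part of it. Only once
 * the whole range has been overwritten is the original extent freed.
 */
static uint64_t allocated_after_cow(uint64_t prealloc_bytes,
				    uint64_t overwritten_bytes)
{
	if (overwritten_bytes >= prealloc_bytes)
		return prealloc_bytes;	/* original extent fully replaced */
	return prealloc_bytes + overwritten_bytes;
}
```

For the 302g example, one 4k write yields 302g + 4k allocated, and the figure
only drops back to 302g after the entire original extent has been rewritten.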

On Wed, Feb 28, 2018 at 8:52 PM, Andrei Borzenkov  wrote:
> On Wed, Feb 28, 2018 at 9:01 AM, vinayak hegde  
> wrote:
>> I ran both a full defragment and a balance, but it didn't help.
>
> Showing the same information immediately after full defragment would be 
> helpful.
>
>> My created and accounting usage files are matching the du -sh output.
>> But I am not getting why btrfs internals use so much extra space.
>> My worry is, will get no space error earlier than I expect.
>> Is it expected with btrfs internal that it will use so much extra space?
>>
>
> Did you try to reboot? Deleted opened file could well cause this effect.
>
>> Vinayak
>>
>>
>>
>>
>> On Tue, Feb 27, 2018 at 7:24 PM, Austin S. Hemmelgarn
>>  wrote:
>>> On 2018-02-27 08:09, vinayak hegde wrote:

>>>> I am using btrfs, But I am seeing du -sh and df -h showing huge size
>>>> difference on ssd.
>>>>
>>>> mount:
>>>> /dev/drbd1 on /dc/fileunifier.datacache type btrfs
>>>> (rw,noatime,nodiratime,flushoncommit,discard,nospace_cache,recovery,commit=5,subvolid=5,subvol=/)
>>>>
>>>> du -sh /dc/fileunifier.datacache/ -  331G
>>>>
>>>> df -h
>>>> /dev/drbd1  746G  346G  398G  47% /dc/fileunifier.datacache
>>>>
>>>> btrfs fi usage /dc/fileunifier.datacache/
>>>> Overall:
>>>>     Device size: 745.19GiB
>>>>     Device allocated: 368.06GiB
>>>>     Device unallocated: 377.13GiB
>>>>     Device missing: 0.00B
>>>>     Used: 346.73GiB
>>>>     Free (estimated): 396.36GiB (min: 207.80GiB)
>>>>     Data ratio: 1.00
>>>>     Metadata ratio: 2.00
>>>>     Global reserve: 176.00MiB (used: 0.00B)
>>>>
>>>> Data,single: Size:365.00GiB, Used:345.76GiB
>>>>     /dev/drbd1 365.00GiB
>>>>
>>>> Metadata,DUP: Size:1.50GiB, Used:493.23MiB
>>>>     /dev/drbd1 3.00GiB
>>>>
>>>> System,DUP: Size:32.00MiB, Used:80.00KiB
>>>>     /dev/drbd1 64.00MiB
>>>>
>>>> Unallocated:
>>>>     /dev/drbd1 377.13GiB
>>>>
>>>> Even if we consider 6G metadata its 331+6 = 337.
>>>> where is 9GB used?
>>>>
>>>> Please explain.
>>>
>>> First, you're counting the metadata wrong.  The value shown per-device by
>>> `btrfs filesystem usage` already accounts for replication (so it's only 3 GB
>>> of metadata allocated, not 6 GB).  Neither `df` nor `du` looks at the chunk
>>> level allocations though.
>>>
>>> Now, with that out of the way, the discrepancy almost certainly comes from
>>> differences in how `df` and `du` calculate space usage.  In particu