Re: [PATCH] btrfs: fix invalid memory access with journal_info

2018-05-09 Thread robbieko

Omar Sandoval 於 2018-05-10 12:53 寫到:

On Wed, May 09, 2018 at 06:35:25PM +0800, robbieko wrote:

From: Robbie Ko 

When send process requires memory allocation, shrinker may be 
triggered

due to insufficient memory.
Then evict_inode gets called when inode is freed, and this function
may need to start transaction.
However, the journal_info is already points to BTRFS_SEND_TRANS_STUB, 
it

passed the if condition,
and the following use yields illegal memory access.

  if (current->journal_info) {
  WARN_ON(type & TRANS_EXTWRITERS);
  h = current->journal_info;
  refcount_inc(>use_count);
  WARN_ON(refcount_read(>use_count) > 2);
  h->orig_rsv = h->block_rsv;
  h->block_rsv = NULL;
  goto got_it;
  }


start_transaction() has

ASSERT(current->journal_info != BTRFS_SEND_TRANS_STUB);


I didn't turn on the configuration CONFIG_BTRFS_ASSERT, so this ASSERT 
has no effect


4506 #ifdef CONFIG_BTRFS_ASSERT
4507
4508 static inline void assfail(char *expr, char *file, int line)
4509 {
4510 pr_err("BTRFS: assertion failed: %s, file: %s, line: %d",
4511expr, file, line);
4512 BUG();
4513 }
4514
4515 #define ASSERT(expr)\
4516 (likely(expr) ? (void)0 : assfail(#expr, __FILE__, __LINE__))
4517 #else
4518 #define ASSERT(expr)((void)0)
4519 #endif




Are you saying that's wrong? Are there other cases where the shrinker
can end up starting a transaction?


I'm not sure if there are other cases where shrinker might trigger 
start_transaction.

But I confirm that btrfs_evict_inode triggers strat_transaction





Direct IO has a similar problem, journal_info will store 
btrfs_dio_data,

which will lead to illegal memory access.


I have patches getting rid of this for direct I/O here:
https://github.com/osandov/linux/tree/btrfs-journal-info-abuse

I originally did that for btrfs swapfile support, but if it actually
fixes an existing bug it should be easy to get merged.


We fixed the problem by save the journal_info and restore afterwards.

CallTrace looks like this:
 BUG: unable to handle kernel NULL pointer dereference at 
0021

 IP: [] start_transaction+0x64/0x450 [btrfs]
 PGD 8fea4b067 PUD a33bea067 PMD 0
 Oops:  [#1] SMP
 CPU: 3 PID: 12681 Comm: btrfs Tainted: P C O 3.10.102 #15266
 RIP: 0010:[] start_transaction+0x64/0x450 [btrfs]
 Call Trace:
 [] ? btrfs_evict_inode+0x3d8/0x580 [btrfs]
 [] ? evict+0xa2/0x1a0
 [] ? shrink_dentry_list+0x308/0x3d0
 [] ? prune_dcache_sb+0x133/0x160
 [] ? prune_super+0xcf/0x1a0
 [] ? shrink_slab+0x11f/0x1d0
 [] ? do_try_to_free_pages+0x452/0x560
 [] ? throttle_direct_reclaim+0x74/0x240
 [] ? try_to_free_pages+0xae/0xc0
 [] ? __alloc_pages_nodemask+0x53b/0x9f0
 [] ? __do_page_cache_readahead+0xec/0x270
 [] ? ondemand_readahead+0xbb/0x220
 [] ? fill_read_buf+0x2b3/0x3a0 [btrfs]
 [] ? send_extent_data+0x10e/0x300 [btrfs]
 [] ? process_extent+0x1fb/0x1310 [btrfs]
 [] ? iterate_dir_item.isra.28+0x1b0/0x250 [btrfs]
 [] ? send_set_xattr+0xa0/0xa0 [btrfs]
 [] ? changed_cb+0xd5/0xc40 [btrfs]
 [] ? full_send_tree+0xf2/0x1a0 [btrfs]
 [] ? btrfs_ioctl_send+0xfbb/0x1040 [btrfs]
 [] ? btrfs_ioctl+0x1084/0x32a0 [btrfs]

Signed-off-by: Robbie Ko 
---
 fs/btrfs/inode.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f534701..77aec8d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5295,6 +5295,7 @@ void btrfs_evict_inode(struct inode *inode)
int steal_from_global = 0;
u64 min_size;
int ret;
+   void *journal_info = NULL;


This initialization isn't necessary.


trace_btrfs_inode_evict(inode);

@@ -5303,6 +5304,16 @@ void btrfs_evict_inode(struct inode *inode)
return;
}

+   /*
+* Send or Direct IO may store information in journal_info.
+* However, this function may use start_transaction and
+* start_transaction will use journal_info.
+* To avoid accessing invalid memory, we can save the journal_info
+* and restore it later.
+*/
+   journal_info = current->journal_info;
+   current->journal_info = NULL;
+
min_size = btrfs_calc_trunc_metadata_size(fs_info, 1);

evict_inode_truncate_pages(inode);
@@ -5462,6 +5473,7 @@ void btrfs_evict_inode(struct inode *inode)
 no_delete:
btrfs_remove_delayed_node(BTRFS_I(inode));
clear_inode(inode);
+   current->journal_info = journal_info;
 }

 /*
--
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" 
in

the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" 
in

the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the 

Re: [PATCH] btrfs: fix invalid memory access with journal_info

2018-05-09 Thread Omar Sandoval
On Wed, May 09, 2018 at 06:35:25PM +0800, robbieko wrote:
> From: Robbie Ko 
> 
> When send process requires memory allocation, shrinker may be triggered
> due to insufficient memory.
> Then evict_inode gets called when inode is freed, and this function
> may need to start transaction.
> However, the journal_info is already points to BTRFS_SEND_TRANS_STUB, it
> passed the if condition,
> and the following use yields illegal memory access.
> 
>   if (current->journal_info) {
>   WARN_ON(type & TRANS_EXTWRITERS);
>   h = current->journal_info;
>   refcount_inc(>use_count);
>   WARN_ON(refcount_read(>use_count) > 2);
>   h->orig_rsv = h->block_rsv;
>   h->block_rsv = NULL;
>   goto got_it;
>   }

start_transaction() has

ASSERT(current->journal_info != BTRFS_SEND_TRANS_STUB);

Are you saying that's wrong? Are there other cases where the shrinker
can end up starting a transaction?

> Direct IO has a similar problem, journal_info will store btrfs_dio_data,
> which will lead to illegal memory access.

I have patches getting rid of this for direct I/O here:
https://github.com/osandov/linux/tree/btrfs-journal-info-abuse

I originally did that for btrfs swapfile support, but if it actually
fixes an existing bug it should be easy to get merged.

> We fixed the problem by save the journal_info and restore afterwards.
> 
> CallTrace looks like this:
>  BUG: unable to handle kernel NULL pointer dereference at 0021
>  IP: [] start_transaction+0x64/0x450 [btrfs]
>  PGD 8fea4b067 PUD a33bea067 PMD 0
>  Oops:  [#1] SMP
>  CPU: 3 PID: 12681 Comm: btrfs Tainted: P C O 3.10.102 #15266
>  RIP: 0010:[] start_transaction+0x64/0x450 [btrfs]
>  Call Trace:
>  [] ? btrfs_evict_inode+0x3d8/0x580 [btrfs]
>  [] ? evict+0xa2/0x1a0
>  [] ? shrink_dentry_list+0x308/0x3d0
>  [] ? prune_dcache_sb+0x133/0x160
>  [] ? prune_super+0xcf/0x1a0
>  [] ? shrink_slab+0x11f/0x1d0
>  [] ? do_try_to_free_pages+0x452/0x560
>  [] ? throttle_direct_reclaim+0x74/0x240
>  [] ? try_to_free_pages+0xae/0xc0
>  [] ? __alloc_pages_nodemask+0x53b/0x9f0
>  [] ? __do_page_cache_readahead+0xec/0x270
>  [] ? ondemand_readahead+0xbb/0x220
>  [] ? fill_read_buf+0x2b3/0x3a0 [btrfs]
>  [] ? send_extent_data+0x10e/0x300 [btrfs]
>  [] ? process_extent+0x1fb/0x1310 [btrfs]
>  [] ? iterate_dir_item.isra.28+0x1b0/0x250 [btrfs]
>  [] ? send_set_xattr+0xa0/0xa0 [btrfs]
>  [] ? changed_cb+0xd5/0xc40 [btrfs]
>  [] ? full_send_tree+0xf2/0x1a0 [btrfs]
>  [] ? btrfs_ioctl_send+0xfbb/0x1040 [btrfs]
>  [] ? btrfs_ioctl+0x1084/0x32a0 [btrfs]
> 
> Signed-off-by: Robbie Ko 
> ---
>  fs/btrfs/inode.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index f534701..77aec8d 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -5295,6 +5295,7 @@ void btrfs_evict_inode(struct inode *inode)
>   int steal_from_global = 0;
>   u64 min_size;
>   int ret;
> + void *journal_info = NULL;

This initialization isn't necessary.

>   trace_btrfs_inode_evict(inode);
>  
> @@ -5303,6 +5304,16 @@ void btrfs_evict_inode(struct inode *inode)
>   return;
>   }
>  
> + /*
> +  * Send or Direct IO may store information in journal_info.
> +  * However, this function may use start_transaction and
> +  * start_transaction will use journal_info.
> +  * To avoid accessing invalid memory, we can save the journal_info
> +  * and restore it later.
> +  */
> + journal_info = current->journal_info;
> + current->journal_info = NULL;
> +
>   min_size = btrfs_calc_trunc_metadata_size(fs_info, 1);
>  
>   evict_inode_truncate_pages(inode);
> @@ -5462,6 +5473,7 @@ void btrfs_evict_inode(struct inode *inode)
>  no_delete:
>   btrfs_remove_delayed_node(BTRFS_I(inode));
>   clear_inode(inode);
> + current->journal_info = journal_info;
>  }
>  
>  /*
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] btrfs-progs: print-tree: Enhance btrfs_print_tree() check to avoid out-of-boundary memory access

2018-05-09 Thread Qu Wenruo
For btrfs_print_tree(), if nr_items is corrupted, it can easily go
beyond extent buffer boundary.

Add extra nr_item check, and only print as many valid slots as possible.

Signed-off-by: Qu Wenruo 
---
changelog:
v2:
  Use better loop condition suggested by Su.
---
 print-tree.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/print-tree.c b/print-tree.c
index 31a851ef4413..1c2533a9678d 100644
--- a/print-tree.c
+++ b/print-tree.c
@@ -1364,6 +1364,7 @@ void btrfs_print_tree(struct extent_buffer *eb, int 
follow)
 {
u32 i;
u32 nr;
+   u32 ptr_num;
struct btrfs_fs_info *fs_info = eb->fs_info;
struct btrfs_disk_key disk_key;
struct btrfs_key key;
@@ -1376,6 +1377,11 @@ void btrfs_print_tree(struct extent_buffer *eb, int 
follow)
btrfs_print_leaf(eb);
return;
}
+   /* We are crossing eb boundary, this node must be corrupted */
+   if (nr > BTRFS_NODEPTRS_PER_EXTENT_BUFFER(eb))
+   warning(
+   "node nr_items corrupted, has %u limit %u, continue print 
anyway",
+   nr, BTRFS_NODEPTRS_PER_EXTENT_BUFFER(eb));
printf("node %llu level %d items %d free %u generation %llu owner ",
   (unsigned long long)eb->start,
btrfs_header_level(eb), nr,
@@ -1385,8 +1391,10 @@ void btrfs_print_tree(struct extent_buffer *eb, int 
follow)
printf("\n");
print_uuids(eb);
fflush(stdout);
-   for (i = 0; i < nr; i++) {
+   ptr_num = BTRFS_NODEPTRS_PER_EXTENT_BUFFER(eb);
+   for (i = 0; i < nr && i < ptr_num; i++) {
u64 blocknr = btrfs_node_blockptr(eb, i);
+
btrfs_node_key(eb, _key, i);
btrfs_disk_key_to_cpu(, _key);
printf("\t");
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] btrfs: return original error code when failing from option parsing

2018-05-09 Thread Chengguang Xu
It's no good to overwrite -ENOMEM using -EINVAL when failing
from mount option parsing, so just return original error code.

Signed-off-by: Chengguang Xu 
Reviewed-by: David Sterba 
Reviewed-by: Qu Wenruo 
---
v1->v2:
- Remove bracket for single line branch.

 fs/btrfs/super.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 0628092..c67fafa 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1782,10 +1782,8 @@ static int btrfs_remount(struct super_block *sb, int 
*flags, char *data)
}
 
ret = btrfs_parse_options(fs_info, data, *flags);
-   if (ret) {
-   ret = -EINVAL;
+   if (ret)
goto restore;
-   }
 
btrfs_remount_begin(fs_info, old_opts, *flags);
btrfs_resize_thread_pool(fs_info,
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs-progs: check: Make btrfs check return error for qgroup mismatch

2018-05-09 Thread Qu Wenruo


On 2018年05月09日 01:45, David Sterba wrote:
> On Mon, Apr 30, 2018 at 02:16:59PM +0800, Qu Wenruo wrote:
>> Current btrfs-check will check qgroup consistency, but even when it
>> finds something wrong, the return value is still 0.
>>
>> Fix it by allowing report_qgroups() to return int to indicate qgroup
>> mismatch, and also add extra logical to return no error if qgroup repair
>> is successful.
>>
>> Without this patch, fstests can't detect qgroup corruption by its fsck
>> alone.
>>
>> Signed-off-by: Qu Wenruo 
> 
> Applied, thanks. Please send the 0/N (cover letter) mail even for
> 2-patch series.

Any advice on the necessity of the cover letter?

Is cover letter highly recommended for any patches which have any
dependency like the 2nd test case?

Thanks,
Qu

> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] btrfs: return original error code when failing from option parsing

2018-05-09 Thread Qu Wenruo


On 2018年05月10日 08:34, Qu Wenruo wrote:
> 
> 
> On 2018年05月09日 21:08, Chengguang Xu wrote:
>> It's no good to overwrite -ENOMEM using -EINVAL when failing
>> from mount option parsing, so just return original error code.
>>
>> Signed-off-by: Chengguang Xu 
> 
> Reviewed-by: Qu Wenruo 
> 
> Thanks,
> Qu
> 
>> ---
>>  fs/btrfs/super.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> index 0628092..ae6447d 100644
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -1783,7 +1783,6 @@ static int btrfs_remount(struct super_block *sb, int 
>> *flags, char *data)
>>  
>>  ret = btrfs_parse_options(fs_info, data, *flags);
>>  if (ret) {
>> -ret = -EINVAL;
>>  goto restore;
>>  }

Just a small nitpick, for single line branch, the bracket should be
removed according to code style.

Thanks,
Qu

>>  
>>
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] btrfs: return original error code when failing from option parsing

2018-05-09 Thread Qu Wenruo


On 2018年05月09日 21:08, Chengguang Xu wrote:
> It's no good to overwrite -ENOMEM using -EINVAL when failing
> from mount option parsing, so just return original error code.
> 
> Signed-off-by: Chengguang Xu 

Reviewed-by: Qu Wenruo 

Thanks,
Qu

> ---
>  fs/btrfs/super.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 0628092..ae6447d 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1783,7 +1783,6 @@ static int btrfs_remount(struct super_block *sb, int 
> *flags, char *data)
>  
>   ret = btrfs_parse_options(fs_info, data, *flags);
>   if (ret) {
> - ret = -EINVAL;
>   goto restore;
>   }
>  
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/2] fs: hoist BTRFS_IOC_[SG]ET_FSLABEL to vfs

2018-05-09 Thread David Sterba
On Wed, May 09, 2018 at 11:01:21AM -0500, Eric Sandeen wrote:
> Move the btrfs label ioctls up to the vfs for general use.
> 
> This retains 256 chars as the maximum size through the interface, which
> is the btrfs limit and AFAIK exceeds any other filesystem's maximum
> label size.
> 
> Signed-off-by: Eric Sandeen 

The btrfs changes and new ioctl naming looks good to me,

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] fs: hoist BTRFS_IOC_[SG]ET_FSLABEL to vfs

2018-05-09 Thread Eric Sandeen


On 5/9/18 12:35 PM, Randy Dunlap wrote:
> On 05/09/2018 09:01 AM, Eric Sandeen wrote:
>> Move the btrfs label ioctls up to the vfs for general use.
>>
>> This retains 256 chars as the maximum size through the interface, which
>> is the btrfs limit and AFAIK exceeds any other filesystem's maximum
>> label size.
>>
>> Signed-off-by: Eric Sandeen 
>> ---
>>
>> Let the bikeshedding on the exact ioctl name begin ;)
>>
>>  fs/btrfs/ioctl.c   | 8 
>>  include/uapi/linux/btrfs.h | 6 ++
>>  include/uapi/linux/fs.h| 8 ++--
>>  3 files changed, 12 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index 632e26d..2dd4cdf 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -5444,6 +5444,10 @@ long btrfs_ioctl(struct file *file, unsigned int
>>  return btrfs_ioctl_setflags(file, argp);
>>  case FS_IOC_GETVERSION:
>>  return btrfs_ioctl_getversion(file, argp);
>> +case FS_IOC_GETFSLABEL:
>> +return btrfs_ioctl_get_fslabel(file, argp);
>> +case FS_IOC_SETFSLABEL:
>> +return btrfs_ioctl_set_fslabel(file, argp);
>>  case FITRIM:
>>  return btrfs_ioctl_fitrim(file, argp);
>>  case BTRFS_IOC_SNAP_CREATE:
>> @@ -,10 +5559,6 @@ long btrfs_ioctl(struct file *file, unsigned int
>>  return btrfs_ioctl_quota_rescan_wait(file, argp);
>>  case BTRFS_IOC_DEV_REPLACE:
>>  return btrfs_ioctl_dev_replace(fs_info, argp);
>> -case BTRFS_IOC_GET_FSLABEL:
>> -return btrfs_ioctl_get_fslabel(file, argp);
>> -case BTRFS_IOC_SET_FSLABEL:
>> -return btrfs_ioctl_set_fslabel(file, argp);
>>  case BTRFS_IOC_GET_SUPPORTED_FEATURES:
>>  return btrfs_ioctl_get_supported_features(argp);
>>  case BTRFS_IOC_GET_FEATURES:
>> diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
>> index c8d99b9..ec611c8 100644
>> --- a/include/uapi/linux/btrfs.h
>> +++ b/include/uapi/linux/btrfs.h
>> @@ -823,10 +823,8 @@ enum btrfs_err_code {
>>  #define BTRFS_IOC_QUOTA_RESCAN_STATUS _IOR(BTRFS_IOCTL_MAGIC, 45, \
>> struct btrfs_ioctl_quota_rescan_args)
>>  #define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)
>> -#define BTRFS_IOC_GET_FSLABEL _IOR(BTRFS_IOCTL_MAGIC, 49, \
>> -   char[BTRFS_LABEL_SIZE])
>> -#define BTRFS_IOC_SET_FSLABEL _IOW(BTRFS_IOCTL_MAGIC, 50, \
>> -   char[BTRFS_LABEL_SIZE])
>> +#define BTRFS_IOC_GET_FSLABEL   FS_IOC_GETFSLABEL
>> +#define BTRFS_IOC_SET_FSLABEL   FS_IOC_SETFSLABEL
>>  #define BTRFS_IOC_GET_DEV_STATS _IOWR(BTRFS_IOCTL_MAGIC, 52, \
>>struct btrfs_ioctl_get_dev_stats)
>>  #define BTRFS_IOC_DEV_REPLACE _IOWR(BTRFS_IOCTL_MAGIC, 53, \
>> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
>> index d2a8313..1df3707 100644
>> --- a/include/uapi/linux/fs.h
>> +++ b/include/uapi/linux/fs.h
>> @@ -242,6 +242,8 @@ struct fsxattr {
>>  #define FICLONERANGE_IOW(0x94, 13, struct file_clone_range)
>>  #define FIDEDUPERANGE   _IOWR(0x94, 54, struct file_dedupe_range)
>>  
>> +#define FSLABEL_MAX 256 /* Max chars for the interface; each fs may 
>> differ */
>> +
>>  #define FS_IOC_GETFLAGS _IOR('f', 1, long)
>>  #define FS_IOC_SETFLAGS _IOW('f', 2, long)
>>  #define FS_IOC_GETVERSION   _IOR('v', 1, long)
>> @@ -251,8 +253,10 @@ struct fsxattr {
>>  #define FS_IOC32_SETFLAGS   _IOW('f', 2, int)
>>  #define FS_IOC32_GETVERSION _IOR('v', 1, int)
>>  #define FS_IOC32_SETVERSION _IOW('v', 2, int)
>> -#define FS_IOC_FSGETXATTR   _IOR ('X', 31, struct fsxattr)
>> -#define FS_IOC_FSSETXATTR   _IOW ('X', 32, struct fsxattr)
>> +#define FS_IOC_FSGETXATTR   _IOR('X', 31, struct fsxattr)
>> +#define FS_IOC_FSSETXATTR   _IOW('X', 32, struct fsxattr)
>> +#define FS_IOC_GETFSLABEL   _IOR(0x94, 49, char[FSLABEL_MAX])
>> +#define FS_IOC_SETFSLABEL   _IOW(0x94, 50, char[FSLABEL_MAX])
> 
> Also update Documentation/ioctl/ioctl-number.txt, where it says that 0x94:all
> are used for btrfs:
> 
> 0x94  all fs/btrfs/ioctl.h
> 
> AFAICT 0x94 is now split between vfs and btrfs.  Please correct me if I
> misunderstand.

It is split, though it has been for a while now, see also:

#define FICLONE _IOW(0x94, 9, int)
#define FICLONERANGE_IOW(0x94, 13, struct file_clone_range)
#define FIDEDUPERANGE   _IOWR(0x94, 54, struct file_dedupe_range)

but sure, I can send another patch for that on the next round.

Thanks,
-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] fs: hoist BTRFS_IOC_[SG]ET_FSLABEL to vfs

2018-05-09 Thread Randy Dunlap
On 05/09/2018 09:01 AM, Eric Sandeen wrote:
> Move the btrfs label ioctls up to the vfs for general use.
> 
> This retains 256 chars as the maximum size through the interface, which
> is the btrfs limit and AFAIK exceeds any other filesystem's maximum
> label size.
> 
> Signed-off-by: Eric Sandeen 
> ---
> 
> Let the bikeshedding on the exact ioctl name begin ;)
> 
>  fs/btrfs/ioctl.c   | 8 
>  include/uapi/linux/btrfs.h | 6 ++
>  include/uapi/linux/fs.h| 8 ++--
>  3 files changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 632e26d..2dd4cdf 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -5444,6 +5444,10 @@ long btrfs_ioctl(struct file *file, unsigned int
>   return btrfs_ioctl_setflags(file, argp);
>   case FS_IOC_GETVERSION:
>   return btrfs_ioctl_getversion(file, argp);
> + case FS_IOC_GETFSLABEL:
> + return btrfs_ioctl_get_fslabel(file, argp);
> + case FS_IOC_SETFSLABEL:
> + return btrfs_ioctl_set_fslabel(file, argp);
>   case FITRIM:
>   return btrfs_ioctl_fitrim(file, argp);
>   case BTRFS_IOC_SNAP_CREATE:
> @@ -,10 +5559,6 @@ long btrfs_ioctl(struct file *file, unsigned int
>   return btrfs_ioctl_quota_rescan_wait(file, argp);
>   case BTRFS_IOC_DEV_REPLACE:
>   return btrfs_ioctl_dev_replace(fs_info, argp);
> - case BTRFS_IOC_GET_FSLABEL:
> - return btrfs_ioctl_get_fslabel(file, argp);
> - case BTRFS_IOC_SET_FSLABEL:
> - return btrfs_ioctl_set_fslabel(file, argp);
>   case BTRFS_IOC_GET_SUPPORTED_FEATURES:
>   return btrfs_ioctl_get_supported_features(argp);
>   case BTRFS_IOC_GET_FEATURES:
> diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
> index c8d99b9..ec611c8 100644
> --- a/include/uapi/linux/btrfs.h
> +++ b/include/uapi/linux/btrfs.h
> @@ -823,10 +823,8 @@ enum btrfs_err_code {
>  #define BTRFS_IOC_QUOTA_RESCAN_STATUS _IOR(BTRFS_IOCTL_MAGIC, 45, \
>  struct btrfs_ioctl_quota_rescan_args)
>  #define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)
> -#define BTRFS_IOC_GET_FSLABEL _IOR(BTRFS_IOCTL_MAGIC, 49, \
> -char[BTRFS_LABEL_SIZE])
> -#define BTRFS_IOC_SET_FSLABEL _IOW(BTRFS_IOCTL_MAGIC, 50, \
> -char[BTRFS_LABEL_SIZE])
> +#define BTRFS_IOC_GET_FSLABELFS_IOC_GETFSLABEL
> +#define BTRFS_IOC_SET_FSLABELFS_IOC_SETFSLABEL
>  #define BTRFS_IOC_GET_DEV_STATS _IOWR(BTRFS_IOCTL_MAGIC, 52, \
> struct btrfs_ioctl_get_dev_stats)
>  #define BTRFS_IOC_DEV_REPLACE _IOWR(BTRFS_IOCTL_MAGIC, 53, \
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index d2a8313..1df3707 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -242,6 +242,8 @@ struct fsxattr {
>  #define FICLONERANGE _IOW(0x94, 13, struct file_clone_range)
>  #define FIDEDUPERANGE_IOWR(0x94, 54, struct file_dedupe_range)
>  
> +#define FSLABEL_MAX 256  /* Max chars for the interface; each fs may 
> differ */
> +
>  #define  FS_IOC_GETFLAGS _IOR('f', 1, long)
>  #define  FS_IOC_SETFLAGS _IOW('f', 2, long)
>  #define  FS_IOC_GETVERSION   _IOR('v', 1, long)
> @@ -251,8 +253,10 @@ struct fsxattr {
>  #define FS_IOC32_SETFLAGS_IOW('f', 2, int)
>  #define FS_IOC32_GETVERSION  _IOR('v', 1, int)
>  #define FS_IOC32_SETVERSION  _IOW('v', 2, int)
> -#define FS_IOC_FSGETXATTR_IOR ('X', 31, struct fsxattr)
> -#define FS_IOC_FSSETXATTR_IOW ('X', 32, struct fsxattr)
> +#define FS_IOC_FSGETXATTR_IOR('X', 31, struct fsxattr)
> +#define FS_IOC_FSSETXATTR_IOW('X', 32, struct fsxattr)
> +#define FS_IOC_GETFSLABEL_IOR(0x94, 49, char[FSLABEL_MAX])
> +#define FS_IOC_SETFSLABEL_IOW(0x94, 50, char[FSLABEL_MAX])

Also update Documentation/ioctl/ioctl-number.txt, where it says that 0x94:all
are used for btrfs:

0x94all fs/btrfs/ioctl.h

AFAICT 0x94 is now split between vfs and btrfs.  Please correct me if I
misunderstand.


thanks,
-- 
~Randy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] fs: hoist BTRFS_IOC_[SG]ET_FSLABEL to vfs

2018-05-09 Thread Darrick J. Wong
On Wed, May 09, 2018 at 11:15:46AM -0600, Andreas Dilger wrote:
> On May 9, 2018, at 10:10 AM, Darrick J. Wong  wrote:
> > 
> > On Wed, May 09, 2018 at 11:01:21AM -0500, Eric Sandeen wrote:
> >> Move the btrfs label ioctls up to the vfs for general use.
> >> 
> >> This retains 256 chars as the maximum size through the interface, which
> >> is the btrfs limit and AFAIK exceeds any other filesystem's maximum
> >> label size.
> >> 
> >> Signed-off-by: Eric Sandeen 
> >> ---
> >> 
> >> Let the bikeshedding on the exact ioctl name begin ;)
> >> 
> >> fs/btrfs/ioctl.c   | 8 
> >> include/uapi/linux/btrfs.h | 6 ++
> >> include/uapi/linux/fs.h| 8 ++--
> >> 3 files changed, 12 insertions(+), 10 deletions(-)
> >> 
> >> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> >> index 632e26d..2dd4cdf 100644
> >> --- a/fs/btrfs/ioctl.c
> >> +++ b/fs/btrfs/ioctl.c
> >> @@ -5444,6 +5444,10 @@ long btrfs_ioctl(struct file *file, unsigned int
> >>return btrfs_ioctl_setflags(file, argp);
> >>case FS_IOC_GETVERSION:
> >>return btrfs_ioctl_getversion(file, argp);
> >> +  case FS_IOC_GETFSLABEL:
> >> +  return btrfs_ioctl_get_fslabel(file, argp);
> >> +  case FS_IOC_SETFSLABEL:
> >> +  return btrfs_ioctl_set_fslabel(file, argp);
> >>case FITRIM:
> >>return btrfs_ioctl_fitrim(file, argp);
> >>case BTRFS_IOC_SNAP_CREATE:
> >> @@ -,10 +5559,6 @@ long btrfs_ioctl(struct file *file, unsigned int
> >>return btrfs_ioctl_quota_rescan_wait(file, argp);
> >>case BTRFS_IOC_DEV_REPLACE:
> >>return btrfs_ioctl_dev_replace(fs_info, argp);
> >> -  case BTRFS_IOC_GET_FSLABEL:
> >> -  return btrfs_ioctl_get_fslabel(file, argp);
> >> -  case BTRFS_IOC_SET_FSLABEL:
> >> -  return btrfs_ioctl_set_fslabel(file, argp);
> >>case BTRFS_IOC_GET_SUPPORTED_FEATURES:
> >>return btrfs_ioctl_get_supported_features(argp);
> >>case BTRFS_IOC_GET_FEATURES:
> >> diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
> >> index c8d99b9..ec611c8 100644
> >> --- a/include/uapi/linux/btrfs.h
> >> +++ b/include/uapi/linux/btrfs.h
> >> @@ -823,10 +823,8 @@ enum btrfs_err_code {
> >> #define BTRFS_IOC_QUOTA_RESCAN_STATUS _IOR(BTRFS_IOCTL_MAGIC, 45, \
> >>   struct btrfs_ioctl_quota_rescan_args)
> >> #define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)
> >> -#define BTRFS_IOC_GET_FSLABEL _IOR(BTRFS_IOCTL_MAGIC, 49, \
> >> - char[BTRFS_LABEL_SIZE])
> >> -#define BTRFS_IOC_SET_FSLABEL _IOW(BTRFS_IOCTL_MAGIC, 50, \
> >> - char[BTRFS_LABEL_SIZE])
> >> +#define BTRFS_IOC_GET_FSLABEL FS_IOC_GETFSLABEL
> >> +#define BTRFS_IOC_SET_FSLABEL FS_IOC_SETFSLABEL
> >> #define BTRFS_IOC_GET_DEV_STATS _IOWR(BTRFS_IOCTL_MAGIC, 52, \
> >>  struct btrfs_ioctl_get_dev_stats)
> >> #define BTRFS_IOC_DEV_REPLACE _IOWR(BTRFS_IOCTL_MAGIC, 53, \
> >> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> >> index d2a8313..1df3707 100644
> >> --- a/include/uapi/linux/fs.h
> >> +++ b/include/uapi/linux/fs.h
> >> @@ -242,6 +242,8 @@ struct fsxattr {
> >> #define FICLONERANGE   _IOW(0x94, 13, struct file_clone_range)
> >> #define FIDEDUPERANGE  _IOWR(0x94, 54, struct file_dedupe_range)
> >> 
> >> +#define FSLABEL_MAX 256   /* Max chars for the interface; each fs may 
> >> differ */
> >> +
> >> #defineFS_IOC_GETFLAGS _IOR('f', 1, long)
> >> #defineFS_IOC_SETFLAGS _IOW('f', 2, long)
> >> #defineFS_IOC_GETVERSION   _IOR('v', 1, long)
> >> @@ -251,8 +253,10 @@ struct fsxattr {
> >> #define FS_IOC32_SETFLAGS  _IOW('f', 2, int)
> >> #define FS_IOC32_GETVERSION_IOR('v', 1, int)
> >> #define FS_IOC32_SETVERSION_IOW('v', 2, int)
> >> -#define FS_IOC_FSGETXATTR _IOR ('X', 31, struct fsxattr)
> >> -#define FS_IOC_FSSETXATTR _IOW ('X', 32, struct fsxattr)
> >> +#define FS_IOC_FSGETXATTR _IOR('X', 31, struct fsxattr)
> >> +#define FS_IOC_FSSETXATTR _IOW('X', 32, struct fsxattr)
> > 
> > Separate patch for whitespace cleanup.
> 
> Really?  I've heard Ted complain the other way, that whitespace cleanup
> by itself is useless and should only be done as part of other changes.
> As long as it is not overwhelming the rest of the patch I don't see an
> issue with a minor improvement being part of another patch.

I really only meant this as: put the whitespace changes in a second
patch after this one so that we don't have a patch that implements two
different changes, but fmeh, tired of discussing this.

--D

> Otherwise, no bikeshedding from me.  Looks very reasonable, and the 256-char
> limit is definitely large enough IMHO (also matches the normal filename size
> limit, so if the label is used as a pathname component there are no 

Re: [PATCH 1/2] fs: hoist BTRFS_IOC_[SG]ET_FSLABEL to vfs

2018-05-09 Thread Andreas Dilger
On May 9, 2018, at 10:10 AM, Darrick J. Wong  wrote:
> 
> On Wed, May 09, 2018 at 11:01:21AM -0500, Eric Sandeen wrote:
>> Move the btrfs label ioctls up to the vfs for general use.
>> 
>> This retains 256 chars as the maximum size through the interface, which
>> is the btrfs limit and AFAIK exceeds any other filesystem's maximum
>> label size.
>> 
>> Signed-off-by: Eric Sandeen 
>> ---
>> 
>> Let the bikeshedding on the exact ioctl name begin ;)
>> 
>> fs/btrfs/ioctl.c   | 8 
>> include/uapi/linux/btrfs.h | 6 ++
>> include/uapi/linux/fs.h| 8 ++--
>> 3 files changed, 12 insertions(+), 10 deletions(-)
>> 
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index 632e26d..2dd4cdf 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -5444,6 +5444,10 @@ long btrfs_ioctl(struct file *file, unsigned int
>>  return btrfs_ioctl_setflags(file, argp);
>>  case FS_IOC_GETVERSION:
>>  return btrfs_ioctl_getversion(file, argp);
>> +case FS_IOC_GETFSLABEL:
>> +return btrfs_ioctl_get_fslabel(file, argp);
>> +case FS_IOC_SETFSLABEL:
>> +return btrfs_ioctl_set_fslabel(file, argp);
>>  case FITRIM:
>>  return btrfs_ioctl_fitrim(file, argp);
>>  case BTRFS_IOC_SNAP_CREATE:
>> @@ -,10 +5559,6 @@ long btrfs_ioctl(struct file *file, unsigned int
>>  return btrfs_ioctl_quota_rescan_wait(file, argp);
>>  case BTRFS_IOC_DEV_REPLACE:
>>  return btrfs_ioctl_dev_replace(fs_info, argp);
>> -case BTRFS_IOC_GET_FSLABEL:
>> -return btrfs_ioctl_get_fslabel(file, argp);
>> -case BTRFS_IOC_SET_FSLABEL:
>> -return btrfs_ioctl_set_fslabel(file, argp);
>>  case BTRFS_IOC_GET_SUPPORTED_FEATURES:
>>  return btrfs_ioctl_get_supported_features(argp);
>>  case BTRFS_IOC_GET_FEATURES:
>> diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
>> index c8d99b9..ec611c8 100644
>> --- a/include/uapi/linux/btrfs.h
>> +++ b/include/uapi/linux/btrfs.h
>> @@ -823,10 +823,8 @@ enum btrfs_err_code {
>> #define BTRFS_IOC_QUOTA_RESCAN_STATUS _IOR(BTRFS_IOCTL_MAGIC, 45, \
>> struct btrfs_ioctl_quota_rescan_args)
>> #define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)
>> -#define BTRFS_IOC_GET_FSLABEL _IOR(BTRFS_IOCTL_MAGIC, 49, \
>> -   char[BTRFS_LABEL_SIZE])
>> -#define BTRFS_IOC_SET_FSLABEL _IOW(BTRFS_IOCTL_MAGIC, 50, \
>> -   char[BTRFS_LABEL_SIZE])
>> +#define BTRFS_IOC_GET_FSLABEL   FS_IOC_GETFSLABEL
>> +#define BTRFS_IOC_SET_FSLABEL   FS_IOC_SETFSLABEL
>> #define BTRFS_IOC_GET_DEV_STATS _IOWR(BTRFS_IOCTL_MAGIC, 52, \
>>struct btrfs_ioctl_get_dev_stats)
>> #define BTRFS_IOC_DEV_REPLACE _IOWR(BTRFS_IOCTL_MAGIC, 53, \
>> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
>> index d2a8313..1df3707 100644
>> --- a/include/uapi/linux/fs.h
>> +++ b/include/uapi/linux/fs.h
>> @@ -242,6 +242,8 @@ struct fsxattr {
>> #define FICLONERANGE _IOW(0x94, 13, struct file_clone_range)
>> #define FIDEDUPERANGE_IOWR(0x94, 54, struct file_dedupe_range)
>> 
>> +#define FSLABEL_MAX 256 /* Max chars for the interface; each fs may 
>> differ */
>> +
>> #define  FS_IOC_GETFLAGS _IOR('f', 1, long)
>> #define  FS_IOC_SETFLAGS _IOW('f', 2, long)
>> #define  FS_IOC_GETVERSION   _IOR('v', 1, long)
>> @@ -251,8 +253,10 @@ struct fsxattr {
>> #define FS_IOC32_SETFLAGS_IOW('f', 2, int)
>> #define FS_IOC32_GETVERSION  _IOR('v', 1, int)
>> #define FS_IOC32_SETVERSION  _IOW('v', 2, int)
>> -#define FS_IOC_FSGETXATTR   _IOR ('X', 31, struct fsxattr)
>> -#define FS_IOC_FSSETXATTR   _IOW ('X', 32, struct fsxattr)
>> +#define FS_IOC_FSGETXATTR   _IOR('X', 31, struct fsxattr)
>> +#define FS_IOC_FSSETXATTR   _IOW('X', 32, struct fsxattr)
> 
> Separate patch for whitespace cleanup.

Really?  I've heard Ted complain the other way, that whitespace cleanup
by itself is useless and should only be done as part of other changes.
As long as it is not overwhelming the rest of the patch I don't see an
issue with a minor improvement being part of another patch.

Otherwise, no bikeshedding from me.  Looks very reasonable, and the 256-char
limit is definitely large enough IMHO (also matches the normal filename size
limit, so if the label is used as a pathname component there are no added
restrictions).

Reviewed-by: Andreas Dilger 

Cheers, Andreas

>> +#define FS_IOC_GETFSLABEL   _IOR(0x94, 49, char[FSLABEL_MAX])
>> +#define FS_IOC_SETFSLABEL   _IOW(0x94, 50, char[FSLABEL_MAX])
> 
> Looks ok otherwise,
> Reviewed-by: Darrick J. Wong 
> 
> --D
> 
>> 
>> /*
>>  * File system encryption support
>> --
>> 1.8.3.1
>> 
>> 

Re: [PATCH] test online label ioctl

2018-05-09 Thread Eric Sandeen
On 5/9/18 10:49 AM, Eryu Guan wrote:
> On Mon, Apr 30, 2018 at 04:43:18PM -0500, Eric Sandeen wrote:
>> This tests the online label ioctl that btrfs has, which has been
>> recently proposed for XFS.
>>
>> To run, it requires an updated xfs_io with the label command and a
>> filesystem that supports it
>>
>> A slight change here to _require_xfs_io_command as well, so that tests
>> which simply fail with "Inappropriate ioctl" can be caught in the
>> common case.
>>
>> Signed-off-by: Eric Sandeen 
>> ---
>>

...

>> +# And that it succeeds right at the filesystem max:
>> +case $FSTYP in
>> +xfs)
>> +MAXLEN=12;
>> +;;
>> +btrfs)
>> +MAXLEN=256
> 
> Seems this should be 255, otherwise I got failure like:
> 
> -label = "MAXLABEL"
> +label: Invalid argument
> 
> and MAXLEN=255 makes the test pass with btrfs.

You are correct, I missed that they exclude the trailing
null from the length.  (Sorry, thought I tested this :( )

>> +;;
>> +*)
>> +MAXLEN=256
>> +echo "Your filesystem supports online label, please add max length"
> 
> Perhaps we can introduce a new helper similar to _require_acl_get_max()
> and _notrun the test if current $FSTYP doesn't define a maxlen on
> filesystem label?

Ok, sure.

>> +;;
>> +esac
>> +LABEL=$(perl -e "print 'o' x $MAXLEN;")
>> +$XFS_IO_PROG -c "label $LABEL" $SCRATCH_MNT | sed -e 's/o\+/MAXLABEL/'
>> +
>> +# And that it fails just past the filesystem max:
>> +let TOOLONG=MAXLEN+1
>> +LABEL=$(perl -e "print 'o' x $TOOLONG;")
>> +$XFS_IO_PROG -c "label $LABEL" $SCRATCH_MNT
>> +
>> +# success, all done
>> +status=0
>> +exit
>> diff --git a/tests/generic/485.out b/tests/generic/485.out
>> new file mode 100644
>> index 000..bc54684
>> --- /dev/null
>> +++ b/tests/generic/485.out
>> @@ -0,0 +1,9 @@
>> +QA output created by 485
>> +label = "label.485"
>> +label = "label.485"
>> +SCRATCH_DEV: LABEL="label.485" 
>> +SCRATCH_DEV: LABEL="label.485" 
> 
> There're trailing whitespaces in above two lines, I thought they're the
> output from xfs_io label command at first, but actually I have to remove
> the spaces to make test pass.

It might need a filter, this is output from blkid; it might have changed.
I noticed the whitespace as well but IIRC it works here.

Will look into these and fix stuff up.

Thanks!
-Eric

> Thanks,
> Eryu
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] Btrfs: fiemap: pass correct bytenr when fm_extent_count is zero

2018-05-09 Thread Nikolay Borisov


On  7.05.2018 11:42, robbieko wrote:
> From: Robbie Ko 
> 
> [BUG]
> fm_mapped_extents is not correct when fm_extent_count is 0
> Like:
># mount /dev/vdb5 /mnt/btrfs
># dd if=/dev/zero bs=16K count=4 oflag=dsync of=/mnt/btrfs/file
># xfs_io -c "fiemap -v" /mnt/btrfs/file
>/mnt/btrfs/file:
>EXT: FILE-OFFSET  BLOCK-RANGE  TOTAL FLAGS
>  0: [0..127]:25088..25215   128   0x1
> 
> When user space wants to get the number of file extents,
> set fm_extent_count to 0 to run fiemap and then read fm_mapped_extents.
 "it sets fm_extent_count to 0 and then reads fm_mapped_extents"

> 
> In the above example, fiemap will return with fm_mapped_extents set to 4,
> but it should be 1 since there's only one entry in the output.
> 
> [REASON]
> The problem seems to be that disko is only set if
> fieinfo->fi_extents_max is set. And this member is initialized, in the
> generic ioctl_fiemap function, to the value of used-passed
  
  user-passed
> fm_extent_count. So when the user passes 0 then fi_extent_max is also
> set to zero and this causes btrfs to not initialize disko at all.
> Eventually this leads emit_fiemap_extent being called with a bogus
> 'phys' argument preventing proper fiemap entries merging.
> 
> [FIX]
> Move the disko initialization earlier in extent_fiemap making it
> independent of user-passed arguments, allowing emit_fiemap_extent to
> properly handle consecutive extent entries.
> 
> Signed-off-by: Robbie Ko 
Only a couple of minor nits w.r.t the changelog but I guess David can
fix this on the go. In any case :

Reviewed-by: Nikolay Borisov 

> ---
> V2:
> fix comments.
> 
>  fs/btrfs/extent_io.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 012d638..066b6df 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4567,7 +4567,7 @@ int extent_fiemap(struct inode *inode, struct 
> fiemap_extent_info *fieinfo,
>   offset_in_extent = em_start - em->start;
>   em_end = extent_map_end(em);
>   em_len = em_end - em_start;
> - disko = 0;
> + disko = em->block_start + offset_in_extent;
>   flags = 0;
>  
>   /*
> @@ -4590,8 +4590,6 @@ int extent_fiemap(struct inode *inode, struct 
> fiemap_extent_info *fieinfo,
>   u64 bytenr = em->block_start -
>   (em->start - em->orig_start);
>  
> - disko = em->block_start + offset_in_extent;
> -
>   /*
>* As btrfs supports shared space, this information
>* can be exported to userspace tools via
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] man2: New page documenting filesystem get/set label ioctls

2018-05-09 Thread Darrick J. Wong
On Wed, May 09, 2018 at 11:04:03AM -0500, Eric Sandeen wrote:
> This documents the proposed new vfs-level ioctls which can
> get or set a mounted filesytem's label.
> 
> Signed-off-by: Eric Sandeen 
> ---
> 
> btrfs folks, please verify that this accurately describes your
> current behavior, thanks.
> 
> diff --git a/man2/ioctl_fslabel.2 b/man2/ioctl_fslabel.2
> new file mode 100644
> index 000..150aa53
> --- /dev/null
> +++ b/man2/ioctl_fslabel.2
> @@ -0,0 +1,83 @@
> +.\" Copyright (c) 2018, Red Hat, Inc.  All rights reserved.
> +.\"
> +.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
> +.\" This is free documentation; you can redistribute it and/or
> +.\" modify it under the terms of the GNU General Public License as
> +.\" published by the Free Software Foundation; either version 2 of
> +.\" the License, or (at your option) any later version.
> +.\"
> +.\" The GNU General Public License's references to "object code"
> +.\" and "executables" are to be interpreted as the output of any
> +.\" document formatting or typesetting system, including
> +.\" intermediate and printed output.
> +.\"
> +.\" This manual is distributed in the hope that it will be useful,
> +.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
> +.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +.\" GNU General Public License for more details.
> +.\"
> +.\" You should have received a copy of the GNU General Public
> +.\" License along with this manual; if not, see
> +.\" .
> +.\" %%%LICENSE_END
> +.TH IOCTL-FSLABEL 2 2018-05-02 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +ioctl_fslabel \- get or set a filesystem label
> +.SH SYNOPSIS
> +.br
> +.B #include 
> +.br
> +.B #include 
> +.sp
> +.BI "int ioctl(int " fd ", FS_IOC_GETFSLABEL, char " label [FSLABEL_MAX]);
> +.br
> +.BI "int ioctl(int " fd ", FS_IOC_SETFSLABEL, char " label [FSLABEL_MAX]);
> +.SH DESCRIPTION
> +If a filesystem supports online label manipulation, these
> +.BR ioctl (2)
> +operations can be used to get or set the filesystem label for the filesystem
> +on which
> +.B fd
> +resides.

Does the calling process need special capabilities or permissions?  If
so, those should be listed here.

> +.SH RETURN VALUE
> +On success zero is returned.  On error, \-1 is returned, and
> +.I errno
> +is set to indicate the error.
> +.PP
> +.SH ERRORS
> +Error codes can be one of, but are not limited to, the following:
> +.TP
> +.B EINVAL
> +The specified label exceeds the maximum label length for the filesystem.
> +.TP
> +.B ENOTTY
> +This can appear if the filesystem does not support online label manipulation.
> +.TP
> +.B EPERM
> +The calling process does not have sufficient permissions to set the label.
> +.TP
> +.B EFAULT
> +.I label
> +references an inaccessible memory area.
> +.SH VERSIONS
> +These ioctl operations first appeared in Linux 4.18.
> +They were previously known as
> +.B BTRFS_IOC_GET_FSLABEL
> +and
> +.B BTRFS_IOC_SET_FSLABEL
> +and were private to Btrfs.
> +.SH CONFORMING TO
> +This API is Linux-specific.
> +.SH NOTES
> +The maximum string length for this interface is
> +.BR FSLABEL_MAX ,
> +including the terminating null byte (\(aq\\0\(aq).
> +Filesystems have differing maximum label lengths, which may or
> +may not include the terminating null.  The string provided to
> +.B FS_IOC_SETFSLABEL
> +must always be null-terminated, and the string returned by
> +.B FS_IOC_GETFSLABEL
> +will always be null-terminated.
> +.SH SEE ALSO
> +.BR ioctl (2),
> +.BR blkid (8)
> diff --git a/man2/ioctl_getfslabel.2 b/man2/ioctl_getfslabel.2
> new file mode 100644
> index 000..bfa8dca
> --- /dev/null
> +++ b/man2/ioctl_getfslabel.2
> @@ -0,0 +1 @@
> +.so man2/ioctl_fslabel.2
> diff --git a/man2/ioctl_setfslabel.2 b/man2/ioctl_setfslabel.2
> new file mode 100644
> index 000..bfa8dca
> --- /dev/null
> +++ b/man2/ioctl_setfslabel.2
> @@ -0,0 +1 @@
> +.so man2/ioctl_fslabel.2

Put all the manpage content into ioctl_getfslabel.2 and have
ioctl_setfslabel.2 point to it, rather than three files.

--D

> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-api" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] fs: hoist BTRFS_IOC_[SG]ET_FSLABEL to vfs

2018-05-09 Thread Darrick J. Wong
On Wed, May 09, 2018 at 11:01:21AM -0500, Eric Sandeen wrote:
> Move the btrfs label ioctls up to the vfs for general use.
> 
> This retains 256 chars as the maximum size through the interface, which
> is the btrfs limit and AFAIK exceeds any other filesystem's maximum
> label size.
> 
> Signed-off-by: Eric Sandeen 
> ---
> 
> Let the bikeshedding on the exact ioctl name begin ;)
> 
>  fs/btrfs/ioctl.c   | 8 
>  include/uapi/linux/btrfs.h | 6 ++
>  include/uapi/linux/fs.h| 8 ++--
>  3 files changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 632e26d..2dd4cdf 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -5444,6 +5444,10 @@ long btrfs_ioctl(struct file *file, unsigned int
>   return btrfs_ioctl_setflags(file, argp);
>   case FS_IOC_GETVERSION:
>   return btrfs_ioctl_getversion(file, argp);
> + case FS_IOC_GETFSLABEL:
> + return btrfs_ioctl_get_fslabel(file, argp);
> + case FS_IOC_SETFSLABEL:
> + return btrfs_ioctl_set_fslabel(file, argp);
>   case FITRIM:
>   return btrfs_ioctl_fitrim(file, argp);
>   case BTRFS_IOC_SNAP_CREATE:
> @@ -,10 +5559,6 @@ long btrfs_ioctl(struct file *file, unsigned int
>   return btrfs_ioctl_quota_rescan_wait(file, argp);
>   case BTRFS_IOC_DEV_REPLACE:
>   return btrfs_ioctl_dev_replace(fs_info, argp);
> - case BTRFS_IOC_GET_FSLABEL:
> - return btrfs_ioctl_get_fslabel(file, argp);
> - case BTRFS_IOC_SET_FSLABEL:
> - return btrfs_ioctl_set_fslabel(file, argp);
>   case BTRFS_IOC_GET_SUPPORTED_FEATURES:
>   return btrfs_ioctl_get_supported_features(argp);
>   case BTRFS_IOC_GET_FEATURES:
> diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
> index c8d99b9..ec611c8 100644
> --- a/include/uapi/linux/btrfs.h
> +++ b/include/uapi/linux/btrfs.h
> @@ -823,10 +823,8 @@ enum btrfs_err_code {
>  #define BTRFS_IOC_QUOTA_RESCAN_STATUS _IOR(BTRFS_IOCTL_MAGIC, 45, \
>  struct btrfs_ioctl_quota_rescan_args)
>  #define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)
> -#define BTRFS_IOC_GET_FSLABEL _IOR(BTRFS_IOCTL_MAGIC, 49, \
> -char[BTRFS_LABEL_SIZE])
> -#define BTRFS_IOC_SET_FSLABEL _IOW(BTRFS_IOCTL_MAGIC, 50, \
> -char[BTRFS_LABEL_SIZE])
> +#define BTRFS_IOC_GET_FSLABELFS_IOC_GETFSLABEL
> +#define BTRFS_IOC_SET_FSLABELFS_IOC_SETFSLABEL
>  #define BTRFS_IOC_GET_DEV_STATS _IOWR(BTRFS_IOCTL_MAGIC, 52, \
> struct btrfs_ioctl_get_dev_stats)
>  #define BTRFS_IOC_DEV_REPLACE _IOWR(BTRFS_IOCTL_MAGIC, 53, \
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index d2a8313..1df3707 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -242,6 +242,8 @@ struct fsxattr {
>  #define FICLONERANGE _IOW(0x94, 13, struct file_clone_range)
>  #define FIDEDUPERANGE_IOWR(0x94, 54, struct file_dedupe_range)
>  
> +#define FSLABEL_MAX 256  /* Max chars for the interface; each fs may 
> differ */
> +
>  #define  FS_IOC_GETFLAGS _IOR('f', 1, long)
>  #define  FS_IOC_SETFLAGS _IOW('f', 2, long)
>  #define  FS_IOC_GETVERSION   _IOR('v', 1, long)
> @@ -251,8 +253,10 @@ struct fsxattr {
>  #define FS_IOC32_SETFLAGS_IOW('f', 2, int)
>  #define FS_IOC32_GETVERSION  _IOR('v', 1, int)
>  #define FS_IOC32_SETVERSION  _IOW('v', 2, int)
> -#define FS_IOC_FSGETXATTR_IOR ('X', 31, struct fsxattr)
> -#define FS_IOC_FSSETXATTR_IOW ('X', 32, struct fsxattr)
> +#define FS_IOC_FSGETXATTR_IOR('X', 31, struct fsxattr)
> +#define FS_IOC_FSSETXATTR_IOW('X', 32, struct fsxattr)

Separate patch for whitespace cleanup.

> +#define FS_IOC_GETFSLABEL_IOR(0x94, 49, char[FSLABEL_MAX])
> +#define FS_IOC_SETFSLABEL_IOW(0x94, 50, char[FSLABEL_MAX])

Looks ok otherwise,
Reviewed-by: Darrick J. Wong 

--D

>  
>  /*
>   * File system encryption support
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-api" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] man2: New page documenting filesystem get/set label ioctls

2018-05-09 Thread Eric Sandeen
This documents the proposed new vfs-level ioctls which can
get or set a mounted filesytem's label.

Signed-off-by: Eric Sandeen 
---

btrfs folks, please verify that this accurately describes your
current behavior, thanks.

diff --git a/man2/ioctl_fslabel.2 b/man2/ioctl_fslabel.2
new file mode 100644
index 000..150aa53
--- /dev/null
+++ b/man2/ioctl_fslabel.2
@@ -0,0 +1,83 @@
+.\" Copyright (c) 2018, Red Hat, Inc.  All rights reserved.
+.\"
+.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
+.\" This is free documentation; you can redistribute it and/or
+.\" modify it under the terms of the GNU General Public License as
+.\" published by the Free Software Foundation; either version 2 of
+.\" the License, or (at your option) any later version.
+.\"
+.\" The GNU General Public License's references to "object code"
+.\" and "executables" are to be interpreted as the output of any
+.\" document formatting or typesetting system, including
+.\" intermediate and printed output.
+.\"
+.\" This manual is distributed in the hope that it will be useful,
+.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
+.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+.\" GNU General Public License for more details.
+.\"
+.\" You should have received a copy of the GNU General Public
+.\" License along with this manual; if not, see
+.\" .
+.\" %%%LICENSE_END
+.TH IOCTL-FSLABEL 2 2018-05-02 "Linux" "Linux Programmer's Manual"
+.SH NAME
+ioctl_fslabel \- get or set a filesystem label
+.SH SYNOPSIS
+.br
+.B #include 
+.br
+.B #include 
+.sp
+.BI "int ioctl(int " fd ", FS_IOC_GETFSLABEL, char " label [FSLABEL_MAX]);
+.br
+.BI "int ioctl(int " fd ", FS_IOC_SETFSLABEL, char " label [FSLABEL_MAX]);
+.SH DESCRIPTION
+If a filesystem supports online label manipulation, these
+.BR ioctl (2)
+operations can be used to get or set the filesystem label for the filesystem
+on which
+.B fd
+resides.
+.SH RETURN VALUE
+On success zero is returned.  On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.PP
+.SH ERRORS
+Error codes can be one of, but are not limited to, the following:
+.TP
+.B EINVAL
+The specified label exceeds the maximum label length for the filesystem.
+.TP
+.B ENOTTY
+This can appear if the filesystem does not support online label manipulation.
+.TP
+.B EPERM
+The calling process does not have sufficient permissions to set the label.
+.TP
+.B EFAULT
+.I label
+references an inaccessible memory area.
+.SH VERSIONS
+These ioctl operations first appeared in Linux 4.18.
+They were previously known as
+.B BTRFS_IOC_GET_FSLABEL
+and
+.B BTRFS_IOC_SET_FSLABEL
+and were private to Btrfs.
+.SH CONFORMING TO
+This API is Linux-specific.
+.SH NOTES
+The maximum string length for this interface is
+.BR FSLABEL_MAX ,
+including the terminating null byte (\(aq\\0\(aq).
+Filesystems have differing maximum label lengths, which may or
+may not include the terminating null.  The string provided to
+.B FS_IOC_SETFSLABEL
+must always be null-terminated, and the string returned by
+.B FS_IOC_GETFSLABEL
+will always be null-terminated.
+.SH SEE ALSO
+.BR ioctl (2),
+.BR blkid (8)
diff --git a/man2/ioctl_getfslabel.2 b/man2/ioctl_getfslabel.2
new file mode 100644
index 000..bfa8dca
--- /dev/null
+++ b/man2/ioctl_getfslabel.2
@@ -0,0 +1 @@
+.so man2/ioctl_fslabel.2
diff --git a/man2/ioctl_setfslabel.2 b/man2/ioctl_setfslabel.2
new file mode 100644
index 000..bfa8dca
--- /dev/null
+++ b/man2/ioctl_setfslabel.2
@@ -0,0 +1 @@
+.so man2/ioctl_fslabel.2


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] fs: hoist BTRFS_IOC_[SG]ET_FSLABEL to vfs

2018-05-09 Thread Eric Sandeen
Move the btrfs label ioctls up to the vfs for general use.

This retains 256 chars as the maximum size through the interface, which
is the btrfs limit and AFAIK exceeds any other filesystem's maximum
label size.

Signed-off-by: Eric Sandeen 
---

Let the bikeshedding on the exact ioctl name begin ;)

 fs/btrfs/ioctl.c   | 8 
 include/uapi/linux/btrfs.h | 6 ++
 include/uapi/linux/fs.h| 8 ++--
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 632e26d..2dd4cdf 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -5444,6 +5444,10 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_setflags(file, argp);
case FS_IOC_GETVERSION:
return btrfs_ioctl_getversion(file, argp);
+   case FS_IOC_GETFSLABEL:
+   return btrfs_ioctl_get_fslabel(file, argp);
+   case FS_IOC_SETFSLABEL:
+   return btrfs_ioctl_set_fslabel(file, argp);
case FITRIM:
return btrfs_ioctl_fitrim(file, argp);
case BTRFS_IOC_SNAP_CREATE:
@@ -,10 +5559,6 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_quota_rescan_wait(file, argp);
case BTRFS_IOC_DEV_REPLACE:
return btrfs_ioctl_dev_replace(fs_info, argp);
-   case BTRFS_IOC_GET_FSLABEL:
-   return btrfs_ioctl_get_fslabel(file, argp);
-   case BTRFS_IOC_SET_FSLABEL:
-   return btrfs_ioctl_set_fslabel(file, argp);
case BTRFS_IOC_GET_SUPPORTED_FEATURES:
return btrfs_ioctl_get_supported_features(argp);
case BTRFS_IOC_GET_FEATURES:
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index c8d99b9..ec611c8 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -823,10 +823,8 @@ enum btrfs_err_code {
 #define BTRFS_IOC_QUOTA_RESCAN_STATUS _IOR(BTRFS_IOCTL_MAGIC, 45, \
   struct btrfs_ioctl_quota_rescan_args)
 #define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)
-#define BTRFS_IOC_GET_FSLABEL _IOR(BTRFS_IOCTL_MAGIC, 49, \
-  char[BTRFS_LABEL_SIZE])
-#define BTRFS_IOC_SET_FSLABEL _IOW(BTRFS_IOCTL_MAGIC, 50, \
-  char[BTRFS_LABEL_SIZE])
+#define BTRFS_IOC_GET_FSLABEL  FS_IOC_GETFSLABEL
+#define BTRFS_IOC_SET_FSLABEL  FS_IOC_SETFSLABEL
 #define BTRFS_IOC_GET_DEV_STATS _IOWR(BTRFS_IOCTL_MAGIC, 52, \
  struct btrfs_ioctl_get_dev_stats)
 #define BTRFS_IOC_DEV_REPLACE _IOWR(BTRFS_IOCTL_MAGIC, 53, \
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index d2a8313..1df3707 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -242,6 +242,8 @@ struct fsxattr {
 #define FICLONERANGE   _IOW(0x94, 13, struct file_clone_range)
 #define FIDEDUPERANGE  _IOWR(0x94, 54, struct file_dedupe_range)
 
+#define FSLABEL_MAX 256/* Max chars for the interface; each fs may 
differ */
+
 #defineFS_IOC_GETFLAGS _IOR('f', 1, long)
 #defineFS_IOC_SETFLAGS _IOW('f', 2, long)
 #defineFS_IOC_GETVERSION   _IOR('v', 1, long)
@@ -251,8 +253,10 @@ struct fsxattr {
 #define FS_IOC32_SETFLAGS  _IOW('f', 2, int)
 #define FS_IOC32_GETVERSION_IOR('v', 1, int)
 #define FS_IOC32_SETVERSION_IOW('v', 2, int)
-#define FS_IOC_FSGETXATTR  _IOR ('X', 31, struct fsxattr)
-#define FS_IOC_FSSETXATTR  _IOW ('X', 32, struct fsxattr)
+#define FS_IOC_FSGETXATTR  _IOR('X', 31, struct fsxattr)
+#define FS_IOC_FSSETXATTR  _IOW('X', 32, struct fsxattr)
+#define FS_IOC_GETFSLABEL  _IOR(0x94, 49, char[FSLABEL_MAX])
+#define FS_IOC_SETFSLABEL  _IOW(0x94, 50, char[FSLABEL_MAX])
 
 /*
  * File system encryption support
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] hoist btrfs' label set/get ioctls to vfs, and document

2018-05-09 Thread Eric Sandeen
I'm planning to add online label set/get support to xfs, and to do so
I plan to re-use the existing btrfs ioctls, BTRFS_IOC_[SG]ET_FSLABEL

We're still working out minor details on the xfs side, but I'd like to
start the conversation regarding the new more generic interface ASAP,
so here goes - patches to move the ioctls to the vfs and document them.

(Other filesystems may wish to use this interface in the future as well)

Thanks,
-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] test online label ioctl

2018-05-09 Thread Eryu Guan
On Mon, Apr 30, 2018 at 04:43:18PM -0500, Eric Sandeen wrote:
> This tests the online label ioctl that btrfs has, which has been
> recently proposed for XFS.
> 
> To run, it requires an updated xfs_io with the label command and a
> filesystem that supports it
> 
> A slight change here to _require_xfs_io_command as well, so that tests
> which simply fail with "Inappropriate ioctl" can be caught in the
> common case.
> 
> Signed-off-by: Eric Sandeen 
> ---
> 
> this passes on btrfs, _notruns on xfs/ext4 of yore, and passes
> on xfs w/ my online label patchset (as long as xfs_io has the new
> capability)
> 
> diff --git a/common/rc b/common/rc
> index 9ffab7f..c53a721 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -2158,6 +2158,9 @@ _require_xfs_io_command()
>   echo $testio | grep -q "Inappropriate ioctl" && \
>   _notrun "xfs_io $command support is missing"
>   ;;
> + "label")
> + testio=`$XFS_IO_PROG -c "label" $TEST_DIR 2>&1`
> + ;;
>   "open")
>   # -c "open $f" is broken in xfs_io <= 4.8. Along with the fix,
>   # a new -C flag was introduced to execute one shot commands.
> @@ -2196,7 +2199,7 @@ _require_xfs_io_command()
>   rm -f $testfile 2>&1 > /dev/null
>   echo $testio | grep -q "not found" && \
>   _notrun "xfs_io $command support is missing"
> - echo $testio | grep -q "Operation not supported" && \
> + echo $testio | grep -q "Operation not supported\|Inappropriate ioctl" 
> && \
>   _notrun "xfs_io $command failed (old kernel/wrong fs?)"
>   echo $testio | grep -q "Invalid" && \
>   _notrun "xfs_io $command failed (old kernel/wrong fs/bad args?)"
> diff --git a/tests/generic/485 b/tests/generic/485
> new file mode 100755
> index 000..79902c2
> --- /dev/null
> +++ b/tests/generic/485
> @@ -0,0 +1,99 @@
> +#! /bin/bash
> +# FS QA Test 485
> +#
> +# Test the online filesystem label set/get ioctls
> +#
> +#---
> +# Copyright (c) 2018 Red Hat, Inc.  All Rights Reserved.
> +# Author: Eric Sandeen 
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> + cd /
> + rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch
> +_require_xfs_io_command "label"
> +
> +_scratch_mkfs > $seqres.full 2>&1
> +_scratch_mount
> +
> +# Make sure we can set & clear the label
> +$XFS_IO_PROG -c "label label.$seq" $SCRATCH_MNT
> +$XFS_IO_PROG -c "label" $SCRATCH_MNT
> +
> +# And that userspace can see it now, while mounted
> +blkid -s LABEL $SCRATCH_DEV | _filter_scratch
> +
> +# And that the it is still there when it's unmounted
> +_scratch_unmount
> +blkid -s LABEL $SCRATCH_DEV | _filter_scratch
> +
> +# And that it persists after a remount
> +_scratch_mount
> +$XFS_IO_PROG -c "label" $SCRATCH_MNT
> +
> +# And that a too-long label is rejected, beyond the interface max:
> +LABEL=$(perl -e "print 'l' x 257;")
> +$XFS_IO_PROG -c "label $LABEL" $SCRATCH_MNT
> +
> +# And that it succeeds right at the filesystem max:
> +case $FSTYP in
> +xfs)
> + MAXLEN=12;
> + ;;
> +btrfs)
> + MAXLEN=256

Seems this should be 255, otherwise I got failure like:

-label = "MAXLABEL"
+label: Invalid argument

and MAXLEN=255 makes the test pass with btrfs.

> + ;;
> +*)
> + MAXLEN=256
> + echo "Your filesystem supports online label, please add max length"

Perhaps we can introduce a new helper similar to _require_acl_get_max()
and _notrun the test if current $FSTYP doesn't define a maxlen on
filesystem label?

> + ;;
> +esac
> +LABEL=$(perl -e "print 'o' x $MAXLEN;")
> +$XFS_IO_PROG -c "label $LABEL" $SCRATCH_MNT | sed -e 's/o\+/MAXLABEL/'
> +
> +# And that it fails just past the filesystem max:
> +let 

[PATCH] Btrfs: fix duplicate extents after fsync of file with prealloc extents

2018-05-09 Thread fdmanana
From: Filipe Manana 

In commit 471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size
after fsync log replay"), on fsync,  we started to always log all prealloc
extents beyond an inode's i_size in order to avoid losing them after a
power failure. However under some cases this can lead to the log replay
code to create duplicate extent items, with different lengths, in the
extent tree. That happens because, as of that commit, we can now log
extent items based on extent maps that are not on the "modified" list
of extent maps of the inode's extent map tree. Logging extent items based
on extent maps is used during the fast fsync path to save time and for
this to work reliably it requires that the extent maps are not merged
with other adjacent extent maps - having the extent maps in the list
of modified extents gives such guarantee.

Consider the following example, captured during a long run of fsstress,
which illustrates this problem.

We have inode 271, in the filesystem tree (root 5), for which all of the
following operations and discussion apply to.

A buffered write starts at offset 312391 with a length of 933471 bytes
(end offset at 1245862). At this point we have, for this inode, the
following extent maps with the their field values:

em A, start 0, orig_start 0, len 40960, block_start 18446744073709551613,
  block_len 0, orig_block_len 0
em B, start 40960, orig_start 40960, len 376832, block_start 1106399232,
  block_len 376832, orig_block_len 376832
em C, start 417792, orig_start 417792, len 782336, block_start
  18446744073709551613, block_len 0, orig_block_len 0
em D, start 1200128, orig_start 1200128, len 835584, block_start
  1106776064, block_len 835584, orig_block_len 835584
em E, start 2035712, orig_start 2035712, len 245760, block_start
  1107611648, block_len 245760, orig_block_len 245760

Extent map A corresponds to a hole and extent maps D and E correspond to
preallocated extents.

Extent map D ends where extent map E begins (1106776064 + 835584 =
1107611648), but these extent maps were not merged because they are in
the inode's list of modified extent maps.

An fsync against this inode is made, which triggers the fast path
(BTRFS_INODE_NEEDS_FULL_SYNC is not set). This fsync triggers writeback
of the data previously written using buffered IO, and when the respective
ordered extent finishes, btrfs_drop_extents() is called against the
(aligned) range 311296..1249279. This causes a split of extent map D at
btrfs_drop_extent_cache(), replacing extent map D with a new extent map
D', also added to the list of modified extents,  with the following
values:

em D', start 1249280, orig_start of 1200128,
   block_start 1106825216 (= 1106776064 + 1249280 - 1200128),
   orig_block_len 835584,
   block_len 786432 (835584 - (1249280 - 1200128))

Then, during the fast fsync, btrfs_log_changed_extents() is called and
extent maps D' and E are removed from the list of modified extents. The
flag EXTENT_FLAG_LOGGING is also set on them. After the extents are logged
clear_em_logging() is called on each of them, and that makes extent map E
to be merged with extent map D' (try_merge_map()), resulting in D' being
deleted and E adjusted to:

em E, start 1249280, orig_start 1200128, len 1032192,
  block_start 1106825216, block_len 1032192,
  orig_block_len 245760

A direct IO write at offset 1847296 and length of 360448 bytes (end offset
at 2207744) starts, and at that moment the following extent maps exist for
our inode:

em A, start 0, orig_start 0, len 40960, block_start 18446744073709551613,
  block_len 0, orig_block_len 0
em B, start 40960, orig_start 40960, len 270336, block_start 1106399232,
  block_len 270336, orig_block_len 376832
em C, start 311296, orig_start 311296, len 937984, block_start 1112842240,
  block_len 937984, orig_block_len 937984
em E (prealloc), start 1249280, orig_start 1200128, len 1032192,
  block_start 1106825216, block_len 1032192, orig_block_len 245760

The dio write results in drop_extent_cache() being called twice. The first
time for a range that starts at offset 1847296 and ends at offset 2035711
(length of 188416), which results in a double split of extent map E,
replacing it with two new extent maps:

em F, start 1249280, orig_start 1200128, block_start 1106825216,
  block_len 598016, orig_block_len 598016
em G, start 2035712, orig_start 1200128, block_start 1107611648,
  block_len 245760, orig_block_len 1032192

It also creates a new extent map that represents a part of the requested
IO (through create_io_em()):

em H, start 1847296, len 188416, block_start 1107423232, block_len 188416

The second call to drop_extent_cache() has a range with a start offset of
2035712 and end offset of 2207743 (length of 172032). This leads to
replacing extent map G with a new extent map I with the following values:

em I, start 2207744, orig_start 1200128, block_start 1107783680,
  block_len 73728, 

Re: [PATCH] btrfs: return original error code when failing from option parsing

2018-05-09 Thread David Sterba
On Wed, May 09, 2018 at 09:08:23PM +0800, Chengguang Xu wrote:
> It's no good to overwrite -ENOMEM using -EINVAL when failing
> from mount option parsing, so just return original error code.
> 
> Signed-off-by: Chengguang Xu 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 4/8] btrfs: rename btrfs_flags_to_ioctl to reflect which flags it touches

2018-05-09 Thread David Sterba
Converts btrfs_inode::flags to the FS_*_FL flags.

Signed-off-by: David Sterba 
---
 fs/btrfs/ioctl.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0b84f9e68f86..40eeb99310d5 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -105,9 +105,10 @@ static unsigned int btrfs_mask_fsflags_for_type(struct 
inode *inode,
 }
 
 /*
- * Export inode flags to the format expected by the FS_IOC_GETFLAGS ioctl.
+ * Export internal inode flags to the format expected by the FS_IOC_GETFLAGS
+ * ioctl.
  */
-static unsigned int btrfs_flags_to_ioctl(unsigned int flags)
+static unsigned int btrfs_inode_flags_to_fsflags(unsigned int flags)
 {
unsigned int iflags = 0;
 
@@ -161,7 +162,7 @@ void btrfs_sync_inode_flags_to_i_flags(struct inode *inode)
 static int btrfs_ioctl_getflags(struct file *file, void __user *arg)
 {
struct btrfs_inode *ip = BTRFS_I(file_inode(file));
-   unsigned int flags = btrfs_flags_to_ioctl(ip->flags);
+   unsigned int flags = btrfs_inode_flags_to_fsflags(ip->flags);
 
if (copy_to_user(arg, , sizeof(flags)))
return -EFAULT;
@@ -221,7 +222,7 @@ static int btrfs_ioctl_setflags(struct file *file, void 
__user *arg)
mode = inode->i_mode;
 
flags = btrfs_mask_fsflags_for_type(inode, flags);
-   oldflags = btrfs_flags_to_ioctl(ip->flags);
+   oldflags = btrfs_inode_flags_to_fsflags(ip->flags);
if ((flags ^ oldflags) & (FS_APPEND_FL | FS_IMMUTABLE_FL)) {
if (!capable(CAP_LINUX_IMMUTABLE)) {
ret = -EPERM;
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/8] btrfs: rename check_flags to reflect which flags it touches

2018-05-09 Thread David Sterba
The FS_*_FL flags cannot be easily identified by a prefix but we still
need to recognize them so the 'fsflags' should be closer to the naming
scheme but again the 'fs' part sounds like it's a filesystem flag. I
don't have a better idea for now.

Signed-off-by: David Sterba 
---
 fs/btrfs/ioctl.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index e93dc3a6f554..0b84f9e68f86 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -168,7 +168,8 @@ static int btrfs_ioctl_getflags(struct file *file, void 
__user *arg)
return 0;
 }
 
-static int check_flags(unsigned int flags)
+/* Check if @flags are a supported and valid set of FS_*_FL flags */
+static int check_fsflags(unsigned int flags)
 {
if (flags & ~(FS_IMMUTABLE_FL | FS_APPEND_FL | \
  FS_NOATIME_FL | FS_NODUMP_FL | \
@@ -205,7 +206,7 @@ static int btrfs_ioctl_setflags(struct file *file, void 
__user *arg)
if (copy_from_user(, arg, sizeof(flags)))
return -EFAULT;
 
-   ret = check_flags(flags);
+   ret = check_fsflags(flags);
if (ret)
return ret;
 
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 8/8] btrfs: unify naming of flags variables for SETFLAGS and XFLAGS

2018-05-09 Thread David Sterba
* The simple 'flags' refer to the btrfs inode
* ... that's in 'binode
* the FS_*_FL variables are 'fsflags'
* the old copies of the variable are prefixed by 'old_'
* Struct inode flags contain 'i_flags'.

Signed-off-by: David Sterba 
---
 fs/btrfs/ioctl.c | 106 +++
 1 file changed, 53 insertions(+), 53 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 838a7a80e6b0..c1904d9d99e2 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -140,18 +140,18 @@ static unsigned int btrfs_inode_flags_to_fsflags(unsigned 
int flags)
  */
 void btrfs_sync_inode_flags_to_i_flags(struct inode *inode)
 {
-   struct btrfs_inode *ip = BTRFS_I(inode);
+   struct btrfs_inode *binode = BTRFS_I(inode);
unsigned int new_fl = 0;
 
-   if (ip->flags & BTRFS_INODE_SYNC)
+   if (binode->flags & BTRFS_INODE_SYNC)
new_fl |= S_SYNC;
-   if (ip->flags & BTRFS_INODE_IMMUTABLE)
+   if (binode->flags & BTRFS_INODE_IMMUTABLE)
new_fl |= S_IMMUTABLE;
-   if (ip->flags & BTRFS_INODE_APPEND)
+   if (binode->flags & BTRFS_INODE_APPEND)
new_fl |= S_APPEND;
-   if (ip->flags & BTRFS_INODE_NOATIME)
+   if (binode->flags & BTRFS_INODE_NOATIME)
new_fl |= S_NOATIME;
-   if (ip->flags & BTRFS_INODE_DIRSYNC)
+   if (binode->flags & BTRFS_INODE_DIRSYNC)
new_fl |= S_DIRSYNC;
 
set_mask_bits(>i_flags,
@@ -161,8 +161,8 @@ void btrfs_sync_inode_flags_to_i_flags(struct inode *inode)
 
 static int btrfs_ioctl_getflags(struct file *file, void __user *arg)
 {
-   struct btrfs_inode *ip = BTRFS_I(file_inode(file));
-   unsigned int flags = btrfs_inode_flags_to_fsflags(ip->flags);
+   struct btrfs_inode *binode = BTRFS_I(file_inode(file));
+   unsigned int flags = btrfs_inode_flags_to_fsflags(binode->flags);
 
if (copy_to_user(arg, , sizeof(flags)))
return -EFAULT;
@@ -189,13 +189,13 @@ static int btrfs_ioctl_setflags(struct file *file, void 
__user *arg)
 {
struct inode *inode = file_inode(file);
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-   struct btrfs_inode *ip = BTRFS_I(inode);
-   struct btrfs_root *root = ip->root;
+   struct btrfs_inode *binode = BTRFS_I(inode);
+   struct btrfs_root *root = binode->root;
struct btrfs_trans_handle *trans;
-   unsigned int flags, oldflags;
+   unsigned int fsflags, old_fsflags;
int ret;
-   u64 ip_oldflags;
-   unsigned int i_oldflags;
+   u64 old_flags;
+   unsigned int old_i_flags;
umode_t mode;
 
if (!inode_owner_or_capable(inode))
@@ -204,10 +204,10 @@ static int btrfs_ioctl_setflags(struct file *file, void 
__user *arg)
if (btrfs_root_readonly(root))
return -EROFS;
 
-   if (copy_from_user(, arg, sizeof(flags)))
+   if (copy_from_user(, arg, sizeof(fsflags)))
return -EFAULT;
 
-   ret = check_fsflags(flags);
+   ret = check_fsflags(fsflags);
if (ret)
return ret;
 
@@ -217,44 +217,44 @@ static int btrfs_ioctl_setflags(struct file *file, void 
__user *arg)
 
inode_lock(inode);
 
-   ip_oldflags = ip->flags;
-   i_oldflags = inode->i_flags;
+   old_flags = binode->flags;
+   old_i_flags = inode->i_flags;
mode = inode->i_mode;
 
-   flags = btrfs_mask_fsflags_for_type(inode, flags);
-   oldflags = btrfs_inode_flags_to_fsflags(ip->flags);
-   if ((flags ^ oldflags) & (FS_APPEND_FL | FS_IMMUTABLE_FL)) {
+   fsflags = btrfs_mask_fsflags_for_type(inode, fsflags);
+   old_fsflags = btrfs_inode_flags_to_fsflags(binode->flags);
+   if ((fsflags ^ old_fsflags) & (FS_APPEND_FL | FS_IMMUTABLE_FL)) {
if (!capable(CAP_LINUX_IMMUTABLE)) {
ret = -EPERM;
goto out_unlock;
}
}
 
-   if (flags & FS_SYNC_FL)
-   ip->flags |= BTRFS_INODE_SYNC;
+   if (fsflags & FS_SYNC_FL)
+   binode->flags |= BTRFS_INODE_SYNC;
else
-   ip->flags &= ~BTRFS_INODE_SYNC;
-   if (flags & FS_IMMUTABLE_FL)
-   ip->flags |= BTRFS_INODE_IMMUTABLE;
+   binode->flags &= ~BTRFS_INODE_SYNC;
+   if (fsflags & FS_IMMUTABLE_FL)
+   binode->flags |= BTRFS_INODE_IMMUTABLE;
else
-   ip->flags &= ~BTRFS_INODE_IMMUTABLE;
-   if (flags & FS_APPEND_FL)
-   ip->flags |= BTRFS_INODE_APPEND;
+   binode->flags &= ~BTRFS_INODE_IMMUTABLE;
+   if (fsflags & FS_APPEND_FL)
+   binode->flags |= BTRFS_INODE_APPEND;
else
-   ip->flags &= ~BTRFS_INODE_APPEND;
-   if (flags & FS_NODUMP_FL)
-   ip->flags |= BTRFS_INODE_NODUMP;
+   binode->flags &= ~BTRFS_INODE_APPEND;
+   if (fsflags & FS_NODUMP_FL)
+ 

[PATCH v2 6/8] btrfs: add FS_IOC_FSGETXATTR ioctl

2018-05-09 Thread David Sterba
The new ioctl is an extension to the FS_IOC_GETFLAGS and adds new
flags and is extensible. This patch allows to return the xflags portion
of the fsxattr structure, other items have no meaning for btrfs or can
be added later.

The original patch was written by Chandan Jay Sharma but was incomplete
and no further revision has been sent. Several cleanups were necessary
to avoid confusion with other ioctls, as we have another flavor of
flags.

Based-on-patches-by: Chandan Jay Sharma 
Signed-off-by: David Sterba 
---
 fs/btrfs/ioctl.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 54dfcb851ec2..a3bb59ac5666 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -370,6 +370,24 @@ static int check_xflags(unsigned int flags)
return 0;
 }
 
+/*
+ * Set the xflags from the internal inode flags. The remaining items of fsxattr
+ * are zeroed.
+ */
+static int btrfs_ioctl_fsgetxattr(struct file *file, void __user *arg)
+{
+   struct btrfs_inode *binode = BTRFS_I(file_inode(file));
+   struct fsxattr fa;
+
+   memset(, 0, sizeof(fa));
+   fa.fsx_xflags = btrfs_inode_flags_to_xflags(binode->flags);
+
+   if (copy_to_user(arg, , sizeof(fa)))
+   return -EFAULT;
+
+   return 0;
+}
+
 static int btrfs_ioctl_getversion(struct file *file, int __user *arg)
 {
struct inode *inode = file_inode(file);
@@ -5600,6 +5618,8 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_get_features(file, argp);
case BTRFS_IOC_SET_FEATURES:
return btrfs_ioctl_set_features(file, argp);
+   case FS_IOC_FSGETXATTR:
+   return btrfs_ioctl_fsgetxattr(file, argp);
}
 
return -ENOTTY;
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 5/8] btrfs: add helpers for FS_XFLAG_* conversion

2018-05-09 Thread David Sterba
Preparatory work for the FS_IOC_FSGETXATTR ioctl, basic conversions and
checking helpers.

Signed-off-by: David Sterba 
---
 fs/btrfs/ioctl.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 40eeb99310d5..54dfcb851ec2 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -338,6 +338,38 @@ static int btrfs_ioctl_setflags(struct file *file, void 
__user *arg)
return ret;
 }
 
+/*
+ * Translate btrfs internal inode flags to xflags as expected by the
+ * FS_IOC_FSGETXATT ioctl. Filter only the supported ones, unknown flags are
+ * silently dropped.
+ */
+static unsigned int btrfs_inode_flags_to_xflags(unsigned int flags)
+{
+   unsigned int xflags = 0;
+
+   if (flags & BTRFS_INODE_APPEND)
+   xflags |= FS_XFLAG_APPEND;
+   if (flags & BTRFS_INODE_IMMUTABLE)
+   xflags |= FS_XFLAG_IMMUTABLE;
+   if (flags & BTRFS_INODE_NOATIME)
+   xflags |= FS_XFLAG_NOATIME;
+   if (flags & BTRFS_INODE_NODUMP)
+   xflags |= FS_XFLAG_NODUMP;
+   if (flags & BTRFS_INODE_SYNC)
+   xflags |= FS_XFLAG_SYNC;
+
+   return xflags;
+}
+
+/* Check if @flags are a supported and valid set of FS_XFLAGS_* flags */
+static int check_xflags(unsigned int flags)
+{
+   if (flags & ~(FS_XFLAG_APPEND | FS_XFLAG_IMMUTABLE | FS_XFLAG_NOATIME |
+ FS_XFLAG_NODUMP | FS_XFLAG_SYNC))
+   return -EOPNOTSUPP;
+   return 0;
+}
+
 static int btrfs_ioctl_getversion(struct file *file, int __user *arg)
 {
struct inode *inode = file_inode(file);
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 7/8] btrfs: add FS_IOC_FSSETXATTR ioctl

2018-05-09 Thread David Sterba
The new ioctl is an extension to the FS_IOC_SETFLAGS and adds new
flags and is extensible. Don't get fooled by the XATTR in the name, it
does not have anything in common with the extended attributes,
incidentally also abbreviated as XATTRs.

This patch allows to set the xflags portion of the fsxattr structure,
other items have no meaning and non-zero values will result in
EOPNOTSUPP.

Currently supported xflags:

- APPEND
- IMMUTABLE
- NOATIME
- NODUMP
- SYNC

The structure of btrfs_ioctl_fssetxattr copies btrfs_ioctl_setflags but
is simpler on the flag setting side.

The original patch was written by Chandan Jay Sharma but was incomplete
and no further revision has been sent.

Based-on-patches-by: Chandan Jay Sharma 
Signed-off-by: David Sterba 
---
 fs/btrfs/ioctl.c | 94 
 1 file changed, 94 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a3bb59ac5666..838a7a80e6b0 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -388,6 +388,98 @@ static int btrfs_ioctl_fsgetxattr(struct file *file, void 
__user *arg)
return 0;
 }
 
+static int btrfs_ioctl_fssetxattr(struct file *file, void __user *arg)
+{
+   struct inode *inode = file_inode(file);
+   struct btrfs_inode *binode = BTRFS_I(inode);
+   struct btrfs_root *root = binode->root;
+   struct btrfs_trans_handle *trans;
+   struct fsxattr fa;
+   unsigned old_flags;
+   unsigned old_i_flags;
+   int ret = 0;
+
+   if (!inode_owner_or_capable(inode))
+   return -EPERM;
+
+   if (btrfs_root_readonly(root))
+   return -EROFS;
+
+   memset(, 0, sizeof(fa));
+   if (copy_from_user(, arg, sizeof(fa)))
+   return -EFAULT;
+
+   ret = check_xflags(fa.fsx_xflags);
+   if (ret)
+   return ret;
+
+   if (fa.fsx_extsize != 0 || fa.fsx_projid != 0 || fa.fsx_cowextsize != 0)
+   return -EOPNOTSUPP;
+
+   ret = mnt_want_write_file(file);
+   if (ret)
+   return ret;
+
+   inode_lock(inode);
+
+   old_flags = binode->flags;
+   old_i_flags = inode->i_flags;
+
+   /* We need the capabilities to change append-only or immutable inode */
+   if (((old_flags & (BTRFS_INODE_APPEND | BTRFS_INODE_IMMUTABLE)) ||
+(fa.fsx_xflags & (FS_XFLAG_APPEND | FS_XFLAG_IMMUTABLE))) &&
+   !capable(CAP_LINUX_IMMUTABLE)) {
+   ret = -EPERM;
+   goto out_unlock;
+   }
+
+   if (fa.fsx_xflags & FS_XFLAG_SYNC)
+   binode->flags |= BTRFS_INODE_SYNC;
+   else
+   binode->flags &= ~BTRFS_INODE_SYNC;
+   if (fa.fsx_xflags & FS_XFLAG_IMMUTABLE)
+   binode->flags |= BTRFS_INODE_IMMUTABLE;
+   else
+   binode->flags &= ~BTRFS_INODE_IMMUTABLE;
+   if (fa.fsx_xflags & FS_XFLAG_APPEND)
+   binode->flags |= BTRFS_INODE_APPEND;
+   else
+   binode->flags &= ~BTRFS_INODE_APPEND;
+   if (fa.fsx_xflags & FS_XFLAG_NODUMP)
+   binode->flags |= BTRFS_INODE_NODUMP;
+   else
+   binode->flags &= ~BTRFS_INODE_NODUMP;
+   if (fa.fsx_xflags & FS_XFLAG_NOATIME)
+   binode->flags |= BTRFS_INODE_NOATIME;
+   else
+   binode->flags &= ~BTRFS_INODE_NOATIME;
+
+   /* 1 item for the inode */
+   trans = btrfs_start_transaction(root, 1);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   goto out_unlock;
+   }
+
+   btrfs_sync_inode_flags_to_i_flags(inode);
+   inode_inc_iversion(inode);
+   inode->i_ctime = current_time(inode);
+   ret = btrfs_update_inode(trans, root, inode);
+
+   btrfs_end_transaction(trans);
+
+out_unlock:
+   if (ret) {
+   binode->flags = old_flags;
+   inode->i_flags = old_i_flags;
+   }
+
+   inode_unlock(inode);
+   mnt_drop_write_file(file);
+
+   return ret;
+}
+
 static int btrfs_ioctl_getversion(struct file *file, int __user *arg)
 {
struct inode *inode = file_inode(file);
@@ -5620,6 +5712,8 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_set_features(file, argp);
case FS_IOC_FSGETXATTR:
return btrfs_ioctl_fsgetxattr(file, argp);
+   case FS_IOC_FSSETXATTR:
+   return btrfs_ioctl_fssetxattr(file, argp);
}
 
return -ENOTTY;
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/8] btrfs: rename btrfs_update_iflags to reflect which flags it touches

2018-05-09 Thread David Sterba
The btrfs inode flag flavour is now simply called 'inode flags' and the
vfs inode are i_flags. Also rename the internal variable to something
less confusing than 'ip'.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.h | 2 +-
 fs/btrfs/inode.c | 4 ++--
 fs/btrfs/ioctl.c | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2771cc56a622..d738eaa1d6e1 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3260,7 +3260,7 @@ void btrfs_test_inode_set_ops(struct inode *inode);
 long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
 long btrfs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long 
arg);
 int btrfs_ioctl_get_supported_features(void __user *arg);
-void btrfs_update_iflags(struct inode *inode);
+void btrfs_sync_inode_flags_to_i_flags(struct inode *inode);
 int btrfs_is_empty_uuid(u8 *uuid);
 int btrfs_defrag_file(struct inode *inode, struct file *file,
  struct btrfs_ioctl_defrag_range_args *range,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d241285a0d2a..c30afabcb6b0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3924,7 +3924,7 @@ static int btrfs_read_locked_inode(struct inode *inode)
break;
}
 
-   btrfs_update_iflags(inode);
+   btrfs_sync_inode_flags_to_i_flags(inode);
return 0;
 
 make_bad:
@@ -6263,7 +6263,7 @@ static void btrfs_inherit_iflags(struct inode *inode, 
struct inode *dir)
BTRFS_I(inode)->flags |= BTRFS_INODE_NODATASUM;
}
 
-   btrfs_update_iflags(inode);
+   btrfs_sync_inode_flags_to_i_flags(inode);
 }
 
 static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 632e26d6f7ce..11edbdf3df7d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -136,7 +136,7 @@ static unsigned int btrfs_flags_to_ioctl(unsigned int flags)
 /*
  * Update inode->i_flags based on the btrfs internal flags.
  */
-void btrfs_update_iflags(struct inode *inode)
+void btrfs_sync_inode_flags_to_i_flags(struct inode *inode)
 {
struct btrfs_inode *ip = BTRFS_I(inode);
unsigned int new_fl = 0;
@@ -317,7 +317,7 @@ static int btrfs_ioctl_setflags(struct file *file, void 
__user *arg)
goto out_drop;
}
 
-   btrfs_update_iflags(inode);
+   btrfs_sync_inode_flags_to_i_flags(inode);
inode_inc_iversion(inode);
inode->i_ctime = current_time(inode);
ret = btrfs_update_inode(trans, root, inode);
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/8] Add ioctl to support extended inode flags

2018-05-09 Thread David Sterba
Changes for v2:
 - dropped renaming of variable from patch "btrfs: rename
   btrfs_flags_to_ioctl to reflect which flags it touches"
 - fixed the error handling in "btrfs: add FS_IOC_FSSETXATTR ioctl"
 - new patch to unify naming of some local variables


I'm about to add this patchset to the main queue for 4.18, unless there
are objections.


The patchset implements the existing VFS ioctls for reading extended
ioctl flags by btrfs.

There are many flags/attributes/extended/combined, the naming is
confusing, so let's recap what we have:

* generic VFS inode flags (i_flags)
  - S_* namespace
  https://elixir.bootlin.com/linux/v4.17-rc1/source/include/linux/fs.h#L1850
  - FS_IOC_GETFLAGS, FS_IOC_SETFLAGS
  - tools: lsatrr, chattr

* btrfs inode flags, on-disk format, independent of the above, with
  to/from conversions
  https://elixir.bootlin.com/linux/v4.17-rc1/source/fs/btrfs/ctree.h#L1416

* extended attributes, also called XATTR, but they're different entity,
  stored by an inode as key:value pairs
  - tools: getfattr, setfattr

* XFLAGs, another interface to the generic S_* flags, new ioctl added
  because [GS]ETFLAGS is frozen, new bits added, eg. for project quotas
  or DAX, and more that originate in XFS features
  https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/fs.h#L168
  - tools: xfs_io -c lsattr, xfs_io -c chattr

In the future, btrfs will probably get:

- nodefrag -- eg. to disable autodefrag or defrag ioctl
- nosymlinks -- for directories, prevent creating new symlinks
- dax

git://github.com/kdave/btrfs-devel dev/xflags

David Sterba (8):
  btrfs: rename btrfs_update_iflags to reflect which flags it touches
  btrfs: rename btrfs_mask_flags to reflect which flags it touches
  btrfs: rename check_flags to reflect which flags it touches
  btrfs: rename btrfs_flags_to_ioctl to reflect which flags it touches
  btrfs: add helpers for FS_XFLAG_* conversion
  btrfs: add FS_IOC_FSGETXATTR ioctl
  btrfs: add FS_IOC_FSSETXATTR ioctl
  btrfs: unify naming of flags variables for SETFLAGS and XFLAGS

 fs/btrfs/ctree.h |   2 +-
 fs/btrfs/inode.c |   4 +-
 fs/btrfs/ioctl.c | 271 ++-
 3 files changed, 213 insertions(+), 64 deletions(-)

-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/8] btrfs: rename btrfs_mask_flags to reflect which flags it touches

2018-05-09 Thread David Sterba
The FS_*_FL flags cannot be easily identified by a variable name prefix
but we still need to recognize them so the 'fsflags' should be closer to
the naming scheme but again the 'fs' part sounds like it's a filesystem
flag. I don't have a better idea for now.

Signed-off-by: David Sterba 
---
 fs/btrfs/ioctl.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 11edbdf3df7d..e93dc3a6f554 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -93,11 +93,12 @@ static int btrfs_clone(struct inode *src, struct inode 
*inode,
   int no_time_update);
 
 /* Mask out flags that are inappropriate for the given type of inode. */
-static unsigned int btrfs_mask_flags(umode_t mode, unsigned int flags)
+static unsigned int btrfs_mask_fsflags_for_type(struct inode *inode,
+   unsigned int flags)
 {
-   if (S_ISDIR(mode))
+   if (S_ISDIR(inode->i_mode))
return flags;
-   else if (S_ISREG(mode))
+   else if (S_ISREG(inode->i_mode))
return flags & ~FS_DIRSYNC_FL;
else
return flags & (FS_NODUMP_FL | FS_NOATIME_FL);
@@ -218,7 +219,7 @@ static int btrfs_ioctl_setflags(struct file *file, void 
__user *arg)
i_oldflags = inode->i_flags;
mode = inode->i_mode;
 
-   flags = btrfs_mask_flags(inode->i_mode, flags);
+   flags = btrfs_mask_fsflags_for_type(inode, flags);
oldflags = btrfs_flags_to_ioctl(ip->flags);
if ((flags ^ oldflags) & (FS_APPEND_FL | FS_IMMUTABLE_FL)) {
if (!capable(CAP_LINUX_IMMUTABLE)) {
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: qgroup: Finish rescan when hit the last leaf of extent tree

2018-05-09 Thread Qu Wenruo


On 2018年05月09日 21:00, David Sterba wrote:
> On Fri, May 04, 2018 at 01:56:59PM +0800, Qu Wenruo wrote:
>> Under the following case, qgroup rescan can double account cowed tree
>> blocks:
>>
>> In this case, extent tree only has one tree block.
>>
>> -
>> | transid=5 last committed=4
>> | btrfs_qgroup_rescan_worker()
>> | |- btrfs_start_transaction()
>> | |  transid = 5
>> | |- qgroup_rescan_leaf()
>> ||- btrfs_search_slot_for_read() on extent tree
>> |   Get the only extent tree block from commit root (transid = 4).
>> |   Scan it, set qgroup_rescan_progress to the last
>> |   EXTENT/META_ITEM + 1
>> |   now qgroup_rescan_progress = A + 1.
>> |
>> | fs tree get CoWed, new tree block is at A + 16K
>> | transid 5 get committed
>> -
>> | transid=6 last committed=5
>> | btrfs_qgroup_rescan_worker()
>> | btrfs_qgroup_rescan_worker()
>> | |- btrfs_start_transaction()
>> | |  transid = 5
>> | |- qgroup_rescan_leaf()
>> ||- btrfs_search_slot_for_read() on extent tree
>> |   Get the only extent tree block from commit root (transid = 5).
>> |   scan it using qgroup_rescan_progress (A + 1).
>> |   found new tree block beyong A, and it's fs tree block,
>> |   account it to increase qgroup numbers.
>> -
>>
>> In above case, tree block A, and tree block A + 16K get accounted twice,
>> while qgroup rescan should stop when it already reach the last leaf,
>> other than continue using its qgroup_rescan_progress.
>>
>> Such case could happen by just looping btrfs/017 and with some
>> possibility it can hit such double qgroup accounting problem.
>>
>> Fix it by checking the path to determine if we should finish qgroup
>> rescan, other than relying on next loop to exit.
>>
>> Reported-by: Nikolay Borisov 
>> Signed-off-by: Qu Wenruo 
> 
> Is this something for 4.17-rc ?

No need to hurry.
It would be OK for next release since it's not some regression, but a
long existing bug.

Thanks,
Qu

> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



signature.asc
Description: OpenPGP digital signature


[PATCH] btrfs: return original error code when failing from option parsing

2018-05-09 Thread Chengguang Xu
It's no good to overwrite -ENOMEM using -EINVAL when failing
from mount option parsing, so just return original error code.

Signed-off-by: Chengguang Xu 
---
 fs/btrfs/super.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 0628092..ae6447d 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1783,7 +1783,6 @@ static int btrfs_remount(struct super_block *sb, int 
*flags, char *data)
 
ret = btrfs_parse_options(fs_info, data, *flags);
if (ret) {
-   ret = -EINVAL;
goto restore;
}
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: qgroup: Search commit root for rescan to avoid missing extent

2018-05-09 Thread David Sterba
On Thu, May 03, 2018 at 03:20:52PM +0800, Qu Wenruo wrote:
> When doing qgroup rescan using the following script (modified from
> btrfs/017 test case), we can sometimes hit qgroup corruption.
> 
> --
> umount $dev &> /dev/null
> umount $mnt &> /dev/null
> 
> mkfs.btrfs -f -n 64k $dev
> mount $dev $mnt
> 
> extent_size=8192
> 
> xfs_io -f -d -c "pwrite 0 $extent_size" $mnt/foo > /dev/null
> btrfs subvolume snapshot $mnt $mnt/snap
> 
> xfs_io -f -c "reflink $mnt/foo" $mnt/foo-reflink > /dev/null
> xfs_io -f -c "reflink $mnt/foo" $mnt/snap/foo-reflink > /dev/null
> xfs_io -f -c "reflink $mnt/foo" $mnt/snap/foo-reflink2 > /dev/unll
> btrfs quota enable $mnt
> 
>  # -W is the new option to only wait rescan while not starting new one
> btrfs quota rescan -W $mnt
> btrfs qgroup show -prce $mnt
> 
>  # Need to patch btrfs-progs to report qgroup mismatch as error
> btrfs check $dev || _fail
> --
> 
> For fast machine, we can hit some corruption which missed accounting
> tree blocks:
> --
> qgroupid rfer excl max_rfer max_excl parent  child
>      --  -
> 0/5   8.00KiB0.00B none none --- ---
> 0/257 8.00KiB0.00B none none --- ---
> --
> 
> This is due to the fact that we're always searching commit root for
> btrfs_find_all_roots() at qgroup_rescan_leaf(), but the leaf we get is
> from current transaction, not commit root.
> 
> And if our tree blocks get modified in current transaction, we won't
> find any owner in commit root, thus causing the corruption.
> 
> Fix it by searching commit root for extent tree for
> qgroup_rescan_leaf().
> 
> Reported-by: Nikolay Borisov 
> Signed-off-by: Qu Wenruo 

Added to misc-next, thanks.

> ---
> 
> Please keep in mind that it is possible to hit another type of race
> which double accounting tree blocks:
> --
> qgroupid rfer excl max_rfer max_excl parent  child
>      --  -
> 0/5  136.00KiB 128.00KiB none none --- ---
> 0/257136.00KiB 128.00KiB none none --- ---
> --
> For this type of corruption, this patch could reduce the possibility,
> but the root cause is race between transaction commit and qgroup rescan,
> which needs to be addressed in another patch.

Both patches are now in misc-next, I saw the btrfs/017 failures
occasionally so will watch if it's all ok now.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: qgroup: Finish rescan when hit the last leaf of extent tree

2018-05-09 Thread David Sterba
On Fri, May 04, 2018 at 01:56:59PM +0800, Qu Wenruo wrote:
> Under the following case, qgroup rescan can double account cowed tree
> blocks:
> 
> In this case, extent tree only has one tree block.
> 
> -
> | transid=5 last committed=4
> | btrfs_qgroup_rescan_worker()
> | |- btrfs_start_transaction()
> | |  transid = 5
> | |- qgroup_rescan_leaf()
> ||- btrfs_search_slot_for_read() on extent tree
> |   Get the only extent tree block from commit root (transid = 4).
> |   Scan it, set qgroup_rescan_progress to the last
> |   EXTENT/META_ITEM + 1
> |   now qgroup_rescan_progress = A + 1.
> |
> | fs tree get CoWed, new tree block is at A + 16K
> | transid 5 get committed
> -
> | transid=6 last committed=5
> | btrfs_qgroup_rescan_worker()
> | btrfs_qgroup_rescan_worker()
> | |- btrfs_start_transaction()
> | |  transid = 5
> | |- qgroup_rescan_leaf()
> ||- btrfs_search_slot_for_read() on extent tree
> |   Get the only extent tree block from commit root (transid = 5).
> |   scan it using qgroup_rescan_progress (A + 1).
> |   found new tree block beyong A, and it's fs tree block,
> |   account it to increase qgroup numbers.
> -
> 
> In above case, tree block A, and tree block A + 16K get accounted twice,
> while qgroup rescan should stop when it already reach the last leaf,
> other than continue using its qgroup_rescan_progress.
> 
> Such case could happen by just looping btrfs/017 and with some
> possibility it can hit such double qgroup accounting problem.
> 
> Fix it by checking the path to determine if we should finish qgroup
> rescan, other than relying on next loop to exit.
> 
> Reported-by: Nikolay Borisov 
> Signed-off-by: Qu Wenruo 

Is this something for 4.17-rc ?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] btrfs-progs: tests: test mkfs.btrfs fails on small backing size thin provision device

2018-05-09 Thread David Sterba
On Thu, Apr 12, 2018 at 09:10:28AM +0800, Su Yue wrote:
> This tests is most similar to xfstests generic/405.
> It calls device mapper to create a thin provision device with small
> backing size and big virtual size. mkfs.btrfs should fail on such
> devices.
> 
> This test should pass after commit e805b143a4fe
> ("btrfs-progs: mkfs: return nozero value on thin provisioned device").
> 
> Signed-off-by: Su Yue 

Applied with some tweaks, thanks.

> + # wait for device to be fully settled
> + run_check $SUDO_HELPER udevadm settle
> + run_check $SUDO_HELPER dmsetup remove "$DMTHIN_VOL_NAME"
> + run_check $SUDO_HELPER dmsetup remove "$DMTHIN_POOL_NAME"
> + run_check $SUDO_HELPER dmsetup remove "$DMTHIN_META_NAME"
> + run_check $SUDO_HELPER dmsetup remove "$DMTHIN_DATA_NAME"

extrenal utilities should be checked by check_global_prereq

> +# mkfs.btrfs should fail due to the small backing device
> +run_mustfail "should fail for small backing size thin provision device" \
> +  $SUDO_HELPER "$TOP/mkfs.btrfs" -f "$DMTHIN_VOL_DEV"

the automatic trim will slow mkfs down, so -K
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] btrfs-progs: remove comments about delayed ref in backref.c

2018-05-09 Thread David Sterba
On Tue, Apr 24, 2018 at 01:52:31PM +0800, Su Yue wrote:
> There is no delayed ref in btrfs-progs, so remove related comments.
> 
> Signed-off-by: Su Yue 

1-3 applied. Please send cover letter for patch series with more than 1
patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] btrfs-progs: print-tree: Enhance btrfs_print_tree() check to avoid out-of-boundary memory access

2018-05-09 Thread David Sterba
On Mon, Apr 30, 2018 at 11:51:19AM +0800, Qu Wenruo wrote:
> >>   btrfs_print_leaf(eb);
> >>   return;
> >>   }
> >> +    /* We are crossing eb boundary, this node must be corrupted */
> >> +    if (nr > BTRFS_NODEPTRS_PER_EXTENT_BUFFER(eb))
> >> +    warning(
> >> +    "node nr_items corrupted, has %u limit %u, continue print
> >> anyway",
> >> +    nr, BTRFS_NODEPTRS_PER_EXTENT_BUFFER(eb));
> >>   printf("node %llu level %d items %d free %u generation %llu
> >> owner ",
> >>  (unsigned long long)eb->start,
> >>   btrfs_header_level(eb), nr,
> >> @@ -1386,7 +1391,11 @@ void btrfs_print_tree(struct extent_buffer *eb,
> >> int follow)
> >>   print_uuids(eb);
> >>   fflush(stdout);
> >>  
> >> -    u64 blocknr = btrfs_node_blockptr(eb, i);
> >> +    u64 blocknr;
> >> +
> >> +    if (i > BTRFS_NODEPTRS_PER_EXTENT_BUFFER(eb))
> >> +    break;
> > 
> > Should it be i >= BTRFS_NODEPTRS_PER_EXTENT_BUFFER(eb)?
> 
> BTRFS_NODEPTRS_PER_EXTENT_BUFFER() provides the maximum valid number.
> So it 's >=.
> 
> > 
> > Here BTRFS_NODEPTRS_PER_EXTENT_BUFFER() is called during iterations.
> > The judement can be calculated in advance like:
> > 
> > ptr_num = BTRFS_NODEPTRS_PER_EXTENT_BUFFER(eb);
> > ...
> > for (i = 0; i < nr && i < ptr_num  ; i++) {
> 
> Indeed looks better.

Please resend this patch with the suggested updates, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] btrfs-progs: Remove fs_info parameter from btrfs_leaf_free_space()

2018-05-09 Thread David Sterba
On Mon, Apr 30, 2018 at 11:15:43AM +0800, Qu Wenruo wrote:
> For btrfs_leaf_free_space(), to get leaf data size, we have two way to
> get it:
> 
> 1) leaf->fs_info->nodesize
> 2) leaf->len
> 
> Anyway, we could get rid of @fs_info parameter for
> btrfs_leaf_free_space().
> And here we choose method 2), as it provides extra benefit to get leaf
> free space without initializing a real fs_info.
> 
> Signed-off-by: Qu Wenruo 

1 and 2 applied, 3 has some comments, please resend.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] btrfs-progs: convert: fix support for e2fsprogs < 1.42

2018-05-09 Thread David Sterba
On Mon, Apr 30, 2018 at 10:37:06AM -0400, je...@suse.com wrote:
> From: Jeff Mahoney 
> 
> Commit 324d4c1857a (btrfs-progs: convert: Add larger device support)
> introduced new dependencies on the 64-bit API provided by e2fsprogs.
> That API was introduced in v1.42 (along with bigalloc).
> 
> This patch maps the following to their equivalents in e2fsprogs < 1.42.
> - ext2fs_get_block_bitmap_range2
> - ext2fs_inode_data_blocks2
> - ext2fs_read_ext_attr2
> 
> Since we need to detect and define EXT2_FLAG_64BITS for compatibilty
> anyway, it makes sense to use that to detect the older e2fsprogs instead
> of defining a new flag ourselves.
> 
> Signed-off-by: Jeff Mahoney 

1-3 applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/3] btrfs-progs: Some fix for fi usage

2018-05-09 Thread David Sterba
On Thu, Mar 29, 2018 at 05:21:29PM +0900, Misono Tomohiro wrote:
> v1->v2
>   These were sent several months ago, just rebased to current devel branch.
> 
> Patch 1 and 2 aims to fix the "fi du" to include the information of "fi df"
> even when running without root privilege.
> 
> Patch 3 is an independent cleanup.
> 
> Tomohiro Misono (3):
>   btrfs-progs: fi usage: change warning message more appropriately
>   btrfs-progs: fi usage: change to output more info without root privilege
>   btrfs-progs: fi usage: cleanup unnecessary permission error check

1-2 applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/10] Simplify function interfaces

2018-05-09 Thread David Sterba
On Wed, May 02, 2018 at 08:50:54AM +0300, Nikolay Borisov wrote:
> On 27.03.2018 10:19, Nikolay Borisov wrote:
> > A bunch of functions in lowmem mode take an 'ext_ref' parameter only to 
> > pass it
> > down the call chain where it eventually is consumed. Turns out the functions
> > which actually check the parameter are find_inode_ref and check_inode_item, 
> > the
> > are only passing it down to them. At the same time those functions can get 
> > a 
> > reference to fs_info and do the check in them at the appropriate time. 
> > 
> > This patchset cleanups the interface of various function by dropping the 
> > ext_ref argument and moving the actual query of the EXTREF feature closer 
> > to 
> > where it's being used. 
> > 
> > The final patch just changes signature of __btrfs_fs_incompat to better 
> > match 
> > the logic of the code and be identical to its kernel counterpart. All in 
> > all 
> > this series doesn't introduce any functional changes per-se.
> > 
> > Nikolay Borisov (10):
> >   btrfs-progs: Drop ext_ref parameter from find_inode_ref
> >   btrfs-progs: Drop ext_ref param from check_dir_item
> >   btrfs-progs: Drop ext_ref argument from check_inode_item
> >   btrfs-progs: Drop unused ext_ref parameter from process_one_leaf
> >   btrfs-progs: Remove ext_ref param from check_fs_first_inode
> >   btrfs-progs: Remove ext_ref param from walk_down_tree
> >   btrfs-progs: Drop ext_ref param from check_fs_first_inode
> >   btrfs-progs: Drop ext_ref arument from check_fs_root
> >   btrfs-progs: Remove ext_ref local variable from check_fs_roots_lowmem
> >   btrfs-progs: Make __btrfs_fs_incompat return bool
> > 
> >  check/mode-lowmem.c | 63 
> > +++--
> >  ctree.h |  2 +-
> >  2 files changed, 28 insertions(+), 37 deletions(-)
> 
> Ping

Applied. Please add the subsystem prefix, like "btrfs-progs: check: ...".
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs-progs: Fix DIR_ITEM checking in lowmem

2018-05-09 Thread David Sterba
On Wed, May 02, 2018 at 08:50:38AM +0300, Nikolay Borisov wrote:
> > correctly.
> > 
> > Fixes: 564901eac7a4 ("btrfs-progs: check: introduce print_dir_item_err()")
> > Signed-off-by: Nikolay Borisov 
> > ---
> 
> Ping

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: fix invalid memory access with journal_info

2018-05-09 Thread robbieko
From: Robbie Ko 

When send process requires memory allocation, shrinker may be triggered
due to insufficient memory.
Then evict_inode gets called when inode is freed, and this function
may need to start transaction.
However, the journal_info is already points to BTRFS_SEND_TRANS_STUB, it
passed the if condition,
and the following use yields illegal memory access.

  if (current->journal_info) {
  WARN_ON(type & TRANS_EXTWRITERS);
  h = current->journal_info;
  refcount_inc(>use_count);
  WARN_ON(refcount_read(>use_count) > 2);
  h->orig_rsv = h->block_rsv;
  h->block_rsv = NULL;
  goto got_it;
  }

Direct IO has a similar problem, journal_info will store btrfs_dio_data,
which will lead to illegal memory access.

We fixed the problem by save the journal_info and restore afterwards.

CallTrace looks like this:
 BUG: unable to handle kernel NULL pointer dereference at 0021
 IP: [] start_transaction+0x64/0x450 [btrfs]
 PGD 8fea4b067 PUD a33bea067 PMD 0
 Oops:  [#1] SMP
 CPU: 3 PID: 12681 Comm: btrfs Tainted: P C O 3.10.102 #15266
 RIP: 0010:[] start_transaction+0x64/0x450 [btrfs]
 Call Trace:
 [] ? btrfs_evict_inode+0x3d8/0x580 [btrfs]
 [] ? evict+0xa2/0x1a0
 [] ? shrink_dentry_list+0x308/0x3d0
 [] ? prune_dcache_sb+0x133/0x160
 [] ? prune_super+0xcf/0x1a0
 [] ? shrink_slab+0x11f/0x1d0
 [] ? do_try_to_free_pages+0x452/0x560
 [] ? throttle_direct_reclaim+0x74/0x240
 [] ? try_to_free_pages+0xae/0xc0
 [] ? __alloc_pages_nodemask+0x53b/0x9f0
 [] ? __do_page_cache_readahead+0xec/0x270
 [] ? ondemand_readahead+0xbb/0x220
 [] ? fill_read_buf+0x2b3/0x3a0 [btrfs]
 [] ? send_extent_data+0x10e/0x300 [btrfs]
 [] ? process_extent+0x1fb/0x1310 [btrfs]
 [] ? iterate_dir_item.isra.28+0x1b0/0x250 [btrfs]
 [] ? send_set_xattr+0xa0/0xa0 [btrfs]
 [] ? changed_cb+0xd5/0xc40 [btrfs]
 [] ? full_send_tree+0xf2/0x1a0 [btrfs]
 [] ? btrfs_ioctl_send+0xfbb/0x1040 [btrfs]
 [] ? btrfs_ioctl+0x1084/0x32a0 [btrfs]

Signed-off-by: Robbie Ko 
---
 fs/btrfs/inode.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f534701..77aec8d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5295,6 +5295,7 @@ void btrfs_evict_inode(struct inode *inode)
int steal_from_global = 0;
u64 min_size;
int ret;
+   void *journal_info = NULL;
 
trace_btrfs_inode_evict(inode);
 
@@ -5303,6 +5304,16 @@ void btrfs_evict_inode(struct inode *inode)
return;
}
 
+   /*
+* Send or Direct IO may store information in journal_info.
+* However, this function may use start_transaction and
+* start_transaction will use journal_info.
+* To avoid accessing invalid memory, we can save the journal_info
+* and restore it later.
+*/
+   journal_info = current->journal_info;
+   current->journal_info = NULL;
+
min_size = btrfs_calc_trunc_metadata_size(fs_info, 1);
 
evict_inode_truncate_pages(inode);
@@ -5462,6 +5473,7 @@ void btrfs_evict_inode(struct inode *inode)
 no_delete:
btrfs_remove_delayed_node(BTRFS_I(inode));
clear_inode(inode);
+   current->journal_info = journal_info;
 }
 
 /*
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/2] btrfs :incremental send, improve rmdir performance

2018-05-09 Thread David Sterba
On Tue, May 08, 2018 at 06:11:36PM +0800, robbieko wrote:
> From: Robbie Ko 
> 
> The following patch is to improve the btrfs send, the speed of large
> directory deletion.
> 
> 1. Optimization, avoid unnecessary allocations.
> 
> 2. Increase the speed of can_rmdir.
> 
> Robbie Ko (2):
>   btrfs: incremental send, optimization add orphan_dir_info
>   btrfs: incremental send, improve rmdir performance for large directory

Added to next, thanks. I've updated the changelogs a bit.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: incremental send, fix BUG when parent commit_root changed

2018-05-09 Thread robbieko

Hi Filipe Manana,

Ok. I will add all this information, in detail, to the changelog,
and than send a patch V2 later.

Thanks.
Robbie Ko

Filipe Manana 於 2018-05-09 16:29 寫到:

On Wed, May 9, 2018 at 2:10 AM, robbieko  wrote:

Filipe Manana 於 2018-05-08 19:12 寫到:


On Tue, May 8, 2018 at 11:30 AM, robbieko  
wrote:


Hi Filipe Manana,

Although the snapshot is readonly, when the snapshot is created,
in order to modify the last_snapshot, it will cause generation 
changes,

and it will affect the commit_root modification.



So what you to say is that the problem happens when creating a
snapshot of snapshot that is being used by send.
You don't mention that anywhere, except in the example below.

Just say that in the changelog, that the problem can happen if while
we are doing a send one of the snapshots used (parent or send) is
snapshotted, because snapshoting implies COWing the root of the 
source

subvolume/snaphot.
That is not mentioned anywhere in the changelog.



OK, I will add this information to changelog.




Now, why does this happen without your patch?
Before we use the commit roots, we call extent_buffer_get() on them 
to

make sure they are not released and we do that while holding the
semaphore commit_root_sem. Why isn't this enough to prevent
use-after-free problems?



Although we use extent_buffer_get to prevent the extent_buffer
from being released, because of the snapshoting implies COWing
the root, the commit_root original space (A) is released.
Therefore, when other process need alloc the new tree block,
it may use the space of A.
However, alloc_extent_buffer is created by the bytenr.
It will first find out if there is an existing extent_buffer through
find_extent_buffer and cause the original extent_buffer to be 
modified.

Eventually causing send process to access illegal memory.
Thus extent_buffer_get can only prevent extent_buffer from being 
released,

but it cannot prevent extent_buffer from being used by others.

btrfs_alloc_tree_block
--btrfs_reserve_extent (get A bytenr)
--btrfs_init_new_buffer (use A bytenr)
btrfs_find_create_tree_block
--alloc_extent_buffer
find_extent_buffer (get A)

So we fixed the problem by copy commit_root to avoid accessing illegal
memory


Here it should be mentioned as well how the space gets released
despite the fact that the extent buffer has references.
I.e. that __btrfs_cow_block() calls btrfs_free_tree_block().

Please add all this information, in detail, to the changelog.
The changelog you provided had no useful information at all - didn't
mention how the problem can happen nor why.

thanks






That should be explained as well in the changelog.

Those are the 2 pieces of information you need to put in the 
changelog.




Example:
#btrfs sub show snap1
Name:   snap1
UUID:   17ba3140-b79d-1649-a2e1-b238c3253a40
Parent UUID:6fe90a14-02f3-1b40-bdfc-be435060d023
Received UUID:  17ba3140-b79d-1649-a2e1-b238c3253a40
Creation time:  2018-04-27 18:04:11 +0800
Subvolume ID:   514
Generation: 4854
Gen at creation:506
Parent ID:  511
Top level ID:   511
Flags:  readonly
Snapshot(s):

#btrfs-debug-tree -t 514 /dev/md2 | more
btrfs-progs v4.0
file tree key (514 ROOT_ITEM 506)
node 259127066624 level 2 items 192 free 301 generation 4854 owner 
514

fs uuid 081b4c28-007c-430b-be7f-ead57aa71548
chunk uuid a48e7346-bd15-4eac-8957-b11a2bf9bbe2
key (256 INODE_ITEM 0) block 258707652608 (15790262) gen 505

#btrfs sub snap send_snap1 test_subvol
#btrfs-debug-tree -t 514 /dev/md2 | more
btrfs-progs v4.0
file tree key (514 ROOT_ITEM 506)
node 258444509184 level 2 items 192 free 301 generation 4881 owner 
514

fs uuid 081b4c28-007c-430b-be7f-ead57aa71548
chunk uuid a48e7346-bd15-4eac-8957-b11a2bf9bbe2
key (256 INODE_ITEM 0) block 258707652608 (15790262) gen 505

According to the above example node and generation will be updated.

This means that as long as the send or parent snapshot is taken 
during

the btrfs send, commit_root will change, and send process may access
invalid memory.

Thanks.
Robbie Ko


Filipe Manana 於 2018-05-08 17:15 寫到:



On Tue, May 8, 2018 at 9:15 AM, robbieko  
wrote:



From: Robbie Ko 

[BUG]
btrfs send BUG on parent commit_root node receive destination
is at the same volume.




I can't make sense of that sentence.



[Example]
btrfs send -e -p /mnt/snap1/ /mnt/snap2/ | btrfs receive -e 
/mnt/temp




So, is -e necessary?
So doing that, for any parent and send snapshots, triggers the bug?
(and regardless of -e)
If that was enough I'm pretty sure almost everyone would be 
reporting

send/receive as unusable.
Please clarify.



[REASON]
1. When send with the parent, the send process will get the parent
   commit_root(A), then send 

Re: [PATCH] btrfs: incremental send, fix BUG when parent commit_root changed

2018-05-09 Thread Filipe Manana
On Wed, May 9, 2018 at 2:10 AM, robbieko  wrote:
> Filipe Manana 於 2018-05-08 19:12 寫到:
>>
>> On Tue, May 8, 2018 at 11:30 AM, robbieko  wrote:
>>>
>>> Hi Filipe Manana,
>>>
>>> Although the snapshot is readonly, when the snapshot is created,
>>> in order to modify the last_snapshot, it will cause generation changes,
>>> and it will affect the commit_root modification.
>>
>>
>> So what you to say is that the problem happens when creating a
>> snapshot of snapshot that is being used by send.
>> You don't mention that anywhere, except in the example below.
>>
>> Just say that in the changelog, that the problem can happen if while
>> we are doing a send one of the snapshots used (parent or send) is
>> snapshotted, because snapshoting implies COWing the root of the source
>> subvolume/snaphot.
>> That is not mentioned anywhere in the changelog.
>
>
> OK, I will add this information to changelog.
>
>
>>
>> Now, why does this happen without your patch?
>> Before we use the commit roots, we call extent_buffer_get() on them to
>> make sure they are not released and we do that while holding the
>> semaphore commit_root_sem. Why isn't this enough to prevent
>> use-after-free problems?
>>
>
> Although we use extent_buffer_get to prevent the extent_buffer
> from being released, because of the snapshoting implies COWing
> the root, the commit_root original space (A) is released.
> Therefore, when other process need alloc the new tree block,
> it may use the space of A.
> However, alloc_extent_buffer is created by the bytenr.
> It will first find out if there is an existing extent_buffer through
> find_extent_buffer and cause the original extent_buffer to be modified.
> Eventually causing send process to access illegal memory.
> Thus extent_buffer_get can only prevent extent_buffer from being released,
> but it cannot prevent extent_buffer from being used by others.
>
> btrfs_alloc_tree_block
> --btrfs_reserve_extent (get A bytenr)
> --btrfs_init_new_buffer (use A bytenr)
> btrfs_find_create_tree_block
> --alloc_extent_buffer
> find_extent_buffer (get A)
>
> So we fixed the problem by copy commit_root to avoid accessing illegal
> memory

Here it should be mentioned as well how the space gets released
despite the fact that the extent buffer has references.
I.e. that __btrfs_cow_block() calls btrfs_free_tree_block().

Please add all this information, in detail, to the changelog.
The changelog you provided had no useful information at all - didn't
mention how the problem can happen nor why.

thanks

>
>
>
>> That should be explained as well in the changelog.
>>
>> Those are the 2 pieces of information you need to put in the changelog.
>>
>>>
>>> Example:
>>> #btrfs sub show snap1
>>> Name:   snap1
>>> UUID:   17ba3140-b79d-1649-a2e1-b238c3253a40
>>> Parent UUID:6fe90a14-02f3-1b40-bdfc-be435060d023
>>> Received UUID:  17ba3140-b79d-1649-a2e1-b238c3253a40
>>> Creation time:  2018-04-27 18:04:11 +0800
>>> Subvolume ID:   514
>>> Generation: 4854
>>> Gen at creation:506
>>> Parent ID:  511
>>> Top level ID:   511
>>> Flags:  readonly
>>> Snapshot(s):
>>>
>>> #btrfs-debug-tree -t 514 /dev/md2 | more
>>> btrfs-progs v4.0
>>> file tree key (514 ROOT_ITEM 506)
>>> node 259127066624 level 2 items 192 free 301 generation 4854 owner 514
>>> fs uuid 081b4c28-007c-430b-be7f-ead57aa71548
>>> chunk uuid a48e7346-bd15-4eac-8957-b11a2bf9bbe2
>>> key (256 INODE_ITEM 0) block 258707652608 (15790262) gen 505
>>>
>>> #btrfs sub snap send_snap1 test_subvol
>>> #btrfs-debug-tree -t 514 /dev/md2 | more
>>> btrfs-progs v4.0
>>> file tree key (514 ROOT_ITEM 506)
>>> node 258444509184 level 2 items 192 free 301 generation 4881 owner 514
>>> fs uuid 081b4c28-007c-430b-be7f-ead57aa71548
>>> chunk uuid a48e7346-bd15-4eac-8957-b11a2bf9bbe2
>>> key (256 INODE_ITEM 0) block 258707652608 (15790262) gen 505
>>>
>>> According to the above example node and generation will be updated.
>>>
>>> This means that as long as the send or parent snapshot is taken during
>>> the btrfs send, commit_root will change, and send process may access
>>> invalid memory.
>>>
>>> Thanks.
>>> Robbie Ko
>>>
>>>
>>> Filipe Manana 於 2018-05-08 17:15 寫到:


 On Tue, May 8, 2018 at 9:15 AM, robbieko  wrote:

> From: Robbie Ko 
>
> [BUG]
> btrfs send BUG on parent commit_root node receive destination
> is at the same volume.



 I can't make sense of that sentence.

>
> [Example]
> btrfs send -e -p /mnt/snap1/ /mnt/snap2/ | btrfs receive -e /mnt/temp



 So, is -e necessary?
 So doing that, for any parent and send snapshots, triggers the bug?
 (and regardless of -e)
 If that was enough I'm 

Re: [RFC][PATCH 0/76] vfs: 'views' for filesystems with more than one root

2018-05-09 Thread Dave Chinner
On Tue, May 08, 2018 at 10:06:44PM -0400, Jeff Mahoney wrote:
> On 5/8/18 7:38 PM, Dave Chinner wrote:
> > On Tue, May 08, 2018 at 11:03:20AM -0700, Mark Fasheh wrote:
> >> Hi,
> >>
> >> The VFS's super_block covers a variety of filesystem functionality. In
> >> particular we have a single structure representing both I/O and
> >> namespace domains.
> >>
> >> There are requirements to de-couple this functionality. For example,
> >> filesystems with more than one root (such as btrfs subvolumes) can
> >> have multiple inode namespaces. This starts to confuse userspace when
> >> it notices multiple inodes with the same inode/device tuple on a
> >> filesystem.
> > 
> > Devil's Advocate - I'm not looking at the code, I'm commenting on
> > architectural issues I see here.
> > 
> > The XFS subvolume work I've been doing explicitly uses a superblock
> > per subvolume. That's because subvolumes are designed to be
> > completely independent of the backing storage - they know nothing
> > about the underlying storage except to share a BDI for writeback
> > purposes and write to whatever block device the remapping layer
> > gives them at IO time.  Hence XFS subvolumes have (at this point)
> > their own unique s_dev, on-disk format configuration, journal, space
> > accounting, etc. i.e. They are fully independent filesystems in
> > their own right, and as such we do not have multiple inode
> > namespaces per superblock.
> 
> That's a fundamental difference between how your XFS subvolumes work and
> how btrfs subvolumes do.

Yup, you've just proved my point: this is not a "subvolume problem";
but rather a "multiple namespace per root" problem.

> There is no independence among btrfs
> subvolumes.  When a snapshot is created, it has a few new blocks but
> otherwise shares the metadata of the source subvolume.  The metadata
> trees are shared across all of the subvolumes and there are several
> internal trees used to manage all of it.

I don't need btrfs 101 stuff explained to me. :/

> a single transaction engine.  There are housekeeping and maintenance
> tasks that operate across the entire file system internally.  I
> understand that there are several problems you need to solve at the VFS
> layer to get your version of subvolumes up and running, but trying to
> shoehorn one into the other is bound to fail.

Actually, the VFS has provided everything I need for XFS subvolumes
so far without requiring any sort of modifications. That's the
perspective I'm approaching this from - if the VFS can do what we
need for XFS subvolumes, as well as overlay (which are effectively
VFS-based COW subvolumes), then lets see if we can make that work
for btrfs too.

> > So this doesn't sound like a "subvolume problem" - it's a "how do we
> > sanely support multiple independent namespaces per superblock"
> > problem. AFAICT, this same problem exists with bind mounts and mount
> > namespaces - they are effectively multiple roots on a single
> > superblock, but it's done at the vfsmount level and so the
> > superblock knows nothing about them.
> 
> In this case, you're talking about the user-visible file system
> hierarchy namespace that has no bearing on the underlying file system
> outside of per-mount flags.

Except that it tracks and provides infrastructure that allows user
visible  "multiple namespace per root" constructs. Subvolumes - as a
user visible namespace construct - are little different to bind
mounts in behaviour and functionality. 

How the underlying filesystem implements subvolumes is really up to
the filesystem, but we should be trying to implement a clean model
for "multiple namespaces on a single root" at the VFS so we have
consistent behaviour across all filesystems that implement similar
functionality.

FWIW, bind mounts and overlay also have similar inode number
namespace problems to what Mark describes for btrfs subvolumes.
e.g. overlay recently introduce the "xino" mount option to separate
the user presented inode number namespace for overlay inode from the
underlying parent filesystem inodes. How is that different to btrfs
subvolumes needing to present different inode number namespaces from
the underlying parent?

This sort of "namespace shifting" is needed for several different
pieces of information the kernel reports to userspace. The VFS
replacement for shiftfs is an example of this. So is inode number
remapping. I'm sure there's more.

My point is that if we are talking about infrastructure to remap
what userspace sees from different mountpoint views into a
filesystem, then it should be done above the filesystem layers in
the VFS so all filesystems behave the same way. And in this case,
the vfsmount maps exactly to the "fs_view" that Mark has proposed we
add to the superblock.

> It makes sense for that to be above the
> superblock because the file system doesn't care about them.  We're
> interested in the inode namespace, which for every other file system can
> be described using an inode and a superblock