[PATCH 3/3] btrfs-progs: fi usage: cleanup unneccessary permission error check

2017-11-29 Thread Misono, Tomohiro
Since BTRFS_IOC_FS_INFO does not require root privilege, there is no
need to check EPERM error.

Signed-off-by: Tomohiro Misono 
---
 cmds-fi-usage.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/cmds-fi-usage.c b/cmds-fi-usage.c
index 7bbc9896..b0721560 100644
--- a/cmds-fi-usage.c
+++ b/cmds-fi-usage.c
@@ -522,8 +522,6 @@ static int load_device_info(int fd, struct device_info 
**device_info_ptr,
 
ret = ioctl(fd, BTRFS_IOC_FS_INFO, _args);
if (ret < 0) {
-   if (errno == EPERM)
-   return -errno;
error("cannot get filesystem info: %s",
strerror(errno));
return 1;
@@ -597,11 +595,6 @@ int load_chunk_and_device_info(int fd, struct chunk_info 
**chunkinfo,
}
 
ret = load_device_info(fd, devinfo, devcount);
-   if (ret == -EPERM) {
-   warning(
-   "cannot get filesystem info from ioctl(FS_INFO), run as root");
-   ret = 0;
-   }
 
return ret;
 }
-- 
2.13.6

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] btrfs-progs: fi usage: change warning message more appropriately

2017-11-29 Thread Misono, Tomohiro
"fi usage" shows the warning "RAID5/6 numbers will be incorrect" when
runnning without root privilege even if raid5/6 is not used.  What
happens is it cannot get the per device profile usage info, so change
the message more appropriately.

Signed-off-by: Tomohiro Misono 
---
 cmds-fi-usage.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/cmds-fi-usage.c b/cmds-fi-usage.c
index 6c846c15..299c2dae 100644
--- a/cmds-fi-usage.c
+++ b/cmds-fi-usage.c
@@ -590,7 +590,8 @@ int load_chunk_and_device_info(int fd, struct chunk_info 
**chunkinfo,
ret = load_chunk_info(fd, chunkinfo, chunkcount);
if (ret == -EPERM) {
warning(
-"cannot read detailed chunk info, RAID5/6 numbers will be incorrect, run as 
root");
+   "cannot read detailed chunk info. " \
+   "per device profile usage will not be shown, run as root");
} else if (ret) {
return ret;
}
-- 
2.13.6

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] Some fix for fi usage

2017-11-29 Thread Misono, Tomohiro
Patch 1 and 2 aims to fix the "fi du" to include the information of "fi df"
even when runnning without root previlege.

Patch 3 is a independent cleanup.

Tomohiro Misono (3):
  btrfs-progs: fi usage: change warning message more appropriately
  btrfs-progs: fi usage: change to output more info without root privilege
  btrfs-progs: fi usage: cleanup unneccessary permission error check

 cmds-fi-usage.c | 21 +++--
 1 file changed, 7 insertions(+), 14 deletions(-)

-- 
2.13.6

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] btrfs-progs: Add location check when process share_data_ref item

2017-11-29 Thread Qu Wenruo


On 2017年11月29日 03:02, David Sterba wrote:
> On Fri, Nov 24, 2017 at 06:41:28PM +0800, Gu Jinxiang wrote:
>> The following test failed becasuse share_data_ref be added into
>> extent_cache when deal with root tree node.
> 
> The whole function run_next_block would need to be revisited with
> respect to error handling and sanity checks. Adding them only when a
> fuzzed image hits the particular code path is not IMHO the best
> approach.

As suggested before, backporting tree-checker from kernel may be a good
solution for all later possible fuzzed problem.
(And can make btrfs-progs become a good testbed for tree-checker related
patches before merging into kernel)

For this particular case, planned key->type check against root should
handle it quite well.

Thanks,
Qu

> 
> If there's some fuzzed test case, we should try to find all similar
> missing checks and fix them before moving to another type. Addressing
> only the failed tests gives a false sense of fixing, there are usally
> more similar bugs.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v3 2/3] btrfs-progs: fi defrag: do not exit if defrag range ioctl is unsupported

2017-11-29 Thread Qu Wenruo


On 2017年11月30日 14:37, Qu Wenruo wrote:
> 
> 
> On 2017年11月29日 10:12, Su Yue wrote:
>> If ioctl of defrag range is unsupported, defrag will exit immediately.
>>
>> Since caller can handle the error, let cmd_filesystem_defrag()
>> close file, break the loop and return error instead of calling exit(1).
>>
>> Suggested-by: David Sterba 
>> Signed-off-by: Su Yue 
> 
> Reviewed-by: Qu Wenruo 
> 
> Thanks,
> Qu

Please ignore my tag.

The best solution is mentioned in the 1st patch.

Thanks,
Qu

>> ---
>> Changelog:
>> v2:  Separate the patch from commit 6e991b9161fa ("btrfs-progs: fi
>>  defrag: clean up duplicate code if find errors").
>> v3:  Call close_file_or_dir() before breaking the loop.
>> ---
>>  cmds-filesystem.c | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/cmds-filesystem.c b/cmds-filesystem.c
>> index 17d399d58adf..232d4e88e0c0 100644
>> --- a/cmds-filesystem.c
>> +++ b/cmds-filesystem.c
>> @@ -1049,8 +1049,10 @@ static int cmd_filesystem_defrag(int argc, char 
>> **argv)
>>  if (recursive && S_ISDIR(st.st_mode)) {
>>  ret = nftw(argv[i], defrag_callback, 10,
>>  FTW_MOUNT | FTW_PHYS);
>> -if (ret == ENOTTY)
>> -exit(1);
>> +if (ret == ENOTTY) {
>> +close_file_or_dir(fd, dirstream);
>> +break;
>> +}
>>  /* errors are handled in the callback */
>>  ret = 0;
>>  } else {
>>
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 1/3] btrfs-progs: fi defrag: clean up duplicate code if find errors

2017-11-29 Thread Qu Wenruo


On 2017年11月28日 17:14, Su Yue wrote:
> In function cmd_filesystem_defrag(), lines of code for error handling
> are duplicate and hard to expand in further.
> 
> Create a jump label for errors.
> 
> Signed-off-by: Su Yue 
> ---
> Changelog:
> v2:   Record -errno to return value if open() and fstat() failed.
>   Move change "do no exit if defrag range ioctl is unsupported"
>   to another patch.
>   Suggested by David.
> ---
>  cmds-filesystem.c | 44 
>  1 file changed, 24 insertions(+), 20 deletions(-)
> 
> diff --git a/cmds-filesystem.c b/cmds-filesystem.c
> index 7728430f16a1..17d399d58adf 100644
> --- a/cmds-filesystem.c
> +++ b/cmds-filesystem.c
> @@ -1029,23 +1029,22 @@ static int cmd_filesystem_defrag(int argc, char 
> **argv)
>   if (fd < 0) {
>   error("cannot open %s: %s", argv[i],
>   strerror(errno));
> - defrag_global_errors++;
> - close_file_or_dir(fd, dirstream);
> - continue;
> + ret = -errno;
> + goto next;
>   }
> - if (fstat(fd, )) {
> +
> + ret = fstat(fd, );
> + if (ret) {
>   error("failed to stat %s: %s",
>   argv[i], strerror(errno));
> - defrag_global_errors++;
> - close_file_or_dir(fd, dirstream);
> - continue;
> + ret = -errno;
> + goto next;
>   }
>   if (!(S_ISDIR(st.st_mode) || S_ISREG(st.st_mode))) {
>   error("%s is not a directory or a regular file",
>   argv[i]);
> - defrag_global_errors++;
> - close_file_or_dir(fd, dirstream);
> - continue;
> + ret = -EINVAL;
> + goto next;

Here, unlike the ftw callback, which ignore non-regular file, here we
count them as error.

So if there is some especial file in dir (use "example_dir" as an
example), then
# btrfs fi defrag -r example_dir
will return no error.

While
# btrfs fi defrag -r example_dir/*
Will return error.

IIRC such inconsistent behavior is a bigger problem, and reusing the
defrag_callback() will not only makes the behavior consistent, but also
simplify the loop.

Thanks,
Qu


>   }
>   if (recursive && S_ISDIR(st.st_mode)) {
>   ret = nftw(argv[i], defrag_callback, 10,
> @@ -1060,20 +1059,25 @@ static int cmd_filesystem_defrag(int argc, char 
> **argv)
>   ret = ioctl(fd, BTRFS_IOC_DEFRAG_RANGE,
>   _global_range);
>   defrag_err = errno;
> - }
> - close_file_or_dir(fd, dirstream);
> - if (ret && defrag_err == ENOTTY) {
> - error(
> + if (ret && defrag_err == ENOTTY) {
> + error(
>  "defrag range ioctl not supported in this kernel version, 2.6.33 and newer 
> is required");
> - defrag_global_errors++;
> - break;
> + defrag_global_errors++;
> + close_file_or_dir(fd, dirstream);
> + break;
> + }
> + if (ret) {
> + error("defrag failed on %s: %s", argv[i],
> +   strerror(defrag_err));
> + goto next;
> + }
>   }
> - if (ret) {
> - error("defrag failed on %s: %s", argv[i],
> - strerror(defrag_err));
> +next:
> + if (ret)
>   defrag_global_errors++;
> - }
> + close_file_or_dir(fd, dirstream);
>   }
> +
>   if (defrag_global_errors)
>   fprintf(stderr, "total %d failures\n", defrag_global_errors);
>  
> 



signature.asc
Description: OpenPGP digital signature


Re: A partially failing disk in raid0 needs replacement

2017-11-29 Thread Klaus Agnoletti
Hi Chris
>
> I don't see how you get an IO error in user space without the kernel
> reporting the source of that IO error, whatever it is.
>
I totally agree, so I just retried the deletion. The only thing
related I could see in /var/log/messages is this:
Nov 30 07:29:57 box kernel: [368193.019160] BTRFS info (device sdb):
found 207 extents

Shortly after this, btrfs gives me the I/O error. I am guessing that
the kernel with log to this file, and it did before I changed the disk
- but not anymore, it seems.

> If Btrfs detects corruption of data extents, it will tell you the
> exact path to file names affected, as kernel messages. If you aren't
> getting that, then it's some other problem.

So it smells like some other problem, I guess. I have no idea what, so
my best plan is to move forward on my plans to backup date, rebuild fs
and restore everything.. Does anyone have any better suggestion? :)

Thanks again for everything,

/klaus
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/3] btrfs-progs: fi defrag: do not exit if defrag range ioctl is unsupported

2017-11-29 Thread Qu Wenruo


On 2017年11月29日 10:12, Su Yue wrote:
> If ioctl of defrag range is unsupported, defrag will exit immediately.
> 
> Since caller can handle the error, let cmd_filesystem_defrag()
> close file, break the loop and return error instead of calling exit(1).
> 
> Suggested-by: David Sterba 
> Signed-off-by: Su Yue 

Reviewed-by: Qu Wenruo 

Thanks,
Qu

> ---
> Changelog:
> v2:   Separate the patch from commit 6e991b9161fa ("btrfs-progs: fi
>   defrag: clean up duplicate code if find errors").
> v3:   Call close_file_or_dir() before breaking the loop.
> ---
>  cmds-filesystem.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/cmds-filesystem.c b/cmds-filesystem.c
> index 17d399d58adf..232d4e88e0c0 100644
> --- a/cmds-filesystem.c
> +++ b/cmds-filesystem.c
> @@ -1049,8 +1049,10 @@ static int cmd_filesystem_defrag(int argc, char **argv)
>   if (recursive && S_ISDIR(st.st_mode)) {
>   ret = nftw(argv[i], defrag_callback, 10,
>   FTW_MOUNT | FTW_PHYS);
> - if (ret == ENOTTY)
> - exit(1);
> + if (ret == ENOTTY) {
> + close_file_or_dir(fd, dirstream);
> + break;
> + }
>   /* errors are handled in the callback */
>   ret = 0;
>   } else {
> 



signature.asc
Description: OpenPGP digital signature


Re: A partially failing disk in raid0 needs replacement

2017-11-29 Thread Chris Murphy
On Wed, Nov 29, 2017 at 10:28 PM, Klaus Agnoletti  wrote:
> Hi Chris,
>
>>
>> I assume when you get that, either when deleting the device or
>> scrubbing, that you also see the device unrecoverable read error in
>> dmesg, as originally reported. If the drive must have the information
>> on that lost sector, and you can't increase SCT ERC time (as well as
>> the kernel SCSI command timer), or increasing it doesn't help, then
>> that data is lost. It's plausible btrfs check repair is smart enough
>> to be able to reconstruct this missing data, but I suspect it isn't
>> yet that capable.
>
> That's the 'fun' part: I don't get any kernel messages after changing
> the disk, hence my assumption that it's a relatively small, logical
> error somewhere on the fs.

I don't see how you get an IO error in user space without the kernel
reporting the source of that IO error, whatever it is.


>
>>
>> So my recommendation is to prepare to lose the file system. Which
>> means backing up whatever you can while it's still working, such as it
>> is.
>
> Yeah, luckily I can temporarily borrow a couple of 3TB disks to host
> the data while I rebuild the fs. So that will probably be what I do.
>
> It there any way I can remove just files with bad data from the disk
> error so I just get those out of the way?

If Btrfs detects corruption of data extents, it will tell you the
exact path to file names affected, as kernel messages. If you aren't
getting that, then it's some other problem.



-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: A partially failing disk in raid0 needs replacement

2017-11-29 Thread Klaus Agnoletti
Hi Chris,

>
> I assume when you get that, either when deleting the device or
> scrubbing, that you also see the device unrecoverable read error in
> dmesg, as originally reported. If the drive must have the information
> on that lost sector, and you can't increase SCT ERC time (as well as
> the kernel SCSI command timer), or increasing it doesn't help, then
> that data is lost. It's plausible btrfs check repair is smart enough
> to be able to reconstruct this missing data, but I suspect it isn't
> yet that capable.

That's the 'fun' part: I don't get any kernel messages after changing
the disk, hence my assumption that it's a relatively small, logical
error somewhere on the fs.

>
> So my recommendation is to prepare to lose the file system. Which
> means backing up whatever you can while it's still working, such as it
> is.

Yeah, luckily I can temporarily borrow a couple of 3TB disks to host
the data while I rebuild the fs. So that will probably be what I do.

It there any way I can remove just files with bad data from the disk
error so I just get those out of the way?

Thanks

/klaus
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/9] btrfs-progs: mkfs: Fix regression preventing --rootdir to create file

2017-11-29 Thread Misono, Tomohiro
On 2017/11/29 18:16, Qu Wenruo wrote:
> Commit 460e93f25754 ("btrfs-progs: mkfs: check the status of file at mkfs")
> will try to check the file state before creating fs on it.
> 
> The check is mostly fine for normal mkfs case, while for --rootdir
> option, it's allowed to create new file if destination file doesn't
> exist.
> 
> Fix it by allowing non-existing file if --rootdir is specified.
> 
> Fixes: 460e93f25754 ("btrfs-progs: mkfs: check the status of file at mkfs")
> Signed-off-by: Qu Wenruo 

Sorry, I didn't notice that. Thanks.
Reviewed-by: Tomohiro Misono 

> ---
>  mkfs/main.c |  6 --
>  utils.c | 15 +++
>  utils.h |  1 +
>  3 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/mkfs/main.c b/mkfs/main.c
> index 64cf03289188..a34b73bfb845 100644
> --- a/mkfs/main.c
> +++ b/mkfs/main.c
> @@ -733,7 +733,7 @@ int main(int argc, char **argv)
>   u32 stripesize = 4096;
>   int zero_end = 1;
>   int fd = -1;
> - int ret;
> + int ret = 0;
>   int close_ret;
>   int i;
>   int mixed = 0;
> @@ -913,7 +913,9 @@ int main(int argc, char **argv)
>  
>   while (dev_cnt-- > 0) {
>   file = argv[optind++];
> - if (is_block_device(file) == 1)
> + if (source_dir_set && is_path_exist(file) == 0)
> + ret = 0;
> + else if (is_block_device(file) == 1)
>   ret = test_dev_for_mkfs(file, force_overwrite);
>   else
>   ret = test_status_for_mkfs(file, force_overwrite);
> diff --git a/utils.c b/utils.c
> index 524f463d3140..22c137514592 100644
> --- a/utils.c
> +++ b/utils.c
> @@ -456,6 +456,21 @@ static int is_reg_file(const char *path)
>   return S_ISREG(statbuf.st_mode);
>  }
>  
> +int is_path_exist(const char *path)
> +{
> + struct stat statbuf;
> + int ret;
> +
> + ret = stat(path, );
> + if (ret < 0) {
> + if (errno == ENOENT)
> + return 0;
> + else
> + return -errno;
> + }
> + return 1;
> +}
> +
>  /*
>   * This function checks if the given input parameter is
>   * an uuid or a path
> diff --git a/utils.h b/utils.h
> index a82d46f6d7cc..860c25d3cdba 100644
> --- a/utils.h
> +++ b/utils.h
> @@ -122,6 +122,7 @@ int set_label(const char *btrfs_dev, const char *label);
>  char *__strncpy_null(char *dest, const char *src, size_t n);
>  int is_block_device(const char *file);
>  int is_mount_point(const char *file);
> +int is_path_exist(const char *file);
>  int check_arg_type(const char *input);
>  int open_path_or_dev_mnt(const char *path, DIR **dirstream, int verbose);
>  int btrfs_open(const char *path, DIR **dirstream, int verbose, int dir_only);
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [btrfs_mount] general protection fault: 0000 [#1] SMP

2017-11-29 Thread Fengguang Wu

On Wed, Nov 29, 2017 at 06:44:43PM -0800, Linus Torvalds wrote:

On Wed, Nov 29, 2017 at 6:38 PM, Nick Terrell  wrote:


The stack trace looks like the bug fixed by

Qu Wenruo:
btrfs: Fix wild memory access in compression level parser [1]

That fix looks to be included in the pull request for 4.15-rc2 [2].


Which got merged earlier today. So maybe instead of a bisection, just
test the current git tree?


Sure. The 1 hour tests finished on your latest merge and the error
no long shows up there.

Regards,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [btrfs_mount] general protection fault: 0000 [#1] SMP

2017-11-29 Thread Linus Torvalds
On Wed, Nov 29, 2017 at 6:38 PM, Nick Terrell  wrote:
>
> The stack trace looks like the bug fixed by
>
> Qu Wenruo:
> btrfs: Fix wild memory access in compression level parser [1]
>
> That fix looks to be included in the pull request for 4.15-rc2 [2].

Which got merged earlier today. So maybe instead of a bisection, just
test the current git tree?

   Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [btrfs_mount] general protection fault: 0000 [#1] SMP

2017-11-29 Thread Nick Terrell

> On Nov 29, 2017, at 6:21 PM, Fengguang Wu  wrote:
> 
> Hello,
> 
> FYI this happens in mainline kernel 4.15.0-rc1.
> It looks like a new regression. Bisect is in progress.
> 
> It occurs in 11 out of 11 xfstests run.
> 
> [ 1456.361614]
> [ 1456.918942] BTRFS info (device vdb): disk space caching is enabled
> [ 1456.920760] BTRFS info (device vdb): has skinny extents
> [ 1457.111319] run fstests btrfs/094 at 2017-11-28 09:46:30
> [ 1457.702513] BTRFS: device fsid 5c26b547-822d-4338-be92-b2ec5f6b159d devid 
> 1 transid 5 /dev/vdb
> [ 1457.920372] general protection fault:  [#1] SMP
> [ 1457.921693] Modules linked in: dm_flakey btrfs xor zstd_decompress 
> zstd_compress xxhash raid6_pq dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 
> dns_resolver sr_mod cdrom sg ata_generic pata_acpi ppdev snd_pcm snd_timer 
> snd soundcore pcspkr serio_raw ata_piix i2c_piix4 libata parport_pc floppy 
> parport ip_tables
> [ 1457.927395] CPU: 3 PID: 19563 Comm: mount Not tainted 4.15.0-rc1 #1
> [ 1457.928804] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.10.2-1 04/01/2014
> [ 1457.930815] task: 880078f8ca00 task.stack: c90004828000
> [ 1457.934242] RIP: 0010:btrfs_compress_str2level+0x17/0x50 [btrfs]

The stack trace looks like the bug fixed by

Qu Wenruo:
btrfs: Fix wild memory access in compression level parser [1]

That fix looks to be included in the pull request for 4.15-rc2 [2].

[1] lkml.kernel.org/r/20171106024319.32584-1-...@suse.com
[2] lkml.kernel.org/r/cover.1511980478.git.dste...@suse.com

> [ 1457.936653] RSP: 0018:c9000482baa8 EFLAGS: 00010202
> [ 1457.938909] RAX: 0001 RBX: a057967f RCX: 
> 0004
> [ 1457.942574] RDX: 192000905763 RSI: 192000905763 RDI: 
> a057bc24
> [ 1457.946221] RBP: c9000482bb40 R08: 0063 R09: 
> 88007e8257a8
> [ 1457.948982] R10: 002c R11: 81a6a340 R12: 
> 8800750b
> [ 1457.952494] R13: 88007e8257a0 R14:  R15: 
> 1000
> [ 1457.956106] FS:  7fb80717d840() GS:88013fd8() 
> knlGS:
> [ 1457.960103] CS:  0010 DS:  ES:  CR0: 80050033
> [ 1457.962466] CR2: 010b6f88 CR3: 750ce000 CR4: 
> 06e0
> [ 1457.966100] Call Trace:
> [ 1457.966851]  btrfs_parse_options+0x96f/0xf20 [btrfs]
> [ 1457.970107]  ? open_ctree+0x1041/0x2410 [btrfs]
> [ 1457.971638]  open_ctree+0x1041/0x2410 [btrfs]
> [ 1457.973780]  btrfs_mount+0xcfa/0xe40 [btrfs]
> [ 1457.975889]  ? pcpu_alloc_area+0xc0/0x130:
>   pcpu_alloc_area at 
> mm/percpu.c:1010
> [ 1457.979028]  ? pcpu_next_unpop+0x37/0x50:
>   pcpu_next_unpop at 
> mm/percpu.c:264
> [ 1457.981051]  ? pcpu_alloc+0x2e1/0x650:
>   pcpu_alloc at mm/percpu.c:1472 
> (discriminator 1)
> [ 1457.983074]  mount_fs+0x36/0x140:
>   mount_fs at fs/super.c:1220
> [ 1457.983941]  vfs_kern_mount+0x62/0x130:
>   vfs_kern_mount at 
> fs/namespace.c:1038
> [ 1457.985951]  btrfs_mount+0x183/0xe40 [btrfs]
> [ 1457.989441]  ? pcpu_alloc_area+0xc0/0x130:
>   pcpu_alloc_area at 
> mm/percpu.c:1010
> [ 1457.991495]  ? pcpu_next_unpop+0x37/0x50:
>   pcpu_next_unpop at 
> mm/percpu.c:264
> [ 1457.993524]  ? pcpu_alloc+0x2e1/0x650:
>   pcpu_alloc at mm/percpu.c:1472 
> (discriminator 1)
> [ 1457.995502]  mount_fs+0x36/0x140:
>   mount_fs at fs/super.c:1220
> [ 1457.997415]  vfs_kern_mount+0x62/0x130:
>   vfs_kern_mount at 
> fs/namespace.c:1038
> [ 1457.999537]  do_mount+0x1d5/0xc90:
>   do_new_mount at 
> fs/namespace.c:2513
>(inlined by) do_mount at 
> fs/namespace.c:2841
> [ 1458.001440]  ? kmem_cache_alloc_trace+0x16d/0x1c0:
>   slab_pre_alloc_hook at 
> mm/slab.h:419
>(inlined by) slab_alloc_node 
> at mm/slub.c:2651
>(inlined by) slab_alloc at 
> mm/slub.c:2733
>(inlined by) 
> kmem_cache_alloc_trace at mm/slub.c:2750
> [ 1458.003603]  ? copy_mount_options+0x28/0x240:
>   copy_mount_options at 
> fs/namespace.c:2722
> [ 1458.005698]  SyS_mount+0x7e/0xd0
> [ 1458.007597]  entry_SYSCALL_64_fastpath+0x1a/0x7d:
>   entry_SYSCALL_64_fastpath at 
> arch/x86/entry/entry_64.S:210
> [ 1458.009808] RIP: 0033:0x7fb80683c98a
> [ 1458.011835] RSP: 

[PATCH v2.2 2/2] btrfs: Ensure btrfs_trim_fs can trim the whole fs

2017-11-29 Thread Qu Wenruo
[BUG]
fstrim on some btrfs only trims the unallocated space, not trimming any
space in existing block groups.

[CAUSE]
Before fstrim_range passed to btrfs_trim_fs(), it get truncated to
range [0, super->total_bytes).
So later btrfs_trim_fs() will only be able to trim block groups in range
[0, super->total_bytes).

While for btrfs, any bytenr aligned to sector size is valid, since btrfs use
its logical address space, there is nothing limiting the location where
we put block groups.

For btrfs with routine balance, it's quite easy to relocate all
block groups and bytenr of block groups will start beyond super->total_bytes.

In that case, btrfs will not trim existing block groups.

[FIX]
Just remove the truncation in btrfs_ioctl_fitrim(), so btrfs_trim_fs()
can get the unmodified range, which is normally set to [0, U64_MAX].

Reported-by: Chris Murphy 
Fixes: f4c697e6406d ("btrfs: return EINVAL if start > total_bytes in fitrim 
ioctl")
Cc:  # v4.0+
Signed-off-by: Qu Wenruo 
---
changelog:
v2:
  Locate the root cause in btrfs_ioctl_fitrim(), remove the truncation
  so we can still allow user to trim custom range.
v2.1:
  Include the missing change in btrfs_trim_fs() and update the commit
  message to reflect this.
v2.2:
  Remove unused variable.
---
 fs/btrfs/extent-tree.c | 10 +-
 fs/btrfs/ioctl.c   | 11 +++
 2 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f830aa91ac3d..b2a7532d6c07 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10967,19 +10967,11 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, 
struct fstrim_range *range)
u64 start;
u64 end;
u64 trimmed = 0;
-   u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
int bg_ret = 0;
int dev_ret = 0;
int ret = 0;
 
-   /*
-* try to trim all FS space, our block group may start from non-zero.
-*/
-   if (range->len == total_bytes)
-   cache = btrfs_lookup_first_block_group(fs_info, range->start);
-   else
-   cache = btrfs_lookup_block_group(fs_info, range->start);
-
+   cache = btrfs_lookup_first_block_group(fs_info, range->start);
for (; cache; cache = next_block_group(fs_info, cache)) {
if (cache->key.objectid >= (range->start + range->len)) {
btrfs_put_block_group(cache);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index fd172a93d11a..017fda31400d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -365,7 +365,6 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, 
void __user *arg)
struct fstrim_range range;
u64 minlen = ULLONG_MAX;
u64 num_devices = 0;
-   u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
int ret;
 
if (!capable(CAP_SYS_ADMIN))
@@ -389,11 +388,15 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, 
void __user *arg)
return -EOPNOTSUPP;
if (copy_from_user(, arg, sizeof(range)))
return -EFAULT;
-   if (range.start > total_bytes ||
-   range.len < fs_info->sb->s_blocksize)
+
+   /*
+* NOTE: Don't truncate the range using super->total_bytes.
+* Bytenr of btrfs block group is in btrfs logical address space,
+* which can be any sector size aligned bytenr in [0, U64_MAX].
+*/
+   if (range.len < fs_info->sb->s_blocksize)
return -EINVAL;
 
-   range.len = min(range.len, total_bytes - range.start);
range.minlen = max(range.minlen, minlen);
ret = btrfs_trim_fs(fs_info, );
if (ret < 0)
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] Btrfs fixes for 4.15-rc2

2017-11-29 Thread Linus Torvalds
On Wed, Nov 29, 2017 at 11:28 AM, David Sterba  wrote:
>
> With signed tag: for-4.15-rc2-tag
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-4.15-rc2

Oh, please actually ask me to pull the signed tag (exact same
pull-request, just point git request-pull at the tag), because now
what happened was that first I just pulled that branch you mentioned,
and only noticed that "With signed tag:" notice after I had already
pulled and was filling in the merge message.

Anyway, I redid the pull with the proper signed tag, but it was just
annoying extra work.

And I wonder how many times I _hadn't_ noticed that, because I didn't
have your key in my keyring either. Or maybe I caught it the first
time.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: A partially failing disk in raid0 needs replacement

2017-11-29 Thread Chris Murphy
On Wed, Nov 29, 2017 at 6:33 AM, Klaus Agnoletti  wrote:
> Hi list
>
> Can anyone give me any hints here? If not, my plan right now is to
> start updating the server to latest debian stable (it's currently
> running Jessie), to get access to a newer btrfs driver and tools,
> hoping that decreases the risk of something screwing up, and then run
> btrfs check --repair on the unmounted fs and wish for the best.
>

New kernel and tools won't fix this:

ERROR: error removing the device '/dev/sdd' - Input/output error

I assume when you get that, either when deleting the device or
scrubbing, that you also see the device unrecoverable read error in
dmesg, as originally reported. If the drive must have the information
on that lost sector, and you can't increase SCT ERC time (as well as
the kernel SCSI command timer), or increasing it doesn't help, then
that data is lost. It's plausible btrfs check repair is smart enough
to be able to reconstruct this missing data, but I suspect it isn't
yet that capable.

So my recommendation is to prepare to lose the file system. Which
means backing up whatever you can while it's still working, such as it
is.



-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2.1 2/2] btrfs: Ensure btrfs_trim_fs can trim the whole fs

2017-11-29 Thread Holger Hoffstätte
On Tue, 28 Nov 2017 15:20:39 +0800, Qu Wenruo wrote:

> [BUG]
> fstrim on some btrfs only trims the unallocated space, not trimming any
> space in existing block groups.
> 
> [CAUSE]
> Before fstrim_range passed to btrfs_trim_fs(), it get truncated to
> range [0, super->total_bytes).
> So later btrfs_trim_fs() will only be able to trim block groups in range
> [0, super->total_bytes).
> 
> While for btrfs, any bytenr aligned to sector size is valid, since btrfs use
> its logical address space, there is nothing limiting the location where
> we put block groups.
> 
> For btrfs with routine balance, it's quite easy to relocate all
> block groups and bytenr of block groups will start beyond super->total_bytes.
> 
> In that case, btrfs will not trim existing block groups.
> 
> [FIX]
> Just remove the truncation in btrfs_ioctl_fitrim(), so btrfs_trim_fs()
> can get the unmodified range, which is normally set to [0, U64_MAX].
> 
> Reported-by: Chris Murphy 
> Fixes: f4c697e6406d ("btrfs: return EINVAL if start > total_bytes in fitrim 
> ioctl")
> Cc:  # v4.0+
> Signed-off-by: Qu Wenruo 
> ---
> changelog:
> v2:
>   Locate the root cause in btrfs_ioctl_fitrim(), remove the truncation
>   so we can still allow user to trim custom range.
> v2.1:
>   Include the missing change in btrfs_trim_fs() and update the commit
>   message to reflect this.
> ---
>  fs/btrfs/extent-tree.c |  9 +
>  fs/btrfs/ioctl.c   | 11 +++
>  2 files changed, 8 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index f830aa91ac3d..f74958e11008 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -10972,14 +10972,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, 
> struct fstrim_range *range)
>   int dev_ret = 0;
>   int ret = 0;
>  
> - /*
> -  * try to trim all FS space, our block group may start from non-zero.
> -  */
> - if (range->len == total_bytes)
> - cache = btrfs_lookup_first_block_group(fs_info, range->start);
> - else
> - cache = btrfs_lookup_block_group(fs_info, range->start);
> -
> + cache = btrfs_lookup_first_block_group(fs_info, range->start);
>   for (; cache; cache = next_block_group(fs_info, cache)) {
>   if (cache->key.objectid >= (range->start + range->len)) {
>   btrfs_put_block_group(cache);
(..snip..)

This first hunk should also remove total_bytes, otherwise we get an
unused-variable warning:

@@ -10972,19 +10972,11 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, 
struct fstrim_range *range)
u64 start;
u64 end;
u64 trimmed = 0;
-   u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
int bg_ret = 0;
int dev_ret = 0;
int ret = 0;
..etc..
 
cheers,
Holger

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs: submit a normal bio for the mirror read

2017-11-29 Thread Liu Bo
On Wed, Nov 29, 2017 at 03:11:14PM +0800, Anand Jain wrote:
> When the fist mirror failed to read we submit bio again to read from the
> 2nd mirror, however during this, we also set the flag REQ_FAILFAST_DEV
> which indicates to the underlying block device driver not to perform the
> default IO retry (sd, 5 counts), and thus command will be failed and
> returned at the first failed attempt.
>
> On the contrary, in fact, it should have been other way around that is,
> as 2nd mirror being the last copy of the data, it should rather try
> equally hard to make this read cmd successful and at the same time with
> or without REQ_FAILFAST_DEV there is no performance benefits if the
> command is successful in the first attempt itself.
> 

There are some misunderstanding in the above commit log.

FAILFAST is only set for the validation phrase, i.e. retry read on the
same failed mirror.  When it comes to the 2nd mirror,
failed_bio->bi_vcnt becomes 1 so FAILFAST is not set any more.  IOW,
FAILFAST is only set when narrowing read failures to each 4K block, so
a quick retry is reasonable because anyway we have another mirror to
read.  You may want to check the comments in btrfs_check_repairable().

thanks,

-liubo

> The only benefit which would be achieved with REQ_FAILFAST_DEV is that
> when both the copies encounter failed read, then for the applications,
> the EIO will be reported roughly 40% faster (since it saves 4 retries).
> But when first mirror has failed whats more important will be to make
> a successful read from the 2nd mirror.
> 
> And for the DUP case where both the copies are on the same disk, this
> makes to retry 5 times on 2 different LBA/sector but on the same disk,
> which probably is not a good idea if your test case involves pulling
> out the disk. But as of now we don't have a way to tell the block layer
> how critical a read is, so that block layer can accommodate the retry
> dynamically.
> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/extent_io.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index d8eb457f2a70..29c03fb4d5af 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2388,9 +2388,6 @@ static int bio_readpage_error(struct bio *failed_bio, 
> u64 phy_offset,
>   return -EIO;
>   }
>  
> - if (failed_bio->bi_vcnt > 1)
> - read_mode |= REQ_FAILFAST_DEV;
> -
>   phy_offset >>= inode->i_sb->s_blocksize_bits;
>   bio = btrfs_create_repair_bio(inode, failed_bio, failrec, page,
> start - page_offset(page),
> -- 
> 2.15.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] Btrfs fixes for 4.15-rc2

2017-11-29 Thread David Sterba
Hi,

we've collected some fixes in since the pre-merge window freeze. There's
technically only one regression fix for 4.15, but the rest seems important and
candidates for stable. No merge conflicts, please pull, thanks.

- fix missing flush bio puts in error cases (is serious, but rarely happens)

- fix reporting stat::st_blocks for buffered append writes

- fix space cache invalidation

- fix out of bound memory access when setting zlib level

- fix potential memory corruption when fsync fails in the middle

- fix crash in integrity checker

- incremetnal send fix, path mixup for certain unlink/rename combination

- pass flags to writeback so compressed writes can be throttled properly

- error handling fixes

With signed tag: for-4.15-rc2-tag


The following changes since commit d28e649a5c58b779b303c252c66ee84a0f2c3b32:

  btrfs: Fix bug for misused dev_t when lookup in dev state hash table. 
(2017-11-01 20:45:36 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-4.15-rc2

for you to fetch changes up to ea37d5998b50a72b9045ba60a132eeb20e1c4230:

  Btrfs: incremental send, fix wrong unlink path after renaming file 
(2017-11-28 17:15:30 +0100)


David Sterba (2):
  btrfs: add missing device::flush_bio puts
  btrfs: dev_alloc_list is not protected by RCU, use normal list_del

Filipe Manana (3):
  Btrfs: move definition of the function btrfs_find_new_delalloc_bytes
  Btrfs: fix reported number of inode blocks after buffered append writes
  Btrfs: incremental send, fix wrong unlink path after renaming file

Josef Bacik (2):
  btrfs: clear space cache inode generation always
  btrfs: fix deadlock when writing out space cache

Liu Bo (3):
  Btrfs: add write_flags for compression bio
  Btrfs: bail out gracefully rather than BUG_ON
  Btrfs: fix list_add corruption and soft lockups in fsync

Nikolay Borisov (1):
  btrfs: Fix transaction abort during failure in btrfs_rm_dev_item

Qu Wenruo (2):
  btrfs: Fix wild memory access in compression level parser
  btrfs: tree-checker: Fix false panic for sanity test

 fs/btrfs/compression.c   |   9 +--
 fs/btrfs/compression.h   |   5 +-
 fs/btrfs/ctree.h |   1 +
 fs/btrfs/disk-io.c   |  10 ++-
 fs/btrfs/extent-tree.c   |  14 ++---
 fs/btrfs/extent_io.c |   2 +-
 fs/btrfs/extent_io.h |   8 ++-
 fs/btrfs/file.c  | 130 +--
 fs/btrfs/free-space-cache.c  |   3 +-
 fs/btrfs/inode.c |  34 +++---
 fs/btrfs/relocation.c|   3 +-
 fs/btrfs/send.c  | 124 +++--
 fs/btrfs/super.c |  13 +++-
 fs/btrfs/tests/extent-io-tests.c |   6 +-
 fs/btrfs/tests/inode-tests.c |  12 ++--
 fs/btrfs/tree-checker.c  |  27 ++--
 fs/btrfs/tree-checker.h  |  14 -
 fs/btrfs/tree-log.c  |   2 +-
 fs/btrfs/volumes.c   |  32 +++---
 19 files changed, 314 insertions(+), 135 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Read before you deploy btrfs + zstd

2017-11-29 Thread Andrei Borzenkov
29.11.2017 16:24, Austin S. Hemmelgarn пишет:
> On 2017-11-28 18:49, David Sterba wrote:
>> On Tue, Nov 28, 2017 at 09:31:57PM +, Nick Terrell wrote:
>>>
 On Nov 21, 2017, at 8:22 AM, David Sterba  wrote:

 On Wed, Nov 15, 2017 at 08:09:15PM +, Nick Terrell wrote:
> On 11/15/17, 6:41 AM, "David Sterba"  wrote:
>> The branch is now in a state that can be tested. Turns out the memory
>> requirements are too much for grub, so the boot fails with "not
>> enough
>> memory". The calculated value
>>
>> ZSTD_BTRFS_MAX_INPUT: 131072
>> ZSTD_DStreamWorkspaceBound with ZSTD_BTRFS_MAX_INPUT: 549424
>>
>> This is not something I could fix easily, we'd probalby need a tuned
>> version of ZSTD for grub constraints. Adding Nick to CC.
>
> If I understand the grub code correctly, we only need to read, and
> we have
> the entire input and output buffer in one segment. In that case you
> can use
> ZSTD_initDCtx(), and ZSTD_decompressDCtx().
> ZSTD_DCtxWorkspaceBound() is
> only 155984. See decompress_single() in
> https://patchwork.kernel.org/patch/9997909/ for an example.

 Does not help, still ENOMEM.
>>>
>>> It looks like XZ had the same issue, and they make the decompression
>>> context a static object (grep for GRUB_EMBED_DECOMPRESSOR). We could
>>> potentially do the same and statically allocate the workspace:
>>>
>>> ```
>>> /* Could also be size_t */
>>> #define BTRFS_ZSTD_WORKSPACE_SIZE_U64 (155984 / sizeof(uint64_t))
>>> static uint64_t workspace[BTRFS_ZSTD_WORKSPACE_SIZE_U64];
>>>
>>> /* ... */
>>>
>>> assert(sizeof(workspace) >= ZSTD_DCtxWorkspaceBound());
>>> ```
>>
>> Interesting, thanks for the tip, I'll try it next.
>>
>> I've meanwhile tried to tweak the numbers, the maximum block for zstd,
>> that squeezed the DCtx somewhere under 48k, with block size 8k. Still
>> enomem.
>>
>> I've tried to add some debugging prints to see what numbers get actually
>> passed to the allocator, but did not see anything printed.  I'm sure
>> there is a more intelligent way to test the grub changes.  So far each
>> test loop takes quite some time, as I build the rpm package, test it in
>> a VM and have to recreate the environmet each time.
> On the note of testing, have you tried writing up a module to just test
> the decompressor?  If so, you could probably use the 'emu' platform to
> save the need to handle the RPM package and the VM until you get the
> decompressor working by itself, at which point the FUSE modules used to
> test the GRUB filesystem modules may be of some use (or you might be
> able to just use them directly).

There is also grub-fstest which directly calls filesystem drivers; usage
is something like "grub-fstest /dev/sdb1 cat /foo". Replace /dev/sdb1
with any btrfs image. As this is user space it is easy to single step if
needed.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup

2017-11-29 Thread Chris Mason

On 11/29/2017 12:05 PM, Tejun Heo wrote:

On Wed, Nov 29, 2017 at 09:03:30AM -0800, Tejun Heo wrote:

Hello,

On Wed, Nov 29, 2017 at 05:56:08PM +0100, Jan Kara wrote:

What has happened with this patch set?


No idea.  cc'ing Chris directly.  Chris, if the patchset looks good,
can you please route them through the btrfs tree?


lol looking at the patchset again, I'm not sure that's obviously the
right tree.  It can either be cgroup, block or btrfs.  If no one
objects, I'll just route them through cgroup.


We'll have to coordinate a bit during the next merge window but I don't 
have a problem with these going in through cgroup.  Dave does this sound 
good to you?


I'd like to include my patch to do all crcs inline (instead of handing 
off to helper threads) when io controls are in place.  By the merge 
window we should have some good data on how much it's all helping.


-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/7] retry write on error

2017-11-29 Thread Liu Bo
On Tue, Nov 28, 2017 at 08:22:36PM +0100, David Sterba wrote:
> On Tue, Nov 21, 2017 at 05:35:51PM -0700, Liu Bo wrote:
> > If the underlying protocal doesn't support retry and there are some
> > transient errors happening somewhere in our IO stack, we'd like to
> > give an extra chance for IO.  Or sometimes you see btrfs reporting
> > 'wrr 1 flush 0 read 0 blabla' but the disk drive is 100% good, this
> > retry may help a bit.
> 
> A limited number of retries may make sense, though I saw some long
> stalls after retries on bad disks. Tracking the retries would be a good
> addition to the dev stats, ie. a soft error but still worth reporting.
>
> > In btrfs, read retry is handled in bio_readpage_error() with the retry
> > unit being page size, for write retry however, we're going to do it in
> > a different way, as a write may consist of several writes onto
> > different stripes, retry write needs to be done right after the IO on
> > each stripe completes and arrives at endio.
> > 
> > Patch 1-3 are the implementation of retry write on error for
> > non-raid56 profile.  Patch 4-6 are for raid56 profile.  Both raid56
> > and non-raid56 shares one retry function helper.
> > 
> > Patch 3 does retry sector by sector, but since this patch set doesn't
> > included badblocks support, patch 7 changes it back to retry the whole
> > bio.  (I didn't fold patch 7 to patch 3 in the hope of just reverting
> > patch 7 once badblocks support is done, but I'm open to it.)
> 
> What does 'badblocks' refer to? I know about the badblocks utility that
> find and reportts bad blocks, possibly ext2 understands that and avoids
> allocating them. Btrfs does not have such support.

The same thing, badblocks refers to block/badblocks.c, it derives
from md's badblocks table, and now also serves as tracking bad cells
in pmem.  And yes, btrfs is yet to have that.

Thanks,

-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 05/11] writeback: convert the flexible prop stuff to bytes

2017-11-29 Thread Jan Kara
On Wed 22-11-17 16:16:00, Josef Bacik wrote:
> From: Josef Bacik 
> 
> The flexible proportions were all page based, but now that we are doing
> metadata writeout that can be smaller or larger than page size we need
> to account for this in bytes instead of number of pages.
> 
> Signed-off-by: Josef Bacik 

Looks good to me. You can add:

Reviewed-by: Jan Kara 

Honza

> ---
>  mm/page-writeback.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index e4563645749a..2a1994194cc1 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -574,11 +574,11 @@ static unsigned long wp_next_time(unsigned long 
> cur_time)
>   return cur_time;
>  }
>  
> -static void wb_domain_writeout_inc(struct wb_domain *dom,
> +static void wb_domain_writeout_add(struct wb_domain *dom,
>  struct fprop_local_percpu *completions,
> -unsigned int max_prop_frac)
> +long bytes, unsigned int max_prop_frac)
>  {
> - __fprop_inc_percpu_max(>completions, completions,
> + __fprop_add_percpu_max(>completions, completions, bytes,
>  max_prop_frac);
>   /* First event after period switching was turned off? */
>   if (unlikely(!dom->period_time)) {
> @@ -602,12 +602,12 @@ static inline void __wb_writeout_add(struct 
> bdi_writeback *wb, long bytes)
>   struct wb_domain *cgdom;
>  
>   __add_wb_stat(wb, WB_WRITTEN_BYTES, bytes);
> - wb_domain_writeout_inc(_wb_domain, >completions,
> + wb_domain_writeout_add(_wb_domain, >completions, bytes,
>  wb->bdi->max_prop_frac);
>  
>   cgdom = mem_cgroup_wb_domain(wb);
>   if (cgdom)
> - wb_domain_writeout_inc(cgdom, wb_memcg_completions(wb),
> + wb_domain_writeout_add(cgdom, wb_memcg_completions(wb), bytes,
>  wb->bdi->max_prop_frac);
>  }
>  
> -- 
> 2.7.5
> 
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup

2017-11-29 Thread Tejun Heo
On Wed, Nov 29, 2017 at 09:03:30AM -0800, Tejun Heo wrote:
> Hello,
> 
> On Wed, Nov 29, 2017 at 05:56:08PM +0100, Jan Kara wrote:
> > What has happened with this patch set?
> 
> No idea.  cc'ing Chris directly.  Chris, if the patchset looks good,
> can you please route them through the btrfs tree?

lol looking at the patchset again, I'm not sure that's obviously the
right tree.  It can either be cgroup, block or btrfs.  If no one
objects, I'll just route them through cgroup.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 03/11] lib: make the fprop batch size a multiple of PAGE_SIZE

2017-11-29 Thread Jan Kara
On Wed 22-11-17 16:15:58, Josef Bacik wrote:
> From: Josef Bacik 
> 
> We are converting the writeback counters to use bytes instead of pages,
> so we need to make the batch size for the percpu modifications align
> properly with the new units.  Since we used pages before, just multiply
> by PAGE_SIZE to get the equivalent bytes for the batch size.
> 
> Signed-off-by: Josef Bacik 

Looks good to me, just please make this part of patch 5/11. Otherwise
bisection may get broken by too large errors in per-cpu counters of IO
completions... Thanks!

Honza

> ---
>  lib/flex_proportions.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/flex_proportions.c b/lib/flex_proportions.c
> index 2cc1f94e03a1..b0343ae71f5e 100644
> --- a/lib/flex_proportions.c
> +++ b/lib/flex_proportions.c
> @@ -166,7 +166,7 @@ void fprop_fraction_single(struct fprop_global *p,
>  /*
>   *  PERCPU 
>   */
> -#define PROP_BATCH (8*(1+ilog2(nr_cpu_ids)))
> +#define PROP_BATCH (8*PAGE_SIZE*(1+ilog2(nr_cpu_ids)))
>  
>  int fprop_local_init_percpu(struct fprop_local_percpu *pl, gfp_t gfp)
>  {
> -- 
> 2.7.5
> 
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 04/11] lib: add a __fprop_add_percpu_max

2017-11-29 Thread Jan Kara
On Wed 22-11-17 16:15:59, Josef Bacik wrote:
> From: Josef Bacik 
> 
> This helper allows us to add an arbitrary amount to the fprop
> structures.
> 
> Signed-off-by: Josef Bacik 

Looks good to me. You can add:

Reviewed-by: Jan Kara 

Honza

> ---
>  include/linux/flex_proportions.h | 11 +--
>  lib/flex_proportions.c   |  9 +
>  2 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/flex_proportions.h 
> b/include/linux/flex_proportions.h
> index 0d348e011a6e..9f88684bf0a0 100644
> --- a/include/linux/flex_proportions.h
> +++ b/include/linux/flex_proportions.h
> @@ -83,8 +83,8 @@ struct fprop_local_percpu {
>  int fprop_local_init_percpu(struct fprop_local_percpu *pl, gfp_t gfp);
>  void fprop_local_destroy_percpu(struct fprop_local_percpu *pl);
>  void __fprop_inc_percpu(struct fprop_global *p, struct fprop_local_percpu 
> *pl);
> -void __fprop_inc_percpu_max(struct fprop_global *p, struct 
> fprop_local_percpu *pl,
> - int max_frac);
> +void __fprop_add_percpu_max(struct fprop_global *p, struct 
> fprop_local_percpu *pl,
> + unsigned long nr, int max_frac);
>  void fprop_fraction_percpu(struct fprop_global *p,
>   struct fprop_local_percpu *pl, unsigned long *numerator,
>   unsigned long *denominator);
> @@ -99,4 +99,11 @@ void fprop_inc_percpu(struct fprop_global *p, struct 
> fprop_local_percpu *pl)
>   local_irq_restore(flags);
>  }
>  
> +static inline
> +void __fprop_inc_percpu_max(struct fprop_global *p,
> + struct fprop_local_percpu *pl, int max_frac)
> +{
> + __fprop_add_percpu_max(p, pl, 1, max_frac);
> +}
> +
>  #endif
> diff --git a/lib/flex_proportions.c b/lib/flex_proportions.c
> index b0343ae71f5e..fd95791a2c93 100644
> --- a/lib/flex_proportions.c
> +++ b/lib/flex_proportions.c
> @@ -255,8 +255,9 @@ void fprop_fraction_percpu(struct fprop_global *p,
>   * Like __fprop_inc_percpu() except that event is counted only if the given
>   * type has fraction smaller than @max_frac/FPROP_FRAC_BASE
>   */
> -void __fprop_inc_percpu_max(struct fprop_global *p,
> - struct fprop_local_percpu *pl, int max_frac)
> +void __fprop_add_percpu_max(struct fprop_global *p,
> + struct fprop_local_percpu *pl, unsigned long nr,
> + int max_frac)
>  {
>   if (unlikely(max_frac < FPROP_FRAC_BASE)) {
>   unsigned long numerator, denominator;
> @@ -267,6 +268,6 @@ void __fprop_inc_percpu_max(struct fprop_global *p,
>   return;
>   } else
>   fprop_reflect_period_percpu(p, pl);
> - percpu_counter_add_batch(>events, 1, PROP_BATCH);
> - percpu_counter_add(>events, 1);
> + percpu_counter_add_batch(>events, nr, PROP_BATCH);
> + percpu_counter_add(>events, nr);
>  }
> -- 
> 2.7.5
> 
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup

2017-11-29 Thread Tejun Heo
Hello,

On Wed, Nov 29, 2017 at 05:56:08PM +0100, Jan Kara wrote:
> What has happened with this patch set?

No idea.  cc'ing Chris directly.  Chris, if the patchset looks good,
can you please route them through the btrfs tree?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 02/11] writeback: convert WB_WRITTEN/WB_DIRITED counters to bytes

2017-11-29 Thread Jan Kara
On Wed 22-11-17 16:15:57, Josef Bacik wrote:
> From: Josef Bacik 
> 
> These are counters that constantly go up in order to do bandwidth 
> calculations.
> It isn't important what the units are in, as long as they are consistent 
> between
> the two of them, so convert them to count bytes written/dirtied, and allow the
> metadata accounting stuff to change the counters as well.
> 
> Signed-off-by: Josef Bacik 
> Acked-by: Tejun Heo 

Looks good to me. You can add:

Reviewed-by: Jan Kara 

Honza

> ---
>  fs/fuse/file.c   |  4 ++--
>  include/linux/backing-dev-defs.h |  4 ++--
>  include/linux/backing-dev.h  |  2 +-
>  mm/backing-dev.c |  9 +
>  mm/page-writeback.c  | 20 ++--
>  5 files changed, 20 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index cb7dff5c45d7..67e7c4fac28d 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1471,7 +1471,7 @@ static void fuse_writepage_finish(struct fuse_conn *fc, 
> struct fuse_req *req)
>   for (i = 0; i < req->num_pages; i++) {
>   dec_wb_stat(>wb, WB_WRITEBACK);
>   dec_node_page_state(req->pages[i], NR_WRITEBACK_TEMP);
> - wb_writeout_inc(>wb);
> + wb_writeout_add(>wb, PAGE_SIZE);
>   }
>   wake_up(>page_waitq);
>  }
> @@ -1776,7 +1776,7 @@ static bool fuse_writepage_in_flight(struct fuse_req 
> *new_req,
>  
>   dec_wb_stat(>wb, WB_WRITEBACK);
>   dec_node_page_state(page, NR_WRITEBACK_TEMP);
> - wb_writeout_inc(>wb);
> + wb_writeout_add(>wb, PAGE_SIZE);
>   fuse_writepage_free(fc, new_req);
>   fuse_request_free(new_req);
>   goto out;
> diff --git a/include/linux/backing-dev-defs.h 
> b/include/linux/backing-dev-defs.h
> index 866c433e7d32..ded45ac2cec7 100644
> --- a/include/linux/backing-dev-defs.h
> +++ b/include/linux/backing-dev-defs.h
> @@ -36,8 +36,8 @@ typedef int (congested_fn)(void *, int);
>  enum wb_stat_item {
>   WB_RECLAIMABLE,
>   WB_WRITEBACK,
> - WB_DIRTIED,
> - WB_WRITTEN,
> + WB_DIRTIED_BYTES,
> + WB_WRITTEN_BYTES,
>   NR_WB_STAT_ITEMS
>  };
>  
> diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
> index 14e266d12620..39b8dc486ea7 100644
> --- a/include/linux/backing-dev.h
> +++ b/include/linux/backing-dev.h
> @@ -89,7 +89,7 @@ static inline s64 wb_stat_sum(struct bdi_writeback *wb, 
> enum wb_stat_item item)
>   return percpu_counter_sum_positive(>stat[item]);
>  }
>  
> -extern void wb_writeout_inc(struct bdi_writeback *wb);
> +extern void wb_writeout_add(struct bdi_writeback *wb, long bytes);
>  
>  /*
>   * maximal error of a stat counter.
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index e19606bb41a0..62a332a91b38 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -68,14 +68,15 @@ static int bdi_debug_stats_show(struct seq_file *m, void 
> *v)
>   wb_thresh = wb_calc_thresh(wb, dirty_thresh);
>  
>  #define K(x) ((x) << (PAGE_SHIFT - 10))
> +#define BtoK(x) ((x) >> 10)
>   seq_printf(m,
>  "BdiWriteback:   %10lu kB\n"
>  "BdiReclaimable: %10lu kB\n"
>  "BdiDirtyThresh: %10lu kB\n"
>  "DirtyThresh:%10lu kB\n"
>  "BackgroundThresh:   %10lu kB\n"
> -"BdiDirtied: %10lu kB\n"
> -"BdiWritten: %10lu kB\n"
> +"BdiDirtiedBytes:%10lu kB\n"
> +"BdiWrittenBytes:%10lu kB\n"
>  "BdiWriteBandwidth:  %10lu kBps\n"
>  "b_dirty:%10lu\n"
>  "b_io:   %10lu\n"
> @@ -88,8 +89,8 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
>  K(wb_thresh),
>  K(dirty_thresh),
>  K(background_thresh),
> -(unsigned long) K(wb_stat(wb, WB_DIRTIED)),
> -(unsigned long) K(wb_stat(wb, WB_WRITTEN)),
> +(unsigned long) BtoK(wb_stat(wb, WB_DIRTIED_BYTES)),
> +(unsigned long) BtoK(wb_stat(wb, WB_WRITTEN_BYTES)),
>  (unsigned long) K(wb->write_bandwidth),
>  nr_dirty,
>  nr_io,
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 1a47d4296750..e4563645749a 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -597,11 +597,11 @@ static void wb_domain_writeout_inc(struct wb_domain 
> *dom,
>   * Increment @wb's writeout completion count and the global writeout
>   * completion count. Called from test_clear_page_writeback().
>   */
> -static inline void __wb_writeout_inc(struct bdi_writeback *wb)
> +static inline void __wb_writeout_add(struct bdi_writeback *wb, long bytes)
>  {

Re: [PATCH v7 1/5] add infrastructure for tagging functions as error injectable

2017-11-29 Thread Daniel Borkmann
On 11/28/2017 09:02 PM, Josef Bacik wrote:
> On Tue, Nov 28, 2017 at 11:58:41AM -0700, Jonathan Corbet wrote:
>> On Wed, 22 Nov 2017 16:23:30 -0500
>> Josef Bacik  wrote:
>>> From: Josef Bacik 
>>>
>>> Using BPF we can override kprob'ed functions and return arbitrary
>>> values.  Obviously this can be a bit unsafe, so make this feature opt-in
>>> for functions.  Simply tag a function with KPROBE_ERROR_INJECT_SYMBOL in
>>> order to give BPF access to that function for error injection purposes.
>>>
>>> Signed-off-by: Josef Bacik 
>>> Acked-by: Ingo Molnar 
>>> ---
>>>  arch/x86/include/asm/asm.h|   6 ++
>>>  include/asm-generic/vmlinux.lds.h |  10 +++
>>>  include/linux/bpf.h   |  11 +++
>>>  include/linux/kprobes.h   |   1 +
>>>  include/linux/module.h|   5 ++
>>>  kernel/kprobes.c  | 163 
>>> ++
>>>  kernel/module.c   |   6 +-
>>>  7 files changed, 201 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
>>> index b0dc91f4bedc..340f4cc43255 100644
>>> --- a/arch/x86/include/asm/asm.h
>>> +++ b/arch/x86/include/asm/asm.h
>>> @@ -85,6 +85,12 @@
>>> _ASM_PTR (entry);   \
>>> .popsection
>>>  
>>> +# define _ASM_KPROBE_ERROR_INJECT(entry)   \
>>> +   .pushsection "_kprobe_error_inject_list","aw" ; \
>>> +   _ASM_ALIGN ;\
>>> +   _ASM_PTR (entry);   \
>>> +   .popseciton
>>
>> So this stuff is not my area of greatest expertise, but I do have to wonder
>> how ".popseciton" can work ... ?
> 
> Well fuck, do you want me to send a increment Daniel/Alexei or resend this 
> patch
> fixed?  Thanks,

Sorry for late reply, please rebase + respin the whole series with
this fixed. There were also few typos in the cover letter / commit
messages that would be good to get fixed along the way.

Also, could you debug why this wasn't caught at compile/runtime during
testing?

Thanks a lot,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup

2017-11-29 Thread Jan Kara
Hi Tejun,

What has happened with this patch set?

Honza

On Tue 10-10-17 08:54:36, Tejun Heo wrote:
> Changes from the last version are
> 
> * blkcg_root_css exported to fix build breakage on modular btrfs.
> 
> * Use ext4_should_journal_data() test instead of
>   EXT4_MOUNT_JOURNAL_DATA.
> 
> * Separated out create_bh_bio() and used it to implement
>   submit_bh_blkcg_css() as suggested by Jan.
> 
> btrfs has different ways to issue metadata IOs and may end up issuing
> metadata or otherwise shared IOs from a non-root cgroup, which can
> lead to priority inversion and ineffective IO isolation.
> 
> This patchset makes sure that btrfs issues all metadata and shared IOs
> from the root cgroup by exempting btree_inodes from cgroup writeback
> and explicitly associating shared IOs with the root cgroup.
> 
> This patchset containst he following three patches
> 
>  [PATCH 1/5] blkcg: export blkcg_root_css
>  [PATCH 2/5] cgroup, writeback: replace SB_I_CGROUPWB with per-inode
>  [PATCH 3/5] buffer_head: separate out create_bh_bio() from
>  [PATCH 4/5] cgroup, buffer_head: implement submit_bh_blkcg_css()
>  [PATCH 5/5] btrfs: ensure that metadata and flush are issued from the
> 
> and is also available in the following git branch
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git 
> review-cgroup-btrfs-metadata-v2
> 
> diffstat follows.  Thanks.
> 
>  block/blk-cgroup.c  |1 +
>  fs/block_dev.c  |3 +--
>  fs/btrfs/check-integrity.c  |2 +-
>  fs/btrfs/disk-io.c  |4 
>  fs/btrfs/ioctl.c|6 +-
>  fs/btrfs/super.c|1 -
>  fs/buffer.c |   42 ++
>  fs/ext2/inode.c |3 ++-
>  fs/ext2/super.c |1 -
>  fs/ext4/inode.c |5 -
>  fs/ext4/super.c |2 --
>  include/linux/backing-dev.h |2 +-
>  include/linux/buffer_head.h |3 +++
>  include/linux/fs.h  |3 ++-
>  14 files changed, 58 insertions(+), 20 deletions(-)
> 
> --
> tejun
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/7] retry write on error

2017-11-29 Thread David Sterba
On Wed, Nov 29, 2017 at 12:09:29PM +0800, Anand Jain wrote:
> On 11/29/2017 07:41 AM, p...@btrfs.list.sabi.co.uk wrote:
>  If the underlying protocal doesn't support retry and there
>  are some transient errors happening somewhere in our IO
>  stack, we'd like to give an extra chance for IO.
> > 
> >>> A limited number of retries may make sense, though I saw some
> >>> long stalls after retries on bad disks.
> > 
> > Indeed! One of the major issues in actual storage administration
> > is to find ways to reliably disable most retries, or to shorten
> > them, both at the block device level and the device level,
> > because in almost all cases where storage reliability matters
> > what is important is simply swapping out the failing device
> > immediately and then examining and possible refreshing it
> > offline.
> > 
> > To the point that many device manufacturers deliberately cripple
> > in cheaper products retry shortening or disabling options to
> > force long stalls, so that people who care about reliability
> > more than price will buy the more expensive version that can
> > disable or shorten retries.
> > 
> >> Seems preferable to avoid issuing retries when the underlying
> >> transport layer(s) has already done so, but I am not sure
> >> there is a way to know that at the fs level.
> > 
> > Inded, and to use an euphemism, a third layer of retries at the
> > filesystem level are currently a thoroughly imbecilic idea :-),
> > as whether retries are worth doing is not a filesystem dependent
> > issue (but then plugging is done at the block io level when it
> > is entirely device dependent whether it is worth doing, so there
> > is famous precedent).
> > 
> > There are excellent reasons why error recovery is in general not
> > done at the filesystem level since around 20 years ago, which do
> > not need repeating every time. However one of them is that where
> > it makes sense device firmware does retries, and the block
> > device layer does retries too, which is often a bad idea, and
> > where it is not, the block io level should be do that, not the
> > filesystem.
> > 
> > A large part of the above discussion would not be needed if
> > Linux kernel "developers" exposed a clear notion of hardware
> > device and block device state machine and related semantics, or
> > even knew that it were desirable, but that's an idea that is
> > only 50 years old, so may not have yet reached popularity :-).
> 
>   I agree with Ed and Peter, similar opinion was posted here [1].
>  https://www.spinics.net/lists/linux-btrfs/msg70240.html

All the points in this thread speak against retries on the filesystem
level and I agree. Without an interface to query the block layer if the
retries make sense, it's just guessing, likely to be wrong.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] btrfs: factor __btrfs_open_devices() to create btrfs_open_one_device()

2017-11-29 Thread Lakshmipathi.G
Hi Anand,

Applied your fix, now the issue resolved. thanks!


Cheers,
Lakshmipathi.G
http://www.giis.co.in http://www.webminal.org


On Mon, Nov 27, 2017 at 5:17 PM, Anand Jain  wrote:
>
> Hi Lakshmipathi,
>
>  Oops I can see the same. I am trying to narrow down.
>
> Thanks, Anand
>
>
> On 11/27/2017 02:47 PM, Lakshmipathi.G wrote:
>>
>> Hi Anand,
>>
>> With this patch applied, btrfs-progs/misc-test/021 error out. Is this
>> same for you?
>>
>> Without this patch: https://asciinema.org/a/RJmE5469mHlL3S1BIOCifWVn6
>> With this patch:https://asciinema.org/a/1h5UX6DIFNsvvMXgLo4GiEgdE
>>
>> thanks!
>> 
>> Cheers,
>> Lakshmipathi.G
>> http://www.giis.co.in http://www.webminal.org
>>
>>
>> On Thu, Nov 9, 2017 at 9:15 PM, Anand Jain  wrote:
>>>
>>> No functional changes, create btrfs_open_one_device() from
>>> __btrfs_open_devices(). This is a preparatory work to add dynamic
>>> device scan.
>>>
>>> Signed-off-by: Anand Jain 
>>> ---
>>>   fs/btrfs/volumes.c | 126
>>> +
>>>   1 file changed, 69 insertions(+), 57 deletions(-)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 0857b580014d..d24e966ee29f 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -601,6 +601,73 @@ void btrfs_free_stale_device(struct btrfs_device
>>> *cur_dev)
>>>  }
>>>   }
>>>
>>> +static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
>>> +   struct btrfs_device *device, fmode_t flags,
>>> +   void *holder)
>>> +{
>>> +   struct request_queue *q;
>>> +   struct block_device *bdev;
>>> +   struct buffer_head *bh;
>>> +   struct btrfs_super_block *disk_super;
>>> +   u64 devid;
>>> +   int ret;
>>> +
>>> +   if (device->bdev)
>>> +   return -EINVAL;
>>> +   if (!device->name)
>>> +   return -EINVAL;
>>> +
>>> +   ret = btrfs_get_bdev_and_sb(device->name->str, flags, holder, 1,
>>> +   , );
>>> +   if (ret)
>>> +   return ret;
>>> +
>>> +   disk_super = (struct btrfs_super_block *)bh->b_data;
>>> +   devid = btrfs_stack_device_id(_super->dev_item);
>>> +   if (devid != device->devid)
>>> +   goto error_brelse;
>>> +
>>> +   if (memcmp(device->uuid, disk_super->dev_item.uuid,
>>> +  BTRFS_UUID_SIZE))
>>> +   goto error_brelse;
>>> +
>>> +   device->generation = btrfs_super_generation(disk_super);
>>> +
>>> +   if (btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_SEEDING) {
>>> +   device->writeable = 0;
>>> +   fs_devices->seeding = 1;
>>> +   } else {
>>> +   device->writeable = !bdev_read_only(bdev);
>>> +   }
>>> +
>>> +   q = bdev_get_queue(bdev);
>>> +   if (blk_queue_discard(q))
>>> +   device->can_discard = 1;
>>> +   if (!blk_queue_nonrot(q))
>>> +   fs_devices->rotating = 1;
>>> +
>>> +   device->bdev = bdev;
>>> +   device->in_fs_metadata = 0;
>>> +   device->mode = flags;
>>> +
>>> +   fs_devices->open_devices++;
>>> +   if (device->writeable &&
>>> +   device->devid != BTRFS_DEV_REPLACE_DEVID) {
>>> +   fs_devices->rw_devices++;
>>> +   list_add(>dev_alloc_list,
>>> +_devices->alloc_list);
>>> +   }
>>> +   brelse(bh);
>>> +
>>> +   return 0;
>>> +
>>> +error_brelse:
>>> +   brelse(bh);
>>> +   blkdev_put(bdev, flags);
>>> +
>>> +   return -EINVAL;
>>> +}
>>> +
>>>   /*
>>>* Add new device to list of registered devices
>>>*
>>> @@ -978,69 +1045,14 @@ static int __btrfs_open_devices(struct
>>> btrfs_fs_devices *fs_devices,
>>>  flags |= FMODE_EXCL;
>>>
>>>  list_for_each_entry(device, head, dev_list) {
>>> -   struct request_queue *q;
>>> -   struct block_device *bdev;
>>> -   struct buffer_head *bh;
>>> -   struct btrfs_super_block *disk_super;
>>> -   u64 devid;
>>> -
>>> -   if (device->bdev)
>>> -   continue;
>>> -   if (!device->name)
>>> -   continue;
>>> -
>>>  /* Just open everything we can; ignore failures here */
>>> -   if (btrfs_get_bdev_and_sb(device->name->str, flags,
>>> holder, 1,
>>> -   , ))
>>> +   ret = btrfs_open_one_device(fs_devices, device, flags,
>>> holder);
>>> +   if (ret)
>>>  continue;
>>>
>>> -   disk_super = (struct btrfs_super_block *)bh->b_data;
>>> -   devid = btrfs_stack_device_id(_super->dev_item);
>>> -   if (devid != device->devid)
>>> -   goto error_brelse;
>>> -
>>> -   if 

[PATCH 1/2] btrfs-progs: convert: Fix a bug in rollback check which overwrite return value

2017-11-29 Thread Qu Wenruo
Commit 1170ac307900 ("btrfs-progs: convert: Introduce function to check if
convert image is able to be rolled back") reworked rollback check
condition, by checking 1:1 mapping of each file extent.

The idea itself has nothing wrong, but error handler is not implemented
correctly, which over writes the return value and always try to rollback
the fs even it fails to pass the check.

Fix it by correctly return the error before rollback the fs.

Fixes: 1170ac307900 ("btrfs-progs: convert: Introduce function to check if 
convert image is able to be rolled back")
Reported-by: Jeff Mahoney 
Signed-off-by: Qu Wenruo 
---
 convert/main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/convert/main.c b/convert/main.c
index af2855316fee..89f9261172ca 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -1443,6 +1443,8 @@ next:
}
}
btrfs_release_path();
+   if (ret)
+   return ret;
/*
 * For HOLES mode (without NO_HOLES), we must ensure file extents
 * cover the whole range of the image
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] btrfs-progs: tests/convert: Introduce test case to ensure btrfs-convert won't rollback the filesystem after balance

2017-11-29 Thread Qu Wenruo
Signed-off-by: Qu Wenruo 
---
 .../015-not-to-rollback-after-balance/test.sh  | 33 ++
 1 file changed, 33 insertions(+)
 create mode 100755 
tests/convert-tests/015-not-to-rollback-after-balance/test.sh

diff --git a/tests/convert-tests/015-not-to-rollback-after-balance/test.sh 
b/tests/convert-tests/015-not-to-rollback-after-balance/test.sh
new file mode 100755
index ..0d523d2250db
--- /dev/null
+++ b/tests/convert-tests/015-not-to-rollback-after-balance/test.sh
@@ -0,0 +1,33 @@
+#!/bin/bash
+# Check if btrfs-convert refuse to rollback the filesystem, and leave
+# the btrfs and the convert image untouched
+
+source "$TOP/tests/common"
+source "$TOP/tests/common.convert"
+
+setup_root_helper
+prepare_test_dev
+check_prereq btrfs-convert
+check_global_prereq mke2fs
+
+# convert_test_prep_fs() will create large enough file inside the
+# test device, that's good enough for us to test rollback failure.
+convert_test_prep_fs ext4 mke2fs -t ext4 -b 4096
+run_check_umount_test_dev
+convert_test_do_convert "" 4096
+
+run_check_mount_test_dev
+
+# Now the fs is converted, balance the fs so later rollback should fail
+run_check $SUDO_HELPER "$TOP/btrfs" balance start --full-balance $TEST_MNT
+run_check_umount_test_dev
+
+# rollback should fail
+run_mustfail "rollback fs after balance" "$TOP/btrfs-convert" -r $TEST_DEV
+
+# Ensure the fs and convert image can pass the check
+run_check "$TOP/btrfs" check $TEST_DEV
+
+run_check_mount_test_dev
+run_check $SUDO_HELPER e2fsck -fn $TEST_MNT/ext2_saved/image
+run_check_umount_test_dev
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: A partially failing disk in raid0 needs replacement

2017-11-29 Thread Klaus Agnoletti
Hi list

Can anyone give me any hints here? If not, my plan right now is to
start updating the server to latest debian stable (it's currently
running Jessie), to get access to a newer btrfs driver and tools,
hoping that decreases the risk of something screwing up, and then run
btrfs check --repair on the unmounted fs and wish for the best.

Does that make sense?

Thanks,

/klaus

On Tue, Nov 14, 2017 at 9:36 AM, Klaus Agnoletti  wrote:
> Hi list
>
> I used to have 3x2TB in a btrfs in raid0. A few weeks ago, one of the
> 2TB disks started giving me I/O errors in dmesg like this:
>
> [388659.173819] ata5.00: exception Emask 0x0 SAct 0x7fff SErr 0x0 action 
> 0x0
> [388659.175589] ata5.00: irq_stat 0x4008
> [388659.177312] ata5.00: failed command: READ FPDMA QUEUED
> [388659.179045] ata5.00: cmd 60/20:60:80:96:95/00:00:c4:00:00/40 tag
> 12 ncq 1638
>  4 in
>  res 51/40:1c:84:96:95/00:00:c4:00:00/40 Emask 0x409 (media error) 
> [388659.182552] ata5.00: status: { DRDY ERR }
> [388659.184303] ata5.00: error: { UNC }
> [388659.188899] ata5.00: configured for UDMA/133
> [388659.188956] sd 4:0:0:0: [sdd] Unhandled sense code
> [388659.188960] sd 4:0:0:0: [sdd]
> [388659.188962] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [388659.188965] sd 4:0:0:0: [sdd]
> [388659.188967] Sense Key : Medium Error [current] [descriptor]
> [388659.188970] Descriptor sense data with sense descriptors (in hex):
> [388659.188972] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
> [388659.188981] c4 95 96 84
> [388659.188985] sd 4:0:0:0: [sdd]
> [388659.188988] Add. Sense: Unrecovered read error - auto reallocate failed
> [388659.188991] sd 4:0:0:0: [sdd] CDB:
> [388659.188992] Read(10): 28 00 c4 95 96 80 00 00 20 00
> [388659.189000] end_request: I/O error, dev sdd, sector 3298137732
> [388659.190740] BTRFS: bdev /dev/sdd errs: wr 0, rd 3120, flush 0,
> corrupt 0, ge
>n 0
> [388659.192556] ata5: EH complete
>
> At the same time, I started getting mails from smartd:
>
> Device: /dev/sdd [SAT], 2 Currently unreadable (pending) sectors
> Device info:
> Hitachi HDS723020BLA642, S/N:MN1220F30MNHUD, WWN:5-000cca-369c8f00b,
> FW:MN6OA580, 2.00 TB
>
> For details see host's SYSLOG.
>
> To fix it, it ended up with me adding a new 6TB disk and trying to
> delete the failing 2TB disks.
>
> That didn't go so well; apparently, the delete command aborts when
> ever it encounters I/O errors. So now my raid0 looks like this:
>
> klaus@box:~$ sudo btrfs fi show
> [sudo] password for klaus:
> Label: none  uuid: 5db5f82c-2571-4e62-a6da-50da0867888a
> Total devices 4 FS bytes used 5.14TiB
> devid1 size 1.82TiB used 1.78TiB path /dev/sde
> devid2 size 1.82TiB used 1.78TiB path /dev/sdf
> devid3 size 0.00B used 1.49TiB path /dev/sdd
> devid4 size 5.46TiB used 305.21GiB path /dev/sdb
>
> Btrfs v3.17
>
> Obviously, I want /dev/sdd emptied and deleted from the raid.
>
> So how do I do that?
>
> I thought of three possibilities myself. I am sure there are more,
> given that I am in no way a btrfs expert:
>
> 1)Try to force a deletion of /dev/sdd where btrfs copies all intact
> data to the other disks
> 2) Somehow re-balances the raid so that sdd is emptied, and then deleted
> 3) converting into a raid1, physically removing the failing disk,
> simulating a hard error, starting the raid degraded, and converting it
> back to raid0 again.
>
> How do you guys think I should go about this? Given that it's a raid0
> for a reason, it's not the end of the world losing all data, but I'd
> really prefer losing as little as possible, obviously.
>
> FYI, I tried doing some scrubbing and balancing. There's traces of
> that in the syslog and dmesg I've attached. It's being used as
> firewall too, so there's a lof of Shorewall block messages smapping
> the log I'm afraid.
>
> Additional info:
> klaus@box:~$ uname -a
> Linux box 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19)
> x86_64 GNU/Linux
> klaus@box:~$ sudo btrfs --version
> Btrfs v3.17
> klaus@box:~$ sudo btrfs fi df /mnt
> Data, RAID0: total=5.34TiB, used=5.14TiB
> System, RAID0: total=96.00MiB, used=384.00KiB
> Metadata, RAID0: total=7.22GiB, used=5.82GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> Thanks a lot for any help you guys can give me. Btrfs is so incredibly
> cool, compared to md :-) I love it!
>
> --
> Klaus Agnoletti



-- 
Klaus Agnoletti
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Read before you deploy btrfs + zstd

2017-11-29 Thread Austin S. Hemmelgarn

On 2017-11-28 18:49, David Sterba wrote:

On Tue, Nov 28, 2017 at 09:31:57PM +, Nick Terrell wrote:



On Nov 21, 2017, at 8:22 AM, David Sterba  wrote:

On Wed, Nov 15, 2017 at 08:09:15PM +, Nick Terrell wrote:

On 11/15/17, 6:41 AM, "David Sterba"  wrote:

The branch is now in a state that can be tested. Turns out the memory
requirements are too much for grub, so the boot fails with "not enough
memory". The calculated value

ZSTD_BTRFS_MAX_INPUT: 131072
ZSTD_DStreamWorkspaceBound with ZSTD_BTRFS_MAX_INPUT: 549424

This is not something I could fix easily, we'd probalby need a tuned
version of ZSTD for grub constraints. Adding Nick to CC.


If I understand the grub code correctly, we only need to read, and we have
the entire input and output buffer in one segment. In that case you can use
ZSTD_initDCtx(), and ZSTD_decompressDCtx(). ZSTD_DCtxWorkspaceBound() is
only 155984. See decompress_single() in
https://patchwork.kernel.org/patch/9997909/ for an example.


Does not help, still ENOMEM.


It looks like XZ had the same issue, and they make the decompression
context a static object (grep for GRUB_EMBED_DECOMPRESSOR). We could
potentially do the same and statically allocate the workspace:

```
/* Could also be size_t */
#define BTRFS_ZSTD_WORKSPACE_SIZE_U64 (155984 / sizeof(uint64_t))
static uint64_t workspace[BTRFS_ZSTD_WORKSPACE_SIZE_U64];

/* ... */

assert(sizeof(workspace) >= ZSTD_DCtxWorkspaceBound());
```


Interesting, thanks for the tip, I'll try it next.

I've meanwhile tried to tweak the numbers, the maximum block for zstd,
that squeezed the DCtx somewhere under 48k, with block size 8k. Still
enomem.

I've tried to add some debugging prints to see what numbers get actually
passed to the allocator, but did not see anything printed.  I'm sure
there is a more intelligent way to test the grub changes.  So far each
test loop takes quite some time, as I build the rpm package, test it in
a VM and have to recreate the environmet each time.
On the note of testing, have you tried writing up a module to just test 
the decompressor?  If so, you could probably use the 'emu' platform to 
save the need to handle the RPM package and the VM until you get the 
decompressor working by itself, at which point the FUSE modules used to 
test the GRUB filesystem modules may be of some use (or you might be 
able to just use them directly).


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: drop btrfs_device::can_discard to query directly

2017-11-29 Thread Nikolay Borisov


On 29.11.2017 12:53, Anand Jain wrote:
> We can query the bdev directly when needed at btrfs_discard_extent()
> so drop btrfs_device::can_discard.
> 
> Signed-off-by: Anand Jain 
> Suggested-by: Nikolay Borisov 

Reviewed-by: Nikolay Borisov 

> ---
>  fs/btrfs/extent-tree.c | 5 -
>  fs/btrfs/volumes.c | 8 
>  fs/btrfs/volumes.h | 1 -
>  3 files changed, 4 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 85a545cf4395..830666c95360 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2145,7 +2145,10 @@ int btrfs_discard_extent(struct btrfs_fs_info 
> *fs_info, u64 bytenr,
>  
>   for (i = 0; i < bbio->num_stripes; i++, stripe++) {
>   u64 bytes;
> - if (!stripe->dev->can_discard)
> + struct request_queue *req_q;
> +
> + req_q = bdev_get_queue(stripe->dev->bdev);
> + if (!blk_queue_discard(req_q))
>   continue;
>  
>   ret = btrfs_issue_discard(stripe->dev->bdev,
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index ff62919d568f..ff52ecf17402 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -630,8 +630,6 @@ static int btrfs_open_one_device(struct btrfs_fs_devices 
> *fs_devices,
>   }
>  
>   q = bdev_get_queue(bdev);
> - if (blk_queue_discard(q))
> - device->can_discard = 1;
>   if (!blk_queue_nonrot(q))
>   fs_devices->rotating = 1;
>  
> @@ -2387,8 +2385,6 @@ int btrfs_init_new_device(struct btrfs_fs_info 
> *fs_info, const char *device_path
>   }
>  
>   q = bdev_get_queue(bdev);
> - if (blk_queue_discard(q))
> - device->can_discard = 1;
>   set_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state);
>   device->generation = trans->transid;
>   device->io_width = fs_info->sectorsize;
> @@ -2539,7 +2535,6 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info 
> *fs_info,
> struct btrfs_device *srcdev,
> struct btrfs_device **device_out)
>  {
> - struct request_queue *q;
>   struct btrfs_device *device;
>   struct block_device *bdev;
>   struct list_head *devices;
> @@ -2596,9 +2591,6 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info 
> *fs_info,
>   }
>   rcu_assign_pointer(device->name, name);
>  
> - q = bdev_get_queue(bdev);
> - if (blk_queue_discard(q))
> - device->can_discard = 1;
>   mutex_lock(_info->fs_devices->device_list_mutex);
>   set_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state);
>   device->generation = 0;
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 2fbff6902c8d..1cc693716b9a 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -74,7 +74,6 @@ struct btrfs_device {
>   fmode_t mode;
>  
>   unsigned long dev_state;
> - int can_discard;
>   int is_tgtdev_for_dev_replace;
>   blk_status_t last_flush_error;
>   int flush_bio_sent;
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: How about adding an ioctl to convert a directory to a subvolume?

2017-11-29 Thread Lu Fengqi

On Tue, Nov 28, 2017 at 07:48:28PM +0100, David Sterba wrote:
>On Mon, Nov 27, 2017 at 05:41:56PM +0800, Lu Fengqi wrote:
>> As we all know, under certain circumstances, it is more appropriate to
>> create some subvolumes rather than keep everything in the same
>> subvolume. As the condition of demand change, the user may need to
>> convert a previous directory to a subvolume. For this reason,how about
>> adding an ioctl to convert a directory to a subvolume?

Thanks for taking time to reply.

>
>I'd say too difficult to get everything right in kernel. This is
>possible to be done in userspace, with existing tools.
>
>The problem is that the conversion cannot be done atomically in most
>cases, so even if it's just one ioctl call, there are several possible
>intermediate states that would exist during the call. Reporting where
>did the ioctl fail would need some extended error code semantics.

Make sense. 

>
>> Users can convert by the scripts mentioned in this
>> thread(https://www.spinics.net/lists/linux-btrfs/msg33252.html), but is
>> it easier to use the off-the-shelf btrfs subcommand?
>
>Adding a subcommand would work, though I'd rather avoid reimplementing
>'cp -ax' or 'rsync -ax'.  We want to copy the files preserving all
>attributes, with reflink, and be able to identify partially synced
>files, and not cross the mountpoints or subvolumes.

I agree that re-implementation of cp is worthless. 'cp -ax
--reflink=always' already meet the above requirements, except identifing
partially synced files.

In fact, I'm not sure what the partially synced files mean? The files
being partially written to disk? If yes, why we need to identify these?
Or this refers to the open and live files mentioned below?

>
>The middle step with snapshotting the containing subvolume before
>syncing the data is also a valid option, but not always necessary.
>
>> After an initial consideration, our implementation is broadly divided
>> into the following steps:
>> 1. Freeze the filesystem or set the subvolume above the source directory
>> to read-only;
>
>Freezing the filesystme will freeze all IO, so this would not work, but
>I understand what you mean. The file data are synced before the snapshot
>is taken, but nothing prevents applications to continue writing data.
>
>Open and live files is a problem and don't see a nice solution here.
>
>> 2. Perform a pre-check, for example, check if a cross-device link
>> creation during the conversion;
>
>Cross-device links are not a problem as long as we use 'cp' ie. the
>manual creation of files in the target.

└── subvol_1
├── dir_1
│   └── file_1_hard_link
└── file_1

If we want to convert dir_1 to a new subvolume, we can't create the hard
link of file1 in the new subvolume so that we must abort the conversion.

>
>> 3. Perform conversion, such as creating a new subvolume and moving the
>> contents of the source directory;
>> 4. Thaw the filesystem or restore the subvolume writable property.
>> 
>> In fact, I am not so sure whether this use of freeze is appropriate
>> because the source directory the user needs to convert may be located
>> at / or /home and this pre-check and conversion process may take a long
>> time, which can lead to some shell and graphical application suspended.
>
>I think the closest operation is a read-only remount, which is not
>always possible due to open files and can otherwise considered as quite
>intrusive operation to the whole system. And the root filesystem cannot
>be easily remounted read-only in the systemd days anyway.
>
>

-- 
Thanks,
Lu


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] btrfs: cleanup device states define BTRFS_DEV_STATE_CAN_DISCARD

2017-11-29 Thread Anand Jain



On 11/29/2017 05:39 PM, Nikolay Borisov wrote:



On 29.11.2017 06:45, Anand Jain wrote:

Currently device state is being managed by each individual int
variable such as struct btrfs_device::can_discard. Instead of that
declare btrfs_device::dev_state BTRFS_DEV_STATE_CAN_DISCARD and use
the bit operations.

Signed-off-by: Anand Jain 
---
  fs/btrfs/extent-tree.c | 3 ++-
  fs/btrfs/volumes.c | 6 +++---
  fs/btrfs/volumes.h | 2 +-
  3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f81d928754e1..ee79f7cdc543 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2155,7 +2155,8 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, 
u64 bytenr,
  
  		for (i = 0; i < bbio->num_stripes; i++, stripe++) {

u64 bytes;
-   if (!stripe->dev->can_discard)
+   if (!test_bit(BTRFS_DEV_STATE_CAN_DISCARD,
+   >dev->dev_state))
continue;


Given that we only check for discard support here I can't help but think
do we really need to duplicate the information. We already have struct
block_device in struct btrfs_device, why don't we query for discard
support directly the underlying device, rather than duplicating
information?


 I agree. bdev_get_queue(bdev) and blk_queue_discard(q) are light weight
 too.
 So I shall withdraw this patch and sent the following patch to
 remove can_discard altogether.
   [PATCH] btrfs: drop btrfs_device::can_discard to query directly

Thanks, Anand


  
  			ret = btrfs_issue_discard(stripe->dev->bdev,

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 75839e07ce10..a9c6486f06f4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -647,7 +647,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices 
*fs_devices,
  
  	q = bdev_get_queue(bdev);

if (blk_queue_discard(q))
-   device->can_discard = 1;
+   set_bit(BTRFS_DEV_STATE_CAN_DISCARD, >dev_state);
if (!blk_queue_nonrot(q))
fs_devices->rotating = 1;
  
@@ -2401,7 +2401,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
  
  	q = bdev_get_queue(bdev);

if (blk_queue_discard(q))
-   device->can_discard = 1;
+   set_bit(BTRFS_DEV_STATE_CAN_DISCARD, >dev_state);
set_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state);
device->generation = trans->transid;
device->io_width = fs_info->sectorsize;
@@ -2601,7 +2601,7 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info 
*fs_info,
  
  	q = bdev_get_queue(bdev);

if (blk_queue_discard(q))
-   device->can_discard = 1;
+   set_bit(BTRFS_DEV_STATE_CAN_DISCARD, >dev_state);
mutex_lock(_info->fs_devices->device_list_mutex);
set_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state);
device->generation = 0;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 2fbff6902c8d..85e4b2dcc071 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -50,6 +50,7 @@ struct btrfs_pending_bios {
  #define BTRFS_DEV_STATE_WRITEABLE (1UL << 1)
  #define BTRFS_DEV_STATE_IN_FS_METADATA(1UL << 2)
  #define BTRFS_DEV_STATE_MISSING   (1UL << 3)
+#define BTRFS_DEV_STATE_CAN_DISCARD(1UL << 4)
  
  struct btrfs_device {

struct list_head dev_list;
@@ -74,7 +75,6 @@ struct btrfs_device {
fmode_t mode;
  
  	unsigned long dev_state;

-   int can_discard;
int is_tgtdev_for_dev_replace;
blk_status_t last_flush_error;
int flush_bio_sent;


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: drop btrfs_device::can_discard to query directly

2017-11-29 Thread Anand Jain
We can query the bdev directly when needed at btrfs_discard_extent()
so drop btrfs_device::can_discard.

Signed-off-by: Anand Jain 
Suggested-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 5 -
 fs/btrfs/volumes.c | 8 
 fs/btrfs/volumes.h | 1 -
 3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 85a545cf4395..830666c95360 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2145,7 +2145,10 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, 
u64 bytenr,
 
for (i = 0; i < bbio->num_stripes; i++, stripe++) {
u64 bytes;
-   if (!stripe->dev->can_discard)
+   struct request_queue *req_q;
+
+   req_q = bdev_get_queue(stripe->dev->bdev);
+   if (!blk_queue_discard(req_q))
continue;
 
ret = btrfs_issue_discard(stripe->dev->bdev,
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ff62919d568f..ff52ecf17402 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -630,8 +630,6 @@ static int btrfs_open_one_device(struct btrfs_fs_devices 
*fs_devices,
}
 
q = bdev_get_queue(bdev);
-   if (blk_queue_discard(q))
-   device->can_discard = 1;
if (!blk_queue_nonrot(q))
fs_devices->rotating = 1;
 
@@ -2387,8 +2385,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, 
const char *device_path
}
 
q = bdev_get_queue(bdev);
-   if (blk_queue_discard(q))
-   device->can_discard = 1;
set_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state);
device->generation = trans->transid;
device->io_width = fs_info->sectorsize;
@@ -2539,7 +2535,6 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info 
*fs_info,
  struct btrfs_device *srcdev,
  struct btrfs_device **device_out)
 {
-   struct request_queue *q;
struct btrfs_device *device;
struct block_device *bdev;
struct list_head *devices;
@@ -2596,9 +2591,6 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info 
*fs_info,
}
rcu_assign_pointer(device->name, name);
 
-   q = bdev_get_queue(bdev);
-   if (blk_queue_discard(q))
-   device->can_discard = 1;
mutex_lock(_info->fs_devices->device_list_mutex);
set_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state);
device->generation = 0;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 2fbff6902c8d..1cc693716b9a 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -74,7 +74,6 @@ struct btrfs_device {
fmode_t mode;
 
unsigned long dev_state;
-   int can_discard;
int is_tgtdev_for_dev_replace;
blk_status_t last_flush_error;
int flush_bio_sent;
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] btrfs: cleanup device states define BTRFS_DEV_STATE_WRITEABLE

2017-11-29 Thread Anand Jain



On 11/29/2017 05:14 PM, Nikolay Borisov wrote:



On 29.11.2017 06:45, Anand Jain wrote:

Currently device state is being managed by each individual int
variable such as struct btrfs_device::writeable. Instead of that
declare device state BTRFS_DEV_STATE_WRITEABLE and use the
bit operations.

Signed-off-by: Anand Jain 
---
  fs/btrfs/disk-io.c | 12 ++
  fs/btrfs/extent-tree.c |  2 +-
  fs/btrfs/extent_io.c   |  3 ++-
  fs/btrfs/ioctl.c   |  2 +-
  fs/btrfs/scrub.c   |  3 ++-
  fs/btrfs/volumes.c | 63 +-
  fs/btrfs/volumes.h |  4 +++-
  7 files changed, 54 insertions(+), 35 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index dfdab849037b..0d361b6713e1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3563,7 +3563,8 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
continue;
if (!dev->bdev)
continue;
-   if (!dev->in_fs_metadata || !dev->writeable)
+   if (!dev->in_fs_metadata ||
+   !test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state))
continue;
  
  		write_dev_flush(dev);

@@ -3578,7 +3579,8 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
errors_wait++;
continue;
}
-   if (!dev->in_fs_metadata || !dev->writeable)
+   if (!dev->in_fs_metadata ||
+   !test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state))
continue;
  
  		ret = wait_dev_flush(dev);

@@ -3675,7 +3677,8 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int 
max_mirrors)
total_errors++;
continue;
}
-   if (!dev->in_fs_metadata || !dev->writeable)
+   if (!dev->in_fs_metadata ||
+   !test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state))
continue;
  
  		btrfs_set_stack_device_generation(dev_item, 0);

@@ -3714,7 +3717,8 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int 
max_mirrors)
list_for_each_entry_rcu(dev, head, dev_list) {
if (!dev->bdev)
continue;
-   if (!dev->in_fs_metadata || !dev->writeable)
+   if (!dev->in_fs_metadata ||
+   !test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state))
continue;
  
  		ret = wait_dev_supers(dev, max_mirrors);

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e2d7e86b51d1..f81d928754e1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10935,7 +10935,7 @@ static int btrfs_trim_free_extents(struct btrfs_device 
*device,
*trimmed = 0;
  
  	/* Not writeable = nothing to do. */

-   if (!device->writeable)
+   if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state))
return 0;
  
  	/* No free space = nothing to do. */

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7fa50e12f18e..f51c797847c7 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2028,7 +2028,8 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 
ino, u64 start,
bio->bi_iter.bi_sector = sector;
dev = bbio->stripes[bbio->mirror_num - 1].dev;
btrfs_put_bbio(bbio);
-   if (!dev || !dev->bdev || !dev->writeable) {
+   if (!dev || !dev->bdev ||
+   !test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state)) {
btrfs_bio_counter_dec(fs_info);
bio_put(bio);
return -EIO;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 6c7a49faf4e0..8a74c83503d6 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1518,7 +1518,7 @@ static noinline int btrfs_ioctl_resize(struct file *file,
goto out_free;
}
  
-	if (!device->writeable) {

+   if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state)) {
btrfs_info(fs_info,
   "resizer unable to apply on readonly device %llu",
   devid);
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index e3f6c49e5c4d..e027e0de66a5 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -4117,7 +4117,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 
devid, u64 start,
return -ENODEV;
}
  
-	if (!is_dev_replace && !readonly && !dev->writeable) {

+   if (!is_dev_replace && !readonly &&
+   !test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state)) {
mutex_unlock(_info->fs_devices->device_list_mutex);
rcu_read_lock();
name = rcu_dereference(dev->name);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 397b335d108c..0f5be1808c6e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -634,10 +634,15 @@ static int 

Re: [PATCH 4/4] btrfs: cleanup device states define BTRFS_DEV_STATE_CAN_DISCARD

2017-11-29 Thread Nikolay Borisov


On 29.11.2017 06:45, Anand Jain wrote:
> Currently device state is being managed by each individual int
> variable such as struct btrfs_device::can_discard. Instead of that
> declare btrfs_device::dev_state BTRFS_DEV_STATE_CAN_DISCARD and use
> the bit operations.
> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/extent-tree.c | 3 ++-
>  fs/btrfs/volumes.c | 6 +++---
>  fs/btrfs/volumes.h | 2 +-
>  3 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index f81d928754e1..ee79f7cdc543 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2155,7 +2155,8 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, 
> u64 bytenr,
>  
>   for (i = 0; i < bbio->num_stripes; i++, stripe++) {
>   u64 bytes;
> - if (!stripe->dev->can_discard)
> + if (!test_bit(BTRFS_DEV_STATE_CAN_DISCARD,
> + >dev->dev_state))
>   continue;

Given that we only check for discard support here I can't help but think
do we really need to duplicate the information. We already have struct
block_device in struct btrfs_device, why don't we query for discard
support directly the underlying device, rather than duplicating
information?

>  
>   ret = btrfs_issue_discard(stripe->dev->bdev,
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 75839e07ce10..a9c6486f06f4 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -647,7 +647,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices 
> *fs_devices,
>  
>   q = bdev_get_queue(bdev);
>   if (blk_queue_discard(q))
> - device->can_discard = 1;
> + set_bit(BTRFS_DEV_STATE_CAN_DISCARD, >dev_state);
>   if (!blk_queue_nonrot(q))
>   fs_devices->rotating = 1;
>  
> @@ -2401,7 +2401,7 @@ int btrfs_init_new_device(struct btrfs_fs_info 
> *fs_info, const char *device_path
>  
>   q = bdev_get_queue(bdev);
>   if (blk_queue_discard(q))
> - device->can_discard = 1;
> + set_bit(BTRFS_DEV_STATE_CAN_DISCARD, >dev_state);
>   set_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state);
>   device->generation = trans->transid;
>   device->io_width = fs_info->sectorsize;
> @@ -2601,7 +2601,7 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info 
> *fs_info,
>  
>   q = bdev_get_queue(bdev);
>   if (blk_queue_discard(q))
> - device->can_discard = 1;
> + set_bit(BTRFS_DEV_STATE_CAN_DISCARD, >dev_state);
>   mutex_lock(_info->fs_devices->device_list_mutex);
>   set_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state);
>   device->generation = 0;
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 2fbff6902c8d..85e4b2dcc071 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -50,6 +50,7 @@ struct btrfs_pending_bios {
>  #define BTRFS_DEV_STATE_WRITEABLE(1UL << 1)
>  #define BTRFS_DEV_STATE_IN_FS_METADATA   (1UL << 2)
>  #define BTRFS_DEV_STATE_MISSING  (1UL << 3)
> +#define BTRFS_DEV_STATE_CAN_DISCARD  (1UL << 4)
>  
>  struct btrfs_device {
>   struct list_head dev_list;
> @@ -74,7 +75,6 @@ struct btrfs_device {
>   fmode_t mode;
>  
>   unsigned long dev_state;
> - int can_discard;
>   int is_tgtdev_for_dev_replace;
>   blk_status_t last_flush_error;
>   int flush_bio_sent;
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] btrfs: cleanup device states define BTRFS_DEV_STATE_MISSING

2017-11-29 Thread Nikolay Borisov


On 29.11.2017 06:45, Anand Jain wrote:
> Currently device state is being managed by each individual int
> variable such as struct btrfs_device::missing. Instead of that
> declare btrfs_device::dev_state BTRFS_DEV_STATE_MISSING and use
> the bit operations.
> 
> Signed-off-by: Anand Jain 

Reviewed-by : Nikolay Borisov 
> ---
>  fs/btrfs/dev-replace.c |  3 ++-
>  fs/btrfs/disk-io.c |  4 ++--
>  fs/btrfs/scrub.c   |  7 ---
>  fs/btrfs/super.c   |  2 +-
>  fs/btrfs/volumes.c | 33 +++--
>  fs/btrfs/volumes.h |  2 +-
>  6 files changed, 29 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 4b6ceb38cb5f..559db7667f38 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -306,7 +306,8 @@ void btrfs_after_dev_replace_commit(struct btrfs_fs_info 
> *fs_info)
>  
>  static inline char *dev_missing_or_rcu_str(struct btrfs_device *device)
>  {
> - return device->missing ? "" : rcu_str_deref(device->name);
> + return test_bit(BTRFS_DEV_STATE_MISSING, >dev_state) ?
> + "" : rcu_str_deref(device->name);
>  }
>  
>  int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index ab1a514e5c8d..ac1079fc9382 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3559,7 +3559,7 @@ static int barrier_all_devices(struct btrfs_fs_info 
> *info)
>   /* send down all the barriers */
>   head = >fs_devices->devices;
>   list_for_each_entry_rcu(dev, head, dev_list) {
> - if (dev->missing)
> + if (test_bit(BTRFS_DEV_STATE_MISSING, >dev_state))
>   continue;
>   if (!dev->bdev)
>   continue;
> @@ -3575,7 +3575,7 @@ static int barrier_all_devices(struct btrfs_fs_info 
> *info)
>  
>   /* wait for all the barriers */
>   list_for_each_entry_rcu(dev, head, dev_list) {
> - if (dev->missing)
> + if (test_bit(BTRFS_DEV_STATE_MISSING, >dev_state))
>   continue;
>   if (!dev->bdev) {
>   errors_wait++;
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 060fa93731e5..666c00cbee03 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -2535,7 +2535,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 
> logical, u64 len,
>   }
>  
>   WARN_ON(sblock->page_count == 0);
> - if (dev->missing) {
> + if (test_bit(BTRFS_DEV_STATE_MISSING, >dev_state)) {
>   /*
>* This case should only be hit for RAID 5/6 device replace. See
>* the comment in scrub_missing_raid56_pages() for details.
> @@ -2870,7 +2870,7 @@ static int scrub_extent_for_parity(struct scrub_parity 
> *sparity,
>   u8 csum[BTRFS_CSUM_SIZE];
>   u32 blocksize;
>  
> - if (dev->missing) {
> + if (test_bit(BTRFS_DEV_STATE_MISSING, >dev_state)) {
>   scrub_parity_mark_sectors_error(sparity, logical, len);
>   return 0;
>   }
> @@ -4112,7 +4112,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 
> devid, u64 start,
>  
>   mutex_lock(_info->fs_devices->device_list_mutex);
>   dev = btrfs_find_device(fs_info, devid, NULL, NULL);
> - if (!dev || (dev->missing && !is_dev_replace)) {
> + if (!dev || (test_bit(BTRFS_DEV_STATE_MISSING, >dev_state) &&
> + !is_dev_replace)) {
>   mutex_unlock(_info->fs_devices->device_list_mutex);
>   return -ENODEV;
>   }
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 9089aad2f3aa..4b70ae6f0245 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -2243,7 +2243,7 @@ static int btrfs_show_devname(struct seq_file *m, 
> struct dentry *root)
>   while (cur_devices) {
>   head = _devices->devices;
>   list_for_each_entry(dev, head, dev_list) {
> - if (dev->missing)
> + if (test_bit(BTRFS_DEV_STATE_MISSING, >dev_state))
>   continue;
>   if (!dev->name)
>   continue;
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 13a6ae80bee1..75839e07ce10 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -773,9 +773,9 @@ static noinline int device_list_add(const char *path,
>   return -ENOMEM;
>   rcu_string_free(device->name);
>   rcu_assign_pointer(device->name, name);
> - if (device->missing) {
> + if (test_bit(BTRFS_DEV_STATE_MISSING, >dev_state)) {
>   fs_devices->missing_devices--;
> - device->missing = 0;
> + clear_bit(BTRFS_DEV_STATE_MISSING, >dev_state);
>   }
>   }
>  
> @@ -957,7 +957,7 

Re: [PATCH 2/4] btrfs: cleanup device states define BTRFS_DEV_STATE_IN_FS_METADATA

2017-11-29 Thread Nikolay Borisov


On 29.11.2017 06:45, Anand Jain wrote:
> Currently device state is being managed by each individual int
> variable such as struct btrfs_device::in_fs_metadata. Instead of
> that declare device state BTRFS_DEV_STATE_IN_FS_METADATA and use
> the bit operations.
> 
> Signed-off-by: Anand Jain 

Reviewed-by: Nikolay Borisov 
> ---
>  fs/btrfs/disk-io.c | 21 ++---
>  fs/btrfs/scrub.c   |  3 ++-
>  fs/btrfs/super.c   |  5 +++--
>  fs/btrfs/volumes.c | 29 +
>  fs/btrfs/volumes.h |  2 +-
>  5 files changed, 37 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 0d361b6713e1..ab1a514e5c8d 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3563,8 +3563,10 @@ static int barrier_all_devices(struct btrfs_fs_info 
> *info)
>   continue;
>   if (!dev->bdev)
>   continue;
> - if (!dev->in_fs_metadata ||
> - !test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state))
> + if (!test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
> + >dev_state) ||
> + !test_bit(BTRFS_DEV_STATE_WRITEABLE,
> + >dev_state))
>   continue;
>  
>   write_dev_flush(dev);
> @@ -3579,8 +3581,10 @@ static int barrier_all_devices(struct btrfs_fs_info 
> *info)
>   errors_wait++;
>   continue;
>   }
> - if (!dev->in_fs_metadata ||
> - !test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state))
> + if (!test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
> + >dev_state) ||
> + !test_bit(BTRFS_DEV_STATE_WRITEABLE,
> + >dev_state))
>   continue;
>  
>   ret = wait_dev_flush(dev);
> @@ -3677,7 +3681,8 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int 
> max_mirrors)
>   total_errors++;
>   continue;
>   }
> - if (!dev->in_fs_metadata ||
> + if (!test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
> + >dev_state) ||
>   !test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state))
>   continue;
>  
> @@ -3717,8 +3722,10 @@ int write_all_supers(struct btrfs_fs_info *fs_info, 
> int max_mirrors)
>   list_for_each_entry_rcu(dev, head, dev_list) {
>   if (!dev->bdev)
>   continue;
> - if (!dev->in_fs_metadata ||
> - !test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state))
> + if (!test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
> + >dev_state) ||
> + !test_bit(BTRFS_DEV_STATE_WRITEABLE,
> + >dev_state))
>   continue;
>  
>   ret = wait_dev_supers(dev, max_mirrors);
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index e027e0de66a5..060fa93731e5 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -4129,7 +4129,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 
> devid, u64 start,
>   }
>  
>   mutex_lock(_info->scrub_lock);
> - if (!dev->in_fs_metadata || dev->is_tgtdev_for_dev_replace) {
> + if (!test_bit(BTRFS_DEV_STATE_IN_FS_METADATA, >dev_state) ||
> + dev->is_tgtdev_for_dev_replace) {
>   mutex_unlock(_info->scrub_lock);
>   mutex_unlock(_info->fs_devices->device_list_mutex);
>   return -EIO;
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 161694b66038..9089aad2f3aa 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1945,8 +1945,9 @@ static int btrfs_calc_avail_data_space(struct 
> btrfs_fs_info *fs_info,
>  
>   rcu_read_lock();
>   list_for_each_entry_rcu(device, _devices->devices, dev_list) {
> - if (!device->in_fs_metadata || !device->bdev ||
> - device->is_tgtdev_for_dev_replace)
> + if (!test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
> + >dev_state) ||
> + !device->bdev || device->is_tgtdev_for_dev_replace)
>   continue;
>  
>   if (i >= nr_devices)
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 0f5be1808c6e..13a6ae80bee1 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -652,7 +652,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices 
> *fs_devices,
>   fs_devices->rotating = 1;
>  
>   device->bdev = bdev;
> - device->in_fs_metadata = 0;
> + clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, >dev_state);
>   

[PATCH 9/9] btrfs-progs: tests/mkfs: Introduce test case to verify if mkfs.btrfs rootdir shrink behaves correctly

2017-11-29 Thread Qu Wenruo
Signed-off-by: Qu Wenruo 
---
 tests/mkfs-tests/012-rootdir-no-shrink/test.sh | 38 ++
 1 file changed, 38 insertions(+)
 create mode 100755 tests/mkfs-tests/012-rootdir-no-shrink/test.sh

diff --git a/tests/mkfs-tests/012-rootdir-no-shrink/test.sh 
b/tests/mkfs-tests/012-rootdir-no-shrink/test.sh
new file mode 100755
index ..1dfa55b376f6
--- /dev/null
+++ b/tests/mkfs-tests/012-rootdir-no-shrink/test.sh
@@ -0,0 +1,38 @@
+#!/bin/bash
+# Test if mkfs.btrfs --rootdir will skip shrinking correctly
+
+source "$TOP/tests/common"
+
+check_prereq mkfs.btrfs
+
+setup_root_helper
+
+fs_size=$((512 * 1024 * 1024))
+bs=$((1024 * 1024))
+tmp=$(mktemp -d --tmpdir btrfs-progs-mkfs.rootdirXXX)
+
+prepare_test_dev $fs_size
+
+### no shrink case ###
+
+run_check "$TOP/mkfs.btrfs" -f -r $tmp "$TEST_DEV"
+run_check_mount_test_dev
+
+# We should be able to write at least half of the fs size data since
+# the fs is not shrunk
+run_check $SUDO_HELPER dd if=/dev/zero bs=$bs count=$(($fs_size / $bs / 2)) \
+   of="$TEST_MNT/file"
+
+run_check_umount_test_dev
+
+### shrink case ###
+run_check "$TOP/mkfs.btrfs" -f -r $tmp --shrink "$TEST_DEV"
+run_check_mount_test_dev
+
+run_mustfail "mkfs.btrfs for shrink rootdir" \
+   $SUDO_HELPER dd if=/dev/zero bs=$bs count=$(($fs_size / $bs / 2)) \
+   of="$TEST_MNT/file"
+
+run_check_umount_test_dev
+
+rm -rf -- "$tmp"
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/9] btrfs-progs: mkfs/rootdir: Shrink fs for rootdir option

2017-11-29 Thread Qu Wenruo
Use the new dev extent based shrink method for rootdir option.

Signed-off-by: Qu Wenruo 
---
 mkfs/main.c|   5 +++
 mkfs/rootdir.c | 111 +
 mkfs/rootdir.h |   1 +
 3 files changed, 117 insertions(+)

diff --git a/mkfs/main.c b/mkfs/main.c
index eb49182ebe81..eb3e5bb1f92e 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1256,6 +1256,11 @@ raid_groups:
goto out;
}
}
+   ret = btrfs_mkfs_shrink_fs(fs_info, NULL);
+   if (ret < 0) {
+   error("error while shrinking filesystem: %d", ret);
+   goto out;
+   }
}
 
if (verbose) {
diff --git a/mkfs/rootdir.c b/mkfs/rootdir.c
index f8cff44d06eb..271167fe0c02 100644
--- a/mkfs/rootdir.c
+++ b/mkfs/rootdir.c
@@ -822,3 +822,114 @@ out:
 
return ret;
 }
+
+/*
+ * Set device size to @new_size.
+ *
+ * Only used for --rootdir option.
+ * We will need to reset the following values:
+ * 1) dev item in chunk tree
+ * 2) super->dev_item
+ * 3) super->total_bytes
+ */
+static int set_device_size(struct btrfs_fs_info *fs_info,
+  struct btrfs_device *device, u64 new_size)
+{
+   struct btrfs_root *chunk_root = fs_info->chunk_root;
+   struct btrfs_trans_handle *trans;
+   struct btrfs_dev_item *di;
+   struct btrfs_path path;
+   struct btrfs_key key;
+   int ret;
+
+   /*
+* Update in-meory device->total_bytes, so that at trans commit time,
+* it super->dev_item will also get updated
+*/
+   device->total_bytes = new_size;
+   btrfs_init_path();
+
+   /* Update device item in chunk tree */
+   trans = btrfs_start_transaction(chunk_root, 1);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   error("failed to start transaction: %d (%s)", ret,
+   strerror(-ret));
+   return ret;
+   }
+   key.objectid = BTRFS_DEV_ITEMS_OBJECTID;
+   key.type = BTRFS_DEV_ITEM_KEY;
+   key.offset = device->devid;
+
+   ret = btrfs_search_slot(trans, chunk_root, , , 0, 1);
+   if (ret < 0)
+   goto err;
+   if (ret > 0)
+   ret = -ENOENT;
+   di = btrfs_item_ptr(path.nodes[0], path.slots[0],
+   struct btrfs_dev_item);
+   btrfs_set_device_total_bytes(path.nodes[0], di, new_size);
+   btrfs_mark_buffer_dirty(path.nodes[0]);
+
+   /*
+* Update super->total_bytes, since it's only used for --rootdir,
+* there is only one device, just use the @new_size.
+*/
+   btrfs_set_super_total_bytes(fs_info->super_copy, new_size);
+
+   /*
+* Commit transaction to reflect the updated super->total_bytes and
+* super->dev_item
+*/
+   ret = btrfs_commit_transaction(trans, chunk_root);
+   if (ret < 0)
+   error("failed to commit current transaction: %d (%s)",
+   ret, strerror(-ret));
+   btrfs_release_path();
+   return ret;
+
+err:
+   btrfs_release_path();
+   /*
+* Commit trans here won't cause problem since the fs still has
+* bad magic, and something wrong already happened, we don't
+* care the return value anyway.
+*/
+   btrfs_commit_transaction(trans, chunk_root);
+   return ret;
+}
+
+int btrfs_mkfs_shrink_fs(struct btrfs_fs_info *fs_info, u64 *new_size_ret)
+{
+   u64 new_size;
+   struct btrfs_device *device;
+   struct list_head *cur;
+   int nr_devs = 0;
+   int ret;
+
+   list_for_each(cur, _info->fs_devices->devices)
+   nr_devs++;
+
+   if (nr_devs > 1) {
+   error("cannot shrink fs with more than 1 device");
+   return -ENOTTY;
+   }
+
+   ret = get_device_extent_end(fs_info, 1, _size);
+   if (ret < 0) {
+   error("failed to get minimal device size: %d (%s)",
+   ret, strerror(-ret));
+   return ret;
+   }
+
+   BUG_ON(!IS_ALIGNED(new_size, fs_info->sectorsize));
+
+   device = list_entry(fs_info->fs_devices->devices.next,
+  struct btrfs_device, dev_list);
+   ret = set_device_size(fs_info, device, new_size);
+   if (ret < 0)
+   return ret;
+   if (new_size_ret)
+   *new_size_ret = new_size;
+   return ret;
+}
diff --git a/mkfs/rootdir.h b/mkfs/rootdir.h
index ada50ccb6ac4..91bebc507f02 100644
--- a/mkfs/rootdir.h
+++ b/mkfs/rootdir.h
@@ -32,4 +32,5 @@ int btrfs_mkfs_fill_dir(const char *source_dir, struct 
btrfs_root *root,
bool verbose);
 u64 btrfs_mkfs_size_dir(const char *dir_name, u32 sectorsize, u64 min_dev_size,
u64 meta_profile, u64 data_profile);
+int btrfs_mkfs_shrink_fs(struct btrfs_fs_info *fs_info, u64 

[PATCH 8/9] btrfs-progs: mkfs: Use the whole file or block device to mkfs for rootdir

2017-11-29 Thread Qu Wenruo
For --rootdir, even for large existing file or block device, it will
always shrink the result filesystem.

The problem is, mkfs.btrfs will try to calculate the dir size, and use
it as @block_count to mkfs, which makes the result filesystem shrinked.

Fix by trying to get the original block device or file size as
@block_count, so mkfs.btrfs can use the full file/block device for
--rootdir option.

Signed-off-by: Qu Wenruo 
---
 mkfs/main.c | 22 --
 utils.c |  2 +-
 utils.h |  1 +
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/mkfs/main.c b/mkfs/main.c
index a34b73bfb845..eeeba9292856 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -995,13 +995,31 @@ int main(int argc, char **argv)
 * This must be done before minimal device size check.
 */
if (source_dir_set) {
-   fd = open(file, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR | S_IRGRP |
- S_IWGRP | S_IROTH);
+   int oflags = O_RDWR;
+   struct stat statbuf;
+
+   if (is_path_exist(file) == 0)
+   oflags |= O_CREAT;
+
+   fd = open(file, oflags , S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP |
+S_IROTH);
if (fd < 0) {
error("unable to open %s: %s", file, strerror(errno));
goto error;
}
+   ret = fstat(fd, );
+   if (ret < 0) {
+   error("unable to stat %s: %s", file, strerror(errno));
+   ret = -errno;
+   goto error;
+   }
 
+   /*
+* Block_count not specified, use file/dev size first.
+* Or we will always use source_dir_size calculated for mkfs.
+*/
+   if (!block_count)
+   block_count = btrfs_device_size(fd, );
source_dir_size = btrfs_mkfs_size_dir(source_dir, sectorsize,
min_dev_size, metadata_profile, data_profile);
if (block_count < source_dir_size)
diff --git a/utils.c b/utils.c
index 22c137514592..80071e23fe2b 100644
--- a/utils.c
+++ b/utils.c
@@ -447,7 +447,7 @@ int is_mount_point(const char *path)
return ret;
 }
 
-static int is_reg_file(const char *path)
+int is_reg_file(const char *path)
 {
struct stat statbuf;
 
diff --git a/utils.h b/utils.h
index 860c25d3cdba..bd7484fd3620 100644
--- a/utils.h
+++ b/utils.h
@@ -123,6 +123,7 @@ char *__strncpy_null(char *dest, const char *src, size_t n);
 int is_block_device(const char *file);
 int is_mount_point(const char *file);
 int is_path_exist(const char *file);
+int is_reg_file(const char *path);
 int check_arg_type(const char *input);
 int open_path_or_dev_mnt(const char *path, DIR **dirstream, int verbose);
 int btrfs_open(const char *path, DIR **dirstream, int verbose, int dir_only);
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/9] btrfs-progs: mkfs: Don't use custom chunk allocator for rootdir

2017-11-29 Thread Qu Wenruo
Remove these custom chunk allocator for mkfs.
Use generic btrfs chunk allocator instead.

Signed-off-by: Qu Wenruo 
---
 mkfs/main.c | 62 -
 1 file changed, 62 deletions(-)

diff --git a/mkfs/main.c b/mkfs/main.c
index a81718d1517d..716395c4b6b4 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -413,53 +413,6 @@ static char *parse_subvol_name(const char *input)
return strdup(input);
 }
 
-static int create_chunks(struct btrfs_trans_handle *trans,
-struct btrfs_root *root, u64 num_of_meta_chunks,
-u64 size_of_data,
-struct mkfs_allocation *allocation)
-{
-   struct btrfs_fs_info *fs_info = root->fs_info;
-   u64 chunk_start;
-   u64 chunk_size;
-   u64 meta_type = BTRFS_BLOCK_GROUP_METADATA;
-   u64 data_type = BTRFS_BLOCK_GROUP_DATA;
-   u64 minimum_data_chunk_size = SZ_8M;
-   u64 i;
-   int ret;
-
-   for (i = 0; i < num_of_meta_chunks; i++) {
-   ret = btrfs_alloc_chunk(trans, fs_info,
-   _start, _size, meta_type);
-   if (ret)
-   return ret;
-   ret = btrfs_make_block_group(trans, fs_info, 0,
-meta_type, 
BTRFS_FIRST_CHUNK_TREE_OBJECTID,
-chunk_start, chunk_size);
-   allocation->metadata += chunk_size;
-   if (ret)
-   return ret;
-   set_extent_dirty(>fs_info->free_space_cache,
-chunk_start, chunk_start + chunk_size - 1);
-   }
-
-   if (size_of_data < minimum_data_chunk_size)
-   size_of_data = minimum_data_chunk_size;
-
-   ret = btrfs_alloc_data_chunk(trans, fs_info,
-_start, size_of_data, data_type, 0);
-   if (ret)
-   return ret;
-   ret = btrfs_make_block_group(trans, fs_info, 0,
-data_type, BTRFS_FIRST_CHUNK_TREE_OBJECTID,
-chunk_start, size_of_data);
-   allocation->data += size_of_data;
-   if (ret)
-   return ret;
-   set_extent_dirty(>fs_info->free_space_cache,
-chunk_start, chunk_start + size_of_data - 1);
-   return ret;
-}
-
 static int zero_output_file(int out_fd, u64 size)
 {
int loop_num;
@@ -1256,21 +1209,6 @@ raid_groups:
}
 
if (source_dir_set) {
-   trans = btrfs_start_transaction(root, 1);
-   BUG_ON(IS_ERR(trans));
-   ret = create_chunks(trans, root,
-   num_of_meta_chunks, size_of_data,
-   );
-   if (ret) {
-   error("unable to create chunks: %d", ret);
-   goto out;
-   }
-   ret = btrfs_commit_transaction(trans, root);
-   if (ret) {
-   error("transaction commit failed: %d", ret);
-   goto out;
-   }
-
if (subvol_name_set) {
u64 dirid, objectid;
struct btrfs_root *file_root;
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/9] btrfs-progs: mkfs/rootdir: Use over-reserve method to make size estimate easier

2017-11-29 Thread Qu Wenruo
Use an easier method to calculate the estimate device for mkfs.btrfs
--rootdir.

The new method will over-estimate, but should ensure we won't encounter
ENOSPC.

It relies on the following data to estimate:
1) number of inodes
   for metadata chunk size
2) rounded up data size of each regular inode
   for data chunk size.

Total meta chunk size = round_up(nr_inode * (PATH_MAX * 3 + sectorsize),
min_chunk_size) * profile_multiplier

PATH_MAX is the maximum size possible for INODE_REF/DIR_INDEX/DIR_ITEM.
Sectorsize is the maximum size possible for inline extent.
min_chunk_size is 8M for SINGLE, and 32M for DUP, get from
btrfs_alloc_chunk().
profile_multiplier is 1 for Single, 2 for DUP.

Total data chunk size is much easier.
Total data chunk size = round_up(total_data_usage, min_chunk_size) *
profile_multiplier

Total_data_usage is the sum of *rounded up* size of each regular inode
use.
min_chunk_size is 8M for SINGLE, 64M for DUP, get from
btrfS_alloc_chunk().
Same profile_multiplier for meta.

This over-estimate calculate is, of course, over-estimate.
But since we will later shrink the fs to its real usage, it doesn't
matter much now.

Signed-off-by: Qu Wenruo 
---
 mkfs/main.c| 109 ++--
 mkfs/rootdir.c | 119 +++--
 mkfs/rootdir.h |   5 +--
 3 files changed, 139 insertions(+), 94 deletions(-)

diff --git a/mkfs/main.c b/mkfs/main.c
index 716395c4b6b4..eb49182ebe81 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -747,8 +747,6 @@ int main(int argc, char **argv)
int subvol_name_set = 0;
char *source_dir = NULL;
int source_dir_set = 0;
-   u64 num_of_meta_chunks = 0;
-   u64 size_of_data = 0;
u64 source_dir_size = 0;
u64 min_dev_size;
int dev_cnt = 0;
@@ -977,6 +975,34 @@ int main(int argc, char **argv)
 
min_dev_size = btrfs_min_dev_size(nodesize, mixed, metadata_profile,
  data_profile);
+   /*
+* Enlarge the destination file or create new one, using the
+* size calculated from source dir.
+*
+* This must be done before minimal device size check.
+*/
+   if (source_dir_set) {
+   fd = open(file, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR | S_IRGRP |
+ S_IWGRP | S_IROTH);
+   if (fd < 0) {
+   error("unable to open %s: %s", file, strerror(errno));
+   goto error;
+   }
+
+   source_dir_size = btrfs_mkfs_size_dir(source_dir, sectorsize,
+   min_dev_size, metadata_profile, data_profile);
+   if (block_count < source_dir_size)
+   block_count = source_dir_size;
+   ret = zero_output_file(fd, block_count);
+   if (ret) {
+   error("unable to zero the output file");
+   close(fd);
+   goto error;
+   }
+   /* our "device" is the new image file */
+   dev_block_count = block_count;
+   close(fd);
+   }
/* Check device/block_count after the nodesize is determined */
if (block_count && block_count < min_dev_size) {
error("size %llu is too small to make a usable filesystem",
@@ -1010,51 +1036,28 @@ int main(int argc, char **argv)
 
dev_cnt--;
 
-   if (!source_dir_set) {
-   /*
-* open without O_EXCL so that the problem should not
-* occur by the following processing.
-* (btrfs_register_one_device() fails if O_EXCL is on)
-*/
-   fd = open(file, O_RDWR);
-   if (fd < 0) {
-   error("unable to open %s: %s", file, strerror(errno));
-   goto error;
-   }
-   ret = btrfs_prepare_device(fd, file, _block_count,
-   block_count,
-   (zero_end ? PREP_DEVICE_ZERO_END : 0) |
-   (discard ? PREP_DEVICE_DISCARD : 0) |
-   (verbose ? PREP_DEVICE_VERBOSE : 0));
-   if (ret) {
-   goto error;
-   }
-   if (block_count && block_count > dev_block_count) {
-   error("%s is smaller than requested size, expected 
%llu, found %llu",
-   file,
-   (unsigned long long)block_count,
-   (unsigned long long)dev_block_count);
-   goto error;
-   }
-   } else {
-   fd = open(file, O_CREAT | O_RDWR,
-   S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | 
S_IROTH);
-   if (fd < 0) {
-   

[PATCH 5/9] btrfs-progs: mkfs: Separate shrink from rootdir

2017-11-29 Thread Qu Wenruo
Make --shrink a separate option for --rootdir, and make it default to
off.
So this will cause less confusion.

Signed-off-by: Qu Wenruo 
---
 Documentation/mkfs.btrfs.asciidoc | 11 +++
 mkfs/main.c   | 26 --
 mkfs/rootdir.c| 21 -
 mkfs/rootdir.h|  3 ++-
 4 files changed, 53 insertions(+), 8 deletions(-)

diff --git a/Documentation/mkfs.btrfs.asciidoc 
b/Documentation/mkfs.btrfs.asciidoc
index 62f5a5bebaee..ded798216a5b 100644
--- a/Documentation/mkfs.btrfs.asciidoc
+++ b/Documentation/mkfs.btrfs.asciidoc
@@ -106,6 +106,17 @@ Please see the mount option 'discard' for that in 
`btrfs`(5).
 *-r|--rootdir *::
 Populate the toplevel subvolume with files from 'rootdir'.  This does not
 require root permissions and does not mount the filesystem.
++
+NOTE: This option may enlarge the image or file to ensure it's large enough to
+contain the files from 'rootdir'.
+
+*--shrink*:
+Shrink the filesystem to its minimal size, only works with *-r|--rootdir*
+option.
++
+NOTE: If the destination is regular file, this option will also reduce the
+file size. Or it will only reduce the filesystem available space.
+Extra space will not be usable unless resized using 'btrfs filesystem resize'.
 
 *-O|--features [,...]*::
 A list of filesystem features turned on at mkfs time. Not all features are
diff --git a/mkfs/main.c b/mkfs/main.c
index eb3e5bb1f92e..64cf03289188 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -746,9 +746,11 @@ int main(int argc, char **argv)
char *subvol_name = NULL;
int subvol_name_set = 0;
char *source_dir = NULL;
-   int source_dir_set = 0;
+   bool source_dir_set = false;
+   bool shrink_rootdir = false;
u64 source_dir_size = 0;
u64 min_dev_size;
+   u64 shrink_size;
int dev_cnt = 0;
int saved_optind;
char fs_uuid[BTRFS_UUID_UNPARSED_SIZE] = { 0 };
@@ -758,6 +760,7 @@ int main(int argc, char **argv)
 
while(1) {
int c;
+   enum { GETOPT_VAL_SHRINK = 257 };
static const struct option long_options[] = {
{ "alloc-start", required_argument, NULL, 'A'},
{ "byte-count", required_argument, NULL, 'b' },
@@ -776,6 +779,7 @@ int main(int argc, char **argv)
{ "features", required_argument, NULL, 'O' },
{ "uuid", required_argument, NULL, 'U' },
{ "quiet", 0, NULL, 'q' },
+   { "shrink", no_argument, NULL, GETOPT_VAL_SHRINK },
{ "help", no_argument, NULL, GETOPT_VAL_HELP },
{ NULL, 0, NULL, 0}
};
@@ -847,7 +851,7 @@ int main(int argc, char **argv)
break;
case 'r':
source_dir = optarg;
-   source_dir_set = 1;
+   source_dir_set = true;
break;
case 'U':
strncpy(fs_uuid, optarg,
@@ -859,6 +863,9 @@ int main(int argc, char **argv)
case 'q':
verbose = 0;
break;
+   case GETOPT_VAL_SHRINK:
+   shrink_rootdir = true;
+   break;
case GETOPT_VAL_HELP:
default:
print_usage(c != GETOPT_VAL_HELP);
@@ -886,6 +893,10 @@ int main(int argc, char **argv)
error("the option -r is limited to a single device");
goto error;
}
+   if (shrink_rootdir && !source_dir_set) {
+   error("the option --shrink can only be paired with -r");
+   goto error;
+   }
 
if (*fs_uuid) {
uuid_t dummy_uuid;
@@ -1256,10 +1267,13 @@ raid_groups:
goto out;
}
}
-   ret = btrfs_mkfs_shrink_fs(fs_info, NULL);
-   if (ret < 0) {
-   error("error while shrinking filesystem: %d", ret);
-   goto out;
+   if (shrink_rootdir) {
+   ret = btrfs_mkfs_shrink_fs(fs_info, _size,
+  shrink_rootdir);
+   if (ret < 0) {
+   error("error while shrinking filesystem: %d", 
ret);
+   goto out;
+   }
}
}
 
diff --git a/mkfs/rootdir.c b/mkfs/rootdir.c
index 271167fe0c02..b91900834316 100644
--- a/mkfs/rootdir.c
+++ b/mkfs/rootdir.c
@@ -899,11 +899,13 @@ err:
return ret;
 }
 
-int btrfs_mkfs_shrink_fs(struct btrfs_fs_info *fs_info, 

[PATCH 7/9] btrfs-progs: tests/mkfs: Introduce test case to check if mkfs rootdir can create new file

2017-11-29 Thread Qu Wenruo
To test regression 460e93f25754 ("btrfs-progs: mkfs: check the status of file at
mkfs").

Signed-off-by: Qu Wenruo 
---
 tests/mkfs-tests/011-rootdir-create-file/test.sh | 14 ++
 1 file changed, 14 insertions(+)
 create mode 100755 tests/mkfs-tests/011-rootdir-create-file/test.sh

diff --git a/tests/mkfs-tests/011-rootdir-create-file/test.sh 
b/tests/mkfs-tests/011-rootdir-create-file/test.sh
new file mode 100755
index ..c6f3d74da885
--- /dev/null
+++ b/tests/mkfs-tests/011-rootdir-create-file/test.sh
@@ -0,0 +1,14 @@
+#!/bin/bash
+# Regression test for mkfs.btrfs --rootdir on non-exist file
+# Designed behavior is, it should create a new file if destination doesn't 
exist
+# Regression 460e93f25754 ("btrfs-progs: mkfs: check the status of file at 
mkfs")
+
+
+source "$TOP/tests/common"
+
+check_prereq mkfs.btrfs
+
+tmp=$(mktemp -d --tmpdir btrfs-progs-mkfs.rootdirXXX)
+run_check "$TOP/mkfs.btrfs" -f -r "$TOP/Documentation/" $tmp/new_file
+
+rm -rf -- "$tmp"
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/9] btrfs-progs: mkfs: Fix regression preventing --rootdir to create file

2017-11-29 Thread Qu Wenruo
Commit 460e93f25754 ("btrfs-progs: mkfs: check the status of file at mkfs")
will try to check the file state before creating fs on it.

The check is mostly fine for normal mkfs case, while for --rootdir
option, it's allowed to create new file if destination file doesn't
exist.

Fix it by allowing non-existing file if --rootdir is specified.

Fixes: 460e93f25754 ("btrfs-progs: mkfs: check the status of file at mkfs")
Signed-off-by: Qu Wenruo 
---
 mkfs/main.c |  6 --
 utils.c | 15 +++
 utils.h |  1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/mkfs/main.c b/mkfs/main.c
index 64cf03289188..a34b73bfb845 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -733,7 +733,7 @@ int main(int argc, char **argv)
u32 stripesize = 4096;
int zero_end = 1;
int fd = -1;
-   int ret;
+   int ret = 0;
int close_ret;
int i;
int mixed = 0;
@@ -913,7 +913,9 @@ int main(int argc, char **argv)
 
while (dev_cnt-- > 0) {
file = argv[optind++];
-   if (is_block_device(file) == 1)
+   if (source_dir_set && is_path_exist(file) == 0)
+   ret = 0;
+   else if (is_block_device(file) == 1)
ret = test_dev_for_mkfs(file, force_overwrite);
else
ret = test_status_for_mkfs(file, force_overwrite);
diff --git a/utils.c b/utils.c
index 524f463d3140..22c137514592 100644
--- a/utils.c
+++ b/utils.c
@@ -456,6 +456,21 @@ static int is_reg_file(const char *path)
return S_ISREG(statbuf.st_mode);
 }
 
+int is_path_exist(const char *path)
+{
+   struct stat statbuf;
+   int ret;
+
+   ret = stat(path, );
+   if (ret < 0) {
+   if (errno == ENOENT)
+   return 0;
+   else
+   return -errno;
+   }
+   return 1;
+}
+
 /*
  * This function checks if the given input parameter is
  * an uuid or a path
diff --git a/utils.h b/utils.h
index a82d46f6d7cc..860c25d3cdba 100644
--- a/utils.h
+++ b/utils.h
@@ -122,6 +122,7 @@ int set_label(const char *btrfs_dev, const char *label);
 char *__strncpy_null(char *dest, const char *src, size_t n);
 int is_block_device(const char *file);
 int is_mount_point(const char *file);
+int is_path_exist(const char *file);
 int check_arg_type(const char *input);
 int open_path_or_dev_mnt(const char *path, DIR **dirstream, int verbose);
 int btrfs_open(const char *path, DIR **dirstream, int verbose, int dir_only);
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/9] btrfs-progs: mkfs: Cleanup temporary chunks before filling rootdir

2017-11-29 Thread Qu Wenruo
Cleanup those temporary chunks should be as soon as possible, and it
should be especially before doing large tree operations, like filling
rootdir.

Signed-off-by: Qu Wenruo 
---
 mkfs/main.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/mkfs/main.c b/mkfs/main.c
index 2da38cb9490e..a81718d1517d 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1248,6 +1248,13 @@ raid_groups:
goto out;
}
 
+   ret = cleanup_temp_chunks(fs_info, , data_profile,
+ metadata_profile, metadata_profile);
+   if (ret < 0) {
+   error("failed to cleanup temporary chunks: %d", ret);
+   goto out;
+   }
+
if (source_dir_set) {
trans = btrfs_start_transaction(root, 1);
BUG_ON(IS_ERR(trans));
@@ -1311,12 +1318,6 @@ raid_groups:
}
}
}
-   ret = cleanup_temp_chunks(fs_info, , data_profile,
- metadata_profile, metadata_profile);
-   if (ret < 0) {
-   error("failed to cleanup temporary chunks: %d", ret);
-   goto out;
-   }
 
if (verbose) {
char features_buf[64];
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/9] Remaining part of mkfs --rootdir rework

2017-11-29 Thread Qu Wenruo
Can be fetched from my github repo:
https://github.com/adam900710/btrfs-progs/tree/mkfs_rootdir_rework

Based on the following commit head of David's devel branch:
--
 commit af322ba5aa1dd0b2a3422e1c4acd8082948efa7b (david/devel)
 Author: Su Yue 
 Date:   Tue Nov 28 17:14:48 2017 +0800

btrfs-progs: fi defrag: clean up duplicate code if find errors

In function cmd_filesystem_defrag(), lines of code for error handling
are duplicate and hard to expand in further.

Create a jump label for errors.

Signed-off-by: Su Yue 
Signed-off-by: David Sterba 
--

Patch 1 is split from original patch.
Patch 2~5 are the patches rebased. Only minor conflicts.

Patch 6~7 are regression fix and its test case. Which prohibits
mkfs.btrfs --rootdir from creating new file.

Patch 8~9 are fix and test case for incorrect shrink behavior, which
will shrink the fs even --shrink is not specified.

Qu Wenruo (9):
  btrfs-progs: mkfs: Cleanup temporary chunks before filling rootdir
  btrfs-progs: mkfs: Don't use custom chunk allocator for rootdir
  btrfs-progs: mkfs/rootdir: Use over-reserve method to make size
estimate easier
  btrfs-progs: mkfs/rootdir: Shrink fs for rootdir option
  btrfs-progs: mkfs: Separate shrink from rootdir
  btrfs-progs: mkfs: Fix regression preventing --rootdir to create file
  btrfs-progs: tests/mkfs: Introduce test case to check if mkfs rootdir
can create new file
  btrfs-progs: mkfs: Use the whole file or block device to mkfs for
rootdir
  btrfs-progs: tests/mkfs: Introduce test case to verify if mkfs.btrfs
rootdir shrink behaves correctly

 Documentation/mkfs.btrfs.asciidoc|  11 +
 mkfs/main.c  | 231 ++---
 mkfs/rootdir.c   | 249 +++
 mkfs/rootdir.h   |   7 +-
 tests/mkfs-tests/011-rootdir-create-file/test.sh |  14 ++
 tests/mkfs-tests/012-rootdir-no-shrink/test.sh   |  38 
 utils.c  |  17 +-
 utils.h  |   2 +
 8 files changed, 402 insertions(+), 167 deletions(-)
 create mode 100755 tests/mkfs-tests/011-rootdir-create-file/test.sh
 create mode 100755 tests/mkfs-tests/012-rootdir-no-shrink/test.sh

-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] btrfs: cleanup device states define BTRFS_DEV_STATE_WRITEABLE

2017-11-29 Thread Nikolay Borisov


On 29.11.2017 06:45, Anand Jain wrote:
> Currently device state is being managed by each individual int
> variable such as struct btrfs_device::writeable. Instead of that
> declare device state BTRFS_DEV_STATE_WRITEABLE and use the
> bit operations.
> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/disk-io.c | 12 ++
>  fs/btrfs/extent-tree.c |  2 +-
>  fs/btrfs/extent_io.c   |  3 ++-
>  fs/btrfs/ioctl.c   |  2 +-
>  fs/btrfs/scrub.c   |  3 ++-
>  fs/btrfs/volumes.c | 63 
> +-
>  fs/btrfs/volumes.h |  4 +++-
>  7 files changed, 54 insertions(+), 35 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index dfdab849037b..0d361b6713e1 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3563,7 +3563,8 @@ static int barrier_all_devices(struct btrfs_fs_info 
> *info)
>   continue;
>   if (!dev->bdev)
>   continue;
> - if (!dev->in_fs_metadata || !dev->writeable)
> + if (!dev->in_fs_metadata ||
> + !test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state))
>   continue;
>  
>   write_dev_flush(dev);
> @@ -3578,7 +3579,8 @@ static int barrier_all_devices(struct btrfs_fs_info 
> *info)
>   errors_wait++;
>   continue;
>   }
> - if (!dev->in_fs_metadata || !dev->writeable)
> + if (!dev->in_fs_metadata ||
> + !test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state))
>   continue;
>  
>   ret = wait_dev_flush(dev);
> @@ -3675,7 +3677,8 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int 
> max_mirrors)
>   total_errors++;
>   continue;
>   }
> - if (!dev->in_fs_metadata || !dev->writeable)
> + if (!dev->in_fs_metadata ||
> + !test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state))
>   continue;
>  
>   btrfs_set_stack_device_generation(dev_item, 0);
> @@ -3714,7 +3717,8 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int 
> max_mirrors)
>   list_for_each_entry_rcu(dev, head, dev_list) {
>   if (!dev->bdev)
>   continue;
> - if (!dev->in_fs_metadata || !dev->writeable)
> + if (!dev->in_fs_metadata ||
> + !test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state))
>   continue;
>  
>   ret = wait_dev_supers(dev, max_mirrors);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index e2d7e86b51d1..f81d928754e1 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -10935,7 +10935,7 @@ static int btrfs_trim_free_extents(struct 
> btrfs_device *device,
>   *trimmed = 0;
>  
>   /* Not writeable = nothing to do. */
> - if (!device->writeable)
> + if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state))
>   return 0;
>  
>   /* No free space = nothing to do. */
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 7fa50e12f18e..f51c797847c7 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2028,7 +2028,8 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, 
> u64 ino, u64 start,
>   bio->bi_iter.bi_sector = sector;
>   dev = bbio->stripes[bbio->mirror_num - 1].dev;
>   btrfs_put_bbio(bbio);
> - if (!dev || !dev->bdev || !dev->writeable) {
> + if (!dev || !dev->bdev ||
> + !test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state)) {
>   btrfs_bio_counter_dec(fs_info);
>   bio_put(bio);
>   return -EIO;
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 6c7a49faf4e0..8a74c83503d6 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1518,7 +1518,7 @@ static noinline int btrfs_ioctl_resize(struct file 
> *file,
>   goto out_free;
>   }
>  
> - if (!device->writeable) {
> + if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state)) {
>   btrfs_info(fs_info,
>  "resizer unable to apply on readonly device %llu",
>  devid);
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index e3f6c49e5c4d..e027e0de66a5 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -4117,7 +4117,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 
> devid, u64 start,
>   return -ENODEV;
>   }
>  
> - if (!is_dev_replace && !readonly && !dev->writeable) {
> + if (!is_dev_replace && !readonly &&
> + !test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state)) {
>   mutex_unlock(_info->fs_devices->device_list_mutex);
>   rcu_read_lock();
>   name = rcu_dereference(dev->name);
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>