Re: btrfs space used issue

2018-02-27 Thread vinayak hegde
I ran full defragement and balance both, but didnt help.
My created and accounting usage files are matching the du -sh output.
But I am not getting why btrfs internals use so much extra space.
My worry is, will get no space error earlier than I expect.
Is it expected with btrfs internal that it will use so much extra space?

Vinayak




On Tue, Feb 27, 2018 at 7:24 PM, Austin S. Hemmelgarn
 wrote:
> On 2018-02-27 08:09, vinayak hegde wrote:
>>
>> I am using btrfs, But I am seeing du -sh and df -h showing huge size
>> difference on ssd.
>>
>> mount:
>> /dev/drbd1 on /dc/fileunifier.datacache type btrfs
>>
>> (rw,noatime,nodiratime,flushoncommit,discard,nospace_cache,recovery,commit=5,subvolid=5,subvol=/)
>>
>>
>> du -sh /dc/fileunifier.datacache/ -  331G
>>
>> df -h
>> /dev/drbd1  746G  346G  398G  47% /dc/fileunifier.datacache
>>
>> btrfs fi usage /dc/fileunifier.datacache/
>> Overall:
>>  Device size: 745.19GiB
>>  Device allocated: 368.06GiB
>>  Device unallocated: 377.13GiB
>>  Device missing: 0.00B
>>  Used: 346.73GiB
>>  Free (estimated): 396.36GiB(min: 207.80GiB)
>>  Data ratio:  1.00
>>  Metadata ratio:  2.00
>>  Global reserve: 176.00MiB(used: 0.00B)
>>
>> Data,single: Size:365.00GiB, Used:345.76GiB
>> /dev/drbd1 365.00GiB
>>
>> Metadata,DUP: Size:1.50GiB, Used:493.23MiB
>> /dev/drbd1   3.00GiB
>>
>> System,DUP: Size:32.00MiB, Used:80.00KiB
>> /dev/drbd1  64.00MiB
>>
>> Unallocated:
>> /dev/drbd1 377.13GiB
>>
>>
>> Even if we consider 6G metadata its 331+6 = 337.
>> where is 9GB used?
>>
>> Please explain.
>
> First, you're counting the metadata wrong.  The value shown per-device by
> `btrfs filesystem usage` already accounts for replication (so it's only 3 GB
> of metadata allocated, not 6 GB).  Neither `df` nor `du` looks at the chunk
> level allocations though.
>
> Now, with that out of the way, the discrepancy almost certainly comes form
> differences in how `df` and `du` calculate space usage.  In particular, `df`
> calls statvfs and looks at the f_blocks and f_bfree values to compute space
> usage, while `du` walks the filesystem tree calling stat on everything and
> looking at st_blksize and st_blocks (or instead at st_size if you pass in
> `--apparent-size` as an option).  This leads to a couple of differences in
> what they will count:
>
> 1. `du` may or may not properly count hardlinks, sparse files, and
> transparently compressed data, dependent on whether or not you use
> `--apparent-sizes` (by default, it does properly count all of those), while
> `df` will always account for those properly.
> 2. `du` does not properly account for reflinked blocks (from deduplication,
> snapshots, or use of the CLONE ioctl), and will count each reflink of every
> block as part of the total size, while `df` will always count each block
> exactly once no matter how many reflinks it has.
> 3. `du` does not account for all of the BTRFS metadata allocations,
> functionally ignoring space allocated for anything but inline data, while
> `df` accounts for all BTRFS metadata properly.
> 4. `du` will recurse into other filesystems if you don't pass the `-x`
> option to it, while `df` will only report for each filesystem separately.
> 5. `du` will only count data usage under the given mount point, and won't
> account for data on other subvolumes that may be mounted elsewhere (and if
> you pass in `-x` won't count data on other subvolumes located under the
> given path either), while `df` will count all the data in all subvolumes.
> 6. There are a couple of other differences too, but they're rather complex
> and dependent on the internals of BTRFS.
>
> In your case, I think the issue is probably one of the various things under
> item 6.  Items 1, 2 and 4 will cause `du` to report more space usage than
> `df`, item 3 is irrelevant because `du` shows less space than the total data
> chunk usage reported by `btrfs filesystem usage`, and item 5 is irrelevant
> because you're mounting the root subvolume and not using the `-x` option on
> `du` (and therefore there can't be other subvolumes you're missing).
>
> Try running a full defrag of the given mount point.  If what I think is
> causing this is in fact the issue, that should bring the numbers back
> in-line with each other.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: incremental send/receive ERROR: cannot find parent subvolume

2018-02-27 Thread Andrei Borzenkov
27.02.2018 01:54, Emil.s пишет:
> Hello,
> 
> I'm trying to restore a subvolume from a backup, but I'm failing when
> I try to setup the replication chain again.
> 
> Previously I had disk A and B, where I was sending snapshots from A to
> B using "send -c /disk_a/1 /disk_a/2 | receive /disk_b" and so on.
> Now disk A failed, so I got a new disk: C.
> 
> First I sent the last backup from B to C by running "send disk_b/2 |
> receive /disk_c/".
> 
> Then my plan was to use disk B as the new main disk, so on disk B, I
> created a new snapshot by running "snapshot -r disk_b/2
> disk_b/my_volume".
> Now I want to send "disk_b/my_volume" to disk C, so that I can set
> disk_b/2 to RW and start using it again, but the last send fails with
> "ERROR: cannot find parent subvolume".
> 
> Disk B:
> root@tillberga: /backup #> btrfs subvol sh snapshots/user_data_2018-02-17/
> snapshots/user_data_2018-02-17
> Name: user_data_2018-02-17
> UUID: 93433261-d954-9f4b-8319-d450ae079a9a
> Parent UUID: 51180286-c202-c94c-b8f9-2ecc8d2b5b7c
> Received UUID: 014fc004-ae04-0148-9525-1bf556fd4d10
> Flags: readonly
> Snapshot(s):
> user_data_test
> 
> Disk C:
> root@hedemora: /btrfs #> btrfs subvol sh user_data_2018-02-17/
> user_data_2018-02-17
> Name: user_data_2018-02-17
> UUID: 35361429-a42c-594b-b390-51ffb9725324
> Parent UUID: -
> Received UUID: 93433261-d954-9f4b-8319-d450ae079a9a
> Flags: readonly
> Snapshot(s):
> 
> Disk B has UUID 93433261-d954-9f4b-8319-d450ae079a9a, and disk C has
> "Received UUID: 93433261-d954-9f4b-8319-d450ae079a9a". I think that's
> alright?
> 
> The new snapshot on disk B looks as following:
> root@tillberga: /backup #> btrfs subvol sh user_data_test
> user_data_test
> Name: user_data_test
> UUID: bf88000c-e3a6-434b-8f69-f5ce2174227e
> Parent UUID: 93433261-d954-9f4b-8319-d450ae079a9a
> Received UUID: 014fc004-ae04-0148-9525-1bf556fd4d10
> Flags: readonly
> Snapshot(s):
> 
> But when I'm trying to send it, I'm getting the following:
> root@tillberga: /backup #> btrfs send -v -c
> snapshots/user_data_2018-02-17 user_data_test | ssh user@hedemora
> "sudo btrfs receive -v /btrfs/"
> At subvol user_data_test
> BTRFS_IOC_SEND returned 0
> joining genl thread
> receiving snapshot user_data_test
> uuid=014fc004-ae04-0148-9525-1bf556fd4d10, ctransid=52373
> parent_uuid=014fc004-ae04-0148-9525-1bf556fd4d10,
> parent_ctransid=52373
> At snapshot user_data_test
> ERROR: cannot find parent subvolume
> 
> Note that the receiver says
> "parent_uuid=014fc004-ae04-0148-9525-1bf556fd4d10". Not really sure
> where that comes from, but disk B has the same, so maybe that's the
> UUID of the original snapshot on disk A?
> 


If "Received UUID" is set on a subvolume, "btrfs send" will forward it
and set on destination unchanged, instead of computing "correct"
Received UUID. This has been discussed several times.

> Is it possible to continue to send incremental snapshots between these
> two file systems, or must I do a full sync?

You would need to create writable clone on B first and start with this
clone; new writable clone won't have (misleading) Received UUID and you
start clean.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 14/27] libbtrfsutil: add btrfs_util_deleted_subvolumes()

2018-02-27 Thread Misono, Tomohiro
On 2018/02/24 8:33, Omar Sandoval wrote:
> On Fri, Feb 23, 2018 at 11:12:56AM +0900, Misono, Tomohiro wrote:
>>
>> On 2018/02/16 4:04, Omar Sandoval wrote:
>>> From: Omar Sandoval 
>>>
>>> Signed-off-by: Omar Sandoval 
>>> ---
>>>  libbtrfsutil/btrfsutil.h| 21 +++
>>>  libbtrfsutil/python/btrfsutilpy.h   |  3 +
>>>  libbtrfsutil/python/module.c| 30 ++
>>>  libbtrfsutil/python/qgroup.c| 17 +-
>>>  libbtrfsutil/python/subvolume.c | 30 ++
>>>  libbtrfsutil/python/tests/test_subvolume.py |  8 +++
>>>  libbtrfsutil/subvolume.c| 89 
>>> +
>>>  7 files changed, 183 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/libbtrfsutil/btrfsutil.h b/libbtrfsutil/btrfsutil.h
>>> index 00c86174..677ab3c1 100644
>>> --- a/libbtrfsutil/btrfsutil.h
>>> +++ b/libbtrfsutil/btrfsutil.h
>>> @@ -534,6 +534,27 @@ enum btrfs_util_error 
>>> btrfs_util_subvolume_iterator_next_info(struct btrfs_util_
>>>   char **path_ret,
>>>   struct 
>>> btrfs_util_subvolume_info *subvol);
>>>  
>>> +/**
>>> + * btrfs_util_deleted_subvolumes() - Get a list of subvolume which have 
>>> been
>>> + * deleted but not yet cleaned up.
>>> + * @path: Path on a Btrfs filesystem.
>>> + * @ids: Returned array of subvolume IDs.
>>> + * @n: Returned number of IDs in the @ids array.
>>> + *
>>> + * This requires appropriate privilege (CAP_SYS_ADMIN).
>>> + *
>>> + * Return: %BTRFS_UTIL_OK on success, non-zero error code on failure.
>>> + */
>>> +enum btrfs_util_error btrfs_util_deleted_subvolumes(const char *path,
>>> +   uint64_t **ids,
>>> +   size_t *n);
>>> +
>>> +/**
>>> + * btrfs_util_deleted_subvolumes_fd() - See 
>>> btrfs_util_deleted_subvolumes().
>>> + */
>>> +enum btrfs_util_error btrfs_util_deleted_subvolumes_fd(int fd, uint64_t 
>>> **ids,
>>> +  size_t *n);
>>> +
>>>  /**
>>>   * btrfs_util_create_qgroup_inherit() - Create a qgroup inheritance 
>>> specifier
>>>   * for btrfs_util_create_subvolume() or btrfs_util_create_snapshot().
>>> diff --git a/libbtrfsutil/python/btrfsutilpy.h 
>>> b/libbtrfsutil/python/btrfsutilpy.h
>>> index b3ec047f..be5122e2 100644
>>> --- a/libbtrfsutil/python/btrfsutilpy.h
>>> +++ b/libbtrfsutil/python/btrfsutilpy.h
>>> @@ -54,6 +54,8 @@ struct path_arg {
>>>  int path_converter(PyObject *o, void *p);
>>>  void path_cleanup(struct path_arg *path);
>>>  
>>> +PyObject *list_from_uint64_array(const uint64_t *arr, size_t n);
>>> +
>>>  void SetFromBtrfsUtilError(enum btrfs_util_error err);
>>>  void SetFromBtrfsUtilErrorWithPath(enum btrfs_util_error err,
>>>struct path_arg *path);
>>> @@ -72,6 +74,7 @@ PyObject *set_default_subvolume(PyObject *self, PyObject 
>>> *args, PyObject *kwds);
>>>  PyObject *create_subvolume(PyObject *self, PyObject *args, PyObject *kwds);
>>>  PyObject *create_snapshot(PyObject *self, PyObject *args, PyObject *kwds);
>>>  PyObject *delete_subvolume(PyObject *self, PyObject *args, PyObject *kwds);
>>> +PyObject *deleted_subvolumes(PyObject *self, PyObject *args, PyObject 
>>> *kwds);
>>>  
>>>  void add_module_constants(PyObject *m);
>>>  
>>> diff --git a/libbtrfsutil/python/module.c b/libbtrfsutil/python/module.c
>>> index e995a1be..eaa062ac 100644
>>> --- a/libbtrfsutil/python/module.c
>>> +++ b/libbtrfsutil/python/module.c
>>> @@ -125,6 +125,29 @@ err:
>>> return 0;
>>>  }
>>>  
>>> +PyObject *list_from_uint64_array(const uint64_t *arr, size_t n)
>>> +{
>>> +PyObject *ret;
>>> +size_t i;
>>> +
>>> +ret = PyList_New(n);
>>> +if (!ret)
>>> +   return NULL;
>>> +
>>> +for (i = 0; i < n; i++) {
>>> +   PyObject *tmp;
>>> +
>>> +   tmp = PyLong_FromUnsignedLongLong(arr[i]);
>>> +   if (!tmp) {
>>> +   Py_DECREF(ret);
>>> +   return NULL;
>>> +   }
>>> +   PyList_SET_ITEM(ret, i, tmp);
>>> +}
>>> +
>>> +return ret;
>>> +}
>>> +
>>>  void path_cleanup(struct path_arg *path)
>>>  {
>>> Py_CLEAR(path->object);
>>> @@ -214,6 +237,13 @@ static PyMethodDef btrfsutil_methods[] = {
>>>  "path -- string, bytes, or path-like object\n"
>>>  "recursive -- if the given subvolume has child subvolumes, delete\n"
>>>  "them instead of failing"},
>>> +   {"deleted_subvolumes", (PyCFunction)deleted_subvolumes,
>>> +METH_VARARGS | METH_KEYWORDS,
>>> +"deleted_subvolumes(path)\n\n"
>>> +"Get the list of subvolume IDs which have been deleted but not yet\n"
>>> +"cleaned up\n\n"
>>> +"Arguments:\n"
>>> +"path -- string, bytes, path-like object, or open file descriptor"},
>>> {},
>>>  };
>>>  
>>> diff --git a/libbtrfsutil/python/qgroup.c 

[PATCH] Btrfs: dev-replace: skip prealloc extents when copy nocow pages

2018-02-27 Thread Liu Bo
It doens't make sense to process prealloc extents as pages will be
filled with zero when reading prealloc extents.

Signed-off-by: Liu Bo 
---
 fs/btrfs/scrub.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ec56f33..9882513 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -4480,7 +4480,8 @@ static int check_extent_to_block(struct btrfs_inode 
*inode, u64 start, u64 len,
 * move on to the next inode.
 */
if (em->block_start > logical ||
-   em->block_start + em->block_len < logical + len) {
+   em->block_start + em->block_len < logical + len ||
+   test_bit(EXTENT_FLAG_PREALLOC, >flags)) {
free_extent_map(em);
ret = 1;
goto out_unlock;
-- 
2.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] btrfs: introduce feature to forget a btrfs device

2018-02-27 Thread Liu Bo
On Thu, Jan 11, 2018 at 09:25:50AM +0800, Anand Jain wrote:
> Support for a new command 'btrfs dev forget [dev]' is proposed here,
> to undo the effects of 'btrfs dev scan [dev]'. For this purpose,
> this patch proposes to use ioctl #5 as it was empty.
>   IOW(BTRFS_IOCTL_MAGIC, 5, ..)
> This patch adds new ioctl BTRFS_IOC_FORGET_DEV which can be sent from
> the /dev/btrfs-control to forget one or all devices, (devices which are
> not mounted) from the btrfs kernel.
>

To me this seems to offer a debugging ability, could you please
elaborate the use case where we need to forget a particular device
instead of just wiping the uuid?

Thanks,

-liubo
> The argument it takes is struct btrfs_ioctl_vol_args_v2, and ::name can be
> set to specify the device path. And all unmounted devices can be removed
> from the kernel using the BTRFS_DEVICE_SPEC_ALL_DEV flag. Remove all
> devices functionality would override remove one device when both are
> specified in an IOCTL call.
> Again, the devices are removed only if the relevant fsid aren't mounted.
> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/super.c   | 27 +++
>  fs/btrfs/volumes.c |  9 +
>  fs/btrfs/volumes.h |  1 +
>  include/uapi/linux/btrfs.h |  6 +-
>  4 files changed, 38 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 559fc53ff59e..6a9a5ce8af3b 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -2219,21 +2219,37 @@ static long btrfs_control_ioctl(struct file *file, 
> unsigned int cmd,
>   unsigned long arg)
>  {
>   struct btrfs_ioctl_vol_args *vol;
> + struct btrfs_ioctl_vol_args_v2 *vol2;
>   struct btrfs_fs_devices *fs_devices;
>   int ret = -ENOTTY;
>  
>   if (!capable(CAP_SYS_ADMIN))
>   return -EPERM;
>  
> - vol = memdup_user((void __user *)arg, sizeof(*vol));
> - if (IS_ERR(vol))
> - return PTR_ERR(vol);
> + if (cmd == BTRFS_IOC_FORGET_DEV) {
> + vol2 = memdup_user((void __user *)arg, sizeof(*vol2));
> + if (IS_ERR(vol2))
> + return PTR_ERR(vol2);
> +
> + if (vol2->flags & ~BTRFS_VOL_ARG_V2_FLAGS_SUPPORTED)
> + return -EOPNOTSUPP;
> + } else {
> + vol = memdup_user((void __user *)arg, sizeof(*vol));
> + if (IS_ERR(vol))
> + return PTR_ERR(vol);
> + }
>  
>   switch (cmd) {
>   case BTRFS_IOC_SCAN_DEV:
>   ret = btrfs_scan_one_device(vol->name, FMODE_READ,
>   _fs_type, _devices);
>   break;
> + case BTRFS_IOC_FORGET_DEV:
> + if (vol2->flags & BTRFS_DEVICE_SPEC_ALL_DEV)
> + ret = btrfs_forget_devices(NULL);
> + else
> + ret = btrfs_forget_devices(vol2->name);
> + break;
>   case BTRFS_IOC_DEVICES_READY:
>   ret = btrfs_scan_one_device(vol->name, FMODE_READ,
>   _fs_type, _devices);
> @@ -2246,7 +2262,10 @@ static long btrfs_control_ioctl(struct file *file, 
> unsigned int cmd,
>   break;
>   }
>  
> - kfree(vol);
> + if (cmd == BTRFS_IOC_FORGET_DEV)
> + kfree(vol2);
> + else
> + kfree(vol);
>   return ret;
>  }
>  
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index e947e47f8fff..b0c9948baf9a 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1171,6 +1171,15 @@ static int btrfs_read_disk_super(struct block_device 
> *bdev, u64 bytenr,
>   return 0;
>  }
>  
> +int btrfs_forget_devices(const char *path)
> +{
> + mutex_lock(_mutex);
> + btrfs_free_stale_devices(path, NULL);
> + mutex_unlock(_mutex);
> +
> + return 0;
> +}
> +
>  /*
>   * Look for a btrfs signature on a device. This may be called out of the 
> mount path
>   * and we are not allowed to call set_blocksize during the scan. The 
> superblock
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 15216fed918b..b954ca3b79a9 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -422,6 +422,7 @@ int btrfs_open_devices(struct btrfs_fs_devices 
> *fs_devices,
>  fmode_t flags, void *holder);
>  int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
> struct btrfs_fs_devices **fs_devices_ret);
> +int btrfs_forget_devices(const char *path);
>  int btrfs_close_devices(struct btrfs_fs_devices *fs_devices);
>  void btrfs_close_extra_devices(struct btrfs_fs_devices *fs_devices, int 
> step);
>  void btrfs_assign_next_active_device(struct btrfs_fs_info *fs_info,
> diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
> index c8d99b9ca550..4e8ec1391872 100644
> --- a/include/uapi/linux/btrfs.h
> +++ b/include/uapi/linux/btrfs.h
> @@ -41,12 +41,14 @@ 

Re: [PATCH 1/4] btrfs-progs: check: Check data csum for all copies

2018-02-27 Thread Su Yue



On 02/27/2018 06:31 PM, Qu Wenruo wrote:



On 2018年02月27日 18:01, Su Yue wrote:



On 02/27/2018 05:12 PM, Qu Wenruo wrote:

Original --check-data-csum option will skip the extra copy if the first
copy matches csum.

Since offline scrub (with recoverability report) is still out-of-tree, at
least enhance --check-data-csum option to handle multiple copies.

Signed-off-by: Qu Wenruo 
---
   check/main.c | 65
++--
   1 file changed, 32 insertions(+), 33 deletions(-)

diff --git a/check/main.c b/check/main.c
index 97baae583f04..f25fdc765d63 100644
--- a/check/main.c
+++ b/check/main.c
@@ -5381,42 +5381,37 @@ static int check_extent_csums(struct
btrfs_root *root, u64 bytenr,
   if (!data)
   return -ENOMEM;
   +    num_copies = btrfs_num_copies(root->fs_info, bytenr, num_bytes);
   while (offset < num_bytes) {
-    mirror = 0;
-again:
-    read_len = num_bytes - offset;
-    /* read as much space once a time */
-    ret = read_extent_data(fs_info, data + offset,
-    bytenr + offset, _len, mirror);
-    if (ret)
-    goto out;
-    data_checked = 0;
-    /* verify every 4k data's checksum */
-    while (data_checked < read_len) {
-    csum = ~(u32)0;
-    tmp = offset + data_checked;
-
-    csum = btrfs_csum_data((char *)data + tmp,
-   csum, fs_info->sectorsize);
-    btrfs_csum_final(csum, (u8 *));
-
-    csum_offset = leaf_offset +
- tmp / fs_info->sectorsize * csum_size;
-    read_extent_buffer(eb, (char *)_expected,
-   csum_offset, csum_size);
-    /* try another mirror */
-    if (csum != csum_expected) {
-    fprintf(stderr, "mirror %d bytenr %llu csum %u
expected csum %u\n",
+    for (mirror = 1; mirror <= num_copies; mirror++) {


Got your point.
But what confuses me is that why mirror starts from 1 here.
The mirror influences btrfs_map_block() which is not related
to this patch though.


mirror number has different meanings in fact.

For 0, it means try to read *ANY* valid copy, it can be the copy of a
duplication, or even rebuilt data from RAID5/6.

For 1, it means the fist copy. Either the only copy of
SINGLE/RAID0/RAID5/6, or the first copy of RAID1/10.

For 2, it means the 2nd copy, it can be real copy for RAID1/RAID10, or
rebuilt data from RAID5/6.

And in fact Liu Bo is adding new copy numbers for RAID5/6 to allow us to
specify how to build the missing data for RAID6.

So here starting from mirror 1 is the correct behavior.


Make sense.

Thanks for your explanation!
Su


Thanks,
Qu


Thanks,
Su

+    read_len = num_bytes - offset;
+    /* read as much space once a time */
+    ret = read_extent_data(fs_info, data + offset,
+    bytenr + offset, _len, mirror);
+    if (ret)
+    goto out;
+
+    data_checked = 0;
+    /* verify every 4k data's checksum */
+    while (data_checked < read_len) {
+    csum = ~(u32)0;
+    tmp = offset + data_checked;
+
+    csum = btrfs_csum_data((char *)data + tmp,
+    csum, fs_info->sectorsize);
+    btrfs_csum_final(csum, (u8 *));
+
+    csum_offset = leaf_offset +
+ tmp / fs_info->sectorsize * csum_size;
+    read_extent_buffer(eb, (char *)_expected,
+   csum_offset, csum_size);
+    if (csum != csum_expected)
+    fprintf(stderr,
+    "mirror %d bytenr %llu csum %u expected csum %u\n",
   mirror, bytenr + tmp,
   csum, csum_expected);
-    num_copies = btrfs_num_copies(root->fs_info,
-    bytenr, num_bytes);
-    if (mirror < num_copies - 1) {
-    mirror += 1;
-    goto again;
-    }
+    data_checked += fs_info->sectorsize;
   }
-    data_checked += fs_info->sectorsize;
   }
   offset += read_len;
   }
@@ -5624,7 +5619,11 @@ static int check_csums(struct btrfs_root *root)
   leaf_offset = btrfs_item_ptr_offset(leaf, path.slots[0]);
   ret = check_extent_csums(root, key.offset, data_len,
    leaf_offset, leaf);
-    if (ret)
+    /*
+ * Only break for fatal errors, if mismatch is found,
+ * continue checking until all extents are checked.
+ */
+    if (ret < 0)
   break;
   skip_csum_check:
   if (!num_bytes) {




--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html





--
To unsubscribe from this list: send the 

Re: [PATCH v2 00/27] btrfs-progs: introduce libbtrfsutil, "btrfs-progs as a library"

2018-02-27 Thread Omar Sandoval
On Tue, Feb 27, 2018 at 04:04:28PM +0100, David Sterba wrote:
> On Mon, Feb 26, 2018 at 03:36:41PM -0800, Omar Sandoval wrote:
> > On Fri, Feb 23, 2018 at 09:28:42PM +0100, David Sterba wrote:
> > > On Wed, Feb 21, 2018 at 10:50:32AM -0800, Omar Sandoval wrote:
> > > > On Wed, Feb 21, 2018 at 04:13:38PM +0100, David Sterba wrote:
> > > > > On Tue, Feb 20, 2018 at 07:50:48PM +0100, David Sterba wrote:
> > > > > > I have more comments or maybe questions about the future development
> > > > > > workflow, but at this point the patchset is in a good shape for
> > > > > > incremental merge.
> > > > > 
> > > > > After removnig the first patch adding subvolume.c (with
> > > > > linux/btrfs_tree.h) and what depends on it, I'm left with:
> > > > > 
> > > > > Omar Sandoval (4):
> > > > >   Add libbtrfsutil
> > > > >   libbtrfsutil: add Python bindings
> > > > >   libbtrfsutil: add qgroup inheritance helpers
> > > > >   libbtrfsutil: add filesystem sync helpers
> > > > > 
> > > > > with some context updates. That builds and passes the CI tests.
> > > > 
> > > > Great. Does the CI system run the Python tests yet?
> > > 
> > > Tested here https://travis-ci.org/kdave/btrfs-progs/jobs/345410536 ,
> > > does not pass.
> > > 
> > > 
> > > test_start_sync (test_filesystem.TestSubvolume) ... mkfs.btrfs: invalid 
> > > option -- 'q'
> > > usage: mkfs.btrfs [options] dev [ dev ... ]
> > > 
> > > 
> > > Looks like it tries to use the system mkfs.btrfs that is old.
> > 
> > Hm... according the documentation for the existing tests, the person
> > running the tests is expected to set PATH to contain the local binaries,
> 
> No, where is this written? The closest hit is in the 'Exported
> testsuite' but otherwise all paths must be "$TOP/mkfs.btrfs". The
> testsuite will detect where it is running and will set the TOP variable
> accordingly, but this is transparent to the tests.
> 
> The python tests should probably build on the same exec path magic.

Oops, I misunderstood from skimming the exported testsuite too quickly.
I'll update the Python tests to do something similar.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] btrfs: fix NPD when target device is missing

2018-02-27 Thread David Sterba
On Sat, Feb 24, 2018 at 10:07:33PM +0800, Anand Jain wrote:
> >> IP: btrfs_destroy_dev_replace_tgtdev+0x43/0xf0 [btrfs]
> >> Call Trace:
> >> btrfs_dev_replace_cancel+0x15f/0x180 [btrfs]
> >> btrfs_ioctl+0x2216/0x2590 [btrfs]
> >> do_vfs_ioctl+0x625/0x650
> >> SyS_ioctl+0x4e/0x80
> >> do_syscall_64+0x5d/0x160
> >> entry_SYSCALL64_slow_path+0x25/0x25
> > 
> > Do you have a reproducer for that?
> 
>   For now, I used a tweaked btrfs.ko [1], then
> 
>   mkfs.btrfs -fq /dev/sdb && mount /dev/sdb /btrfs
>   btrfs rep start -B /dev/sdb /dev/sdc
> after reboot, we have the replace target device
> and now use non-tweaked btrfs.ko
>   mount -o degraded /dev/sdb /btrfs
> 
> [1]
> ---
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 476981c2cf55..8ea4856b6368 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -25,6 +25,7 @@
>   #include 
>   #include 
>   #include 
> +#include 
>   #include 
>   #include "ctree.h"
>   #include "extent_map.h"
> @@ -419,6 +420,8 @@ int btrfs_dev_replace_start(struct btrfs_fs_info 
> *fs_info,
>btrfs_device_get_total_bytes(src_device),
>_replace->scrub_progress, 0, 1);
> 
> +   emergency_restart();

Ok, not something that we can easily turn into a regression test.

I'll reorder this fix before patch "btrfs: log, when replace, is
canceled by the user", so it is bisectable. Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Please update the BTRFS status page

2018-02-27 Thread David Sterba
On Sat, Feb 24, 2018 at 03:05:48AM +0100, waxhead wrote:
> The latest released kernel is 4.15

Updated. There's one pending ack for the degraded mount that's still
mostly-ok but IMHO it's been fixed in 4.14 so it'll be OK.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] btrfs: verify max_inline mount parameter

2018-02-27 Thread David Sterba
On Mon, Feb 26, 2018 at 10:47:04AM +0800, Anand Jain wrote:
> We aren't verifying the parameter passed to the max_inline mount option.
> So we won't fail the mount if a junk value is specified, for example,
> -o max_inline=abc. This patch checks if input is valid.
> 
> Signed-off-by: Anand Jain 
> ---
> v2->v3: Handle parameter with unit, such as 4K. Use memparse() 2nd arg.
> v1->v2: use match_int ret value if error
> use %u instead of %d for parser
> 
>  fs/btrfs/super.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 77e0537e1db5..76b58da8d56d 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -605,7 +605,14 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char 
> *options,
>   case Opt_max_inline:
>   num = match_strdup([0]);
>   if (num) {
> - info->max_inline = memparse(num, NULL);
> + char *retptr;
> +
> + info->max_inline = memparse(num, );

I missed it in the patch that changed max_inline to u32, memparse
returns unsigned long long, so this is not entrely correct and requires
a temporary variable.

We should also report if the user-specified value is larger than
BTRFS_MAX_METADATA_BLOCKSIZE .

This is not a trivial fix the existing patches so I'll remove "btrfs:
declare max_inline as u32".

To sum it up:

1. add check and return EINVAL with a message if max_inline is larger
   than the metadata block size
2. switch max_inline to u32 and add a temporary value to read from
   memparse
3. add change from this patch that catches the junk

> + if (*retptr != '\0') {
> + ret = -EINVAL;
> + kfree(num);

The kfree can be moved before the check, we don't need 'num'.

> + goto out;
> + }
>   kfree(num);
>  
>   if (info->max_inline) {
> -- 
> 2.15.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] btrfs: Remove redundant comment from btrfs_search_forward

2018-02-27 Thread Nikolay Borisov
This function always sets keep_locks to 1 and saves the old value of
keep_locks which is restored at the end. So there is no way it can be
called without keep_locks being set. Remove comment imposing redundant
requirement on callers.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index b88a79e69ddf..a80fcd285b34 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -5145,9 +5145,6 @@ int btrfs_prev_leaf(struct btrfs_root *root, struct 
btrfs_path *path)
  * into min_key, so you can call btrfs_search_slot with cow=1 on the
  * key and get a writable path.
  *
- * This does lock as it descends, and path->keep_locks should be set
- * to 1 by the caller.
- *
  * This honors path->lowest_level to prevent descent past a given level
  * of the tree.
  *
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] btrfs: Remove root argument from btrfs_log_dentry_safe

2018-02-27 Thread Nikolay Borisov
Now that nothing uses the root arg of btrfs_log_dentry_safe it can be
safely removed. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/file.c | 2 +-
 fs/btrfs/tree-log.c | 2 +-
 fs/btrfs/tree-log.h | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 41ab9073d1d4..f048879c222f 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2214,7 +2214,7 @@ int btrfs_sync_file(struct file *file, loff_t start, 
loff_t end, int datasync)
}
trans->sync = true;
 
-   ret = btrfs_log_dentry_safe(trans, root, dentry, start, end, );
+   ret = btrfs_log_dentry_safe(trans, dentry, start, end, );
if (ret < 0) {
/* Fallthrough and commit/free transaction. */
ret = 1;
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index f6b1795c44ee..f83048023666 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -5572,7 +5572,7 @@ static int btrfs_log_inode_parent(struct 
btrfs_trans_handle *trans,
  * data on disk.
  */
 int btrfs_log_dentry_safe(struct btrfs_trans_handle *trans,
- struct btrfs_root *root, struct dentry *dentry,
+ struct dentry *dentry,
  const loff_t start,
  const loff_t end,
  struct btrfs_log_ctx *ctx)
diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h
index 483027f9a7f4..88abc43312a1 100644
--- a/fs/btrfs/tree-log.h
+++ b/fs/btrfs/tree-log.h
@@ -65,7 +65,7 @@ int btrfs_free_log_root_tree(struct btrfs_trans_handle *trans,
 struct btrfs_fs_info *fs_info);
 int btrfs_recover_log_trees(struct btrfs_root *tree_root);
 int btrfs_log_dentry_safe(struct btrfs_trans_handle *trans,
- struct btrfs_root *root, struct dentry *dentry,
+ struct dentry *dentry,
  const loff_t start,
  const loff_t end,
  struct btrfs_log_ctx *ctx);
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] btrfs: Remove root arg from btrfs_log_inode_parent

2018-02-27 Thread Nikolay Borisov
btrfs_log_inode_parent is called from 2 places (btrfs_log_dentry_safe
and btrfs_log_new_name) both of which pass inode->root as the root
argument and the inode itself. Remove the redundant root argument and
get a reference to the root directly from the inode, also remove
redundant root != inode->root check from the same function. No
functional change.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/tree-log.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 6eb702c8f80b..f6b1795c44ee 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -5407,7 +5407,6 @@ static int btrfs_log_all_parents(struct 
btrfs_trans_handle *trans,
  * the last committed transaction
  */
 static int btrfs_log_inode_parent(struct btrfs_trans_handle *trans,
- struct btrfs_root *root,
  struct btrfs_inode *inode,
  struct dentry *parent,
  const loff_t start,
@@ -5415,6 +5414,7 @@ static int btrfs_log_inode_parent(struct 
btrfs_trans_handle *trans,
  int inode_only,
  struct btrfs_log_ctx *ctx)
 {
+   struct btrfs_root *root = inode->root;
struct btrfs_fs_info *fs_info = root->fs_info;
struct super_block *sb;
struct dentry *old_parent = NULL;
@@ -5440,7 +5440,7 @@ static int btrfs_log_inode_parent(struct 
btrfs_trans_handle *trans,
goto end_no_trans;
}
 
-   if (root != inode->root || btrfs_root_refs(>root_item) == 0) {
+   if (btrfs_root_refs(>root_item) == 0) {
ret = 1;
goto end_no_trans;
}
@@ -5580,8 +5580,8 @@ int btrfs_log_dentry_safe(struct btrfs_trans_handle 
*trans,
struct dentry *parent = dget_parent(dentry);
int ret;
 
-   ret = btrfs_log_inode_parent(trans, root, BTRFS_I(d_inode(dentry)),
-   parent, start, end, LOG_INODE_ALL, ctx);
+   ret = btrfs_log_inode_parent(trans, BTRFS_I(d_inode(dentry)), parent,
+start, end, LOG_INODE_ALL, ctx);
dput(parent);
 
return ret;
@@ -5843,7 +5843,6 @@ int btrfs_log_new_name(struct btrfs_trans_handle *trans,
struct dentry *parent)
 {
struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
-   struct btrfs_root *root = inode->root;
 
/*
 * this will force the logging code to walk the dentry chain
@@ -5860,7 +5859,7 @@ int btrfs_log_new_name(struct btrfs_trans_handle *trans,
(!old_dir || old_dir->logged_trans <= 
fs_info->last_trans_committed))
return 0;
 
-   return btrfs_log_inode_parent(trans, root, inode, parent, 0,
- LLONG_MAX, LOG_INODE_EXISTS, NULL);
+   return btrfs_log_inode_parent(trans, inode, parent, 0, LLONG_MAX,
+ LOG_INODE_EXISTS, NULL);
 }
 
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 00/27] btrfs-progs: introduce libbtrfsutil, "btrfs-progs as a library"

2018-02-27 Thread David Sterba
On Mon, Feb 26, 2018 at 03:36:41PM -0800, Omar Sandoval wrote:
> On Fri, Feb 23, 2018 at 09:28:42PM +0100, David Sterba wrote:
> > On Wed, Feb 21, 2018 at 10:50:32AM -0800, Omar Sandoval wrote:
> > > On Wed, Feb 21, 2018 at 04:13:38PM +0100, David Sterba wrote:
> > > > On Tue, Feb 20, 2018 at 07:50:48PM +0100, David Sterba wrote:
> > > > > I have more comments or maybe questions about the future development
> > > > > workflow, but at this point the patchset is in a good shape for
> > > > > incremental merge.
> > > > 
> > > > After removnig the first patch adding subvolume.c (with
> > > > linux/btrfs_tree.h) and what depends on it, I'm left with:
> > > > 
> > > > Omar Sandoval (4):
> > > >   Add libbtrfsutil
> > > >   libbtrfsutil: add Python bindings
> > > >   libbtrfsutil: add qgroup inheritance helpers
> > > >   libbtrfsutil: add filesystem sync helpers
> > > > 
> > > > with some context updates. That builds and passes the CI tests.
> > > 
> > > Great. Does the CI system run the Python tests yet?
> > 
> > Tested here https://travis-ci.org/kdave/btrfs-progs/jobs/345410536 ,
> > does not pass.
> > 
> > 
> > test_start_sync (test_filesystem.TestSubvolume) ... mkfs.btrfs: invalid 
> > option -- 'q'
> > usage: mkfs.btrfs [options] dev [ dev ... ]
> > 
> > 
> > Looks like it tries to use the system mkfs.btrfs that is old.
> 
> Hm... according the documentation for the existing tests, the person
> running the tests is expected to set PATH to contain the local binaries,

No, where is this written? The closest hit is in the 'Exported
testsuite' but otherwise all paths must be "$TOP/mkfs.btrfs". The
testsuite will detect where it is running and will set the TOP variable
accordingly, but this is transparent to the tests.

The python tests should probably build on the same exec path magic.

> otherwise it'll use the system ones. Does the CI system not do that?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] Minor xattr handler cleanups

2018-02-27 Thread Nikolay Borisov


On 27.02.2018 16:48, David Sterba wrote:
> Renames, declearation updates.
> 
> David Sterba (4):
>   btrfs: drop underscores from exported xattr functions
>   btrfs: drop extern from function declarations
>   btrfs: adjust return type of btrfs_getxattr
>   btrfs: move btrfs_listxattr prototype to xattr.h
> 
>  fs/btrfs/acl.c   |  6 +++---
>  fs/btrfs/ctree.h |  3 ---
>  fs/btrfs/extent_io.h |  4 ++--
>  fs/btrfs/props.c |  6 +++---
>  fs/btrfs/xattr.c | 12 ++--
>  fs/btrfs/xattr.h |  7 ---
>  6 files changed, 18 insertions(+), 20 deletions(-)
> 

The whole series LGTM:

Reviewed-by: Nikolay Borisov 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] btrfs: drop extern from function declarations

2018-02-27 Thread David Sterba
Extern for functions does not make any difference, there are only a few
so let's remove them before it's too late.

Signed-off-by: David Sterba 
---
 fs/btrfs/extent_io.h | 4 ++--
 fs/btrfs/xattr.h | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index a7a850abd600..226c4876ddf8 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -83,8 +83,8 @@ static inline int le_test_bit(int nr, const u8 *addr)
return 1U & (addr[BIT_BYTE(nr)] >> (nr & (BITS_PER_BYTE-1)));
 }
 
-extern void le_bitmap_set(u8 *map, unsigned int start, int len);
-extern void le_bitmap_clear(u8 *map, unsigned int start, int len);
+void le_bitmap_set(u8 *map, unsigned int start, int len);
+void le_bitmap_clear(u8 *map, unsigned int start, int len);
 
 struct extent_state;
 struct btrfs_root;
diff --git a/fs/btrfs/xattr.h b/fs/btrfs/xattr.h
index 3c227c549dfc..96e6f3a304d3 100644
--- a/fs/btrfs/xattr.h
+++ b/fs/btrfs/xattr.h
@@ -23,13 +23,13 @@
 
 extern const struct xattr_handler *btrfs_xattr_handlers[];
 
-extern ssize_t btrfs_getxattr(struct inode *inode, const char *name,
+ssize_t btrfs_getxattr(struct inode *inode, const char *name,
void *buffer, size_t size);
-extern int btrfs_setxattr(struct btrfs_trans_handle *trans,
+int btrfs_setxattr(struct btrfs_trans_handle *trans,
struct inode *inode, const char *name,
const void *value, size_t size, int flags);
 
-extern int btrfs_xattr_security_init(struct btrfs_trans_handle *trans,
+int btrfs_xattr_security_init(struct btrfs_trans_handle *trans,
 struct inode *inode, struct inode *dir,
 const struct qstr *qstr);
 
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] btrfs: adjust return type of btrfs_getxattr

2018-02-27 Thread David Sterba
The xattr_handler::get prototype returns int, use it. The only ssize_t
exception is the per-inode listxattr handler.

Signed-off-by: David Sterba 
---
 fs/btrfs/xattr.c | 2 +-
 fs/btrfs/xattr.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c
index 414ccbe26c71..e1e8177deb5e 100644
--- a/fs/btrfs/xattr.c
+++ b/fs/btrfs/xattr.c
@@ -33,7 +33,7 @@
 #include "locking.h"
 
 
-ssize_t btrfs_getxattr(struct inode *inode, const char *name,
+int btrfs_getxattr(struct inode *inode, const char *name,
void *buffer, size_t size)
 {
struct btrfs_dir_item *di;
diff --git a/fs/btrfs/xattr.h b/fs/btrfs/xattr.h
index 96e6f3a304d3..57c638730617 100644
--- a/fs/btrfs/xattr.h
+++ b/fs/btrfs/xattr.h
@@ -23,7 +23,7 @@
 
 extern const struct xattr_handler *btrfs_xattr_handlers[];
 
-ssize_t btrfs_getxattr(struct inode *inode, const char *name,
+int btrfs_getxattr(struct inode *inode, const char *name,
void *buffer, size_t size);
 int btrfs_setxattr(struct btrfs_trans_handle *trans,
struct inode *inode, const char *name,
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4] Minor xattr handler cleanups

2018-02-27 Thread David Sterba
Renames, declearation updates.

David Sterba (4):
  btrfs: drop underscores from exported xattr functions
  btrfs: drop extern from function declarations
  btrfs: adjust return type of btrfs_getxattr
  btrfs: move btrfs_listxattr prototype to xattr.h

 fs/btrfs/acl.c   |  6 +++---
 fs/btrfs/ctree.h |  3 ---
 fs/btrfs/extent_io.h |  4 ++--
 fs/btrfs/props.c |  6 +++---
 fs/btrfs/xattr.c | 12 ++--
 fs/btrfs/xattr.h |  7 ---
 6 files changed, 18 insertions(+), 20 deletions(-)

-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] btrfs: drop underscores from exported xattr functions

2018-02-27 Thread David Sterba
Signed-off-by: David Sterba 
---
 fs/btrfs/acl.c   |  6 +++---
 fs/btrfs/props.c |  6 +++---
 fs/btrfs/xattr.c | 12 ++--
 fs/btrfs/xattr.h |  4 ++--
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c
index 1ba49ebe67da..f8a1bdf06b2a 100644
--- a/fs/btrfs/acl.c
+++ b/fs/btrfs/acl.c
@@ -46,12 +46,12 @@ struct posix_acl *btrfs_get_acl(struct inode *inode, int 
type)
BUG();
}
 
-   size = __btrfs_getxattr(inode, name, "", 0);
+   size = btrfs_getxattr(inode, name, "", 0);
if (size > 0) {
value = kzalloc(size, GFP_KERNEL);
if (!value)
return ERR_PTR(-ENOMEM);
-   size = __btrfs_getxattr(inode, name, value, size);
+   size = btrfs_getxattr(inode, name, value, size);
}
if (size > 0) {
acl = posix_acl_from_xattr(_user_ns, value, size);
@@ -101,7 +101,7 @@ static int __btrfs_set_acl(struct btrfs_trans_handle *trans,
goto out;
}
 
-   ret = __btrfs_setxattr(trans, inode, name, value, size, 0);
+   ret = btrfs_setxattr(trans, inode, name, value, size, 0);
 out:
kfree(value);
 
diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
index b30a056963ab..6aa95fad33f3 100644
--- a/fs/btrfs/props.c
+++ b/fs/btrfs/props.c
@@ -116,7 +116,7 @@ static int __btrfs_set_prop(struct btrfs_trans_handle 
*trans,
return -EINVAL;
 
if (value_len == 0) {
-   ret = __btrfs_setxattr(trans, inode, handler->xattr_name,
+   ret = btrfs_setxattr(trans, inode, handler->xattr_name,
   NULL, 0, flags);
if (ret)
return ret;
@@ -130,13 +130,13 @@ static int __btrfs_set_prop(struct btrfs_trans_handle 
*trans,
ret = handler->validate(value, value_len);
if (ret)
return ret;
-   ret = __btrfs_setxattr(trans, inode, handler->xattr_name,
+   ret = btrfs_setxattr(trans, inode, handler->xattr_name,
   value, value_len, flags);
if (ret)
return ret;
ret = handler->apply(inode, value, value_len);
if (ret) {
-   __btrfs_setxattr(trans, inode, handler->xattr_name,
+   btrfs_setxattr(trans, inode, handler->xattr_name,
 NULL, 0, flags);
return ret;
}
diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c
index de7d072c78ef..414ccbe26c71 100644
--- a/fs/btrfs/xattr.c
+++ b/fs/btrfs/xattr.c
@@ -33,7 +33,7 @@
 #include "locking.h"
 
 
-ssize_t __btrfs_getxattr(struct inode *inode, const char *name,
+ssize_t btrfs_getxattr(struct inode *inode, const char *name,
void *buffer, size_t size)
 {
struct btrfs_dir_item *di;
@@ -233,7 +233,7 @@ static int do_setxattr(struct btrfs_trans_handle *trans,
 /*
  * @value: "" makes the attribute to empty, NULL removes it
  */
-int __btrfs_setxattr(struct btrfs_trans_handle *trans,
+int btrfs_setxattr(struct btrfs_trans_handle *trans,
 struct inode *inode, const char *name,
 const void *value, size_t size, int flags)
 {
@@ -374,7 +374,7 @@ static int btrfs_xattr_handler_get(const struct 
xattr_handler *handler,
   const char *name, void *buffer, size_t size)
 {
name = xattr_full_name(handler, name);
-   return __btrfs_getxattr(inode, name, buffer, size);
+   return btrfs_getxattr(inode, name, buffer, size);
 }
 
 static int btrfs_xattr_handler_set(const struct xattr_handler *handler,
@@ -383,7 +383,7 @@ static int btrfs_xattr_handler_set(const struct 
xattr_handler *handler,
   size_t size, int flags)
 {
name = xattr_full_name(handler, name);
-   return __btrfs_setxattr(NULL, inode, name, buffer, size, flags);
+   return btrfs_setxattr(NULL, inode, name, buffer, size, flags);
 }
 
 static int btrfs_xattr_handler_set_prop(const struct xattr_handler *handler,
@@ -448,8 +448,8 @@ static int btrfs_initxattrs(struct inode *inode,
}
strcpy(name, XATTR_SECURITY_PREFIX);
strcpy(name + XATTR_SECURITY_PREFIX_LEN, xattr->name);
-   err = __btrfs_setxattr(trans, inode, name,
-  xattr->value, xattr->value_len, 0);
+   err = btrfs_setxattr(trans, inode, name, xattr->value,
+   xattr->value_len, 0);
kfree(name);
if (err < 0)
break;
diff --git a/fs/btrfs/xattr.h b/fs/btrfs/xattr.h
index 15fc4743dc70..3c227c549dfc 100644
--- a/fs/btrfs/xattr.h
+++ b/fs/btrfs/xattr.h
@@ -23,9 +23,9 @@
 
 extern const struct xattr_handler *btrfs_xattr_handlers[];
 
-extern ssize_t __btrfs_getxattr(struct inode *inode, const char *name,

[PATCH 4/4] btrfs: move btrfs_listxattr prototype to xattr.h

2018-02-27 Thread David Sterba
There's a proper header for xattr handlers.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.h | 3 ---
 fs/btrfs/xattr.h | 1 +
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1a462ab85c49..736ba2805d03 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3282,9 +3282,6 @@ void btrfs_exit_sysfs(void);
 int btrfs_sysfs_add_mounted(struct btrfs_fs_info *fs_info);
 void btrfs_sysfs_remove_mounted(struct btrfs_fs_info *fs_info);
 
-/* xattr.c */
-ssize_t btrfs_listxattr(struct dentry *dentry, char *buffer, size_t size);
-
 /* super.c */
 int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
unsigned long new_flags);
diff --git a/fs/btrfs/xattr.h b/fs/btrfs/xattr.h
index 57c638730617..e215a3212a2a 100644
--- a/fs/btrfs/xattr.h
+++ b/fs/btrfs/xattr.h
@@ -28,6 +28,7 @@ int btrfs_getxattr(struct inode *inode, const char *name,
 int btrfs_setxattr(struct btrfs_trans_handle *trans,
struct inode *inode, const char *name,
const void *value, size_t size, int flags);
+ssize_t btrfs_listxattr(struct dentry *dentry, char *buffer, size_t size);
 
 int btrfs_xattr_security_init(struct btrfs_trans_handle *trans,
 struct inode *inode, struct inode *dir,
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs space used issue

2018-02-27 Thread Austin S. Hemmelgarn

On 2018-02-27 08:09, vinayak hegde wrote:

I am using btrfs, But I am seeing du -sh and df -h showing huge size
difference on ssd.

mount:
/dev/drbd1 on /dc/fileunifier.datacache type btrfs
(rw,noatime,nodiratime,flushoncommit,discard,nospace_cache,recovery,commit=5,subvolid=5,subvol=/)


du -sh /dc/fileunifier.datacache/ -  331G

df -h
/dev/drbd1  746G  346G  398G  47% /dc/fileunifier.datacache

btrfs fi usage /dc/fileunifier.datacache/
Overall:
 Device size: 745.19GiB
 Device allocated: 368.06GiB
 Device unallocated: 377.13GiB
 Device missing: 0.00B
 Used: 346.73GiB
 Free (estimated): 396.36GiB(min: 207.80GiB)
 Data ratio:  1.00
 Metadata ratio:  2.00
 Global reserve: 176.00MiB(used: 0.00B)

Data,single: Size:365.00GiB, Used:345.76GiB
/dev/drbd1 365.00GiB

Metadata,DUP: Size:1.50GiB, Used:493.23MiB
/dev/drbd1   3.00GiB

System,DUP: Size:32.00MiB, Used:80.00KiB
/dev/drbd1  64.00MiB

Unallocated:
/dev/drbd1 377.13GiB


Even if we consider 6G metadata its 331+6 = 337.
where is 9GB used?

Please explain.
First, you're counting the metadata wrong.  The value shown per-device 
by `btrfs filesystem usage` already accounts for replication (so it's 
only 3 GB of metadata allocated, not 6 GB).  Neither `df` nor `du` looks 
at the chunk level allocations though.


Now, with that out of the way, the discrepancy almost certainly comes 
form differences in how `df` and `du` calculate space usage.  In 
particular, `df` calls statvfs and looks at the f_blocks and f_bfree 
values to compute space usage, while `du` walks the filesystem tree 
calling stat on everything and looking at st_blksize and st_blocks (or 
instead at st_size if you pass in `--apparent-size` as an option).  This 
leads to a couple of differences in what they will count:


1. `du` may or may not properly count hardlinks, sparse files, and 
transparently compressed data, dependent on whether or not you use 
`--apparent-sizes` (by default, it does properly count all of those), 
while `df` will always account for those properly.
2. `du` does not properly account for reflinked blocks (from 
deduplication, snapshots, or use of the CLONE ioctl), and will count 
each reflink of every block as part of the total size, while `df` will 
always count each block exactly once no matter how many reflinks it has.
3. `du` does not account for all of the BTRFS metadata allocations, 
functionally ignoring space allocated for anything but inline data, 
while `df` accounts for all BTRFS metadata properly.
4. `du` will recurse into other filesystems if you don't pass the `-x` 
option to it, while `df` will only report for each filesystem separately.
5. `du` will only count data usage under the given mount point, and 
won't account for data on other subvolumes that may be mounted elsewhere 
(and if you pass in `-x` won't count data on other subvolumes located 
under the given path either), while `df` will count all the data in all 
subvolumes.
6. There are a couple of other differences too, but they're rather 
complex and dependent on the internals of BTRFS.


In your case, I think the issue is probably one of the various things 
under item 6.  Items 1, 2 and 4 will cause `du` to report more space 
usage than `df`, item 3 is irrelevant because `du` shows less space than 
the total data chunk usage reported by `btrfs filesystem usage`, and 
item 5 is irrelevant because you're mounting the root subvolume and not 
using the `-x` option on `du` (and therefore there can't be other 
subvolumes you're missing).


Try running a full defrag of the given mount point.  If what I think is 
causing this is in fact the issue, that should bring the numbers back 
in-line with each other.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs space used issue

2018-02-27 Thread vinayak hegde
I am using btrfs, But I am seeing du -sh and df -h showing huge size
difference on ssd.

mount:
/dev/drbd1 on /dc/fileunifier.datacache type btrfs
(rw,noatime,nodiratime,flushoncommit,discard,nospace_cache,recovery,commit=5,subvolid=5,subvol=/)


du -sh /dc/fileunifier.datacache/ -  331G

df -h
/dev/drbd1  746G  346G  398G  47% /dc/fileunifier.datacache

btrfs fi usage /dc/fileunifier.datacache/
Overall:
Device size: 745.19GiB
Device allocated: 368.06GiB
Device unallocated: 377.13GiB
Device missing: 0.00B
Used: 346.73GiB
Free (estimated): 396.36GiB(min: 207.80GiB)
Data ratio:  1.00
Metadata ratio:  2.00
Global reserve: 176.00MiB(used: 0.00B)

Data,single: Size:365.00GiB, Used:345.76GiB
   /dev/drbd1 365.00GiB

Metadata,DUP: Size:1.50GiB, Used:493.23MiB
   /dev/drbd1   3.00GiB

System,DUP: Size:32.00MiB, Used:80.00KiB
   /dev/drbd1  64.00MiB

Unallocated:
   /dev/drbd1 377.13GiB


Even if we consider 6G metadata its 331+6 = 337.
where is 9GB used?

Please explain.

Vinayak
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] btrfs-progs: check: Check data csum for all copies

2018-02-27 Thread Qu Wenruo


On 2018年02月27日 19:09, Nikolay Borisov wrote:
> 
> 
> On 27.02.2018 12:31, Qu Wenruo wrote:
>>
>>
>> On 2018年02月27日 18:01, Su Yue wrote:
>>>
>>>
>>> On 02/27/2018 05:12 PM, Qu Wenruo wrote:
 Original --check-data-csum option will skip the extra copy if the first
 copy matches csum.

 Since offline scrub (with recoverability report) is still out-of-tree, at
 least enhance --check-data-csum option to handle multiple copies.

 Signed-off-by: Qu Wenruo 
 ---
   check/main.c | 65
 ++--
   1 file changed, 32 insertions(+), 33 deletions(-)

 diff --git a/check/main.c b/check/main.c
 index 97baae583f04..f25fdc765d63 100644
 --- a/check/main.c
 +++ b/check/main.c
 @@ -5381,42 +5381,37 @@ static int check_extent_csums(struct
 btrfs_root *root, u64 bytenr,
   if (!data)
   return -ENOMEM;
   +    num_copies = btrfs_num_copies(root->fs_info, bytenr, num_bytes);
   while (offset < num_bytes) {
 -    mirror = 0;
 -again:
 -    read_len = num_bytes - offset;
 -    /* read as much space once a time */
 -    ret = read_extent_data(fs_info, data + offset,
 -    bytenr + offset, _len, mirror);
 -    if (ret)
 -    goto out;
 -    data_checked = 0;
 -    /* verify every 4k data's checksum */
 -    while (data_checked < read_len) {
 -    csum = ~(u32)0;
 -    tmp = offset + data_checked;
 -
 -    csum = btrfs_csum_data((char *)data + tmp,
 -   csum, fs_info->sectorsize);
 -    btrfs_csum_final(csum, (u8 *));
 -
 -    csum_offset = leaf_offset +
 - tmp / fs_info->sectorsize * csum_size;
 -    read_extent_buffer(eb, (char *)_expected,
 -   csum_offset, csum_size);
 -    /* try another mirror */
 -    if (csum != csum_expected) {
 -    fprintf(stderr, "mirror %d bytenr %llu csum %u
 expected csum %u\n",
 +    for (mirror = 1; mirror <= num_copies; mirror++) {
>>>
>>> Got your point.
>>> But what confuses me is that why mirror starts from 1 here.
>>> The mirror influences btrfs_map_block() which is not related
>>> to this patch though.
>>
>> mirror number has different meanings in fact.
>>
>> For 0, it means try to read *ANY* valid copy, it can be the copy of a
>> duplication, or even rebuilt data from RAID5/6.
>>
>> For 1, it means the fist copy. Either the only copy of
>> SINGLE/RAID0/RAID5/6, or the first copy of RAID1/10.
>>
>> For 2, it means the 2nd copy, it can be real copy for RAID1/RAID10, or
>> rebuilt data from RAID5/6.
>>
>> And in fact Liu Bo is adding new copy numbers for RAID5/6 to allow us to
>> specify how to build the missing data for RAID6.
>>
>> So here starting from mirror 1 is the correct behavior.
> 
> Shouldn't those be documented?

Makes sense.

I'll add such comment for both kernel and btrfs-progs.

Thanks,
Quuu

> 
>>
>> Thanks,
>> Qu
>>>
>>> Thanks,
>>> Su
 +    read_len = num_bytes - offset;
 +    /* read as much space once a time */
 +    ret = read_extent_data(fs_info, data + offset,
 +    bytenr + offset, _len, mirror);
 +    if (ret)
 +    goto out;
 +
 +    data_checked = 0;
 +    /* verify every 4k data's checksum */
 +    while (data_checked < read_len) {
 +    csum = ~(u32)0;
 +    tmp = offset + data_checked;
 +
 +    csum = btrfs_csum_data((char *)data + tmp,
 +    csum, fs_info->sectorsize);
 +    btrfs_csum_final(csum, (u8 *));
 +
 +    csum_offset = leaf_offset +
 + tmp / fs_info->sectorsize * csum_size;
 +    read_extent_buffer(eb, (char *)_expected,
 +   csum_offset, csum_size);
 +    if (csum != csum_expected)
 +    fprintf(stderr,
 +    "mirror %d bytenr %llu csum %u expected csum %u\n",
   mirror, bytenr + tmp,
   csum, csum_expected);
 -    num_copies = btrfs_num_copies(root->fs_info,
 -    bytenr, num_bytes);
 -    if (mirror < num_copies - 1) {
 -    mirror += 1;
 -    goto again;
 -    }
 +    data_checked += fs_info->sectorsize;
   }
 -    data_checked += fs_info->sectorsize;
   }
   offset += read_len;
   }
 @@ -5624,7 +5619,11 @@ static int check_csums(struct btrfs_root *root)
   leaf_offset = 

Re: [PATCH 1/4] btrfs-progs: check: Check data csum for all copies

2018-02-27 Thread Nikolay Borisov


On 27.02.2018 12:31, Qu Wenruo wrote:
> 
> 
> On 2018年02月27日 18:01, Su Yue wrote:
>>
>>
>> On 02/27/2018 05:12 PM, Qu Wenruo wrote:
>>> Original --check-data-csum option will skip the extra copy if the first
>>> copy matches csum.
>>>
>>> Since offline scrub (with recoverability report) is still out-of-tree, at
>>> least enhance --check-data-csum option to handle multiple copies.
>>>
>>> Signed-off-by: Qu Wenruo 
>>> ---
>>>   check/main.c | 65
>>> ++--
>>>   1 file changed, 32 insertions(+), 33 deletions(-)
>>>
>>> diff --git a/check/main.c b/check/main.c
>>> index 97baae583f04..f25fdc765d63 100644
>>> --- a/check/main.c
>>> +++ b/check/main.c
>>> @@ -5381,42 +5381,37 @@ static int check_extent_csums(struct
>>> btrfs_root *root, u64 bytenr,
>>>   if (!data)
>>>   return -ENOMEM;
>>>   +    num_copies = btrfs_num_copies(root->fs_info, bytenr, num_bytes);
>>>   while (offset < num_bytes) {
>>> -    mirror = 0;
>>> -again:
>>> -    read_len = num_bytes - offset;
>>> -    /* read as much space once a time */
>>> -    ret = read_extent_data(fs_info, data + offset,
>>> -    bytenr + offset, _len, mirror);
>>> -    if (ret)
>>> -    goto out;
>>> -    data_checked = 0;
>>> -    /* verify every 4k data's checksum */
>>> -    while (data_checked < read_len) {
>>> -    csum = ~(u32)0;
>>> -    tmp = offset + data_checked;
>>> -
>>> -    csum = btrfs_csum_data((char *)data + tmp,
>>> -   csum, fs_info->sectorsize);
>>> -    btrfs_csum_final(csum, (u8 *));
>>> -
>>> -    csum_offset = leaf_offset +
>>> - tmp / fs_info->sectorsize * csum_size;
>>> -    read_extent_buffer(eb, (char *)_expected,
>>> -   csum_offset, csum_size);
>>> -    /* try another mirror */
>>> -    if (csum != csum_expected) {
>>> -    fprintf(stderr, "mirror %d bytenr %llu csum %u
>>> expected csum %u\n",
>>> +    for (mirror = 1; mirror <= num_copies; mirror++) {
>>
>> Got your point.
>> But what confuses me is that why mirror starts from 1 here.
>> The mirror influences btrfs_map_block() which is not related
>> to this patch though.
> 
> mirror number has different meanings in fact.
> 
> For 0, it means try to read *ANY* valid copy, it can be the copy of a
> duplication, or even rebuilt data from RAID5/6.
> 
> For 1, it means the fist copy. Either the only copy of
> SINGLE/RAID0/RAID5/6, or the first copy of RAID1/10.
> 
> For 2, it means the 2nd copy, it can be real copy for RAID1/RAID10, or
> rebuilt data from RAID5/6.
> 
> And in fact Liu Bo is adding new copy numbers for RAID5/6 to allow us to
> specify how to build the missing data for RAID6.
> 
> So here starting from mirror 1 is the correct behavior.

Shouldn't those be documented?

> 
> Thanks,
> Qu
>>
>> Thanks,
>> Su
>>> +    read_len = num_bytes - offset;
>>> +    /* read as much space once a time */
>>> +    ret = read_extent_data(fs_info, data + offset,
>>> +    bytenr + offset, _len, mirror);
>>> +    if (ret)
>>> +    goto out;
>>> +
>>> +    data_checked = 0;
>>> +    /* verify every 4k data's checksum */
>>> +    while (data_checked < read_len) {
>>> +    csum = ~(u32)0;
>>> +    tmp = offset + data_checked;
>>> +
>>> +    csum = btrfs_csum_data((char *)data + tmp,
>>> +    csum, fs_info->sectorsize);
>>> +    btrfs_csum_final(csum, (u8 *));
>>> +
>>> +    csum_offset = leaf_offset +
>>> + tmp / fs_info->sectorsize * csum_size;
>>> +    read_extent_buffer(eb, (char *)_expected,
>>> +   csum_offset, csum_size);
>>> +    if (csum != csum_expected)
>>> +    fprintf(stderr,
>>> +    "mirror %d bytenr %llu csum %u expected csum %u\n",
>>>   mirror, bytenr + tmp,
>>>   csum, csum_expected);
>>> -    num_copies = btrfs_num_copies(root->fs_info,
>>> -    bytenr, num_bytes);
>>> -    if (mirror < num_copies - 1) {
>>> -    mirror += 1;
>>> -    goto again;
>>> -    }
>>> +    data_checked += fs_info->sectorsize;
>>>   }
>>> -    data_checked += fs_info->sectorsize;
>>>   }
>>>   offset += read_len;
>>>   }
>>> @@ -5624,7 +5619,11 @@ static int check_csums(struct btrfs_root *root)
>>>   leaf_offset = btrfs_item_ptr_offset(leaf, path.slots[0]);
>>>   ret = check_extent_csums(root, key.offset, data_len,
>>>    leaf_offset, leaf);
>>> -    if (ret)
>>> +    /*
>>> + * Only break for fatal errors, if mismatch is found,
>>> +  

Re: [PATCH 1/4] btrfs-progs: check: Check data csum for all copies

2018-02-27 Thread Qu Wenruo


On 2018年02月27日 18:01, Su Yue wrote:
> 
> 
> On 02/27/2018 05:12 PM, Qu Wenruo wrote:
>> Original --check-data-csum option will skip the extra copy if the first
>> copy matches csum.
>>
>> Since offline scrub (with recoverability report) is still out-of-tree, at
>> least enhance --check-data-csum option to handle multiple copies.
>>
>> Signed-off-by: Qu Wenruo 
>> ---
>>   check/main.c | 65
>> ++--
>>   1 file changed, 32 insertions(+), 33 deletions(-)
>>
>> diff --git a/check/main.c b/check/main.c
>> index 97baae583f04..f25fdc765d63 100644
>> --- a/check/main.c
>> +++ b/check/main.c
>> @@ -5381,42 +5381,37 @@ static int check_extent_csums(struct
>> btrfs_root *root, u64 bytenr,
>>   if (!data)
>>   return -ENOMEM;
>>   +    num_copies = btrfs_num_copies(root->fs_info, bytenr, num_bytes);
>>   while (offset < num_bytes) {
>> -    mirror = 0;
>> -again:
>> -    read_len = num_bytes - offset;
>> -    /* read as much space once a time */
>> -    ret = read_extent_data(fs_info, data + offset,
>> -    bytenr + offset, _len, mirror);
>> -    if (ret)
>> -    goto out;
>> -    data_checked = 0;
>> -    /* verify every 4k data's checksum */
>> -    while (data_checked < read_len) {
>> -    csum = ~(u32)0;
>> -    tmp = offset + data_checked;
>> -
>> -    csum = btrfs_csum_data((char *)data + tmp,
>> -   csum, fs_info->sectorsize);
>> -    btrfs_csum_final(csum, (u8 *));
>> -
>> -    csum_offset = leaf_offset +
>> - tmp / fs_info->sectorsize * csum_size;
>> -    read_extent_buffer(eb, (char *)_expected,
>> -   csum_offset, csum_size);
>> -    /* try another mirror */
>> -    if (csum != csum_expected) {
>> -    fprintf(stderr, "mirror %d bytenr %llu csum %u
>> expected csum %u\n",
>> +    for (mirror = 1; mirror <= num_copies; mirror++) {
> 
> Got your point.
> But what confuses me is that why mirror starts from 1 here.
> The mirror influences btrfs_map_block() which is not related
> to this patch though.

mirror number has different meanings in fact.

For 0, it means try to read *ANY* valid copy, it can be the copy of a
duplication, or even rebuilt data from RAID5/6.

For 1, it means the fist copy. Either the only copy of
SINGLE/RAID0/RAID5/6, or the first copy of RAID1/10.

For 2, it means the 2nd copy, it can be real copy for RAID1/RAID10, or
rebuilt data from RAID5/6.

And in fact Liu Bo is adding new copy numbers for RAID5/6 to allow us to
specify how to build the missing data for RAID6.

So here starting from mirror 1 is the correct behavior.

Thanks,
Qu
> 
> Thanks,
> Su
>> +    read_len = num_bytes - offset;
>> +    /* read as much space once a time */
>> +    ret = read_extent_data(fs_info, data + offset,
>> +    bytenr + offset, _len, mirror);
>> +    if (ret)
>> +    goto out;
>> +
>> +    data_checked = 0;
>> +    /* verify every 4k data's checksum */
>> +    while (data_checked < read_len) {
>> +    csum = ~(u32)0;
>> +    tmp = offset + data_checked;
>> +
>> +    csum = btrfs_csum_data((char *)data + tmp,
>> +    csum, fs_info->sectorsize);
>> +    btrfs_csum_final(csum, (u8 *));
>> +
>> +    csum_offset = leaf_offset +
>> + tmp / fs_info->sectorsize * csum_size;
>> +    read_extent_buffer(eb, (char *)_expected,
>> +   csum_offset, csum_size);
>> +    if (csum != csum_expected)
>> +    fprintf(stderr,
>> +    "mirror %d bytenr %llu csum %u expected csum %u\n",
>>   mirror, bytenr + tmp,
>>   csum, csum_expected);
>> -    num_copies = btrfs_num_copies(root->fs_info,
>> -    bytenr, num_bytes);
>> -    if (mirror < num_copies - 1) {
>> -    mirror += 1;
>> -    goto again;
>> -    }
>> +    data_checked += fs_info->sectorsize;
>>   }
>> -    data_checked += fs_info->sectorsize;
>>   }
>>   offset += read_len;
>>   }
>> @@ -5624,7 +5619,11 @@ static int check_csums(struct btrfs_root *root)
>>   leaf_offset = btrfs_item_ptr_offset(leaf, path.slots[0]);
>>   ret = check_extent_csums(root, key.offset, data_len,
>>    leaf_offset, leaf);
>> -    if (ret)
>> +    /*
>> + * Only break for fatal errors, if mismatch is found,
>> + * continue checking until all extents are checked.
>> + */
>> +    if (ret < 0)
>>   break;
>>   skip_csum_check:
>>   if (!num_bytes) {
>>
> 
> 
> -- 
> To unsubscribe from this 

[PATCH] btrfs-progs: check/lowmem: Fix the incorrect error message of check_extent_data_item

2018-02-27 Thread Lu Fengqi
Instead of the disk_bytenr and disk_num_bytes of the extent_item which the
file extent references, we should output the objectid and offset of the
file extent.

Fixes: b0d360b541f0 ("btrfs-progs: check: introduce function to check data 
backref in extent tree")
Signed-off-by: Lu Fengqi 
---
 check/mode-lowmem.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 62bcf3d2e126..07863a9312ce 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -2631,9 +2631,8 @@ static int check_extent_data_item(struct btrfs_root *root,
 
if (!(extent_flags & BTRFS_EXTENT_FLAG_DATA)) {
error(
-   "extent[%llu %llu] backref type mismatch, wanted bit: %llx",
-   disk_bytenr, disk_num_bytes,
-   BTRFS_EXTENT_FLAG_DATA);
+   "file extent[%llu %llu] backref type mismatch, wanted bit: %llx",
+   fi_key.objectid, fi_key.offset, BTRFS_EXTENT_FLAG_DATA);
err |= BACKREF_MISMATCH;
}
 
@@ -2722,8 +2721,8 @@ out:
err |= BACKREF_MISSING;
btrfs_release_path();
if (err & BACKREF_MISSING) {
-   error("data extent[%llu %llu] backref lost",
- disk_bytenr, disk_num_bytes);
+   error("file extent[%llu %llu] backref lost",
+   fi_key.objectid, fi_key.offset);
}
return err;
 }
-- 
2.16.2



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] btrfs-progs: check: Check data csum for all copies

2018-02-27 Thread Su Yue



On 02/27/2018 05:12 PM, Qu Wenruo wrote:

Original --check-data-csum option will skip the extra copy if the first
copy matches csum.

Since offline scrub (with recoverability report) is still out-of-tree, at
least enhance --check-data-csum option to handle multiple copies.

Signed-off-by: Qu Wenruo 
---
  check/main.c | 65 ++--
  1 file changed, 32 insertions(+), 33 deletions(-)

diff --git a/check/main.c b/check/main.c
index 97baae583f04..f25fdc765d63 100644
--- a/check/main.c
+++ b/check/main.c
@@ -5381,42 +5381,37 @@ static int check_extent_csums(struct btrfs_root *root, 
u64 bytenr,
if (!data)
return -ENOMEM;
  
+	num_copies = btrfs_num_copies(root->fs_info, bytenr, num_bytes);

while (offset < num_bytes) {
-   mirror = 0;
-again:
-   read_len = num_bytes - offset;
-   /* read as much space once a time */
-   ret = read_extent_data(fs_info, data + offset,
-   bytenr + offset, _len, mirror);
-   if (ret)
-   goto out;
-   data_checked = 0;
-   /* verify every 4k data's checksum */
-   while (data_checked < read_len) {
-   csum = ~(u32)0;
-   tmp = offset + data_checked;
-
-   csum = btrfs_csum_data((char *)data + tmp,
-  csum, fs_info->sectorsize);
-   btrfs_csum_final(csum, (u8 *));
-
-   csum_offset = leaf_offset +
-tmp / fs_info->sectorsize * csum_size;
-   read_extent_buffer(eb, (char *)_expected,
-  csum_offset, csum_size);
-   /* try another mirror */
-   if (csum != csum_expected) {
-   fprintf(stderr, "mirror %d bytenr %llu csum %u 
expected csum %u\n",
+   for (mirror = 1; mirror <= num_copies; mirror++) {


Got your point.
But what confuses me is that why mirror starts from 1 here.
The mirror influences btrfs_map_block() which is not related
to this patch though.

Thanks,
Su

+   read_len = num_bytes - offset;
+   /* read as much space once a time */
+   ret = read_extent_data(fs_info, data + offset,
+   bytenr + offset, _len, mirror);
+   if (ret)
+   goto out;
+
+   data_checked = 0;
+   /* verify every 4k data's checksum */
+   while (data_checked < read_len) {
+   csum = ~(u32)0;
+   tmp = offset + data_checked;
+
+   csum = btrfs_csum_data((char *)data + tmp,
+   csum, fs_info->sectorsize);
+   btrfs_csum_final(csum, (u8 *));
+
+   csum_offset = leaf_offset +
+tmp / fs_info->sectorsize * csum_size;
+   read_extent_buffer(eb, (char *)_expected,
+  csum_offset, csum_size);
+   if (csum != csum_expected)
+   fprintf(stderr,
+   "mirror %d bytenr %llu csum %u expected csum %u\n",
mirror, bytenr + tmp,
csum, csum_expected);
-   num_copies = btrfs_num_copies(root->fs_info,
-   bytenr, num_bytes);
-   if (mirror < num_copies - 1) {
-   mirror += 1;
-   goto again;
-   }
+   data_checked += fs_info->sectorsize;
}
-   data_checked += fs_info->sectorsize;
}
offset += read_len;
}
@@ -5624,7 +5619,11 @@ static int check_csums(struct btrfs_root *root)
leaf_offset = btrfs_item_ptr_offset(leaf, path.slots[0]);
ret = check_extent_csums(root, key.offset, data_len,
 leaf_offset, leaf);
-   if (ret)
+   /*
+* Only break for fatal errors, if mismatch is found,
+* continue checking until all extents are checked.
+*/
+   if (ret < 0)
break;
  skip_csum_check:
if (!num_bytes) {




--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to 

[PATCH 1/4] btrfs-progs: check: Check data csum for all copies

2018-02-27 Thread Qu Wenruo
Original --check-data-csum option will skip the extra copy if the first
copy matches csum.

Since offline scrub (with recoverability report) is still out-of-tree, at
least enhance --check-data-csum option to handle multiple copies.

Signed-off-by: Qu Wenruo 
---
 check/main.c | 65 ++--
 1 file changed, 32 insertions(+), 33 deletions(-)

diff --git a/check/main.c b/check/main.c
index 97baae583f04..f25fdc765d63 100644
--- a/check/main.c
+++ b/check/main.c
@@ -5381,42 +5381,37 @@ static int check_extent_csums(struct btrfs_root *root, 
u64 bytenr,
if (!data)
return -ENOMEM;
 
+   num_copies = btrfs_num_copies(root->fs_info, bytenr, num_bytes);
while (offset < num_bytes) {
-   mirror = 0;
-again:
-   read_len = num_bytes - offset;
-   /* read as much space once a time */
-   ret = read_extent_data(fs_info, data + offset,
-   bytenr + offset, _len, mirror);
-   if (ret)
-   goto out;
-   data_checked = 0;
-   /* verify every 4k data's checksum */
-   while (data_checked < read_len) {
-   csum = ~(u32)0;
-   tmp = offset + data_checked;
-
-   csum = btrfs_csum_data((char *)data + tmp,
-  csum, fs_info->sectorsize);
-   btrfs_csum_final(csum, (u8 *));
-
-   csum_offset = leaf_offset +
-tmp / fs_info->sectorsize * csum_size;
-   read_extent_buffer(eb, (char *)_expected,
-  csum_offset, csum_size);
-   /* try another mirror */
-   if (csum != csum_expected) {
-   fprintf(stderr, "mirror %d bytenr %llu csum %u 
expected csum %u\n",
+   for (mirror = 1; mirror <= num_copies; mirror++) {
+   read_len = num_bytes - offset;
+   /* read as much space once a time */
+   ret = read_extent_data(fs_info, data + offset,
+   bytenr + offset, _len, mirror);
+   if (ret)
+   goto out;
+
+   data_checked = 0;
+   /* verify every 4k data's checksum */
+   while (data_checked < read_len) {
+   csum = ~(u32)0;
+   tmp = offset + data_checked;
+
+   csum = btrfs_csum_data((char *)data + tmp,
+   csum, fs_info->sectorsize);
+   btrfs_csum_final(csum, (u8 *));
+
+   csum_offset = leaf_offset +
+tmp / fs_info->sectorsize * csum_size;
+   read_extent_buffer(eb, (char *)_expected,
+  csum_offset, csum_size);
+   if (csum != csum_expected)
+   fprintf(stderr,
+   "mirror %d bytenr %llu csum %u expected csum %u\n",
mirror, bytenr + tmp,
csum, csum_expected);
-   num_copies = btrfs_num_copies(root->fs_info,
-   bytenr, num_bytes);
-   if (mirror < num_copies - 1) {
-   mirror += 1;
-   goto again;
-   }
+   data_checked += fs_info->sectorsize;
}
-   data_checked += fs_info->sectorsize;
}
offset += read_len;
}
@@ -5624,7 +5619,11 @@ static int check_csums(struct btrfs_root *root)
leaf_offset = btrfs_item_ptr_offset(leaf, path.slots[0]);
ret = check_extent_csums(root, key.offset, data_len,
 leaf_offset, leaf);
-   if (ret)
+   /*
+* Only break for fatal errors, if mismatch is found,
+* continue checking until all extents are checked.
+*/
+   if (ret < 0)
break;
 skip_csum_check:
if (!num_bytes) {
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] btrfs-progs: check: Distingusih csum checking output for --check-data-csum

2018-02-27 Thread Qu Wenruo
No matter --check-data-csum is passed or not, btrfs check will always
output message like "checking csum".

This message could be a little confusing, change it according to
--check-data-csum option.

Signed-off-by: Qu Wenruo 
---
 check/main.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/check/main.c b/check/main.c
index 22a78273be15..3f4a6bcac4df 100644
--- a/check/main.c
+++ b/check/main.c
@@ -9845,7 +9845,11 @@ int cmd_check(int argc, char **argv)
goto out;
}
 
-   fprintf(stderr, "checking csums\n");
+   if (check_data_csum)
+   fprintf(stderr, "checking csums against data\n");
+   else
+   fprintf(stderr,
+   "checking csum items alone (without verifying data)\n");
ret = check_csums(root);
/*
 * Data csum error is not fatal, and it may indicates more serious
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] btrfs-progs: check: Continue check even csum error is found

2018-02-27 Thread Qu Wenruo
Since data csum is not a fatal error compared to fs/extent trees,
continue check.

Signed-off-by: Qu Wenruo 
---
 check/main.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/check/main.c b/check/main.c
index 15b3c402c9f5..22a78273be15 100644
--- a/check/main.c
+++ b/check/main.c
@@ -9847,11 +9847,13 @@ int cmd_check(int argc, char **argv)
 
fprintf(stderr, "checking csums\n");
ret = check_csums(root);
-   err |= !!ret;
-   if (ret) {
+   /*
+* Data csum error is not fatal, and it may indicates more serious
+* corruption, continue checking.
+*/
+   if (ret)
error("errors found in csum tree");
-   goto out;
-   }
+   err |= !!ret;
 
fprintf(stderr, "checking root refs\n");
/* For low memory mode, check_fs_roots_v2 handles root refs */
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] btrfs-progs: check: Fix data csum check return value

2018-02-27 Thread Qu Wenruo
When --check-data-csum option found csum mismatch, btrfs check still
return 0.

Fix it so log-replay could automatically pause when found csum error.

Signed-off-by: Qu Wenruo 
---
 check/main.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/check/main.c b/check/main.c
index f25fdc765d63..15b3c402c9f5 100644
--- a/check/main.c
+++ b/check/main.c
@@ -5356,6 +5356,13 @@ static int check_space_cache(struct btrfs_root *root)
return error ? -EINVAL : 0;
 }
 
+/*
+ * Check data checksum for [@bytenr, @bytenr + @num_bytes).
+ *
+ * Return <0 for fatal error (fails to read checksum/data or allocate memory).
+ * Return >0 for csum mismatch for *ANY* copy.
+ * Return 0 if everything is OK.
+ */
 static int check_extent_csums(struct btrfs_root *root, u64 bytenr,
u64 num_bytes, unsigned long leaf_offset,
struct extent_buffer *eb)
@@ -5373,6 +5380,7 @@ static int check_extent_csums(struct btrfs_root *root, 
u64 bytenr,
int ret = 0;
int mirror;
int num_copies;
+   bool csum_mismatch = false;
 
if (num_bytes % fs_info->sectorsize)
return -EINVAL;
@@ -5405,11 +5413,13 @@ static int check_extent_csums(struct btrfs_root *root, 
u64 bytenr,
 tmp / fs_info->sectorsize * csum_size;
read_extent_buffer(eb, (char *)_expected,
   csum_offset, csum_size);
-   if (csum != csum_expected)
+   if (csum != csum_expected) {
+   csum_mismatch = true;
fprintf(stderr,
"mirror %d bytenr %llu csum %u expected csum %u\n",
mirror, bytenr + tmp,
csum, csum_expected);
+   }
data_checked += fs_info->sectorsize;
}
}
@@ -5417,6 +5427,8 @@ static int check_extent_csums(struct btrfs_root *root, 
u64 bytenr,
}
 out:
free(data);
+   if (!ret && csum_mismatch)
+   ret = 1;
return ret;
 }
 
@@ -5625,6 +5637,8 @@ static int check_csums(struct btrfs_root *root)
 */
if (ret < 0)
break;
+   if (ret > 0)
+   errors++;
 skip_csum_check:
if (!num_bytes) {
offset = key.offset;
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4] btrfs check --check-data-csum enhancement for

2018-02-27 Thread Qu Wenruo
Log-writes tool from Josef exposed btrfs data corruption.

Although still digging, at least enhance --check-data-csum option so
that log-replay --check can stop exactly at the point where btrfs data
get corrupted, instead of manually checking the output.

Qu Wenruo (4):
  btrfs-progs: check: Check data csum for all copies
  btrfs-progs: check: Fix data csum check return value
  btrfs-progs: check: Continue check even csum error is found
  btrfs-progs: check: Distingusih csum checking output for
--check-data-csum

 check/main.c | 93 
 1 file changed, 56 insertions(+), 37 deletions(-)

-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html