Re: [PATCH v5 8/8] btrfs: new ioctls to do logical-inode and inode-path resolving

2011-07-22 Thread Jan Schmidt
On 21.07.2011 22:14, Andi Kleen wrote:
 Jan Schmidt list.bt...@jan-o-sch.net writes:
 +
 +static long btrfs_ioctl_logical_to_ino(struct btrfs_root *root,
 +void __user *arg)
 +{
 +int ret = 0;
 +int size;
 +u64 extent_offset;
 +struct btrfs_ioctl_logical_ino_args *loi;
 +struct btrfs_data_container *inodes = NULL;
 +struct btrfs_path *path = NULL;
 +struct btrfs_key key;
 
 This really needs to be root-only for obvious reasons.
 The same for the ino_path function
 
 +
 +loi = memdup_user(arg, sizeof(*loi));
 +if (IS_ERR(loi)) {
 +ret = PTR_ERR(loi);
 +loi = NULL;
 +goto out;
 +}
 +
 +path = btrfs_alloc_path();
 +if (!path) {
 +ret = -ENOMEM;
 +goto out;
 +}
 +
 +size = min(loi->size, 4096);
 
 This is likely a root hole. loi->size is signed! Consider the case
 of a negative value being passed in.
 
 Same for the earlier function.

Sigh. Thanks for pointing these out. Shouldn't release code that was
fine for development without carefully reconsidering such things. I'll
send a v6.

-Jan
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
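
The signedness problem raised in the review above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the code from the patch: naive_clamp and checked_clamp are hypothetical names, and the 4096 limit is simply the value from the quoted line. It shows that taking the minimum of a signed size and a limit lets a negative value through unchanged, while rejecting non-positive sizes first closes that gap.

```c
#include <errno.h>

/* Mirrors min(size, limit) on a signed value: a negative size passes through. */
static int naive_clamp(int requested, int limit)
{
	return requested < limit ? requested : limit;
}

/*
 * Hypothetical hardened variant: refuse non-positive sizes before clamping.
 * Returns 0 and stores the clamped size in *out, or -EINVAL on bad input.
 */
static int checked_clamp(int requested, int limit, int *out)
{
	if (requested <= 0)
		return -EINVAL;
	*out = requested < limit ? requested : limit;
	return 0;
}
```

With these helpers, naive_clamp(-1, 4096) hands -1 on to whatever allocates or copies next, whereas checked_clamp(-1, 4096, &out) fails before any size is used.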


Re: Broken btrfs?

2011-07-22 Thread Jan Schmidt
On 21.07.2011 23:13, Jan Schubert wrote:
 On 07/18/2011 10:29 AM, Jan Schmidt wrote:
 If you are on a 3.0 kernel, get the most current version of btrfs
 tools from Hugo's integration-20110705 branch at
 http://git.darksatanic.net/repo/btrfs-progs-unstable.git/ and do a
 scrub. -Jan 
 
 Thx Jan, I did. This is the result:
 
 scrub status for 03201fc0-7695-4468-9a10-f61ad79f23ca
 scrub started at Thu Jul 21 22:27:31 2011 and finished after 787
 seconds
 total bytes scrubbed: 173.91GB with 2211 errors
 error details: csum=2211
 corrected errors: 0, uncorrectable errors: 2211
 
 Any help what to do now? Should I stick with this filesystem or create a
 new one?

Well, you won't be able to repair the broken files. You can create a new
filesystem. It is not guaranteed that this won't result in similar
problems, though. You might have built on a sandy hard drive.

 The good thing is, running 3.0 does not crash the system anymore while
 accessing corrupt data but just printing an I/O error.

Scrub should be printing inode numbers to your system log while
detecting those errors. If you want to know the exact files corrupted,
you can grab my patch set with subject "Btrfs scrub: print path to
corrupted files and trigger nodatasum fixup" from the list and give it a
try.

-Jan


Re: new metadata reader/writer locks in integration-test

2011-07-22 Thread Miao Xie
On  fri, 22 Jul 2011 12:06:40 +0800, Miao Xie wrote:
 On thu, 21 Jul 2011 20:53:24 -0400, Chris Mason wrote:
 Hi everyone,

 I just rebased Josef's enospc fixes into integration-test, it should fix
 the warnings in extent-tree.c


 Unfortunately, I got the following messages.


 Jul 21 09:41:22 luna kernel: ------------[ cut here ]------------
 Jul 21 09:41:22 luna kernel: WARNING: at fs/btrfs/extent-tree.c:5564 btrfs_alloc_reserved_file_extent+0xf8/0x100 [btrfs]()
 Jul 21 09:41:22 luna kernel: Hardware name: PRIMERGY
 Jul 21 09:41:22 luna kernel: Modules linked in: btrfs zlib_deflate crc32c libcrc32c autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom megaraid_sas floppy pata_acpi ata_generic ata_piix libata scsi_mod [last unloaded: microcode]
 Jul 21 09:41:22 luna kernel: Pid: 5517, comm: btrfs-endio-wri Tainted: G        W   2.6.39btrfs-tc1+ #1
 Jul 21 09:41:22 luna kernel: Call Trace:
 Jul 21 09:41:22 luna kernel: [8106004f] warn_slowpath_common+0x7f/0xc0
 Jul 21 09:41:22 luna kernel: [810600aa] warn_slowpath_null+0x1a/0x20
 Jul 21 09:41:22 luna kernel: [a044a068] btrfs_alloc_reserved_file_extent+0xf8/0x100 [btrfs]
 Jul 21 09:41:22 luna kernel: [a0464121] insert_reserved_file_extent.clone.0+0x201/0x270 [btrfs]
 Jul 21 09:41:22 luna kernel: [a0468c0b] btrfs_finish_ordered_io+0x2eb/0x360 [btrfs]
 Jul 21 09:41:22 luna kernel: [8106fe23] ? try_to_del_timer_sync+0x83/0xe0
 Jul 21 09:41:22 luna kernel: [a0468cd0] btrfs_writepage_end_io_hook+0x50/0xa0 [btrfs]
 Jul 21 09:41:22 luna kernel: [a049a3c6] end_compressed_bio_write+0x86/0xf0 [btrfs]
 Jul 21 09:41:22 luna kernel: [8117f96d] bio_endio+0x1d/0x40
 Jul 21 09:41:22 luna kernel: [a0459d84] end_workqueue_fn+0xf4/0x130 [btrfs]
 Jul 21 09:41:22 luna kernel: [a048841e] worker_loop+0x13e/0x540 [btrfs]
 Jul 21 09:41:22 luna kernel: [a04882e0] ? btrfs_queue_worker+0x2d0/0x2d0 [btrfs]
 Jul 21 09:41:22 luna kernel: [a04882e0] ? btrfs_queue_worker+0x2d0/0x2d0 [btrfs]
 Jul 21 09:41:22 luna kernel: [81081756] kthread+0x96/0xa0
 Jul 21 09:41:22 luna kernel: [81486004] kernel_thread_helper+0x4/0x10
 Jul 21 09:41:22 luna kernel: [810816c0] ? kthread_worker_fn+0x1a0/0x1a0
 Jul 21 09:41:22 luna kernel: [81486000] ? gs_change+0x13/0x13
 Jul 21 09:41:22 luna kernel: ---[ end trace 02c1fa3044677043 ]---


 a very similar warning here, but without compression involved:

 Ok, these are probably the enospc fixes.  Could you please try bisecting
 out some of Josef's patches?
 
 I did binary search and found the following patch led to this problem.
 
 commit 97ffc7d564f55787c7d9ea557d5d30d9ecb2f003
 Author: Josef Bacik <jo...@redhat.com>
 Date:   Fri Jul 15 18:29:11 2011 +
 
 Btrfs: don't be as agressive with delalloc metadata reservations
 
 Currently we reserve enough space to COW an entirely full btree for every ex
 we have reserved for an inode.  This _sucks_, because you only need to COW o
 and then everybody else is ok.  Unfortunately we don't know we'll all be abl
 get into the same transaction so that's what we have had to do.  But the glo
 reserve holds a reservation large enough to cover a large percentage of all 
 metadata currently in the fs.  So all we really need to account for is any n
 blocks that we may allocate.  So fix this by
   ……

Please ignore my analysis and patch; it cannot fix the problem.

 The reason is that the calculation of the reservation is wrong: the nodes in
 the search path may be split and new nodes may be created, but the above
 patch didn't reserve space for these new nodes.
 
 The following patch can fix it. Though my test passed, I still need Arne's
 verification to make sure it can fix all the reported problems.
 Arne, could you test it for me?
 
 Subject: [PATCH] Btrfs: fix wrong calculation of the reservation for the transaction
 
 At worst, Btrfs may split all the nodes in the search path, so we must take
 those new nodes into account when we calculate the space that needs to be
 reserved.
 
 Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
 ---
  fs/btrfs/ctree.h |8 +++-
  1 files changed, 7 insertions(+), 1 deletions(-)
 
 diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
 index d813a67..4f23819 100644
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -2133,10 +2133,16 @@ static inline bool btrfs_mixed_space_info(struct btrfs_space_info *space_info)
  }
  
  /* extent-tree.c */
 +/*
 + * This inline function is used to calc the size of new nodes/leaves that we
 + * may create. At worst, we may split all the nodes in the path 

[PATCH v6 4/8] btrfs scrub: bugfix: mirror_num off by one

2011-07-22 Thread Jan Schmidt
Fix the mirror_num determination in scrub_stripe. The rest of the scrub code
did not use mirror_num for anything important and that error went unnoticed.
The nodatasum fixup patch of this set depends on a correct mirror_num.

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/scrub.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 221fd5c..59caf8f 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -452,7 +452,7 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix)
 	 * first find a good copy
 	 */
 	for (i = 0; i < multi->num_stripes; ++i) {
-		if (i == sbio->spag[ix].mirror_num)
+		if (i + 1 == sbio->spag[ix].mirror_num)
 			continue;
 
 		if (scrub_fixup_io(READ, multi->stripes[i].dev->bdev,
@@ -930,21 +930,21 @@ static noinline_for_stack int scrub_stripe(struct scrub_dev *sdev,
 	if (map->type & BTRFS_BLOCK_GROUP_RAID0) {
 		offset = map->stripe_len * num;
 		increment = map->stripe_len * map->num_stripes;
-		mirror_num = 0;
+		mirror_num = 1;
 	} else if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
 		int factor = map->num_stripes / map->sub_stripes;
 		offset = map->stripe_len * (num / map->sub_stripes);
 		increment = map->stripe_len * factor;
-		mirror_num = num % map->sub_stripes;
+		mirror_num = num % map->sub_stripes + 1;
 	} else if (map->type & BTRFS_BLOCK_GROUP_RAID1) {
 		increment = map->stripe_len;
-		mirror_num = num % map->num_stripes;
+		mirror_num = num % map->num_stripes + 1;
 	} else if (map->type & BTRFS_BLOCK_GROUP_DUP) {
 		increment = map->stripe_len;
-		mirror_num = num % map->num_stripes;
+		mirror_num = num % map->num_stripes + 1;
 	} else {
 		increment = map->stripe_len;
-		mirror_num = 0;
+		mirror_num = 1;
 	}
 
 	path = btrfs_alloc_path();
-- 
1.7.3.4
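
The off-by-one fixed above comes down to stripe indexes being 0-based while mirror numbers are 1-based. As a rough userspace sketch of the patched branches (the enum and both function names are made up for illustration and do not exist in the kernel):

```c
/* Hypothetical stand-ins for the block group profile flags tested in the patch. */
enum profile {
	PROFILE_SINGLE,
	PROFILE_RAID0,
	PROFILE_RAID1,
	PROFILE_DUP,
	PROFILE_RAID10
};

/* Map a 0-based stripe index to a 1-based mirror number, as the patched code does. */
static int stripe_to_mirror_num(enum profile p, int num, int num_stripes,
				int sub_stripes)
{
	switch (p) {
	case PROFILE_RAID10:
		return num % sub_stripes + 1;
	case PROFILE_RAID1:
	case PROFILE_DUP:
		return num % num_stripes + 1;
	case PROFILE_RAID0:
	case PROFILE_SINGLE:
	default:
		return 1;	/* only one copy exists */
	}
}

/* While searching for a good copy, skip the stripe that holds the bad mirror. */
static int is_bad_copy(int stripe_index, int bad_mirror_num)
{
	return stripe_index + 1 == bad_mirror_num;
}
```

For a two-device RAID1, stripe 0 maps to mirror 1 and stripe 1 to mirror 2, so a loop over stripe indexes has to compare i + 1 against the stored mirror number.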



[PATCH v6 8/8] btrfs: new ioctls to do logical-inode and inode-path resolving

2011-07-22 Thread Jan Schmidt
These ioctls make use of the new functions initially added for scrub. They
return all inodes belonging to a logical address (BTRFS_IOC_LOGICAL_INO) and
all paths belonging to an inode (BTRFS_IOC_INO_PATHS).

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/ioctl.c |  145 ++
 fs/btrfs/ioctl.h |   19 +++
 2 files changed, 164 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a3c4751..aac4c05 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -51,6 +51,7 @@
 #include "volumes.h"
 #include "locking.h"
 #include "inode-map.h"
+#include "backref.h"
 
 /* Mask out flags that are inappropriate for the given type of inode. */
 static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
@@ -2836,6 +2837,146 @@ static long btrfs_ioctl_scrub_progress(struct btrfs_root *root,
 	return ret;
 }
 
+static long btrfs_ioctl_ino_to_path(struct btrfs_root *root, void __user *arg)
+{
+	int ret = 0;
+	int i;
+	unsigned long rel_ptr;
+	int size;
+	struct btrfs_ioctl_ino_path_args *ipa;
+	struct inode_fs_paths *ipath = NULL;
+	struct btrfs_path *path;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ipa = memdup_user(arg, sizeof(*ipa));
+	if (IS_ERR(ipa)) {
+		ret = PTR_ERR(ipa);
+		ipa = NULL;
+		goto out;
+	}
+
+	size = min(ipa->size, 4096);
+	ipath = init_ipath(size, root, path);
+	if (IS_ERR(ipath)) {
+		ret = PTR_ERR(ipath);
+		ipath = NULL;
+		goto out;
+	}
+
+	ret = paths_from_inode(ipa->inum, ipath);
+	if (ret < 0)
+		goto out;
+
+	for (i = 0; i < ipath->fspath->elem_cnt; ++i) {
+		rel_ptr = ipath->fspath->str[i] - (char *)ipath->fspath->str;
+		ipath->fspath->str[i] = (void *)rel_ptr;
+	}
+
+	ret = copy_to_user(ipa->fspath, ipath->fspath, size);
+	if (ret) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+out:
+	btrfs_free_path(path);
+	free_ipath(ipath);
+	kfree(ipa);
+
+	return ret;
+}
+
+static int build_ino_list(u64 inum, u64 offset, u64 root, void *ctx)
+{
+	struct btrfs_data_container *inodes = ctx;
+
+	inodes->size -= 3 * sizeof(u64);
+	if (inodes->size > 0) {
+		inodes->val[inodes->elem_cnt] = inum;
+		inodes->val[inodes->elem_cnt + 1] = offset;
+		inodes->val[inodes->elem_cnt + 2] = root;
+		inodes->elem_cnt += 3;
+	} else {
+		inodes->elem_missed += 3;
+	}
+
+	return 0;
+}
+
+static long btrfs_ioctl_logical_to_ino(struct btrfs_root *root,
+					void __user *arg)
+{
+	int ret = 0;
+	int size;
+	u64 extent_offset;
+	struct btrfs_ioctl_logical_ino_args *loi;
+	struct btrfs_data_container *inodes = NULL;
+	struct btrfs_path *path = NULL;
+	struct btrfs_key key;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	loi = memdup_user(arg, sizeof(*loi));
+	if (IS_ERR(loi)) {
+		ret = PTR_ERR(loi);
+		loi = NULL;
+		goto out;
+	}
+
+	if (loi->size <= 0) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	size = min(loi->size, 4096);
+	inodes = init_data_container(size);
+	if (IS_ERR(inodes)) {
+		ret = PTR_ERR(inodes);
+		inodes = NULL;
+		goto out;
+	}
+
+	ret = extent_from_logical(root->fs_info, loi->logical, path, &key);
+
+	if (ret & BTRFS_EXTENT_FLAG_TREE_BLOCK)
+		ret = -ENOENT;
+	if (ret < 0)
+		goto out;
+
+	extent_offset = loi->logical - key.objectid;
+	ret = iterate_extent_inodes(root->fs_info, path, key.objectid,
+					extent_offset, build_ino_list, inodes);
+
+	if (ret < 0)
+		goto out;
+
+	ret = copy_to_user(loi->inodes, inodes, size);
+	if (ret)
+		ret = -EFAULT;
+
+out:
+	btrfs_free_path(path);
+	kfree(inodes);
+	kfree(loi);
+
+	return ret;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
 		cmd, unsigned long arg)
 {
@@ -2893,6 +3034,10 @@ long btrfs_ioctl(struct file *file, unsigned int
 		return btrfs_ioctl_tree_search(file, argp);
 	case BTRFS_IOC_INO_LOOKUP:
 		return btrfs_ioctl_ino_lookup(file, argp);
+	case BTRFS_IOC_INO_PATHS:
+		return btrfs_ioctl_ino_to_path(root, argp);
+	case BTRFS_IOC_LOGICAL_INO:
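
The result format used by build_ino_list above is a flat array of u64 values in groups of three (inode number, offset, root), with a counter for entries that did not fit. A minimal userspace sketch of that idea follows; struct ino_list and ino_list_add are hypothetical names, and the capacity check is written in slots rather than bytes, so it is a variation on the patch and not a copy of it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container: a flat array of u64 slots filled three at a time. */
struct ino_list {
	uint64_t *val;		/* caller-provided storage */
	size_t capacity;	/* number of slots available in val[] */
	size_t elem_cnt;	/* slots filled so far */
	size_t elem_missed;	/* slots that would have been needed but did not fit */
};

/* Append one (inum, offset, root) triple, or record it as missed. */
static int ino_list_add(struct ino_list *list, uint64_t inum, uint64_t offset,
			uint64_t root)
{
	if (list->capacity - list->elem_cnt >= 3) {
		list->val[list->elem_cnt] = inum;
		list->val[list->elem_cnt + 1] = offset;
		list->val[list->elem_cnt + 2] = root;
		list->elem_cnt += 3;
	} else {
		list->elem_missed += 3;
	}
	return 0;	/* never aborts the iteration */
}
```

A caller that sees a non-zero elem_missed can retry with a buffer that has room for elem_cnt + elem_missed slots.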

[PATCH v6 1/8] btrfs: added helper functions to iterate backrefs

2011-07-22 Thread Jan Schmidt
These helper functions iterate back references and call a function for each
backref. There is also a function to resolve an inode to a path in the
file system.

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/Makefile  |3 +-
 fs/btrfs/backref.c |  748 
 fs/btrfs/backref.h |   62 +
 fs/btrfs/ioctl.h   |   10 +
 4 files changed, 822 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 9b72dcf..c63f649 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -7,4 +7,5 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \
 	   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
 	   export.o tree-log.o acl.o free-space-cache.o zlib.o lzo.o \
-	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o
+	   compression.o delayed-ref.o relocation.o delayed-inode.o backref.o \
+	   scrub.o
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
new file mode 100644
index 000..477f154
--- /dev/null
+++ b/fs/btrfs/backref.c
@@ -0,0 +1,748 @@
+/*
+ * Copyright (C) 2011 STRATO.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include "ctree.h"
+#include "disk-io.h"
+#include "backref.h"
+
+struct __data_ref {
+	struct list_head list;
+	u64 inum;
+	u64 root;
+	u64 extent_data_item_offset;
+};
+
+struct __shared_ref {
+	struct list_head list;
+	u64 disk_byte;
+};
+
+static int __inode_info(u64 inum, u64 ioff, u8 key_type,
+			struct btrfs_root *fs_root, struct btrfs_path *path,
+			struct btrfs_key *found_key)
+{
+	int ret;
+	struct btrfs_key key;
+	struct extent_buffer *eb;
+
+	key.type = key_type;
+	key.objectid = inum;
+	key.offset = ioff;
+
+	ret = btrfs_search_slot(NULL, fs_root, &key, path, 0, 0);
+	if (ret < 0)
+		return ret;
+
+	eb = path->nodes[0];
+	if (ret && path->slots[0] >= btrfs_header_nritems(eb)) {
+		ret = btrfs_next_leaf(fs_root, path);
+		if (ret)
+			return ret;
+		eb = path->nodes[0];
+	}
+
+	btrfs_item_key_to_cpu(eb, found_key, path->slots[0]);
+	if (found_key->type != key.type || found_key->objectid != key.objectid)
+		return 1;
+
+	return 0;
+}
+
+/*
+ * this makes the path point to (inum INODE_ITEM ioff)
+ */
+int inode_item_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
+			struct btrfs_path *path)
+{
+	struct btrfs_key key;
+	return __inode_info(inum, ioff, BTRFS_INODE_ITEM_KEY, fs_root, path,
+				&key);
+}
+
+static int inode_ref_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
+				struct btrfs_path *path, int strict,
+				u64 *out_parent_inum,
+				struct extent_buffer **out_iref_eb,
+				int *out_slot)
+{
+	int ret;
+	struct btrfs_key found_key;
+
+	ret = __inode_info(inum, ioff, BTRFS_INODE_REF_KEY, fs_root, path,
+				&found_key);
+
+	if (!ret) {
+		if (out_slot)
+			*out_slot = path->slots[0];
+		if (out_iref_eb)
+			*out_iref_eb = path->nodes[0];
+		if (out_parent_inum)
+			*out_parent_inum = found_key.offset;
+	}
+
+	btrfs_release_path(path);
+	return ret;
+}
+
+/*
+ * this iterates to turn a btrfs_inode_ref into a full filesystem path. elements
+ * of the path are separated by '/' and the path is guaranteed to be
+ * 0-terminated. the path is only given within the current file system.
+ * Therefore, it never starts with a '/'. the caller is responsible to provide
+ * size bytes in dest. the dest buffer will be filled backwards. finally,
+ * the start point of the resulting string is returned. this pointer is within
+ * dest, normally.
+ * in case the path buffer would overflow, the pointer is decremented further
+ * as if output was written to the buffer, though no more output is actually
+ * generated. that way, the caller 
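
The backwards-filling scheme described in the comment above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the function from the patch: build_path_backwards is a hypothetical name, the path components are passed in as an array ordered from the file name up towards the top directory (standing in for walking inode refs), and a signed offset is returned instead of a pointer so that the overflow case stays well defined in portable C.

```c
#include <stddef.h>
#include <string.h>

/*
 * Fill dest from its end towards its start. names[0] is the last path
 * component, names[count - 1] the first. Returns the offset in dest where the
 * path starts. A negative return value means the buffer was too small, and
 * its magnitude is the number of additional bytes that would have been needed.
 */
static long build_path_backwards(const char *const *names, size_t count,
				 char *dest, size_t size)
{
	long pos = (long)size - 1;
	size_t i;

	if (size > 0)
		dest[pos] = '\0';	/* terminator goes in first */

	for (i = 0; i < count; i++) {
		size_t len = strlen(names[i]);

		pos -= (long)len;
		if (pos >= 0)
			memcpy(dest + pos, names[i], len);
		if (i + 1 < count) {
			pos--;
			if (pos >= 0)
				dest[pos] = '/';
		}
	}
	return pos;
}
```

Because the position keeps decreasing even after the buffer is exhausted, a caller can tell from a single call how much larger the buffer needs to be.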

[PATCH v6 7/8] btrfs scrub: add fixup code for errors on nodatasum files

2011-07-22 Thread Jan Schmidt
This removes a FIXME comment and introduces the first part of nodatasum
fixup: It gets the corresponding inode for a logical address and triggers a
regular readpage for the corrupted sector.

Once we have on-the-fly error correction our error will be automatically
corrected. The correction code is expected to clear the newly introduced
EXTENT_DAMAGED flag, making scrub report that error as corrected instead
of uncorrectable eventually.

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/extent_io.h |1 +
 fs/btrfs/scrub.c |  188 --
 2 files changed, 183 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 22bf366..2734fd9 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -17,6 +17,7 @@
 #define EXTENT_NODATASUM (1 << 10)
 #define EXTENT_DO_ACCOUNTING (1 << 11)
 #define EXTENT_FIRST_DELALLOC (1 << 12)
+#define EXTENT_DAMAGED (1 << 13)
 #define EXTENT_IOBITS (EXTENT_LOCKED | EXTENT_WRITEBACK)
 #define EXTENT_CTLBITS (EXTENT_DO_ACCOUNTING | EXTENT_FIRST_DELALLOC)
 
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 41a0114..db09f01 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -22,6 +22,7 @@
 #include "volumes.h"
 #include "disk-io.h"
 #include "ordered-data.h"
+#include "transaction.h"
 #include "backref.h"
 
 /*
@@ -89,6 +90,7 @@ struct scrub_dev {
 	int			first_free;
 	int			curr;
 	atomic_t		in_flight;
+	atomic_t		fixup_cnt;
 	spinlock_t		list_lock;
 	wait_queue_head_t	list_wait;
 	u16			csum_size;
@@ -102,6 +104,14 @@ struct scrub_dev {
 	spinlock_t		stat_lock;
 };
 
+struct scrub_fixup_nodatasum {
+	struct scrub_dev	*sdev;
+	u64			logical;
+	struct btrfs_root	*root;
+	struct btrfs_work	work;
+	int			mirror_num;
+};
+
 struct scrub_warning {
 	struct btrfs_path	*path;
 	u64			extent_item_size;
@@ -190,12 +200,13 @@ struct scrub_dev *scrub_setup_dev(struct btrfs_device *dev)
 
 		if (i != SCRUB_BIOS_PER_DEV-1)
 			sdev->bios[i]->next_free = i + 1;
-		 else
+		else
 			sdev->bios[i]->next_free = -1;
 	}
 	sdev->first_free = 0;
 	sdev->curr = -1;
 	atomic_set(&sdev->in_flight, 0);
+	atomic_set(&sdev->fixup_cnt, 0);
 	atomic_set(&sdev->cancel_req, 0);
 	sdev->csum_size = btrfs_super_csum_size(&fs_info->super_copy);
 	INIT_LIST_HEAD(&sdev->csum_list);
@@ -347,6 +358,151 @@ out:
kfree(swarn.msg_buf);
 }
 
+static int scrub_fixup_readpage(u64 inum, u64 offset, u64 root, void *ctx)
+{
+	struct page *page;
+	unsigned long index;
+	struct scrub_fixup_nodatasum *fixup = ctx;
+	int ret;
+	int corrected;
+	struct btrfs_key key;
+	struct inode *inode;
+	u64 end = offset + PAGE_SIZE - 1;
+	struct btrfs_root *local_root;
+
+	key.objectid = root;
+	key.type = BTRFS_ROOT_ITEM_KEY;
+	key.offset = (u64)-1;
+	local_root = btrfs_read_fs_root_no_name(fixup->root->fs_info, &key);
+	if (IS_ERR(local_root))
+		return PTR_ERR(local_root);
+
+	key.type = BTRFS_INODE_ITEM_KEY;
+	key.objectid = inum;
+	key.offset = 0;
+	inode = btrfs_iget(fixup->root->fs_info->sb, &key, local_root, NULL);
+	if (IS_ERR(inode))
+		return PTR_ERR(inode);
+
+	ret = set_extent_bit(&BTRFS_I(inode)->io_tree, offset, end,
+				EXTENT_DAMAGED, 0, NULL, NULL, GFP_NOFS);
+
+	/* set_extent_bit should either succeed or give proper error */
+	WARN_ON(ret > 0);
+	if (ret)
+		return ret < 0 ? ret : -EFAULT;
+
+	index = offset >> PAGE_CACHE_SHIFT;
+
+	page = find_or_create_page(inode->i_mapping, index, GFP_NOFS);
+	if (!page)
+		return -ENOMEM;
+
+	ret = extent_read_full_page(&BTRFS_I(inode)->io_tree, page,
+					btrfs_get_extent, fixup->mirror_num);
+	wait_on_page_locked(page);
+	corrected = !test_range_bit(&BTRFS_I(inode)->io_tree, offset, end,
+					EXTENT_DAMAGED, 0, NULL);
+
+	if (corrected)
+		WARN_ON(!PageUptodate(page));
+	else
+		clear_extent_bit(&BTRFS_I(inode)->io_tree, offset, end,
+					EXTENT_DAMAGED, 0, 0, NULL, GFP_NOFS);
+
+	put_page(page);
+	iput(inode);
+
+	if (ret < 0)
+		return ret;
+
+	if (ret == 0 && corrected) {
+		/*
+		 * we only need to call readpage for one of the inodes belonging
+		 * to this extent. so make iterate_extent_inodes stop
+		 */
+		return 1;
+	}
[PATCH v6 0/8] Btrfs scrub: print path to corrupted files and trigger nodatasum fixup

2011-07-22 Thread Jan Schmidt
Here comes the fix for the bug immediately following the very last bug in
this patch series:

Changelog v5->v6:
- fixed ioctl privilege and input sanity checking (reported by Andi Kleen)

Original message follows:

This patch set introduces two new features for scrub. They share the backref
iteration code which is the reason they made it into the same patch set.

The first feature adds printk statements in case scrub finds an error which list
all affected files. You will need patch 1, 2 and 3 for that.

The second feature adds the trigger which enables us to correct i/o errors in
case the affected extent does not have a checksum (nodatasum), eventually. You
will need patch 1, 4, 5 and 6 for that.

I tried to apply all patches to the current cmason/for-linus branch and to
Arne's current for-chris branch. They do apply with no errors (some offsets
possible).

The new ioctl()s can be tested from usermode by applying the patch series
"[PATCH v2 0/3] Btrfs-progs: add the first inspect-internal commands"
from this mailing list to the user land tools.

Please review.

Next I'm starting to make up my mind how to implement on-the-fly error
correction correctly. This will enable us to rewrite good data whenever we
encounter a bad copy. I have some preliminary patches already; the stress in
the first sentence is on "correctly". The second feature mentioned in this
patch series will then automatically use that code, too.

Changelog v1->v2:
- Various cleanup, sensible error codes as suggested by David Sterba

Changelog v2->v3:
- evaluation and iteration of shared refs
- support for in-tree refs (v2 iterated inline refs only)
- never call an iterator function without releasing the path
- iterate_irefs now returns -ENOENT in case no refs are found
- some stupid bugs removed where release_path was called too early
- ioctls added to provide new functions to user mode
- bugfixes for cases where search_slot found the very end of a leaf
- bugfix: use right fs root for readpage instead of fs_root->fs_info
- based on current cmason/for-linus

Changelog v3->v4:
- fixed a regression with mirror_num that could prevent error correction
- based on current cmason/for-linus

Changelog v4->v5:
- fixed a deadlock when fixup is taking longer while scrub is about to end

Please try it and report errors (or confirm there are none, of course). I can
provide a place to pull from if anyone likes.

-Jan

Jan Schmidt (8):
  btrfs: added helper functions to iterate backrefs
  btrfs scrub: added unverified_errors
  btrfs scrub: print paths of corrupted files
  btrfs scrub: bugfix: mirror_num off by one
  btrfs: add mirror_num to extent_read_full_page
  btrfs scrub: use int for mirror_num, not u64
  btrfs scrub: add fixup code for errors on nodatasum files
  btrfs: new ioctls to do logical-inode and inode-path resolving

 fs/btrfs/Makefile|3 +-
 fs/btrfs/backref.c   |  748 ++
 fs/btrfs/backref.h   |   62 +
 fs/btrfs/disk-io.c   |2 +-
 fs/btrfs/extent_io.c |6 +-
 fs/btrfs/extent_io.h |3 +-
 fs/btrfs/inode.c |2 +-
 fs/btrfs/ioctl.c |  145 ++
 fs/btrfs/ioctl.h |   29 ++
 fs/btrfs/scrub.c |  414 +---
 10 files changed, 1374 insertions(+), 40 deletions(-)
 create mode 100644 fs/btrfs/backref.c
 create mode 100644 fs/btrfs/backref.h

-- 
1.7.3.4



[PATCH v6 2/8] btrfs scrub: added unverified_errors

2011-07-22 Thread Jan Schmidt
In normal operation, scrub is reading data sequentially in large portions.
In case of an i/o error, we try to find the corrupted area(s) by issuing
page sized read requests. With this commit we increment the
unverified_errors counter if all of the small size requests succeed.

Userland patches carrying such conspicuous events to the administrator should
already be around.

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/scrub.c |   37 ++---
 1 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index a8d03d5..35099fa 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -201,18 +201,25 @@ nomem:
  * recheck_error gets called for every page in the bio, even though only
  * one may be bad
  */
-static void scrub_recheck_error(struct scrub_bio *sbio, int ix)
+static int scrub_recheck_error(struct scrub_bio *sbio, int ix)
 {
+	struct scrub_dev *sdev = sbio->sdev;
+	u64 sector = (sbio->physical + ix * PAGE_SIZE) >> 9;
+
 	if (sbio->err) {
-		if (scrub_fixup_io(READ, sbio->sdev->dev->bdev,
-				   (sbio->physical + ix * PAGE_SIZE) >> 9,
+		if (scrub_fixup_io(READ, sbio->sdev->dev->bdev, sector,
 				   sbio->bio->bi_io_vec[ix].bv_page) == 0) {
 			if (scrub_fixup_check(sbio, ix) == 0)
-				return;
+				return 0;
 		}
 	}
 
+	spin_lock(&sdev->stat_lock);
+	++sdev->stat.read_errors;
+	spin_unlock(&sdev->stat_lock);
+
 	scrub_fixup(sbio, ix);
+	return 1;
 }
 
 static int scrub_fixup_check(struct scrub_bio *sbio, int ix)
@@ -382,8 +389,14 @@ static void scrub_checksum(struct btrfs_work *work)
 	int ret;
 
 	if (sbio->err) {
+		ret = 0;
 		for (i = 0; i < sbio->count; ++i)
-			scrub_recheck_error(sbio, i);
+			ret |= scrub_recheck_error(sbio, i);
+		if (!ret) {
+			spin_lock(&sdev->stat_lock);
+			++sdev->stat.unverified_errors;
+			spin_unlock(&sdev->stat_lock);
+		}
 
 		sbio->bio->bi_flags &= ~(BIO_POOL_MASK - 1);
 		sbio->bio->bi_flags |= 1 << BIO_UPTODATE;
@@ -396,10 +409,6 @@ static void scrub_checksum(struct btrfs_work *work)
 			bi->bv_offset = 0;
 			bi->bv_len = PAGE_SIZE;
 		}
-
-		spin_lock(&sdev->stat_lock);
-		++sdev->stat.read_errors;
-		spin_unlock(&sdev->stat_lock);
 		goto out;
 	}
 	for (i = 0; i < sbio->count; ++i) {
@@ -420,8 +429,14 @@ static void scrub_checksum(struct btrfs_work *work)
 			WARN_ON(1);
 		}
 		kunmap_atomic(buffer, KM_USER0);
-		if (ret)
-			scrub_recheck_error(sbio, i);
+		if (ret) {
+			ret = scrub_recheck_error(sbio, i);
+			if (!ret) {
+				spin_lock(&sdev->stat_lock);
+				++sdev->stat.unverified_errors;
+				spin_unlock(&sdev->stat_lock);
+			}
+		}
 	}
 
 out:
-- 
1.7.3.4
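
The accounting rule for a failed large read, as described in the commit message above, can be modelled in a few lines of userspace C. This is only a sketch: struct scrub_stats and account_failed_bio are hypothetical names, locking is left out, and the per-page re-read results are passed in as a plain array instead of being produced by real I/O.

```c
/* Hypothetical counters, named after the two statistics touched by the patch. */
struct scrub_stats {
	unsigned long read_errors;
	unsigned long unverified_errors;
};

/*
 * A large read failed. page_ok[i] is non-zero if the page-sized re-read of
 * page i succeeded and verified. Each page that still fails counts as a read
 * error. If every page turned out fine, the original failure could not be
 * reproduced and is counted once as an unverified error.
 */
static void account_failed_bio(const int *page_ok, int count,
			       struct scrub_stats *st)
{
	int any_bad = 0;
	int i;

	for (i = 0; i < count; i++) {
		if (!page_ok[i]) {
			st->read_errors++;
			any_bad = 1;
		}
	}
	if (!any_bad)
		st->unverified_errors++;
}
```

A bio whose pages all re-read cleanly adds one unverified error and no read errors; a bio with two bad pages adds two read errors and no unverified error.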



[PATCH v6 6/8] btrfs scrub: use int for mirror_num, not u64

2011-07-22 Thread Jan Schmidt
The rest of the code uses int mirror_num, and so should scrub.

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/scrub.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 59caf8f..41a0114 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -65,7 +65,7 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix);
 struct scrub_page {
 	u64			flags;  /* extent flags */
 	u64			generation;
-	u64			mirror_num;
+	int			mirror_num;
 	int			have_csum;
 	u8			csum[BTRFS_CSUM_SIZE];
 };
@@ -776,7 +776,7 @@ nomem:
 }
 
 static int scrub_page(struct scrub_dev *sdev, u64 logical, u64 len,
-		      u64 physical, u64 flags, u64 gen, u64 mirror_num,
+		      u64 physical, u64 flags, u64 gen, int mirror_num,
 		      u8 *csum, int force)
 {
 	struct scrub_bio *sbio;
@@ -873,7 +873,7 @@ static int scrub_find_csum(struct scrub_dev *sdev, u64 logical, u64 len,
 
 /* scrub extent tries to collect up to 64 kB for each bio */
 static int scrub_extent(struct scrub_dev *sdev, u64 logical, u64 len,
-			u64 physical, u64 flags, u64 gen, u64 mirror_num)
+			u64 physical, u64 flags, u64 gen, int mirror_num)
 {
 	int ret;
 	u8 csum[BTRFS_CSUM_SIZE];
@@ -919,7 +919,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_dev *sdev,
 	u64 physical;
 	u64 logical;
 	u64 generation;
-	u64 mirror_num;
+	int mirror_num;
 
 	u64 increment = map->stripe_len;
 	u64 offset;
-- 
1.7.3.4



[PATCH v6 5/8] btrfs: add mirror_num to extent_read_full_page

2011-07-22 Thread Jan Schmidt
Currently, extent_read_full_page always assumes we are trying to read mirror
0, which generally is the best we can do. To add flexibility, pass it as a
parameter. This will be needed by scrub fixup code.

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/disk-io.c   |2 +-
 fs/btrfs/extent_io.c |6 +++---
 fs/btrfs/extent_io.h |2 +-
 fs/btrfs/inode.c |2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1ac8db5d..b898319 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -874,7 +874,7 @@ static int btree_readpage(struct file *file, struct page 
*page)
 {
struct extent_io_tree *tree;
tree = BTRFS_I(page-mapping-host)-io_tree;
-   return extent_read_full_page(tree, page, btree_get_extent);
+   return extent_read_full_page(tree, page, btree_get_extent, 0);
 }
 
 static int btree_releasepage(struct page *page, gfp_t gfp_flags)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b181a94..b78f665 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2111,16 +2111,16 @@ static int __extent_read_full_page(struct 
extent_io_tree *tree,
 }
 
 int extent_read_full_page(struct extent_io_tree *tree, struct page *page,
-   get_extent_t *get_extent)
+   get_extent_t *get_extent, int mirror_num)
 {
struct bio *bio = NULL;
unsigned long bio_flags = 0;
int ret;
 
-   ret = __extent_read_full_page(tree, page, get_extent, &bio, 0,
+   ret = __extent_read_full_page(tree, page, get_extent, &bio, mirror_num,
  &bio_flags);
if (bio)
-   ret = submit_one_bio(READ, bio, 0, bio_flags);
+   ret = submit_one_bio(READ, bio, mirror_num, bio_flags);
return ret;
 }
 
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index a11a92e..22bf366 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -177,7 +177,7 @@ int unlock_extent_cached(struct extent_io_tree *tree, u64 start, u64 end,
 int try_lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
gfp_t mask);
 int extent_read_full_page(struct extent_io_tree *tree, struct page *page,
- get_extent_t *get_extent);
+ get_extent_t *get_extent, int mirror_num);
 int __init extent_io_init(void);
 void extent_io_exit(void);
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4a13730..730ee3d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6250,7 +6250,7 @@ int btrfs_readpage(struct file *file, struct page *page)
 {
struct extent_io_tree *tree;
tree = &BTRFS_I(page->mapping->host)->io_tree;
-   return extent_read_full_page(tree, page, btrfs_get_extent);
+   return extent_read_full_page(tree, page, btrfs_get_extent, 0);
 }
 
 static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
-- 
1.7.3.4



[PATCH v6 3/8] btrfs scrub: print paths of corrupted files

2011-07-22 Thread Jan Schmidt
While scrubbing, we may encounter various errors. Previously, a logical
address was printed to the log only. Now, all paths belonging to that
address are resolved and printed separately. That should work for hardlinks
as well as reflinks.

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/scrub.c |  169 --
 1 files changed, 163 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 35099fa..221fd5c 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -17,10 +17,12 @@
  */
 
 #include <linux/blkdev.h>
+#include <linux/ratelimit.h>
 #include "ctree.h"
 #include "volumes.h"
 #include "disk-io.h"
 #include "ordered-data.h"
+#include "backref.h"
 
 /*
  * This is only the first step towards a full-features scrub. It reads all
@@ -100,6 +102,19 @@ struct scrub_dev {
spinlock_t  stat_lock;
 };
 
+struct scrub_warning {
+   struct btrfs_path   *path;
+   u64 extent_item_size;
+   char *scratch_buf;
+   char *msg_buf;
+   const char  *errstr;
+   sector_t sector;
+   u64 logical;
+   struct btrfs_device *dev;
+   int msg_bufsize;
+   int scratch_bufsize;
+};
+
 static void scrub_free_csums(struct scrub_dev *sdev)
 {
while (!list_empty(&sdev->csum_list)) {
@@ -195,6 +210,143 @@ nomem:
return ERR_PTR(-ENOMEM);
 }
 
+static int scrub_print_warning_inode(u64 inum, u64 offset, u64 root, void *ctx)
+{
+   u64 isize;
+   u32 nlink;
+   int ret;
+   int i;
+   struct extent_buffer *eb;
+   struct btrfs_inode_item *inode_item;
+   struct scrub_warning *swarn = ctx;
+   struct btrfs_fs_info *fs_info = swarn->dev->dev_root->fs_info;
+   struct inode_fs_paths *ipath = NULL;
+   struct btrfs_root *local_root;
+   struct btrfs_key root_key;
+
+   root_key.objectid = root;
+   root_key.type = BTRFS_ROOT_ITEM_KEY;
+   root_key.offset = (u64)-1;
+   local_root = btrfs_read_fs_root_no_name(fs_info, &root_key);
+   if (IS_ERR(local_root)) {
+   ret = PTR_ERR(local_root);
+   goto err;
+   }
+
+   ret = inode_item_info(inum, 0, local_root, swarn->path);
+   if (ret) {
+   btrfs_release_path(swarn->path);
+   goto err;
+   }
+
+   eb = swarn->path->nodes[0];
+   inode_item = btrfs_item_ptr(eb, swarn->path->slots[0],
+   struct btrfs_inode_item);
+   isize = btrfs_inode_size(eb, inode_item);
+   nlink = btrfs_inode_nlink(eb, inode_item);
+   btrfs_release_path(swarn->path);
+
+   ipath = init_ipath(4096, local_root, swarn->path);
+   ret = paths_from_inode(inum, ipath);
+
+   if (ret < 0)
+   goto err;
+
+   /*
+* we deliberately ignore the bit ipath might have been too small to
+* hold all of the paths here
+*/
+   for (i = 0; i < ipath->fspath->elem_cnt; ++i)
+   printk(KERN_WARNING "btrfs: %s at logical %llu on dev "
+   "%s, sector %llu, root %llu, inode %llu, offset %llu, "
+   "length %llu, links %u (path: %s)\n", swarn->errstr,
+   swarn->logical, swarn->dev->name,
+   (unsigned long long)swarn->sector, root, inum, offset,
+   min(isize - offset, (u64)PAGE_SIZE), nlink,
+   ipath->fspath->str[i]);
+
+   free_ipath(ipath);
+   return 0;
+
+err:
+   printk(KERN_WARNING "btrfs: %s at logical %llu on dev "
+   "%s, sector %llu, root %llu, inode %llu, offset %llu: path "
+   "resolving failed with ret=%d\n", swarn->errstr,
+   swarn->logical, swarn->dev->name,
+   (unsigned long long)swarn->sector, root, inum, offset, ret);
+
+   free_ipath(ipath);
+   return 0;
+}
+
+static void scrub_print_warning(const char *errstr, struct scrub_bio *sbio,
+   int ix)
+{
+   struct btrfs_device *dev = sbio->sdev->dev;
+   struct btrfs_fs_info *fs_info = dev->dev_root->fs_info;
+   struct btrfs_path *path;
+   struct btrfs_key found_key;
+   struct extent_buffer *eb;
+   struct btrfs_extent_item *ei;
+   struct scrub_warning swarn;
+   u32 item_size;
+   int ret;
+   u64 ref_root;
+   u8 ref_level;
+   unsigned long ptr = 0;
+   const int bufsize = 4096;
+   u64 extent_offset;
+
+   path = btrfs_alloc_path();
+
+   swarn.scratch_buf = kmalloc(bufsize, GFP_NOFS);
+   swarn.msg_buf = kmalloc(bufsize, GFP_NOFS);
+   swarn.sector = (sbio->physical + ix * PAGE_SIZE) >> 9;
+   swarn.logical = sbio->logical + ix * PAGE_SIZE;
+   swarn.errstr = errstr;
+   swarn.dev = dev;
+   swarn.msg_bufsize = bufsize;
+   swarn.scratch_bufsize = bufsize;
+
+ 

[PATCH v7 4/8] btrfs scrub: bugfix: mirror_num off by one

2011-07-22 Thread Jan Schmidt
Fix the mirror_num determination in scrub_stripe. The rest of the scrub code
did not use mirror_num for anything important and that error went unnoticed.
The nodatasum fixup patch of this set depends on a correct mirror_num.
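
The numbering convention this fix establishes can be sketched in plain userspace C. This is only an illustration under the assumption, taken from the diff below, that stripe index i corresponds to mirror number i + 1; the helper names mirror_of_stripe and pick_other_copy are hypothetical and do not appear in the patch.

```c
/* 1-based mirror number of stripe `num` in a RAID1-like layout. */
static int mirror_of_stripe(int num, int num_stripes)
{
	return num % num_stripes + 1;
}

/*
 * Pick the stripe index of a copy to read from when the copy identified
 * by bad_mirror (1-based) is known to be bad.
 * Returns -1 if no other copy is available.
 */
static int pick_other_copy(int num_stripes, int bad_mirror)
{
	int i;

	for (i = 0; i < num_stripes; ++i) {
		if (i + 1 == bad_mirror)	/* stripe index i is mirror i + 1 */
			continue;
		return i;
	}
	return -1;
}
```

With the old 0-based comparison, a bad mirror 2 of 2 would never match any index, so the loop could pick the bad copy itself as the "good" one.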

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/scrub.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 221fd5c..59caf8f 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -452,7 +452,7 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix)
 * first find a good copy
 */
for (i = 0; i < multi->num_stripes; ++i) {
-   if (i == sbio->spag[ix].mirror_num)
+   if (i + 1 == sbio->spag[ix].mirror_num)
continue;
 
if (scrub_fixup_io(READ, multi->stripes[i].dev->bdev,
@@ -930,21 +930,21 @@ static noinline_for_stack int scrub_stripe(struct scrub_dev *sdev,
if (map->type & BTRFS_BLOCK_GROUP_RAID0) {
offset = map->stripe_len * num;
increment = map->stripe_len * map->num_stripes;
-   mirror_num = 0;
+   mirror_num = 1;
} else if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
int factor = map->num_stripes / map->sub_stripes;
offset = map->stripe_len * (num / map->sub_stripes);
increment = map->stripe_len * factor;
-   mirror_num = num % map->sub_stripes;
+   mirror_num = num % map->sub_stripes + 1;
} else if (map->type & BTRFS_BLOCK_GROUP_RAID1) {
increment = map->stripe_len;
-   mirror_num = num % map->num_stripes;
+   mirror_num = num % map->num_stripes + 1;
} else if (map->type & BTRFS_BLOCK_GROUP_DUP) {
increment = map->stripe_len;
-   mirror_num = num % map->num_stripes;
+   mirror_num = num % map->num_stripes + 1;
} else {
increment = map->stripe_len;
-   mirror_num = 0;
+   mirror_num = 1;
}
 
path = btrfs_alloc_path();
-- 
1.7.3.4



[PATCH v7 0/8] Btrfs scrub: print path to corrupted files and trigger nodatasum fixup

2011-07-22 Thread Jan Schmidt
Please ignore v6; it was sent while I was only halfway through :-(

Changelog v6->v7:
- include everything that was stated to be in v6

Changelog v5->v6:
- fixed ioctl privilege and input sanity checking (reported by Andi Kleen)
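
The input sanity check can be sketched in userspace C as follows. This is a minimal sketch, assuming a signed, caller-supplied buffer size as in the ioctl argument structs; clamp_request_size and MAX_REPLY_BYTES are hypothetical names, and 4096 simply mirrors the cap used in patch 8/8.

```c
#include <errno.h>

#define MAX_REPLY_BYTES 4096	/* same cap as the min(..., 4096) in patch 8/8 */

/*
 * Validate a caller-supplied, signed buffer size.
 * Returns the size to use (1..MAX_REPLY_BYTES) or -EINVAL.
 */
static int clamp_request_size(int requested)
{
	/* A negative value would slip through a plain min() unchanged. */
	if (requested <= 0)
		return -EINVAL;
	return requested < MAX_REPLY_BYTES ? requested : MAX_REPLY_BYTES;
}
```

Rejecting non-positive values before clamping is what keeps a negative size from reaching the allocation and copy_to_user calls.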

Original message follows:

This patch set introduces two new features for scrub. They share the backref
iteration code which is the reason they made it into the same patch set.

The first feature adds printk statements that list all affected files when
scrub finds an error. You will need patches 1, 2 and 3 for that.

The second feature adds the trigger that will eventually enable us to correct
I/O errors when the affected extent does not have a checksum (nodatasum). You
will need patches 1, 4, 5 and 6 for that.

I tried to apply all patches to the current cmason/for-linus branch and to
Arne's current for-chris branch. They do apply with no errors (some offsets
possible).

The new ioctl()s can be tested from usermode by applying the patch series
[PATCH v2 0/3] Btrfs-progs: add the first inspect-internal commands
from this mailing list to the user land tools.

Please review.

Next I'm starting to make up my mind how to implement on-the-fly error
correction correctly. This will enable us to rewrite good data whenever we
encounter a bad copy. I have some preliminary patches already; the stress in
the first sentence is on "correctly". The second feature mentioned in this
patch series will then automatically use that code, too.

Changelog v1->v2:
- Various cleanup, sensible error codes as suggested by David Sterba

Changelog v2->v3:
- evaluation and iteration of shared refs
- support for in-tree refs (v2 iterated inline refs only)
- never call an iterator function without releasing the path
- iterate_irefs now returns -ENOENT in case no refs are found
- some stupid bugs removed where release_path was called too early
- ioctls added to provide new functions to user mode
- bugfixes for cases where search_slot found the very end of a leaf
- bugfix: use right fs root for readpage instead of fs_root->fs_info
- based on current cmason/for-linus

Changelog v3->v4:
- fixed a regression with mirror_num that could prevent error correction
- based on current cmason/for-linus

Changelog v4->v5:
- fixed a deadlock when fixup is taking longer while scrub is about to end

Please try it and report errors (or confirm there are none, of course). I can
provide a place to pull from if anyone likes.

-Jan

Jan Schmidt (8):
  btrfs: added helper functions to iterate backrefs
  btrfs scrub: added unverified_errors
  btrfs scrub: print paths of corrupted files
  btrfs scrub: bugfix: mirror_num off by one
  btrfs: add mirror_num to extent_read_full_page
  btrfs scrub: use int for mirror_num, not u64
  btrfs scrub: add fixup code for errors on nodatasum files
  btrfs: new ioctls to do logical-inode and inode-path resolving

 fs/btrfs/Makefile|3 +-
 fs/btrfs/backref.c   |  748 ++
 fs/btrfs/backref.h   |   62 +
 fs/btrfs/disk-io.c   |2 +-
 fs/btrfs/extent_io.c |6 +-
 fs/btrfs/extent_io.h |3 +-
 fs/btrfs/inode.c |2 +-
 fs/btrfs/ioctl.c |  150 ++
 fs/btrfs/ioctl.h |   29 ++
 fs/btrfs/scrub.c |  414 +---
 10 files changed, 1379 insertions(+), 40 deletions(-)
 create mode 100644 fs/btrfs/backref.c
 create mode 100644 fs/btrfs/backref.h

-- 
1.7.3.4



[PATCH v7 8/8] btrfs: new ioctls to do logical-inode and inode-path resolving

2011-07-22 Thread Jan Schmidt
These ioctls make use of the new functions initially added for scrub. They
return all inodes belonging to a logical address (BTRFS_IOC_LOGICAL_INO) and
all paths belonging to an inode (BTRFS_IOC_INO_PATHS).
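
How the logical-to-inode results are packed for user space can be sketched in plain C. This is a simplified illustration, not the patch's code: struct ino_list, MAX_VALS and ino_list_add are hypothetical stand-ins for btrfs_data_container and build_ino_list, and the capacity check here counts u64 slots rather than remaining bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_VALS 32	/* hypothetical upper bound, in u64 slots */

struct ino_list {
	size_t   capacity;	/* usable slots in val[], at most MAX_VALS */
	uint32_t elem_cnt;	/* slots filled so far */
	uint32_t elem_missed;	/* slots that were needed but did not fit */
	uint64_t val[MAX_VALS];
};

/*
 * Iterator callback in the style of build_ino_list: append one
 * (inum, offset, root) triple, or record that it did not fit.
 */
static int ino_list_add(uint64_t inum, uint64_t offset, uint64_t root,
			void *ctx)
{
	struct ino_list *l = ctx;

	if (l->elem_cnt + 3 <= l->capacity) {
		l->val[l->elem_cnt] = inum;
		l->val[l->elem_cnt + 1] = offset;
		l->val[l->elem_cnt + 2] = root;
		l->elem_cnt += 3;
	} else {
		l->elem_missed += 3;
	}

	return 0;	/* keep iterating so that elem_missed is complete */
}
```

A caller can compare elem_missed against zero to decide whether to retry with a larger buffer.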

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/ioctl.c |  150 ++
 fs/btrfs/ioctl.h |   19 +++
 2 files changed, 169 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a3c4751..798c8ed 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -51,6 +51,7 @@
 #include "volumes.h"
 #include "locking.h"
 #include "inode-map.h"
+#include "backref.h"
 
 /* Mask out flags that are inappropriate for the given type of inode. */
 static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
@@ -2836,6 +2837,151 @@ static long btrfs_ioctl_scrub_progress(struct btrfs_root *root,
return ret;
 }
 
+static long btrfs_ioctl_ino_to_path(struct btrfs_root *root, void __user *arg)
+{
+   int ret = 0;
+   int i;
+   unsigned long rel_ptr;
+   int size;
+   struct btrfs_ioctl_ino_path_args *ipa;
+   struct inode_fs_paths *ipath = NULL;
+   struct btrfs_path *path;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   path = btrfs_alloc_path();
+   if (!path) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   ipa = memdup_user(arg, sizeof(*ipa));
+   if (IS_ERR(ipa)) {
+   ret = PTR_ERR(ipa);
+   ipa = NULL;
+   goto out;
+   }
+
+   if (ipa->size <= 0) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   size = min(ipa->size, 4096);
+   ipath = init_ipath(size, root, path);
+   if (IS_ERR(ipath)) {
+   ret = PTR_ERR(ipath);
+   ipath = NULL;
+   goto out;
+   }
+
+   ret = paths_from_inode(ipa->inum, ipath);
+   if (ret < 0)
+   goto out;
+
+   for (i = 0; i < ipath->fspath->elem_cnt; ++i) {
+   rel_ptr = ipath->fspath->str[i] - (char *)ipath->fspath->str;
+   ipath->fspath->str[i] = (void *)rel_ptr;
+   }
+
+   ret = copy_to_user(ipa->fspath, ipath->fspath, size);
+   if (ret) {
+   ret = -EFAULT;
+   goto out;
+   }
+
+out:
+   btrfs_free_path(path);
+   free_ipath(ipath);
+   kfree(ipa);
+
+   return ret;
+}
+
+static int build_ino_list(u64 inum, u64 offset, u64 root, void *ctx)
+{
+   struct btrfs_data_container *inodes = ctx;
+
+   inodes->size -= 3 * sizeof(u64);
+   if (inodes->size > 0) {
+   inodes->val[inodes->elem_cnt] = inum;
+   inodes->val[inodes->elem_cnt + 1] = offset;
+   inodes->val[inodes->elem_cnt + 2] = root;
+   inodes->elem_cnt += 3;
+   } else {
+   inodes->elem_missed += 3;
+   }
+
+   return 0;
+}
+
+static long btrfs_ioctl_logical_to_ino(struct btrfs_root *root,
+   void __user *arg)
+{
+   int ret = 0;
+   int size;
+   u64 extent_offset;
+   struct btrfs_ioctl_logical_ino_args *loi;
+   struct btrfs_data_container *inodes = NULL;
+   struct btrfs_path *path = NULL;
+   struct btrfs_key key;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   loi = memdup_user(arg, sizeof(*loi));
+   if (IS_ERR(loi)) {
+   ret = PTR_ERR(loi);
+   loi = NULL;
+   goto out;
+   }
+
+   if (loi->size <= 0) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   path = btrfs_alloc_path();
+   if (!path) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   size = min(loi->size, 4096);
+   inodes = init_data_container(size);
+   if (IS_ERR(inodes)) {
+   ret = PTR_ERR(inodes);
+   inodes = NULL;
+   goto out;
+   }
+
+   ret = extent_from_logical(root->fs_info, loi->logical, path, &key);
+
+   if (ret & BTRFS_EXTENT_FLAG_TREE_BLOCK)
+   ret = -ENOENT;
+   if (ret < 0)
+   goto out;
+
+   extent_offset = loi->logical - key.objectid;
+   ret = iterate_extent_inodes(root->fs_info, path, key.objectid,
+   extent_offset, build_ino_list, inodes);
+
+   if (ret < 0)
+   goto out;
+
+   ret = copy_to_user(loi->inodes, inodes, size);
+   if (ret)
+   ret = -EFAULT;
+
+out:
+   btrfs_free_path(path);
+   kfree(inodes);
+   kfree(loi);
+
+   return ret;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
cmd, unsigned long arg)
 {
@@ -2893,6 +3039,10 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_tree_search(file, argp);
case BTRFS_IOC_INO_LOOKUP:
return btrfs_ioctl_ino_lookup(file, argp);
+   case 

[PATCH v7 3/8] btrfs scrub: print paths of corrupted files

2011-07-22 Thread Jan Schmidt
While scrubbing, we may encounter various errors. Previously, a logical
address was printed to the log only. Now, all paths belonging to that
address are resolved and printed separately. That should work for hardlinks
as well as reflinks.

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/scrub.c |  169 --
 1 files changed, 163 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 35099fa..221fd5c 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -17,10 +17,12 @@
  */
 
 #include <linux/blkdev.h>
+#include <linux/ratelimit.h>
 #include "ctree.h"
 #include "volumes.h"
 #include "disk-io.h"
 #include "ordered-data.h"
+#include "backref.h"
 
 /*
  * This is only the first step towards a full-features scrub. It reads all
@@ -100,6 +102,19 @@ struct scrub_dev {
spinlock_t  stat_lock;
 };
 
+struct scrub_warning {
+   struct btrfs_path   *path;
+   u64 extent_item_size;
+   char *scratch_buf;
+   char *msg_buf;
+   const char  *errstr;
+   sector_t sector;
+   u64 logical;
+   struct btrfs_device *dev;
+   int msg_bufsize;
+   int scratch_bufsize;
+};
+
 static void scrub_free_csums(struct scrub_dev *sdev)
 {
while (!list_empty(&sdev->csum_list)) {
@@ -195,6 +210,143 @@ nomem:
return ERR_PTR(-ENOMEM);
 }
 
+static int scrub_print_warning_inode(u64 inum, u64 offset, u64 root, void *ctx)
+{
+   u64 isize;
+   u32 nlink;
+   int ret;
+   int i;
+   struct extent_buffer *eb;
+   struct btrfs_inode_item *inode_item;
+   struct scrub_warning *swarn = ctx;
+   struct btrfs_fs_info *fs_info = swarn->dev->dev_root->fs_info;
+   struct inode_fs_paths *ipath = NULL;
+   struct btrfs_root *local_root;
+   struct btrfs_key root_key;
+
+   root_key.objectid = root;
+   root_key.type = BTRFS_ROOT_ITEM_KEY;
+   root_key.offset = (u64)-1;
+   local_root = btrfs_read_fs_root_no_name(fs_info, &root_key);
+   if (IS_ERR(local_root)) {
+   ret = PTR_ERR(local_root);
+   goto err;
+   }
+
+   ret = inode_item_info(inum, 0, local_root, swarn->path);
+   if (ret) {
+   btrfs_release_path(swarn->path);
+   goto err;
+   }
+
+   eb = swarn->path->nodes[0];
+   inode_item = btrfs_item_ptr(eb, swarn->path->slots[0],
+   struct btrfs_inode_item);
+   isize = btrfs_inode_size(eb, inode_item);
+   nlink = btrfs_inode_nlink(eb, inode_item);
+   btrfs_release_path(swarn->path);
+
+   ipath = init_ipath(4096, local_root, swarn->path);
+   ret = paths_from_inode(inum, ipath);
+
+   if (ret < 0)
+   goto err;
+
+   /*
+* we deliberately ignore the bit ipath might have been too small to
+* hold all of the paths here
+*/
+   for (i = 0; i < ipath->fspath->elem_cnt; ++i)
+   printk(KERN_WARNING "btrfs: %s at logical %llu on dev "
+   "%s, sector %llu, root %llu, inode %llu, offset %llu, "
+   "length %llu, links %u (path: %s)\n", swarn->errstr,
+   swarn->logical, swarn->dev->name,
+   (unsigned long long)swarn->sector, root, inum, offset,
+   min(isize - offset, (u64)PAGE_SIZE), nlink,
+   ipath->fspath->str[i]);
+
+   free_ipath(ipath);
+   return 0;
+
+err:
+   printk(KERN_WARNING "btrfs: %s at logical %llu on dev "
+   "%s, sector %llu, root %llu, inode %llu, offset %llu: path "
+   "resolving failed with ret=%d\n", swarn->errstr,
+   swarn->logical, swarn->dev->name,
+   (unsigned long long)swarn->sector, root, inum, offset, ret);
+
+   free_ipath(ipath);
+   return 0;
+}
+
+static void scrub_print_warning(const char *errstr, struct scrub_bio *sbio,
+   int ix)
+{
+   struct btrfs_device *dev = sbio->sdev->dev;
+   struct btrfs_fs_info *fs_info = dev->dev_root->fs_info;
+   struct btrfs_path *path;
+   struct btrfs_key found_key;
+   struct extent_buffer *eb;
+   struct btrfs_extent_item *ei;
+   struct scrub_warning swarn;
+   u32 item_size;
+   int ret;
+   u64 ref_root;
+   u8 ref_level;
+   unsigned long ptr = 0;
+   const int bufsize = 4096;
+   u64 extent_offset;
+
+   path = btrfs_alloc_path();
+
+   swarn.scratch_buf = kmalloc(bufsize, GFP_NOFS);
+   swarn.msg_buf = kmalloc(bufsize, GFP_NOFS);
+   swarn.sector = (sbio->physical + ix * PAGE_SIZE) >> 9;
+   swarn.logical = sbio->logical + ix * PAGE_SIZE;
+   swarn.errstr = errstr;
+   swarn.dev = dev;
+   swarn.msg_bufsize = bufsize;
+   swarn.scratch_bufsize = bufsize;
+
+ 

[PATCH v7 2/8] btrfs scrub: added unverified_errors

2011-07-22 Thread Jan Schmidt
In normal operation, scrub is reading data sequentially in large portions.
In case of an i/o error, we try to find the corrupted area(s) by issuing
page sized read requests. With this commit we increment the
unverified_errors counter if all of the small size requests succeed.
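
The accounting described above can be sketched in userspace C. This is a simplified model, not the scrub code: the per-page re-read is abstracted into a hypothetical callback (reread_fn), the checksum re-verification done by the real code is folded into that callback, and reread_from_mask exists only to make the example self-contained.

```c
#include <stdbool.h>

struct scrub_stats {
	unsigned long read_errors;
	unsigned long unverified_errors;
};

/* Hypothetical per-page re-read: returns true if the page reads back fine. */
typedef bool (*reread_fn)(int page_index, void *ctx);

/*
 * Called after a large sequential read failed. Re-reads every page on its
 * own; each page that still fails counts as a read error. If no page
 * fails, the original failure could not be reproduced and is counted as
 * an unverified error instead.
 */
static void account_failed_bulk_read(struct scrub_stats *st, int page_count,
				     reread_fn reread, void *ctx)
{
	int i;
	int bad = 0;

	for (i = 0; i < page_count; ++i) {
		if (!reread(i, ctx)) {
			++st->read_errors;
			++bad;
		}
	}
	if (!bad)
		++st->unverified_errors;
}

/* Example callback: ctx points to a bitmask with one bit per bad page. */
static bool reread_from_mask(int page_index, void *ctx)
{
	const unsigned int *bad_mask = ctx;

	return !(*bad_mask & (1u << page_index));
}
```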

Userland patches carrying such conspicuous events to the administrator should
already be around.

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/scrub.c |   37 ++---
 1 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index a8d03d5..35099fa 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -201,18 +201,25 @@ nomem:
  * recheck_error gets called for every page in the bio, even though only
  * one may be bad
  */
-static void scrub_recheck_error(struct scrub_bio *sbio, int ix)
+static int scrub_recheck_error(struct scrub_bio *sbio, int ix)
 {
+   struct scrub_dev *sdev = sbio->sdev;
+   u64 sector = (sbio->physical + ix * PAGE_SIZE) >> 9;
+
if (sbio->err) {
-   if (scrub_fixup_io(READ, sbio->sdev->dev->bdev,
-  (sbio->physical + ix * PAGE_SIZE) >> 9,
+   if (scrub_fixup_io(READ, sbio->sdev->dev->bdev, sector,
   sbio->bio->bi_io_vec[ix].bv_page) == 0) {
if (scrub_fixup_check(sbio, ix) == 0)
-   return;
+   return 0;
}
}
 
+   spin_lock(&sdev->stat_lock);
+   ++sdev->stat.read_errors;
+   spin_unlock(&sdev->stat_lock);
+
scrub_fixup(sbio, ix);
+   return 1;
 }
 
 static int scrub_fixup_check(struct scrub_bio *sbio, int ix)
@@ -382,8 +389,14 @@ static void scrub_checksum(struct btrfs_work *work)
int ret;
 
if (sbio->err) {
+   ret = 0;
for (i = 0; i < sbio->count; ++i)
-   scrub_recheck_error(sbio, i);
+   ret |= scrub_recheck_error(sbio, i);
+   if (!ret) {
+   spin_lock(&sdev->stat_lock);
+   ++sdev->stat.unverified_errors;
+   spin_unlock(&sdev->stat_lock);
+   }
 
sbio->bio->bi_flags &= ~(BIO_POOL_MASK - 1);
sbio->bio->bi_flags |= 1 << BIO_UPTODATE;
@@ -396,10 +409,6 @@ static void scrub_checksum(struct btrfs_work *work)
bi->bv_offset = 0;
bi->bv_len = PAGE_SIZE;
}
-
-   spin_lock(&sdev->stat_lock);
-   ++sdev->stat.read_errors;
-   spin_unlock(&sdev->stat_lock);
goto out;
}
for (i = 0; i < sbio->count; ++i) {
@@ -420,8 +429,14 @@ static void scrub_checksum(struct btrfs_work *work)
WARN_ON(1);
}
kunmap_atomic(buffer, KM_USER0);
-   if (ret)
-   scrub_recheck_error(sbio, i);
+   if (ret) {
+   ret = scrub_recheck_error(sbio, i);
+   if (!ret) {
+   spin_lock(&sdev->stat_lock);
+   ++sdev->stat.unverified_errors;
+   spin_unlock(&sdev->stat_lock);
+   }
+   }
}
 
 out:
-- 
1.7.3.4



[PATCH v7 5/8] btrfs: add mirror_num to extent_read_full_page

2011-07-22 Thread Jan Schmidt
Currently, extent_read_full_page always assumes we are trying to read mirror
0, which generally is the best we can do. To add flexibility, pass it as a
parameter. This will be needed by scrub fixup code.
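
The effect of threading a mirror number through a read helper can be sketched as follows. This is a toy model under stated assumptions: struct copies and read_copy are hypothetical, and treating mirror_num 0 as "no preference, take the default copy" is an assumption of this sketch rather than something the patch spells out.

```c
#include <stddef.h>

/* Hypothetical set of redundant copies of one block. */
struct copies {
	const char *data[4];
	int count;
};

/*
 * Read one copy. mirror_num 0 means "no preference" (the behaviour every
 * existing caller keeps by passing 0); 1..count selects a specific copy.
 * Returns NULL if the requested mirror does not exist.
 */
static const char *read_copy(const struct copies *c, int mirror_num)
{
	if (mirror_num < 0 || mirror_num > c->count)
		return NULL;
	if (mirror_num == 0)
		return c->count ? c->data[0] : NULL;
	return c->data[mirror_num - 1];
}
```

Existing callers stay unchanged in behaviour by passing 0, while a fixup path can ask for a particular copy.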

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/disk-io.c   |2 +-
 fs/btrfs/extent_io.c |6 +++---
 fs/btrfs/extent_io.h |2 +-
 fs/btrfs/inode.c |2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1ac8db5d..b898319 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -874,7 +874,7 @@ static int btree_readpage(struct file *file, struct page *page)
 {
struct extent_io_tree *tree;
tree = &BTRFS_I(page->mapping->host)->io_tree;
-   return extent_read_full_page(tree, page, btree_get_extent);
+   return extent_read_full_page(tree, page, btree_get_extent, 0);
 }
 
 static int btree_releasepage(struct page *page, gfp_t gfp_flags)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b181a94..b78f665 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2111,16 +2111,16 @@ static int __extent_read_full_page(struct extent_io_tree *tree,
 }
 
 int extent_read_full_page(struct extent_io_tree *tree, struct page *page,
-   get_extent_t *get_extent)
+   get_extent_t *get_extent, int mirror_num)
 {
struct bio *bio = NULL;
unsigned long bio_flags = 0;
int ret;
 
-   ret = __extent_read_full_page(tree, page, get_extent, &bio, 0,
+   ret = __extent_read_full_page(tree, page, get_extent, &bio, mirror_num,
  &bio_flags);
if (bio)
-   ret = submit_one_bio(READ, bio, 0, bio_flags);
+   ret = submit_one_bio(READ, bio, mirror_num, bio_flags);
return ret;
 }
 
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index a11a92e..22bf366 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -177,7 +177,7 @@ int unlock_extent_cached(struct extent_io_tree *tree, u64 start, u64 end,
 int try_lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
gfp_t mask);
 int extent_read_full_page(struct extent_io_tree *tree, struct page *page,
- get_extent_t *get_extent);
+ get_extent_t *get_extent, int mirror_num);
 int __init extent_io_init(void);
 void extent_io_exit(void);
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4a13730..730ee3d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6250,7 +6250,7 @@ int btrfs_readpage(struct file *file, struct page *page)
 {
struct extent_io_tree *tree;
tree = &BTRFS_I(page->mapping->host)->io_tree;
-   return extent_read_full_page(tree, page, btrfs_get_extent);
+   return extent_read_full_page(tree, page, btrfs_get_extent, 0);
 }
 
 static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
-- 
1.7.3.4



[PATCH v7 7/8] btrfs scrub: add fixup code for errors on nodatasum files

2011-07-22 Thread Jan Schmidt
This removes a FIXME comment and introduces the first part of nodatasum
fixup: It gets the corresponding inode for a logical address and triggers a
regular readpage for the corrupted sector.

Once we have on-the-fly error correction our error will be automatically
corrected. The correction code is expected to clear the newly introduced
EXTENT_DAMAGED flag, making scrub report that error as corrected instead
of uncorrectable eventually.
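
The marker protocol described above can be sketched in userspace C. This is a model only: RANGE_DAMAGED stands in for EXTENT_DAMAGED, struct range_state and the read_fn hook are hypothetical, and repairing_read plays the role of the future on-the-fly correction code that clears the flag.

```c
#include <stdbool.h>

#define RANGE_DAMAGED (1u << 13)	/* stand-in for EXTENT_DAMAGED */

struct range_state {
	unsigned int bits;
};

/* Hypothetical read hook; a repairing reader clears RANGE_DAMAGED. */
typedef int (*read_fn)(struct range_state *rs);

/*
 * Mark the range as damaged, trigger a regular read, then check whether
 * the reader cleared the marker.
 * Returns 1 if corrected, 0 if still damaged, negative on read error.
 */
static int fixup_by_read(struct range_state *rs, read_fn do_read)
{
	int ret;
	bool corrected;

	rs->bits |= RANGE_DAMAGED;		/* mark before reading */
	ret = do_read(rs);
	corrected = !(rs->bits & RANGE_DAMAGED);
	if (!corrected)
		rs->bits &= ~RANGE_DAMAGED;	/* do not leave the marker behind */
	if (ret < 0)
		return ret;
	return corrected ? 1 : 0;
}

/* Reader with error correction: repairs the data and clears the marker. */
static int repairing_read(struct range_state *rs)
{
	rs->bits &= ~RANGE_DAMAGED;
	return 0;
}

/* Reader without error correction: leaves the marker untouched. */
static int plain_read(struct range_state *rs)
{
	(void)rs;
	return 0;
}
```

Until a correcting reader exists, every fixup takes the plain_read path and the error is still reported as uncorrectable.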

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/extent_io.h |1 +
 fs/btrfs/scrub.c |  188 --
 2 files changed, 183 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 22bf366..2734fd9 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -17,6 +17,7 @@
 #define EXTENT_NODATASUM (1 << 10)
 #define EXTENT_DO_ACCOUNTING (1 << 11)
 #define EXTENT_FIRST_DELALLOC (1 << 12)
+#define EXTENT_DAMAGED (1 << 13)
 #define EXTENT_IOBITS (EXTENT_LOCKED | EXTENT_WRITEBACK)
 #define EXTENT_CTLBITS (EXTENT_DO_ACCOUNTING | EXTENT_FIRST_DELALLOC)
 
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 41a0114..db09f01 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -22,6 +22,7 @@
 #include "volumes.h"
 #include "disk-io.h"
 #include "ordered-data.h"
+#include "transaction.h"
 #include "backref.h"
 
 /*
@@ -89,6 +90,7 @@ struct scrub_dev {
int first_free;
int curr;
atomic_t in_flight;
+   atomic_t fixup_cnt;
spinlock_t  list_lock;
wait_queue_head_t   list_wait;
u16 csum_size;
@@ -102,6 +104,14 @@ struct scrub_dev {
spinlock_t  stat_lock;
 };
 
+struct scrub_fixup_nodatasum {
+   struct scrub_dev    *sdev;
+   u64 logical;
+   struct btrfs_root   *root;
+   struct btrfs_work   work;
+   int mirror_num;
+};
+
 struct scrub_warning {
struct btrfs_path   *path;
u64 extent_item_size;
@@ -190,12 +200,13 @@ struct scrub_dev *scrub_setup_dev(struct btrfs_device *dev)
 
if (i != SCRUB_BIOS_PER_DEV-1)
sdev->bios[i]->next_free = i + 1;
-else
+   else
sdev->bios[i]->next_free = -1;
}
sdev->first_free = 0;
sdev->curr = -1;
atomic_set(&sdev->in_flight, 0);
+   atomic_set(&sdev->fixup_cnt, 0);
atomic_set(&sdev->cancel_req, 0);
sdev->csum_size = btrfs_super_csum_size(&fs_info->super_copy);
INIT_LIST_HEAD(&sdev->csum_list);
@@ -347,6 +358,151 @@ out:
kfree(swarn.msg_buf);
 }
 
+static int scrub_fixup_readpage(u64 inum, u64 offset, u64 root, void *ctx)
+{
+   struct page *page;
+   unsigned long index;
+   struct scrub_fixup_nodatasum *fixup = ctx;
+   int ret;
+   int corrected;
+   struct btrfs_key key;
+   struct inode *inode;
+   u64 end = offset + PAGE_SIZE - 1;
+   struct btrfs_root *local_root;
+
+   key.objectid = root;
+   key.type = BTRFS_ROOT_ITEM_KEY;
+   key.offset = (u64)-1;
+   local_root = btrfs_read_fs_root_no_name(fixup->root->fs_info, &key);
+   if (IS_ERR(local_root))
+   return PTR_ERR(local_root);
+
+   key.type = BTRFS_INODE_ITEM_KEY;
+   key.objectid = inum;
+   key.offset = 0;
+   inode = btrfs_iget(fixup->root->fs_info->sb, &key, local_root, NULL);
+   if (IS_ERR(inode))
+   return PTR_ERR(inode);
+
+   ret = set_extent_bit(&BTRFS_I(inode)->io_tree, offset, end,
+   EXTENT_DAMAGED, 0, NULL, NULL, GFP_NOFS);
+
+   /* set_extent_bit should either succeed or give proper error */
+   WARN_ON(ret > 0);
+   if (ret)
+   return ret < 0 ? ret : -EFAULT;
+
+   index = offset >> PAGE_CACHE_SHIFT;
+
+   page = find_or_create_page(inode->i_mapping, index, GFP_NOFS);
+   if (!page)
+   return -ENOMEM;
+
+   ret = extent_read_full_page(&BTRFS_I(inode)->io_tree, page,
+   btrfs_get_extent, fixup->mirror_num);
+   wait_on_page_locked(page);
+   corrected = !test_range_bit(&BTRFS_I(inode)->io_tree, offset, end,
+   EXTENT_DAMAGED, 0, NULL);
+
+   if (corrected)
+   WARN_ON(!PageUptodate(page));
+   else
+   clear_extent_bit(&BTRFS_I(inode)->io_tree, offset, end,
+   EXTENT_DAMAGED, 0, 0, NULL, GFP_NOFS);
+
+   put_page(page);
+   iput(inode);
+
+   if (ret < 0)
+   return ret;
+
+   if (ret == 0 && corrected) {
+   /*
+* we only need to call readpage for one of the inodes belonging
+* to this extent. so make iterate_extent_inodes stop
+*/
+   return 1;
+   }

[PATCH v7 1/8] btrfs: added helper functions to iterate backrefs

2011-07-22 Thread Jan Schmidt
These helper functions iterate back references and call a function for each
backref. There is also a function to resolve an inode to a path in the
file system.
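
The callback style these helpers use can be sketched in userspace C. This is an illustration only: struct backref and iterate_refs are hypothetical, a flat array stands in for the extent tree walk, and the callback signature is modelled on the (inum, offset, root, ctx) callbacks used elsewhere in this series. As in patch 7/8, a nonzero return from the callback stops the iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened back reference. */
struct backref {
	uint64_t inum;
	uint64_t offset;
	uint64_t root;
};

typedef int (*backref_iter_fn)(uint64_t inum, uint64_t offset,
			       uint64_t root, void *ctx);

/*
 * Call fn once per back reference. A nonzero return value from fn stops
 * the iteration and is passed back to the caller; 0 means all refs were
 * visited.
 */
static int iterate_refs(const struct backref *refs, size_t n,
			backref_iter_fn fn, void *ctx)
{
	size_t i;
	int ret;

	for (i = 0; i < n; ++i) {
		ret = fn(refs[i].inum, refs[i].offset, refs[i].root, ctx);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: count every reference. */
static int count_refs(uint64_t inum, uint64_t offset, uint64_t root,
		      void *ctx)
{
	(void)inum; (void)offset; (void)root;
	++*(int *)ctx;
	return 0;
}

/* Example callback: handle the first reference only, then stop. */
static int stop_at_first(uint64_t inum, uint64_t offset, uint64_t root,
			 void *ctx)
{
	(void)inum; (void)offset; (void)root;
	++*(int *)ctx;
	return 1;
}
```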

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/Makefile  |3 +-
 fs/btrfs/backref.c |  748 
 fs/btrfs/backref.h |   62 +
 fs/btrfs/ioctl.h   |   10 +
 4 files changed, 822 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 9b72dcf..c63f649 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -7,4 +7,5 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
   extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \
   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
   export.o tree-log.o acl.o free-space-cache.o zlib.o lzo.o \
-  compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o
+  compression.o delayed-ref.o relocation.o delayed-inode.o backref.o \
+  scrub.o
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
new file mode 100644
index 000..477f154
--- /dev/null
+++ b/fs/btrfs/backref.c
@@ -0,0 +1,748 @@
+/*
+ * Copyright (C) 2011 STRATO.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include "ctree.h"
+#include "disk-io.h"
+#include "backref.h"
+
+struct __data_ref {
+   struct list_head list;
+   u64 inum;
+   u64 root;
+   u64 extent_data_item_offset;
+};
+
+struct __shared_ref {
+   struct list_head list;
+   u64 disk_byte;
+};
+
+static int __inode_info(u64 inum, u64 ioff, u8 key_type,
+   struct btrfs_root *fs_root, struct btrfs_path *path,
+   struct btrfs_key *found_key)
+{
+   int ret;
+   struct btrfs_key key;
+   struct extent_buffer *eb;
+
+   key.type = key_type;
+   key.objectid = inum;
+   key.offset = ioff;
+
+   ret = btrfs_search_slot(NULL, fs_root, &key, path, 0, 0);
+   if (ret < 0)
+       return ret;
+
+   eb = path->nodes[0];
+   if (ret && path->slots[0] >= btrfs_header_nritems(eb)) {
+       ret = btrfs_next_leaf(fs_root, path);
+       if (ret)
+           return ret;
+       eb = path->nodes[0];
+   }
+
+   btrfs_item_key_to_cpu(eb, found_key, path->slots[0]);
+   if (found_key->type != key.type || found_key->objectid != key.objectid)
+       return 1;
+
+   return 0;
+}
+
+/*
+ * this makes the path point to (inum INODE_ITEM ioff)
+ */
+int inode_item_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
+   struct btrfs_path *path)
+{
+   struct btrfs_key key;
+   return __inode_info(inum, ioff, BTRFS_INODE_ITEM_KEY, fs_root, path,
+   &key);
+}
+
+static int inode_ref_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
+   struct btrfs_path *path, int strict,
+   u64 *out_parent_inum,
+   struct extent_buffer **out_iref_eb,
+   int *out_slot)
+{
+   int ret;
+   struct btrfs_key found_key;
+
+   ret = __inode_info(inum, ioff, BTRFS_INODE_REF_KEY, fs_root, path,
+   &found_key);
+
+   if (!ret) {
+       if (out_slot)
+           *out_slot = path->slots[0];
+       if (out_iref_eb)
+           *out_iref_eb = path->nodes[0];
+       if (out_parent_inum)
+           *out_parent_inum = found_key.offset;
+   }
+   }
+
+   btrfs_release_path(path);
+   return ret;
+}
+
+/*
+ * this iterates to turn a btrfs_inode_ref into a full filesystem path. elements
+ * of the path are separated by '/' and the path is guaranteed to be
+ * 0-terminated. the path is only given within the current file system.
+ * Therefore, it never starts with a '/'. the caller is responsible to provide
+ * size bytes in dest. the dest buffer will be filled backwards. finally,
+ * the start point of the resulting string is returned. this pointer is within
+ * dest, normally.
+ * in case the path buffer would overflow, the pointer is decremented further
+ * as if output was written to the buffer, though no more output is actually
+ * generated. that way, the caller 
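
The patch text is cut off at this point in the archive. As a rough illustration of the backwards-filling scheme the comment describes, here is a userspace C sketch. It is not taken from the patch: prepend_component() and build_path_backwards() are hypothetical names, and the path components are passed in as a leaf-first array instead of being read from btrfs_inode_ref items.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: move the write position back by len bytes and copy
 * s there if the position is still inside dest. Once the position has
 * moved in front of dest it keeps being decremented, but nothing is
 * written any more.
 */
static long prepend_component(char *dest, long pos, const char *s, size_t len)
{
	pos -= (long)len;
	if (pos >= 0)
		memcpy(dest + pos, s, len);
	return pos;
}

/*
 * Build "a/b/c" from components given leaf first ({"c", "b", "a"}).
 * Returns the offset of the string start within dest. A negative return
 * value means the buffer was too small by that many bytes.
 */
long build_path_backwards(char *dest, size_t size,
			  const char *const *leaf_to_root, size_t n)
{
	long pos = (long)size - 1;
	size_t i;

	if (size == 0)
		return -1;
	dest[pos] = '\0';	/* the result is always 0-terminated */

	for (i = 0; i < n; i++) {
		if (i > 0)
			pos = prepend_component(dest, pos, "/", 1);
		pos = prepend_component(dest, pos, leaf_to_root[i],
					strlen(leaf_to_root[i]));
	}
	return pos;
}
```

With a 16-byte buffer and the components {"c", "b", "a"}, the string "a/b/c" ends up at offset 10. With a 4-byte buffer the function returns -2, so a caller can tell how much larger the buffer would have to be.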

Re: Broken btrfs?

2011-07-22 Thread Jan Schubert
On 07/22/2011 09:24 AM, Jan Schmidt wrote:
 Scrub should be printing inode numbers to your system log while
 detecting those errors. If you want to know the exact files corrupted,
 you can grab my patch set with subject Btrfs scrub: print path to
 corrupted files and trigger nodatasum fixup from the list and give it
 a try.

Cool Jan, this is exactly what I asked for in my original post.

Your patch set is against kernel sources (not btrfs-progs), right? I
took the opportunity to upgrade to the official 3.0 kernel, where your
patches applied and compiled without any issues. I also recompiled
btrfs-progs-unstable and ran a scrub.

This scrub completed without any errors:
# btrfs scrub status .
scrub status for 03201fc0-7695-4468-9a10-f61ad79f23ca
scrub started at Fri Jul 22 14:24:21 2011, running for 706 seconds
total bytes scrubbed: 158.01GB with 0 errors

Isn't this strange? This message was generated after rebooting the box
(due to a crash, see below). I remember seeing some more
information before the crash, but also 0 errors.

While the scrub was running I still saw csum errors in dmesg, but no
files associated with them:

Jul 22 14:17:50 toral kernel: btrfs no csum found for inode 199934 start 729088
Jul 22 14:17:50 toral kernel: btrfs csum failed ino 199934 off 729088 csum 3390946210 private 0
Jul 22 14:17:51 toral kernel: btrfs no csum found for inode 199934 start 24096768
Jul 22 14:17:51 toral kernel: btrfs csum failed ino 199934 off 24096768 csum 439962552 private 0
Jul 22 14:17:51 toral kernel: btrfs no csum found for inode 199934 start 24801280
Jul 22 14:17:51 toral kernel: btrfs no csum found for inode 199934 start 24805376
Jul 22 14:17:51 toral kernel: btrfs csum failed ino 199934 off 24801280 csum 158010657 private 0
Jul 22 14:17:51 toral kernel: btrfs csum failed ino 199934 off 24805376 csum 127231121 private 0

And sorry to say, it also crashed my box, throwing a kernel exception
with a reference to something like scrub_print_warning_inode (or similar),
which I could not find after rebooting. It seems my kernel.log and
all other logs are empty for the last 30 minutes, sorry.

What is the most current btrfs-progs git branch to use for further
investigation?

Thx,
Jan
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 11/16] Btrfs: clean up code for extent_map lookup

2011-07-22 Thread David Sterba
On Thu, Jul 14, 2011 at 11:18:15AM +0800, Li Zefan wrote:
 lookup_extent_map() and search_extent_map() can share most of code.
 
 Signed-off-by: Li Zefan l...@cn.fujitsu.com
 ---
  fs/btrfs/extent_map.c |   85 
 +
  1 files changed, 29 insertions(+), 56 deletions(-)
 
 diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
 index 911a9db..df7a803 100644
 --- a/fs/btrfs/extent_map.c
 +++ b/fs/btrfs/extent_map.c
 @@ -299,19 +299,8 @@ static u64 range_end(u64 start, u64 len)
   return start + len;
  }
  
 -/**
 - * lookup_extent_mapping - lookup extent_map
 - * @tree:tree to lookup in
 - * @start:   byte offset to start the search
 - * @len: length of the lookup range
 - *
 - * Find and return the first extent_map struct in @tree that intersects the
 - * [start, len] range.  There may be additional objects in the tree that
 - * intersect, so check the object returned carefully to make sure that no
 - * additional lookups are needed.
 - */
 -struct extent_map *lookup_extent_mapping(struct extent_map_tree *tree,
 -  u64 start, u64 len)
 +struct extent_map *__lookup_extent_mapping(struct extent_map_tree *tree,
 +u64 start, u64 len, int strict)

just minor thing: can be defined static

  {
   struct extent_map *em;
   struct rb_node *rb_node;
 @@ -320,38 +309,42 @@ struct extent_map *lookup_extent_mapping(struct extent_map_tree *tree,
   u64 end = range_end(start, len);
  
   rb_node = __tree_search(&tree->map, start, &prev, &next);
 - if (!rb_node && prev) {
 -     em = rb_entry(prev, struct extent_map, rb_node);
 -     if (end > em->start && start < extent_map_end(em))
 -         goto found;
 - }
 - if (!rb_node && next) {
 -     em = rb_entry(next, struct extent_map, rb_node);
 -     if (end > em->start && start < extent_map_end(em))
 -         goto found;
 - }
   if (!rb_node) {
 - em = NULL;
 - goto out;
 - }
 - if (IS_ERR(rb_node)) {
 - em = ERR_CAST(rb_node);
 - goto out;
 + if (prev)
 + rb_node = prev;
 + else if (next)
 + rb_node = next;
 + else
 + return NULL;
   }
 +
   em = rb_entry(rb_node, struct extent_map, rb_node);
 - if (end > em->start && start < extent_map_end(em))
 -     goto found;
  
 - em = NULL;
 - goto out;
 + if (strict && !(end > em->start && start < extent_map_end(em)))
 +     return NULL;
  
 -found:
   atomic_inc(&em->refs);
 -out:
   return em;
  }
  
  /**
 + * lookup_extent_mapping - lookup extent_map
 + * @tree:tree to lookup in
 + * @start:   byte offset to start the search
 + * @len: length of the lookup range
 + *
 + * Find and return the first extent_map struct in @tree that intersects the
 + * [start, len] range.  There may be additional objects in the tree that
 + * intersect, so check the object returned carefully to make sure that no
 + * additional lookups are needed.
 + */
 +struct extent_map *lookup_extent_mapping(struct extent_map_tree *tree,
 +  u64 start, u64 len)
 +{
 + return __lookup_extent_mapping(tree, start, len, 1);
 +}
 +
 +/**
   * search_extent_mapping - find a nearby extent map
   * @tree:tree to lookup in
   * @start:   byte offset to start the search
 @@ -365,27 +358,7 @@ out:
  struct extent_map *search_extent_mapping(struct extent_map_tree *tree,
u64 start, u64 len)
  {
 - struct extent_map *em;
 - struct rb_node *rb_node;
 - struct rb_node *prev = NULL;
 - struct rb_node *next = NULL;
 -
 - rb_node = __tree_search(&tree->map, start, &prev, &next);
 - if (!rb_node && prev) {
 -     em = rb_entry(prev, struct extent_map, rb_node);
 -     goto found;
 - }
 - if (!rb_node && next) {
 -     em = rb_entry(next, struct extent_map, rb_node);
 -     goto found;
 - }
 - if (!rb_node)
 -     return NULL;
 -
 - em = rb_entry(rb_node, struct extent_map, rb_node);
 -found:
 - atomic_inc(&em->refs);
 - return em;
 + return __lookup_extent_mapping(tree, start, len, 0);
  }
  
  /**
 -- 
 1.7.3.1
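
For readers skimming the thread, the shape of the refactoring (one shared helper plus a strict flag, with two thin wrappers) can be shown in a small userspace C sketch. This is only an illustration under assumptions: it uses a sorted array of non-overlapping ranges and a linear scan instead of the kernel rb-tree, the names struct range, lookup_range_common(), lookup_range() and search_range() are made up, and the choice of neighbour does not claim to match what __tree_search() returns.

```c
#include <stddef.h>
#include <stdint.h>

struct range {
	uint64_t start;
	uint64_t len;
};

static uint64_t range_end_of(const struct range *r)
{
	return r->start + r->len;
}

/*
 * Shared helper for both lookups. map must be sorted by start and must not
 * contain overlapping entries. The candidate is the first entry that ends
 * after start (it either contains start or follows it); if there is none,
 * the last entry is used. With strict set, the candidate is only returned
 * if it intersects [start, start + len).
 */
const struct range *lookup_range_common(const struct range *map, size_t n,
					uint64_t start, uint64_t len,
					int strict)
{
	const struct range *cand = NULL;
	uint64_t end = start + len;
	size_t i;

	if (n == 0)
		return NULL;

	for (i = 0; i < n; i++) {
		if (range_end_of(&map[i]) > start) {
			cand = &map[i];
			break;
		}
	}
	if (!cand)
		cand = &map[n - 1];

	if (strict && !(end > cand->start && start < range_end_of(cand)))
		return NULL;
	return cand;
}

/* strict variant: only return an entry that really intersects the range */
const struct range *lookup_range(const struct range *map, size_t n,
				 uint64_t start, uint64_t len)
{
	return lookup_range_common(map, n, start, len, 1);
}

/* relaxed variant: return a nearby entry even without an intersection */
const struct range *search_range(const struct range *map, size_t n,
				 uint64_t start, uint64_t len)
{
	return lookup_range_common(map, n, start, len, 0);
}
```

With the map {[0,10), [20,30)}, looking up start 12, len 3 returns NULL from lookup_range() but the second entry from search_range(), which is the difference the strict flag is meant to capture.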


[RFC PATCH 0/4] btrfs: Suggestion for raid auto-repair

2011-07-22 Thread Jan Schmidt
Hi all!

This is my suggestion how to do on the fly repair for corrupted raid setups. 
Currently, btrfs can cope with a hardware failure in a way that it tries to
find another mirror and ... that's it. The bad mirror always stays bad and your
data is lost when the last copy vanishes.

Here is where I got to in changing this. I built upon the retry code
originally used for data (inode.c), moved it to a more central place
(extent_io.c) and made it repair errors when possible. Those two steps are
currently included in patch 4, because what I actually did was somewhat more
iterative. If it helps reviewing, I can try to split that up into a move-commit
and a change-commit - just tell me if you'd like this.

To test this, I made some bad sectors with hdparm (data and metadata) and had
them corrected while reading the affected data. Anyway, this patch touches
critical parts and can potentially screw up your data in case I made an error
in determining the destination for corrective writes. You have been warned!
But please, try it anyway :-)

One remark concerning scrub: My latest scrub patches include a change that
triggers a regular page read to correct some kind of errors. This code is meant
to end up exactly in the error correction routines added here, too.

There are some special cases (nodatasum and a certain state of page cache) where
scrub comes across an error that it reports as uncorrectable although it is in
fact correctable. I have a patch for that as well, but as it is only relevant
when you combine those two patch series, I did not include it.
 
-Jan

Jan Schmidt (4):
  btrfs: btrfs_multi_bio replaced with btrfs_bio
  btrfs: Do not use bio->bi_bdev after submission
  btrfs: Put mirror_num in bi_bdev
  btrfs: Moved repair code from inode.c to extent_io.c

 fs/btrfs/extent-tree.c |   10 +-
 fs/btrfs/extent_io.c   |  386 +++-
 fs/btrfs/extent_io.h   |   11 ++-
 fs/btrfs/inode.c   |  155 +---
 fs/btrfs/scrub.c   |   20 ++--
 fs/btrfs/volumes.c |  130 +
 fs/btrfs/volumes.h |   10 +-
 7 files changed, 485 insertions(+), 237 deletions(-)

-- 
1.7.3.4
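
To make the idea in the cover letter concrete (read from another mirror when one copy is bad, then rewrite the bad copy instead of leaving it broken), here is a toy userspace C sketch. Everything in it is an assumption made for illustration: the in-memory struct mirror, the toy_csum() stand-in checksum and read_and_repair() are hypothetical and do not correspond to functions in this patch series.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE 8

struct mirror {
	uint8_t data[TOY_BLOCK_SIZE];
};

/* toy checksum, only a stand-in for a real data checksum */
uint32_t toy_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum = sum * 31 + buf[i];
	return sum;
}

/*
 * Read one block, starting with mirror "first". If a copy fails the
 * checksum, try the next mirror. Once a good copy is found, it is written
 * back over every mirror that was tried and found bad. Returns the index
 * of the mirror that supplied the data, or -1 if no copy is good.
 */
int read_and_repair(struct mirror *mirrors, int num, int first,
		    uint32_t expected, uint8_t *out)
{
	int i, good = -1;

	for (i = 0; i < num; i++) {
		int m = (first + i) % num;

		if (toy_csum(mirrors[m].data, TOY_BLOCK_SIZE) == expected) {
			good = m;
			break;
		}
	}
	if (good < 0)
		return -1;

	memcpy(out, mirrors[good].data, TOY_BLOCK_SIZE);

	/* corrective writes: only the mirrors that failed get rewritten */
	for (i = 0; i < num; i++) {
		int m = (first + i) % num;

		if (m == good)
			break;
		memcpy(mirrors[m].data, out, TOY_BLOCK_SIZE);
	}
	return good;
}
```

The point of the sketch is the second loop: without it the reader gets good data but the bad mirror stays bad, which is the behaviour the cover letter describes as the current state.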



[RFC PATCH 2/4] btrfs: Do not use bio->bi_bdev after submission

2011-07-22 Thread Jan Schmidt
The block layer modifies bio->bi_bdev and bio->bi_sector while working on
the bio; they do _not_ come back unmodified in the completion callback.

To call add_page, we need at least some bi_bdev set, which is why the code
was working, previously. With this patch, we use the latest_bdev from
fsinfo instead of the leftover in the bio. This gives us the possibility to
use the bi_bdev field for another purpose.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/inode.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4a13730..6ec7a93 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1916,7 +1916,7 @@ static int btrfs_io_failed_hook(struct bio *failed_bio,
    bio->bi_private = state;
    bio->bi_end_io = failed_bio->bi_end_io;
    bio->bi_sector = failrec->logical >> 9;
-   bio->bi_bdev = failed_bio->bi_bdev;
+   bio->bi_bdev = BTRFS_I(inode)->root->fs_info->fs_devices->latest_bdev;
    bio->bi_size = 0;
 
    bio_add_page(bio, page, failrec->len, start - page_offset(page));
-- 
1.7.3.4



[RFC PATCH 1/4] btrfs: btrfs_multi_bio replaced with btrfs_bio

2011-07-22 Thread Jan Schmidt
btrfs_bio is a bio abstraction able to split and not complete after the last
bio has returned (like the old btrfs_multi_bio). Additionally, btrfs_bio
tracks the mirror_num used to read data which can be used for error
correction purposes.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/extent-tree.c |   10 ++--
 fs/btrfs/scrub.c   |   20 
 fs/btrfs/volumes.c |  128 +--
 fs/btrfs/volumes.h |   10 +++-
 4 files changed, 90 insertions(+), 78 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 71cd456..351efb3 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1772,18 +1772,18 @@ static int btrfs_discard_extent(struct btrfs_root *root, u64 bytenr,
 {
    int ret;
    u64 discarded_bytes = 0;
-   struct btrfs_multi_bio *multi = NULL;
+   struct btrfs_bio *bbio = NULL;
 
 
    /* Tell the block device(s) that the sectors can be discarded */
    ret = btrfs_map_block(&root->fs_info->mapping_tree, REQ_DISCARD,
-                         bytenr, &num_bytes, &multi, 0);
+                         bytenr, &num_bytes, &bbio, 0);
    if (!ret) {
-       struct btrfs_bio_stripe *stripe = multi->stripes;
+       struct btrfs_bio_stripe *stripe = bbio->stripes;
        int i;
 
 
-       for (i = 0; i < multi->num_stripes; i++, stripe++) {
+       for (i = 0; i < bbio->num_stripes; i++, stripe++) {
            ret = btrfs_issue_discard(stripe->dev->bdev,
                                      stripe->physical,
                                      stripe->length);
@@ -1792,7 +1792,7 @@ static int btrfs_discard_extent(struct btrfs_root *root, u64 bytenr,
            else if (ret != -EOPNOTSUPP)
                break;
        }
-       kfree(multi);
+       kfree(bbio);
    }
    if (discarded_bytes && ret == -EOPNOTSUPP)
        ret = 0;
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index a8d03d5..c04775e 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -250,7 +250,7 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix)
    struct scrub_dev *sdev = sbio->sdev;
    struct btrfs_fs_info *fs_info = sdev->dev->dev_root->fs_info;
    struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
-   struct btrfs_multi_bio *multi = NULL;
+   struct btrfs_bio *bbio = NULL;
    u64 logical = sbio->logical + ix * PAGE_SIZE;
    u64 length;
    int i;
@@ -269,8 +269,8 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix)
 
    length = PAGE_SIZE;
    ret = btrfs_map_block(map_tree, REQ_WRITE, logical, &length,
-                         &multi, 0);
-   if (ret || !multi || length < PAGE_SIZE) {
+                         &bbio, 0);
+   if (ret || !bbio || length < PAGE_SIZE) {
        printk(KERN_ERR
               "scrub_fixup: btrfs_map_block failed us for %llu\n",
               (unsigned long long)logical);
@@ -278,19 +278,19 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix)
        return;
    }
 
-   if (multi->num_stripes == 1)
+   if (bbio->num_stripes == 1)
        /* there aren't any replicas */
        goto uncorrectable;
 
    /*
     * first find a good copy
     */
-   for (i = 0; i < multi->num_stripes; ++i) {
+   for (i = 0; i < bbio->num_stripes; ++i) {
        if (i == sbio->spag[ix].mirror_num)
            continue;
 
-       if (scrub_fixup_io(READ, multi->stripes[i].dev->bdev,
-                          multi->stripes[i].physical >> 9,
+       if (scrub_fixup_io(READ, bbio->stripes[i].dev->bdev,
+                          bbio->stripes[i].physical >> 9,
                           sbio->bio->bi_io_vec[ix].bv_page)) {
            /* I/O-error, this is not a good copy */
            continue;
@@ -299,7 +299,7 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix)
        if (scrub_fixup_check(sbio, ix) == 0)
            break;
    }
-   if (i == multi->num_stripes)
+   if (i == bbio->num_stripes)
        goto uncorrectable;
 
    if (!sdev->readonly) {
@@ -314,7 +314,7 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix)
        }
    }
 
-   kfree(multi);
+   kfree(bbio);
    spin_lock(&sdev->stat_lock);
    ++sdev->stat.corrected_errors;
    spin_unlock(&sdev->stat_lock);
@@ -325,7 +325,7 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix)
    return;
 
 uncorrectable:
-   kfree(multi);
+   kfree(bbio);
    spin_lock(&sdev->stat_lock);
    ++sdev->stat.uncorrectable_errors;
    spin_unlock(&sdev->stat_lock);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 19450bc..e839b72 100644
--- a/fs/btrfs/volumes.c

[RFC PATCH 3/4] btrfs: Put mirror_num in bi_bdev

2011-07-22 Thread Jan Schmidt
The error correction code wants to make sure that only the bad mirror is
rewritten. Thus, we need to know which mirror is the bad one. I did not
find a more appropriate field than bi_bdev. But I think using this is fine,
because it is modified by the block layer anyway, and should not be read
after the bio has returned.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/volumes.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e839b72..55fbd4d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3169,6 +3169,8 @@ static void btrfs_end_bio(struct bio *bio, int err)
        }
        bio->bi_private = bbio->private;
        bio->bi_end_io = bbio->end_io;
+       bio->bi_bdev = (struct block_device *)
+                   (unsigned long)bbio->mirror_num;
        /* only send an error to the higher layers if it is
         * beyond the tolerance of the multi-bio
         */
-- 
1.7.3.4
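
The cast in this hunk stores a small integer in a pointer-typed field and relies on the reader casting it back. A minimal userspace C sketch of that trick follows. It is an illustration only: struct fake_bio, stash_mirror_num() and fetch_mirror_num() are hypothetical names, and the sketch uses uintptr_t where the patch uses unsigned long.

```c
#include <stdint.h>

struct fake_bio {
	void *bi_bdev;	/* normally a device pointer, reused here as storage */
};

/* store a small non-negative integer in the pointer field */
void stash_mirror_num(struct fake_bio *bio, int mirror_num)
{
	bio->bi_bdev = (void *)(uintptr_t)mirror_num;
}

/* read the integer back; the field must not be dereferenced meanwhile */
int fetch_mirror_num(const struct fake_bio *bio)
{
	return (int)(uintptr_t)bio->bi_bdev;
}
```

The obvious cost of this design is that the field no longer holds a valid pointer, so any code that still treats bi_bdev as a device after completion would break, which is presumably why patch 2/4 removes such a use first.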



rw_semaphore performance, was: new metadata reader/writer locks in integration-test

2011-07-22 Thread Christoph Hellwig
On Tue, Jul 19, 2011 at 01:30:22PM -0400, Chris Mason wrote:
 We've seen a number of benchmarks dominated by contention on the root
 node lock.  This changes our locks into a simple reader/writer lock.
 They are based on mutexes so that we still take advantage of the mutex
 adaptive spins for write locks (rwsemaphores were much slower).

Interesting.  Have you set up some artificial benchmarks for this?

I wonder if the lack of adaptive spinning has something to do with the
slightly slower XFS performance on Joern's flash testing, given that
we extensively use the rw_semaphore as the primary I/O mutex, while
all others rely on plain mutexes as the primary synchronization
primitive.


Re: rw_semaphore performance, was: new metadata reader/writer locks in integration-test

2011-07-22 Thread Chris Mason
Excerpts from Christoph Hellwig's message of 2011-07-22 11:01:51 -0400:
 On Tue, Jul 19, 2011 at 01:30:22PM -0400, Chris Mason wrote:
  We've seen a number of benchmarks dominated by contention on the root
  node lock.  This changes our locks into a simple reader/writer lock.
  They are based on mutexes so that we still take advantage of the mutex
  adaptive spins for write locks (rwsemaphores were much slower).
 
 Interesting.  Do you have set up some artificial benchmarks for this?
 
 I wonder if the lack of adaptive spinning has something to do with the
 slightly slower XFS performance on Joern's flash testing, given that
 we extensively use the rw_semaphore as the primary I/O mutex, while
 all others rely on plain mutexes as the primary synchronization
 primitive.

For the rw locks I had three main tests.

1) dbench 10.  This is interesting only because it is mostly bound by how
quickly we can do metadata operations in ram.  There's not much IO and
there's a good mixture of read and write btree operations (about 50/50).
rwsemaphores ran at 200MB/s while my current code runs at 2400MB/s.

The old btrfs implementation runs at 3000MB/s.  We all love and hate
dbench, so I don't put a huge amount of stock in 2400 vs 3000.  But, 200
vs 2400...people notice that in real world stuff.

2) fs_mark doing parallel zero byte file creates.  No fsyncs here, all
metadata operations.  The old btrfs locking was completely bound by
getting write locks on the root node.  The new code is much better here,
overall about 30-50% faster.  I didn't do the rw semaphores on this one,
I'll give it a shot.

3) A stat-hammer program.  This creates a bunch of files in parallel,
and then times how long it takes us to stat all the inodes.  I went from
3s of CPU time down to .9s.  rwsems were about the same here (very
fast), but that's because it's 100% reader locks.

My money for Joern's benchmarks is end-io latencies.  xfs and btrfs are
doing more at endio time.  But I need to sit down and run them myself
and take a look.

-chris


Re: new metadata reader/writer locks in integration-test

2011-07-22 Thread Chris Mason
On Wed, Jul 20, 2011 at 05:36:09PM +0900, Tsutomu Itoh wrote:
 (2011/07/20 16:58), Chris Mason wrote:
  Excerpts from Tsutomu Itoh's message of 2011-07-19 22:08:38 -0400:
  (2011/07/20 2:30), Chris Mason wrote:
  Hi everyone,
 
  I've pushed out a new integration-test branch, and it includes a new
  reader/writer locking scheme for the btree locks.
 
  We've seen a number of benchmarks dominated by contention on the root
  node lock.  This changes our locks into a simple reader/writer lock.
  They are based on mutexes so that we still take advantage of the mutex
  adaptive spins for write locks (rwsemaphores were much slower).
 
  I'm also sending the individual commits, please do take a look.
 
  I pulled the new integration-test branch, and I got the following
  warning messages.
 
  Jul 20 10:03:30 luna kernel: [ cut here ]
  Jul 20 10:03:30 luna kernel: WARNING: at fs/btrfs/extent-tree.c:5704 
  btrfs_alloc_free_block+0x178/0x340 [btrfs]()
  
  Thanks, I think this one is related to Josef's enospc changes, but I'll
  double check.  
 
 What was the test?
 
 I ran my original test script. 
 This script concurrently executes the making deletion of a lot of files,
 and the making deletion of a big file, etc. 

I'm having a hard time triggering this with Josef's current patch (after
my rebase).

Could you please send along the reproduction script?

-chris



Re: new metadata reader/writer locks in integration-test

2011-07-22 Thread Arne Jansen
On 21.07.2011 07:44, Arne Jansen wrote:
 On 20.07.2011 19:21, Chris Mason wrote:
 Excerpts from Chris Mason's message of 2011-07-19 13:30:22 -0400:
 Hi everyone,

 I've pushed out a new integration-test branch, and it includes a new
 reader/writer locking scheme for the btree locks.

 We've seen a number of benchmarks dominated by contention on the root
 node lock.  This changes our locks into a simple reader/writer lock.
 They are based on mutexes so that we still take advantage of the mutex
 adaptive spins for write locks (rwsemaphores were much slower).

 I'm also sending the individual commits, please do take a look.

 Hi everyone,

 I just rebased Josef's enospc fixes into integration-test, it should fix
 the warnings in extent-tree.c

 
 With the current integration-test branch I get very early enospc on
 a 7G volume create with -m single -d single and
 
 fs_mark-3.3/fs_mark -d /mnt/fsm -D 512 -t 16 -n 4096 -s 51200 -L5 -S0 -R1
 
 It enospces at about 20%, but I can continue to fill it up to 94%.

I tried to bisect this, but it turned out to be hard. Sooner or later
I get this early enospc on every revision, on some sooner, on others
later. At least the current for-linus branch is much worse than
integration-test.

 
 -Arne
 --


Issues with KVM

2011-07-22 Thread Victor Stinner

 Hi,

I have a new fast computer to run many virtual machines. Everything 
looks very fast, except the installation of new operating systems in 
KVM. The installation is very fast until it begins to write to disk; from 
then on it seems to write slower and slower. I tried Debian, FreeBSD, 
OpenIndiana and OpenBSD: same problem. The FreeBSD installer displays 
the speed: it starts at 780 KB/sec (which is already very slow) and 
finishes between 1 and 8 KB/sec.


darksatanic suggested that I use the nodatacow mount option: it is not faster, 
and it looks even slower (fewer wsec/s in iostat output, see below).


The computer is an Intel i7 2600 (4 cores with hyper-threading: 8 
threads) with 12 GB of RAM and 2 hard drives of 1 TB (Western Digital Caviar 
Blue 1 TB, 7200 RPM, 32 MB cache, Serial ATA 6 Gb/s, WD10EALX). Both disks are 
connected to SATA 6 Gb/s connectors on a P67 chipset. I'm using 
RAID 0 with Linux software (MD) RAID, and I have a single btrfs 
partition of 2 TB. The host OS is Fedora 15 (Linux kernel 2.6.38).


I'm using hardware virtualisation with KVM. Debian is installed using 
virtio, so it should not be an issue with the hard drive driver of KVM.


I'm watching iostat during the Debian installation. With the default 
mount option, wsec/s starts at 49000 to finish near 42000 (on the MD 
device), %usage is greater than 50% of both disks (near 80% for sda, 
near 60% for sdb). Using nodatacow option, wsec/s starts at 12000 
(%usage  75%) to finish near 1 (%usage always  75%). It is slower, 
right? A sector is 512 bytes. The Debian image size is 40 GB, its type 
is raw. The system load is greater than 2 (or maybe 3) during the 
installation of the VM, while CPU usage is under 8% and wa% is also low 
(maybe 10% or lower, I don't remember).


bonnie++ output (on the Fedora host, not in a VM):

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP  K/sec %CP K/sec %CP K/sec %CP  K/sec %CP  /sec %CP
ned          24048M   346  98 220839  24 98489  19   245  84 251547  18 199.2 259
Latency             37256us     326ms     943ms     251ms     197ms     151ms
Version  1.96       ------Sequential Create------ --------Random Create--------
ned                 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 11128  34 +++++ +++ 16558  45 14006  40 +++++ +++ 17226  48
Latency             14997us     663us   11401us    8115us     282us   10105us

1.96,1.96,ned,1,1311365747,24048M,,346,98,220839,24,98489,19,245,84,251547,18,199.2,259,16,11128,34,+,+++,16558,45,14006,40,+,+++,17226,48,37256us,326ms,943ms,251ms,197ms,151ms,14997us,663us,11401us,8115us,282us,10105us

Do you have any idea why the %usage is so high (in iostat), while the 
speed looks so low? The disk speed during the installation is between 5 
MB/sec and 23 MB/sec, whereas the raw speed is greater than 200 MB/sec 
on the host (234 MB/sec according to hdparm -t /dev/md127, 220 MB/sec 
according to bonnie++ on sequential output).
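
For reference, the iostat figures quoted above can be converted to throughput with simple arithmetic, since a sector is 512 bytes. The C sketch below only shows that conversion; the function names are made up for illustration, and MiB is taken as 1024 * 1024 bytes.

```c
#include <stdint.h>

#define SECTOR_BYTES 512u

/* bytes written per second for a given iostat wsec/s value */
uint64_t wsec_to_bytes_per_sec(uint64_t wsec_per_sec)
{
	return wsec_per_sec * SECTOR_BYTES;
}

/* the same rate in MiB/s (1 MiB = 1024 * 1024 bytes) */
double wsec_to_mib_per_sec(uint64_t wsec_per_sec)
{
	return (double)wsec_to_bytes_per_sec(wsec_per_sec) / (1024.0 * 1024.0);
}
```

Plugging in the numbers from this mail, 49000 wsec/s is 25088000 bytes/s (a little under 24 MiB/s) and 12000 wsec/s is 6144000 bytes/s (a little under 6 MiB/s), which matches the 5 to 23 MB/sec range mentioned above.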


It's difficult to read bonnie++ output: random create is near 14 MB/sec 
if I read correctly. btrfs behaves maybe very badly with a raw image of 
40 GB, especially on RAID 0 with 2 disks?


Should I try other KVM option (e.g. use another type of image)? Try 
btrfs RAID instead of Linux MD RAID? Try to disable some CPU cores? Or 
maybe not using btrfs for KVM images? :-)


Victor




Re: Issues with KVM

2011-07-22 Thread C Anthony Risinger
On Fri, Jul 22, 2011 at 2:44 PM, Victor Stinner
victor.stin...@haypocalc.com wrote:
  Hi,

 I have a new fast computer to run many virtual machines. Everything looks
 very fast, except the installation of new operating systems in KVM. The
 installation is very fast until it begins to write on disk. It looks like it
 writes slower and slower. I tried Debian, FreeBSD, OpenIndiana and OpenBSD:
 same problem. The FreeBSD installer displays the speed: it starts at 780
 KB/sec (which it already very slow) to finish between 1 and 8 KB/sec.

) is the host FS btrfs?
) are virtio modules in the initramfs (or kernel probably)?
) are you sure virtio is being used (eg. are the disks called vdX vs sdX)?
) is the disk bus set to virtio (virtmanager)?
) is the disk's cache mode set to none [or maybe writeback] (virtmanager)?
) is the disk's storage format set to raw, should be (virtmanager)?
) is caching enabled on the image? ()

you probably need to change the cache mode on the disk, or if the host is
btrfs you need to flag the image with whatever is needed to prevent
continuous COWing.

C Anthony


Re: Issues with KVM

2011-07-22 Thread C Anthony Risinger
On Fri, Jul 22, 2011 at 2:59 PM, C Anthony Risinger anth...@xtfx.me wrote:
 On Fri, Jul 22, 2011 at 2:44 PM, Victor Stinner
 victor.stin...@haypocalc.com wrote:

 ) is caching enabled on the image? ()

oops, disregard that ... remainder left over from editing copy/paste :-)

C Anthony


Re: Issues with KVM

2011-07-22 Thread Josef Bacik
On 07/22/2011 03:44 PM, Victor Stinner wrote:
  Hi,
 
 I have a new fast computer to run many virtual machines. Everything
 looks very fast, except the installation of new operating systems in
 KVM. The installation is very fast until it begins to write on disk. It
 looks like it writes slower and slower. I tried Debian, FreeBSD,
 OpenIndiana and OpenBSD: same problem. The FreeBSD installer displays
 the speed: it starts at 780 KB/sec (which it already very slow) to
 finish between 1 and 8 KB/sec.
 
 darksatanic suggested me to use nodatacow mount flag: it is not faster,
 and it looks even slower (fewer wsec/s in iostat output, see below).
 
 The computer is an Intel i7 2600 (4 cores with hyper threading: 8
 threads), 12 GB or RAM, 2 hard drives of 1 TB (Western Digital Caviar
 Blue 1 To 7200 RPM 32 Mo Serial ATA 6Gb/s - WD10EALX). Both disks are
 connected to SATA 6 GB/sec connectors using a P67 chipset. I'm using
 RAID 0 with Linux software (MD) RAID, and I have one unique btrfs
 partition of 2 TB. The host OS is Fedora 15 (Linux kernel 2.6.38).
 
 I'm using hardware virtualisation with KVM. Debian is installed using
 virtio, so it should not be an issue with the hard drive driver of KVM.
 
 I'm watching iostat during the Debian installation. With the default
 mount option, wsec/s starts at 49000 to finish near 42000 (on the MD
 device), %usage is greater than 50% of both disks (near 80% for sda,
 near 60% for sdb). Using nodatacow option, wsec/s starts at 12000
 (%usage  75%) to finish near 1 (%usage always  75%). It is slower,
 right? A sector is 512 bytes. The Debian image size is 40 GB, its type
 is raw. The system load is greater than 2 (or maybe 3) during the
 installation of the VM, while CPU usage is under 8% and wa% is also low
 (maybe 10% or lower, I don't remember).
 
 bonnie++ output (on the Fedora host, not in a VM):
 
 Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
 Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
 Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
 ned          24048M   346  98 220839 24 98489  19   245  84 251547 18 199.2 259
 Latency             37256us     326ms     943ms     251ms     197ms     151ms
 Version  1.96       ------Sequential Create------ --------Random Create--------
 ned                 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  16 11128  34     + +++ 16558  45 14006  40     + +++ 17226  48
 Latency             14997us     663us   11401us    8115us     282us   10105us
 1.96,1.96,ned,1,1311365747,24048M,,346,98,220839,24,98489,19,245,84,251547,18,199.2,259,16,11128,34,+,+++,16558,45,14006,40,+,+++,17226,48,37256us,326ms,943ms,251ms,197ms,151ms,14997us,663us,11401us,8115us,282us,10105us
 
 
 Do you have any idea why the %usage is so high (in iostat), while the
 speed looks so low? The disk speed during the installation is between 5
 MB/sec and 23 MB/sec, whereas the raw speed is greater than 200 MB/sec
 on the host (234 MB/sec according to hdparm -t /dev/md127, 220 MB/sec
 according to bonnie++ on sequential output).
 
 It's difficult to read bonnie++ output: random create is near 14 MB/sec
 if I read correctly. Maybe btrfs behaves very badly with a raw image of
 40 GB, especially on RAID 0 with 2 disks?
 
 Should I try another KVM option (e.g. another type of image)? Try
 btrfs RAID instead of Linux MD RAID? Try disabling some CPU cores? Or
 maybe not use btrfs for KVM images? :-)
 

Use the KVM option cache=none for your device.  Granted, it's still
going to be slow, but it should be a bit faster.  Thanks,

Josef


Re: Issues with KVM

2011-07-22 Thread Morten P.D. Stevens

On Fri, 22 Jul 2011 21:44:24 +0200, Victor Stinner wrote:

 Should I try another KVM option (e.g. another type of image)? Try
 btrfs RAID instead of Linux MD RAID? Try disabling some CPU cores? Or
 maybe not use btrfs for KVM images? :-)


Hi,

I would suggest the following:

- qemu-img create -f qcow2 -o size=400,preallocation=metadata vdisk.img

- disk: cache=none
- controller: virtio
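
The disk settings in the list above could be combined into a single QEMU -drive option. A minimal Python sketch, assuming the image is the vdisk.img created above; the helper name and the qemu-kvm binary name are assumptions for illustration, not something taken from this thread:

```python
def drive_option(image_path: str,
                 image_format: str = "qcow2",
                 interface: str = "virtio",
                 cache: str = "none") -> str:
    """Build the value of a QEMU -drive option from the suggested settings."""
    # QEMU treats a comma as an option separator, so a literal comma in the
    # file name has to be doubled.
    escaped_path = image_path.replace(",", ",,")
    return (f"file={escaped_path},format={image_format},"
            f"if={interface},cache={cache}")


# Hypothetical command line; the binary name differs between distributions.
command = ["qemu-kvm", "-drive", drive_option("vdisk.img")]
print(" ".join(command))
```

With cache=none the host page cache is bypassed for the image file, so guest writes are not cached a second time on the host.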

Best regards,

Morten


Re: Linux 3.0 release - btrfs possible locking deadlock

2011-07-22 Thread Ed Tomlinson
On Thursday 21 July 2011 22:59:53 Linus Torvalds wrote:
 So there it is. Gone are the 2.6.bignum days, and 3.0 is out.
 

Hi,

I managed to get this while rsyncing from ext4 to a btrfs filesystem
spanning three partitions using raid1.

[16018.211493] device fsid f7186eeb-60df-4b1a-890a-4a1eb42f81fe devid 1 transid 10 /dev/sdd4
[16018.230643] btrfs: use lzo compression
[16018.234619] btrfs: enabling disk space caching
[25949.414011] 
[25949.414011] =======================================================
[25949.416549] [ INFO: possible circular locking dependency detected ]
[25949.423187] 3.0.0-crc+ #348
[25949.423187] -------------------------------------------------------
[25949.423187] rsync/20237 is trying to acquire lock:
[25949.423187]  (btrfs-extent-01){+.+...}, at: [a047ce88] btrfs_try_spin_lock+0x78/0xb0 [btrfs]
[25949.423187] 
[25949.423187] but task is already holding lock:
[25949.423187]  (&(&eb->lock)->rlock){+.+...}, at: [a047cee2] btrfs_clear_lock_blocking+0x22/0x30 [btrfs]
[25949.423187] 
[25949.423187] which lock already depends on the new lock.
[25949.423187] 
[25949.423187] 
[25949.423187] the existing dependency chain (in reverse order) is:
[25949.423187] 
[25949.423187] -> #1 (&(&eb->lock)->rlock){+.+...}:
[25949.423187]        [8108bb75] lock_acquire+0x95/0x140
[25949.423187]        [815792eb] _raw_spin_lock+0x3b/0x50
[25949.423187]        [a047ce88] btrfs_try_spin_lock+0x78/0xb0 [btrfs]
[25949.423187]        [a0427959] btrfs_search_slot+0x2e9/0x800 [btrfs]
[25949.423187]        [a0433bee] lookup_inline_extent_backref+0xbe/0x490 [btrfs]
[25949.423187]        [a0434cbb] __btrfs_free_extent+0x13b/0x900 [btrfs]
[25949.423187]        [a0435ca3] run_clustered_refs+0x823/0xaf0 [btrfs]
[25949.423187]        [a043603d] btrfs_run_delayed_refs+0xcd/0x290 [btrfs]
[25949.423187]        [a0445ecb] btrfs_commit_transaction+0x8b/0x9d0 [btrfs]
[25949.423187]        [a0440c06] transaction_kthread+0x2b6/0x2e0 [btrfs]
[25949.423187]        [81071536] kthread+0xb6/0xc0
[25949.423187]        [81582314] kernel_thread_helper+0x4/0x10
[25949.423187] 
[25949.423187] -> #0 (btrfs-extent-01){+.+...}:
[25949.423187]        [8108b468] __lock_acquire+0x1588/0x16a0
[25949.423187]        [8108bb75] lock_acquire+0x95/0x140
[25949.423187]        [815792eb] _raw_spin_lock+0x3b/0x50
[25949.423187]        [a047ce88] btrfs_try_spin_lock+0x78/0xb0 [btrfs]
[25949.423187]        [a0427959] btrfs_search_slot+0x2e9/0x800 [btrfs]
[25949.423187]        [a0439dd2] btrfs_lookup_dir_item+0x82/0x120 [btrfs]
[25949.423187]        [a04532a5] btrfs_lookup_dentry+0xc5/0x4c0 [btrfs]
[25949.423187]        [a04536c4] btrfs_lookup+0x24/0x70 [btrfs]
[25949.423187]        [8115a863] d_alloc_and_lookup+0xc3/0x100
[25949.423187]        [8115cfa0] do_lookup+0x260/0x480
[25949.423187]        [8115d540] walk_component+0x60/0x1f0
[25949.423187]        [8115e7aa] path_lookupat+0xea/0x620
[25949.423187]        [8115ed15] do_path_lookup+0x35/0x1c0
[25949.423187]        [8115fc38] user_path_at+0x98/0xe0
[25949.423187]        [81153fac] vfs_fstatat+0x4c/0x90
[25949.423187]        [8115405e] vfs_lstat+0x1e/0x20
[25949.423187]        [81154084] sys_newlstat+0x24/0x50
[25949.423187]        [815814eb] system_call_fastpath+0x16/0x1b
[25949.423187] 
[25949.423187] other info that might help us debug this:
[25949.423187] 
[25949.423187]  Possible unsafe locking scenario:
[25949.423187] 
[25949.423187]        CPU0                    CPU1
[25949.423187]        ----                    ----
[25949.423187]   lock(&(&eb->lock)->rlock);
[25949.423187]                                lock(btrfs-extent-01);
[25949.423187]                                lock(&(&eb->lock)->rlock);
[25949.423187]   lock(btrfs-extent-01);
[25949.423187] 
[25949.423187]  *** DEADLOCK ***
[25949.423187] 
[25949.423187] 2 locks held by rsync/20237:
[25949.423187]  #0:  (&sb->s_type->i_mutex_key#14){+.+.+.}, at: [8115cf5a] do_lookup+0x21a/0x480
[25949.423187]  #1:  (&(&eb->lock)->rlock){+.+...}, at: [a047cee2] btrfs_clear_lock_blocking+0x22/0x30 [btrfs]
[25949.423187] 
[25949.423187] stack backtrace:
[25949.423187] Pid: 20237, comm: rsync Not tainted 3.0.0-crc+ #348
[25949.423187] Call Trace:
[25949.423187]  [810887de] print_circular_bug+0x20e/0x2f0
[25949.423187]  [8108b468] __lock_acquire+0x1588/0x16a0
[25949.423187]  [a0441ebb] ? verify_parent_transid+0xcb/0x290 [btrfs]
[25949.423187]  [a047ce88] ? btrfs_try_spin_lock+0x78/0xb0 [btrfs]
[25949.423187]  [8108bb75] lock_acquire+0x95/0x140
[25949.423187]  [a047ce88] ? btrfs_try_spin_lock+0x78/0xb0 [btrfs]
[25949.423187]  [815792eb] _raw_spin_lock+0x3b/0x50
[25949.423187]  [a047ce88] ? btrfs_try_spin_lock+0x78/0xb0 [btrfs]