Re: [PATCH v3 3/6] btrfs: add scrub code and prototypes

2011-03-16 Thread Ilya Dryomov
On Sat, Mar 12, 2011 at 03:50:42PM +0100, Arne Jansen wrote:

All extent maps are leaked after unmount.  lookup_extent_mapping() takes
a reference, so they are all left with em->refs = 1.  I suggest the
following:

 +static noinline_for_stack int scrub_chunk(struct scrub_dev *sdev, 
 + u64 chunk_tree, u64 chunk_objectid, u64 chunk_offset, u64 length)
 +{
 + struct btrfs_mapping_tree *map_tree =
 + &sdev->dev->dev_root->fs_info->mapping_tree;
 + struct map_lookup *map;
 + struct extent_map *em;
 + int i;
 + int ret;
 +
 + read_lock(&map_tree->map_tree.lock);
 + em = lookup_extent_mapping(&map_tree->map_tree, chunk_offset, 1);
 + read_unlock(&map_tree->map_tree.lock);

 + if (!em)
 + return -EINVAL;

 + map = (struct map_lookup *)em->bdev;
 + if (em->start != chunk_offset)
 + return -EINVAL;

 + if (em->len < length)
 + return -EINVAL;

 + for (i = 0; i < map->num_stripes; ++i) {
 + if (map->stripes[i].dev == sdev->dev) {
 + ret = scrub_stripe(sdev, map, i, chunk_offset, length);
 + if (ret)
 + return ret;
 + }
 + }

+   free_extent_map(em);

 + return 0;
 +}
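
(For completeness: lookup_extent_mapping() returns the map with an elevated
refcount, so the early -EINVAL returns above would still leak it. A minimal
sketch of the same function with a single exit path, so the reference is
dropped on every return; names and helpers as in the quoted code:)

	static noinline_for_stack int scrub_chunk(struct scrub_dev *sdev,
			u64 chunk_tree, u64 chunk_objectid, u64 chunk_offset,
			u64 length)
	{
		struct btrfs_mapping_tree *map_tree =
			&sdev->dev->dev_root->fs_info->mapping_tree;
		struct map_lookup *map;
		struct extent_map *em;
		int i;
		int ret = 0;

		read_lock(&map_tree->map_tree.lock);
		em = lookup_extent_mapping(&map_tree->map_tree, chunk_offset, 1);
		read_unlock(&map_tree->map_tree.lock);

		if (!em)
			return -EINVAL;

		map = (struct map_lookup *)em->bdev;
		if (em->start != chunk_offset || em->len < length) {
			ret = -EINVAL;
			goto out;
		}

		for (i = 0; i < map->num_stripes; ++i) {
			if (map->stripes[i].dev == sdev->dev) {
				ret = scrub_stripe(sdev, map, i, chunk_offset,
						   length);
				if (ret)
					goto out;
			}
		}
	out:
		/* drop the reference taken by lookup_extent_mapping() */
		free_extent_map(em);
		return ret;
	}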

Thanks,

Ilya


Re: [PATCH v3 3/6] btrfs: add scrub code and prototypes

2011-03-16 Thread Andi Kleen

Arne Jansen <sensi...@gmx.net> writes:
 +  */
 + mutex_lock(&fs_info->scrub_lock);
 + atomic_inc(&fs_info->scrubs_running);
 + mutex_unlock(&fs_info->scrub_lock);

It seems odd to protect an atomic_inc with a mutex. 
Is that done for some side effect? Otherwise you either
don't need atomic or don't need the lock.

That seems to be all over the source file.

 +int btrfs_scrub_pause(struct btrfs_root *root)
 +{
 + struct btrfs_fs_info *fs_info = root->fs_info;
 + mutex_lock(&fs_info->scrub_lock);


As I understand it you take that mutex on every transaction
commit, which is a fast path for normal IO. 

For me that looks like a scalability problem with enough
cores. Did you do any performance testing of this on a system
with a reasonable number of cores?

btrfs already has enough scalability problems, please don't
add new ones.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: [PATCH v3 3/6] btrfs: add scrub code and prototypes

2011-03-16 Thread Arne Jansen

On 16.03.2011 23:07, Andi Kleen wrote:

Arne Jansen <sensi...@gmx.net> writes:

+*/
+   mutex_lock(&fs_info->scrub_lock);
+   atomic_inc(&fs_info->scrubs_running);
+   mutex_unlock(&fs_info->scrub_lock);

It seems odd to protect an atomic_inc with a mutex.
Is that done for some side effect? Otherwise you either
don't need atomic or don't need the lock.


The reason it is atomic is because it is checked inside a wait_event,
where I can't hold a lock. The mutex is there to protect the check
in btrfs_scrub_pause and btrfs_scrub_cancel. But, now that I think
of it, there is still a race condition left. I'll rethink the locking there
and see if I can eliminate some of the mutex_locks.
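
(For readers following along, the pattern being described is roughly the
following. scrubs_running and scrub_lock are the fields from the patch; the
wait queue name and the wake-up placement are illustrative, not taken from
the posted code:)

	/* worker side: announce/retire a running scrub */
	mutex_lock(&fs_info->scrub_lock);
	atomic_inc(&fs_info->scrubs_running);
	mutex_unlock(&fs_info->scrub_lock);

	/* ... do the actual scrubbing ... */

	mutex_lock(&fs_info->scrub_lock);
	atomic_dec(&fs_info->scrubs_running);
	mutex_unlock(&fs_info->scrub_lock);
	wake_up(&fs_info->scrub_wait);		/* illustrative wait queue */

	/* pause/cancel side: wait_event() cannot sleep with scrub_lock held,
	 * so the counter has to be readable without the mutex, hence the
	 * atomic_t */
	wait_event(fs_info->scrub_wait,
		   atomic_read(&fs_info->scrubs_running) == 0);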

That seems to be all over the source file.


+int btrfs_scrub_pause(struct btrfs_root *root)
+{
+   struct btrfs_fs_info *fs_info = root->fs_info;
+   mutex_lock(&fs_info->scrub_lock);


As I understand it you take that mutex on every transaction
commit, which is a fast path for normal IO.


A transaction commit only happens every 30 seconds. At that point,
all outstanding data gets flushed and the super blocks are written. I only
pause the scrub in a very late phase of the commit, when the commit is
already single-threaded.
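
(Illustratively, that placement would look something like the following in
the tail of the commit path. Only the pause/continue calls come from the
patch's prototypes; the surrounding step is paraphrased from the description
above, with a hypothetical name:)

	/* late, already single-threaded phase of the transaction commit */
	btrfs_scrub_pause_super(root);	/* keep scrub away from the supers */
	ret = write_the_super_blocks(root);	/* hypothetical commit step */
	btrfs_scrub_continue_super(root);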

Apart from that, you can be sure that scrub will have an impact on
performance, as it keeps the disks 100% busy.
To mitigate this, all scrub activity happens inside ioctls. The idea is that
the user can then control the impact of the scrub using ionice.

--Arne


For me that looks like a scalability problem with enough
cores. Did you do any performance testing of this on a system
with a reasonable number of cores?

btrfs already has enough scalability problems, please don't
add new ones.

-Andi





Re: [PATCH v3 3/6] btrfs: add scrub code and prototypes

2011-03-14 Thread Arne Jansen

On 14.03.2011 00:50, Ilya Dryomov wrote:

On Sat, Mar 12, 2011 at 03:50:42PM +0100, Arne Jansen wrote:

This is the main scrub code.



+   int nstripes;


It should be u64 nstripes.  A few lines below you assign the stripe
length, which is a 64-bit quantity, and do_div() expects a 64-bit
dividend.  This fixes a serious breakage on x86-32.
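
(For context: do_div() divides a 64-bit dividend in place and returns the
32-bit remainder, so the variable it operates on has to be u64. Roughly,
with stripe_len used here only as an illustrative map_lookup field:)

	u64 nstripes;

	nstripes = length;			/* length is u64 */
	do_div(nstripes, map->stripe_len);	/* nstripes becomes the
						 * quotient, do_div() returns
						 * the remainder; with a
						 * 32-bit nstripes this breaks
						 * on x86-32 */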


Thanks.



Overall, this function could use a good cleanup.  I can send you a patch
later unless you want to do it yourself.


Which parts do you have in mind? I see only potential for minor cleanup,
but as I wrote it, I might as well be blind to it ;) So when you
send the patch I'll probably see what you mean.
The part I'm not content with is the prefetching, as it is still too
slow.




+   int start_stripe;
+   struct extent_buffer *l;
+   struct btrfs_key key;
+   u64 physical;
+   u64 logical;
+   u64 generation;
+   u64 mirror_num;


And a minor thing on the whole series: u64 should be printed with %llu
and an explicit cast to unsigned long long.
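
(i.e. something along these lines, with the message text made up for
illustration:)

	printk(KERN_ERR "btrfs: scrub found a bad block at logical %llu\n",
	       (unsigned long long)logical);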


fixed.



Thanks,

Ilya




[PATCH v3 3/6] btrfs: add scrub code and prototypes

2011-03-12 Thread Arne Jansen
This is the main scrub code.

Updates v3:

 - fixed EIO handling, need to reallocate bio instead of reusing it
 - Updated according to David Sterba's review
 - don't clobber bi_flags on reuse, just set UPTODATE
 - allocate long living bios with bio_kmalloc instead of bio_alloc

Signed-off-by: Arne Jansen <sensi...@gmx.net>
---
 fs/btrfs/Makefile |2 +-
 fs/btrfs/ctree.h  |   14 +
 fs/btrfs/scrub.c  | 1497 +
 3 files changed, 1512 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 31610ea..8fda313 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -7,4 +7,4 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
   extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \
   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
   export.o tree-log.o acl.o free-space-cache.o zlib.o lzo.o \
-  compression.o delayed-ref.o relocation.o
+  compression.o delayed-ref.o relocation.o scrub.o
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4c99834..030c321 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2610,4 +2610,18 @@ void btrfs_reloc_pre_snapshot(struct btrfs_trans_handle *trans,
  u64 *bytes_to_reserve);
 void btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans,
  struct btrfs_pending_snapshot *pending);
+
+/* scrub.c */
+int btrfs_scrub_dev(struct btrfs_root *root, u64 devid, u64 start, u64 end,
+struct btrfs_scrub_progress *progress);
+int btrfs_scrub_pause(struct btrfs_root *root);
+int btrfs_scrub_pause_super(struct btrfs_root *root);
+int btrfs_scrub_continue(struct btrfs_root *root);
+int btrfs_scrub_continue_super(struct btrfs_root *root);
+int btrfs_scrub_cancel(struct btrfs_root *root);
+int btrfs_scrub_cancel_dev(struct btrfs_root *root, struct btrfs_device *dev);
+int btrfs_scrub_cancel_devid(struct btrfs_root *root, u64 devid);
+int btrfs_scrub_progress(struct btrfs_root *root, u64 devid,
+ struct btrfs_scrub_progress *progress);
+
 #endif
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
new file mode 100644
index 000..915641d
--- /dev/null
+++ b/fs/btrfs/scrub.c
@@ -0,0 +1,1497 @@
+/*
+ * Copyright (C) 2011 STRATO.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include <linux/sched.h>
+#include <linux/pagemap.h>
+#include <linux/writeback.h>
+#include <linux/blkdev.h>
+#include <linux/rbtree.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+#include "ctree.h"
+#include "volumes.h"
+#include "disk-io.h"
+#include "ordered-data.h"
+
+/*
+ * This is only the first step towards a full-featured scrub. It reads all
+ * extents and super blocks and verifies the checksums. In case a bad checksum
+ * is found or the extent cannot be read, good data will be written back if
+ * any can be found.
+ *
+ * Future enhancements:
+ *  - To enhance the performance, better read-ahead strategies for the
+ *    extent-tree can be employed.
+ *  - In case an unrepairable extent is encountered, track which files are
+ *    affected and report them
+ *  - In case of a read error on files with nodatasum, map the file and read
+ *    the extent to trigger a writeback of the good copy
+ *  - track and record media errors, throw out bad devices
+ *  - add a readonly mode
+ *  - add a mode to also read unallocated space
+ */
+
+#ifdef SCRUB_BTRFS_WORKER
+typedef struct btrfs_work scrub_work_t;
+#define SCRUB_INIT_WORK(work, fn) do { (work)->func = (fn); } while (0)
+#define SCRUB_QUEUE_WORK(wq, w) do { btrfs_queue_worker((wq), w); } while (0)
+#else
+typedef struct work_struct scrub_work_t;
+#define SCRUB_INIT_WORK INIT_WORK
+#define SCRUB_QUEUE_WORK queue_work
+#endif
+
+struct scrub_bio;
+struct scrub_page;
+struct scrub_dev;
+struct scrub_fixup;
+static void scrub_bio_end_io(struct bio *bio, int err);
+static void scrub_checksum(scrub_work_t *work);
+static int scrub_checksum_data(struct scrub_dev *sdev,
+   struct scrub_page *spag, void *buffer);
+static int scrub_checksum_tree_block(struct scrub_dev *sdev,
+ struct scrub_page *spag, u64 logical,
+ void *buffer);
+static int scrub_checksum_super(struct