[PATCH] Btrfs-progs: fix segfault when getting scrub status

2013-12-04 Thread Wang Shilong
I sometimes get a segfault in cmd_scrub_status(). This is because
free_history() does not check whether the pointer it is given is valid
(it may be an error pointer rather than NULL); fix it.

Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>
---
 cmds-scrub.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-scrub.c b/cmds-scrub.c
index 9f614bc..69791b3 100644
--- a/cmds-scrub.c
+++ b/cmds-scrub.c
@@ -285,7 +285,7 @@ static void print_fs_stat(struct scrub_fs_stat *fs_stat, int raw)
 static void free_history(struct scrub_file_record **last_scrubs)
 {
 	struct scrub_file_record **l = last_scrubs;
-	if (!l)
+	if (!l || IS_ERR(l))
 		return;
 	while (*l)
 		free(*l++);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
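
The check added to free_history() relies on the error-pointer convention: a function that returns a pointer may instead return a small negative errno encoded in the pointer value, which is neither NULL nor safe to dereference. The following is a standalone userspace sketch of that convention, not the btrfs-progs code. The macros mirror the kernel-style ERR_PTR()/IS_ERR() helpers; struct record, free_records() and make_records() are hypothetical names used only for illustration, and freeing the array itself is a choice made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Kernel-style convention: the top MAX_ERRNO addresses encode -errno. */
#define MAX_ERRNO 4095
#define ERR_PTR(err) ((void *)(intptr_t)(err))
#define PTR_ERR(ptr) ((long)(intptr_t)(ptr))
#define IS_ERR(ptr)  ((uintptr_t)(ptr) >= (uintptr_t)-MAX_ERRNO)

/* Hypothetical stand-in for struct scrub_file_record. */
struct record { int id; };

/*
 * Free a NULL-terminated array of records.
 * Returns the number of records freed, or -1 if the array pointer is
 * NULL or an encoded error and therefore must not be dereferenced.
 */
static int free_records(struct record **list)
{
	int n = 0;

	if (!list || IS_ERR(list))
		return -1;
	for (struct record **l = list; *l; l++, n++)
		free(*l);
	free(list);
	return n;
}

/* Build a NULL-terminated array of `count` records for the demo. */
static struct record **make_records(int count)
{
	struct record **list = calloc((size_t)count + 1, sizeof(*list));

	if (!list)
		return ERR_PTR(-ENOMEM);
	for (int i = 0; i < count; i++) {
		list[i] = malloc(sizeof(**list));
		assert(list[i] != NULL);
		list[i]->id = i;
	}
	return list;
}
```

With only the NULL test, free_records(ERR_PTR(-ENOMEM)) would walk an address near the top of the address space, which is the kind of crash the patch describes.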


Re: [RFC PATCH] Btrfs: Add linear chunk allocation support.

2013-12-04 Thread chandan
Thanks for the review comments David. I will come up with another patch
that implements your suggestion.



btrfs deadlocks under stress up until 3.12

2013-12-04 Thread Mel Gorman
Hi,

I queued up a number of tests, including IO stress tests, a few weeks ago
and noticed that some of the btrfs tests failed to complete, but I only
looked today.  Specifically, stress tests with reaim's alltests configuration
on btrfs failed up until 3.12 with a console log that looked like

[ 2882.975251] INFO: task btrfs-transacti:2816 blocked for more than 480 seconds.
[ 2882.994789] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2883.015070] btrfs-transacti D 88023fc13600 0  2816  2 0x
[ 2883.034734]  880234539dc0 0046 880234539fd8 
00013600
[ 2883.054847]  880234539fd8 00013600 880230a44540 
8801c97868b8
[ 2883.075027]  8802346be9e8 8802346be9e8  
8801ef4a6770
[ 2883.095256] Call Trace:
[ 2883.110170]  [815de804] schedule+0x24/0x70
[ 2883.127723]  [8128460f] wait_current_trans.isra.18+0xaf/0x110
[ 2883.147034]  [810741f0] ? wake_up_atomic_t+0x30/0x30
[ 2883.165492]  [81285c70] start_transaction+0x270/0x510
[ 2883.184214]  [81285fc2] btrfs_attach_transaction+0x12/0x20
[ 2883.203282]  [8127cb74] transaction_kthread+0x74/0x220
[ 2883.221941]  [8127cb00] ? verify_parent_transid+0x170/0x170
[ 2883.241048]  [8107347b] kthread+0xbb/0xc0
[ 2883.258423]  [810733c0] ? kthread_create_on_node+0x110/0x110
[ 2883.277654]  [815e7efc] ret_from_fork+0x7c/0xb0
[ 2883.295561]  [810733c0] ? kthread_create_on_node+0x110/0x110
[ 2883.314535] INFO: task kworker/u16:3:21786 blocked for more than 480 seconds.
[ 2883.334131] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2883.354587] kworker/u16:3   D 88023fc13600 0 21786  2 0x
[ 2883.374274] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-1)
[ 2883.393428]  8801d450bbb0 0046 8801d450bfd8 
00013600
[ 2883.413681]  8801d450bfd8 00013600 8801f540a440 
88021c9473b0
[ 2883.433798]  88023463a000 8801d450bbd0 8801c97868b8 
8801c9786928
[ 2883.453815] Call Trace:
[ 2883.468366]  [815de804] schedule+0x24/0x70
[ 2883.485395]  [81285305] btrfs_commit_transaction+0x265/0x960
[ 2883.503928]  [810741f0] ? wake_up_atomic_t+0x30/0x30
[ 2883.521745]  [81292140] btrfs_write_inode+0x70/0xb0
[ 2883.539623]  [811aa317] __writeback_single_inode+0x167/0x220
[ 2883.558528]  [811aae5f] writeback_sb_inodes+0x19f/0x400
[ 2883.577137]  [811ab273] wb_writeback+0xe3/0x2b0
[ 2883.595184]  [8106efe1] ? set_worker_desc+0x71/0x80
[ 2883.613730]  [811ace00] bdi_writeback_workfn+0x100/0x3d0
[ 2883.632837]  [8106c0a8] process_one_work+0x178/0x410
[ 2883.651553]  [8106ccb9] worker_thread+0x119/0x3a0
[ 2883.669822]  [8106cba0] ? rescuer_thread+0x360/0x360
[ 2883.688338]  [8107347b] kthread+0xbb/0xc0
[ 2883.705761]  [810733c0] ? kthread_create_on_node+0x110/0x110
[ 2883.724865]  [815e7efc] ret_from_fork+0x7c/0xb0
[ 2883.742994]  [810733c0] ? kthread_create_on_node+0x110/0x110

Tests were executed by mmtests using the
configs/config-global-dhp__reaim-stress-alltests as a baseline but with
the following parameters added to use a test partition

export TESTDISK_PARTITION=/dev/sda6
export TESTDISK_FILESYSTEM=btrfs
export TESTDISK_MKFS_PARAM=-f
export TESTDISK_MOUNT_ARGS=

While it is apparently fixed at the moment, any distribution using btrfs
with 3.10-longterm or 3.11 may file bugs and nag about the general stability
of btrfs even though the issues are already resolved.  I note there are
a number of deadlock-related fixes merged for btrfs between 3.11 and
3.12. Are there plans to backport them?

-- 
Mel Gorman
SUSE Labs


Re: [Cluster-devel] [PATCH 16/18] gfs2: use generic posix ACL infrastructure

2013-12-04 Thread Steven Whitehouse
Hi,

On Sun, 2013-12-01 at 03:59 -0800, Christoph Hellwig wrote:
> plain text document attachment
> (0016-gfs2-use-generic-posix-ACL-infrastructure.patch)
> This contains some major refactoring for the create path so that
> inodes are created with the right mode to start with instead of
> fixing it up later.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> ---
>  fs/gfs2/acl.c   |  229 +++
>  fs/gfs2/acl.h   |    4 +-
>  fs/gfs2/inode.c |   33 ++--
>  fs/gfs2/xattr.c |    4 +-
>  4 files changed, 61 insertions(+), 209 deletions(-)
>
Looks very good. I'd really like to be able to do something similar with
the security xattrs, refactoring inode creation so that the xattrs are
available ahead of the inode allocation itself. That way it should be
possible to allocate the xattr blocks at the same time as the inode,
rather than as an afterthought.

Some more comments below

> diff --git a/fs/gfs2/acl.c b/fs/gfs2/acl.c
> index e82e4ac..e6c7a2c 100644
> --- a/fs/gfs2/acl.c
> +++ b/fs/gfs2/acl.c
[snip]
> -
> -static int gfs2_xattr_system_set(struct dentry *dentry, const char *name,
> -				 const void *value, size_t size, int flags,
> -				 int xtype)
> -{
> -	struct inode *inode = dentry->d_inode;
> -	struct gfs2_sbd *sdp = GFS2_SB(inode);
> -	struct posix_acl *acl = NULL;
> -	int error = 0, type;
> -
> -	if (!sdp->sd_args.ar_posix_acl)
> -		return -EOPNOTSUPP;
> -
> -	type = gfs2_acl_type(name);
> -	if (type < 0)
> -		return type;
> -	if (flags & XATTR_CREATE)
> -		return -EINVAL;
> -	if (type == ACL_TYPE_DEFAULT && !S_ISDIR(inode->i_mode))
> -		return value ? -EACCES : 0;
> -	if (!uid_eq(current_fsuid(), inode->i_uid) && !capable(CAP_FOWNER))
> -		return -EPERM;
> -	if (S_ISLNK(inode->i_mode))
> -		return -EOPNOTSUPP;
> -
> -	if (!value)
> -		goto set_acl;
>  
> -	acl = posix_acl_from_xattr(&init_user_ns, value, size);
> -	if (!acl) {
> -		/*
> -		 * acl_set_file(3) may request that we set default ACLs with
> -		 * zero length -- defend (gracefully) against that here.
> -		 */
> -		goto out;
> -	}
> -	if (IS_ERR(acl)) {
> -		error = PTR_ERR(acl);
> -		goto out;
> -	}
> -
> -	error = posix_acl_valid(acl);
> -	if (error)
> -		goto out_release;
> -
> -	error = -EINVAL;
>  	if (acl->a_count > GFS2_ACL_MAX_ENTRIES)
> -		goto out_release;
> +		return -EINVAL;
>  
>  	if (type == ACL_TYPE_ACCESS) {
>  		umode_t mode = inode->i_mode;
> +
>  		error = posix_acl_equiv_mode(acl, &mode);
> +		if (error < 0)
>  
Andy Price has pointed out a missing "return error;" here.

> -		if (error <= 0) {
> -			posix_acl_release(acl);
> +		if (error == 0)
>  			acl = NULL;
>  
> -			if (error < 0)
> -				return error;
> -		}
> -

Also, there seems to be a whitespace error in the xfs patch around line
170 in fs/xfs/xfs_iops.c, where there is an added "if (default_acl)" with
a space before the tab.

Steve.




Re: btrfs-ino-cache runs on every boot for 6 minutes

2013-12-04 Thread Szőts Ákos
Thank you for your answer.

Here are the blocked-state logs. Since I had plenty of time, I captured
several of them. The last two are the longest.

- http://paste.opensuse.org/view/raw/51654551
- http://paste.opensuse.org/view/raw/39005796
- http://paste.opensuse.org/view/raw/38028651
- http://paste.opensuse.org/view/raw/21592065
- http://paste.opensuse.org/view/raw/46655344
- http://paste.opensuse.org/view/raw/26821272

Ákos


[PATCH v4 1/2] Btrfs: fix wrong super generation mismatch when scrubbing supers

2013-12-04 Thread Wang Shilong
We came across a race condition when scrubbing superblocks. The story is:

When committing a transaction, we update @last_trans_committed after
writing the superblocks. If a scrubber starts after the superblocks are
written but before @last_trans_committed is updated, a generation
mismatch happens!

We fix this by checking @scrub_pause_req; we won't start a scrubber
until the transaction commit has finished (after btrfs_scrub_continue()
has finished).

Reported-by: Sebastian Ochmann <ochm...@informatik.uni-bonn.de>
Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>
Reviewed-by: Miao Xie <mi...@cn.fujitsu.com>
---
v3->v4:
by checking @scrub_pause_req, block a scrubber
if we are committing a transaction (thanks to Miao and Liu)
---
 fs/btrfs/scrub.c | 45 ++---
 1 file changed, 26 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 2544805..d27f95e 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -257,6 +257,7 @@ static int copy_nocow_pages_for_inode(u64 inum, u64 offset, u64 root,
 static int copy_nocow_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 			    int mirror_num, u64 physical_for_dev_replace);
 static void copy_nocow_pages_worker(struct btrfs_work *work);
+static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info);
 
 
 static void scrub_pending_bio_inc(struct scrub_ctx *sctx)
@@ -270,6 +271,16 @@ static void scrub_pending_bio_dec(struct scrub_ctx *sctx)
 	wake_up(&sctx->list_wait);
 }
 
+static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
+{
+	while (atomic_read(&fs_info->scrub_pause_req)) {
+		mutex_unlock(&fs_info->scrub_lock);
+		wait_event(fs_info->scrub_pause_wait,
+		   atomic_read(&fs_info->scrub_pause_req) == 0);
+		mutex_lock(&fs_info->scrub_lock);
+	}
+}
+
 /*
  * used for workers that require transaction commits (i.e., for the
  * NOCOW case)
@@ -2330,14 +2341,10 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 		btrfs_reada_wait(reada2);
 
 	mutex_lock(&fs_info->scrub_lock);
-	while (atomic_read(&fs_info->scrub_pause_req)) {
-		mutex_unlock(&fs_info->scrub_lock);
-		wait_event(fs_info->scrub_pause_wait,
-		   atomic_read(&fs_info->scrub_pause_req) == 0);
-		mutex_lock(&fs_info->scrub_lock);
-	}
+	scrub_blocked_if_needed(fs_info);
 	atomic_dec(&fs_info->scrubs_paused);
 	mutex_unlock(&fs_info->scrub_lock);
+
 	wake_up(&fs_info->scrub_pause_wait);
 
 	/*
@@ -2377,15 +2384,12 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 			atomic_set(&sctx->wr_ctx.flush_all_writes, 0);
 			atomic_inc(&fs_info->scrubs_paused);
 			wake_up(&fs_info->scrub_pause_wait);
+
 			mutex_lock(&fs_info->scrub_lock);
-			while (atomic_read(&fs_info->scrub_pause_req)) {
-				mutex_unlock(&fs_info->scrub_lock);
-				wait_event(fs_info->scrub_pause_wait,
-				   atomic_read(&fs_info->scrub_pause_req) == 0);
-				mutex_lock(&fs_info->scrub_lock);
-			}
+			scrub_blocked_if_needed(fs_info);
 			atomic_dec(&fs_info->scrubs_paused);
 			mutex_unlock(&fs_info->scrub_lock);
+
 			wake_up(&fs_info->scrub_pause_wait);
 		}
 
@@ -2707,14 +2711,10 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 			   atomic_read(&sctx->workers_pending) == 0);
 
 		mutex_lock(&fs_info->scrub_lock);
-		while (atomic_read(&fs_info->scrub_pause_req)) {
-			mutex_unlock(&fs_info->scrub_lock);
-			wait_event(fs_info->scrub_pause_wait,
-			   atomic_read(&fs_info->scrub_pause_req) == 0);
-			mutex_lock(&fs_info->scrub_lock);
-		}
+		scrub_blocked_if_needed(fs_info);
 		atomic_dec(&fs_info->scrubs_paused);
 		mutex_unlock(&fs_info->scrub_lock);
+
 		wake_up(&fs_info->scrub_pause_wait);
 
 		btrfs_put_block_group(cache);
@@ -2926,7 +2926,13 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 	}
 	sctx->readonly = readonly;
 	dev->scrub_device = sctx;
+	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
 
+	/*
+	 * checking @scrub_pause_req here, we can avoid
+	 * race between committing transaction and scrubbing.
+	 */
+	scrub_blocked_if_needed(fs_info);
 	atomic_inc(&fs_info->scrubs_running);
 	mutex_unlock(&fs_info->scrub_lock);
 
@@ -2935,9 +2941,10 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 	 * by holding device list mutex, 
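
The scrub_blocked_if_needed() helper in this patch is a "drop the lock, sleep until the request clears, retake the lock" loop, and btrfs_scrub_dev() now passes through it before counting itself as running. As a rough userspace analogy (a sketch under stated assumptions, not the kernel implementation), the same gate can be modelled with a pthread mutex and condition variable; pthread_cond_wait() releases and re-acquires the mutex around the sleep, which plays the part of the explicit mutex_unlock()/wait_event()/mutex_lock() sequence. struct scrub_gate and the gate_* functions are hypothetical names, and the model leaves out the part where the committer waits for already running scrubbers to park.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace model of the scrub start gate. */
struct scrub_gate {
	pthread_mutex_t lock;       /* plays the role of scrub_lock */
	pthread_cond_t  pause_wait; /* plays the role of scrub_pause_wait */
	int pause_req;              /* > 0 while a commit asks scrubs to pause */
	int scrubs_running;
};

static void gate_init(struct scrub_gate *g)
{
	pthread_mutex_init(&g->lock, NULL);
	pthread_cond_init(&g->pause_wait, NULL);
	g->pause_req = 0;
	g->scrubs_running = 0;
}

/* Caller holds g->lock; returns with g->lock held and pause_req == 0. */
static void gate_blocked_if_needed(struct scrub_gate *g)
{
	while (g->pause_req > 0)
		pthread_cond_wait(&g->pause_wait, &g->lock);
}

/* Committer side: request a pause. */
static void gate_pause(struct scrub_gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->pause_req++;
	pthread_mutex_unlock(&g->lock);
}

/* Committer side: lift the request and wake any waiting scrubber. */
static void gate_continue(struct scrub_gate *g)
{
	pthread_mutex_lock(&g->lock);
	assert(g->pause_req > 0);
	g->pause_req--;
	pthread_cond_broadcast(&g->pause_wait);
	pthread_mutex_unlock(&g->lock);
}

/* Scrubber side: do not count as running while a commit is in flight. */
static void gate_scrub_start(struct scrub_gate *g)
{
	pthread_mutex_lock(&g->lock);
	gate_blocked_if_needed(g);
	g->scrubs_running++;
	pthread_mutex_unlock(&g->lock);
}
```

In this model a scrubber calling gate_scrub_start() between gate_pause() and gate_continue() sleeps until the commit is done, so it cannot read superblocks in the window the commit message describes.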

[PATCH 2/2] Btrfs: wrap repeated code into scrub_blocked_if_needed()

2013-12-04 Thread Wang Shilong
Just wrap the repeated code into one function, scrub_blocked_if_needed().

This changes the ordering: the wait for @workers_pending to reach 0 now
happens before we wake up the committing transaction
(atomic_inc(@scrubs_paused)), so we must take care not to deadlock here.

Thread 1                        Thread 2
                                |->btrfs_commit_transaction()
                                        |->set trans type(COMMIT_DOING)
                                        |->btrfs_scrub_pause()(blocked)
|->join_transaction(blocked)

Move btrfs_scrub_pause() before setting the trans type, which means we can
still join a transaction while the committing transaction is blocked.

Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>
Suggested-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/scrub.c   | 43 +--
 fs/btrfs/transaction.c |  3 ++-
 2 files changed, 19 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index d27f95e..fced60c 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -257,6 +257,7 @@ static int copy_nocow_pages_for_inode(u64 inum, u64 offset, u64 root,
 static int copy_nocow_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 			    int mirror_num, u64 physical_for_dev_replace);
 static void copy_nocow_pages_worker(struct btrfs_work *work);
+static void __scrub_blocked_if_needed(struct btrfs_fs_info *fs_info);
 static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info);
 
 
@@ -271,7 +272,7 @@ static void scrub_pending_bio_dec(struct scrub_ctx *sctx)
 	wake_up(&sctx->list_wait);
 }
 
-static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
+static void __scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
 {
 	while (atomic_read(&fs_info->scrub_pause_req)) {
 		mutex_unlock(&fs_info->scrub_lock);
@@ -281,6 +282,19 @@ static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
 	}
 }
 
+static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
+{
+	atomic_inc(&fs_info->scrubs_paused);
+	wake_up(&fs_info->scrub_pause_wait);
+
+	mutex_lock(&fs_info->scrub_lock);
+	__scrub_blocked_if_needed(fs_info);
+	atomic_dec(&fs_info->scrubs_paused);
+	mutex_unlock(&fs_info->scrub_lock);
+
+	wake_up(&fs_info->scrub_pause_wait);
+}
+
 /*
  * used for workers that require transaction commits (i.e., for the
  * NOCOW case)
@@ -2315,8 +2329,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 
 	wait_event(sctx->list_wait,
 		   atomic_read(&sctx->bios_in_flight) == 0);
-	atomic_inc(&fs_info->scrubs_paused);
-	wake_up(&fs_info->scrub_pause_wait);
+	scrub_blocked_if_needed(fs_info);
 
 	/* FIXME it might be better to start readahead at commit root */
 	key_start.objectid = logical;
@@ -2340,12 +2353,6 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 	if (!IS_ERR(reada2))
 		btrfs_reada_wait(reada2);
 
-	mutex_lock(&fs_info->scrub_lock);
-	scrub_blocked_if_needed(fs_info);
-	atomic_dec(&fs_info->scrubs_paused);
-	mutex_unlock(&fs_info->scrub_lock);
-
-	wake_up(&fs_info->scrub_pause_wait);
 
 	/*
 	 * collect all data csums for the stripe to avoid seeking during
@@ -2382,15 +2389,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 			wait_event(sctx->list_wait,
 				   atomic_read(&sctx->bios_in_flight) == 0);
 			atomic_set(&sctx->wr_ctx.flush_all_writes, 0);
-			atomic_inc(&fs_info->scrubs_paused);
-			wake_up(&fs_info->scrub_pause_wait);
-
-			mutex_lock(&fs_info->scrub_lock);
 			scrub_blocked_if_needed(fs_info);
-			atomic_dec(&fs_info->scrubs_paused);
-			mutex_unlock(&fs_info->scrub_lock);
-
-			wake_up(&fs_info->scrub_pause_wait);
 		}
 
 		key.objectid = logical;
@@ -2705,17 +2704,9 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 		wait_event(sctx->list_wait,
 			   atomic_read(&sctx->bios_in_flight) == 0);
 		atomic_set(&sctx->wr_ctx.flush_all_writes, 0);
-		atomic_inc(&fs_info->scrubs_paused);
-		wake_up(&fs_info->scrub_pause_wait);
 		wait_event(sctx->list_wait,
 			   atomic_read(&sctx->workers_pending) == 0);
-
-		mutex_lock(&fs_info->scrub_lock);
 		scrub_blocked_if_needed(fs_info);
-		atomic_dec(&fs_info->scrubs_paused);
-		mutex_unlock(&fs_info->scrub_lock);
-
-		wake_up(&fs_info->scrub_pause_wait);
 
 		btrfs_put_block_group(cache);
 		if (ret)
@@ -2932,7 +2923,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 	 * checking @scrub_pause_req 
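
The wrapper in this patch bundles a small handshake: a scrubber announces itself as paused, parks while a pause is requested, then withdraws the announcement, while the committer waits until every running scrubber has parked. The following userspace model is only a sketch of that handshake under simplifying assumptions. It uses a pthread mutex and condition variable and updates the counters under the lock, whereas the patch uses atomics and a wait queue; struct pause_state and the function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace model; field names echo the patch. */
struct pause_state {
	pthread_mutex_t lock;
	pthread_cond_t  wait;
	int pause_req;      /* committers currently requesting a pause */
	int scrubs_running; /* scrubbers that have started */
	int scrubs_paused;  /* scrubbers parked at a pause point */
};

static void pause_state_init(struct pause_state *s)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->wait, NULL);
	s->pause_req = s->scrubs_running = s->scrubs_paused = 0;
}

/* Scrubber: announce "I am paused", then park while a pause is requested. */
static void scrubber_pause_point(struct pause_state *s)
{
	pthread_mutex_lock(&s->lock);
	s->scrubs_paused++;
	pthread_cond_broadcast(&s->wait); /* a waiting committer may proceed */
	while (s->pause_req > 0)
		pthread_cond_wait(&s->wait, &s->lock);
	s->scrubs_paused--;
	pthread_cond_broadcast(&s->wait);
	pthread_mutex_unlock(&s->lock);
}

/* Committer: request a pause and wait until every scrubber is parked. */
static void committer_pause(struct pause_state *s)
{
	pthread_mutex_lock(&s->lock);
	s->pause_req++;
	while (s->scrubs_paused != s->scrubs_running)
		pthread_cond_wait(&s->wait, &s->lock);
	pthread_mutex_unlock(&s->lock);
}

/* Committer: lift the pause request and wake parked scrubbers. */
static void committer_continue(struct pause_state *s)
{
	pthread_mutex_lock(&s->lock);
	assert(s->pause_req > 0);
	s->pause_req--;
	pthread_cond_broadcast(&s->wait);
	pthread_mutex_unlock(&s->lock);
}
```

The deadlock in the commit message corresponds to a scrubber that cannot reach scrubber_pause_point() because it is itself blocked on something the committer holds, so committer_pause() never sees the counters match.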

Re: btrfs-cleaner leaves CPU pinned to 100%

2013-12-04 Thread Adam Gradzki
Adam G <adam.gradzki at gmail.com> writes:

> Hello,
>
> After a recent kernel upgrade I noticed the fans blowing at full
> throttle on my laptop.
>
> There is currently very low load on the machine and nothing out of the
> ordinary has occurred to lead me to believe this is normal behavior.
>
> Here is a link to the perf output I managed to capture
>
> http://pastebin.com/a5cAy7Dw
>
> Please let me know if I may be of any further assistance.
>
> Regards,
>
> Adam Gradzki

I would like to add an updated perf report I took shortly after the first 
one yesterday (600 seconds this time).

http://www.pastebin.ca/2492119

Removing autodefrag from /etc/fstab has ameliorated my troubles for the time 
being.



Re: btrfs-cleaner leaves CPU pinned to 100%

2013-12-04 Thread Josef Bacik
Adam Gradzki <adam.grad...@gmail.com> wrote on Wed [2013-Dec-04 15:27:58 +]:
> Adam G <adam.gradzki at gmail.com> writes:
>
> > Hello,
> >
> > After a recent kernel upgrade I noticed the fans blowing at full
> > throttle on my laptop.
> >
> > There is currently very low load on the machine and nothing out of the
> > ordinary has occurred to lead me to believe this is normal behavior.
> >
> > Here is a link to the perf output I managed to capture
> >
> > http://pastebin.com/a5cAy7Dw
> >
> > Please let me know if I may be of any further assistance.
> >
> > Regards,
> >
> > Adam Gradzki
>
> I would like to add an updated perf report I took shortly after the first
> one yesterday (600 seconds this time).
>
> http://www.pastebin.ca/2492119
>
> Removing autodefrag from /etc/fstab has ameliorated my troubles for the time
> being.

Do you have a lot of snapshots?  I'm working on this currently, so hopefully
it will be fixed soon.  If you hit the issue again, can you use
"perf record -g" so that it captures the stack traces, and then drill down
before you post, so I can figure out where we're spending all of our time?
Thanks,

Josef


[GIT PULL] Btrfs Maintainer update

2013-12-04 Thread Chris Mason
Hi Linus,

I'm still getting settled into new devel hardware etc, but I do have one
commit for the next rc.  Please grab it from my for-linus branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus

This changes my email over to fb.com, and adds a MAINTAINERS entry for
Josef as well.

Instead of diffstat etc, I've put the patch below.  The commit is also
signed, my updated key should be floating around the pgp servers now.

commit c0778e2534b81be6b85b5ee07c0e15ff774f7136
Author: Chris Mason c...@fb.com
Date:   Tue Dec 3 20:16:03 2013 -0500

Btrfs: update the MAINTAINERS file

Josef and I have new email addresses

Signed-off-by: Chris Mason c...@fb.com
Signed-off-by: Josef Bacik jba...@fb.com

diff --git a/MAINTAINERS b/MAINTAINERS
index ffcaf97..db89b5d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1921,7 +1921,8 @@ S:	Maintained
 F:	drivers/gpio/gpio-bt8xx.c
 
 BTRFS FILE SYSTEM
-M:	Chris Mason <chris.ma...@fusionio.com>
+M:	Chris Mason <c...@fb.com>
+M:	Josef Bacik <jba...@fb.com>
 L:	linux-btrfs@vger.kernel.org
 W:	http://btrfs.wiki.kernel.org/
 Q:	http://patchwork.kernel.org/project/linux-btrfs/list/


Can scrub also refresh?

2013-12-04 Thread Kyle Evans

Basically, is there a way to force the refresh of the magnetic state of
data? I assume scrub does this only when a read error has been
encountered. Does anyone think it would be a good option to write 100%
of the data back on request?

I am asking because I have ddrescue running on a hard drive that had
data written to it years ago, and the data was never even looked at.
Then some blocks started to go bad, so I started the recovery and wow,
is it going slow. It does have some read errors, some pending sectors, and
some multi-zone error rate, whatever that is, but they are not increasing
at a steady rate; it will go a few days without them increasing. I suspect
this is merely some stale data, which could have been prevented by something
like dd if=dev of=dev once a year.
Anyway some performance numbers, obviously not related to Btrfs:
I started this Nov 22nd with about 1 day of downtime.

$ ddrescue -n -a 1 /dev/sdi4 opt.img opt.log

GNU ddrescue 1.16
Press Ctrl-C to interrupt
Initial status (read from logfile)
rescued:    44177 MB,  errsize:    9879 kB,  errors:     207
Current status
rescued:    55578 MB,  errsize:    9879 kB,  current rate:       30 B/s
   ipos:    58881 MB,   errors:     207,    average rate:     131 kB/s
   opos:    58881 MB,    time since last successful read:       0 s
Copying non-tried blocks...



=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   196   196   051    Pre-fail  Always       -       269625
  3 Spin_Up_Time            0x0003   177   174   021    Pre-fail  Always       -       6125
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       586
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   083   083   000    Old_age   Always       -       12753
 10 Spin_Retry_Count        0x0012   100   100   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       485
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       197
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       588
194 Temperature_Celsius     0x0022   115   085   000    Old_age   Always       -       35
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   183   183   000    Old_age   Always       -       1439
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       11
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       232
200 Multi_Zone_Error_Rate   0x0008   001   001   051    Old_age   Offline  FAILING_NOW 27189



Re: Can scrub also refresh?

2013-12-04 Thread Chris Murphy

On Dec 4, 2013, at 1:51 PM, Kyle Evans <kvan...@gmail.com> wrote:
>
> 200 Multi_Zone_Error_Rate   0x0008   001   001   051    Old_age   Offline  FAILING_NOW 27189

The drive is failing. I would only do what's required to get data off the drive 
and then get rid of it (RMA it if it's under warranty). It's only worth keeping 
around as an unreliable test subject!

Otherwise to answer the question, balance is what you're after. It reads and 
writes all chunks. As you guessed, scrub is read only unless there's an error 
that can be corrected.

ddrescue will go slowly because it copies all sectors, even ones you probably 
don't need. So I personally would use rsync or btrfs send to get just the data 
off the drive, rather than copying every sector, many of which you don't care 
about.

Chris Murphy


Re: [PATCH v4 1/2] Btrfs: fix wrong super generation mismatch when scrubbing supers

2013-12-04 Thread Sebastian Ochmann

Hello,

seems to be working for me (only tested using both parts of the patch); 
wasn't able to trigger the errors after almost an hour of stress-testing.


Best regards,
Sebastian

On 04.12.2013 14:15, Wang Shilong wrote:

[snip]

[PATCH] Btrfs: more efficient push_leaf_right

2013-12-04 Thread Filipe David Borba Manana
Currently when finding the leaf to insert a key into a btree, if the
leaf doesn't have enough space to store the item we attempt to move
off some items from our leaf to its right neighbor leaf, and if this
fails to create enough free space in our leaf, we try to move off more
items to the left neighbor leaf as well.

When trying to move off items to the right neighbor leaf, if it has
enough room to store the new key but not enough room to move off
at least one item from our target leaf, __push_leaf_right returns 1 and
we have to attempt to move items to the left neighbor (push_leaf_left
function) without touching the right neighbor leaf.
For the case where the right leaf has enough room to store at least 1
item from our leaf, we end up modifying (and dirtying) both our leaf
and the right leaf. This is non-optimal for the case where the new key
is greater than any key in our target leaf because it can be inserted at
slot 0 of the right neighbor leaf and we don't need to touch our leaf
at all nor to attempt to move off items to the left neighbor leaf.

Therefore this change just selects the right neighbor leaf as our new
target leaf if it has enough room for the new key without modifying our
initial target leaf - we do this only if the new key is higher than any
key in the initial target leaf.

While running the following test, push_leaf_right was called by split_leaf
4802 times. Out of those 4802 calls, for 2571 calls (53.5%) we hit this
special case (right leaf has enough room and new key is higher than any key
in the initial target leaf).

Test:

  sysbench --test=fileio --file-num=512 --file-total-size=5G \
--file-test-mode=[seqwr|rndwr] --num-threads=512 --file-block-size=8192 \
--max-requests=10 --file-io-mode=sync [prepare|run]

Results:

sequential writes

Throughput before this change: 65.71Mb/sec (average of 10 runs)
Throughput after this change:  66.58Mb/sec (average of 10 runs)

random writes

Throughput before this change: 10.75Mb/sec (average of 10 runs)
Throughput after this change:  11.56Mb/sec (average of 10 runs)

Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
---
 fs/btrfs/ctree.c |   13 +
 1 file changed, 13 insertions(+)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 11f9a18..a57507a 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -3613,6 +3613,19 @@ static int push_leaf_right(struct btrfs_trans_handle *trans, struct btrfs_root
 	if (left_nritems == 0)
 		goto out_unlock;
 
+	if (path->slots[0] == left_nritems && !empty) {
+		/* Key greater than all keys in the leaf, right neighbor has
+		 * enough room for it and we're not emptying our leaf to delete
+		 * it, therefore use right neighbor to insert the new item and
+		 * no need to touch/dirty our left leaf. */
+		btrfs_tree_unlock(left);
+		free_extent_buffer(left);
+		path->nodes[0] = right;
+		path->slots[0] = 0;
+		path->slots[1]++;
+		return 0;
+	}
+
 	return __push_leaf_right(trans, root, path, min_data_size, empty,
 				right, free_space, left_nritems, min_slot);
 out_unlock:
-- 
1.7.9.5
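
To make the special case in this patch concrete, here is a small standalone model of the decision, using plain integer keys and fixed-capacity leaves instead of extent buffers. It is only an illustration under simplifying assumptions (uniform item size, no locking, no parent-node update, and a new key that sorts before every key already in the right leaf, as B-tree ordering implies). struct leaf, LEAF_CAPACITY, choose_leaf_for_insert() and insert_at_slot0() are hypothetical names, not btrfs functions.

```c
#include <assert.h>
#include <stddef.h>

#define LEAF_CAPACITY 4 /* hypothetical: items per leaf */

struct leaf {
	int keys[LEAF_CAPACITY]; /* sorted ascending */
	size_t nritems;
};

enum target { TARGET_LEFT_AFTER_PUSH, TARGET_RIGHT_SLOT0 };

/*
 * Decide where a new key goes when `left` is full and `right` is its
 * right neighbor. If the key sorts after every key in `left` and
 * `right` has a free slot, insert at slot 0 of `right` and leave
 * `left` untouched; otherwise fall back to pushing items around.
 */
static enum target choose_leaf_for_insert(const struct leaf *left,
					  const struct leaf *right, int key)
{
	assert(left->nritems > 0);
	if (key > left->keys[left->nritems - 1] &&
	    right->nritems < LEAF_CAPACITY)
		return TARGET_RIGHT_SLOT0;
	return TARGET_LEFT_AFTER_PUSH;
}

/* Insert `key` at slot 0 of `leaf`, shifting existing keys right. */
static void insert_at_slot0(struct leaf *leaf, int key)
{
	assert(leaf->nritems < LEAF_CAPACITY);
	for (size_t i = leaf->nritems; i > 0; i--)
		leaf->keys[i] = leaf->keys[i - 1];
	leaf->keys[0] = key;
	leaf->nritems++;
}
```

With a full left leaf {10, 20, 30, 40} and a right leaf {60, 70}, key 50 takes the shortcut and only the right leaf is modified, while key 25 falls back to the regular push path.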



Re: Can scrub also refresh?

2013-12-04 Thread Kyle Evans

On 12/04/2013 04:50 PM, Chris Murphy wrote:

> Otherwise to answer the question, balance is what you're after. It reads and
> writes all chunks.


Brilliant!

Thanks,
Kyle


Re: btrfs-cleaner leaves CPU pinned to 100%

2013-12-04 Thread Adam G
Good news:

I recreated the problem by booting with autodefrag enabled and loading
a large VirtualBox image of Windows 8.1. The CPU is pinned to 100%
even after closing the VirtualBox software.

Here is the perf with your suggested configuration:

http://pastebin.ca/2492270

~ Adam Gradzki

On Wed, Dec 4, 2013 at 6:58 PM, Adam G <adam.grad...@gmail.com> wrote:
> Hi Josef,
>
> I do not have any snapshots at the moment. This is a new filesystem
> created with the 3.12 kernel last week. The filesystem is on a laptop
> that sees very little load on the file system and there is plenty of
> free space available on my two btrfs partitions.
>
> I will reboot with autodefrag enabled in fstab right now to see if I
> can capture the stacktraces for you.
>
> Regards,
> Adam Gradzki


ERROR: send ioctl failed with -12: Cannot allocate memory

2013-12-04 Thread Jim Salter

Sending a 585G snapshot from box1 to box2:

# ionice -c3 btrfs send daily*2013-12-01* | pv -L40m -s585G | ssh -c arcfour 10.0.0.40 btrfs receive /data/.snapshots/data/images

At subvol daily_[1385956801]_2013-12-01_23:00:01
At subvol daily_[1385956801]_2013-12-01_23:00:01
ERROR: send ioctl failed with -12: Cannot allocate 
memory=== ] 59% ETA 2:04:01

 347GB 3:00:12 [32.8MB/s]
[= ]
59%
ERROR: unexpected EOF in stream.

The send failed a little over halfway through with a "Cannot allocate 
memory" error, which is surprising given that this is a relatively lightly 
loaded 32G server.  Output of "free -m", taken immediately after seeing 
the error above:


root@gwa-virt1:/data/.snapshots/data/images# free -m
             total       used       free     shared    buffers     cached
Mem:         32158      31798        360          0          0      22092
-/+ buffers/cache:       9705      22453
Swap:            0          0          0

Anybody got any suggestions?

