date:20151026

Re: Recover btrfs volume which can only be mounted in read-only mode

2015-10-26 Thread Hugo Mills

On Mon, Oct 26, 2015 at 09:14:00AM +, Duncan wrote:
> Dmitry Katsubo posted on Sun, 18 Oct 2015 11:44:08 +0200 as excerpted:
> 
> >> Meanwhile, the present btrfs raid1 read-scheduler is both pretty simple
> >> to code up and pretty simple to arrange tests for that run either one
> >> side or the other, but not both, or that are well balanced to both.
> >> However, it's pretty poor in terms of ensuring optimized real-world
> >> deployment read-scheduling.
> >> 
> >> What it does is simply this.  Remember, btrfs raid1 is specifically two
> >> copies.  It chooses which copy of the two will be read very simply,
> >> based on the PID making the request.  Odd PIDs get assigned one copy,
> >> even PIDs the other.  As I said, simple to code, great for ensuring
> >> testing of one copy or the other or both, but not really optimized at
> >> all for real-world usage.
> >> 
> >> If your workload happens to be a bunch of all odd or all even PIDs,
> >> well, enjoy your testing-grade read-scheduler, bottlenecking everything
> >> reading one copy, while the other sits entirely idle.
> > 
> > > I think the PID-based solution is not the best one. Why not simply pick a
> > > random device? Then at least all drives in the volume are equally loaded
> > > (on average).
> 
> Nobody argues that the even/odd-PID-based read-scheduling solution is 
> /optimal/, in a production sense at least.  But at the time and for the 
> purpose it was written it was pretty good, arguably reasonably close to 
> "best", because the implementation is at once simple and transparent for 
> debugging purposes, and real easy to test either one side or the other, 
> or both, and equally important, to duplicate the results of those tests, 
> by simply arranging for the testing to have either all even or all odd 
> PIDs, or both.  And for ordinary use, it's good /enough/, as ordinarily, 
> PIDs will be evenly distributed even/odd.
> 
> In that context, your random device read-scheduling algorithm would be 
> far worse, because while being reasonably simple, it's anything *but* 
> easy to ensure reads go to only one side or equally to both, or for that 
> matter, to duplicate the tests, because randomization, by definition 
> does /not/ lend itself to duplication.

   For what it's worth, David tried implementing round-robin (IIRC)
some time ago, and found that it performed *worse* than the pid-based
system. (It may have been random, but memory says it was round-robin).
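   For readers who haven't seen the scheduler, the even/odd selection
Duncan describes can be sketched in userspace C as below. This is only
an illustration under assumptions: the function names are made up here
and are not the kernel's, and the real code has to deal with missing
devices and other cases that this ignores.

```c
#include <assert.h>

/*
 * Hypothetical helper (not the kernel's function): choose which of
 * num_copies mirrors serves a read, based only on the caller's PID.
 */
static int pick_mirror_by_pid(int pid, int num_copies)
{
	assert(pid >= 0 && num_copies > 0);
	return pid % num_copies;
}

/* Count how many of the given PIDs would read from the given mirror. */
static int count_reads_on_mirror(const int *pids, int n, int num_copies,
				 int mirror)
{
	int hits = 0;

	for (int i = 0; i < n; i++)
		if (pick_mirror_by_pid(pids[i], num_copies) == mirror)
			hits++;
	return hits;
}
```

   Feeding it only even PIDs puts every read on mirror 0 and none on
mirror 1, which is both the testing convenience and the bottleneck
discussed above.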

   Hugo.

-- 
Hugo Mills | Great films about cricket: The Umpire Strikes Back
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4  |



Re: [RFC PATCH 2/3] btrfs-progs: kernel based default features for mkfs

2015-10-26 Thread Anand Jain



Thanks Jeff for the comments.

On 10/23/2015 11:24 PM, Jeff Mahoney wrote:
> On 10/21/15 4:45 AM, Anand Jain wrote:
>> mkfs from latest btrfs-progs will enable latest default features,
>> and if the kernel is down-rev and does not support a latest
>> default feature then mount fails, as expected.
>>
>> This patch disables default features based on the running kernel.
>
> I like the idea generally based on comments further in the thread, but
> what I don't like is:
>
> 1) It's silent.

 Will add warning.

> 2) There's no way to override it.
>
> If we're going to change the defaults at runtime, we should tell the
> user what has changed and why.  Otherwise, an identical mkfs.btrfs
> binary will behave differently on different systems without feedback
> and that violates the principle of least surprise.  If they want to do
> what Qu suggests later in the thread, where the device is being
> prepared for use on a newer kernel, it should be possible to force it.
> The normal -f should be fine there.

 Still have no idea how to get this, trying.

Thanks, Anand
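
One possible shape for the two requests above (warn when a default is
dropped, and let a force flag keep the full defaults) is sketched
below. It is a standalone illustration, not btrfs-progs code: the
feature bits, the select_features() helper and its parameters are all
hypothetical, and how the kernel's supported mask is obtained is left
out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical feature bits, for illustration only. */
#define FEAT_A			(1ULL << 0)
#define FEAT_B			(1ULL << 1)
#define DEFAULT_FEATURES	(FEAT_A | FEAT_B)

/*
 * Return the feature set mkfs would use. kernel_allowed is the mask the
 * running kernel supports; a non-zero force keeps the full defaults.
 * When defaults are dropped, a warning goes to log (if not NULL).
 */
static uint64_t select_features(uint64_t kernel_allowed, int force, FILE *log)
{
	uint64_t dropped = DEFAULT_FEATURES & ~kernel_allowed;
	uint64_t features;

	if (force || !dropped)
		return DEFAULT_FEATURES;

	if (log)
		fprintf(log,
			"WARNING: disabling default features 0x%llx, not supported by the running kernel\n",
			(unsigned long long)dropped);

	features = DEFAULT_FEATURES & kernel_allowed;
	assert((features & ~DEFAULT_FEATURES) == 0);
	return features;
}
```

With this shape the same binary still behaves differently on different
kernels, but it says so, and a forced run gets the unmodified defaults.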



> -Jeff


>> Signed-off-by: Anand Jain
>> ---
>>  mkfs.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/mkfs.c b/mkfs.c
>> index a5802f7..2b9d734 100644
>> --- a/mkfs.c
>> +++ b/mkfs.c
>> @@ -1357,10 +1357,13 @@ int main(int ac, char **av)
>>  	int dev_cnt = 0;
>>  	int saved_optind;
>>  	char fs_uuid[BTRFS_UUID_UNPARSED_SIZE] = { 0 };
>> -	u64 features = BTRFS_MKFS_DEFAULT_FEATURES;
>> +	u64 features;
>>  	struct mkfs_allocation allocation = { 0 };
>>  	struct btrfs_mkfs_config mkfs_cfg;
>>
>> +	features = btrfs_features_allowed_by_kernel();
>> +	features &= BTRFS_MKFS_DEFAULT_FEATURES;
>> +
>>  	while(1) {
>>  		int c;
>>  		static const struct option long_options[] = {




> --
> Jeff Mahoney
> SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PATCH 1/2 v2] Btrfs: fix regression when running delayed references

2015-10-26 Thread Filipe Manana

On Mon, Oct 26, 2015 at 9:22 AM, Liu Bo  wrote:
> On 10/25/2015 06:04 PM, fdman...@kernel.org wrote:
>>
>> From: Filipe Manana 
>>
>> In the kernel 4.2 merge window we had a refactoring/rework of the delayed
>> references implementation in order to fix certain problems with qgroups.
>> However that rework introduced one more regression that leads to the
>> following trace when running delayed references for metadata:
>>
>> [35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832!
>> [35908.065201] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
>> [35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic
>> xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache
>> sunrpc loop fuse parport_pc psmouse i2
>> [35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: GW
>> 4.3.0-rc5-btrfs-next-17+ #1
>> [35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
>> [35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
>> [btrfs]
>> [35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti:
>> 88010c4c8000
>> [35908.065201] RIP: 0010:[]  []
>> insert_inline_extent_backref+0x52/0xb1 [btrfs]
>> [35908.065201] RSP: 0018:88010c4cbb08  EFLAGS: 00010293
>> [35908.065201] RAX:  RBX: 88008a661000 RCX:
>> 
>> [35908.065201] RDX: a04dd58f RSI: 0001 RDI:
>> 
>> [35908.065201] RBP: 88010c4cbb40 R08: 1000 R09:
>> 88010c4cb9f8
>> [35908.065201] R10:  R11: 002c R12:
>> 
>> [35908.065201] R13: 88020a74c578 R14:  R15:
>> 
>> [35908.065201] FS:  () GS:88023edc()
>> knlGS:
>> [35908.065201] CS:  0010 DS:  ES:  CR0: 8005003b
>> [35908.065201] CR2: 015e8708 CR3: 000102185000 CR4:
>> 06e0
>> [35908.065201] Stack:
>> [35908.065201]  88010c4cbb18 0f37 88020a74c578
>> 88015a408000
>> [35908.065201]  880154a44000  0005
>> 88010c4cbbd8
>> [35908.065201]  a0492b9a 0005 
>> 
>> [35908.065201] Call Trace:
>> [35908.065201]  [] __btrfs_inc_extent_ref+0x8b/0x208
>> [btrfs]
>> [35908.065201]  [] ?
>> __btrfs_run_delayed_refs+0x4d4/0xd33 [btrfs]
>> [35908.065201]  [] __btrfs_run_delayed_refs+0xafa/0xd33
>> [btrfs]
>> [35908.065201]  [] ? join_transaction.isra.10+0x25/0x41f
>> [btrfs]
>> [35908.065201]  [] ? join_transaction.isra.10+0xa8/0x41f
>> [btrfs]
>> [35908.065201]  [] btrfs_run_delayed_refs+0x75/0x1dd
>> [btrfs]
>> [35908.065201]  [] delayed_ref_async_start+0x3c/0x7b
>> [btrfs]
>> [35908.065201]  [] normal_work_helper+0x14c/0x32a
>> [btrfs]
>> [35908.065201]  [] btrfs_extent_refs_helper+0x12/0x14
>> [btrfs]
>> [35908.065201]  [] process_one_work+0x24a/0x4ac
>> [35908.065201]  [] worker_thread+0x206/0x2c2
>> [35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
>> [35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
>> [35908.065201]  [] kthread+0xef/0xf7
>> [35908.065201]  [] ? kthread_parkme+0x24/0x24
>> [35908.065201]  [] ret_from_fork+0x3f/0x70
>> [35908.065201]  [] ? kthread_parkme+0x24/0x24
>> [35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48
>> 8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02
>> <0f> 0b 4c 8b 45 30 8b 4d 28 45 31
>> [35908.065201] RIP  []
>> insert_inline_extent_backref+0x52/0xb1 [btrfs]
>> [35908.065201]  RSP 
>> [35908.310885] ---[ end trace fe4299baf0666457 ]---
>>
>> This happens because the new delayed references code no longer merges
>> delayed references that have different sequence values. The following
>> steps are an example sequence leading to this issue:
>>
>> 1) Transaction N starts, fs_info->tree_mod_seq has value 0;
>>
>> 2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for
>> bytenr A is created, with a value of 1 and a seq value of 0;
>>
>> 3) fs_info->tree_mod_seq is incremented to 1;
>>
>> 4) Extent buffer A is deleted through btrfs_del_items(), which calls
>> btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). The
>> latter returns the metadata extent associated with extent buffer A to
>> the free space cache (the range is not pinned), because the extent
>> buffer was created in the current transaction (N) and writeback never
>> happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set
>> in the extent buffer).
>> This creates the delayed reference Ref2 for bytenr A, with a value
>> of -1 and a seq value of 1;
>>
>> 5) Delayed reference Ref2 is not merged with Ref1 when we create it,
>> because they have different sequence numbers (decided at
>> add_delayed_ref_tail_merge());
>>
>> 6)
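
Steps 1) to 5) above can be modelled in a few lines of userspace C.
This is only a sketch under assumptions: the structures and the
add_ref_tail_merge() helper are hypothetical simplifications, not the
kernel's add_delayed_ref_tail_merge(), and the only rule modelled is
"merge into the tail entry only when the sequence numbers match".

```c
#include <assert.h>

/* Simplified stand-in for a delayed reference on one bytenr. */
struct delayed_ref {
	int ref_mod;		/* +1 adds a reference, -1 drops one */
	unsigned long long seq;	/* tree_mod_seq value at creation time */
};

#define MAX_REFS 8

struct ref_list {
	struct delayed_ref refs[MAX_REFS];
	int count;
};

/*
 * Append a ref, merging it into the tail entry only if both carry the
 * same seq. A merged entry whose ref_mod reaches 0 is dropped.
 */
static void add_ref_tail_merge(struct ref_list *list, int ref_mod,
			       unsigned long long seq)
{
	if (list->count > 0) {
		struct delayed_ref *tail = &list->refs[list->count - 1];

		if (tail->seq == seq) {
			tail->ref_mod += ref_mod;
			if (tail->ref_mod == 0)
				list->count--;
			return;
		}
	}
	assert(list->count < MAX_REFS);
	list->refs[list->count].ref_mod = ref_mod;
	list->refs[list->count].seq = seq;
	list->count++;
}
```

Adding Ref1 (+1, seq 0) and then Ref2 (-1, seq 1) leaves two separate
entries in the list, whereas the same pair with equal seq values
cancels out to an empty list.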

RE: [PATCH] btrfs: Remove code for no-cow in scrub/replace

2015-10-26 Thread Zhao Lei

Hi, Filipe Manana

> -Original Message-
> From: Filipe Manana [mailto:fdman...@gmail.com]
> Sent: Friday, October 23, 2015 11:17 PM
> To: Jeff Mahoney 
> Cc: Zhao Lei ; linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH] btrfs: Remove code for no-cow in scrub/replace
> 
> On Fri, Oct 23, 2015 at 4:11 PM, Jeff Mahoney  wrote:
> >
> > On 10/23/15 4:03 AM, Zhao Lei wrote:
> >> Since we set the source bg to read-only in scrub/replace, we don't need to
> >> consider conflicts with no-cow writes during the scrub/replace operation.
> >
> > What happens if there's a read failure?  IIRC the initial purpose of
> > this code was to correct read failures during scrub and device
> > replacement by fetching the bad extent from another device if one is
> > available.
> 
> And we don't have xfstests, or any other automated test suite, to test those
> code paths.
> So it is completely useless to run xfstests in a loop for 5 days, especially
> the generic tests, which never trigger scrub runs...
> 
I completely agree with you.

Something that was not written in the patch's commit message: I also have a
custom test, which includes scrub on a bad-block device:

[root@ZLLINUX custom_tests]# ls -l
total 36
-rwxr-xr-x 1 root root 15931 Oct  9 21:08 btrfs_maintance.sh
-rwxr-xr-x 1 root root  2000 Apr 30 18:23 btrfs_out_of_space.sh
-rwxr-xr-x 1 root root  5588 May 26 09:47 btrfs_replace_sumerr.sh
-rwxr-xr-x 1 root root  1042 Sep 18 17:05 __loop_custom.sh
-rwxr-xr-x 1 root root   939 Jul 14 18:15 xfstests.sh
[root@ZLLINUX custom_tests]#

# grep '()' btrfs_maintance.sh
...
redundancy_func_test()
scrub_func_test()
replace_func_test()
many_faildisk_test()
[root@ZLLINUX custom_tests]#

This patch was also tested with the above script:

# ./__loop_custom.sh ./btrfs_maintance.sh
Loop 0
replace_func_test: raidtype=raid1 mkfs_opt= mount_opt=-o nodatacow max_disk=
mkfs and mount: raid=raid1 dev_cnt=2 mkfs_opt= mount_opt=-o nodatacow:
Writting some data:
  du: 17, writting 1M file
  du: 18, writting 2M file
  du: 20, writting 4M file
  du: 24, writting 8M file
  du: 32, writting 16M file
  reach enough space: 48%
Start replace /dev/vdc -> /dev/vde
  OO:/dev/vdc OO:/dev/vdd #X:/dev/vde
  #O:/dev/vdc OO:/dev/vdd OO:/dev/vde
  Replace start in dmesg found
  Replace finish in dmesg found
  dmesg: OK
  file contents before remount: same
  dmesg: OK
  file contents after remount: same
Check prune /dev/vdc
  #O:/dev/vdc OO:/dev/vdd OO:/dev/vde
  prune dsk /dev/vdc
  #X:/dev/vdc OO:/dev/vdd OO:/dev/vde
  dmesg: OK
  fs contents: same
Check prune /dev/vdd
  #X:/dev/vdc OO:/dev/vdd OO:/dev/vde
  prune dsk /dev/vdd
  #X:/dev/vdc OX:/dev/vdd OO:/dev/vde
  dmesg: OK
  fs contents: same
Start replace /dev/vde -> /dev/vdc
  #X:/dev/vdc OX:/dev/vdd OO:/dev/vde
  OO:/dev/vdc OX:/dev/vdd #O:/dev/vde
  Replace start in dmesg found
  Replace finish in dmesg found
  dmesg: OK
  file contents before remount: same
  

I'll add the above script to xfstests once it is complete
and stable.

Thanks
Zhaolei

> 
> >
> > See commit 0ef8e45158f (btrfs scrub: add fixup code for errors on
> > nodatasum files)
> >
> > -Jeff
> >
> >> This patch removes the special code for no-cow mode in scrub/replace,
> >> dropping 670 lines.
> >>
> >> Tested by running xfstests continuously for 5 days, including the generic
> >> and btrfs groups, with 10 mount options including nodatacow.
> >>
> >> Signed-off-by: Zhao Lei
> >> ---
> >>  fs/btrfs/ctree.h |   1 -
> >>  fs/btrfs/scrub.c | 669 ---
> >>  2 files changed, 670 deletions(-)
> >>
> >> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> >> index 938efe3..3387509 100644
> >> --- a/fs/btrfs/ctree.h
> >> +++ b/fs/btrfs/ctree.h
> >> @@ -1688,7 +1688,6 @@ struct btrfs_fs_info {
> >>  	int scrub_workers_refcnt;
> >>  	struct btrfs_workqueue *scrub_workers;
> >>  	struct btrfs_workqueue *scrub_wr_completion_workers;
> >> -	struct btrfs_workqueue *scrub_nocow_workers;
> >>  	struct btrfs_workqueue *scrub_parity_workers;
> >>
> >>  #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
> >> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> >> index d64f557..6027679 100644
> >> --- a/fs/btrfs/scrub.c
> >> +++ b/fs/btrfs/scrub.c
> >> @@ -205,32 +205,6 @@ struct scrub_ctx {
> >>  	atomic_t		refs;
> >>  };
> >>
> >> -struct scrub_fixup_nodatasum {
> >> -	struct scrub_ctx	*sctx;
> >> -	struct btrfs_device	*dev;
> >> -	u64			logical;
> >> -	struct btrfs_root	*root;
> >> -	struct btrfs_work	work;
> >> -	int			mirror_num;
> >> -};
> >> -
> >> -struct scrub_nocow_inode {
> >> -	u64			inum;
> >> -	u64			offset;
> >> -	u64			root;
> >> -	struct list_head	list;
> >> -};
> >> -
> >> -struct scrub_copy_nocow_ctx {
> >> -	struct scrub_ctx	*sctx;
> >> -	u64			logical;
> >> -	u64			len;
> >> -	int			mirror_num;
> >> -	u64			physical_for_dev_replace;
> >> -	struct list_head	inodes;
> >> -

[PATCH] btrfs: clear PF_NOFREEZE in cleaner_kthread()

2015-10-26 Thread Jiri Kosina

From: Jiri Kosina 

cleaner_kthread() kthread calls try_to_freeze() at the beginning of every 
cleanup attempt. This operation can't ever succeed though, as the kthread 
hasn't marked itself as freezable.

Before (hopefully eventually) kthread freezing gets converted to filesystem 
freezing, we'd rather mark cleaner_kthread() freezable (as my 
understanding is that it can generate filesystem I/O during suspend).

Signed-off-by: Jiri Kosina 
---
 fs/btrfs/disk-io.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 295795a..173970d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1759,6 +1759,7 @@ static int cleaner_kthread(void *arg)
 	int again;
 	struct btrfs_trans_handle *trans;
 
+	set_freezable();
 	do {
 		again = 0;
 
-- 
Jiri Kosina
SUSE Labs
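
For readers less familiar with the freezer, the effect described in the
changelog can be modelled roughly as below. This is a userspace toy
under assumptions, not kernel code: struct task, TASK_NOFREEZE and the
task_*() helpers are hypothetical stand-ins for the task flags,
PF_NOFREEZE, set_freezable() and try_to_freeze().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TASK_NOFREEZE 0x1u	/* stand-in for PF_NOFREEZE */

struct task {
	unsigned int flags;
	bool freeze_requested;	/* a suspend wants this task frozen */
	bool frozen;
};

/* Model of a freshly created kthread: it starts out not freezable. */
static void task_init_kthread(struct task *t)
{
	assert(t != NULL);
	t->flags = TASK_NOFREEZE;
	t->freeze_requested = false;
	t->frozen = false;
}

/* Model of set_freezable(): clear the no-freeze flag. */
static void task_set_freezable(struct task *t)
{
	t->flags &= ~TASK_NOFREEZE;
}

/* Model of try_to_freeze(): only a freezable task can be frozen. */
static bool task_try_to_freeze(struct task *t)
{
	if (!t->freeze_requested || (t->flags & TASK_NOFREEZE))
		return false;
	t->frozen = true;
	return true;
}
```

In this model a thread that never calls task_set_freezable() returns
false from task_try_to_freeze() no matter how often it is called, which
is the situation the patch addresses.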


Re: [PATCH v5] btrfs: qgroup: Don't copy extent buffer to do qgroup rescan

2015-10-26 Thread Filipe Manana

On Mon, Oct 26, 2015 at 1:19 AM, Qu Wenruo  wrote:
> Ancient qgroup code calls memcpy() on an extent buffer and uses it for leaf
> iteration.
>
> As an extent buffer contains a lock and pointers to pages, it's never sane
> to do such a copy.
>
> The following bug may be caused by this insane operation:
> [92098.841309] general protection fault:  [#1] SMP
> [92098.841338] Modules linked in: ...
> [92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted
> 4.3.0-rc1 #1
> [92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper
> [btrfs]
> [92098.842261] Call Trace:
> [92098.842277]  [] ? read_extent_buffer+0xb8/0x110
> [btrfs]
> [92098.842304]  [] ? btrfs_find_all_roots+0x60/0x70
> [btrfs]
> [92098.842329]  []
> btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
>
> Where btrfs_qgroup_rescan_worker+0x28d is btrfs_disk_key_to_cpu(),
> called in reading key from the copied extent_buffer.
>
> This patch uses btrfs_clone_extent_buffer() to make a proper copy of the
> extent buffer to deal with such a case.
>
> Reported-by: Stephane Lesimple 
> Suggested-by: Filipe Manana 
> Signed-off-by: Qu Wenruo 
Reviewed-by: Filipe Manana 

thanks Qu
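
As a generic illustration of why a plain memcpy() of such a structure
is unsafe, here is a small userspace sketch. It assumes a toy struct
buf with one owned data pointer; buf_alloc(), buf_clone() and
buf_free() are hypothetical helpers and have nothing to do with the
real extent_buffer API beyond the general idea.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy buffer that owns its data, loosely analogous to a leaf. */
struct buf {
	size_t len;
	unsigned char *data;	/* owned by this struct */
};

static struct buf *buf_alloc(size_t len)
{
	struct buf *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->data = calloc(len, 1);
	if (!b->data) {
		free(b);
		return NULL;
	}
	b->len = len;
	return b;
}

/* Deep copy: the clone gets its own data, independent of the source. */
static struct buf *buf_clone(const struct buf *src)
{
	struct buf *b;

	assert(src != NULL);
	b = buf_alloc(src->len);
	if (b)
		memcpy(b->data, src->data, src->len);
	return b;
}

static void buf_free(struct buf *b)
{
	if (b) {
		free(b->data);
		free(b);
	}
}
```

A memcpy() of the struct itself only duplicates the data pointer, so
the "copy" keeps following whatever happens to the original (including
its being freed), while the clone stays valid on its own.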

> ---
> v2:
>   Follow the parameter change in previous patch.
> v3:
>   None
> v4:
>   Use btrfs_clone_extent_buffer() other than introducing new facilities
> v5:
>   Change slot = path->slots[0] position.
> ---
>  fs/btrfs/qgroup.c | 26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 158633c..31d1934 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2192,10 +2192,10 @@ void assert_qgroups_uptodate(struct btrfs_trans_handle *trans)
>   */
>  static int
>  qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
> -		   struct btrfs_trans_handle *trans,
> -		   struct extent_buffer *scratch_leaf)
> +		   struct btrfs_trans_handle *trans)
>  {
>  	struct btrfs_key found;
> +	struct extent_buffer *scratch_leaf = NULL;
>  	struct ulist *roots = NULL;
>  	struct seq_list tree_mod_seq_elem = SEQ_LIST_INIT(tree_mod_seq_elem);
>  	u64 num_bytes;
> @@ -2233,7 +2233,15 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
>  	fs_info->qgroup_rescan_progress.objectid = found.objectid + 1;
>
>  	btrfs_get_tree_mod_seq(fs_info, &tree_mod_seq_elem);
> -	memcpy(scratch_leaf, path->nodes[0], sizeof(*scratch_leaf));
> +	scratch_leaf = btrfs_clone_extent_buffer(path->nodes[0]);
> +	if (!scratch_leaf) {
> +		ret = -ENOMEM;
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		goto out;
> +	}
> +	extent_buffer_get(scratch_leaf);
> +	btrfs_tree_read_lock(scratch_leaf);
> +	btrfs_set_lock_blocking_rw(scratch_leaf, BTRFS_READ_LOCK);
>  	slot = path->slots[0];
>  	btrfs_release_path(path);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
> @@ -2259,6 +2267,10 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
>  			goto out;
>  	}
>  out:
> +	if (scratch_leaf) {
> +		btrfs_tree_read_unlock_blocking(scratch_leaf);
> +		free_extent_buffer(scratch_leaf);
> +	}
>  	btrfs_put_tree_mod_seq(fs_info, &tree_mod_seq_elem);
>
>  	return ret;
> @@ -2270,16 +2282,12 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  						     qgroup_rescan_work);
>  	struct btrfs_path *path;
>  	struct btrfs_trans_handle *trans = NULL;
> -	struct extent_buffer *scratch_leaf = NULL;
>  	int err = -ENOMEM;
>  	int ret = 0;
>
>  	path = btrfs_alloc_path();
>  	if (!path)
>  		goto out;
> -	scratch_leaf = kmalloc(sizeof(*scratch_leaf), GFP_NOFS);
> -	if (!scratch_leaf)
> -		goto out;
>
>  	err = 0;
>  	while (!err) {
> @@ -2291,8 +2299,7 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  		if (!fs_info->quota_enabled) {
>  			err = -EINTR;
>  		} else {
> -			err = qgroup_rescan_leaf(fs_info, path, trans,
> -						 scratch_leaf);
> +			err = qgroup_rescan_leaf(fs_info, path, trans);
>  		}
>  		if (err > 0)
>  			btrfs_commit_transaction(trans, fs_info->fs_root);
> @@ -2301,7 +2308,6 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  	}
>
>  out:
> -	kfree(scratch_leaf);
>  	btrfs_free_path(path);
>
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> --
> 2.6.2

RE: [PATCH] btrfs: Remove code for no-cow in scrub/replace

2015-10-26 Thread Zhao Lei

Hi, Jeff Mahoney

Thanks for the review!

> -Original Message-
> From: Jeff Mahoney [mailto:je...@suse.com]
> Sent: Friday, October 23, 2015 11:11 PM
> To: Zhao Lei ; linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH] btrfs: Remove code for no-cow in scrub/replace
> 
> 
> On 10/23/15 4:03 AM, Zhao Lei wrote:
> > Since we set the source bg to read-only in scrub/replace, we don't need to
> > consider conflicts with no-cow writes during the scrub/replace operation.
> 
> What happens if there's a read failure?  IIRC the initial purpose of this code
> was to correct read failures during scrub and device replacement by fetching
> the bad extent from another device if one is available.
> 
> See commit 0ef8e45158f (btrfs scrub: add fixup code for errors on nodatasum
> files)
> 
"nodatasum" is used to check whether the data is in the no-cow state, and the
reason for using inode writeback is to avoid concurrent writes to the block
that is being scrubbed. The comment in the newest scrub.c, L1055, gives the
details. (Introduced by commit b5d67f64f)

Since the entire bg is set to read-only during the scrub, there are no
concurrent write operations for either cow or no-cow bgs, and the bio-based
fix operation works in all of the above cases.

Thanks
Zhaolei

> -Jeff
> 
> > This patch removes the special code for no-cow mode in scrub/replace,
> > dropping 670 lines.
> >
> > Tested by running xfstests continuously for 5 days, including the generic
> > and btrfs groups, with 10 mount options including nodatacow.
> >
> > Signed-off-by: Zhao Lei
> > ---
> >  fs/btrfs/ctree.h |   1 -
> >  fs/btrfs/scrub.c | 669 ---
> >  2 files changed, 670 deletions(-)
> >
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index 938efe3..3387509 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -1688,7 +1688,6 @@ struct btrfs_fs_info {
> >  	int scrub_workers_refcnt;
> >  	struct btrfs_workqueue *scrub_workers;
> >  	struct btrfs_workqueue *scrub_wr_completion_workers;
> > -	struct btrfs_workqueue *scrub_nocow_workers;
> >  	struct btrfs_workqueue *scrub_parity_workers;
> >
> >  #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
> > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> > index d64f557..6027679 100644
> > --- a/fs/btrfs/scrub.c
> > +++ b/fs/btrfs/scrub.c
> > @@ -205,32 +205,6 @@ struct scrub_ctx {
> >  	atomic_t		refs;
> >  };
> >
> > -struct scrub_fixup_nodatasum {
> > -	struct scrub_ctx	*sctx;
> > -	struct btrfs_device	*dev;
> > -	u64			logical;
> > -	struct btrfs_root	*root;
> > -	struct btrfs_work	work;
> > -	int			mirror_num;
> > -};
> > -
> > -struct scrub_nocow_inode {
> > -	u64			inum;
> > -	u64			offset;
> > -	u64			root;
> > -	struct list_head	list;
> > -};
> > -
> > -struct scrub_copy_nocow_ctx {
> > -	struct scrub_ctx	*sctx;
> > -	u64			logical;
> > -	u64			len;
> > -	int			mirror_num;
> > -	u64			physical_for_dev_replace;
> > -	struct list_head	inodes;
> > -	struct btrfs_work	work;
> > -};
> > -
> >  struct scrub_warning {
> >  	struct btrfs_path	*path;
> >  	u64			extent_item_size;
> > @@ -242,8 +216,6 @@ struct scrub_warning {
> >
> >  static void scrub_pending_bio_inc(struct scrub_ctx *sctx);
> >  static void scrub_pending_bio_dec(struct scrub_ctx *sctx);
> > -static void scrub_pending_trans_workers_inc(struct scrub_ctx *sctx);
> > -static void scrub_pending_trans_workers_dec(struct scrub_ctx *sctx);
> >  static int scrub_handle_errored_block(struct scrub_block *sblock_to_check);
> >  static int scrub_setup_recheck_block(struct scrub_block *original_sblock,
> >  				     struct scrub_block *sblocks_for_recheck);
> > @@ -298,13 +270,6 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx,
> >  static void scrub_wr_submit(struct scrub_ctx *sctx);
> >  static void scrub_wr_bio_end_io(struct bio *bio, int err);
> >  static void scrub_wr_bio_end_io_worker(struct btrfs_work *work);
> > -static int write_page_nocow(struct scrub_ctx *sctx,
> > -			    u64 physical_for_dev_replace, struct page *page);
> > -static int copy_nocow_pages_for_inode(u64 inum, u64 offset, u64 root,
> > -				      struct scrub_copy_nocow_ctx *ctx);
> > -static int copy_nocow_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
> > -			    int mirror_num, u64 physical_for_dev_replace);
> > -static void copy_nocow_pages_worker(struct btrfs_work *work);
> >  static void __scrub_blocked_if_needed(struct btrfs_fs_info *fs_info);
> >  static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info);
> >  static void scrub_put_ctx(struct scrub_ctx *sctx);
> > @@ -355,60 +320,6 @@ static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
> >  	scrub_pause_off(fs_info);
> >  }
> >
> > -/*
> > - * used for workers that require transaction commits (i.e., for the
> > - * NOCOW case)
> > -

[PATCH v2 0/5] btrfs-progs: Add all missing close_ctree and btrfs_close_all_devices

2015-10-26 Thread Zhao Lei

This patchset adds all missing close_ctree and btrfs_close_all_devices
calls to several tools in btrfs-progs, to avoid memory leaks.

Changelog v1->v2:
 Move btrfs_close_all_devices() from cmd-XXX into btrfs.c to keep the
 code simple, and to avoid similar problems in cmd-XXX in the future.

Zhao Lei (5):
  btrfs-progs: btrfs: Add missing btrfs_close_all_devices for btrfs
    command
  btrfs-progs: Remove all btrfs_close_all_devices in sub-command
  btrfs-progs: Add all missing btrfs_close_all_devices to standalone
    tools
  btrfs-progs: Add missing close_ctree to btrfs-select-super.c
  btrfs-progs: use system's default path for math.h

 btrfs-calc-size.c    | 1 +
 btrfs-debug-tree.c   | 5 ++++-
 btrfs-find-root.c    | 1 +
 btrfs-map-logical.c  | 1 +
 btrfs-select-super.c | 3 +++
 btrfs.c              | 9 ++++++++-
 btrfstune.c          | 1 +
 cmds-check.c         | 1 -
 cmds-device.c        | 3 ---
 cmds-replace.c       | 2 --
 extent-tree.c        | 2 +-
 11 files changed, 20 insertions(+), 9 deletions(-)

-- 
1.8.5.1


[PATCH v2 1/5] btrfs-progs: btrfs: Add missing btrfs_close_all_devices for btrfs command

2015-10-26 Thread Zhao Lei

Adding a btrfs_close_all_devices() call after the command callback in
btrfs.c force-closes all opened devices before the program exits, to avoid
memory leaks in all btrfs sub-commands.

Suggested-by: David Sterba 
Signed-off-by: Zhao Lei 
---
 btrfs.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/btrfs.c b/btrfs.c
index 63df377..9416a29 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 
+#include "volumes.h"
 #include "crc32c.h"
 #include "commands.h"
 #include "utils.h"
@@ -214,6 +215,7 @@ int main(int argc, char **argv)
 {
 	const struct cmd_struct *cmd;
 	const char *bname;
+	int ret;
 
 	if ((bname = strrchr(argv[0], '/')) != NULL)
 		bname++;
@@ -242,5 +244,10 @@ int main(int argc, char **argv)
 	crc32c_optimization_init();
 
 	fixup_argv0(argv, cmd->token);
-	exit(cmd->fn(argc, argv));
+
+	ret = cmd->fn(argc, argv);
+
+	btrfs_close_all_devices();
+
+	exit(ret);
 }
-- 
1.8.5.1
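
The pattern in this patch (run the sub-command, then do the cleanup
once in the dispatcher) can be sketched standalone as below. This is a
toy under assumptions, not btrfs-progs code: open_devices,
close_all_devices(), run_command() and cmd_that_leaks() are
hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the global list of devices opened during a command. */
static int open_devices;

static void close_all_devices(void)
{
	open_devices = 0;
}

typedef int (*cmd_fn)(int argc, char **argv);

/* A sub-command that opens devices and forgets to close them. */
static int cmd_that_leaks(int argc, char **argv)
{
	(void)argc;
	(void)argv;
	open_devices += 2;
	return 3;
}

/*
 * Dispatcher: keep the command's return value, clean up once for every
 * sub-command, then hand the return value back to the caller.
 */
static int run_command(cmd_fn fn, int argc, char **argv)
{
	int ret;

	assert(fn != NULL);
	ret = fn(argc, argv);
	close_all_devices();
	return ret;
}
```

The command's exit status is preserved, and a sub-command that forgets
its own cleanup no longer leaves devices open at exit.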


[PATCH v2 5/5] btrfs-progs: use system's default path for math.h

2015-10-26 Thread Zhao Lei

The line
 #include "math.h"
in extent-tree.c uses quotes for historical reasons
(we used to have a custom math.h in the source tree).

Now it is better to use "<>" instead of quotes for this header file.

Signed-off-by: Zhao Lei 
---
 extent-tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/extent-tree.c b/extent-tree.c
index 0c8152a..0d605e1 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include <math.h>
 #include "kerncompat.h"
 #include "radix-tree.h"
 #include "ctree.h"
@@ -28,7 +29,6 @@
 #include "crc32c.h"
 #include "volumes.h"
 #include "free-space-cache.h"
-#include "math.h"
 #include "utils.h"
 
 #define PENDING_EXTENT_INSERT 0
-- 
1.8.5.1


[PATCH v2 4/5] btrfs-progs: Add missing close_ctree to btrfs-select-super.c

2015-10-26 Thread Zhao Lei

Add missing close_ctree() to btrfs-select-super.c to avoid memory leak.

Signed-off-by: Zhao Lei 
---
 btrfs-select-super.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/btrfs-select-super.c b/btrfs-select-super.c
index bd44978..df74153 100644
--- a/btrfs-select-super.c
+++ b/btrfs-select-super.c
@@ -102,6 +102,7 @@ int main(int ac, char **av)
 	 */
 	printf("using SB copy %llu, bytenr %llu\n", (unsigned long long)num,
 	       (unsigned long long)bytenr);
+	close_ctree(root);
 	btrfs_close_all_devices();
 	return ret;
 }
-- 
1.8.5.1


[PATCH v2 2/5] btrfs-progs: Remove all btrfs_close_all_devices in sub-command

2015-10-26 Thread Zhao Lei

Since btrfs_close_all_devices() is now called from the main entry point
of the btrfs command, it is not necessary to call it separately
in each sub-command.

Signed-off-by: Zhao Lei 
---
 cmds-check.c   | 1 -
 cmds-device.c  | 3 ---
 cmds-replace.c | 2 --
 3 files changed, 6 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index 1f8caad..3af6e61 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -9738,7 +9738,6 @@ out:
 	free_root_recs_tree(&root_cache);
 close_out:
 	close_ctree(root);
-	btrfs_close_all_devices();
 err_out:
 	if (ctx.progress_enabled)
 		task_deinit(ctx.info);
diff --git a/cmds-device.c b/cmds-device.c
index 5f2b952..620ae8b 100644
--- a/cmds-device.c
+++ b/cmds-device.c
@@ -139,7 +139,6 @@ static int cmd_device_add(int argc, char **argv)
 
 error_out:
 	close_file_or_dir(fdmnt, dirstream);
-	btrfs_close_all_devices();
 	return !!ret;
 }
 
@@ -288,7 +287,6 @@ static int cmd_device_scan(int argc, char **argv)
 	}
 
 out:
-	btrfs_close_all_devices();
 	return !!ret;
 }
 
@@ -466,7 +464,6 @@ static int cmd_device_stats(int argc, char **argv)
 out:
 	free(di_args);
 	close_file_or_dir(fdmnt, dirstream);
-	btrfs_close_all_devices();
 
 	return err;
 }
diff --git a/cmds-replace.c b/cmds-replace.c
index 9ab8438..fadd2cd 100644
--- a/cmds-replace.c
+++ b/cmds-replace.c
@@ -330,7 +330,6 @@ static int cmd_replace_start(int argc, char **argv)
 		}
 	}
 	close_file_or_dir(fdmnt, dirstream);
-	btrfs_close_all_devices();
 	return 0;
 
 leave_with_error:
@@ -340,7 +339,6 @@ leave_with_error:
 		close(fdmnt);
 	if (fddstdev != -1)
 		close(fddstdev);
-	btrfs_close_all_devices();
 	return 1;
 }
 
-- 
1.8.5.1


[PATCH v2 3/5] btrfs-progs: Add all missing btrfs_close_all_devices to standalone tools

2015-10-26 Thread Zhao Lei

This patch adds all missing btrfs_close_all_devices() calls to the
standalone tools in btrfs-progs, to avoid memory leaks.

Signed-off-by: Zhao Lei 
---
 btrfs-calc-size.c    | 1 +
 btrfs-debug-tree.c   | 5 ++++-
 btrfs-find-root.c    | 1 +
 btrfs-map-logical.c  | 1 +
 btrfs-select-super.c | 2 ++
 btrfstune.c          | 1 +
 6 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/btrfs-calc-size.c b/btrfs-calc-size.c
index 7287858..b756693 100644
--- a/btrfs-calc-size.c
+++ b/btrfs-calc-size.c
@@ -508,5 +508,6 @@ int main(int argc, char **argv)
 out:
 	close_ctree(root);
 	free(roots);
+	btrfs_close_all_devices();
 	return ret;
 }
diff --git a/btrfs-debug-tree.c b/btrfs-debug-tree.c
index 7d8e876..8adc39f 100644
--- a/btrfs-debug-tree.c
+++ b/btrfs-debug-tree.c
@@ -28,6 +28,7 @@
 #include "disk-io.h"
 #include "print-tree.h"
 #include "transaction.h"
+#include "volumes.h"
 #include "utils.h"
 
 static int print_usage(int ret)
@@ -428,5 +429,7 @@ no_node:
 	printf("uuid %s\n", uuidbuf);
 	printf("%s\n", PACKAGE_STRING);
 close_root:
-	return close_ctree(root);
+	ret = close_ctree(root);
+	btrfs_close_all_devices();
+	return ret;
 }
diff --git a/btrfs-find-root.c b/btrfs-find-root.c
index 01b3603..fc3812c 100644
--- a/btrfs-find-root.c
+++ b/btrfs-find-root.c
@@ -216,5 +216,6 @@ int main(int argc, char **argv)
 out:
 	btrfs_find_root_free();
 	close_ctree(root);
+	btrfs_close_all_devices();
 	return ret;
 }
diff --git a/btrfs-map-logical.c b/btrfs-map-logical.c
index d9fa6b2..0161b5c 100644
--- a/btrfs-map-logical.c
+++ b/btrfs-map-logical.c
@@ -359,5 +359,6 @@ close:
 	close_ctree(root);
 	if (ret < 0)
 		ret = 1;
+	btrfs_close_all_devices();
 	return ret;
 }
diff --git a/btrfs-select-super.c b/btrfs-select-super.c
index b790f3e..bd44978 100644
--- a/btrfs-select-super.c
+++ b/btrfs-select-super.c
@@ -23,6 +23,7 @@
 #include 
 #include "kerncompat.h"
 #include "ctree.h"
+#include "volumes.h"
 #include "disk-io.h"
 #include "print-tree.h"
 #include "transaction.h"
@@ -101,5 +102,6 @@ int main(int ac, char **av)
 	 */
 	printf("using SB copy %llu, bytenr %llu\n", (unsigned long long)num,
 	       (unsigned long long)bytenr);
+	btrfs_close_all_devices();
 	return ret;
 }
diff --git a/btrfstune.c b/btrfstune.c
index c248ee6..0907aa9 100644
--- a/btrfstune.c
+++ b/btrfstune.c
@@ -548,6 +548,7 @@ int main(int argc, char *argv[])
 	}
 out:
 	close_ctree(root);
+	btrfs_close_all_devices();
 
 	return ret;
 }
-- 
1.8.5.1


Re: [PATCH 1/2 v2] Btrfs: fix regression when running delayed references

2015-10-26 Thread Liu Bo


On 10/25/2015 06:04 PM, fdman...@kernel.org wrote:

From: Filipe Manana 

In the kernel 4.2 merge window we had a refactoring/rework of the delayed
references implementation in order to fix certain problems with qgroups.
However that rework introduced one more regression that leads to the
following trace when running delayed references for metadata:

[35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832!
[35908.065201] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
[35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic xor 
raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc 
loop fuse parport_pc psmouse i2
[35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: GW   
4.3.0-rc5-btrfs-next-17+ #1
[35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti: 
88010c4c8000
[35908.065201] RIP: 0010:[]  [] 
insert_inline_extent_backref+0x52/0xb1 [btrfs]
[35908.065201] RSP: 0018:88010c4cbb08  EFLAGS: 00010293
[35908.065201] RAX:  RBX: 88008a661000 RCX: 
[35908.065201] RDX: a04dd58f RSI: 0001 RDI: 
[35908.065201] RBP: 88010c4cbb40 R08: 1000 R09: 88010c4cb9f8
[35908.065201] R10:  R11: 002c R12: 
[35908.065201] R13: 88020a74c578 R14:  R15: 
[35908.065201] FS:  () GS:88023edc() 
knlGS:
[35908.065201] CS:  0010 DS:  ES:  CR0: 8005003b
[35908.065201] CR2: 015e8708 CR3: 000102185000 CR4: 06e0
[35908.065201] Stack:
[35908.065201]  88010c4cbb18 0f37 88020a74c578 
88015a408000
[35908.065201]  880154a44000  0005 
88010c4cbbd8
[35908.065201]  a0492b9a 0005  

[35908.065201] Call Trace:
[35908.065201]  [] __btrfs_inc_extent_ref+0x8b/0x208 [btrfs]
[35908.065201]  [] ? __btrfs_run_delayed_refs+0x4d4/0xd33 
[btrfs]
[35908.065201]  [] __btrfs_run_delayed_refs+0xafa/0xd33 
[btrfs]
[35908.065201]  [] ? join_transaction.isra.10+0x25/0x41f 
[btrfs]
[35908.065201]  [] ? join_transaction.isra.10+0xa8/0x41f 
[btrfs]
[35908.065201]  [] btrfs_run_delayed_refs+0x75/0x1dd [btrfs]
[35908.065201]  [] delayed_ref_async_start+0x3c/0x7b [btrfs]
[35908.065201]  [] normal_work_helper+0x14c/0x32a [btrfs]
[35908.065201]  [] btrfs_extent_refs_helper+0x12/0x14 [btrfs]
[35908.065201]  [] process_one_work+0x24a/0x4ac
[35908.065201]  [] worker_thread+0x206/0x2c2
[35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201]  [] kthread+0xef/0xf7
[35908.065201]  [] ? kthread_parkme+0x24/0x24
[35908.065201]  [] ret_from_fork+0x3f/0x70
[35908.065201]  [] ? kthread_parkme+0x24/0x24
[35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 8d 4d d0 
e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 <0f> 0b 4c 8b 
45 30 8b 4d 28 45 31
[35908.065201] RIP  [] insert_inline_extent_backref+0x52/0xb1 
[btrfs]
[35908.065201]  RSP 
[35908.310885] ---[ end trace fe4299baf0666457 ]---

This happens because the new delayed references code no longer merges
delayed references that have different sequence values. The following
steps are an example sequence leading to this issue:

1) Transaction N starts, fs_info->tree_mod_seq has value 0;

2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for
bytenr A is created, with a value of 1 and a seq value of 0;

3) fs_info->tree_mod_seq is incremented to 1;

4) Extent buffer A is deleted through btrfs_del_items(), which calls
btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). The
latter returns the metadata extent associated to extent buffer A to
the free space cache (the range is not pinned), because the extent
buffer was created in the current transaction (N) and writeback never
happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set
in the extent buffer).
This creates the delayed reference Ref2 for bytenr A, with a value
of -1 and a seq value of 1;

5) Delayed reference Ref2 is not merged with Ref1 when we create it,
because they have different sequence numbers (decided at
add_delayed_ref_tail_merge());

6) fs_info->tree_mod_seq is incremented to 2;

7) Some task attempts to allocate a new extent buffer (done at
extent-tree.c:find_free_extent()), but due to heavy fragmentation
and running low on metadata space the clustered allocation fails
and we fall back to unclustered allocation, which finds the
extent at offset A, so a new extent buffer at offset A

[GIT PULL] Btrfs fixes for delayed refs regression and a deadlock

2015-10-26 Thread fdmanana

From: Filipe Manana 

Hi Chris,

please consider the following fixes for the 4.4 merge window (they were
previously sent to the mailing list already). They fix an issue with
delayed references that makes us hit some BUG_ONs as of the 4.2 kernel
release.

A lot of people have been hitting this and reported it in the mailing
list and bugzilla. For at least some of them this has been making it
impossible to run a balance on a 4.2+ kernel, such as Stéphane's case
on his multi terabyte filesystem.

I've tagged both for stable and included review tags that people gave
through the mailing list.

A very special thanks to Stéphane Lesimple for volunteering not only
to test these fixes (balance took over 1 day to complete on his fs!)
but also debug patches to help me figure out what was leading to the
crashes. Not only balance finishes successfully for him now, but fsck
also does not report any inconsistencies and his filesystem seems
healthy (his files, snapshots, etc, seem all ok).

As a bonus, the second patch also ends up fixing a deadlock in the clone
ioctl when qgroups are enabled (reported by Elias Probst in the mailing
list).

Thanks.

The following changes since commit a9e6d153563d2ed69c6cd7fb4fa5ce4ca7c712eb:

  Merge branch 'allocator-fixes' into for-linus-4.4 (2015-10-21 19:00:38 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git delayed-refs-balance-fix-4.4

for you to fetch changes up to b06c4bf5c874a57254b197f53ddf588e7a24a2bf:

  Btrfs: fix regression running delayed references when using qgroups (2015-10-25 19:53:26 +)


Filipe Manana (2):
  Btrfs: fix regression when running delayed references
  Btrfs: fix regression running delayed references when using qgroups

 fs/btrfs/ctree.h       |   4 ++--
 fs/btrfs/delayed-ref.c | 139 ---
 fs/btrfs/delayed-ref.h |   7 ++-
 fs/btrfs/extent-tree.c |  59 ++-
 fs/btrfs/file.c        |  10 +-
 fs/btrfs/inode.c       |   4 ++--
 fs/btrfs/ioctl.c       |  62 +-
 fs/btrfs/relocation.c  |  16 +++-
 fs/btrfs/tree-log.c    |   2 +-
 9 files changed, 170 insertions(+), 133 deletions(-)

-- 
2.1.3


random i/o error without error in dmesg

2015-10-26 Thread Szalma László


Hi,

I have been seeing this error for some time. It is not easy to
reproduce, so I will write down everything I know at the moment.


I maintain some servers running Xen (4.5.1) with a Gentoo dom0 and recent
kernels (3.18.*, 4.1.6, 4.2.3, 4.2.4). I use the gentoo-sources patchset.

They run Xen domUs for www and MySQL.
I have MySQL servers in domUs under high load (lots of reads and writes).
These systems are identical in terms of configuration and kernel.


I get MySQL errors at random (sometimes more than one a day, sometimes
one a week), and they are more frequent under high load.


The MySQL errors occur because a file cannot be read from the
filesystem. If I run md5sum on the file, it reports an I/O error.


At this point, mysql stop && umount && mount && mysql start solves the
problem.


Calling
echo 3 > /proc/sys/vm/drop_caches
sometimes clears the I/O error, but not every time. On rare occasions the
problem goes away by itself without a remount.


The problem seems to have no connection to the dom0 kernel or the Xen
version. For example, I have this problem on these dom0s:


kernel: 3.19.3  xen 4.5.0
kernel: 4.2.3 xen 4.5.1

The problem seems to have started with the kernel 4.0 series, but I am
not sure. In the summer the load was low, and the problem occurred very
rarely.


When the I/O error occurs:
btrfs scrub finds no errors.
There are no memory or HDD/SSD hardware errors (SMART, memtest, etc.;
more than one physical server is affected) and no errors in dmesg at all.
I tried different kernel configs, but I don't think I have anything
extraordinary.

I use deadline scheduler.
I use these mount options:

/dev/xvdb1 on /mnt/mysql_naplo_b2 type btrfs 
(rw,noatime,compress=zlib,nossd,noacl,space_cache,subvolid=5,subvol=/)


I tried reformatting the filesystem with recent btrfs-progs (and with
older versions before that):

btrfs-progs v4.2.2
I use the default mkfs options (skinny extents).
After the reformat the problem disappeared for some days (so it seems to
correlate with the age of the filesystem?).
I defragment the filesystem manually with a script that recursively runs
"filefrag" on each file to count its extents and defragments the file if
it has more than 50 extents and is larger than 64 KB (this sometimes
lowers the frequency of the problem).

The unreadable files are usually small, for example:

filefrag:
/mnt/mysql_naplo_b2/mozanaplo_boly_altisk_2015/n_helyettesites.MYD: 2 
extents found

ls -l:
-rw-rw 1 mysql mysql 8092 okt   22 08.24 
/mnt/mysql_naplo_b2/mozanaplo_boly_altisk_2015/n_helyettesites.MYD


There is no error in dmesg, no io errors, no kernel panic, etc at all.

The (virtual) servers have 3-4GB of memory, and I use a 2GB tmpfs for the
temporary tables (so the physical memory usage is somewhat hectic).


The filesystem has no snapshots, but sometimes (for rebuilding
replication) I take one and then delete it (the problem also happens on
filesystems where no snapshot was ever created).


I did not try downgrading the kernel (to 3.18); I always try to
upgrade.


I guess this problem has some connection to the memory usage (but there
is no out-of-memory condition).


I am able to try any debug mode if you suggest one, but the problem is not
reproducible; it happens randomly. I think there should be some errors
in dmesg if I encounter I/O errors, but I am not sure whether this error
has a direct connection to btrfs at all. I didn't try other filesystems.
The problem has occurred with kernel versions 4.0.1, 4.0.4, 4.1.6,
4.2.1, 4.2.3, 4.2.4.


I checked bugzilla and searched Google for a similar problem, but I
couldn't find one.


This problem (I think it is the same one) sometimes happens on a www
server too, with Apache log files (they are heavily fragmented), but
very rarely. I don't have any problem with this configuration on other
servers, even MySQL servers with lower load.


I welcome any suggestions.

László Szalma

[PATCH 1/3] btrfs: Cleanup no_quota parameter

2015-10-26 Thread Qu Wenruo

The no_quota parameter of the delayed_ref related functions has been
meaningless since 4.2-rc1, as any new delayed_ref_head causes qgroup to
scan the extent for its rfer/excl change without checking the no_quota flag.

So this patch cleans it up.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/ctree.h       |  4 ++--
 fs/btrfs/delayed-ref.c | 26 ++
 fs/btrfs/delayed-ref.h |  7 ++
 fs/btrfs/extent-tree.c | 45 ++---
 fs/btrfs/file.c        | 10 -
 fs/btrfs/inode.c       |  4 ++--
 fs/btrfs/ioctl.c       | 60 +-
 fs/btrfs/relocation.c  | 16 ++
 fs/btrfs/tree-log.c    |  2 +-
 9 files changed, 43 insertions(+), 131 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index bc3c711..3fa3c3b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3422,7 +3422,7 @@ int btrfs_set_disk_extent_flags(struct btrfs_trans_handle 
*trans,
 int btrfs_free_extent(struct btrfs_trans_handle *trans,
  struct btrfs_root *root,
  u64 bytenr, u64 num_bytes, u64 parent, u64 root_objectid,
- u64 owner, u64 offset, int no_quota);
+ u64 owner, u64 offset);
 
 int btrfs_free_reserved_extent(struct btrfs_root *root, u64 start, u64 len,
   int delalloc);
@@ -3435,7 +3435,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle 
*trans,
 int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 struct btrfs_root *root,
 u64 bytenr, u64 num_bytes, u64 parent,
-u64 root_objectid, u64 owner, u64 offset, int 
no_quota);
+u64 root_objectid, u64 owner, u64 offset);
 
 int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
   struct btrfs_root *root);
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index bd9b63b..449974f 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -292,8 +292,7 @@ add_delayed_ref_tail_merge(struct btrfs_trans_handle *trans,
exist = list_entry(href->ref_list.prev, struct btrfs_delayed_ref_node,
   list);
/* No need to compare bytenr nor is_head */
-   if (exist->type != ref->type || exist->no_quota != ref->no_quota ||
-   exist->seq != ref->seq)
+   if (exist->type != ref->type || exist->seq != ref->seq)
goto add_tail;
 
if ((exist->type == BTRFS_TREE_BLOCK_REF_KEY ||
@@ -526,7 +525,7 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
 struct btrfs_delayed_ref_head *head_ref,
 struct btrfs_delayed_ref_node *ref, u64 bytenr,
 u64 num_bytes, u64 parent, u64 ref_root, int level,
-int action, int no_quota)
+int action)
 {
struct btrfs_delayed_tree_ref *full_ref;
struct btrfs_delayed_ref_root *delayed_refs;
@@ -548,7 +547,6 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
ref->action = action;
ref->is_head = 0;
ref->in_tree = 1;
-   ref->no_quota = no_quota;
ref->seq = seq;
 
full_ref = btrfs_delayed_node_to_tree_ref(ref);
@@ -581,7 +579,7 @@ add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 struct btrfs_delayed_ref_head *head_ref,
 struct btrfs_delayed_ref_node *ref, u64 bytenr,
 u64 num_bytes, u64 parent, u64 ref_root, u64 owner,
-u64 offset, int action, int no_quota)
+u64 offset, int action)
 {
struct btrfs_delayed_data_ref *full_ref;
struct btrfs_delayed_ref_root *delayed_refs;
@@ -604,7 +602,6 @@ add_delayed_data_ref(struct btrfs_fs_info *fs_info,
ref->action = action;
ref->is_head = 0;
ref->in_tree = 1;
-   ref->no_quota = no_quota;
ref->seq = seq;
 
full_ref = btrfs_delayed_node_to_data_ref(ref);
@@ -635,17 +632,13 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info 
*fs_info,
   struct btrfs_trans_handle *trans,
   u64 bytenr, u64 num_bytes, u64 parent,
   u64 ref_root,  int level, int action,
-  struct btrfs_delayed_extent_op *extent_op,
-  int no_quota)
+  struct btrfs_delayed_extent_op *extent_op)
 {
struct btrfs_delayed_tree_ref *ref;
struct btrfs_delayed_ref_head *head_ref;
struct btrfs_delayed_ref_root *delayed_refs;
struct btrfs_qgroup_extent_record *record = NULL;
 
-   if (!is_fstree(ref_root) || !fs_info->quota_enabled)
-   no_quota = 0;
-
BUG_ON(extent_op && extent_op->is_data);
ref = kmem_cache_alloc(btrfs_delayed_tree_ref_cachep, GFP_NOFS);

[PATCH 2/3] btrfs: qgroup: Fix a race in delayed_ref which leads to abort trans

2015-10-26 Thread Qu Wenruo

Between btrfs_alloc_reserved_file_extent() and
btrfs_add_delayed_qgroup_reserve(), there is a window in which delayed
refs may be run and the delayed ref head may be freed before
btrfs_add_delayed_qgroup_reserve() is called.

This causes btrfs_add_delayed_qgroup_reserve() to return -ENOENT,
which in turn causes the transaction to be aborted.

This patch records the qgroup reserved space info in the delayed_ref_head
at btrfs_add_delayed_ref() time, to eliminate the race window.
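
The window described above can be illustrated with a small single-threaded
model. This is only a sketch under stated assumptions, not the kernel code:
every toy_* name is hypothetical, the head table is a flat array instead of
an rbtree, and "running the delayed refs" is reduced to freeing the head.
It shows why a separate lookup after head creation can return -ENOENT,
while recording the reservation at creation time cannot miss.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_HEADS 8

/* Hypothetical, simplified stand-in for a delayed ref head. */
struct toy_head {
    bool in_use;
    uint64_t bytenr;
    uint64_t qgroup_ref_root;
    uint64_t qgroup_reserved;
};

struct toy_heads {
    struct toy_head slot[TOY_MAX_HEADS];
};

static struct toy_head *toy_find(struct toy_heads *h, uint64_t bytenr)
{
    for (size_t i = 0; i < TOY_MAX_HEADS; i++)
        if (h->slot[i].in_use && h->slot[i].bytenr == bytenr)
            return &h->slot[i];
    return NULL;
}

/*
 * Create a head. Passing a non-zero ref_root and reserved records the
 * reservation in the same step, which is the approach the patch takes.
 */
struct toy_head *toy_add_head(struct toy_heads *h, uint64_t bytenr,
                              uint64_t ref_root, uint64_t reserved)
{
    for (size_t i = 0; i < TOY_MAX_HEADS; i++) {
        if (h->slot[i].in_use)
            continue;
        h->slot[i] = (struct toy_head){ true, bytenr, 0, 0 };
        if (ref_root && reserved) {
            h->slot[i].qgroup_ref_root = ref_root;
            h->slot[i].qgroup_reserved = reserved;
        }
        return &h->slot[i];
    }
    return NULL; /* table full */
}

/* Old second step: look the head up again and attach the reservation. */
int toy_add_qgroup_reserve(struct toy_heads *h, uint64_t bytenr,
                           uint64_t ref_root, uint64_t reserved)
{
    struct toy_head *head = toy_find(h, bytenr);

    if (!head)
        return -ENOENT; /* the head already ran and was freed */
    head->qgroup_ref_root = ref_root;
    head->qgroup_reserved = reserved;
    return 0;
}

/* Model of running the delayed refs: the head is consumed and freed. */
uint64_t toy_run_head(struct toy_heads *h, uint64_t bytenr)
{
    struct toy_head *head = toy_find(h, bytenr);
    uint64_t reserved = head ? head->qgroup_reserved : 0;

    if (head)
        head->in_use = false;
    return reserved;
}
```

In this model, calling toy_add_head() without a reservation, then
toy_run_head(), then toy_add_qgroup_reserve() reproduces the -ENOENT case;
passing the reservation to toy_add_head() directly leaves no gap for the
head to disappear in.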

Reported-by: Filipe Manana 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/ctree.h       |  3 ++-
 fs/btrfs/delayed-ref.c | 22 +-
 fs/btrfs/delayed-ref.h |  2 +-
 fs/btrfs/extent-tree.c | 14 --
 fs/btrfs/inode.c       | 12 
 5 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 3fa3c3b..a8c9a27 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3403,7 +3403,8 @@ void btrfs_free_tree_block(struct btrfs_trans_handle 
*trans,
 int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
 struct btrfs_root *root,
 u64 root_objectid, u64 owner,
-u64 offset, struct btrfs_key *ins);
+u64 offset, u64 ram_bytes,
+struct btrfs_key *ins);
 int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
   struct btrfs_root *root,
   u64 root_objectid, u64 owner, u64 offset,
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 449974f..8d65427 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -422,7 +422,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 struct btrfs_trans_handle *trans,
 struct btrfs_delayed_ref_node *ref,
 struct btrfs_qgroup_extent_record *qrecord,
-u64 bytenr, u64 num_bytes, int action, int is_data)
+u64 bytenr, u64 num_bytes, u64 ref_root, u64 reserved,
+int action, int is_data)
 {
struct btrfs_delayed_ref_head *existing;
struct btrfs_delayed_ref_head *head_ref = NULL;
@@ -431,6 +432,9 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
int count_mod = 1;
int must_insert_reserved = 0;
 
+   /* If reserved is provided, it must be a data extent. */
+   BUG_ON(!is_data && reserved);
+
/*
 * the head node stores the sum of all the mods, so dropping a ref
 * should drop the sum in the head node by one.
@@ -480,6 +484,11 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 
/* Record qgroup extent info if provided */
if (qrecord) {
+   if (ref_root && reserved) {
+   head_ref->qgroup_ref_root = ref_root;
+   head_ref->qgroup_reserved = reserved;
+   }
+
qrecord->bytenr = bytenr;
qrecord->num_bytes = num_bytes;
qrecord->old_roots = NULL;
@@ -498,6 +507,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
existing = htree_insert(_refs->href_root,
_ref->href_node);
if (existing) {
+   WARN_ON(ref_root && reserved && existing->qgroup_ref_root
+   && existing->qgroup_reserved);
update_existing_head_ref(delayed_refs, >node, ref);
/*
 * we've updated the existing ref, free the newly
@@ -664,7 +675,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info 
*fs_info,
 * the spin lock
 */
head_ref = add_delayed_ref_head(fs_info, trans, _ref->node, record,
-   bytenr, num_bytes, action, 0);
+   bytenr, num_bytes, 0, 0, action, 0);
 
add_delayed_tree_ref(fs_info, trans, head_ref, >node, bytenr,
 num_bytes, parent, ref_root, level, action);
@@ -687,7 +698,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info 
*fs_info,
   struct btrfs_trans_handle *trans,
   u64 bytenr, u64 num_bytes,
   u64 parent, u64 ref_root,
-  u64 owner, u64 offset, int action,
+  u64 owner, u64 offset, u64 reserved, int action,
   struct btrfs_delayed_extent_op *extent_op)
 {
struct btrfs_delayed_data_ref *ref;
@@ -726,7 +737,8 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info 
*fs_info,
 * the spin lock
 */
head_ref = add_delayed_ref_head(fs_info, trans, _ref->node, record,
-   bytenr, num_bytes, action, 1);
+   bytenr, num_bytes,

[4.4][PATCH 0/3] btrfs: Qgroup hotfix

2015-10-26 Thread Qu Wenruo

This patchset fixes 2 bugs:
1. Race condition leading to an aborted transaction
Reported by Filipe, fixed by the 2nd patch.

2. Qgroup low-level double free leading to EDQUOT
In fact, I hit this bug several times during internal rebase, but I
carelessly forgot to include the fix in the v3 patchset.
Fixed in the 3rd patch.

Qu Wenruo (3):
  btrfs: Cleanup no_quota parameter
  btrfs: qgroup: Fix a race in delayed_ref which leads to abort trans
  btrfs: qgroup: Fix a rebase bug which will cause qgroup double free

 fs/btrfs/ctree.h       |  7 +++---
 fs/btrfs/delayed-ref.c | 48 
 fs/btrfs/delayed-ref.h |  9 +++-
 fs/btrfs/extent-tree.c | 55 ++---
 fs/btrfs/file.c        | 10 -
 fs/btrfs/inode.c       | 16 +-
 fs/btrfs/ioctl.c       | 60 +-
 fs/btrfs/qgroup.c      |  4 
 fs/btrfs/relocation.c  | 16 ++
 fs/btrfs/tree-log.c    |  2 +-
 10 files changed, 73 insertions(+), 154 deletions(-)

-- 
2.6.2


[PATCH 3/3] btrfs: qgroup: Fix a rebase bug which will cause qgroup double free

2015-10-26 Thread Qu Wenruo

When rebasing my patchset, I forgot to pick up a cleanup patch that removes
the old hotfix from the 4.2 release.

Without the cleanup, the old hotfix interferes with the new qgroup reserve
framework and always produces a negative reserved number.
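
A minimal sketch of the double free, under stated assumptions: the toy_*
names are hypothetical, the counter is signed so the over-release shows up
as a negative number, and the new reserve framework is assumed to release
each reservation exactly once. The keep_old_hotfix flag models the block
this patch deletes from qgroup_update_counters(); the cur_new_count
comparison from the real condition is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a qgroup's reservation counter. */
struct toy_qgroup {
    int64_t reserved;
};

/* Space is reserved before the write happens. */
void toy_reserve(struct toy_qgroup *qg, int64_t num_bytes)
{
    qg->reserved += num_bytes;
}

/* Release path of the new framework (assumed to run once per extent). */
void toy_release(struct toy_qgroup *qg, int64_t num_bytes)
{
    qg->reserved -= num_bytes;
}

/* Accounting update; keep_old_hotfix models the block the patch removes. */
void toy_update_counters(struct toy_qgroup *qg, int64_t num_bytes,
                         int nr_old_roots, int nr_new_roots,
                         bool keep_old_hotfix)
{
    /* Old hotfix: for an exclusive extent, free its reserved bytes too. */
    if (keep_old_hotfix && nr_old_roots == 0 && nr_new_roots == 1)
        qg->reserved -= num_bytes;
}

/* Write one new exclusive extent and return the leftover reservation. */
int64_t toy_write_one_extent(int64_t num_bytes, bool keep_old_hotfix)
{
    struct toy_qgroup qg = { 0 };

    toy_reserve(&qg, num_bytes);
    toy_update_counters(&qg, num_bytes, 0, 1, keep_old_hotfix);
    toy_release(&qg, num_bytes);
    return qg.reserved;
}
```

With the hotfix kept, the same bytes are subtracted twice and the leftover
is minus num_bytes; with it removed, the leftover is zero.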

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/qgroup.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 158633c..7664a63 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1652,10 +1652,6 @@ static int qgroup_update_counters(struct btrfs_fs_info 
*fs_info,
}
}
 
-   /* For exclusive extent, free its reserved bytes too */
-   if (nr_old_roots == 0 && nr_new_roots == 1 &&
-   cur_new_count == nr_new_roots)
-   qg->reserved -= num_bytes;
if (dirty)
qgroup_dirty(fs_info, qg);
}
-- 
2.6.2


Re: [PATCH 2/2 v3] Btrfs: fix regression running delayed references when using qgroups

2015-10-26 Thread Qu Wenruo




 wrote on 2015/10/25 18:51 +:

From: Filipe Manana 

In the kernel 4.2 merge window we had big changes to the implementation
of delayed references and qgroups which left the no_quota field of delayed
references unused. More specifically, the no_quota field has not been
used as of:

   commit 0ed4792af0e8 ("btrfs: qgroup: Switch to new extent-oriented qgroup mechanism.")

Leaving the no_quota field in place actually prevents delayed references
from getting merged, which in turn causes the following BUG_ON(), at
fs/btrfs/extent-tree.c, to be hit when qgroups are enabled:

   static int run_delayed_tree_ref(...)
   {
  (...)
  BUG_ON(node->ref_mod != 1);
  (...)
   }

This happens on a scenario like the following:

   1) Ref1 bytenr X, action = BTRFS_ADD_DELAYED_REF, no_quota = 1, added.

   2) Ref2 bytenr X, action = BTRFS_DROP_DELAYED_REF, no_quota = 0, added.
  It's not merged with Ref1 because Ref1->no_quota != Ref2->no_quota.

   3) Ref3 bytenr X, action = BTRFS_ADD_DELAYED_REF, no_quota = 1, added.
  It's not merged with the reference at the tail of the list of refs
  for bytenr X because the reference at the tail, Ref2 is incompatible
  due to Ref2->no_quota != Ref3->no_quota.

   4) Ref4 bytenr X, action = BTRFS_DROP_DELAYED_REF, no_quota = 0, added.
  It's not merged with the reference at the tail of the list of refs
  for bytenr X because the reference at the tail, Ref3 is incompatible
  due to Ref3->no_quota != Ref4->no_quota.

   5) We run delayed references, trigger merging of delayed references,
  through __btrfs_run_delayed_refs() -> btrfs_merge_delayed_refs().

   6) Ref1 and Ref3 are merged as Ref1->no_quota = Ref3->no_quota and
  all other conditions are satisfied too. So Ref1 gets a ref_mod
  value of 2.

   7) Ref2 and Ref4 are merged as Ref2->no_quota = Ref4->no_quota and
  all other conditions are satisfied too. So Ref2 gets a ref_mod
  value of 2.

   8) Ref1 and Ref2 aren't merged, because they have different values
  for their no_quota field.

   9) Delayed reference Ref1 is picked for running (select_delayed_ref()
  always prefers references with an action == BTRFS_ADD_DELAYED_REF).
  So run_delayed_tree_ref() is called for Ref1 which triggers the
  BUG_ON because Ref1->ref_mod != 1 (equals 2).

So fix this by removing the no_quota field, as it's not used anymore as
of commit 0ed4792af0e8 ("btrfs: qgroup: Switch to new extent-oriented
qgroup mechanism.").
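
The scenario above can be replayed with a toy model. This is a sketch under
stated assumptions rather than the kernel implementation: all toy_* names
are hypothetical, refs live in a small array, and the type and seq
comparisons are ignored so that only the effect of no_quota is visible.
toy_add_ref() stands in for the insertion-time tail merge and
toy_merge_all() for the merge done when delayed references are run.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_REFS 16

/* Hypothetical, heavily simplified stand-in for one delayed ref. */
struct toy_ref {
    int action;   /* +1 models an add, -1 models a drop */
    int no_quota; /* the field the patch removes */
    int ref_mod;  /* how many times 'action' applies; kept > 0 */
};

struct toy_list {
    struct toy_ref refs[TOY_MAX_REFS];
    int count;
    bool compare_no_quota; /* true models the pre-fix comparison */
};

static bool toy_compatible(const struct toy_list *l, const struct toy_ref *a,
                           const struct toy_ref *b)
{
    return !l->compare_no_quota || a->no_quota == b->no_quota;
}

/* Fold src into dst; returns true when dst cancels out (ref_mod == 0). */
static bool toy_fold(struct toy_ref *dst, const struct toy_ref *src)
{
    if (dst->action == src->action) {
        dst->ref_mod += src->ref_mod;
    } else if (dst->ref_mod < src->ref_mod) {
        dst->action = src->action;
        dst->ref_mod = src->ref_mod - dst->ref_mod;
    } else {
        dst->ref_mod -= src->ref_mod;
    }
    return dst->ref_mod == 0;
}

static void toy_remove_at(struct toy_list *l, int idx)
{
    for (int i = idx; i + 1 < l->count; i++)
        l->refs[i] = l->refs[i + 1];
    l->count--;
}

/* Insertion-time merge: only the tail of the list is examined. */
void toy_add_ref(struct toy_list *l, int action, int no_quota)
{
    struct toy_ref ref = { action, no_quota, 1 };

    if (l->count > 0 && toy_compatible(l, &l->refs[l->count - 1], &ref)) {
        if (toy_fold(&l->refs[l->count - 1], &ref))
            l->count--;
        return;
    }
    assert(l->count < TOY_MAX_REFS);
    l->refs[l->count++] = ref;
}

/* Run-time merge: every compatible pair in the list is folded together. */
void toy_merge_all(struct toy_list *l)
{
    for (int i = 0; i < l->count; i++) {
        for (int j = i + 1; j < l->count; ) {
            if (!toy_compatible(l, &l->refs[i], &l->refs[j])) {
                j++;
                continue;
            }
            bool gone = toy_fold(&l->refs[i], &l->refs[j]);
            toy_remove_at(l, j);
            if (gone) {
                toy_remove_at(l, i);
                i--;
                break;
            }
        }
    }
}

/* Largest ref_mod left in the list; the BUG_ON fires for values != 1. */
int toy_max_ref_mod(const struct toy_list *l)
{
    int max = 0;

    for (int i = 0; i < l->count; i++)
        if (l->refs[i].ref_mod > max)
            max = l->refs[i].ref_mod;
    return max;
}
```

Feeding the four refs from the scenario (add, drop, add, drop with no_quota
1, 0, 1, 0) into a list with compare_no_quota set leaves two entries with
ref_mod 2 after toy_merge_all(). With the flag cleared, each drop cancels
the preceding add at insertion time and the list ends up empty.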

The use of no_quota was also buggy in at least two places:

1) At delayed-ref.c:btrfs_add_delayed_tree_ref() - we were setting
no_quota to 0 instead of 1 when the following condition was true:
!is_fstree(ref_root) || !fs_info->quota_enabled

2) At extent-tree.c:__btrfs_inc_extent_ref() - we were attempting to
reset a node's no_quota when the condition "!is_fstree(root_objectid)
|| !root->fs_info->quota_enabled" was true but we did it only in
an unused local stack variable, that is, we never reset the no_quota
value in the node itself.

This fixes the remainder of problems several people have been having when
running delayed references, mostly while a balance is running in parallel,
on a 4.2+ kernel.

Very special thanks to Stéphane Lesimple for helping debug this issue
and testing this fix on his multi terabyte filesystem (which took more
than one day to balance alone, plus fsck, etc).

Also, this fixes a deadlock issue when using the clone ioctl with qgroups
enabled, as reported by Elias Probst in the mailing list. The deadlock
happens because after calling btrfs_insert_empty_item we have our path
holding a write lock on a leaf of the fs/subvol tree and then before
releasing the path we called check_ref() which did backref walking, when
qgroups are enabled, and tried to read lock the same leaf. The trace for
this case is the following:

   INFO: task systemd-nspawn:6095 blocked for more than 120 seconds.
   (...)
   Call Trace:
 [] schedule+0x74/0x83
 [] btrfs_tree_read_lock+0xc0/0xea
 [] ? wait_woken+0x74/0x74
 [] btrfs_search_old_slot+0x51a/0x810
 [] btrfs_next_old_leaf+0xdf/0x3ce
 [] ? ulist_add_merge+0x1b/0x127
 [] __resolve_indirect_refs+0x62a/0x667
 [] ? btrfs_clear_lock_blocking_rw+0x78/0xbe
 [] find_parent_nodes+0xaf3/0xfc6
 [] __btrfs_find_all_roots+0x92/0xf0
 [] btrfs_find_all_roots+0x45/0x65
 [] ? btrfs_get_tree_mod_seq+0x2b/0x88
 [] check_ref+0x64/0xc4
 [] btrfs_clone+0x66e/0xb5d
 [] btrfs_ioctl_clone+0x48f/0x5bb
 [] ? native_sched_clock+0x28/0x77
 [] btrfs_ioctl+0xabc/0x25cb
   (...)

Reported-by: Stéphane Lesimple 
Tested-by: Stéphane Lesimple 
Reported-by: Elias Probst 
Reported-by: Peter Becker 
Reported-by: Malte Schröder 
Reported-by: Derek Dongray 
Reported-by: Erkki Seppala

Re: corrupted RAID1: unsuccessful recovery / help needed

2015-10-26 Thread Duncan

Lukas Pirl posted on Mon, 26 Oct 2015 19:19:50 +1300 as excerpted:

> TL;DR: RAID1 does not recover, I guess the interesting part in the stack
> trace is: [elided, I'm not a dev so it's little help to me]
> 
> I'd appreciate some help for repairing a corrupted RAID1.
> 
> Setup:
> * Linux 4.2.0-12, Btrfs v3.17, 
> `btrfs fi show`:
>uuid: 5be372f5-5492-4f4b-b641-c14f4ad8ae23
>Total devices 6 FS bytes used 2.87TiB
>devid 1 size 931.51GiB used 636.00GiB path /dev/mapper/[...]
>devid 2 size 931.51GiB used 634.03GiB path /dev/mapper/
>devid 3 size   1.82TiB used   1.53TiB path /dev/mapper/
>devid 4 size   1.82TiB used   1.53TiB path /dev/mapper/
>devid 6 size   1.82TiB used   1.05TiB path /dev/mapper/
>*** Some devices missing
> * disks are dm-crypted

FWIW... Older btrfs userspace such as your v3.17 is "OK" for normal 
runtime use, assuming you don't need any newer features, as in normal 
runtime, it's the kernel code doing the real work and userspace for the 
most part simply makes the appropriate kernel calls to do that work.

But, once you get into a recovery situation like the one you're in now, 
current userspace becomes much more important, as the various things 
you'll do to attempt recovery rely far more on userspace code directly 
accessing the filesystem, and it's only the newest userspace code that 
has the latest fixes.

So for a recovery situation, the newest userspace release (4.2.2 at 
present) as well as a recent kernel is recommended, and depending on the 
problem, you may at times need to run integration or apply patches on top 
of that.

> What happened:
> * devid 5 started to die (slowly)
> * added a new disk (devid 6) and tried `btrfs device delete`
> * failed with kernel crashes (guess:) due to heavy IO errors
> * removed devid 5 from /dev (deactivated in dm-crypt)
> * tried `btrfs balance`
>* interrupted multiple times due to kernel crashes
>  (probably due to semi-corrupted file system?)
> * file system did not mount anymore after a required hard-reset
> * no successful recovery so far:
>if not read-only, kernel IO blocks eventually (hard-reset required)
> * tried:
>* `-o degraded`
>  -> IO freeze, kernel log: http://pastebin.com/Rzrp7XeL
>* `-o degraded,recovery`
>  -> IO freeze, kernel log: http://pastebin.com/VemHfnuS
>* `-o degraded,recovery,ro`
>  -> file system accessible, system stable
> * going rw again does not fix the problem
> 
> I did not btrfs-zero-log so far because my oops did not look very
> similar to the one in the Wiki and I did not want to risk to make
> recovery harder.

General note about btrfs and btrfs raid.  Given that btrfs itself remains 
a "stabilizing, but not yet fully mature and stable filesystem", while 
btrfs raid will often let you recover from a bad device, sometimes that 
recovery is in the form of letting you mount ro, so you can access the 
data and copy it elsewhere, before blowing away the filesystem and 
starting over.

Back to the problem at hand.  Current btrfs has a known limitation when 
operating in degraded mode.  That being, a btrfs raid may be write-
mountable only once, degraded, after which it can only be read-only 
mounted.  This is because under certain circumstances in degraded mode, 
btrfs will fall back from its normal raid mode to single mode chunk 
allocation for new writes, and once there's single-mode chunks on the 
filesystem, btrfs mount isn't currently smart enough to check that all 
chunks are actually available on present devices, and simply jumps to the 
conclusion that there's single mode chunks on the missing device(s) as 
well, so refuses to mount writable after that in order to prevent 
further damage to the filesystem and preserve the ability to mount at 
least ro, to copy off what isn't damaged.

There's a patch in the pipeline for this problem, that checks individual 
chunks instead of leaping to conclusions based on the presence of single-
mode chunks on a degraded filesystem with missing devices.  If that's 
your only problem (which the backtraces might reveal but I as a non-dev 
btrfs user can't tell), the patches should let you mount writable.
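
To make the difference between the two checks concrete, here is a minimal 
Python sketch.  It is not btrfs code: the Chunk class, the 
TOLERATED_MISSING table and both function names are hypothetical, chosen 
only to contrast a filesystem-wide check with a per-chunk check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    profile: str        # "single" or "raid1" in this toy model
    devices: frozenset  # ids of the devices holding a copy of the chunk


# Copies that may be lost before a chunk becomes unreadable (toy values).
TOLERATED_MISSING = {"single": 0, "raid1": 1}


def rw_ok_global(chunks, missing):
    """Coarse check: refuse rw if a device is missing and any single chunk exists."""
    has_single = any(c.profile == "single" for c in chunks)
    return not (bool(missing) and has_single)


def rw_ok_per_chunk(chunks, missing):
    """Finer check: allow rw if every chunk individually survives the missing devices."""
    return all(
        len(c.devices & missing) <= TOLERATED_MISSING[c.profile] for c in chunks
    )


chunks = [
    Chunk("raid1", frozenset({1, 2})),  # written before device 2 disappeared
    Chunk("single", frozenset({1})),    # written during the first degraded rw mount
]
missing = frozenset({2})

print(rw_ok_global(chunks, missing))     # False: single chunk plus missing device
print(rw_ok_per_chunk(chunks, missing))  # True: no chunk actually lost all copies
```

In this toy case the single chunk lives entirely on a present device, so 
only the coarse check refuses the writable mount.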

But that patch isn't in kernel 4.2.  You'll need at least kernel 4.3-rc, 
and possibly btrfs integration, or to cherrypick the patches onto 4.2.


Meanwhile, the admin's rule on backups applies: if you valued the data 
more than the time and resources necessary for a backup, then by 
definition you have a backup available.  Otherwise, by definition, you 
valued the data less than the time and resources necessary to back it up.

Therefore, no worries.  Regardless of the fate of the data, you saved 
what your actions declared most valuable to you: either the data, or 
the hassle and resource cost of the backup you didn't do.  As such, if 
you don't have a backup (or if you do but it's outdated), the data at 
risk of loss is by definition of very limited value.

That said, it

Re: [PATCH 1/3] btrfs: Cleanup no_quota parameter

2015-10-26 Thread Qu Wenruo




Filipe Manana wrote on 2015/10/26 08:14 +:
> On Mon, Oct 26, 2015 at 6:11 AM, Qu Wenruo wrote:
>> No_quota parameter for delayed_ref related function are meaningless
>> after 4.2-rc1, as any new delayed_ref_head will cause qgroup to scan
>> extent for its rfer/excl change without checking no_quota flag.
>>
>> So this patch will clean them up.
>
> Hi Qu,
>
> I already send a patch for this yesterday:
> https://patchwork.kernel.org/patch/7481901/

Sorry, I didn't notice the patch also removed no_quota...

> This is more than a cleanup, it fixes several bugs. The most important
> is crashes (BUG_ON) when running delayed references, mostly triggered
> during balance. The second one is a deadlock in the clone ioctl
> (reported at http://www.spinics.net/lists/linux-btrfs/msg45844.html).
>
> The use of no_quota was also buggy in at least 2 places:
>
> 1) At delayed-refs.c:btrfs_add_delayed_tree_ref() - we were setting
>    no_quota to 0 instead of 1 when the following condition was true:
>    is_fstree(ref_root) || !fs_info->quota_enabled
>
> 2) At extent-tree.c:__btrfs_inc_extent_ref() - we were attempting to
>    reset a node's no_quota when the condition "!is_fstree(root_objectid)
>    || !root->fs_info->quota_enabled" was true but we did it only in
>    an unused local stack variable, that is, we never reset the no_quota
>    value in the node itself.
>
> I want to get this to stable, together with the other delayed
> references fix, as a lot of people are unable to run balance as of
> kernel 4.2+.

Sorry again for the regression I brought in 4.2.
The rework of the delayed_ref implementation was not important at all,
and in fact the new qgroup accounting could work perfectly well without it.

> I'll update the changelog to reflect the clone ioctl deadlock issue,
> which I previously forgot.
>
> thanks

That would be great.

BTW what about splitting the patch into the no_quota cleanup and the
other fixes?  It's not that obvious if they are all put into one patch.

Thanks,
Qu





Signed-off-by: Qu Wenruo 
---
  fs/btrfs/ctree.h   |  4 ++--
  fs/btrfs/delayed-ref.c | 26 ++
  fs/btrfs/delayed-ref.h |  7 ++
  fs/btrfs/extent-tree.c | 45 ++---
  fs/btrfs/file.c| 10 -
  fs/btrfs/inode.c   |  4 ++--
  fs/btrfs/ioctl.c   | 60 +-
  fs/btrfs/relocation.c  | 16 ++
  fs/btrfs/tree-log.c|  2 +-
  9 files changed, 43 insertions(+), 131 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index bc3c711..3fa3c3b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3422,7 +3422,7 @@ int btrfs_set_disk_extent_flags(struct btrfs_trans_handle 
*trans,
  int btrfs_free_extent(struct btrfs_trans_handle *trans,
   struct btrfs_root *root,
   u64 bytenr, u64 num_bytes, u64 parent, u64 root_objectid,
- u64 owner, u64 offset, int no_quota);
+ u64 owner, u64 offset);

  int btrfs_free_reserved_extent(struct btrfs_root *root, u64 start, u64 len,
int delalloc);
@@ -3435,7 +3435,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle 
*trans,
  int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
  struct btrfs_root *root,
  u64 bytenr, u64 num_bytes, u64 parent,
-u64 root_objectid, u64 owner, u64 offset, int 
no_quota);
+u64 root_objectid, u64 owner, u64 offset);

  int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
struct btrfs_root *root);
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index bd9b63b..449974f 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -292,8 +292,7 @@ add_delayed_ref_tail_merge(struct btrfs_trans_handle *trans,
 exist = list_entry(href->ref_list.prev, struct btrfs_delayed_ref_node,
list);
 /* No need to compare bytenr nor is_head */
-   if (exist->type != ref->type || exist->no_quota != ref->no_quota ||
-   exist->seq != ref->seq)
+   if (exist->type != ref->type || exist->seq != ref->seq)
 goto add_tail;

 if ((exist->type == BTRFS_TREE_BLOCK_REF_KEY ||
@@ -526,7 +525,7 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
  struct btrfs_delayed_ref_head *head_ref,
  struct btrfs_delayed_ref_node *ref, u64 bytenr,
  u64 num_bytes, u64 parent, u64 ref_root, int level,
-int action, int no_quota)
+int action)
  {
 struct btrfs_delayed_tree_ref *full_ref;
 struct btrfs_delayed_ref_root *delayed_refs;
@@ -548,7 +547,6 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
 ref->action = action;
 ref->is_head = 0;

[PATCH 1/2 v3] Btrfs: fix regression when running delayed references

2015-10-26 Thread fdmanana

From: Filipe Manana 

In the kernel 4.2 merge window we had a refactoring/rework of the delayed
references implementation in order to fix certain problems with qgroups.
However that rework introduced one more regression that leads to the
following trace when running delayed references for metadata:

[35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832!
[35908.065201] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
[35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic xor 
raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc 
loop fuse parport_pc psmouse i2
[35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: GW   
4.3.0-rc5-btrfs-next-17+ #1
[35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti: 
88010c4c8000
[35908.065201] RIP: 0010:[]  [] 
insert_inline_extent_backref+0x52/0xb1 [btrfs]
[35908.065201] RSP: 0018:88010c4cbb08  EFLAGS: 00010293
[35908.065201] RAX:  RBX: 88008a661000 RCX: 
[35908.065201] RDX: a04dd58f RSI: 0001 RDI: 
[35908.065201] RBP: 88010c4cbb40 R08: 1000 R09: 88010c4cb9f8
[35908.065201] R10:  R11: 002c R12: 
[35908.065201] R13: 88020a74c578 R14:  R15: 
[35908.065201] FS:  () GS:88023edc() 
knlGS:
[35908.065201] CS:  0010 DS:  ES:  CR0: 8005003b
[35908.065201] CR2: 015e8708 CR3: 000102185000 CR4: 06e0
[35908.065201] Stack:
[35908.065201]  88010c4cbb18 0f37 88020a74c578 
88015a408000
[35908.065201]  880154a44000  0005 
88010c4cbbd8
[35908.065201]  a0492b9a 0005  

[35908.065201] Call Trace:
[35908.065201]  [] __btrfs_inc_extent_ref+0x8b/0x208 [btrfs]
[35908.065201]  [] ? __btrfs_run_delayed_refs+0x4d4/0xd33 
[btrfs]
[35908.065201]  [] __btrfs_run_delayed_refs+0xafa/0xd33 
[btrfs]
[35908.065201]  [] ? join_transaction.isra.10+0x25/0x41f 
[btrfs]
[35908.065201]  [] ? join_transaction.isra.10+0xa8/0x41f 
[btrfs]
[35908.065201]  [] btrfs_run_delayed_refs+0x75/0x1dd [btrfs]
[35908.065201]  [] delayed_ref_async_start+0x3c/0x7b [btrfs]
[35908.065201]  [] normal_work_helper+0x14c/0x32a [btrfs]
[35908.065201]  [] btrfs_extent_refs_helper+0x12/0x14 [btrfs]
[35908.065201]  [] process_one_work+0x24a/0x4ac
[35908.065201]  [] worker_thread+0x206/0x2c2
[35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201]  [] kthread+0xef/0xf7
[35908.065201]  [] ? kthread_parkme+0x24/0x24
[35908.065201]  [] ret_from_fork+0x3f/0x70
[35908.065201]  [] ? kthread_parkme+0x24/0x24
[35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 8d 
4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 <0f> 0b 
4c 8b 45 30 8b 4d 28 45 31
[35908.065201] RIP  [] insert_inline_extent_backref+0x52/0xb1 
[btrfs]
[35908.065201]  RSP 
[35908.310885] ---[ end trace fe4299baf0666457 ]---

This happens because the new delayed references code no longer merges
delayed references that have different sequence values. The following
steps are an example sequence leading to this issue:

1) Transaction N starts, fs_info->tree_mod_seq has value 0;

2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for
   bytenr A is created, with a value of 1 and a seq value of 0;

3) fs_info->tree_mod_seq is incremented to 1;

4) Extent buffer A is deleted through btrfs_del_items(), which calls
   btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). The
   latter returns the metadata extent associated to extent buffer A to
   the free space cache (the range is not pinned), because the extent
   buffer was created in the current transaction (N) and writeback never
   happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set
   in the extent buffer).
   This creates the delayed reference Ref2 for bytenr A, with a value
   of -1 and a seq value of 1;

5) Delayed reference Ref2 is not merged with Ref1 when we create it,
   because they have different sequence numbers (decided at
   add_delayed_ref_tail_merge());

6) fs_info->tree_mod_seq is incremented to 2;

7) Some task attempts to allocate a new extent buffer (done at
   extent-tree.c:find_free_extent()), but due to heavy fragmentation
   and running low on metadata space the clustered allocation fails
   and we fall back to unclustered allocation, which finds the
   extent at offset A, so a new extent buffer at offset A is allocated.
   This creates delayed reference Ref3 for bytenr A,

Re: [PATCH 1/3] btrfs: Cleanup no_quota parameter

2015-10-26 Thread Filipe Manana

On Mon, Oct 26, 2015 at 8:25 AM, Qu Wenruo  wrote:
>
>
> Filipe Manana wrote on 2015/10/26 08:14 +:
>>
>> On Mon, Oct 26, 2015 at 6:11 AM, Qu Wenruo 
>> wrote:
>>>
>>> No_quota parameter for delayed_ref related function are meaningless
>>> after 4.2-rc1, as any new delayed_ref_head will cause qgroup to scan
>>> extent for its rfer/excl change without checking no_quota flag.
>>>
>>> So this patch will clean them up.
>>
>>
>> Hi Qu,
>>
>> I already send a patch for this yesterday:
>> https://patchwork.kernel.org/patch/7481901/
>
>
> Sorry, I didn't notice the patch also removed no_quota...
>
>>
>> This is more than a cleanup, it fixes several bugs. The most important
>> is crashes (BUG_ON) when running delayed references, mostly triggered
>> during balance. The second one is a deadlock in the clone ioctl
>> (reported at http://www.spinics.net/lists/linux-btrfs/msg45844.html).
>>
>> The use of no_quota was also buggy in at least 2 places:
>>
>>  1) At delayed-refs.c:btrfs_add_delayed_tree_ref() - we were setting
>> no_quota to 0 instead of 1 when the following condition was true:
>> is_fstree(ref_root) || !fs_info->quota_enabled
>>
>>  2) At extent-tree.c:__btrfs_inc_extent_ref() - we were attempting to
>> reset a node's no_quota when the condition
>> "!is_fstree(root_objectid)
>> || !root->fs_info->quota_enabled" was true but we did it only in
>> an unused local stack variable, that is, we never reset the
>> no_quota
>> value in the node itself.
>>
>> I want to get this to stable, together with the other delayed
>> references fix, as a lot of people are unable to run balance as of
>> kernel 4.2+.
>
>
> Sorry again for the regression I brought in 4.2.
> The rework for delayed_ref implement is not important at all, and in fact
> new qgroup accounting could work completely well without them.
>
>> I'll update the changelog to reflect the clone ioctl deadlock issue,
>> which I previously forgot.
>>
>> thanks
>>
>
> That would be great.
>
> BTW what about split the patch into no_quota cleanup and other fixes?

No point in doing that. Fixing the balance regression requires
removing the whole no_quota thing; fixing those 2 bugs alone
won't fix the problem leading to the BUG_ON.

> It's not that obvious if they are all put into one patch.
>
> Thanks,
> Qu
>
>
>>
>>>
>>> Signed-off-by: Qu Wenruo 
>>> ---
>>>   fs/btrfs/ctree.h   |  4 ++--
>>>   fs/btrfs/delayed-ref.c | 26 ++
>>>   fs/btrfs/delayed-ref.h |  7 ++
>>>   fs/btrfs/extent-tree.c | 45 ++---
>>>   fs/btrfs/file.c| 10 -
>>>   fs/btrfs/inode.c   |  4 ++--
>>>   fs/btrfs/ioctl.c   | 60
>>> +-
>>>   fs/btrfs/relocation.c  | 16 ++
>>>   fs/btrfs/tree-log.c|  2 +-
>>>   9 files changed, 43 insertions(+), 131 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>> index bc3c711..3fa3c3b 100644
>>> --- a/fs/btrfs/ctree.h
>>> +++ b/fs/btrfs/ctree.h
>>> @@ -3422,7 +3422,7 @@ int btrfs_set_disk_extent_flags(struct
>>> btrfs_trans_handle *trans,
>>>   int btrfs_free_extent(struct btrfs_trans_handle *trans,
>>>struct btrfs_root *root,
>>>u64 bytenr, u64 num_bytes, u64 parent, u64
>>> root_objectid,
>>> - u64 owner, u64 offset, int no_quota);
>>> + u64 owner, u64 offset);
>>>
>>>   int btrfs_free_reserved_extent(struct btrfs_root *root, u64 start, u64
>>> len,
>>> int delalloc);
>>> @@ -3435,7 +3435,7 @@ int btrfs_finish_extent_commit(struct
>>> btrfs_trans_handle *trans,
>>>   int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>>>   struct btrfs_root *root,
>>>   u64 bytenr, u64 num_bytes, u64 parent,
>>> -u64 root_objectid, u64 owner, u64 offset, int
>>> no_quota);
>>> +u64 root_objectid, u64 owner, u64 offset);
>>>
>>>   int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
>>> struct btrfs_root *root);
>>> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
>>> index bd9b63b..449974f 100644
>>> --- a/fs/btrfs/delayed-ref.c
>>> +++ b/fs/btrfs/delayed-ref.c
>>> @@ -292,8 +292,7 @@ add_delayed_ref_tail_merge(struct btrfs_trans_handle
>>> *trans,
>>>  exist = list_entry(href->ref_list.prev, struct
>>> btrfs_delayed_ref_node,
>>> list);
>>>  /* No need to compare bytenr nor is_head */
>>> -   if (exist->type != ref->type || exist->no_quota != ref->no_quota
>>> ||
>>> -   exist->seq != ref->seq)
>>> +   if (exist->type != ref->type || exist->seq != ref->seq)
>>>  goto add_tail;
>>>
>>>  if ((exist->type ==

Re: Recover btrfs volume which can only be mounded in read-only mode

2015-10-26 Thread Duncan

Dmitry Katsubo posted on Sun, 18 Oct 2015 11:44:08 +0200 as excerpted:

>> Meanwhile, the present btrfs raid1 read-scheduler is both pretty simple
>> to code up and pretty simple to arrange tests for that run either one
>> side or the other, but not both, or that are well balanced to both.
>> However, it's pretty poor in terms of ensuring optimized real-world
>> deployment read-scheduling.
>> 
>> What it does is simply this.  Remember, btrfs raid1 is specifically two
>> copies.  It chooses which copy of the two will be read very simply,
>> based on the PID making the request.  Odd PIDs get assigned one copy,
>> even PIDs the other.  As I said, simple to code, great for ensuring
>> testing of one copy or the other or both, but not really optimized at
>> all for real-world usage.
>> 
>> If your workload happens to be a bunch of all odd or all even PIDs,
>> well, enjoy your testing-grade read-scheduler, bottlenecking everything
>> reading one copy, while the other sits entirely idle.
> 
> I think PID-based solution is not the best one. Why not simply take a
> random device? Then at least all drives in the volume are equally loaded
> (in average).

Nobody argues that the even/odd-PID-based read-scheduling solution is 
/optimal/, in a production sense at least.  But at the time and for the 
purpose it was written it was pretty good, arguably reasonably close to 
"best", because the implementation is at once simple and transparent for 
debugging purposes, and real easy to test either one side or the other, 
or both, and equally important, to duplicate the results of those tests, 
by simply arranging for the testing to have either all even or all odd 
PIDs, or both.  And for ordinary use, it's good /enough/, as ordinarily, 
PIDs will be evenly distributed even/odd.
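
The even/odd selection described above can be sketched in a few lines of 
Python.  This is an illustration only, not the kernel code: pick_copy and 
load_per_copy are hypothetical names, and taking the PID modulo the number 
of copies is an assumption about how "odd PIDs one copy, even PIDs the 
other" would be expressed.

```python
from collections import Counter

NUM_COPIES = 2  # btrfs raid1 keeps exactly two copies


def pick_copy(pid, num_copies=NUM_COPIES):
    """Return the index of the copy a reader with this PID is sent to."""
    return pid % num_copies


def load_per_copy(pids):
    """Count how many readers land on each copy."""
    counts = Counter(pick_copy(p) for p in pids)
    return [counts[i] for i in range(NUM_COPIES)]


print(load_per_copy(range(1000, 1008)))         # mixed PIDs -> [4, 4]
print(load_per_copy([1000, 1002, 1004, 1006]))  # all-even PIDs -> [4, 0]
```

The same PIDs always map to the same copy, which is what makes a test run 
easy to repeat, and also what lets an all-even or all-odd workload leave 
one copy idle.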

In that context, your random device read-scheduling algorithm would be 
far worse, because while being reasonably simple, it's anything *but* 
easy to ensure reads go to only one side or equally to both, or for that 
matter, to duplicate the tests, because randomization, by definition 
does /not/ lend itself to duplication.

And with both simplicity/transparency/debuggability and duplicatability 
of testing being primary factors when the code went in...

And again, the fact that it hasn't been optimized since then, in the 
context of "premature optimization", really says quite a bit about what 
the btrfs devs themselves consider btrfs' status to be -- obviously *not* 
production-grade stable and mature, or optimizations like this would have 
already been done.

Like it or not, that's btrfs' status at the moment.

Actually, the coming N-way-mirroring may very well be why they've not yet 
optimized the even/odd-PID mechanism, because doing an optimized 
two-way would obviously be premature optimization given the coming N-way, 
and doing an N-way clearly couldn't be properly tested at present, 
because only two-way is possible.  Introducing an optimized N-way 
scheduler together with the N-way-mirroring code necessary to properly 
test it thus becomes a no-brainer.

> From what you said I believe that certain servers will not benefit from
> btrfs, e.g. dedicated server that runs only one "fat" Java process, or
> one "huge" MySQL database.

Indeed.  But with btrfs still "stabilizing, but not entirely stable and 
mature", and indeed, various features still set to drop, and various 
optimizations still yet to do including this one, nobody, leastwise not 
the btrfs devs and knowledgeable regulars on this list, is /claiming/ 
that btrfs is at this time the be-all and end-all optimal solution for 
every single use-case.  Rather far from it!

As for the claims of salespeople... should any of them be making wild 
claims about btrfs, who in their sane mind takes salespeople's claims at 
face value in any case?

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Recover btrfs volume which can only be mounded in read-only mode

2015-10-26 Thread Duncan

Dmitry Katsubo posted on Sun, 18 Oct 2015 11:44:08 +0200 as excerpted:

[Regarding the btrfs raid1 "device-with-the-most-space" chunk-allocation 
strategy.]

> I think the mentioned strategy (fill in the device with most free space)
> is not most effective. If the data is spread equally, the read
> performance would be higher (reading from 3 disks instead of 2). In my
> case this is even crucial, because the smallest drive is SSD (and it is
> not loaded at all).
> 
> Maybe I don't see the benefit from the strategy which is currently
> implemented (besides that it is robust and well-tested)?

Two comments:

1) As Hugo alluded to, in striped mode (raid0/5/6 and I believe 10), the 
chunk allocator goes wide, allocating a chunk from each device with free 
space, then striping at something smaller (64 KiB maybe?).  When the 
smallest device is full, it reduces the width by one and continues 
allocating, down to the minimum stripe width for the raid type.  However, 
raid1 and single do device-with-the-most-space first, thus, particularly 
for raid1, ensuring maximum usage of available space.

Were raid1 to do width-first, capacity would be far lower and much more 
of the largest device would remain unusable.  Some chunk pairs would be 
allocated entirely on the smaller devices, so less of the largest device 
would be used before the smaller devices fill up.  At that point no more 
raid1 chunks could be allocated, as only the single largest device would 
have free space left and raid1 requires allocation on two separate 
devices.

In the three-device raid1 case, the difference in usable capacity would 
be 1/3 the capacity of the smallest device, since until it is full, 1/3 
of all allocations would be to the two smaller devices, leaving that much 
more space unusable on the largest device.

So you see there's a reason for most-space-first, that being that it 
forces one chunk from each pair-allocation to the largest device, thereby 
most efficiently distributing space so as to leave as little space as 
possible unusable due to only one device left when pair-allocation is 
required.
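
The capacity argument above can be tried out with a small Python 
simulation.  It is a toy model, not the btrfs allocator: device sizes are 
counted in whole chunks, most_space_first and rotate_pairs are 
hypothetical names, and rotating through neighbouring device pairs is only 
an assumed stand-in for an "even spread" strategy.

```python
def most_space_first(sizes):
    """Allocate raid1 chunk pairs on the two devices with the most free space."""
    free = list(sizes)
    pairs = 0
    while True:
        # Stable sort: ties go to the lower device index.
        order = sorted(range(len(free)), key=lambda i: free[i], reverse=True)
        a, b = order[0], order[1]
        if free[b] == 0:  # fewer than two devices have space left
            return pairs
        free[a] -= 1
        free[b] -= 1
        pairs += 1


def rotate_pairs(sizes):
    """Allocate raid1 chunk pairs by rotating through neighbouring devices."""
    free = list(sizes)
    n = len(free)
    pairs = 0
    start = 0
    while True:
        for step in range(n):
            a = (start + step) % n
            b = (a + 1) % n
            if free[a] > 0 and free[b] > 0:
                free[a] -= 1
                free[b] -= 1
                pairs += 1
                start = (a + 1) % n
                break
        else:  # no pair of neighbours has space on both devices
            return pairs


sizes = [3, 3, 6]  # two small devices and one large one, in chunks
print(most_space_first(sizes))  # 6 chunk pairs, every device filled
print(rotate_pairs(sizes))      # 4 chunk pairs, 4 chunks stranded on the large device
```

The exact shortfall depends on the device sizes and on how the even 
spread is done, so the numbers here illustrate the direction of the 
effect rather than a general formula.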

2) There has been talk of a more flexible chunk allocator with an 
admin-specified strategy, allowing smart use of hybrid ssd/disk 
filesystems, for instance.  One option would be to put the metadata on 
the ssds, since btrfs metadata is relatively hot: in addition to the 
traditional metadata, it contains the checksums, which btrfs of course 
checks on read.

However, this sort of thing is likely to be some time off, as it's 
relatively lower priority than various other possible features.  
Unfortunately, given the rate of btrfs development, "some time off" is in 
practice likely to be at least five years out.

In the meantime, there are technologies such as bcache that allow hybrid 
caching of "hot" data, designed to present themselves as virtual block 
devices so btrfs as well as other filesystems can layer on top.

And in fact, we have some regular users that have btrfs on top of bcache 
actually deployed, and from reports, it now works quite well.  (There 
were some problems a while back, but they're several years in the past 
now, well before the last couple of LTS kernel series, which are the 
oldest recommended for btrfs deployment.)

If you're interested, start a new thread with btrfs on bcache in the 
subject line, and you'll likely get some very useful replies. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


Re: [PATCH 1/3] btrfs: Cleanup no_quota parameter

2015-10-26 Thread Qu Wenruo




Qu Wenruo wrote on 2015/10/26 16:25 +0800:
> Filipe Manana wrote on 2015/10/26 08:14 +:
>> On Mon, Oct 26, 2015 at 6:11 AM, Qu Wenruo wrote:
>>> No_quota parameter for delayed_ref related function are meaningless
>>> after 4.2-rc1, as any new delayed_ref_head will cause qgroup to scan
>>> extent for its rfer/excl change without checking no_quota flag.
>>>
>>> So this patch will clean them up.
>>
>> Hi Qu,
>>
>> I already send a patch for this yesterday:
>> https://patchwork.kernel.org/patch/7481901/
>
> Sorry, I didn't notice the patch also removed no_quota...
>
>> This is more than a cleanup, it fixes several bugs. The most important
>> is crashes (BUG_ON) when running delayed references, mostly triggered
>> during balance. The second one is a deadlock in the clone ioctl
>> (reported at http://www.spinics.net/lists/linux-btrfs/msg45844.html).
>>
>> The use of no_quota was also buggy in at least 2 places:
>>
>> 1) At delayed-refs.c:btrfs_add_delayed_tree_ref() - we were setting
>>    no_quota to 0 instead of 1 when the following condition was true:
>>    is_fstree(ref_root) || !fs_info->quota_enabled
>>
>> 2) At extent-tree.c:__btrfs_inc_extent_ref() - we were attempting to
>>    reset a node's no_quota when the condition "!is_fstree(root_objectid)
>>    || !root->fs_info->quota_enabled" was true but we did it only in
>>    an unused local stack variable, that is, we never reset the no_quota
>>    value in the node itself.
>>
>> I want to get this to stable, together with the other delayed
>> references fix, as a lot of people are unable to run balance as of
>> kernel 4.2+.
>
> Sorry again for the regression I brought in 4.2.
> The rework for delayed_ref implement is not important at all, and in
> fact new qgroup accounting could work completely well without them.
>
>> I'll update the changelog to reflect the clone ioctl deadlock issue,
>> which I previously forgot.
>>
>> thanks
>
> That would be great.
>
> BTW what about split the patch into no_quota cleanup and other fixes?
> It's not that obvious if they are all put into one patch.

Just forget it...
The cleanup itself will fix them all

Thanks,
Qu

> Thanks,
> Qu





Signed-off-by: Qu Wenruo 
---
  fs/btrfs/ctree.h   |  4 ++--
  fs/btrfs/delayed-ref.c | 26 ++
  fs/btrfs/delayed-ref.h |  7 ++
  fs/btrfs/extent-tree.c | 45 ++---
  fs/btrfs/file.c| 10 -
  fs/btrfs/inode.c   |  4 ++--
  fs/btrfs/ioctl.c   | 60
+-
  fs/btrfs/relocation.c  | 16 ++
  fs/btrfs/tree-log.c|  2 +-
  9 files changed, 43 insertions(+), 131 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index bc3c711..3fa3c3b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3422,7 +3422,7 @@ int btrfs_set_disk_extent_flags(struct
btrfs_trans_handle *trans,
  int btrfs_free_extent(struct btrfs_trans_handle *trans,
   struct btrfs_root *root,
   u64 bytenr, u64 num_bytes, u64 parent, u64
root_objectid,
- u64 owner, u64 offset, int no_quota);
+ u64 owner, u64 offset);

  int btrfs_free_reserved_extent(struct btrfs_root *root, u64 start,
u64 len,
int delalloc);
@@ -3435,7 +3435,7 @@ int btrfs_finish_extent_commit(struct
btrfs_trans_handle *trans,
  int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
  struct btrfs_root *root,
  u64 bytenr, u64 num_bytes, u64 parent,
-u64 root_objectid, u64 owner, u64 offset,
int no_quota);
+u64 root_objectid, u64 owner, u64 offset);

  int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
struct btrfs_root *root);
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index bd9b63b..449974f 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -292,8 +292,7 @@ add_delayed_ref_tail_merge(struct
btrfs_trans_handle *trans,
 exist = list_entry(href->ref_list.prev, struct
btrfs_delayed_ref_node,
list);
 /* No need to compare bytenr nor is_head */
-   if (exist->type != ref->type || exist->no_quota !=
ref->no_quota ||
-   exist->seq != ref->seq)
+   if (exist->type != ref->type || exist->seq != ref->seq)
 goto add_tail;

 if ((exist->type == BTRFS_TREE_BLOCK_REF_KEY ||
@@ -526,7 +525,7 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
  struct btrfs_delayed_ref_head *head_ref,
  struct btrfs_delayed_ref_node *ref, u64 bytenr,
  u64 num_bytes, u64 parent, u64 ref_root, int
level,
-int action, int no_quota)
+int action)
  {
 struct btrfs_delayed_tree_ref *full_ref;
 struct btrfs_delayed_ref_root *delayed_refs;
@@ -548,7 +547,6 @@

[4.4][PATCH 0/3] btrfs: Qgroup hotfix

2015-10-26 Thread Qu Wenruo

This patchset fixes 2 bugs:
1. Race condition leading to a transaction abort
   Reported by Filipe, fixed by the 2nd patch.

2. Qgroup low-level double free leading to EDQUOT
   In fact, I hit this bug several times during an internal rebase, but
   I stupidly forgot to include the fix in the v3 patchset.
   Fixed in the 3rd patch.

Qu Wenruo (3):
  btrfs: Cleanup no_quota parameter
  btrfs: qgroup: Fix a race in delayed_ref which leads to abort trans
  btrfs: qgroup: Fix a rebase bug which will cause qgroup double free

 fs/btrfs/ctree.h   |  7 +++---
 fs/btrfs/delayed-ref.c | 48 
 fs/btrfs/delayed-ref.h |  9 +++-
 fs/btrfs/extent-tree.c | 55 ++---
 fs/btrfs/file.c| 10 -
 fs/btrfs/inode.c   | 16 +-
 fs/btrfs/ioctl.c   | 60 +-
 fs/btrfs/qgroup.c  |  4 
 fs/btrfs/relocation.c  | 16 ++
 fs/btrfs/tree-log.c|  2 +-
 10 files changed, 73 insertions(+), 154 deletions(-)

-- 
2.6.2


Re: [PATCH 1/3] btrfs: Cleanup no_quota parameter

2015-10-26 Thread Filipe Manana

On Mon, Oct 26, 2015 at 6:11 AM, Qu Wenruo  wrote:
> No_quota parameter for delayed_ref related function are meaningless
> after 4.2-rc1, as any new delayed_ref_head will cause qgroup to scan
> extent for its rfer/excl change without checking no_quota flag.
>
> So this patch will clean them up.

Hi Qu,

I already sent a patch for this yesterday:
https://patchwork.kernel.org/patch/7481901/

This is more than a cleanup, it fixes several bugs. The most important
is crashes (BUG_ON) when running delayed references, mostly triggered
during balance. The second one is a deadlock in the clone ioctl
(reported at http://www.spinics.net/lists/linux-btrfs/msg45844.html).

The use of no_quota was also buggy in at least 2 places:

1) At delayed-refs.c:btrfs_add_delayed_tree_ref() - we were setting
   no_quota to 0 instead of 1 when the following condition was true:
   is_fstree(ref_root) || !fs_info->quota_enabled

2) At extent-tree.c:__btrfs_inc_extent_ref() - we were attempting to
   reset a node's no_quota when the condition "!is_fstree(root_objectid)
   || !root->fs_info->quota_enabled" was true but we did it only in
   an unused local stack variable, that is, we never reset the no_quota
   value in the node itself.

I want to get this to stable, together with the other delayed
references fix, as a lot of people are unable to run balance as of
kernel 4.2+.
I'll update the changelog to reflect the clone ioctl deadlock issue,
which I previously forgot.

thanks


>
> Signed-off-by: Qu Wenruo 
> ---
>  fs/btrfs/ctree.h   |  4 ++--
>  fs/btrfs/delayed-ref.c | 26 ++
>  fs/btrfs/delayed-ref.h |  7 ++
>  fs/btrfs/extent-tree.c | 45 ++---
>  fs/btrfs/file.c| 10 -
>  fs/btrfs/inode.c   |  4 ++--
>  fs/btrfs/ioctl.c   | 60 
> +-
>  fs/btrfs/relocation.c  | 16 ++
>  fs/btrfs/tree-log.c|  2 +-
>  9 files changed, 43 insertions(+), 131 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index bc3c711..3fa3c3b 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3422,7 +3422,7 @@ int btrfs_set_disk_extent_flags(struct 
> btrfs_trans_handle *trans,
>  int btrfs_free_extent(struct btrfs_trans_handle *trans,
>   struct btrfs_root *root,
>   u64 bytenr, u64 num_bytes, u64 parent, u64 
> root_objectid,
> - u64 owner, u64 offset, int no_quota);
> + u64 owner, u64 offset);
>
>  int btrfs_free_reserved_extent(struct btrfs_root *root, u64 start, u64 len,
>int delalloc);
> @@ -3435,7 +3435,7 @@ int btrfs_finish_extent_commit(struct 
> btrfs_trans_handle *trans,
>  int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>  struct btrfs_root *root,
>  u64 bytenr, u64 num_bytes, u64 parent,
> -u64 root_objectid, u64 owner, u64 offset, int 
> no_quota);
> +u64 root_objectid, u64 owner, u64 offset);
>
>  int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
>struct btrfs_root *root);
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index bd9b63b..449974f 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -292,8 +292,7 @@ add_delayed_ref_tail_merge(struct btrfs_trans_handle 
> *trans,
> exist = list_entry(href->ref_list.prev, struct btrfs_delayed_ref_node,
>list);
> /* No need to compare bytenr nor is_head */
> -   if (exist->type != ref->type || exist->no_quota != ref->no_quota ||
> -   exist->seq != ref->seq)
> +   if (exist->type != ref->type || exist->seq != ref->seq)
> goto add_tail;
>
> if ((exist->type == BTRFS_TREE_BLOCK_REF_KEY ||
> @@ -526,7 +525,7 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
>  struct btrfs_delayed_ref_head *head_ref,
>  struct btrfs_delayed_ref_node *ref, u64 bytenr,
>  u64 num_bytes, u64 parent, u64 ref_root, int level,
> -int action, int no_quota)
> +int action)
>  {
> struct btrfs_delayed_tree_ref *full_ref;
> struct btrfs_delayed_ref_root *delayed_refs;
> @@ -548,7 +547,6 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
> ref->action = action;
> ref->is_head = 0;
> ref->in_tree = 1;
> -   ref->no_quota = no_quota;
> ref->seq = seq;
>
> full_ref = btrfs_delayed_node_to_tree_ref(ref);
> @@ -581,7 +579,7 @@ add_delayed_data_ref(struct btrfs_fs_info *fs_info,
>  struct btrfs_delayed_ref_head *head_ref,
>  struct btrfs_delayed_ref_node *ref, u64 bytenr,
>

[PATCH 2/2 v3] Btrfs: fix regression running delayed references when using qgroups

2015-10-26 Thread fdmanana

From: Filipe Manana 

In the kernel 4.2 merge window we had big changes to the implementation
of delayed references and qgroups which made the no_quota field of delayed
references not used anymore. More specifically the no_quota field is not
used anymore as of:

  commit 0ed4792af0e8 ("btrfs: qgroup: Switch to new extent-oriented qgroup mechanism.")

Leaving the no_quota field actually prevents delayed references from
getting merged, which in turn causes the following BUG_ON(), at
fs/btrfs/extent-tree.c, to be hit when qgroups are enabled:

  static int run_delayed_tree_ref(...)
  {
 (...)
 BUG_ON(node->ref_mod != 1);
 (...)
  }

This happens in a scenario like the following:

  1) Ref1 bytenr X, action = BTRFS_ADD_DELAYED_REF, no_quota = 1, added.

  2) Ref2 bytenr X, action = BTRFS_DROP_DELAYED_REF, no_quota = 0, added.
     It's not merged with Ref1 because Ref1->no_quota != Ref2->no_quota.

  3) Ref3 bytenr X, action = BTRFS_ADD_DELAYED_REF, no_quota = 1, added.
     It's not merged with the reference at the tail of the list of refs
     for bytenr X because the reference at the tail, Ref2, is incompatible
     due to Ref2->no_quota != Ref3->no_quota.

  4) Ref4 bytenr X, action = BTRFS_DROP_DELAYED_REF, no_quota = 0, added.
     It's not merged with the reference at the tail of the list of refs
     for bytenr X because the reference at the tail, Ref3, is incompatible
     due to Ref3->no_quota != Ref4->no_quota.

  5) We run delayed references, trigger merging of delayed references,
     through __btrfs_run_delayed_refs() -> btrfs_merge_delayed_refs().

  6) Ref1 and Ref3 are merged as Ref1->no_quota == Ref3->no_quota and
     all other conditions are satisfied too. So Ref1 gets a ref_mod
     value of 2.

  7) Ref2 and Ref4 are merged as Ref2->no_quota == Ref4->no_quota and
     all other conditions are satisfied too. So Ref2 gets a ref_mod
     value of 2.

  8) Ref1 and Ref2 aren't merged, because they have different values
     for their no_quota field.

  9) Delayed reference Ref1 is picked for running (select_delayed_ref()
     always prefers references with an action == BTRFS_ADD_DELAYED_REF).
     So run_delayed_tree_ref() is called for Ref1 which triggers the
     BUG_ON because Ref1->ref_mod != 1 (equals 2).
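
The scenario above can be replayed with a small toy model. This is only a
sketch and not the kernel code: the Ref class, add_tail_merge() and
merge_all() are hypothetical names, and the model ignores the type, seq and
backref comparisons that the real merge code also performs. It only tries to
show that honoring no_quota leaves two refs with a ref_mod of 2, while
ignoring it lets each add/drop pair cancel out at the tail of the list.

```python
from dataclasses import dataclass

ADD, DROP = "add", "drop"


@dataclass
class Ref:
    # Hypothetical stand-in for a delayed ref node (one bytenr only).
    action: str
    no_quota: int
    ref_mod: int = 1


def mergeable(a, b, honor_no_quota):
    # The only merge condition modelled here is the no_quota comparison.
    return not honor_no_quota or a.no_quota == b.no_quota


def fold(exist, ref):
    """Fold ref into exist; return True if exist cancelled out to zero."""
    if exist.action == ref.action:
        exist.ref_mod += ref.ref_mod
    elif exist.ref_mod < ref.ref_mod:
        exist.action = ref.action
        exist.ref_mod = ref.ref_mod - exist.ref_mod
    else:
        exist.ref_mod -= ref.ref_mod
    return exist.ref_mod == 0


def add_tail_merge(refs, ref, honor_no_quota):
    # Try to merge the new ref with the tail of the list, else append it.
    if refs and mergeable(refs[-1], ref, honor_no_quota):
        if fold(refs[-1], ref):
            refs.pop()
        return
    refs.append(ref)


def merge_all(refs, honor_no_quota):
    # Later pass that merges any compatible refs, not just the tail.
    out = []
    for ref in refs:
        for i, kept in enumerate(out):
            if mergeable(kept, ref, honor_no_quota):
                if fold(kept, ref):
                    del out[i]
                break
        else:
            out.append(ref)
    return out


def scenario(honor_no_quota):
    refs = []
    for action, no_quota in [(ADD, 1), (DROP, 0), (ADD, 1), (DROP, 0)]:
        add_tail_merge(refs, Ref(action, no_quota), honor_no_quota)
    return merge_all(refs, honor_no_quota)


if __name__ == "__main__":
    print("with no_quota:   ", scenario(True))
    print("without no_quota:", scenario(False))
```

With the no_quota comparison in place the model ends with an add ref and a
drop ref that both carry ref_mod == 2, which is the state that trips the
BUG_ON. Without it nothing is left to run.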

So fix this by removing the no_quota field, as it's not used anymore as
of commit 0ed4792af0e8 ("btrfs: qgroup: Switch to new extent-oriented
qgroup mechanism.").

The use of no_quota was also buggy in at least two places:

1) At delayed-ref.c:btrfs_add_delayed_tree_ref() - we were setting
   no_quota to 0 instead of 1 when the following condition was true:
   is_fstree(ref_root) || !fs_info->quota_enabled

2) At extent-tree.c:__btrfs_inc_extent_ref() - we were attempting to
   reset a node's no_quota when the condition "!is_fstree(root_objectid)
   || !root->fs_info->quota_enabled" was true but we did it only in
   an unused local stack variable, that is, we never reset the no_quota
   value in the node itself.

This fixes the remainder of problems several people have been having when
running delayed references, mostly while a balance is running in parallel,
on a 4.2+ kernel.

Very special thanks to Stéphane Lesimple for helping debug this issue
and testing this fix on his multi terabyte filesystem (which took more
than one day to balance alone, plus fsck, etc).

Also, this fixes a deadlock issue when using the clone ioctl with qgroups
enabled, as reported by Elias Probst on the mailing list. The deadlock
happens because after calling btrfs_insert_empty_item we have our path
holding a write lock on a leaf of the fs/subvol tree and then before
releasing the path we called check_ref() which did backref walking, when
qgroups are enabled, and tried to read lock the same leaf. The trace for
this case is the following:

  INFO: task systemd-nspawn:6095 blocked for more than 120 seconds.
  (...)
  Call Trace:
[] schedule+0x74/0x83
[] btrfs_tree_read_lock+0xc0/0xea
[] ? wait_woken+0x74/0x74
[] btrfs_search_old_slot+0x51a/0x810
[] btrfs_next_old_leaf+0xdf/0x3ce
[] ? ulist_add_merge+0x1b/0x127
[] __resolve_indirect_refs+0x62a/0x667
[] ? btrfs_clear_lock_blocking_rw+0x78/0xbe
[] find_parent_nodes+0xaf3/0xfc6
[] __btrfs_find_all_roots+0x92/0xf0
[] btrfs_find_all_roots+0x45/0x65
[] ? btrfs_get_tree_mod_seq+0x2b/0x88
[] check_ref+0x64/0xc4
[] btrfs_clone+0x66e/0xb5d
[] btrfs_ioctl_clone+0x48f/0x5bb
[] ? native_sched_clock+0x28/0x77
[] btrfs_ioctl+0xabc/0x25cb
  (...)
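
The self-deadlock pattern in this trace can be illustrated with a toy,
non-recursive reader/writer lock. This is a sketch under loud assumptions:
LeafLock and its methods are hypothetical and are not the btrfs tree locking
API. The read side is a try-lock so that the example returns instead of
hanging the way the real task did.

```python
import threading


class LeafLock:
    """Toy non-recursive reader/writer lock (hypothetical, not btrfs code)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._readers = 0
        self._writer = False

    def try_write_lock(self):
        with self._mutex:
            if self._writer or self._readers:
                return False
            self._writer = True
            return True

    def write_unlock(self):
        with self._mutex:
            self._writer = False

    def try_read_lock(self):
        # A blocking read lock would wait here forever if the caller
        # itself is the writer; the try variant just reports failure.
        with self._mutex:
            if self._writer:
                return False
            self._readers += 1
            return True

    def read_unlock(self):
        with self._mutex:
            self._readers -= 1


def walk_while_holding_write_lock(leaf):
    # Models: path holds a write lock on the leaf, then a backref walk
    # tries to read lock the same leaf before the path is released.
    assert leaf.try_write_lock()
    try:
        return leaf.try_read_lock()
    finally:
        leaf.write_unlock()


def walk_after_release(leaf):
    # Models: the write lock is dropped before anything read locks the leaf.
    assert leaf.try_write_lock()
    leaf.write_unlock()
    got = leaf.try_read_lock()
    if got:
        leaf.read_unlock()
    return got
```

In the first function the read lock can never be granted, because the only
task that could release the write lock is the one waiting for the read lock.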

Reported-by: Stéphane Lesimple 
Tested-by: Stéphane Lesimple 
Reported-by: Elias Probst 
Reported-by: Peter Becker 
Reported-by: Malte Schröder 
Reported-by: Derek Dongray 
Reported-by: Erkki Seppala 
Cc: sta...@vger.kernel.org  # 4.2+
Signed-off-by: Filipe Manana

corrupted RAID1: unsuccessful recovery / help needed

2015-10-26 Thread Lukas Pirl

TL;DR: RAID1 does not recover, I guess the interesting part in the stack 
trace is:


  Call Trace:
  [] __del_reloc_root+0x30/0x100 [btrfs]
  [] free_reloc_roots+0x25/0x40 [btrfs]
  [] merge_reloc_roots+0x18e/0x240 [btrfs]
  [] btrfs_recover_relocation+0x374/0x420 [btrfs]
  [] open_ctree+0x1b7d/0x23e0 [btrfs]
  [] btrfs_mount+0x94e/0xa70 [btrfs]
  [] ? find_next_bit+0x15/0x20
  [] mount_fs+0x38/0x160
  …

Hello list.

I'd appreciate some help for repairing a corrupted RAID1.

Setup:
* Linux 4.2.0-12, Btrfs v3.17, `btrfs fi show`:
  uuid: 5be372f5-5492-4f4b-b641-c14f4ad8ae23
  Total devices 6 FS bytes used 2.87TiB
  devid 1 size 931.51GiB used 636.00GiB path /dev/mapper/WD-WCC4J7AFLTSZ
  devid 2 size 931.51GiB used 634.03GiB path /dev/mapper/WD-WCAU45343103
  devid 3 size   1.82TiB used   1.53TiB path /dev/mapper/WD-WCAVY6423276
  devid 4 size   1.82TiB used   1.53TiB path /dev/mapper/WD-WCAZAF872578
  devid 6 size   1.82TiB used   1.05TiB path /dev/mapper/WD-WMC4M0H3Z5UK
  *** Some devices missing
* disks are dm-crypted

What happened:
* devid 5 started to die (slowly)
* added a new disk (devid 6) and tried `btrfs device delete`
* failed with kernel crashes (guess:) due to heavy IO errors
* removed devid 5 from /dev (deactivated in dm-crypt)
* tried `btrfs balance`
  * interrupted multiple times due to kernel crashes
(probably due to semi-corrupted file system?)
* file system did not mount anymore after a required hard-reset
* no successful recovery so far:
  if not read-only, kernel IO blocks eventually (hard-reset required)
* tried:
  * `-o degraded`
-> IO freeze, kernel log: http://pastebin.com/Rzrp7XeL
  * `-o degraded,recovery`
-> IO freeze, kernel log: http://pastebin.com/VemHfnuS
  * `-o degraded,recovery,ro`
-> file system accessible, system stable
* going rw again does not fix the problem

I have not run btrfs-zero-log so far because my oops did not look very
similar to the one in the Wiki and I did not want to risk making
recovery harder.

Thanks,

Lukas

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Exclusive quota of snapshot exceeded despite no space used

2015-10-26 Thread Qu Wenruo




Thanks a lot for your reply!

While remounting the filesystem fixes the issue temporarily, it doesn't
take very long for the bug to happen again so it's not really a
workaround I can work with.

I did recompile the kernel using your patches, but unfortunately the
problem still appears.

Thanks,
Johannes


Interesting, just touching file will cause EQUOTA is quite a big problem.

I'll try to reproduce it with my patchset and see what really caused
the problem.
The problem seems to do with snapshot qgroup hacking.
But I'm not completely sure yet.

BTW, does "sync; btrfs qgroup show -prce" still show excl as 16K?
16K is the correct number with only 6 empty files, just in case.

Thanks,
Qu


I ran my example from the first mail again and managed to write 7 files
this time, "qgroup show" still shows 16kB after sync:

root@t420:/media/extern/snap# btrfs qg limit -e 50M .
root@t420:/media/extern/snap# for file in {1..100}; do touch $file;
sleep 5m; done
touch: cannot touch ‘8’: Disk quota exceeded
^C
root@t420:/media/extern/snap# sync
root@t420:/media/extern/snap# btrfs qgroup show -pcre .
qgroupid         rfer         excl     max_rfer     max_excl parent  child
--------         ----         ----     --------     -------- ------  -----
0/5          16.00KiB     16.00KiB         none         none ---     ---
0/257        16.00KiB     16.00KiB         none         none ---     ---
0/258        16.00KiB     16.00KiB         none     50.00MiB ---     ---
root@t420:/media/extern/snap# btrfs fi sync .
FSSync '.'
root@t420:/media/extern/snap# btrfs qgroup show -pcre .
qgroupid         rfer         excl     max_rfer     max_excl parent  child
--------         ----         ----     --------     -------- ------  -----
0/5          16.00KiB     16.00KiB         none         none ---     ---
0/257        16.00KiB     16.00KiB         none         none ---     ---
0/258        16.00KiB     16.00KiB         none     50.00MiB ---     ---

By the way, I don't know if it's relevant but the problem is not limited to
exclusive quotas, but also happens when setting a "referenced" limit
(qgroup limit without "-e").

Thanks,
Johannes



The bug is located, and turns out to be quite a stupid problem caused by 
myself.


I just forgot to include a cleanup patch during rebase AGAIN!!!

You can apply the following patch to resolve it:
[PATCH 3/3] btrfs: qgroup: Fix a rebase bug which will cause qgroup 
double free


Or just apply the whole patchset:
[4.4][PATCH 0/3] btrfs: Qgroup hotfix

At least, with the patchset based on Chris' integration-4.4 branch, it 
succeeded in touching all the 100 files in my test box.


Thanks,
Qu




Re: BTRFS with 8TB SMR drives

2015-10-26 Thread Henk Slager

I decided to give this ST8000AS0002 a try for backups / storing old
snapshots, although standardization for more optimal/native control of
SMR drives is still ongoing.  I saw people got it working with a 3.18
kernel, so that gave confidence.

I wanted to see if I could get it running with a 4.3.0-rc6 kernel (and
4.2.3 tools) on an H87M-Pro eSATA (non-Intel) port. The filesystem is
btrfs with all single profiles on top of dm-crypt, mounted with
compress-force=zlib,nossd (I use the drive via bcache, but currently
not attached to a cache device). The initial snapshot send |
receive action crashed after 1.2TB transferred, with all the
typical/known problems in dmesg.

Then the same trial, with a newly created fs, on one of the Intel SATA
ports. The same timeouts were seen in dmesg, but the fs was already
corrupted after a few GB of data transfer. It seemed that the drive was
not able to handle and store the filesystem data stream that was being
pushed onto it.

So I took a step back and just created an ext4 on it and did an
rsync copy. Unfortunately, I also got the same timeouts, port resets etc.

As the drive made the main system unstable, I hooked it up to an AMD
E-350 based board, also to try other kernels. On this board too, there
was no success with 4.x kernels, and at first not with 3.18.22 either.
But I figured out that a power cycle did the trick, and not just a hard
or soft reset. So I again created the fs from scratch and mounted it as
indicated.

Now it is 55% filled (3.9TiB) with 10 snapshots (done as increments
from the source fs from late 2013, with uncompressed allocation of
about 5.6 TiB). The whole data transfer took about 4 days, which is
roughly 10x slower than what would be achieved if the drive were
non-SMR and in a fast (e.g. Core i7) system.
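
As a rough cross-check of those numbers, the average rate can be worked out
from the figures in this mail. This is only back-of-the-envelope arithmetic:
"about 4 days" is approximate, and avg_rate_mib_s() is just a helper name
made up for the example.

```python
TIB = 1024 ** 4
MIB = 1024 ** 2
SECONDS_PER_DAY = 86400


def avg_rate_mib_s(tib, days):
    """Average transfer rate in MiB/s for `tib` TiB moved in `days` days."""
    return tib * TIB / (days * SECONDS_PER_DAY) / MIB


# 3.9 TiB is what ended up on the drive (zlib compressed),
# 5.6 TiB is the uncompressed allocation on the source fs.
on_disk = avg_rate_mib_s(3.9, 4)
logical = avg_rate_mib_s(5.6, 4)

if __name__ == "__main__":
    print(f"written to the drive: {on_disk:.1f} MiB/s")
    print(f"read from the source: {logical:.1f} MiB/s")
```

That works out to roughly 12 MiB/s actually written to the SMR drive, and
about 17 MiB/s of uncompressed source data, averaged over the whole run.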

Although the task below took more than 8 minutes:
[322087.174089] Workqueue: events_unbound
btrfs_async_reclaim_metadata_space [btrfs]
... the fs and system runs OK.

My take is that this relatively low average data transfer rate (one
reason I forced zlib compression) helped get the task done successfully
for this device-managed SMR drive, but it is unsatisfying that there are
kernel version and computer system dependencies. I had limited time for
preparing and setting up the data transfer, so other configurations
with new kernels might also work, but I had most confidence upfront in
the one that has turned out to work. Maybe, now that all data is on the
drive, I will shrink the fs and create a test fs in a second partition.

On Sat, Oct 24, 2015 at 5:27 AM, Ken Long  wrote:
> Hello,
>
> I have a single one of these drives formatted with btrfs. It's my
> only btrfs drive on this machine.
> I'm getting similar errors. Is there any info I can provide to help
> troubleshoot this?
>
> Is a full dmesg still wanted?
>
> here's what I'm running-
>
> $ uname -a
> Linux machine 4.2.0-16-lowlatency #19-Ubuntu SMP PREEMPT Thu Oct 8
> 16:19:23 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Re: [PATCH v7 5/4] copy_file_range.2: New page documenting copy_file_range()

2015-10-26 Thread J. Bruce Fields

On Mon, Oct 26, 2015 at 12:19:33PM +, Pádraig Brady wrote:
> On 26/10/15 03:39, Christoph Hellwig wrote:
> > On Sat, Oct 24, 2015 at 01:02:21PM +0100, Pádraig Brady wrote:
> >> I'm a bit worried about the sparse expansion and default reflinking
> >> which might preclude cp(1) from using this call in most cases, but I will
> >> test and try to use it. coreutils has heuristics for determining if files
> >> are remote, which we might use to restrict to that use case.
> > 
> > Can you explain why reflinking and hole expansion are an issue if done
> > locally and not if done remotely?  I'd really like to make the call as
> > usable as possible for everyone, but we really need clear semantics for
> > that.
> 
> Fair point on local vs remote. I was just assuming that remote
> copy offload would not do reflinking on the backend, or at
> least wasn't an exposed option over the remote interface.

The server could definitely do a reflink.  More generally, from the
description of the NFS COPY operation:

https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-39#page-64

If the copy completes successfully, either synchronously or
asynchronously, the data copied from the source file to the
destination file MUST appear identical to the NFS client.
However, the NFS server's on disk representation of the data in
the source file and destination file MAY differ.  For example,
the NFS server might encrypt, compress, deduplicate, or
otherwise represent the on disk data in the source and
destination file differently.

> I get the impression that you think reflinking should be hidden
> from the user, i.e. cp(1) should not have had the --reflink option
> (for the last 6 years)?  I'm not convinced of that, and even so
> I think lower level interfaces would benefit from finer grained options.
> This would be especially useful since there is no general interface
> to reflink at present. I was happy with the reflink control options,
> thinking the extra control could allow cp to use this by default.

Maybe that's a case for Christoph's "clone" operation.

I agree with him that it makes sense to allow the filesystem to
implement "copy" using reflink or similar tricks under the covers.  And
that in fact it's difficult to imagine how you'd prevent that in the
presence of layers of filesystem or block protocols underneath.

That "cp" flag seems strange to me, but if "cp" wants to take advantage
of a copy system call while continuing to make something like that
distinction then I suppose it could fallocate the destination file range
after the copy.
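
A minimal sketch of that "copy, then fallocate the destination" idea follows,
using Python's os.copy_file_range() and os.posix_fallocate() wrappers. The
function name copy_then_preallocate() is made up for the example, and whether
the fallocate step really forces private (non-reflinked) blocks is an
assumption that depends on the filesystem; this only shows the call sequence.

```python
import os


def copy_then_preallocate(src_path, dst_path):
    """Copy src to dst, then ask for the destination range to be allocated.

    Returns the number of bytes copied. Falls back to pread/pwrite when
    copy_file_range() is unavailable or refused by the kernel.
    """
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        size = os.fstat(src).st_size
        done = 0
        while done < size:
            try:
                # The filesystem is free to implement this however it
                # likes, possibly by sharing extents.
                n = os.copy_file_range(src, dst, size - done, done, done)
            except (AttributeError, OSError):
                n = os.pwrite(dst, os.pread(src, 1 << 20, done), done)
            if n == 0:
                break
            done += n
        if done:
            try:
                # Assumption: on some filesystems this unshares the range;
                # on others it may be a no-op for already-mapped blocks.
                os.posix_fallocate(dst, 0, done)
            except (AttributeError, OSError):
                pass
        return done
    finally:
        os.close(src)
        os.close(dst)
```

A caller that wants the reflink behaviour would simply skip the fallocate
step, so the distinction stays in userspace rather than in the copy call.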

--b.

> > Also note that Anna's current series allows for hole filling - any decent
> > implementation should not do them, but that's really a quality of
> > implementation and not an interface issue.
> 
> I think you're saying the default `cp --sparse=auto` operation
> could rely on copy_file_range(...complete file...), while
> cp --sparse={always,never} would have to iterate over the
> file, punching or filling holes as appropriate. I thought
> Anna indicated differently wrt splice filling holes by default.
> 
> TBH I'm not clear on the semantics of the current implementation,
> so need to test the above in various cases.
> 
> thanks,
> Pádraig.

Re: Exclusive quota of snapshot exceeded despite no space used

2015-10-26 Thread Qu Wenruo




Johannes Henninger wrote on 2015/10/27 01:15 +0100:

On 26.10.2015 08:12, Qu Wenruo wrote:



Thanks a lot for your reply!

While remounting the filesystem fixes the issue temporarily, it doesn't
take very long for the bug to happen again so it's not really a
workaround I can work with.

I did recompile the kernel using your patches, but unfortunately the
problem still appears.

Thanks,
Johannes


Interesting, just touching file will cause EQUOTA is quite a big
problem.

I'll try to reproduce it with my patchset and see what really caused
the problem.
The problem seems to do with snapshot qgroup hacking.
But I'm not completely sure yet.

BTW, does "sync; btrfs qgroup show -prce" still show excl as 16K?
16K is the correct number with only 6 empty files, just in case.

Thanks,
Qu


I ran my example from the first mail again and managed to write 7 files
this time, "qgroup show" still shows 16kB after sync:

root@t420:/media/extern/snap# btrfs qg limit -e 50M .
root@t420:/media/extern/snap# for file in {1..100}; do touch $file;
sleep 5m; done
touch: cannot touch ‘8’: Disk quota exceeded
^C
root@t420:/media/extern/snap# sync
root@t420:/media/extern/snap# btrfs qgroup show -pcre .
qgroupid         rfer         excl     max_rfer     max_excl parent  child
--------         ----         ----     --------     -------- ------  -----
0/5          16.00KiB     16.00KiB         none         none ---     ---
0/257        16.00KiB     16.00KiB         none         none ---     ---
0/258        16.00KiB     16.00KiB         none     50.00MiB ---     ---
root@t420:/media/extern/snap# btrfs fi sync .
FSSync '.'
root@t420:/media/extern/snap# btrfs qgroup show -pcre .
qgroupid         rfer         excl     max_rfer     max_excl parent  child
--------         ----         ----     --------     -------- ------  -----
0/5          16.00KiB     16.00KiB         none         none ---     ---
0/257        16.00KiB     16.00KiB         none         none ---     ---
0/258        16.00KiB     16.00KiB         none     50.00MiB ---     ---

By the way, I don't know if it's relevant but the problem is not limited to
exclusive quotas, but also happens when setting a "referenced" limit
(qgroup limit without "-e").

Thanks,
Johannes



The bug is located, and turns out to be quite a stupid problem caused
by myself.

I just forgot to include a cleanup patch during rebase AGAIN!!!

You can apply the following patch to resolve it:
[PATCH 3/3] btrfs: qgroup: Fix a rebase bug which will cause qgroup
double free

Or just apply the whole patchset:
[4.4][PATCH 0/3] btrfs: Qgroup hotfix

At least, with the patchset based on Chris' integration-4.4 branch, it
succeeded in touching all the 100 files in my test box.

Thanks,
Qu



It's working! Thank you so much for fixing this bug, you don't even know
how much this has helped me!

Thanks!
Johannes


Glad to hear that.

If it's working for you, it would be better to add a 'Tested-by' tag for 
the 3rd patch.


Thanks,
Qu

Re: Bad fs performance, IO freezes

2015-10-26 Thread cheater00 .

Hello,
currently my computer freezes every several seconds for half a second
or so. Using it feels like I'm playing musical chairs with the kernel.
I have just one download happening on utorrent right now - this is
what the graph looks like:
http://i.imgur.com/LqhMtrJ.png
and every time a new spike happens, a freeze happens just before
that... that's the only time those freezes happen, too.

Please advise.

On Mon, Oct 26, 2015 at 7:31 PM, cheater00 .  wrote:
> I do not experience btrfs-transacti going up to 100% for minutes at a
> time now (not reproduced yet) but I have it spiking up to say 30% for
> a short while and everything jags during that time. So, say, if I am
> watching youtube, the sound cuts out and the video drops out for a
> bit. And if I'm typing, then what I typed during that time gets lost,
> like if I never typed that.
>
> I have also connected the same HDD bay with a USB3 cable instead of
> USB2. It's on an USB3 port. So it's running via USB3 now.
>
>
> On Mon, Oct 26, 2015 at 6:43 PM, cheater00 .  wrote:
>> So far I cannot reproduce. If I don't post again this means the issue
>> has been fixed by updating the kernel.
>>
>> On Mon, Oct 26, 2015 at 4:40 PM, cheater00 .  wrote:
>>> I have located 4.3.0-rc7 binaries which I will now try.
>>>
>>> On Mon, Oct 26, 2015 at 3:38 PM, cheater00 .  wrote:
>>>> Thanks for the reply. What version did this go into? I'll try getting
>>>> a prebuilt backport of the kernel, building source could slow things
>>>> down considerably, but debs will not be available for the latest few
>>>> minor versions I guess. So if you can tell me a min version, I'll try
>>>> to find the latest deb newer than that, or I'll build if that's not
>>>> available.
>>>>
>>>> On Mon, Oct 26, 2015 at 3:25 PM, Liu Bo  wrote:
>>>>> On 10/26/2015 08:16 PM, cheater00 . wrote:
>>>>>>
>>>>>> Hi guys,
>>>>>> I am running into really bad performance. Here's my setup:
>>>>>>
>>>>>> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
>>>>>> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>>>>>>
>>>>>> Single btrfs partition covering whole disk.
>>>>>>
>>>>>> Autodefrag is on.
>>>>>>
>>>>>> fstab line:
>>>>>> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>>>>>>
>>>>>> Sometimes when files are being modified or removed, I see
>>>>>> btrfs-transacti eat 100% cpu; during this time no io operations
>>>>>> succeed, that is, they're all stalled. You can't even ls on that fs.
>>>>>> This happens for several minutes then normal operation resumes. There
>>>>>> doesn't seem to be a rule to what will trigger this, other than
>>>>>> opening a single file and reading usually works quite well. (say,
>>>>>> watching a movie while all other programs are closed). But even moving
>>>>>> files off the disks triggers some sort of bug. Just now I am moving a
>>>>>> few files (just 30gb worth) onto another disk, and the bug triggers.
>>>>>> So btrfs-transacti was eating my cpu for over 5 minutes and according
>>>>>> to mv's output after this was done and cpu usage went back to normal
>>>>>> what I was waiting for was for a tiny png file to be removed. This is
>>>>>> pretty bad.
>>>>>>
>>>>>> I have tried defragmenting directories where files are being accessed
>>>>>> and moved. This hasn't helped.
>>>>>>
>>>>>> This happens whether the FS is near full or not. It currently is near
>>>>>> full but it wasn't before and it still did that. It still has about ~
>>>>>> 100GB free space now.
>>>>>>
>>>>>> The more things are happening the more often this bug gets triggered.
>>>>>> So if I have utorrent running and its temporary downloads directory is
>>>>>> there, its download speed graph will be a few spikes of running at
>>>>>> several MB/sec separated by durations of 0 download speed.
>>>>>>
>>>>>> Nothing seems to show up in dmesg or syslog.
>>>>>>
>>>>>> I have asked in #btrfs but the suggestions ended up not fixing the
>>>>>> issue (autodefrag, defrag dirs).
>>>>>>
>>>>>> Please advise what I should do with this issue.
>>>>>
>>>>>
>>>>> It might be related to the delayed ref rework. The last time I saw this
>>>>> kind of hanging problem, with btrfs-transaction eating cpu, it was
>>>>> because btrfs didn't merge delayed refs. It'd be best to try the latest
>>>>> kernel, and if the issue is not resolved, then we can work out a
>>>>> reproducer and provide debugging.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Liubo

[PATCH] btrfs: Print Warning only if ENOSPC_DEBUG is enabled

2015-10-26 Thread Ashish Samant

Signed-off-by: Ashish Samant 
---
 fs/btrfs/delayed-inode.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index a2ae427..b86cfd9 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -652,9 +652,13 @@ static int btrfs_delayed_inode_reserve_metadata(
goto out;
 
ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes);
-   if (!WARN_ON(ret))
+   if (!ret)
goto out;
 
+   if (btrfs_test_opt(root, ENOSPC_DEBUG))
+   WARN(1, KERN_DEBUG
+"btrfs: block rsv migrate returned %d\n", ret);
+
/*
 * Ok this is a problem, let's just steal from the global rsv
 * since this really shouldn't happen that often.
-- 
1.8.3.2


Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref

2015-10-26 Thread Qu Wenruo




Filipe Manana wrote on 2015/10/25 14:39 +:

On Tue, Oct 13, 2015 at 3:20 AM, Qu Wenruo  wrote:

Add a new function, btrfs_add_delayed_qgroup_reserve(), to record
how much space is reserved for that extent.

As btrfs only accounts qgroups at run_delayed_refs() time, a newly
allocated extent should keep its reserved space until then.

So add needed function with related members to do it.

Signed-off-by: Qu Wenruo 
---
v2:
   None
v3:
   None
---
  fs/btrfs/delayed-ref.c | 29 +
  fs/btrfs/delayed-ref.h | 14 ++
  2 files changed, 43 insertions(+)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index ac3e81d..bd9b63b 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -476,6 +476,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 INIT_LIST_HEAD(&head_ref->ref_list);
 head_ref->processing = 0;
 head_ref->total_ref_mod = count_mod;
+   head_ref->qgroup_reserved = 0;
+   head_ref->qgroup_ref_root = 0;

 /* Record qgroup extent info if provided */
 if (qrecord) {
@@ -746,6 +748,33 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 return 0;
  }

+int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
+struct btrfs_trans_handle *trans,
+u64 ref_root, u64 bytenr, u64 num_bytes)
+{
+   struct btrfs_delayed_ref_root *delayed_refs;
+   struct btrfs_delayed_ref_head *ref_head;
+   int ret = 0;
+
+   if (!fs_info->quota_enabled || !is_fstree(ref_root))
+   return 0;
+
+   delayed_refs = &trans->transaction->delayed_refs;
+
+   spin_lock(&delayed_refs->lock);
+   ref_head = find_ref_head(&delayed_refs->href_root, bytenr, 0);
+   if (!ref_head) {
+   ret = -ENOENT;
+   goto out;
+   }


Hi Qu,

So while running btrfs/063, with qgroups enabled (I modified the test
to enable qgroups), ran into this 2 times:

[169125.246506] BTRFS info (device sdc): disk space caching is enabled
[169125.363164] ------------[ cut here ]------------
[169125.365236] WARNING: CPU: 10 PID: 2827 at fs/btrfs/inode.c:2929
btrfs_finish_ordered_io+0x347/0x4eb [btrfs]()
[169125.367702] BTRFS: Transaction aborted (error -2)
[169125.368830] Modules linked in: btrfs dm_flakey dm_mod
crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs
lockd grace fscache sunrpc loop fuse parport_pc parport i2c_piix4
psmouse acpi_cpufreq microcode pcspkr processor evdev i2c_core
serio_raw button ext4 crc16 jbd2 mbcache sd_mod sg sr_mod cdrom
ata_generic virtio_scsi ata_piix libata floppy virtio_pci virtio_ring
scsi_mod e1000 virtio [last unloaded: btrfs]
[169125.376755] CPU: 10 PID: 2827 Comm: kworker/u32:14 Tainted: G
   W   4.3.0-rc5-btrfs-next-17+ #1


Hi Filipe,

Although not related to the bug report, I'm a little interested in your
testing kernel.


Are you testing integration-4.4 from Chris' repo,
or 4.3-rc from the mainline repo with my qgroup reserve patchset applied?

Although integration-4.4 has already merged the qgroup reserve patchset,
it's causing some strange bugs, like over-decreasing data
sinfo->bytes_may_use, mainly in the generic/127 test case.


But if the qgroup reserve patchset is rebased onto integration-4.3 (I did
all my old tests based on that), there is no generic/127 problem at all.


Thanks,
Qu


[169125.378522] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org
04/01/2014
[169125.380916] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[169125.382167]   88007ef2bc28 812566f4
88007ef2bc70
[169125.383643]  88007ef2bc60 8104d0a6 a03cac33
8801f5ca6db0
[169125.385197]  8802c6c7ee98 880122bc1000 fffe
88007ef2bcc8
[169125.386691] Call Trace:
[169125.387194]  [] dump_stack+0x4e/0x79
[169125.388205]  [] warn_slowpath_common+0x9f/0xb8
[169125.389386]  [] ?
btrfs_finish_ordered_io+0x347/0x4eb [btrfs]
[169125.390837]  [] warn_slowpath_fmt+0x48/0x50
[169125.391839]  [] ? unpin_extent_cache+0xbe/0xcc [btrfs]
[169125.392973]  []
btrfs_finish_ordered_io+0x347/0x4eb [btrfs]
[169125.395714]  [] ? _raw_spin_unlock_irqrestore+0x38/0x60
[169125.396888]  [] ? trace_hardirqs_off_caller+0x1f/0xb9
[169125.397986]  [] finish_ordered_fn+0x15/0x17 [btrfs]
[169125.399122]  [] normal_work_helper+0x14c/0x32a [btrfs]
[169125.400300]  [] btrfs_endio_write_helper+0x12/0x14 [btrfs]
[169125.401450]  [] process_one_work+0x24a/0x4ac
[169125.402631]  [] worker_thread+0x206/0x2c2
[169125.403622]  [] ? rescuer_thread+0x2cb/0x2cb
[169125.404693]  [] kthread+0xef/0xf7
[169125.405727]  [] ? kthread_parkme+0x24/0x24
[169125.406808]  [] ret_from_fork+0x3f/0x70
[169125.407834]  [] ? kthread_parkme+0x24/0x24
[169125.408840] ---[ end trace 6ee4342a5722b119 ]---
[169125.409654] BTRFS: error (device sdc) in

Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref

2015-10-26 Thread Qu Wenruo




Chris Mason wrote on 2015/10/27 01:14 -0400:

On Tue, Oct 27, 2015 at 12:13:11PM +0800, Qu Wenruo wrote:



Filipe Manana wrote on 2015/10/25 14:39 +:

On Tue, Oct 13, 2015 at 3:20 AM, Qu Wenruo  wrote:

Add new function btrfs_add_delayed_qgroup_reserve() function to record
how much space is reserved for that extent.

As btrfs only accounts qgroup at run_delayed_refs() time, so newly
allocated extent should keep the reserved space until then.

So add needed function with related members to do it.

Signed-off-by: Qu Wenruo 
---
v2:
   None
v3:
   None
---
  fs/btrfs/delayed-ref.c | 29 +
  fs/btrfs/delayed-ref.h | 14 ++
  2 files changed, 43 insertions(+)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index ac3e81d..bd9b63b 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -476,6 +476,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 INIT_LIST_HEAD(&head_ref->ref_list);
 head_ref->processing = 0;
 head_ref->total_ref_mod = count_mod;
+   head_ref->qgroup_reserved = 0;
+   head_ref->qgroup_ref_root = 0;

 /* Record qgroup extent info if provided */
 if (qrecord) {
@@ -746,6 +748,33 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info 
*fs_info,
 return 0;
  }

+int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
+struct btrfs_trans_handle *trans,
+u64 ref_root, u64 bytenr, u64 num_bytes)
+{
+   struct btrfs_delayed_ref_root *delayed_refs;
+   struct btrfs_delayed_ref_head *ref_head;
+   int ret = 0;
+
+   if (!fs_info->quota_enabled || !is_fstree(ref_root))
+   return 0;
+
+   delayed_refs = &trans->transaction->delayed_refs;
+
+   spin_lock(&delayed_refs->lock);
+   ref_head = find_ref_head(&delayed_refs->href_root, bytenr, 0);
+   if (!ref_head) {
+   ret = -ENOENT;
+   goto out;
+   }


Hi Qu,

So while running btrfs/063, with qgroups enabled (I modified the test
to enable qgroups), ran into this 2 times:

[169125.246506] BTRFS info (device sdc): disk space caching is enabled
[169125.363164] ------------[ cut here ]------------
[169125.365236] WARNING: CPU: 10 PID: 2827 at fs/btrfs/inode.c:2929
btrfs_finish_ordered_io+0x347/0x4eb [btrfs]()
[169125.367702] BTRFS: Transaction aborted (error -2)
[169125.368830] Modules linked in: btrfs dm_flakey dm_mod
crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs
lockd grace fscache sunrpc loop fuse parport_pc parport i2c_piix4
psmouse acpi_cpufreq microcode pcspkr processor evdev i2c_core
serio_raw button ext4 crc16 jbd2 mbcache sd_mod sg sr_mod cdrom
ata_generic virtio_scsi ata_piix libata floppy virtio_pci virtio_ring
scsi_mod e1000 virtio [last unloaded: btrfs]
[169125.376755] CPU: 10 PID: 2827 Comm: kworker/u32:14 Tainted: G
   W   4.3.0-rc5-btrfs-next-17+ #1


Hi Filipe,

Although not related to the bug report, I'm a little interested in your
testing kernel.

Are you testing integration-4.4 from Chris repo?
Or 4.3-rc from mainline repo with my qgroup reserve patchset applied?

Although integration-4.4 already merged qgroup reserve patchset, but it's
causing some strange bug like over decrease data sinfo->bytes_may_use,
mainly in generic/127 testcase.

But if qgroup reserve patchset is rebased to integration-4.3 (I did all my
old tests based on that), no generic/127 problem at all.


Did I mismerge things?

-chris


Not sure yet.

But at least some patches in 4.3 are not in integration-4.4, such as the
following patch:
btrfs: Avoid truncate tailing page if fallocate range doesn't exceed
inode size


I'll continue testing and bisecting to see what triggers the strange 
WARN_ON() in integration-4.4.

--
Oct 27 11:05:00 vmware kernel: WARNING: CPU: 4 PID: 13711 at 
fs/btrfs//extent-tree.c:4171 
btrfs_free_reserved_data_space_noquota+0x175/0x190 [btrfs]()
Oct 27 11:05:00 vmware kernel: Modules linked in: btrfs(OE) fuse vfat 
msdos fat xfs binfmt_misc bridge stp llc dm_snapshot dm_bufio dm_flakey 
loop iptable_nat nf_conntrack_ipv4 nf
_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw 
iptable_filter ip_tables dm_mirror dm_region_hash dm_log xor dm_mod 
crc32c_intel vmw_balloon raid6_pq nfsd
vmw_vmci i2c_piix4 shpchp auth_rpcgss acpi_cpufreq nfs_acl lockd grace 
sunrpc ext4 mbcache jbd2 sd_mod vmwgfx drm_kms_helper syscopyarea 
sysfillrect sysimgblt fb_sys_fops ttm drm

ata_piix vmxnet3 libata vmw_pvscsi floppy [last unloaded: btrfs]
Oct 27 11:05:00 vmware kernel: CPU: 4 PID: 13711 Comm: fsx Tainted: G 
 W  OE   4.3.0-rc5+ #5
Oct 27 11:05:00 vmware kernel: Hardware name: VMware, Inc. VMware 
Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 08/16/2013
Oct 27 11:05:00 vmware kernel:  2caf2373 
88021f63b760 81302e73
Oct 27 11:05:00 vmware kernel:

Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref

2015-10-26 Thread Chris Mason

On Tue, Oct 27, 2015 at 12:13:11PM +0800, Qu Wenruo wrote:
> 
> 
> Filipe Manana wrote on 2015/10/25 14:39 +:
> >On Tue, Oct 13, 2015 at 3:20 AM, Qu Wenruo  wrote:
> >>Add new function btrfs_add_delayed_qgroup_reserve() function to record
> >>how much space is reserved for that extent.
> >>
> >>As btrfs only accounts qgroup at run_delayed_refs() time, so newly
> >>allocated extent should keep the reserved space until then.
> >>
> >>So add needed function with related members to do it.
> >>
> >>Signed-off-by: Qu Wenruo 
> >>---
> >>v2:
> >>   None
> >>v3:
> >>   None
> >>---
> >>  fs/btrfs/delayed-ref.c | 29 +
> >>  fs/btrfs/delayed-ref.h | 14 ++
> >>  2 files changed, 43 insertions(+)
> >>
> >>diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> >>index ac3e81d..bd9b63b 100644
> >>--- a/fs/btrfs/delayed-ref.c
> >>+++ b/fs/btrfs/delayed-ref.c
> >>@@ -476,6 +476,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
> >> INIT_LIST_HEAD(&head_ref->ref_list);
> >> head_ref->processing = 0;
> >> head_ref->total_ref_mod = count_mod;
> >>+   head_ref->qgroup_reserved = 0;
> >>+   head_ref->qgroup_ref_root = 0;
> >>
> >> /* Record qgroup extent info if provided */
> >> if (qrecord) {
> >>@@ -746,6 +748,33 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info 
> >>*fs_info,
> >> return 0;
> >>  }
> >>
> >>+int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
> >>+struct btrfs_trans_handle *trans,
> >>+u64 ref_root, u64 bytenr, u64 
> >>num_bytes)
> >>+{
> >>+   struct btrfs_delayed_ref_root *delayed_refs;
> >>+   struct btrfs_delayed_ref_head *ref_head;
> >>+   int ret = 0;
> >>+
> >>+   if (!fs_info->quota_enabled || !is_fstree(ref_root))
> >>+   return 0;
> >>+
> >>+   delayed_refs = &trans->transaction->delayed_refs;
> >>+
> >>+   spin_lock(&delayed_refs->lock);
> >>+   ref_head = find_ref_head(&delayed_refs->href_root, bytenr, 0);
> >>+   if (!ref_head) {
> >>+   ret = -ENOENT;
> >>+   goto out;
> >>+   }
> >
> >Hi Qu,
> >
> >So while running btrfs/063, with qgroups enabled (I modified the test
> >to enable qgroups), ran into this 2 times:
> >
> >[169125.246506] BTRFS info (device sdc): disk space caching is enabled
> >[169125.363164] ------------[ cut here ]------------
> >[169125.365236] WARNING: CPU: 10 PID: 2827 at fs/btrfs/inode.c:2929
> >btrfs_finish_ordered_io+0x347/0x4eb [btrfs]()
> >[169125.367702] BTRFS: Transaction aborted (error -2)
> >[169125.368830] Modules linked in: btrfs dm_flakey dm_mod
> >crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs
> >lockd grace fscache sunrpc loop fuse parport_pc parport i2c_piix4
> >psmouse acpi_cpufreq microcode pcspkr processor evdev i2c_core
> >serio_raw button ext4 crc16 jbd2 mbcache sd_mod sg sr_mod cdrom
> >ata_generic virtio_scsi ata_piix libata floppy virtio_pci virtio_ring
> >scsi_mod e1000 virtio [last unloaded: btrfs]
> >[169125.376755] CPU: 10 PID: 2827 Comm: kworker/u32:14 Tainted: G
> >   W   4.3.0-rc5-btrfs-next-17+ #1
> 
> Hi Filipe,
> 
> Although not related to the bug report, I'm a little interested in your
> testing kernel.
> 
> Are you testing integration-4.4 from Chris repo?
> Or 4.3-rc from mainline repo with my qgroup reserve patchset applied?
> 
> Although integration-4.4 already merged qgroup reserve patchset, but it's
> causing some strange bug like over decrease data sinfo->bytes_may_use,
> mainly in generic/127 testcase.
> 
> But if qgroup reserve patchset is rebased to integration-4.3 (I did all my
> old tests based on that), no generic/127 problem at all.

Did I mismerge things?

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Recover btrfs volume which can only be mounded in read-only mode

2015-10-26 Thread Duncan

Hugo Mills posted on Mon, 26 Oct 2015 09:24:57 + as excerpted:

> On Mon, Oct 26, 2015 at 09:14:00AM +, Duncan wrote:
>> Dmitry Katsubo posted on Sun, 18 Oct 2015 11:44:08 +0200 as excerpted:
>> 
>>> I think PID-based solution is not the best one. Why not simply take a
>>> random device? Then at least all drives in the volume are equally
>>> loaded (in average).
>> 
>> Nobody argues that the even/odd-PID-based read-scheduling solution is
>> /optimal/, in a production sense at least.  But [it's near ideal for
>> testing, and "good enough" for the most general case].
> 
> For what it's worth, David tried implementing round-robin (IIRC)
> some time ago, and found that it performed *worse* than the pid-based
> system. (It may have been random, but memory says it was round-robin).

What I'd like to know is what mdraid1 uses, and whether btrfs can get
that.  Some upgrades ago, after trying mdraid6 for the main system and
mdraid0 for some parts (with mdraid1 for boot, since grub1 could deal
with it but not with the others), I eventually settled on 4-way mdraid1
for everything, using the same disks I had used for the raid6 and raid0.

And I was rather blown away by the mdraid1 speed in comparison,
especially compared to raid0, which I had thought would be faster than
raid1.  I guess my use-case is multi-threaded and read-heavy enough that,
with whatever mdraid1 uses, I was getting up to four separate reads (one
per spindle) going at once.  Writes still happened at single-spindle
speed: with SATA (as opposed to the older IDE; this was when SATA was
still new), each spindle had its own channel, so they could write in
parallel, with the bottleneck being the speed at which the slowest of the
four completed its write.  So writes were single-spindle speed, still far
faster than the raid6 read-modify-write cycle, while reads... it really
did appear to multitask one per spindle.

Also, the mdraid1 may have actually taken into account spindle head 
location as well, and scheduled reads to the spindle with the head 
already positioned closest to the target, tho I'm not sure on that.
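
To make the policies discussed in this thread concrete, here is a rough
userspace sketch in C.  It is not the btrfs or md code.  pick_by_pid()
follows the even/odd-PID selection described earlier in the thread;
pick_round_robin() and pick_nearest_head() are hypothetical illustrations
of round-robin and of the head-position heuristic that mdraid1 may or may
not use.  All function names and the head_pos array are assumptions made
for illustration only.

```c
#include <stdint.h>

/* PID-based policy as described in the thread: with two copies,
 * even PIDs read copy 0 and odd PIDs read copy 1. */
static int pick_by_pid(int pid, int num_copies)
{
	return pid % num_copies;
}

/* Hypothetical round-robin: each call advances a caller-owned counter
 * and returns the copy that the previous counter value maps to. */
static int pick_round_robin(unsigned int *next, int num_copies)
{
	int chosen = (int)(*next % (unsigned int)num_copies);

	*next += 1;
	return chosen;
}

/* Hypothetical nearest-head policy: choose the copy whose last known
 * head position (in sectors) is closest to the target sector.
 * Ties go to the lowest-numbered copy. */
static int pick_nearest_head(const uint64_t *head_pos, int num_copies,
			     uint64_t target)
{
	int best = 0;
	uint64_t best_dist = UINT64_MAX;

	for (int i = 0; i < num_copies; i++) {
		uint64_t dist = head_pos[i] > target ?
				head_pos[i] - target : target - head_pos[i];

		if (dist < best_dist) {
			best_dist = dist;
			best = i;
		}
	}
	return best;
}
```

With a workload whose reader PIDs all happen to be odd, pick_by_pid()
sends every read to copy 1, which is the bottleneck case mentioned
earlier in the thread; the other two policies do not depend on the PID.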

But whatever mdraid1 scheduling does, I was totally astonished at how 
efficient it was, and it really did turn my thinking on most efficient 
raid choices upside down.  So if btrfs could simply take that scheduler 
and modify it as necessary for btrfs specifics, provided the 
modifications weren't /too/ heavy (and the fact that btrfs does read-time 
checksum verification could very well mean the algorithm as directly 
adapted as possible may not reach anything like the same efficiency), I 
really do think that'd be the ideal.  And of course it's freedomware code 
in the same kernel, so reusing the mdraid read-scheduler shouldn't be the 
problem it might be in other circumstances, tho the possible caveat of 
btrfs specific implementation issues does remain.

And of course someone would have to take the time to adapt it to work
with btrfs, which gets us back to the practical side of things: the
"opportunity rich, developer-time poor" situation that is btrfs coding
reality, the risk of premature optimization, possibly doing it at the
same time as N-way-mirroring, etc.

But anyway, mdraid's raid1 read-scheduler really does seem to be 
impressively efficient, the benchmark to try to match, if possible.  If 
that can be done by reusing some of the same code, so much the better. 
=:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


Re: [PATCH] Btrfs-progs: Prevent creation of filesystem with 'mixed bgs' and having differing sectorsize and nodesize.

2015-10-26 Thread David Sterba

On Wed, Oct 14, 2015 at 11:10:38PM +0530, Chandan Rajendra wrote:
> mkfs.btrfs allows creation of Btrfs filesystem instances with mixed block
> group feature enabled and having a sectorsize different from nodesize.
> For e.g:
> 
> [root@localhost btrfs-progs]# mkfs.btrfs -f -M -s 4096 -n 16384  /dev/loop0
> Forcing mixed metadata/data groups
> btrfs-progs v3.19-rc2-404-gbbbd18e-dirty
> See http://btrfs.wiki.kernel.org for more information.
> 
> Performing full device TRIM (4.00GiB) ...
> Label:              (null)
> UUID:               c82b5720-6d88-4fa1-ac05-d0d4cb797fd5
> Node size:          16384
> Sector size:        4096
> Filesystem size:    4.00GiB
> Block group profiles:
>   Data+Metadata:    single            8.00MiB
>   System:           single            4.00MiB
> SSD detected:       no
> Incompat features:  mixed-bg, extref, skinny-metadata
> Number of devices:  1
> Devices:
>   ID        SIZE  PATH
>    1     4.00GiB  /dev/loop6
> 
> This commit fixes the issue by setting BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS
> feature bit before checking the validity of nodesize that was specified on the
> command line.
> 
> Signed-off-by: Chandan Rajendra 

Test added and applied, thanks.
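
The ordering problem described in the quoted commit message can be
sketched roughly as follows.  This is a standalone illustration, not the
btrfs-progs source: the sketch_* names, the simplified validity rules, and
the flag's bit value are all assumptions chosen for the example.  The
point is only that a nodesize check which depends on the mixed-bg feature
bit gives the wrong answer if it runs before that bit is set.

```c
#include <stdint.h>

/* Hypothetical stand-in for the mixed block groups incompat bit;
 * the bit value is chosen for illustration only. */
#define SKETCH_INCOMPAT_MIXED_GROUPS (1ULL << 2)

/* Hypothetical validity check: returns 0 if nodesize is acceptable,
 * -1 otherwise.  With mixed block groups, nodesize must equal
 * sectorsize. */
static int sketch_check_nodesize(uint32_t nodesize, uint32_t sectorsize,
				 uint64_t features)
{
	if (nodesize < sectorsize || nodesize % sectorsize != 0)
		return -1;
	if ((features & SKETCH_INCOMPAT_MIXED_GROUPS) &&
	    nodesize != sectorsize)
		return -1;
	return 0;
}

/* Problematic ordering: the check runs before the mixed option has set
 * the feature bit, so a mismatched nodesize slips through. */
static int sketch_mkfs_check_first(int mixed, uint32_t nodesize,
				   uint32_t sectorsize)
{
	uint64_t features = 0;
	int ret = sketch_check_nodesize(nodesize, sectorsize, features);

	if (mixed)
		features |= SKETCH_INCOMPAT_MIXED_GROUPS;
	(void)features;
	return ret;
}

/* Ordering described by the commit message: set the bit first, then
 * validate nodesize against the complete feature set. */
static int sketch_mkfs_flag_first(int mixed, uint32_t nodesize,
				  uint32_t sectorsize)
{
	uint64_t features = 0;

	if (mixed)
		features |= SKETCH_INCOMPAT_MIXED_GROUPS;
	return sketch_check_nodesize(nodesize, sectorsize, features);
}
```

With the values from the example above (mixed groups, sectorsize 4096,
nodesize 16384), the first ordering accepts the combination while the
second rejects it.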

Bad fs performance, IO freezes

2015-10-26 Thread cheater00 .

Hi guys,
I am running into really bad performance. Here's my setup:

WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
32-bit with kernel 4.0.4-040004-generic #201505171336.

Single btrfs partition covering whole disk.

Autodefrag is on.

fstab line:
UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0

Sometimes when files are being modified or removed, I see
btrfs-transacti eat 100% cpu; during this time no io operations
succeed, that is, they're all stalled. You can't even ls on that fs.
This happens for several minutes then normal operation resumes. There
doesn't seem to be a rule to what will trigger this, other than
opening a single file and reading usually works quite well. (say,
watching a movie while all other programs are closed). But even moving
files off the disks triggers some sort of bug. Just now I am moving a
few files (just 30gb worth) onto another disk, and the bug triggers.
btrfs-transacti was eating my cpu for over 5 minutes, and according to
mv's output once this was done and cpu usage was back to normal, what I
had been waiting for was the removal of a tiny png file. This is
pretty bad.

I have tried defragmenting directories where files are being accessed
and moved. This hasn't helped.

This happens whether the FS is near full or not. It currently is near
full, but it wasn't before and it still did that. It has about
100GB of free space now.

The more things are happening the more often this bug gets triggered.
So if I have utorrent running and its temporary downloads directory is
there, its download speed graph will be a few spikes of running at
several MB/sec separated by durations of 0 download speed.

Nothing seems to show up in dmesg or syslog.

I have asked in #btrfs but the suggestions ended up not fixing the
issue (autodefrag, defrag dirs).

Please advise what I should do with this issue.

Re: Bad fs performance, IO freezes

2015-10-26 Thread Donald Pearson

I get the same kind of muddy errors when I do a quota rescan on the
filesystem or qgroup show on any subvolume on mine, and I know I don't
have them enabled.

On Mon, Oct 26, 2015 at 8:56 AM, cheater00 .  wrote:
> fwiw, I did this:
>
> sudo btrfs qgroup show /media/X
> ERROR: can't perform the search - No such file or directory
> ERROR: can't list qgroups: No such file or directory
>
> I assume this means no qgroups present, which means no quotas present.
> Please correct me if I'm wrong.
> So yes, the issue must lie elsewhere.
>
> On Mon, Oct 26, 2015 at 2:46 PM, cheater00 .  wrote:
>> I don't remember doing that, but just to exclude everything, how do I check?
>>
>> On Mon, Oct 26, 2015 at 2:45 PM, Donald Pearson
>>  wrote:
>>> AFAIK quotas aren't a mount option, but if you never enabled them and
>>> created the qgroups by hand that's your answer and the issue must be
>>> something else.
>>>
>>> On Mon, Oct 26, 2015 at 8:36 AM, cheater00 .  wrote:
>>>> There are no quotas. I haven't enabled them. I believe the fstab says
>>>> that - could they be enabled in another way? How do I check for sure?
>>>> The man page doesn't say how to check the status:
>>>> https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-quota
>>>>
>>>> On Mon, Oct 26, 2015 at 2:32 PM, Donald Pearson
>>>>  wrote:
>>>>> Accidentally didn't reply to the list the 1st time.
>>>>>
>>>>> I see the same issue when I have quotas enabled.  If you have quotas
>>>>> on, see if turning them off helps.
>>>>>
>>>>> On Mon, Oct 26, 2015 at 7:16 AM, cheater00 .  wrote:
>>>>>> Hi guys,
>>>>>> I am running into really bad performance. Here's my setup:
>>>>>>
>>>>>> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
>>>>>> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>>>>>>
>>>>>> Single btrfs partition covering whole disk.
>>>>>>
>>>>>> Autodefrag is on.
>>>>>>
>>>>>> fstab line:
>>>>>> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>>>>>>
>>>>>> Sometimes when files are being modified or removed, I see
>>>>>> btrfs-transacti eat 100% cpu; during this time no io operations
>>>>>> succeed, that is, they're all stalled. You can't even ls on that fs.
>>>>>> This happens for several minutes then normal operation resumes. There
>>>>>> doesn't seem to be a rule to what will trigger this, other than
>>>>>> opening a single file and reading usually works quite well. (say,
>>>>>> watching a movie while all other programs are closed). But even moving
>>>>>> files off the disks triggers some sort of bug. Just now I am moving a
>>>>>> few files (just 30gb worth) onto another disk, and the bug triggers.
>>>>>> So btrfs-transacti was eating my cpu for over 5 minutes and according
>>>>>> to mv's output after this was done and cpu usage went back to normal
>>>>>> what I was waiting for was for a tiny png file to be removed. This is
>>>>>> pretty bad.
>>>>>>
>>>>>> I have tried defragmenting directories where files are being accessed
>>>>>> and moved. This hasn't helped.
>>>>>>
>>>>>> This happens whether the FS is near full or not. It currently is near
>>>>>> full but it wasn't before and it still did that. It still has about ~
>>>>>> 100GB free space now.
>>>>>>
>>>>>> The more things are happening the more often this bug gets triggered.
>>>>>> So if I have utorrent running and its temporary downloads directory is
>>>>>> there, its download speed graph will be a few spikes of running at
>>>>>> several MB/sec separated by durations of 0 download speed.
>>>>>>
>>>>>> Nothing seems to show up in dmesg or syslog.
>>>>>>
>>>>>> I have asked in #btrfs but the suggestions ended up not fixing the
>>>>>> issue (autodefrag, defrag dirs).
>>>>>>
>>>>>> Please advise what I should do with this issue.

Re: random i/o error without error in dmesg

2015-10-26 Thread Marc Joliet

Hi

FWIW, this sounds like what I've been seeing with dovecot.  In case it's 
relevant, I'll try to explain.

After some uptime, I'll see log messages like this:

Okt 26 12:05:46 thetick dovecot[467]: imap(marcec): Error: pread() failed with 
file /home/marcec/.mdbox/mailboxes/BTRFS/dbox-Mails/dovecot.index.log: 
Input/output error

Occasionally they go away by themselves, but usually I have to reboot to make 
them go away.  This happens when getmail attempts to fetch mail, which fails 
due to the above error.  After the reboot getmail succeeds again.

As in Szalma's case, btrfs-scrub never reports anything wrong.

I use LZO compression on the relevant file system, so I wanted to wait until 
kernel 4.1.11 before reporting this, but that hasn't hit Gentoo yet (and 
neither has 4.1.10, for some reason).  I don't use quotas.

According to what I see in the systemd journal, the errors started on
2015-06-01 with kernel 3.19.8.  Note that, strangely enough, I had been using
that same version since 2015-05-23, so for more than a week before the error
cropped up.  I checked whether I had made any changes to the configuration, and
found this:

diff --git a/kernels/kernel-config-3.19.8-gentoo b/kernels/kernel-config-3.19.8-gentoo
index b061b31..8cf8eba 100644
--- a/kernels/kernel-config-3.19.8-gentoo
+++ b/kernels/kernel-config-3.19.8-gentoo
@@ -64,7 +64,7 @@ CONFIG_INIT_ENV_ARG_LIMIT=32
 CONFIG_CROSS_COMPILE=""
 # CONFIG_COMPILE_TEST is not set
 CONFIG_LOCALVERSION=""
-CONFIG_LOCALVERSION_AUTO=y
+# CONFIG_LOCALVERSION_AUTO is not set
 CONFIG_HAVE_KERNEL_GZIP=y
 CONFIG_HAVE_KERNEL_BZIP2=y
 CONFIG_HAVE_KERNEL_LZMA=y
@@ -73,8 +73,8 @@ CONFIG_HAVE_KERNEL_LZO=y
 CONFIG_HAVE_KERNEL_LZ4=y
 # CONFIG_KERNEL_GZIP is not set
 # CONFIG_KERNEL_BZIP2 is not set
-CONFIG_KERNEL_LZMA=y
-# CONFIG_KERNEL_XZ is not set
+# CONFIG_KERNEL_LZMA is not set
+CONFIG_KERNEL_XZ=y
 # CONFIG_KERNEL_LZO is not set
 # CONFIG_KERNEL_LZ4 is not set
 CONFIG_DEFAULT_HOSTNAME="(none)"
@@ -132,7 +132,7 @@ CONFIG_TICK_CPU_ACCOUNTING=y
 # CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
 # CONFIG_IRQ_TIME_ACCOUNTING is not set
 CONFIG_BSD_PROCESS_ACCT=y
-# CONFIG_BSD_PROCESS_ACCT_V3 is not set
+CONFIG_BSD_PROCESS_ACCT_V3=y
 CONFIG_TASKSTATS=y
 CONFIG_TASK_DELAY_ACCT=y
 CONFIG_TASK_XACCT=y

The only change I can think of that might affect anything is 
CONFIG_BSD_PROCESS_ACCT_V3=y (I don't remember why exactly I set it).  I can 
try without it set, but maybe the kernel configuration is a red herring?

Anyway, the current state of the system is:

# uname -r 
4.1.9-gentoo-r1
# btrfs filesystem show / 
Label: 'MARCEC_ROOT'  uuid: 0267d8b3-a074-460a-832d-5d5fd36bae64
	Total devices 1 FS bytes used 74.40GiB
	devid    1 size 107.79GiB used 105.97GiB path /dev/sda1

btrfs-progs v4.2.2
# btrfs filesystem df /
Data, single: total=98.94GiB, used=72.30GiB
System, single: total=32.00MiB, used=20.00KiB
Metadata, single: total=7.00GiB, used=2.10GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

The filesystem is mounted as (leaving out subvolume mounts which use the same 
mount options):
/dev/sda1 on / type btrfs (rw,noatime,compress=lzo,ssd,discard,space_cache)

Greetings,
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup



Re: Bad fs performance, IO freezes

2015-10-26 Thread Liu Bo


On 10/26/2015 08:16 PM, cheater00 . wrote:

> Hi guys,
> I am running into really bad performance. Here's my setup:
>
> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>
> Single btrfs partition covering whole disk.
>
> Autodefrag is on.
>
> fstab line:
> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>
> Sometimes when files are being modified or removed, I see
> btrfs-transacti eat 100% cpu; during this time no io operations
> succeed, that is, they're all stalled. You can't even ls on that fs.
> This happens for several minutes then normal operation resumes. There
> doesn't seem to be a rule to what will trigger this, other than
> opening a single file and reading usually works quite well. (say,
> watching a movie while all other programs are closed). But even moving
> files off the disks triggers some sort of bug. Just now I am moving a
> few files (just 30gb worth) onto another disk, and the bug triggers.
> So btrfs-transacti was eating my cpu for over 5 minutes and according
> to mv's output after this was done and cpu usage went back to normal
> what I was waiting for was for a tiny png file to be removed. This is
> pretty bad.
>
> I have tried defragmenting directories where files are being accessed
> and moved. This hasn't helped.
>
> This happens whether the FS is near full or not. It currently is near
> full but it wasn't before and it still did that. It still has about ~
> 100GB free space now.
>
> The more things are happening the more often this bug gets triggered.
> So if I have utorrent running and its temporary downloads directory is
> there, its download speed graph will be a few spikes of running at
> several MB/sec separated by durations of 0 download speed.
>
> Nothing seems to show up in dmesg or syslog.
>
> I have asked in #btrfs but the suggestions ended up not fixing the
> issue (autodefrag, defrag dirs).
>
> Please advise what I should do with this issue.


It might be related to the delayed ref rework.  The last time I saw this
kind of hang, with btrfs-transaction eating CPU, it was because btrfs
didn't merge delayed refs.  It'd be best to try the latest kernel; if the
issue is not resolved, then we can work out a reproducer and provide
debugging.


Thanks,

Liubo

Re: Bad fs performance, IO freezes

2015-10-26 Thread cheater00 .

Thanks for the reply. What version did this go into? I'll try getting
a prebuilt backport of the kernel, since building from source could slow
things down considerably. Debs will probably not be available for the
latest few minor versions, so if you can tell me a minimum version, I'll
try to find the latest deb newer than that, or I'll build it myself if
that's not available.

On Mon, Oct 26, 2015 at 3:25 PM, Liu Bo  wrote:
> On 10/26/2015 08:16 PM, cheater00 . wrote:
>>
>> Hi guys,
>> I am running into really bad performance. Here's my setup:
>>
>> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
>> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>>
>> Single btrfs partition covering whole disk.
>>
>> Autodefrag is on.
>>
>> fstab line:
>> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>>
>> Sometimes when files are being modified or removed, I see
>> btrfs-transacti eat 100% cpu; during this time no io operations
>> succeed, that is, they're all stalled. You can't even ls on that fs.
>> This happens for several minutes then normal operation resumes. There
>> doesn't seem to be a rule to what will trigger this, other than
>> opening a single file and reading usually works quite well. (say,
>> watching a movie while all other programs are closed). But even moving
>> files off the disks triggers some sort of bug. Just now I am moving a
>> few files (just 30gb worth) onto another disk, and the bug triggers.
>> So btrfs-transacti was eating my cpu for over 5 minutes and according
>> to mv's output after this was done and cpu usage went back to normal
>> what I was waiting for was for a tiny png file to be removed. This is
>> pretty bad.
>>
>> I have tried defragmenting directories where files are being accessed
>> and moved. This hasn't helped.
>>
>> This happens whether the FS is near full or not. It currently is near
>> full but it wasn't before and it still did that. It still has about ~
>> 100GB free space now.
>>
>> The more things are happening the more often this bug gets triggered.
>> So if I have utorrent running and its temporary downloads directory is
>> there, its download speed graph will be a few spikes of running at
>> several MB/sec separated by durations of 0 download speed.
>>
>> Nothing seems to show up in dmesg or syslog.
>>
>> I have asked in #btrfs but the suggestions ended up not fixing the
>> issue (autodefrag, defrag dirs).
>>
>> Please advise what I should do with this issue.
>
>
> It might be related to delayed ref rework, the last time I saw this kind of
> hanging problem about btrfs-transaction eating cpu is that because btrfs
> doesn't merge delayed refs, it'd be best to try the lastest kernel and if
> the issue is not resolved, then we can work out a reproducer and provide
> debugging.
>
> Thanks,
>
> Liubo

Re: [PATCH v2 0/5] btrfs-progs: Add all missing close_ctree and btrfs_close_all_devices

2015-10-26 Thread David Sterba

On Mon, Oct 26, 2015 at 06:28:17PM +0800, Zhao Lei wrote:
> This patch add all missing close_ctree and btrfs_close_all_devices
> to several tools in btrfs progs, to avoid memory leak.
> 
> Changelog v1->v2:
>  Move btrfs_close_all_devices() from cmd-XXX into btrfs.c to make
>  code simple, and avoid similar problem in cmd-XXX in future.
> 
> Zhao Lei (5):
>   btrfs-progs: btrfs: Add missing btrfs_close_all_devices for btrfs
> command
>   btrfs-progs: Remove all btrfs_close_all_devices in sub-command
>   btrfs-progs: Add all missing btrfs_close_all_devices to standalone
> tools
>   btrfs-progs: Add missing close_ctree to btrfs-select-super.c
>   btrfs-progs: use system's default path for math.h

Applied, thanks.

Re: Bad fs performance, IO freezes

2015-10-26 Thread cheater00 .

I don't remember doing that, but just to exclude everything, how do I check?

On Mon, Oct 26, 2015 at 2:45 PM, Donald Pearson wrote:
> AFAIK quotas aren't a mount option, but if you never enabled them or
> created the qgroups by hand, that's your answer and the issue must be
> something else.
>
> On Mon, Oct 26, 2015 at 8:36 AM, cheater00 .  wrote:
>> There are no quotas. I haven't enabled them. I believe the fstab says
>> that - could they be enabled in another way? How do I check for sure?
>> The man page doesn't say how to check the status:
>> https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-quota
>>
>> On Mon, Oct 26, 2015 at 2:32 PM, Donald Pearson wrote:
>>> Accidentally didn't reply to the list the 1st time.
>>>
>>> I see the same issue when I have quotas enabled.  If you have quotas
>>> on, see if turning them off helps.
>>>
>>> On Mon, Oct 26, 2015 at 7:16 AM, cheater00 .  wrote:
>>>> Hi guys,
>>>> I am running into really bad performance. Here's my setup:
>>>>
>>>> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
>>>> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>>>>
>>>> Single btrfs partition covering whole disk.
>>>>
>>>> Autodefrag is on.
>>>>
>>>> fstab line:
>>>> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>>>>
>>>> Sometimes when files are being modified or removed, I see
>>>> btrfs-transacti eat 100% cpu; during this time no io operations
>>>> succeed, that is, they're all stalled. You can't even ls on that fs.
>>>> This happens for several minutes then normal operation resumes. There
>>>> doesn't seem to be a rule to what will trigger this, other than
>>>> opening a single file and reading usually works quite well. (say,
>>>> watching a movie while all other programs are closed). But even moving
>>>> files off the disks triggers some sort of bug. Just now I am moving a
>>>> few files (just 30gb worth) onto another disk, and the bug triggers.
>>>> So btrfs-transacti was eating my cpu for over 5 minutes and according
>>>> to mv's output after this was done and cpu usage went back to normal
>>>> what I was waiting for was for a tiny png file to be removed. This is
>>>> pretty bad.
>>>>
>>>> I have tried defragmenting directories where files are being accessed
>>>> and moved. This hasn't helped.
>>>>
>>>> This happens whether the FS is near full or not. It currently is near
>>>> full but it wasn't before and it still did that. It still has about ~
>>>> 100GB free space now.
>>>>
>>>> The more things are happening the more often this bug gets triggered.
>>>> So if I have utorrent running and its temporary downloads directory is
>>>> there, its download speed graph will be a few spikes of running at
>>>> several MB/sec separated by durations of 0 download speed.
>>>>
>>>> Nothing seems to show up in dmesg or syslog.
>>>>
>>>> I have asked in #btrfs but the suggestions ended up not fixing the
>>>> issue (autodefrag, defrag dirs).
>>>>
>>>> Please advise what I should do with this issue.

Re: Bad fs performance, IO freezes

2015-10-26 Thread Donald Pearson

Accidentally didn't reply to the list the 1st time.

I see the same issue when I have quotas enabled.  If you have quotas
on, see if turning them off helps.

On Mon, Oct 26, 2015 at 7:16 AM, cheater00 .  wrote:
> Hi guys,
> I am running into really bad performance. Here's my setup:
>
> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>
> Single btrfs partition covering whole disk.
>
> Autodefrag is on.
>
> fstab line:
> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>
> Sometimes when files are being modified or removed, I see
> btrfs-transacti eat 100% cpu; during this time no io operations
> succeed, that is, they're all stalled. You can't even ls on that fs.
> This happens for several minutes then normal operation resumes. There
> doesn't seem to be a rule to what will trigger this, other than
> opening a single file and reading usually works quite well. (say,
> watching a movie while all other programs are closed). But even moving
> files off the disks triggers some sort of bug. Just now I am moving a
> few files (just 30gb worth) onto another disk, and the bug triggers.
> So btrfs-transacti was eating my cpu for over 5 minutes and according
> to mv's output after this was done and cpu usage went back to normal
> what I was waiting for was for a tiny png file to be removed. This is
> pretty bad.
>
> I have tried defragmenting directories where files are being accessed
> and moved. This hasn't helped.
>
> This happens whether the FS is near full or not. It currently is near
> full but it wasn't before and it still did that. It still has about ~
> 100GB free space now.
>
> The more things are happening the more often this bug gets triggered.
> So if I have utorrent running and its temporary downloads directory is
> there, its download speed graph will be a few spikes of running at
> several MB/sec separated by durations of 0 download speed.
>
> Nothing seems to show up in dmesg or syslog.
>
> I have asked in #btrfs but the suggestions ended up not fixing the
> issue (autodefrag, defrag dirs).
>
> Please advise what I should do with this issue.

Re: Bad fs performance, IO freezes

2015-10-26 Thread cheater00 .

fwiw, I did this:

sudo btrfs qgroup show /media/X
ERROR: can't perform the search - No such file or directory
ERROR: can't list qgroups: No such file or directory

I assume this means no qgroups present, which means no quotas present.
Please correct me if I'm wrong.
So yes, the issue must lie elsewhere.
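
The check above could be scripted roughly as follows. This is only a
sketch: it assumes that the "can't list qgroups" error text shown above is
what btrfs-progs prints when no quota tree exists, which may differ
between versions. The helper names quotas_look_disabled and check_mount
are hypothetical.

```python
import subprocess


def quotas_look_disabled(stderr: str) -> bool:
    """Heuristic: treat the error text seen in this thread as 'no qgroups'.

    Assumption: other btrfs-progs versions may word this differently.
    """
    return "can't list qgroups" in stderr


def check_mount(path: str) -> bool:
    """Run 'btrfs qgroup show <path>' and apply the heuristic to its stderr."""
    proc = subprocess.run(
        ["btrfs", "qgroup", "show", path],
        capture_output=True,
        text=True,
    )
    return quotas_look_disabled(proc.stderr)
```

For example, check_mount("/media/X") would be run with the same
privileges as the sudo command above.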

On Mon, Oct 26, 2015 at 2:46 PM, cheater00 .  wrote:
> I don't remember doing that, but just to exclude everything, how do I check?
>
> On Mon, Oct 26, 2015 at 2:45 PM, Donald Pearson wrote:
>> AFAIK quotas aren't a mount option, but if you never enabled them or
>> created the qgroups by hand, that's your answer and the issue must be
>> something else.
>>
>> On Mon, Oct 26, 2015 at 8:36 AM, cheater00 .  wrote:
>>> There are no quotas. I haven't enabled them. I believe the fstab says
>>> that - could they be enabled in another way? How do I check for sure?
>>> The man page doesn't say how to check the status:
>>> https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-quota
>>>
>>> On Mon, Oct 26, 2015 at 2:32 PM, Donald Pearson wrote:
>>>> Accidentally didn't reply to the list the 1st time.
>>>>
>>>> I see the same issue when I have quotas enabled.  If you have quotas
>>>> on, see if turning them off helps.
>>>>
>>>> On Mon, Oct 26, 2015 at 7:16 AM, cheater00 .  wrote:
>>>>> Hi guys,
>>>>> I am running into really bad performance. Here's my setup:
>>>>>
>>>>> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
>>>>> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>>>>>
>>>>> Single btrfs partition covering whole disk.
>>>>>
>>>>> Autodefrag is on.
>>>>>
>>>>> fstab line:
>>>>> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>>>>>
>>>>> Sometimes when files are being modified or removed, I see
>>>>> btrfs-transacti eat 100% cpu; during this time no io operations
>>>>> succeed, that is, they're all stalled. You can't even ls on that fs.
>>>>> This happens for several minutes then normal operation resumes. There
>>>>> doesn't seem to be a rule to what will trigger this, other than
>>>>> opening a single file and reading usually works quite well. (say,
>>>>> watching a movie while all other programs are closed). But even moving
>>>>> files off the disks triggers some sort of bug. Just now I am moving a
>>>>> few files (just 30gb worth) onto another disk, and the bug triggers.
>>>>> So btrfs-transacti was eating my cpu for over 5 minutes and according
>>>>> to mv's output after this was done and cpu usage went back to normal
>>>>> what I was waiting for was for a tiny png file to be removed. This is
>>>>> pretty bad.
>>>>>
>>>>> I have tried defragmenting directories where files are being accessed
>>>>> and moved. This hasn't helped.
>>>>>
>>>>> This happens whether the FS is near full or not. It currently is near
>>>>> full but it wasn't before and it still did that. It still has about ~
>>>>> 100GB free space now.
>>>>>
>>>>> The more things are happening the more often this bug gets triggered.
>>>>> So if I have utorrent running and its temporary downloads directory is
>>>>> there, its download speed graph will be a few spikes of running at
>>>>> several MB/sec separated by durations of 0 download speed.
>>>>>
>>>>> Nothing seems to show up in dmesg or syslog.
>>>>>
>>>>> I have asked in #btrfs but the suggestions ended up not fixing the
>>>>> issue (autodefrag, defrag dirs).
>>>>>
>>>>> Please advise what I should do with this issue.

Re: [PATCH v7 5/4] copy_file_range.2: New page documenting copy_file_range()

2015-10-26 Thread Pádraig Brady

On 26/10/15 03:39, Christoph Hellwig wrote:
> On Sat, Oct 24, 2015 at 01:02:21PM +0100, Pádraig Brady wrote:
>> I'm a bit worried about the sparse expansion and default reflinking
>> which might preclude cp(1) from using this call in most cases, but I will
>> test and try to use it. coreutils has heuristics for determining if files
>> are remote, which we might use to restrict to that use case.
> 
> Can you explain why reflinking and hole expansion are an issue if done
> locally and not if done remotely?  I'd really like to make the call as
> usable as possible for everyone, but we really need clear semantics for
> that.

Fair point on local vs remote. I was just assuming that remote
copy offload would not do reflinking on the backend, or at
least wasn't an exposed option over the remote interface.

I get the impression that you think reflinking should be hidden
from the user, i.e. cp(1) should not have had the --reflink option
(for the last 6 years)?  I'm not convinced of that, and even so
I think lower level interfaces would benefit from finer grained options.
This would be especially useful since there is no general interface
to reflink at present. I was happy with the reflink control options,
thinking the extra control could allow cp to use this by default.

> Also note that Anna's current series allows for hole filling - any decent
> implementation should not do them, but that's really a quality of
> implementation and not an interface issue.

I think you're saying the default `cp --sparse=auto` operation
could rely on copy_file_range(...complete file...), while
cp --sparse={always,never} would have to iterate over the
file, punching or filling holes as appropriate. I thought
Anna indicated differently wrt splice filling holes by default.

TBH I'm not clear on the semantics of the current implementation,
so need to test the above in various cases.
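
As a rough user-space illustration of the "iterate over the file,
punching holes" case for --sparse=always, here is a minimal sketch. It is
not how coreutils implements cp and it does not call copy_file_range();
the function name copy_sparse_always and the 64 KiB block size are
hypothetical choices made for the example.

```python
import os

BLOCK = 64 * 1024  # hypothetical chunk size, chosen only for this sketch


def copy_sparse_always(src_path: str, dst_path: str, block: int = BLOCK) -> None:
    """Copy src to dst, skipping all-zero blocks so they can become holes."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(block)
            if not chunk:
                break
            if not any(chunk):
                # All bytes are zero: advance the offset instead of writing.
                dst.seek(len(chunk), os.SEEK_CUR)
            else:
                dst.write(chunk)
        # Extend the file to the final offset in case it ends with a hole.
        dst.truncate(dst.tell())
```

A --sparse=never style copy would instead write every chunk, zeros
included. Whether the skipped ranges end up as real holes depends on the
destination filesystem.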

thanks,
Pádraig.

Re: Bad fs performance, IO freezes

2015-10-26 Thread cheater00 .

There are no quotas. I haven't enabled them. I believe the fstab says
that - could they be enabled in another way? How do I check for sure?
The man page doesn't say how to check the status:
https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-quota

On Mon, Oct 26, 2015 at 2:32 PM, Donald Pearson wrote:
> Accidentally didn't reply to the list the 1st time.
>
> I see the same issue when I have quotas enabled.  If you have quotas
> on, see if turning them off helps.
>
> On Mon, Oct 26, 2015 at 7:16 AM, cheater00 .  wrote:
>> Hi guys,
>> I am running into really bad performance. Here's my setup:
>>
>> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
>> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>>
>> Single btrfs partition covering whole disk.
>>
>> Autodefrag is on.
>>
>> fstab line:
>> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>>
>> Sometimes when files are being modified or removed, I see
>> btrfs-transacti eat 100% cpu; during this time no io operations
>> succeed, that is, they're all stalled. You can't even ls on that fs.
>> This happens for several minutes then normal operation resumes. There
>> doesn't seem to be a rule to what will trigger this, other than
>> opening a single file and reading usually works quite well. (say,
>> watching a movie while all other programs are closed). But even moving
>> files off the disks triggers some sort of bug. Just now I am moving a
>> few files (just 30gb worth) onto another disk, and the bug triggers.
>> So btrfs-transacti was eating my cpu for over 5 minutes and according
>> to mv's output after this was done and cpu usage went back to normal
>> what I was waiting for was for a tiny png file to be removed. This is
>> pretty bad.
>>
>> I have tried defragmenting directories where files are being accessed
>> and moved. This hasn't helped.
>>
>> This happens whether the FS is near full or not. It currently is near
>> full but it wasn't before and it still did that. It still has about ~
>> 100GB free space now.
>>
>> The more things are happening the more often this bug gets triggered.
>> So if I have utorrent running and its temporary downloads directory is
>> there, its download speed graph will be a few spikes of running at
>> several MB/sec separated by durations of 0 download speed.
>>
>> Nothing seems to show up in dmesg or syslog.
>>
>> I have asked in #btrfs but the suggestions ended up not fixing the
>> issue (autodefrag, defrag dirs).
>>
>> Please advise what I should do with this issue.

Re: Bad fs performance, IO freezes

2015-10-26 Thread Donald Pearson

AFAIK quotas aren't a mount option, but if you never enabled them or
created the qgroups by hand, that's your answer and the issue must be
something else.

On Mon, Oct 26, 2015 at 8:36 AM, cheater00 .  wrote:
> There are no quotas. I haven't enabled them. I believe the fstab says
> that - could they be enabled in another way? How do I check for sure?
> The man page doesn't say how to check the status:
> https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-quota
>
> On Mon, Oct 26, 2015 at 2:32 PM, Donald Pearson wrote:
>> Accidentally didn't reply to the list the 1st time.
>>
>> I see the same issue when I have quotas enabled.  If you have quotas
>> on, see if turning them off helps.
>>
>> On Mon, Oct 26, 2015 at 7:16 AM, cheater00 .  wrote:
>>> Hi guys,
>>> I am running into really bad performance. Here's my setup:
>>>
>>> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
>>> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>>>
>>> Single btrfs partition covering whole disk.
>>>
>>> Autodefrag is on.
>>>
>>> fstab line:
>>> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>>>
>>> Sometimes when files are being modified or removed, I see
>>> btrfs-transacti eat 100% cpu; during this time no io operations
>>> succeed, that is, they're all stalled. You can't even ls on that fs.
>>> This happens for several minutes then normal operation resumes. There
>>> doesn't seem to be a rule to what will trigger this, other than
>>> opening a single file and reading usually works quite well. (say,
>>> watching a movie while all other programs are closed). But even moving
>>> files off the disks triggers some sort of bug. Just now I am moving a
>>> few files (just 30gb worth) onto another disk, and the bug triggers.
>>> So btrfs-transacti was eating my cpu for over 5 minutes and according
>>> to mv's output after this was done and cpu usage went back to normal
>>> what I was waiting for was for a tiny png file to be removed. This is
>>> pretty bad.
>>>
>>> I have tried defragmenting directories where files are being accessed
>>> and moved. This hasn't helped.
>>>
>>> This happens whether the FS is near full or not. It currently is near
>>> full but it wasn't before and it still did that. It still has about ~
>>> 100GB free space now.
>>>
>>> The more things are happening the more often this bug gets triggered.
>>> So if I have utorrent running and its temporary downloads directory is
>>> there, its download speed graph will be a few spikes of running at
>>> several MB/sec separated by durations of 0 download speed.
>>>
>>> Nothing seems to show up in dmesg or syslog.
>>>
>>> I have asked in #btrfs but the suggestions ended up not fixing the
>>> issue (autodefrag, defrag dirs).
>>>
>>> Please advise what I should do with this issue.

Re: Bad fs performance, IO freezes

2015-10-26 Thread cheater00 .

I have located 4.3.0-rc7 binaries which I will now try.

On Mon, Oct 26, 2015 at 3:38 PM, cheater00 .  wrote:
> Thanks for the reply. What version did this go into? I'll try getting
> a prebuilt backport of the kernel, building source could slow things
> down considerably, but debs will not be available for the latest few
> minor versions I guess. So if you can tell me a min version, I'll try
> to find the latest deb newer than that, or I'll build if that's not
> available.
>
> On Mon, Oct 26, 2015 at 3:25 PM, Liu Bo  wrote:
>> On 10/26/2015 08:16 PM, cheater00 . wrote:
>>>
>>> Hi guys,
>>> I am running into really bad performance. Here's my setup:
>>>
>>> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
>>> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>>>
>>> Single btrfs partition covering whole disk.
>>>
>>> Autodefrag is on.
>>>
>>> fstab line:
>>> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>>>
>>> Sometimes when files are being modified or removed, I see
>>> btrfs-transacti eat 100% cpu; during this time no io operations
>>> succeed, that is, they're all stalled. You can't even ls on that fs.
>>> This happens for several minutes then normal operation resumes. There
>>> doesn't seem to be a rule to what will trigger this, other than
>>> opening a single file and reading usually works quite well. (say,
>>> watching a movie while all other programs are closed). But even moving
>>> files off the disks triggers some sort of bug. Just now I am moving a
>>> few files (just 30gb worth) onto another disk, and the bug triggers.
>>> So btrfs-transacti was eating my cpu for over 5 minutes and according
>>> to mv's output after this was done and cpu usage went back to normal
>>> what I was waiting for was for a tiny png file to be removed. This is
>>> pretty bad.
>>>
>>> I have tried defragmenting directories where files are being accessed
>>> and moved. This hasn't helped.
>>>
>>> This happens whether the FS is near full or not. It currently is near
>>> full but it wasn't before and it still did that. It still has about ~
>>> 100GB free space now.
>>>
>>> The more things are happening the more often this bug gets triggered.
>>> So if I have utorrent running and its temporary downloads directory is
>>> there, its download speed graph will be a few spikes of running at
>>> several MB/sec separated by durations of 0 download speed.
>>>
>>> Nothing seems to show up in dmesg or syslog.
>>>
>>> I have asked in #btrfs but the suggestions ended up not fixing the
>>> issue (autodefrag, defrag dirs).
>>>
>>> Please advise what I should do with this issue.
>>
>>
>> It might be related to the delayed ref rework. The last time I saw this
>> kind of hang, with btrfs-transaction eating CPU, it was because btrfs
>> didn't merge delayed refs. It'd be best to try the latest kernel; if the
>> issue is not resolved, we can work out a reproducer and provide
>> debugging.
>>
>> Thanks,
>>
>> Liubo

Re: Bad fs performance, IO freezes

2015-10-26 Thread cheater00 .

So far I cannot reproduce. If I don't post again this means the issue
has been fixed by updating the kernel.

On Mon, Oct 26, 2015 at 4:40 PM, cheater00 .  wrote:
> I have located 4.3.0-rc7 binaries which I will now try.
>
> On Mon, Oct 26, 2015 at 3:38 PM, cheater00 .  wrote:
>> Thanks for the reply. What version did this go into? I'll try getting
>> a prebuilt backport of the kernel, building source could slow things
>> down considerably, but debs will not be available for the latest few
>> minor versions I guess. So if you can tell me a min version, I'll try
>> to find the latest deb newer than that, or I'll build if that's not
>> available.
>>
>> On Mon, Oct 26, 2015 at 3:25 PM, Liu Bo  wrote:
>>> On 10/26/2015 08:16 PM, cheater00 . wrote:

>>>> Hi guys,
>>>> I am running into really bad performance. Here's my setup:
>>>>
>>>> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
>>>> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>>>>
>>>> Single btrfs partition covering whole disk.
>>>>
>>>> Autodefrag is on.
>>>>
>>>> fstab line:
>>>> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>>>>
>>>> Sometimes when files are being modified or removed, I see
>>>> btrfs-transacti eat 100% cpu; during this time no io operations
>>>> succeed, that is, they're all stalled. You can't even ls on that fs.
>>>> This happens for several minutes then normal operation resumes. There
>>>> doesn't seem to be a rule to what will trigger this, other than
>>>> opening a single file and reading usually works quite well. (say,
>>>> watching a movie while all other programs are closed). But even moving
>>>> files off the disks triggers some sort of bug. Just now I am moving a
>>>> few files (just 30gb worth) onto another disk, and the bug triggers.
>>>> So btrfs-transacti was eating my cpu for over 5 minutes and according
>>>> to mv's output after this was done and cpu usage went back to normal
>>>> what I was waiting for was for a tiny png file to be removed. This is
>>>> pretty bad.
>>>>
>>>> I have tried defragmenting directories where files are being accessed
>>>> and moved. This hasn't helped.
>>>>
>>>> This happens whether the FS is near full or not. It currently is near
>>>> full but it wasn't before and it still did that. It still has about ~
>>>> 100GB free space now.
>>>>
>>>> The more things are happening the more often this bug gets triggered.
>>>> So if I have utorrent running and its temporary downloads directory is
>>>> there, its download speed graph will be a few spikes of running at
>>>> several MB/sec separated by durations of 0 download speed.
>>>>
>>>> Nothing seems to show up in dmesg or syslog.
>>>>
>>>> I have asked in #btrfs but the suggestions ended up not fixing the
>>>> issue (autodefrag, defrag dirs).
>>>>
>>>> Please advise what I should do with this issue.
>>>
>>>
>>> It might be related to the delayed ref rework. The last time I saw this
>>> kind of hang, with btrfs-transaction eating CPU, it was because btrfs
>>> didn't merge delayed refs. It'd be best to try the latest kernel; if the
>>> issue is not resolved, we can work out a reproducer and provide
>>> debugging.
>>>
>>> Thanks,
>>>
>>> Liubo

Re: Bad fs performance, IO freezes

2015-10-26 Thread cheater00 .

I no longer see btrfs-transacti going up to 100% for minutes at a
time (not reproduced yet), but it does spike to around 30% for a
short while, and everything lags during that time. So, say, if I am
watching youtube, the sound cuts out and the video drops out for a
bit. And if I'm typing, then what I typed during that time gets lost,
as if I had never typed it.

I have also connected the same HDD bay with a USB3 cable instead of
USB2. It's on a USB3 port, so it's running via USB3 now.


On Mon, Oct 26, 2015 at 6:43 PM, cheater00 .  wrote:
> So far I cannot reproduce. If I don't post again this means the issue
> has been fixed by updating the kernel.
>
> On Mon, Oct 26, 2015 at 4:40 PM, cheater00 .  wrote:
>> I have located 4.3.0-rc7 binaries which I will now try.
>>
>> On Mon, Oct 26, 2015 at 3:38 PM, cheater00 .  wrote:
>>> Thanks for the reply. What version did this go into? I'll try getting
>>> a prebuilt backport of the kernel, building source could slow things
>>> down considerably, but debs will not be available for the latest few
>>> minor versions I guess. So if you can tell me a min version, I'll try
>>> to find the latest deb newer than that, or I'll build if that's not
>>> available.
>>>
>>> On Mon, Oct 26, 2015 at 3:25 PM, Liu Bo  wrote:
>>>> On 10/26/2015 08:16 PM, cheater00 . wrote:
>>>>>
>>>>> Hi guys,
>>>>> I am running into really bad performance. Here's my setup:
>>>>>
>>>>> WD Red 6 TB connected over USB2 to my core i7 laptop, running Ubuntu
>>>>> 32-bit with kernel 4.0.4-040004-generic #201505171336.
>>>>>
>>>>> Single btrfs partition covering whole disk.
>>>>>
>>>>> Autodefrag is on.
>>>>>
>>>>> fstab line:
>>>>> UUID=... /media/X btrfs rw,nosuid,nodev,autodefrag 0 0
>>>>>
>>>>> Sometimes when files are being modified or removed, I see
>>>>> btrfs-transacti eat 100% cpu; during this time no io operations
>>>>> succeed, that is, they're all stalled. You can't even ls on that fs.
>>>>> This happens for several minutes then normal operation resumes. There
>>>>> doesn't seem to be a rule to what will trigger this, other than
>>>>> opening a single file and reading usually works quite well. (say,
>>>>> watching a movie while all other programs are closed). But even moving
>>>>> files off the disks triggers some sort of bug. Just now I am moving a
>>>>> few files (just 30gb worth) onto another disk, and the bug triggers.
>>>>> So btrfs-transacti was eating my cpu for over 5 minutes and according
>>>>> to mv's output after this was done and cpu usage went back to normal
>>>>> what I was waiting for was for a tiny png file to be removed. This is
>>>>> pretty bad.
>>>>>
>>>>> I have tried defragmenting directories where files are being accessed
>>>>> and moved. This hasn't helped.
>>>>>
>>>>> This happens whether the FS is near full or not. It currently is near
>>>>> full but it wasn't before and it still did that. It still has about ~
>>>>> 100GB free space now.
>>>>>
>>>>> The more things are happening the more often this bug gets triggered.
>>>>> So if I have utorrent running and its temporary downloads directory is
>>>>> there, its download speed graph will be a few spikes of running at
>>>>> several MB/sec separated by durations of 0 download speed.
>>>>>
>>>>> Nothing seems to show up in dmesg or syslog.
>>>>>
>>>>> I have asked in #btrfs but the suggestions ended up not fixing the
>>>>> issue (autodefrag, defrag dirs).
>>>>>
>>>>> Please advise what I should do with this issue.


>>>> It might be related to the delayed ref rework. The last time I saw this
>>>> kind of hang, with btrfs-transaction eating CPU, it was because btrfs
>>>> didn't merge delayed refs. It'd be best to try the latest kernel; if the
>>>> issue is not resolved, we can work out a reproducer and provide
>>>> debugging.

>>>> Thanks,
>>>>
>>>> Liubo
