Re: [PATCH RFC v3 1/5] Revert "btrfs: add support for processing pending changes" related commits

2015-01-25 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH RFC v3 1/5] Revert "btrfs: add support for 
processing pending changes" related commits

From: Qu Wenruo 
To: dste...@suse.cz, linux-btrfs@vger.kernel.org, miao...@huawei.com
Date: 2015-01-26 08:37


 Original Message 
Subject: Re: [PATCH RFC v3 1/5] Revert "btrfs: add support for 
processing pending changes" related commits

From: David Sterba 
To: Qu Wenruo 
Date: 2015-01-23 22:57

On Fri, Jan 23, 2015 at 05:31:41PM +0800, Qu Wenruo wrote:

For mount option changes, later patches will introduce a copy-n-update
method and rwsem protection to keep mount options consistent during a
transaction.

That's a better approach, for the mount options.

I'm glad that you like this method.
The description in this patch is outdated, though: it is now a
per-transaction mount option.

Sorry for the confusion.



For sysfs interface to change label/features, it will keep the same
behavior as 'btrfs pro set', so pending changes are also not needed.

This still leaves the transaction commit inside the sysfs handler; one of
the points was not to do that.

The callstack looks safe from, e.g., the label handler:

[169148.523158] WARNING: CPU: 1 PID: 2044 at fs/btrfs/sysfs.c:394 btrfs_label_store+0x135/0x190 [btrfs]()
[169148.533925] Modules linked in: btrfs dm_flakey rpcsec_gss_krb5 loop [last unloaded: btrfs]
[169148.536950] CPU: 1 PID: 2044 Comm: bash Tainted: G        W      3.19.0-rc5-default+ #211
[169148.536952] Hardware name: Intel Corporation Santa Rosa platform/Matanzas, BIOS TSRSCRB1.86C.0047.B00.0610170821 10/17/06
[169148.536954]  018a 88007a753dc8 81a9898b 018a
[169148.536963]   88007a753e08 81077f65 880077fb0100
[169148.536972]  880075dc 880077fbff00 0009 880075dc06d0
[169148.536980] Call Trace:
[169148.536983]  [] dump_stack+0x4f/0x6c
[169148.536991]  [] warn_slowpath_common+0x95/0xe0
[169148.537000]  [] warn_slowpath_null+0x1a/0x20
[169148.537005]  [] btrfs_label_store+0x135/0x190 [btrfs]
[169148.537030]  [] kobj_attr_store+0x17/0x20
[169148.537037]  [] sysfs_kf_write+0x4f/0x70
[169148.537044]  [] kernfs_fop_write+0x128/0x180
[169148.537051]  [] vfs_write+0xd4/0x1d0
[169148.537059]  [] SyS_write+0x59/0xd0
[169148.537070]  [] system_call_fastpath+0x12/0x17

Lockdep shows these locks held:

[169148.537296] 4 locks held by bash/2044:
[169148.537309]  #0:  (sb_writers#5){.+.+.+}, at: [] vfs_write+0x1b0/0x1d0
[169148.537319]  #1:  (&of->mutex){+.+.+.}, at: [] kernfs_fop_write+0x8e/0x180
[169148.537330]  #2:  (s_active#214){.+.+.+}, at: [] kernfs_fop_write+0x96/0x180
[169148.537342]  #3:  (tasklist_lock){.+.+..}, at: [] debug_show_all_locks+0x44/0x1e0


#3 is from lockdep
#2 is not really a lock, annotated vfs atomic counter
#0 is annotated atomic, the freezing barrier

#1 is a kernfs mutex that, afaics it's per file, but I don't like to see
the lock dependency here. That's a lock we can see now, but it's outside
of btrfs or the vfs. It's a matter of precaution.

Thanks for pointing out the problem.
It makes sense to delay it.

But we have btrfs-workqueue, so why not put it on the "worker" workqueue?

With this method, we can just wrap btrfs_ioctl_set_fslabel() and
queue it to fs_info->workers.
This avoids the lockdep problem, but the behavior is still
inconsistent with the synchronous ioctl method.
Although not perfect, it should be good enough and still clean enough.

Wait a second, #1 is a mutex, so I don't quite understand the problem.
Do we want to avoid it just because it is not a btrfs/vfs mutex?
That doesn't seem convincing enough to me...

For the readonly/freeze check, I prefer to extract the vfsmount from
sb->s_mounts and use mnt_want_write() (to handle ro) and a transaction
(to handle freeze).
So IMHO it just needs some small tweaks to the original implementation.

Thanks,
Qu


What do you think about this method?

Thanks,
Qu


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: Make csum tree rebuild works with extent tree rebuild.

2015-01-25 Thread Qu Wenruo
Before this patch, csum tree rebuild did not work together with extent
tree rebuild, since extent tree rebuild only builds up basic block
groups, while csum tree rebuild needs data extents to work from.
So if one uses btrfsck with both --init-csum-tree and --init-extent-tree,
the csum tree will be empty and tons of "missing csum" errors will be
output.

This patch allows csum tree rebuild to get its data from the fs/subvol
trees using regular file extents (which are also the only users of the
csum tree currently).

Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 158 +--
 1 file changed, 155 insertions(+), 3 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index 45d3468..bafa743 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -8534,8 +8534,141 @@ static int populate_csum(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
-static int fill_csum_tree(struct btrfs_trans_handle *trans,
-			  struct btrfs_root *csum_root)
+static int fill_csum_tree_from_one_fs(struct btrfs_trans_handle *trans,
+				      struct btrfs_root *csum_root,
+				      struct btrfs_root *cur_root)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct extent_buffer *node;
+	struct btrfs_file_extent_item *fi;
+	char *buf = NULL;
+	u64 start = 0;
+	u64 len = 0;
+	int slot = 0;
+	int ret = 0;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+	buf = malloc(cur_root->fs_info->csum_root->sectorsize);
+	if (!buf) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	key.objectid = 0;
+	key.offset = 0;
+	key.type = 0;
+
+	ret = btrfs_search_slot(NULL, cur_root, &key, path, 0, 0);
+	if (ret < 0)
+		goto out;
+	/* Iterate all regular file extents and fill its csum */
+	while (1) {
+		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+
+		if (key.type != BTRFS_EXTENT_DATA_KEY)
+			goto next;
+		node = path->nodes[0];
+		slot = path->slots[0];
+		fi = btrfs_item_ptr(node, slot, struct btrfs_file_extent_item);
+		if (btrfs_file_extent_type(node, fi) != BTRFS_FILE_EXTENT_REG)
+			goto next;
+		start = btrfs_file_extent_disk_bytenr(node, fi);
+		len = btrfs_file_extent_disk_num_bytes(node, fi);
+
+		ret = populate_csum(trans, csum_root, buf, start, len);
+		if (ret == -EEXIST)
+			ret = 0;
+		if (ret < 0)
+			goto out;
+next:
+		/*
+		 * TODO: if next leaf is corrupted, jump to nearest next valid
+		 * leaf.
+		 */
+		ret = btrfs_next_item(cur_root, path);
+		if (ret < 0)
+			goto out;
+		if (ret > 0) {
+			ret = 0;
+			goto out;
+		}
+	}
+
+out:
+	btrfs_free_path(path);
+	free(buf);
+	return ret;
+}
+
+static int fill_csum_tree_from_fs(struct btrfs_trans_handle *trans,
+				  struct btrfs_root *csum_root)
+{
+	struct btrfs_fs_info *fs_info = csum_root->fs_info;
+	struct btrfs_path *path;
+	struct btrfs_root *tree_root = fs_info->tree_root;
+	struct btrfs_root *cur_root;
+	struct extent_buffer *node;
+	struct btrfs_key key;
+	int slot = 0;
+	int ret = 0;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	key.objectid = BTRFS_FS_TREE_OBJECTID;
+	key.offset = 0;
+	key.type = BTRFS_ROOT_ITEM_KEY;
+
+	ret = btrfs_search_slot(NULL, tree_root, &key, path, 0, 0);
+	if (ret < 0)
+		goto out;
+	if (ret > 0) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	while (1) {
+		node = path->nodes[0];
+		slot = path->slots[0];
+		btrfs_item_key_to_cpu(node, &key, slot);
+		if (key.objectid > BTRFS_LAST_FREE_OBJECTID)
+			goto out;
+		if (key.type != BTRFS_ROOT_ITEM_KEY)
+			goto next;
+		if (!is_fstree(key.objectid))
+			goto next;
+		key.offset = (u64)-1;
+
+		cur_root = btrfs_read_fs_root(fs_info, &key);
+		if (IS_ERR(cur_root) || !cur_root) {
+			fprintf(stderr, "Fail to read fs/subvol tree: %lld\n",
+				key.objectid);
+			goto out;
+		}
+		ret = fill_csum_tree_from_one_fs(trans, csum_root, cur_root);
+		if (ret < 0)
+			goto out;
+next:
+		ret = btrfs_nex

Re: 3.19-rc5: Bug 91911: [REGRESSION] rm command hangs big time with deleting a lot of files at once

2015-01-25 Thread Zygo Blaxell
On Fri, Jan 23, 2015 at 02:38:09PM +, Holger Hoffstätte wrote:
> On Fri, 23 Jan 2015 15:01:28 +0100, Martin Steigerwald wrote:
> 
> > Hi!
> > 
> > Anyone seen this?
> > 
> > Reported as:
> > 
> > https://bugzilla.kernel.org/show_bug.cgi?id=91911
> 
> You might be interested in:
> 
> https://git.kernel.org/cgit/linux/kernel/git/josef/btrfs-next.git/commit/?h=evict-softlockup&id=29249e14d6e3379a5c4bb098dd4beddfefbc606f
> 
> and
> 
> https://git.kernel.org/cgit/linux/kernel/git/josef/btrfs-next.git/commit/?h=evict-softlockup&id=e4a58b71ff981b098ac3371f4d573dc6a90006ce
>
> I'm sure everyone would love to hear how this works out for you ;-)

I merged both commits and I've been running with them since Friday.
Several softlockups since then, in unlinkat() and renameat2().
Some typical stacks:

[] ? free_extent_state.part.29+0x34/0xb0
[] ? free_extent_state+0x25/0x30
[] ? __set_extent_bit+0x3aa/0x4f0
[] ? _raw_spin_unlock_irqrestore+0x32/0x70
[] ? get_parent_ip+0x11/0x50
[] schedule+0x29/0x70
[] lock_extent_bits+0x1b0/0x200
[] ? add_wait_queue+0x60/0x60
[] btrfs_evict_inode+0x139/0x550
[] evict+0xb8/0x190
[] iput+0x105/0x1a0
[] do_unlinkat+0x189/0x2d0
[] ? SyS_newlstat+0x2a/0x40
[] ? trace_hardirqs_on_thunk+0x3a/0x3c
[] SyS_unlink+0x16/0x20
[] system_call_fastpath+0x1a/0x1f

Note that the above stack is _very_ typical.  I've caught machines
with well over 100 processes stuck in "D" state with an identical stack
trace from "btrfs_evict_inode" to "system_call_fastpath".

[] lock_extent_bits+0x1b0/0x200
[] btrfs_evict_inode+0x12a/0x540
[] evict+0xb8/0x190
[] iput+0x105/0x1a0
[] __dentry_kill+0x190/0x200
[] dput+0xba/0x190
[] SyS_renameat2+0x510/0x580
[] SyS_rename+0x1e/0x20
[] system_call_fastpath+0x16/0x1b
[] 0x


The above is a typical renameat2() softlockup stack.

[] wait_on_page_bit+0xb8/0xc0
[] shrink_page_list+0x8c4/0xb20
[] shrink_inactive_list+0x19d/0x500
[] shrink_lruvec+0x59d/0x760
[] shrink_zone+0x83/0x1c0
[] do_try_to_free_pages+0x16e/0x460
[] try_to_free_mem_cgroup_pages+0x9e/0x180
[] mem_cgroup_reclaim+0x4e/0xe0
[] try_charge+0x15d/0x500
[] mem_cgroup_try_charge+0x8d/0x1a0
[] __add_to_page_cache_locked+0x8f/0x280
[] add_to_page_cache_lru+0x28/0x80
[] pagecache_get_page+0xab/0x1d0
[] alloc_extent_buffer+0xe4/0x380 [btrfs]
[] btrfs_find_create_tree_block+0x1f/0x30 [btrfs]
[] readahead_tree_block+0x1f/0x60 [btrfs]
[] reada_for_balance+0x160/0x1e0 [btrfs]
[] btrfs_search_slot+0x687/0xac0 [btrfs]
[] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
[] __btrfs_update_delayed_inode+0x65/0x210 [btrfs]
[] btrfs_commit_inode_delayed_inode+0x13a/0x150 [btrfs]
[] btrfs_evict_inode+0x2ca/0x520 [btrfs]
[] evict+0xb8/0x190
[] iput+0x105/0x1a0
[] __dentry_kill+0x1b8/0x210
[] dput+0xba/0x190
[] SyS_renameat2+0x440/0x530
[] SyS_rename+0x1e/0x20
[] system_call_fastpath+0x1a/0x1f
[] 0x

The last one is a little older (from 3.17.4) but it's a bit more
interesting.  Since mem cgroups were involved, I allocated a lot more
RAM to the cgroup and it seems to have helped reduce the frequency of
this bug occurring.


> 
> -h
> 




Resolved...ish. was: Re: spurious I/O errors from btrfs...at the caching layer?

2015-01-25 Thread Zygo Blaxell
It seems that the rate of spurious I/O errors varies most according to
the vm.vfs_cache_pressure sysctl.  At '10' the I/O errors occur so often
that building a kernel is impossible.  At '100' I can't reproduce even
a single I/O error.

I guess this is my own fault for using non-default sysctl parameters,
although I wouldn't expect any value of this sysctl to cause these
symptoms... :-P
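
For anyone wanting to check their own setting, a minimal sketch follows.
It assumes the usual procfs path /proc/sys/vm/vfs_cache_pressure and the
kernel default of 100; the function name and the optional path argument
are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether vm.vfs_cache_pressure differs from
# the kernel default of 100. The optional argument overrides the procfs
# path, so the function can be tried without touching the live sysctl.
check_cache_pressure() {
    file="${1:-/proc/sys/vm/vfs_cache_pressure}"
    value=$(cat "$file") || return 2
    if [ "$value" -eq 100 ]; then
        echo "vfs_cache_pressure=$value (default)"
    else
        echo "vfs_cache_pressure=$value (non-default)"
        return 1
    fi
}
```

Running 'sysctl -w vm.vfs_cache_pressure=100' as root should put the
default back.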


On Sun, Jan 25, 2015 at 11:50:36AM -0500, Zygo Blaxell wrote:
> On Sat, Jan 24, 2015 at 01:06:01PM -0500, Zygo Blaxell wrote:
> > I am seeing a lot of spurious I/O errors that look like they come from
> > the cache-facing side of btrfs.  While running a heavy load with some
> > extent-sharing (e.g. building 20 Linux kernels at once from source trees
> > copied with 'cp -a --reflink=always'), some files will return spurious
> > EIO on read.  It happens often enough to prevent a Linux kernel build
> > about 1/3 of the time.
> [...]
> > Observed from 3.17..3.18.3.  All filesystems affected use skinny-metadata.
> > No filesystems that are not using skinny-metadata seem to have this
> > problem.
> 
> I ran a test overnight using 3.18.3 on a freshly formatted filesystem with
> no skinny-metadata.
> 
> The test consisted of creating reflink copies of a Linux kernel source
> tree and running kernel builds in each copy simultaneously, like this:
> 
>   # assume you have a ready-to-build kernel tree in 'linux'
>   for x in $(seq 1 5); do
>   cp -a --reflink linux linux-$x
>   done
> 
>   # build all the kernels at once
>   for x in $(seq 1 5); do
>   (cd linux-$x && make -j10 2>&1 | tee make.log) &
>   done
> 
>   wait
>   # then tail all the make.logs and see how many failed due to
>   # I/O errors
> 
> Spurious I/O errors occurred with as few as two concurrent kernel builds.
> 
> The test machine has 16GB of RAM and the filesystem is also 16GB,
> RAID1 on two spinning disks.
> 






Re: [PATCH RFC v3 1/5] Revert "btrfs: add support for processing pending changes" related commits

2015-01-25 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH RFC v3 1/5] Revert "btrfs: add support for 
processing pending changes" related commits

From: David Sterba 
To: Qu Wenruo 
Date: 2015-01-23 22:57

On Fri, Jan 23, 2015 at 05:31:41PM +0800, Qu Wenruo wrote:

For mount option changes, later patches will introduce a copy-n-update
method and rwsem protection to keep mount options consistent during a
transaction.

That's a better approach, for the mount options.

I'm glad that you like this method.
The description in this patch is outdated, though: it is now a
per-transaction mount option.

Sorry for the confusion.



For sysfs interface to change label/features, it will keep the same
behavior as 'btrfs pro set', so pending changes are also not needed.

This still leaves the transaction commit inside the sysfs handler; one of
the points was not to do that.

The callstack looks safe from, e.g., the label handler:

[169148.523158] WARNING: CPU: 1 PID: 2044 at fs/btrfs/sysfs.c:394 btrfs_label_store+0x135/0x190 [btrfs]()
[169148.533925] Modules linked in: btrfs dm_flakey rpcsec_gss_krb5 loop [last unloaded: btrfs]
[169148.536950] CPU: 1 PID: 2044 Comm: bash Tainted: G        W      3.19.0-rc5-default+ #211
[169148.536952] Hardware name: Intel Corporation Santa Rosa platform/Matanzas, BIOS TSRSCRB1.86C.0047.B00.0610170821 10/17/06
[169148.536954]  018a 88007a753dc8 81a9898b 018a
[169148.536963]   88007a753e08 81077f65 880077fb0100
[169148.536972]  880075dc 880077fbff00 0009 880075dc06d0
[169148.536980] Call Trace:
[169148.536983]  [] dump_stack+0x4f/0x6c
[169148.536991]  [] warn_slowpath_common+0x95/0xe0
[169148.537000]  [] warn_slowpath_null+0x1a/0x20
[169148.537005]  [] btrfs_label_store+0x135/0x190 [btrfs]
[169148.537030]  [] kobj_attr_store+0x17/0x20
[169148.537037]  [] sysfs_kf_write+0x4f/0x70
[169148.537044]  [] kernfs_fop_write+0x128/0x180
[169148.537051]  [] vfs_write+0xd4/0x1d0
[169148.537059]  [] SyS_write+0x59/0xd0
[169148.537070]  [] system_call_fastpath+0x12/0x17

Lockdep shows these locks held:

[169148.537296] 4 locks held by bash/2044:
[169148.537309]  #0:  (sb_writers#5){.+.+.+}, at: [] vfs_write+0x1b0/0x1d0
[169148.537319]  #1:  (&of->mutex){+.+.+.}, at: [] kernfs_fop_write+0x8e/0x180
[169148.537330]  #2:  (s_active#214){.+.+.+}, at: [] kernfs_fop_write+0x96/0x180
[169148.537342]  #3:  (tasklist_lock){.+.+..}, at: [] debug_show_all_locks+0x44/0x1e0

#3 is from lockdep
#2 is not really a lock, annotated vfs atomic counter
#0 is annotated atomic, the freezing barrier

#1 is a kernfs mutex that, afaics it's per file, but I don't like to see
the lock dependency here. That's a lock we can see now, but it's outside
of btrfs or the vfs. It's a matter of precaution.

Thanks for pointing out the problem.
It makes sense to delay it.

But we have btrfs-workqueue, so why not put it on the "worker" workqueue?

With this method, we can just wrap btrfs_ioctl_set_fslabel() and
queue it to fs_info->workers.
This avoids the lockdep problem, but the behavior is still
inconsistent with the synchronous ioctl method.
Although not perfect, it should be good enough and still clean enough.

What do you think about this method?

Thanks,
Qu


Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.

2015-01-25 Thread Miao Xie
On Fri, 23 Jan 2015 17:59:49 +0100, David Sterba wrote:
> On Wed, Jan 21, 2015 at 03:04:02PM +0800, Miao Xie wrote:
>>> Pending changes are *not* only mount options. Feature change and label change
>>> are also pending changes if using sysfs.
>>
>> My mistake, I didn't notice the feature and label change by sysfs.
>>
>> But the implementation of feature and label change by sysfs is wrong; we can
>> not change them without write permission.
> 
> Label change does not happen if the fs is readonly. If the filesystem is
> RW and label is changed through sysfs, then remount to RO will sync the
> filesystem and the new label will be saved.
> 
> The sysfs features write handler is missing that protection though, I'll
> send a patch.

First, the R/O protection is too weak: there is a race between R/O remount and
label/feature change. Please consider the following case:
        Remount R/O task                Label/Attr Change Task
                                        Check R/O
        remount ro R/O
                                        change Label/feature
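
The interleaving above is a check-then-act race. A purely illustrative
user-space sketch (plain shell variables standing in for the superblock
R/O flag and the label; none of these names exist in btrfs) replays the
steps in that order and shows the label changing after the filesystem has
gone read-only:

```shell
#!/bin/sh
# Toy model of the race: NOT btrfs code, all names are hypothetical.
fs_readonly=0        # stands in for the superblock R/O flag
fs_label="old"       # stands in for the filesystem label

label_task_check()  { [ "$fs_readonly" -eq 0 ]; }   # step 1: check R/O
remount_ro()        { fs_readonly=1; }              # step 2: remount ro
label_task_change() { fs_label="$1"; }              # step 3: change label

replay_race() {
    fs_readonly=0; fs_label="old"
    if label_task_check; then      # check passes, fs is still R/W
        remount_ro                 # the remount slips in between
        label_task_change "new"    # the change lands on a R/O fs
    fi
    echo "readonly=$fs_readonly label=$fs_label"
}
```

Taking the write permission before the check and holding it until the
change is done closes that window.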

Second, it forgets to handle the freezing event.

> 
>>> For freeze, it's not the same problem since the fs will be unfrozen sooner or
>>> later and a transaction will be initiated.
>>
>> You can not make assumptions about what users will do; they might freeze the
>> fs and then shut down the machine.
> 
> The semantics of freezing should make the on-device image consistent,
> but still keep some changes in memory.
> 
> For example, if we change the features/label through sysfs, and then 
> umount
> the fs,
 It is different from pending change.
>>> No, now features/label changing using sysfs both use pending changes to do the
>>> commit.
>>> See BTRFS_PENDING_COMMIT bit.
>>> So freeze -> change features/label -> sync will still cause the deadlock in the
>>> same way,
>>> and you can try it yourself.
>>
>> As I said above, the implementation of sysfs feature and label change is wrong;
>> it is better to separate them from the pending mount option change and have the
>> sysfs feature and label change done in the context of a transaction after
>> getting the write permission. If so, we needn't do anything special when syncing
>> the fs.
> 
> That would mean to drop the write support of sysfs files that change
> global filesystem state (label and features right now). This would leave
> only the ioctl way to do that. I'd like to keep the sysfs write support
> though for ease of use from scripts and languages not ioctl-friendly.
> .

We needn't drop the write support of sysfs; just fix the bug and make it change
the label and features under a writable context.

Thanks
Miao


Re: 3.19-rc5: Bug 91911: [REGRESSION] rm command hangs big time with deleting a lot of files at once

2015-01-25 Thread Zygo Blaxell
On Fri, Jan 23, 2015 at 06:29:40PM -0500, Zygo Blaxell wrote:
> On Fri, Jan 23, 2015 at 03:01:28PM +0100, Martin Steigerwald wrote:
> > Hi!
> > 
> > Anyone seen this?
> > 
> > Reported as:
> > 
> > https://bugzilla.kernel.org/show_bug.cgi?id=91911
> 
> I have seen something like this since 3.15.
> 
> I've also seen its cousin, which gets stuck in evict_inode, but the stacks
> of the hanging processes start from renameat2() instead of unlinkat().
> I haven't seen the renameat2() variant of this bug since 3.18-rc6.

Since I wrote the above paragraph two days ago, I've seen the
renameat2()/btrfs_evict_inode bug twice on 3.18.3.  :-P

> > I just want to get rid of some 127000+ akonadi lost+found files, any delete 
> > command I start just gets rid of some thousands and then hangs.
> > 
> > merkaba:~> btrfs fi df /home
> > Data, RAID1: total=160.92GiB, used=111.09GiB
> > System, RAID1: total=32.00MiB, used=48.00KiB
> > Metadata, RAID1: total=5.99GiB, used=2.49GiB
> > GlobalReserve, single: total=512.00MiB, used=0.00B
> > merkaba:~> btrfs fi sh /home
> > Label: 'home'  uuid: […]
> >         Total devices 2 FS bytes used 113.58GiB
> >         devid    1 size 170.00GiB used 166.94GiB path /dev/mapper/msata-home
> >         devid    2 size 170.00GiB used 166.94GiB path /dev/mapper/sata-home
> > 
> > Btrfs v3.18
> > 
> > 
> > merkaba:/home/ms/.local/share/akonadi#1> find file_lost+found | wc -l  
> > 110070
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found -delete &
> > [4] 2660
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found | wc -l  
> > 101645
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found -delete &
> > [5] 2663
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found -delete &
> > [6] 2664
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found | wc -l  
> > 91369
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found | wc -l
> > 89844
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found | wc -l
> > 88042
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found -delete &
> > [7] 2671
> > merkaba:/home/ms/.local/share/akonadi> uname -a
> > Linux merkaba 3.19.0-rc5-tp520-trim-all-bgroups+ #18 SMP PREEMPT Mon Jan 19 09:58:33 CET 2015 x86_64 GNU/Linux
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found -delete &
> > [8] 2694
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found -delete &
> > [9] 2700
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found | wc -l  
> > 67278
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found | wc -l
> > 65244
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found | wc -l
> > 63713
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found | wc -l
> > 62725
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found | wc -l
> > 62213
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found | wc -l
> > 61213
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found -delete &
> > [10] 2715
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found | wc -l  
> > 60470
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found -delete &
> > [11] 2718
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found | wc -l  
> > 53303
> > 
> > 
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found | wc -l
> > 51396
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found | wc -l
> > 51396
> > merkaba:/home/ms/.local/share/akonadi> find file_lost+found | wc -l
> > 51396
> > 
> > 
> > merkaba:/home/ms/.local/share/akonadi> ps aux | grep find
> > ms        2647  0.4  0.2  43096 36204 pts/3    D+   14:45   0:00 find file_lost+found -delete
> > root      2651  0.3  0.2  42568 35688 pts/0    DN   14:45   0:00 find file_lost+found -delete
> > root      2654  2.7  0.2  44544 35652 pts/0    DN   14:46   0:05 find file_lost+found -delete
> > root      2657  0.3  0.2  44016 35048 pts/0    DN   14:46   0:00 find file_lost+found -delete
> > root      2660  2.1  0.1  39136 32280 pts/0    DN   14:46   0:03 find file_lost+found -delete
> > root      2663  0.2  0.1  36760 29988 pts/0    DN   14:46   0:00 find file_lost+found -delete
> > root      2664  3.3  0.1  36760 29888 pts/0    DN   14:46   0:05 find file_lost+found -delete
> > root      2671  0.9  0.1  33856 26984 pts/0    DN   14:46   0:01 find file_lost+found -delete
> > root      2694  1.1  0.1  32404 25380 pts/0    DN   14:47   0:01 find file_lost+found -delete
> > root      2700  4.0  0.1  30952 24064 pts/0    DN   14:47   0:04 find file_lost+found -delete
> > root      2715  0.3  0.1  26200 19332 pts/0    DN   14:47   0:00 find file_lost+found -delete
> > root      2718  4.1  0.1  26068 19068 pts/0    DN   14:47   0:02 find file_lost+found -delete
> > root      2840  0.0  0.0  12672  1592 pts/0    S+   14:49   0:00 grep find
> > merkaba:/home/ms/.local/share/akonadi> ps aux | grep rm 

Re: spurious I/O errors from btrfs...at the caching layer?

2015-01-25 Thread Zygo Blaxell
On Sat, Jan 24, 2015 at 01:06:01PM -0500, Zygo Blaxell wrote:
> I am seeing a lot of spurious I/O errors that look like they come from
> the cache-facing side of btrfs.  While running a heavy load with some
> extent-sharing (e.g. building 20 Linux kernels at once from source trees
> copied with 'cp -a --reflink=always'), some files will return spurious
> EIO on read.  It happens often enough to prevent a Linux kernel build
> about 1/3 of the time.
[...]
> Observed from 3.17..3.18.3.  All filesystems affected use skinny-metadata.
> No filesystems that are not using skinny-metadata seem to have this
> problem.

I ran a test overnight using 3.18.3 on a freshly formatted filesystem with
no skinny-metadata.

The test consisted of creating reflink copies of a Linux kernel source
tree and running kernel builds in each copy simultaneously, like this:

# assume you have a ready-to-build kernel tree in 'linux'
for x in $(seq 1 5); do
cp -a --reflink linux linux-$x
done

# build all the kernels at once
for x in $(seq 1 5); do
(cd linux-$x && make -j10 2>&1 | tee make.log) &
done

wait
# then tail all the make.logs and see how many failed due to
# I/O errors

Spurious I/O errors occurred with as few as two concurrent kernel builds.

The test machine has 16GB of RAM and the filesystem is also 16GB,
RAID1 on two spinning disks.
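
The last step of the script (counting the failed builds) could look
something like the sketch below. It assumes the logs are named
linux-N/make.log as in the loops above and that failures show up as the
standard EIO message "Input/output error"; the function name is made up
for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count how many build logs mention an I/O error.
# Assumes one make.log per linux-* directory under the given base
# directory and the English strerror(EIO) text "Input/output error".
count_eio_logs() {
    base="${1:-.}"
    count=0
    for log in "$base"/linux-*/make.log; do
        [ -f "$log" ] || continue      # skip an unmatched glob
        if grep -q 'Input/output error' "$log"; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

Called with no argument after the 'wait', it prints the number of build
trees under the current directory whose log contains the error text.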





Re: btrfs convert running out of space

2015-01-25 Thread Marc Joliet
On Fri, 23 Jan 2015 08:46:23 + (UTC)
Duncan <1i5t5.dun...@cox.net> wrote:

> Marc Joliet posted on Fri, 23 Jan 2015 08:54:41 +0100 as excerpted:
> 
> > On Fri, 23 Jan 2015 04:34:19 + (UTC)
> > Duncan <1i5t5.dun...@cox.net> wrote:
> > 
> >> Gareth Pye posted on Fri, 23 Jan 2015 08:58:08 +1100 as excerpted:
> >> 
> >> > What are the chances that splitting all the large files up into sub
> >> > gig pieces, finish convert, then recombine them all will work?
> >> 
> > [...]
> >> Option 2: Since new files should be created using the desired target
> >> mode (raid1 IIRC), you may actually be able to move them off and
> >> immediately back on, so they appear as new files and thus get created
> >> in the desired mode.
> > 
> > With current coreutils, wouldn't that also work if he moves the files to
> > another (temporary) subvolume? (And with future coreutils, by copying
> > the files without using reflinks and then removing the originals.)
> 
> If done correctly, yes.
> 
> However, "off the filesystem" is far simpler to explain over email or the 
> like, and is much less ambiguous in terms of "OK, but did you do it 
> 'correctly'" if it doesn't end up helping.  If it doesn't work, it 
> doesn't work.  If "move to a different subvolume under specific 
> conditions in terms of reflinking and the like" doesn't work, there's 
> always the question of whether it /really/ didn't work, or if somehow the 
> instructions weren't clear enough and thus failure was simply the result 
> of a failure to fully meet the technical requirements.
> 
> Of course if I was doing it myself, and if I was absolutely sure of the 
> technical details in terms of what command I had to use to be /sure/ it 
> didn't simply reflink and thus defeat the whole exercise, I'd likely use 
> the shortcut.  But in reality, if it didn't work I'd be second-guessing 
> myself and would probably move everything entirely off and back on to be 
> sure, and knowing that, I'd probably do it the /sure/ way in the first 
> place, avoiding the chance of having to redo it to prove to myself that 
> I'd done it correctly.
> 
> Of course, having demonstrated to myself that it worked, if I ever had 
> the problem again, I might try the shortcut, just to demonstrate to my 
> own satisfaction the full theory that the effect of the shortcut was the 
> same as the effect of doing it the longer and more fool-proof way.  But 
> of course I'd rather not have the opportunity to try that second-half 
> proof. =:^)
> 
> Make sense? =:^)

I was going to argue that my suggestion was hardly difficult to get right, but
then I read that cp defaults to --reflink=always and that it is not possible to
turn off reflinks (i.e., there is no --reflink=never).

So then one would have to consider alternatives like dd, and, well, you are right,
I suppose :) .

(Of course, with the *current* version of coreutils, the simple "mv somefile
tmp_subvol/; mv tmp_subvol/somefile ." will still work.)

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

