Re: [PATCH v2 1/8] Btrfs: introduce a tree for items that map UUIDs to something

2013-05-15 Thread Liu Bo
On Tue, May 14, 2013 at 11:36:53AM +0200, Stefan Behrens wrote: > Mapping UUIDs to subvolume IDs is an operation with a high effort > today. Today, the algorithm even has quadratic effort (based on the > number of existing subvolumes), which means, that it takes minutes > to send/receive a single s

Re: [PATCH 06/17] Btrfs: introduce grab/put functions for the root of the fs/file tree

2013-05-15 Thread Liu Bo
On Thu, May 16, 2013 at 01:34:11PM +0800, Miao Xie wrote: > On Thu, 16 May 2013 13:15:57 +0800, Liu Bo wrote: > > On Thu, May 16, 2013 at 12:31:11PM +0800, Miao Xie wrote: > >> On Thu, 16 May 2013 11:36:46 +0800, Liu Bo wrote: > >>> On Wed, May 15, 2013 at 03:48:20PM +0800, Miao Xie wrote: > T

Re: [PATCH 06/17] Btrfs: introduce grab/put functions for the root of the fs/file tree

2013-05-15 Thread Miao Xie
On Thu, 16 May 2013 13:15:57 +0800, Liu Bo wrote: > On Thu, May 16, 2013 at 12:31:11PM +0800, Miao Xie wrote: >> On Thu, 16 May 2013 11:36:46 +0800, Liu Bo wrote: >>> On Wed, May 15, 2013 at 03:48:20PM +0800, Miao Xie wrote: The grab/put functions will be used in the next patch, which needs to grab

Re: [PATCH 06/17] Btrfs: introduce grab/put functions for the root of the fs/file tree

2013-05-15 Thread Liu Bo
On Thu, May 16, 2013 at 12:31:11PM +0800, Miao Xie wrote: > On Thu, 16 May 2013 11:36:46 +0800, Liu Bo wrote: > > On Wed, May 15, 2013 at 03:48:20PM +0800, Miao Xie wrote: > >> The grab/put functions will be used in the next patch, which needs to grab > >> the root object and ensure it is not freed. We

Re: [PATCH 06/17] Btrfs: introduce grab/put functions for the root of the fs/file tree

2013-05-15 Thread Miao Xie
On Thu, 16 May 2013 11:36:46 +0800, Liu Bo wrote: > On Wed, May 15, 2013 at 03:48:20PM +0800, Miao Xie wrote: >> The grab/put functions will be used in the next patch, which needs to grab >> the root object and ensure it is not freed. We use a reference counter >> instead of the srcu lock to avoid bloc

Re: [PATCH 10/17] Btrfs: just flush the delalloc inodes in the source tree before snapshot creation

2013-05-15 Thread Miao Xie
On Thu, 16 May 2013 11:20:39 +0800, Liu Bo wrote: > On Wed, May 15, 2013 at 03:48:24PM +0800, Miao Xie wrote: >> Before applying this patch, we need to flush all the delalloc inodes in >> the fs when we want to create a snapshot, which wastes time and makes >> the transaction commit be blocked for a long

Re: [PATCH 06/17] Btrfs: introduce grab/put functions for the root of the fs/file tree

2013-05-15 Thread Liu Bo
On Wed, May 15, 2013 at 03:48:20PM +0800, Miao Xie wrote: > The grab/put functions will be used in the next patch, which needs to grab > the root object and ensure it is not freed. We use a reference counter > instead of the srcu lock to avoid blocking the memory reclaim task, > which invokes synchroni

Re: [PATCH 10/17] Btrfs: just flush the delalloc inodes in the source tree before snapshot creation

2013-05-15 Thread Liu Bo
On Wed, May 15, 2013 at 03:48:24PM +0800, Miao Xie wrote: > Before applying this patch, we need to flush all the delalloc inodes in > the fs when we want to create a snapshot, which wastes time and makes > the transaction commit be blocked for a long time. It means some other > user operation would also

Re: [PATCH v2 4/8] Btrfs: maintain subvolume items in the UUID tree

2013-05-15 Thread Liu Bo
On Wed, May 15, 2013 at 05:39:35PM +0200, Stefan Behrens wrote: > On Tue, 14 May 2013 18:44:11 +0800, Liu Bo wrote: > > On Tue, May 14, 2013 at 11:36:56AM +0200, Stefan Behrens wrote: > >> @@ -396,7 +403,7 @@ static noinline int create_subvol(struct inode *dir, > >> * of create_snapshot(). > >>

[PATCH] btrfs-progs: restore: use long option for the path regex

2013-05-15 Thread David Sterba
Current way of specifying the path to match is not very comfortable, but the feature itself is very useful. Let's save the short option -m for a more user friendly syntax and keep a long option --path-regex with the current syntax. CC: Peter Stuge Signed-off-by: David Sterba --- cmds-restore.c

Re: [RFC v0 4/4] nfs, nfsd: rough sys_copy_range and COPY support

2013-05-15 Thread J. Bruce Fields
On Wed, May 15, 2013 at 08:21:54PM +, Myklebust, Trond wrote: > On Wed, 2013-05-15 at 16:19 -0400, J. Bruce Fields wrote: > > On Tue, May 14, 2013 at 02:15:26PM -0700, Zach Brown wrote: > > > This crude patch illustrates the simplest plumbing involved in > > > supporting sys_copy_range with the

Re: [RFC v0 4/4] nfs, nfsd: rough sys_copy_range and COPY support

2013-05-15 Thread Myklebust, Trond
On Wed, 2013-05-15 at 16:19 -0400, J. Bruce Fields wrote: > On Tue, May 14, 2013 at 02:15:26PM -0700, Zach Brown wrote: > > This crude patch illustrates the simplest plumbing involved in > > supporting sys_copy_range with the NFS COPY operation that's pending in > > the 4.2 draft spec. > > > > The

Re: [RFC v0 4/4] nfs, nfsd: rough sys_copy_range and COPY support

2013-05-15 Thread J. Bruce Fields
On Tue, May 14, 2013 at 02:15:26PM -0700, Zach Brown wrote: > This crude patch illustrates the simplest plumbing involved in > supporting sys_copy_range with the NFS COPY operation that's pending in > the 4.2 draft spec. > > The patch is based on a previous prototype that used the COPY op to > imp

Re: [RFC v0 1/4] vfs: add copy_range syscall and vfs entry point

2013-05-15 Thread Zach Brown
On Wed, May 15, 2013 at 07:44:05PM +, Eric Wong wrote: > Why introduce a new syscall instead of extending sys_splice? Personally, I think it's ugly to have different operations use the same syscall just because their arguments match. But that preference aside, sure, if the consensus is that w

Re: [RFC v0 1/4] vfs: add copy_range syscall and vfs entry point

2013-05-15 Thread Eric Wong
Zach Brown wrote: > This adds a syscall and vfs entry point for clone_range which offloads > data copying between existing files. > > The syscall is a thin wrapper around the vfs entry point. Its arguments > are inspired by sys_splice(). Why introduce a new syscall instead of extending sys_spli

Re: I/O errors block the entire filesystem

2013-05-15 Thread Alexandre Oliva
On May 15, 2013, Josef Bacik wrote: > So this should only happen in the case that you are on a dm device it looks > like, is that how you are running? That was my first thought, but no, I'm using partitions out of the SATA disks directly. I even checked for stray dm out of fake raid or somesuch

Re: I/O errors block the entire filesystem

2013-05-15 Thread Alexandre Oliva
On May 14, 2013, Liu Bo wrote: >> In one of the failures that caused machine load spikes, I tried to >> collect info on active processes with perf top and SysRq-T, but nothing >> there seemed to explain the spike. Thoughts on how to figure out what's >> causing this? > Although I've seen your s

[PATCH] Btrfs-progs: add a newline to a free space cache message

2013-05-15 Thread Josef Bacik
Left out a newline in the generation check printf. Signed-off-by: Josef Bacik --- free-space-cache.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/free-space-cache.c b/free-space-cache.c index 5fb8ece..a30438b 100644 --- a/free-space-cache.c +++ b/free-space-cache.c @

Re: subvol copying

2013-05-15 Thread Harald Glatt
On Wed, May 15, 2013 at 7:28 PM, Chris Murphy wrote: > > On May 15, 2013, at 10:44 AM, Harald Glatt wrote: > >> >> You make a ro snapshot rw by creating a snapshot of it that is rw. So >> yes to both questions, by doing the same thing in both cases. > > In other words, a normal snapshot (without

Re: subvol copying

2013-05-15 Thread Chris Murphy
On May 15, 2013, at 10:44 AM, Harald Glatt wrote: > > You make a ro snapshot rw by creating a snapshot of it that is rw. So > yes to both questions, by doing the same thing in both cases. In other words, a normal snapshot (without -r) of a read-only snapshot will create a rw snapshot? In any

Re: subvol copying

2013-05-15 Thread Harald Glatt
On Wed, May 15, 2013 at 6:43 PM, Chris Murphy wrote: > > On May 15, 2013, at 1:40 AM, Gabriel de Perthuis wrote: > >> >> You can move subvolumes at any time, as if they were regular directories. > > In the example case, the subvolumes are read-only. So is it possible to make > a read-only subvol

Re: subvol copying

2013-05-15 Thread Chris Murphy
On May 15, 2013, at 1:40 AM, Gabriel de Perthuis wrote: > > You can move subvolumes at any time, as if they were regular directories. In the example case, the subvolumes are read-only. So is it possible to make a read-only subvolume (snapshot) read-writable? And is it possible to make a read

Re: [PATCH v2 4/8] Btrfs: maintain subvolume items in the UUID tree

2013-05-15 Thread Stefan Behrens
On Tue, 14 May 2013 18:44:11 +0800, Liu Bo wrote: > On Tue, May 14, 2013 at 11:36:56AM +0200, Stefan Behrens wrote: >> @@ -396,7 +403,7 @@ static noinline int create_subvol(struct inode *dir, >> * of create_snapshot(). >> */ >> ret = btrfs_subvolume_reserve_metadata(root, &block_rs

Re: kernel BUG at fs/btrfs/free-space-cache.c:1567!, Kernel 3.9.1

2013-05-15 Thread Philipp Dreimann
On 15 May 2013 17:00, Harald Glatt wrote: > On Wed, May 15, 2013 at 4:19 PM, Philipp Dreimann > wrote: >> Hello, >> >> my btrfs filesystem was not mountable anymore after a loss of power: >> >> kernel BUG at fs/btrfs/free-space-cache.c:1567! >> invalid opcode: [#1] SMP >> Modules linked in:

Re: kernel BUG at fs/btrfs/free-space-cache.c:1567!, Kernel 3.9.1

2013-05-15 Thread Harald Glatt
On Wed, May 15, 2013 at 4:19 PM, Philipp Dreimann wrote: > Hello, > > my btrfs filesystem was not mountable anymore after a loss of power: > > kernel BUG at fs/btrfs/free-space-cache.c:1567! > invalid opcode: [#1] SMP > Modules linked in: btrfs libcrc32c xor zlib_deflate raid6_pq i915(+) > i2

Re: [PATCH] Btrfs-progs: fix missing recow roots when making btrfs filesystem

2013-05-15 Thread David Sterba
On Tue, May 14, 2013 at 07:50:28PM +0800, Wang Shilong wrote: > When making a btrfs filesystem, we first write the root leaf to > specified filed, and then we recow the root. If we don't recow, > some trees are not in the correct block group. > > Steps to reproduce: > dd if=/dev/zero of=test.img

kernel BUG at fs/btrfs/free-space-cache.c:1567!, Kernel 3.9.1

2013-05-15 Thread Philipp Dreimann
Hello, my btrfs filesystem was not mountable anymore after a loss of power: kernel BUG at fs/btrfs/free-space-cache.c:1567! invalid opcode: [#1] SMP Modules linked in: btrfs libcrc32c xor zlib_deflate raid6_pq i915(+) i2c_algo_bit drm_kms_helper drm i2c_core video uinput CPU 3 Pid: 147, comm

Re: [PATCH] Btrfs: increase the max global reserve size to 1gig

2013-05-15 Thread David Sterba
On Fri, May 03, 2013 at 08:56:54AM -0400, Josef Bacik wrote: > Apparently 512mb was too small, with a fs_mark command we could get so much > delayed work built up that we'd never trip the "lets commit the transaction" > logic until we'd gotten too much delayed refs built up. Increasing this to 1 >

[PATCH] Btrfs-progs: detect when scrub is started twice

2013-05-15 Thread Stefan Behrens
Check whether any involved device is already busy running a scrub. This would cause damaged status messages and the state "aborted" without the explanation that a scrub was already running. Therefore check it first, prevent it and give some feedback to the user if scrub is already running. Note tha

Re: I/O errors block the entire filesystem

2013-05-15 Thread Josef Bacik
On Sat, May 11, 2013 at 01:16:38AM -0600, Alexandre Oliva wrote: > On Apr 4, 2013, Alexandre Oliva wrote: > > > I've been trying to figure out the btrfs I/O stack to try to understand > > why, sometimes (but not always), after a failure to read a (data > > non-replicated) block from the disk, th

Re: [PATCH v2 0/8] Btrfs: introduce a tree for UUID to subvol ID mapping

2013-05-15 Thread David Sterba
On Tue, May 14, 2013 at 07:11:54PM +0200, Stefan Behrens wrote:
> # of subvols    | without   | with
> in filesystem   | UUID tree | UUID tree
> ----------------+-----------+----------
> 2               | 0m00.004s | 0m00.003s
> 1000            | 0m07.010s | 0m00.004s
> 2000            | 0m28.210s | 0m00

Re: [PATCH] xfstests btrfs/284: shorten duration, fix output

2013-05-15 Thread Rich Johnston
On 04/26/2013 01:45 PM, Eric Sandeen wrote: test 284 had... some issues. diff --git a/tests/btrfs/284 b/tests/btrfs/284 old mode 100644 new mode 100755 index d952977..67161a3 --- a/tests/btrfs/284 +++ b/tests/btrfs/284 This patch has been committed: commit 91f87e3b89c0f7350a56d397ba7255

Re: [PATCH] Btrfs: increase the max global reserve size to 1gig

2013-05-15 Thread David Sterba
On Fri, May 03, 2013 at 11:28:35PM +0800, Liu Bo wrote: > On Fri, May 03, 2013 at 08:56:54AM -0400, Josef Bacik wrote: > > Apparently 512mb was too small, with a fs_mark command we could get so much > > delayed work built up that we'd never trip the "lets commit the transaction" > > logic until we'

Re: [PATCH v2 0/8] Btrfs: introduce a tree for UUID to subvol ID mapping

2013-05-15 Thread Stefan Behrens
On Tue, 14 May 2013 19:11:54 +0200, Stefan Behrens wrote: > On Tue, 14 May 2013 18:55:23 +0800, Liu Bo wrote: >> On Tue, May 14, 2013 at 11:36:52AM +0200, Stefan Behrens wrote: >>> Mapping UUIDs to subvolume IDs is an operation with a high effort >>> today. Today, the algorithm even has quadratic e

[PATCH 00/17] improve the block time during the transaction commit

2013-05-15 Thread Miao Xie
This patchset addresses the problem that the transaction may be blocked for a long time when it is being committed if there is heavy I/O. In this patchset, - 0001-0005, 0007, 0011-0012 are random fixes or code cleanup patches. - 0006, 0008-0010 introduce per-subvolume delalloc inode list and ordered e

[PATCH 06/17] Btrfs: introduce grab/put functions for the root of the fs/file tree

2013-05-15 Thread Miao Xie
The grab/put functions will be used in the next patch, which needs to grab the root object and ensure it is not freed. We use a reference counter instead of the srcu lock to avoid blocking the memory reclaim task, which invokes synchronize_srcu(). Signed-off-by: Miao Xie --- fs/btrfs/ctree.h |
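
The grab/put idea described in this patch can be illustrated with a minimal userspace sketch. This is not the code from the patch: struct demo_root and the demo_root_* functions are hypothetical names, and C11 atomics stand in for the kernel's atomic helpers. It only shows the general pattern of a "grab" that fails once the count has reached zero and a "put" that frees the object on the last reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tree root; NOT the kernel's struct btrfs_root. */
struct demo_root {
	atomic_int refs;   /* live references; 0 means the object is going away */
	bool *freed_flag;  /* set on release, only so the demo can observe it */
};

/* Allocate a root that starts with one reference held by the caller. */
static struct demo_root *demo_root_alloc(bool *freed_flag)
{
	struct demo_root *root = malloc(sizeof(*root));

	if (!root)
		return NULL;
	atomic_init(&root->refs, 1);
	root->freed_flag = freed_flag;
	return root;
}

/*
 * Take a reference unless the count has already dropped to zero.
 * Assumption: the caller found 'root' through a lookup that keeps the
 * memory valid for the duration of this call.
 */
static struct demo_root *demo_root_grab(struct demo_root *root)
{
	int old = atomic_load(&root->refs);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&root->refs, &old, old + 1))
			return root;
	}
	return NULL;
}

/* Drop a reference; the holder of the last one frees the object. */
static void demo_root_put(struct demo_root *root)
{
	int old = atomic_fetch_sub(&root->refs, 1);

	assert(old > 0);
	if (old == 1) {
		*root->freed_flag = true;
		free(root);
	}
}
```

A holder of such a reference never needs a grace period to end before the object can be released, which is the property the patch description gives as the reason for preferring a counter over the srcu lock.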

[PATCH 15/17] Btrfs: remove unnecessary varient ->num_joined in btrfs_transaction structure

2013-05-15 Thread Miao Xie
We used ->num_joined to track whether some writers joined the current transaction while the committer was sleeping. If some writers joined the current transaction, we had to continue the while loop to do some necessary stuff, such as flushing the ordered operations. But it is unnecessary because

[PATCH 11/17] Btrfs: cleanup unnecessary assignment when cleaning up all the residual transaction

2013-05-15 Thread Miao Xie
When we umount a fs with serious errors, we will invoke btrfs_cleanup_transactions() to clean up the residual transaction. At this time, it is impossible to start a new transaction, so we needn't assign trans_no_join to 1, and also needn't clear running transaction every time we destroy a residu

[PATCH 04/17] Btrfs: remove BUG_ON() in btrfs_read_fs_tree_no_radix()

2013-05-15 Thread Miao Xie
We have checked if ->node is NULL or not, so it is unnecessary to use BUG_ON() to check again. Remove it. Signed-off-by: Miao Xie --- fs/btrfs/disk-io.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 2a9ae38..8c1e4fb 100644 --- a/fs/btrfs/disk-io

[PATCH 09/17] Btrfs: introduce per-subvolume ordered extent list

2013-05-15 Thread Miao Xie
The reason we introduce per-subvolume ordered extent list is the same as the per-subvolume delalloc inode list. Signed-off-by: Miao Xie --- fs/btrfs/ctree.h| 25 --- fs/btrfs/dev-replace.c | 4 +- fs/btrfs/disk-io.c | 45 +++- fs/btrfs/extent-tree.c |

[PATCH 03/17] Btrfs: pause the space balance when remounting to R/O

2013-05-15 Thread Miao Xie
Signed-off-by: Miao Xie --- fs/btrfs/super.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index a4807ce..f0857e0 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1263,6 +1263,7 @@ static int btrfs_remount(struct super_block *sb, int *flags, char

[PATCH 10/17] Btrfs: just flush the delalloc inodes in the source tree before snapshot creation

2013-05-15 Thread Miao Xie
Before applying this patch, we need to flush all the delalloc inodes in the fs when we want to create a snapshot, which wastes time and makes the transaction commit be blocked for a long time. It means some other user operation would also be blocked for a long time. This patch improves this problem, we

[PATCH 12/17] Btrfs: remove the code for the impossible case in cleanup_transaction()

2013-05-15 Thread Miao Xie
If the transaction is removed from the transaction list, it means the transaction has been committed successfully. So it is impossible to call cleanup_transaction(), otherwise there is something wrong with the code logic. Thus, we use BUG_ON() instead of the original handling. Signed-off-by: Miao Xi

[PATCH 17/17] Btrfs: make the state of the transaction more readable

2013-05-15 Thread Miao Xie
We used 3 variables to track the state of the transaction, which was complex and wasted memory. Besides that, it was hard to understand which types of transaction handles should be blocked in each transaction state, so developers often made mistakes. This patch improved the abov

[PATCH 16/17] Btrfs: remove the time check in btrfs_commit_transaction()

2013-05-15 Thread Miao Xie
We checked the commit time to avoid committing the transaction frequently, but it is unnecessary because: - It made the transaction commit spend more time, and delayed the operation of the external writers(TRANS_START/TRANS_USERSPACE). - Except the space that we have to commit transaction, such a

[PATCH 13/17] Btrfs: don't wait for all the writers circularly during the transaction commit

2013-05-15 Thread Miao Xie
btrfs_commit_transaction has the following loop before we commit the transaction.

	do {
		// attempt to do some useful stuff and/or sleep
	} while (atomic_read(&cur_trans->num_writers) > 1 ||
		 (should_grow && cur_trans->num_joined != joined));

This is used to prevent from the TRANS_START

[PATCH 05/17] Btrfs: cleanup the similar code of the fs root read

2013-05-15 Thread Miao Xie
There are several functions whose code is similar, such as btrfs_find_last_root() and btrfs_read_fs_root_no_radix(). Besides that, some functions are invoked twice, which is unnecessary; for example, we are sure that all roots which are found in btrfs_find_orphan_roots() have their orphan items, so i

[PATCH 14/17] Btrfs: don't flush the delalloc inodes in the while loop if flushoncommit is set

2013-05-15 Thread Miao Xie
It is unnecessary to flush the delalloc inodes again and again because we don't care about the dirty pages which are introduced after the flush, and they will be flushed in the transaction commit. Signed-off-by: Miao Xie --- fs/btrfs/transaction.c | 26 ++ 1 file changed, 18 inse

[PATCH 08/17] Btrfs: introduce per-subvolume delalloc inode list

2013-05-15 Thread Miao Xie
When we create a snapshot, we flush all delalloc inodes in the fs, but just flushing the inodes in the source tree is OK. So we introduce a per-subvolume delalloc inode list. Signed-off-by: Miao Xie --- fs/btrfs/ctree.h | 22 --- fs/btrfs/dev-replace.c | 2 +- fs/btrfs/disk-io.c

[PATCH 07/17] Btrfs: don't invoke btrfs_invalidate_inodes() in the spin lock context

2013-05-15 Thread Miao Xie
btrfs_invalidate_inodes() may sleep, so we should not invoke it in the spin lock context. Fix it. Signed-off-by: Miao Xie --- fs/btrfs/disk-io.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 642c861..724a0da 100644 --- a/fs/btrfs/disk-io.

[PATCH 02/17] Btrfs: fix unprotected root node of the subvolume's inode rb-tree

2013-05-15 Thread Miao Xie
The root node of the rb-tree may be changed, so we should get it under the lock. Fix it. Signed-off-by: Miao Xie --- fs/btrfs/inode.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 7f6e78a..bf5c399 100644 --- a/fs/btrfs/inode.

[PATCH 01/17] Btrfs: fix accessing a freed tree root

2013-05-15 Thread Miao Xie
inode_tree_del() will move the tree root into the dead root list, and then the tree will be destroyed by the cleaner. So if we remove the delayed node which is cached in the inode after inode_tree_del(), we may access a freed tree root. Fix it. Signed-off-by: Miao Xie --- fs/btrfs/inode.c | 2 +-

Re: subvol copying

2013-05-15 Thread Gabriel de Perthuis
> A user of a workstation has a home directory /home/john as a subvolume. I > wrote a cron job to make read-only snapshots of it under /home/john/backup > which was fortunate as they just ran a script that did something like > "rm -rf ~". > > Apart from copying dozens of gigs of data back, is t