Re: [PATCH v12.1 01/15] btrfs: expand cow_file_range() to support in-band dedup and subpage-blocksize

2016-07-11 Thread Satoru Takeuchi
On 2016/07/12 1:41, David Sterba wrote:
> On Mon, Jul 11, 2016 at 11:05:29AM +0800, Qu Wenruo wrote:
>> From: Wang Xiaoguang 
>>
>> Extract cow_file_range() new parameters for both in-band dedupe and
>> subpage sector size patchset.
>>
>> This should make conflict of both patchset to minimal, and reduce the
>> effort needed to rebase them.
> 
> Great, thanks. I did a test merge, there are still conflicts but they
> seem to be resolvable more easily, picking the changes from the right
> patchset. I haven't tested it so there's more work, but the point was to
> get the conflict surface down.
> 
> There's another candidate, btrfs_set_extent_delalloc as it adds a
> parameter, can you please send a similar patch for that?

He's going to attend LinuxCon Japan (Jul 13 - Jul 15).
So I guess he will reply several days later (next week?).

Thanks,
Satoru

> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: raid1 has failing disks, but smart is clear

2016-07-11 Thread Corey Coughlin

Hi Andrei,
Thanks for the info, sorry about the improper terminology.  In 
better news, I discovered that one disk wasn't getting recognized by the 
OS at certain outputs of one of my mini-sas to sata cables, so I got a 
new one, and the disk works fine on that.  So I'm that's at least one 
bad cable, I still have to check the other 3, though.


-- Corey

On 07/08/2016 10:40 PM, Andrei Borzenkov wrote:

07.07.2016 09:40, Corey Coughlin пишет:

Hi Tomasz,
 Thanks for the response!  I should clear some things up, though.

On 07/06/2016 03:59 PM, Tomasz Kusmierz wrote:

On 6 Jul 2016, at 23:14, Corey Coughlin
 wrote:

Hi all,
 Hoping you all can help, have a strange problem, think I know
what's going on, but could use some verification.  I set up a raid1
type btrfs filesystem on an Ubuntu 16.04 system, here's what it looks
like:

btrfs fi show
Label: none  uuid: 597ee185-36ac-4b68-8961-d4adc13f95d4
 Total devices 10 FS bytes used 3.42TiB
 devid1 size 1.82TiB used 1.18TiB path /dev/sdd
 devid2 size 698.64GiB used 47.00GiB path /dev/sdk
 devid3 size 931.51GiB used 280.03GiB path /dev/sdm
 devid4 size 931.51GiB used 280.00GiB path /dev/sdl
 devid5 size 1.82TiB used 1.17TiB path /dev/sdi
 devid6 size 1.82TiB used 823.03GiB path /dev/sdj
 devid7 size 698.64GiB used 47.00GiB path /dev/sdg
 devid8 size 1.82TiB used 1.18TiB path /dev/sda
 devid9 size 1.82TiB used 1.18TiB path /dev/sdb
 devid   10 size 1.36TiB used 745.03GiB path /dev/sdh

Now when I say that the drives mount points change, I'm not saying they
change when I reboot.  They change while the system is running.  For
instance, here's the fi show after I ran a "check --repair" run this
afternoon:

btrfs fi show
Label: none  uuid: 597ee185-36ac-4b68-8961-d4adc13f95d4
 Total devices 10 FS bytes used 3.42TiB
 devid1 size 1.82TiB used 1.18TiB path /dev/sdd
 devid2 size 698.64GiB used 47.00GiB path /dev/sdk
 devid3 size 931.51GiB used 280.03GiB path /dev/sdm
 devid4 size 931.51GiB used 280.00GiB path /dev/sdl
 devid5 size 1.82TiB used 1.17TiB path /dev/sdi
 devid6 size 1.82TiB used 823.03GiB path /dev/sds
 devid7 size 698.64GiB used 47.00GiB path /dev/sdg
 devid8 size 1.82TiB used 1.18TiB path /dev/sda
 devid9 size 1.82TiB used 1.18TiB path /dev/sdb
 devid   10 size 1.36TiB used 745.03GiB path /dev/sdh

Notice that /dev/sdj in the previous run changed to /dev/sds.  There was
no reboot, the mount just changed.  I don't know why that is happening,
but it seems like the majority of the errors are on that drive.  But
given that I've fixed the start/stop issue on that disk, it probably
isn't a WD Green issue.

It's not "mount point", it is just device names. Do not make it sound
more confusing than it already is :)

This implies that disks drop off and reappear. Do you have "dmesg" or
log (/var/log/syslog or /var/log/messages or journalctl) for the same
period of time?



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] btrfs: add test for an incremental send after moving directories around

2016-07-11 Thread Eryu Guan
On Sat, Jul 02, 2016 at 01:32:08PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> Test that an incremental send operation works after doing radical changes
> in the directory hierarchy that involve switching the inode that directory
> entries point to.
> 
> This test exercises scenarios used to fail in btrfs and are fixed by the
> following patches for the linux kernel:
> 
>  "Btrfs: send, fix failure to move directories with the same name around"
>  "Btrfs: incremental send, fix invalid paths for rename operations"
> 
> Signed-off-by: Filipe Manana 
> ---
>  tests/btrfs/124 | 261 
> 
>  tests/btrfs/124.out |   2 +
>  tests/btrfs/group   |   1 +
>  3 files changed, 264 insertions(+)
>  create mode 100755 tests/btrfs/124
>  create mode 100644 tests/btrfs/124.out
> 
> diff --git a/tests/btrfs/124 b/tests/btrfs/124
> new file mode 100755
> index 000..38635a3
> --- /dev/null
> +++ b/tests/btrfs/124
> @@ -0,0 +1,261 @@
> +#! /bin/bash
> +# FS QA Test No. btrfs/124
> +#
> +# Test that an incremental send operation works after doing radical changes
> +# in the directory hierarchy that involve switching the inode that directory
> +# entries point to.
> +#
> +#---
> +# Copyright (C) 2016 SUSE Linux Products GmbH. All Rights Reserved.
> +# Author: Filipe Manana 
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
cd /
to make it consistent with other _cleanup :)

> + rm -fr $send_files_dir
> + rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# real QA test starts here
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch
> +_require_fssum
> +
> +send_files_dir=$TEST_DIR/btrfs-test-$seq

These three tests all take use of $TEST_DIR, but don't _require_test, I
think we need it in these tests.

> +
> +rm -f $seqres.full
> +rm -fr $send_files_dir
> +mkdir $send_files_dir
> +
> +_scratch_mkfs >>$seqres.full 2>&1
> +_scratch_mount
> +
> +# case 1
> +mkdir -p $SCRATCH_MNT/case_1/d/p1
> +mkdir $SCRATCH_MNT/case_1/p1
> +
> +# case 2
> +mkdir -p $SCRATCH_MNT/case_2/a
> +mkdir $SCRATCH_MNT/case_2/d
> +mkdir $SCRATCH_MNT/case_2/e
> +mkdir $SCRATCH_MNT/case_2/f
> +mkdir $SCRATCH_MNT/case_2/ance
> +mkdir $SCRATCH_MNT/case_2/d/ance
> +mkdir $SCRATCH_MNT/case_2/a/c
> +mv $SCRATCH_MNT/case_2/e $SCRATCH_MNT/case_2/d/ance
> +mv $SCRATCH_MNT/case_2/f $SCRATCH_MNT/case_2/d/ance
> +mv $SCRATCH_MNT/case_2/ance $SCRATCH_MNT/case_2/d/ance
> +
> +# case 3
> +mkdir -p $SCRATCH_MNT/case_3/d
> +mkdir $SCRATCH_MNT/case_3/a
> +mkdir $SCRATCH_MNT/case_3/waiting_dir
> +mkdir -p $SCRATCH_MNT/case_3/pre/ance
> +mkdir $SCRATCH_MNT/case_3/d/ance
> +mkdir $SCRATCH_MNT/case_3/a/c
> +mv $SCRATCH_MNT/case_3/waiting_dir $SCRATCH_MNT/case_3/d/ance
> +
> +# case 4
> +mkdir -p $SCRATCH_MNT/case_4/tmp
> +mkdir $SCRATCH_MNT/case_4/below_ance
> +mkdir -p $SCRATCH_MNT/case_4/pre/wait_dir
> +mkdir $SCRATCH_MNT/case_4/desc
> +mkdir $SCRATCH_MNT/case_4/ance
> +mv $SCRATCH_MNT/case_4/below_ance $SCRATCH_MNT/case_4/ance
> +mkdir $SCRATCH_MNT/case_4/other_dir
> +
> +# Filesystem looks like:
> +#
> +# .  (ino 
> 256)
> +# |--- case_1/   (ino 
> 257)
> +# |   | d/   (ino 
> 258)
> +# |   | |--- p1/ (ino 
> 259)
> +# |   |
> +# |   | p1/  (ino 
> 260)
> +# |
> +# |--- case_2/   (ino 
> 261)
> +# |   | a/   (ino 
> 262)
> +# |   | | c/ (ino 
> 268)
> +# |   |
> +# |   | d/   (ino 
> 263)
> 

[PATCH] Btrfs: change BUG_ON()'s to ASSERT()'s in backref_cache_cleanup()

2016-07-11 Thread Liu Bo
Since it is just an in-memory building of the backrefs of several
btree blocks, nothing is fatal other than memory leaks, so this
changes BUG_ON()'s to ASSERT()'s.

Signed-off-by: Liu Bo 
---
 fs/btrfs/relocation.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 0477dca..9732919 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -235,12 +235,12 @@ static void backref_cache_cleanup(struct backref_cache 
*cache)
cache->last_trans = 0;
 
for (i = 0; i < BTRFS_MAX_LEVEL; i++)
-   BUG_ON(!list_empty(>pending[i]));
-   BUG_ON(!list_empty(>changed));
-   BUG_ON(!list_empty(>detached));
-   BUG_ON(!RB_EMPTY_ROOT(>rb_root));
-   BUG_ON(cache->nr_nodes);
-   BUG_ON(cache->nr_edges);
+   ASSERT(list_empty(>pending[i]));
+   ASSERT(list_empty(>changed));
+   ASSERT(list_empty(>detached));
+   ASSERT(RB_EMPTY_ROOT(>rb_root));
+   ASSERT(!cache->nr_nodes);
+   ASSERT(!cache->nr_edges);
 }
 
 static struct backref_node *alloc_backref_node(struct backref_cache *cache)
-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


FIDEDUPERANGE with src_length == 0

2016-07-11 Thread Omar Sandoval
Hey, Darrick,

generic/182 is failing on Btrfs for me with the following output:

--- tests/generic/182.out   2016-07-07 19:51:54.0 -0700
+++ /tmp/fixxfstests/xfstests/results//generic/182.out.bad  2016-07-11 
17:28:28.230039216 -0700
@@ -1,12 +1,10 @@
 QA output created by 182
 Create the original files
-dedupe: Extents did not match.
 f4820540fc0ac02750739896fe028d56  TEST_DIR/test-182/file1
 69ad53078a16243d98e21d9f8704a071  TEST_DIR/test-182/file2
 69ad53078a16243d98e21d9f8704a071  TEST_DIR/test-182/file2.chk
 Compare against check files
 Make the original file almost dedup-able
-dedupe: Extents did not match.
 f4820540fc0ac02750739896fe028d56  TEST_DIR/test-182/file1
 158d4e3578b94b89cbb44493a2110fb9  TEST_DIR/test-182/file2
 158d4e3578b94b89cbb44493a2110fb9  TEST_DIR/test-182/file2.chk

It looks like that test is checking that a dedupe with length == 0 is
treated as a dedupe to EOF, but Btrfs doesn't do that [1]. As far as I
can tell, it never did, but maybe I'm just confused. What was the
behavior when you introduced that test? That seems like a reasonable
thing to do, but I wanted to clear this up before changing/fixing Btrfs.

Thanks.

1: 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/ioctl.c?h=v4.7-rc7#n3122
-- 
Omar
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix panic in balance due to EIO

2016-07-11 Thread Liu Bo
During build_backref_tree(), if we fail to read a btree node,
we can eventually run into BUG_ON(cache->nr_nodes) that we put
in backref_cache_cleanup(), meaning we have at least one
memory leak.

This frees the backref_node that we allocate at the very beginning of 
build_backref_tree().

Signed-off-by: Liu Bo 
---
 fs/btrfs/relocation.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 0477dca..f00267a 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1135,6 +1135,8 @@ out:
btrfs_free_path(path1);
btrfs_free_path(path2);
if (err) {
+   int orig_free = 0;
+
while (!list_empty()) {
lower = list_entry(useless.next,
   struct backref_node, list);
@@ -1171,8 +1173,13 @@ out:
lower = list_entry(useless.next,
   struct backref_node, list);
list_del_init(>list);
+   if (lower == node)
+   orig_free = 1;
free_backref_node(cache, lower);
}
+
+   if (!orig_free)
+   free_backref_node(cache, node);
return ERR_PTR(err);
}
ASSERT(!node || !node->detached);
-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] Btrfs: fix eb memory leak due to readpage failure

2016-07-11 Thread Liu Bo
On Mon, Jul 11, 2016 at 06:54:02PM -0400, Chris Mason wrote:
> On Mon, Jul 11, 2016 at 03:48:38PM -0700, Liu Bo wrote:
> > On Mon, Jul 11, 2016 at 02:27:39PM -0400, Chris Mason wrote:
> > > 
> > > 
> > > On 07/11/2016 01:39 PM, Liu Bo wrote:
> > > > eb->io_pages is set in read_extent_buffer_pages().
> > > >
> > > > In case of readpage failure, for pages that have been added to bio,
> > > > it calls bio_endio and later readpage_io_failed_hook() does the work.
> > > >
> > > > When this eb's page (couldn't be the 1st page) fails to add itself to 
> > > > bio
> > > > due to failure in merge_bio(), it cannot decrease eb->io_pages via 
> > > > bio_endio,
> > > >  and ends up with a memory leak eventually.
> > > >
> > > > This lets __do_readpage propagate errors to callers and adds the
> > > >  'atomic_dec(>io_pages)'.
> > > 
> > > Thanks for looking at this Liu, how is it currently being tested?
> > 
> > I have a btrfs disk image which was corrupted by btrfs-corrupt-block
> > tool, in that image, the chunk tree's content has been removed while the
> > chunk node can be read from read successfully, so we'd get -EIO when
> > trying to read tree root's node since __btrfs_map_block() would fail to
> > find the right item in chunk mapping_tree.  Thus, we can test our error
> > handling path in read_extent_buffer_pages().
> 
> Fantastic.  Can you please make this an xfstest, maybe along with a dm-flakey?
> as the second phase?

Sure, this depends on a btrfs-corrupt-block patch, which I've not sent
out, I'll try to work out a xfstests case :)

Btw, I'm also planning to add this into our fuzz images of btrfs-progs.

Thanks,

-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] Btrfs: fix eb memory leak due to readpage failure

2016-07-11 Thread Chris Mason

On Mon, Jul 11, 2016 at 03:48:38PM -0700, Liu Bo wrote:

On Mon, Jul 11, 2016 at 02:27:39PM -0400, Chris Mason wrote:



On 07/11/2016 01:39 PM, Liu Bo wrote:
> eb->io_pages is set in read_extent_buffer_pages().
>
> In case of readpage failure, for pages that have been added to bio,
> it calls bio_endio and later readpage_io_failed_hook() does the work.
>
> When this eb's page (couldn't be the 1st page) fails to add itself to bio
> due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio,
>  and ends up with a memory leak eventually.
>
> This lets __do_readpage propagate errors to callers and adds the
>  'atomic_dec(>io_pages)'.

Thanks for looking at this Liu, how is it currently being tested?


I have a btrfs disk image which was corrupted by btrfs-corrupt-block
tool, in that image, the chunk tree's content has been removed while the
chunk node can be read from read successfully, so we'd get -EIO when
trying to read tree root's node since __btrfs_map_block() would fail to
find the right item in chunk mapping_tree.  Thus, we can test our error
handling path in read_extent_buffer_pages().


Fantastic.  Can you please make this an xfstest, maybe along with a dm-flakey?
as the second phase?

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] Btrfs: fix eb memory leak due to readpage failure

2016-07-11 Thread Liu Bo
On Mon, Jul 11, 2016 at 02:27:39PM -0400, Chris Mason wrote:
> 
> 
> On 07/11/2016 01:39 PM, Liu Bo wrote:
> > eb->io_pages is set in read_extent_buffer_pages().
> > 
> > In case of readpage failure, for pages that have been added to bio,
> > it calls bio_endio and later readpage_io_failed_hook() does the work.
> > 
> > When this eb's page (couldn't be the 1st page) fails to add itself to bio
> > due to failure in merge_bio(), it cannot decrease eb->io_pages via 
> > bio_endio,
> >  and ends up with a memory leak eventually.
> > 
> > This lets __do_readpage propagate errors to callers and adds the
> >  'atomic_dec(>io_pages)'.
> 
> Thanks for looking at this Liu, how is it currently being tested?

I have a btrfs disk image which was corrupted by btrfs-corrupt-block
tool, in that image, the chunk tree's content has been removed while the
chunk node can be read from read successfully, so we'd get -EIO when
trying to read tree root's node since __btrfs_map_block() would fail to
find the right item in chunk mapping_tree.  Thus, we can test our error
handling path in read_extent_buffer_pages().

Thanks,

-liubo

> 
> -chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RESEND v2] fstests: btrfs/079: Fix wrong value passed to available space check.

2016-07-11 Thread Omar Sandoval
From: Qu Wenruo 

Wrong value is passed to _require_fs_space, which should be in unit of
kilobyte(1024), but passed in unit of gigabyte(1024^3).

Fix it.

Reviewed-by: David Sterba 
Signed-off-by: Qu Wenruo 
---
This is from January of last year, looks like it fell through the cracks.

 tests/btrfs/079 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/btrfs/079 b/tests/btrfs/079
index 6aee3a373f91..ed4eb727ee2b 100755
--- a/tests/btrfs/079
+++ b/tests/btrfs/079
@@ -72,7 +72,7 @@ rm -f $seqres.full
 
 _scratch_mkfs >>$seqres.full 2>&1
 _scratch_mount
-_require_fs_space $SCRATCH_MNT $(($filesize / 1024 / 1024 / 1024))
+_require_fs_space $SCRATCH_MNT $(($filesize / 1024))
 $XFS_IO_PROG -f -c "falloc 0 $filesize" $testfile
 
 dd_work() {
-- 
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 64-btrfs.rules and degraded boot

2016-07-11 Thread Chris Murphy
On Fri, Jul 8, 2016 at 6:24 AM, Austin S. Hemmelgarn
 wrote:

> To clarify, I'm not trying to argue against adding support, I'm arguing
> against it being mandatory.

By "D-Bus support" I did not mean to indicate mandating it, just that
it would be one possible way to get some very basic state change
messages to user space tools so we're doing the least amount of wheel
reinvention as possible.


>  A filesystem which requires specific system
> services to be running just for regular maintenance tasks is not a well
> designed filesystem.  To be entirely honest, I'm not all that happy about
> the functional dependency on udev to have device discovery, but there's no
> point in me arguing about that...

Well everything else that came before it is effectively deprecated, so
there's no going back. The way forward would be to get udev more
granular state information about a Btrfs volume than 0 and 1.

>
> Just thinking aloud, but why not do a daemon that does the actual
> monitoring, and then provide an interface (at least a UNIX domain socket,
> and optionally a D-Bus endpoint) that other tools can use to query
> filesystem status.  LVM already has a similar setup for monitoring DM-RAID
> volumes, snapshots, and thin storage pools, although it's designed as an
> event driven tool that does something when specific things happen (for
> example, possibly auto-extending snapshots when they start to get full).

That would be consistent with mdadm --monitor and dmeventd, but it is
yet another wheel reinvention at the lower level, which then also
necessitates higher level things to adapt to that interface. It would
be neat if there could be some unification and consistency.


-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Btrfs: fix unexpected balance crash due to BUG_ON

2016-07-11 Thread Chris Mason



On 07/11/2016 01:21 PM, Liu Bo wrote:

Mounting a btrfs can resume previous balance operations asynchronously.
An user got a crash when one drive has some corrupt sectors.

Since balance can cancel itself in case of any error, we can gracefully
return errors to upper layers and let balance do the cancel job.

Reported-by: sash 
Signed-off-by: Liu Bo 


v2 Looks good, thanks Liu.

Signed-off-by: Chris Mason 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Unable to mount degraded RAID5

2016-07-11 Thread Chris Murphy
On Mon, Jul 11, 2016 at 11:17 AM, Tomáš Hrdina  wrote:
> sudo btrfs-debug-tree /dev/sdc
> It has 20 lines. Don't know, what you use for bigger files.
>
>
> sudo btrfs-debug-tree -b 6062434418688 /dev/sdc
> http://sebsauvage.net/paste/?fc156dee1d1deb3b#YpG/TA0H3I313jMuC4pgsdj++TcuDaFwWIBeuuOXfCA=
>
> sudo btrfs-debug-tree -b 6062497202176 /dev/sdc
> http://sebsauvage.net/paste/?86621abec9c239bd#kwTpZ7BZLcLw71yCfr3jHKZT08zsaXK3RgdFo7MFoFc=
>
>
> sudo btrfs-debug-tree -b 6062470332416 /dev/sdc
> http://sebsauvage.net/paste/?4ff40fa0b6b201c9#nFk7pT9MLj2w9egUJlgXdkmCkWyp1vSG0kADfq3J7eA=

None of these have anything useful in them, there's no tree root there.


>
> It got some results, but I don't know, what to look for.
>
>
> sudo btrfs-map-logical -l 7008899547136 /dev/sdc
> parent transid verify failed on 7008807157760 wanted 70175 found 70133
> parent transid verify failed on 7008807157760 wanted 70175 found 70133
> checksum verify failed on 7008807157760 found F192848C wanted 1571393A
> checksum verify failed on 7008807157760 found F192848C wanted 1571393A
> bytenr mismatch, want=7008807157760, have=65536
> mirror 1 logical 7008899547136 physical 735226609664 device /dev/sdb
> mirror 2 logical 7008899547136 physical 3166748524544 device /dev/sdc
>
>
> Also I don't know, what to do with this. How to compute new csum.

Right, that's pretty tricky to do manually.



>
> For me, it would be ok to give up and just start fresh.

OK in that case do that.


-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] Btrfs: fix eb memory leak due to readpage failure

2016-07-11 Thread Chris Mason



On 07/11/2016 01:39 PM, Liu Bo wrote:

eb->io_pages is set in read_extent_buffer_pages().

In case of readpage failure, for pages that have been added to bio,
it calls bio_endio and later readpage_io_failed_hook() does the work.

When this eb's page (couldn't be the 1st page) fails to add itself to bio
due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio,
 and ends up with a memory leak eventually.

This lets __do_readpage propagate errors to callers and adds the
 'atomic_dec(>io_pages)'.


Thanks for looking at this Liu, how is it currently being tested?

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3] Btrfs: fix eb memory leak due to readpage failure

2016-07-11 Thread Liu Bo
eb->io_pages is set in read_extent_buffer_pages().

In case of readpage failure, for pages that have been added to bio,
it calls bio_endio and later readpage_io_failed_hook() does the work.

When this eb's page (couldn't be the 1st page) fails to add itself to bio
due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio,
 and ends up with a memory leak eventually.

This lets __do_readpage propagate errors to callers and adds the
 'atomic_dec(>io_pages)'.

Signed-off-by: Liu Bo 
---
v2: - Move 'dec io_pages' to the caller so that we're consistent with
  write_one_eb()
v3: - Bail out once we fail to read a page and do the cleanup work
  for eb->io_pages

 fs/btrfs/extent_io.c | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ac1a696..7303e5a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2878,6 +2878,7 @@ __get_extent_map(struct inode *inode, struct page *page, 
size_t pg_offset,
  * into the tree that are removed when the IO is done (by the end_io
  * handlers)
  * XXX JDM: This needs looking at to ensure proper page locking
+ * return 0 on success, otherwise return error
  */
 static int __do_readpage(struct extent_io_tree *tree,
 struct page *page,
@@ -2899,7 +2900,7 @@ static int __do_readpage(struct extent_io_tree *tree,
sector_t sector;
struct extent_map *em;
struct block_device *bdev;
-   int ret;
+   int ret = 0;
int nr = 0;
size_t pg_offset = 0;
size_t iosize;
@@ -3080,6 +3081,7 @@ static int __do_readpage(struct extent_io_tree *tree,
} else {
SetPageError(page);
unlock_extent(tree, cur, cur + iosize - 1);
+   goto out;
}
cur = cur + iosize;
pg_offset += iosize;
@@ -3090,7 +3092,7 @@ out:
SetPageUptodate(page);
unlock_page(page);
}
-   return 0;
+   return ret;
 }
 
 static inline void __do_contiguous_readpages(struct extent_io_tree *tree,
@@ -5230,14 +5232,31 @@ int read_extent_buffer_pages(struct extent_io_tree 
*tree,
atomic_set(>io_pages, num_reads);
for (i = start_i; i < num_pages; i++) {
page = eb->pages[i];
+
if (!PageUptodate(page)) {
+   if (ret) {
+   atomic_dec(>io_pages);
+   unlock_page(page);
+   continue;
+   }
+
ClearPageError(page);
err = __extent_read_full_page(tree, page,
  get_extent, ,
  mirror_num, _flags,
  READ | REQ_META);
-   if (err)
+   if (err) {
ret = err;
+   /*
+* We use  in above __extent_read_full_page,
+* so we ensure that if it returns error, the
+* current page fails to add itself to bio and
+* it's been unlocked.
+*
+* We must dec io_pages by ourselves.
+*/
+   atomic_dec(>io_pages);
+   }
} else {
unlock_page(page);
}
-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: stat() on btrfs reports the st_blocks with delay (data loss in archivers)

2016-07-11 Thread Chris Mason



On 07/11/2016 11:16 AM, David Sterba wrote:

On Mon, Jul 11, 2016 at 11:00:55AM -0400, Chris Mason wrote:

So, the real bug is that we're letting some delalloc stat hang around
after the truncate, probably related to IO in progress.  We do already
account for delalloc in what we return to stat, but there's a corner
case involving truncate where we screw it up.


So the original testcase:


a) some "tool" creates sparse file
b) that tool does not sync explicitly and exits ..
c) tar is called immediately after that to archive the sparse file
d) tar considers [2] the file is completely sparse (because st_blocks is
   zero) and archives no data.  Here comes data loss.


will not happen. The application would basically have to mimick the
provided reproducer script and do the truncate/write loop and be lucky
enough to let tar hit the short race window.



Looking harder there is a race window that can trigger this without the 
truncate loop:


1) application calls write(), we make the pages delalloc (in-ram 
st_blocks goes up)

2) VM calls write_cache_pages, we go find a contiguous delalloc range
3) We call cow_file_range on the locked range of pages
4) cow_file_range clears the delalloc bits (in-ram st_blocks goes down)

< - race begins here ->

5) The io is started
6) The IO completes and extents are inserted into the metadata
7) the on disk/in-ram st_blocks goes up

<  race ends here >

This makes a ton more sense than leaking delalloc bits.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Unable to mount degraded RAID5

2016-07-11 Thread Tomáš Hrdina
sudo btrfs-debug-tree /dev/sdc
It has 20 lines. Don't know, what you use for bigger files.


sudo btrfs-debug-tree -b 6062434418688 /dev/sdc
http://sebsauvage.net/paste/?fc156dee1d1deb3b#YpG/TA0H3I313jMuC4pgsdj++TcuDaFwWIBeuuOXfCA=
 
sudo btrfs-debug-tree -b 6062497202176 /dev/sdc
http://sebsauvage.net/paste/?86621abec9c239bd#kwTpZ7BZLcLw71yCfr3jHKZT08zsaXK3RgdFo7MFoFc=


sudo btrfs-debug-tree -b 6062470332416 /dev/sdc
http://sebsauvage.net/paste/?4ff40fa0b6b201c9#nFk7pT9MLj2w9egUJlgXdkmCkWyp1vSG0kADfq3J7eA=

It got some results, but I don't know, what to look for.


sudo btrfs-map-logical -l 7008899547136 /dev/sdc
parent transid verify failed on 7008807157760 wanted 70175 found 70133
parent transid verify failed on 7008807157760 wanted 70175 found 70133
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
bytenr mismatch, want=7008807157760, have=65536
mirror 1 logical 7008899547136 physical 735226609664 device /dev/sdb
mirror 2 logical 7008899547136 physical 3166748524544 device /dev/sdc


Also I don't know, what to do with this. How to compute new csum.

For me, it would be ok to give up and just start fresh.

Thank you
Tomas


 *From:* Chris Murphy
 *Sent:*  Sunday, July 10, 2016 10:08PM
 *To:* Tomáš Hrdina
*Cc:* Chris Murphy, Btrfs Btrfs
 *Subject:* Re: Unable to mount degraded RAID5

btrfs-debug-tree -b


---
Tato zpráva byla zkontrolována na viry programem Avast Antivirus.
https://www.avast.com/antivirus

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] Btrfs: fix unexpected balance crash due to BUG_ON

2016-07-11 Thread Liu Bo
Mounting a btrfs can resume previous balance operations asynchronously.
An user got a crash when one drive has some corrupt sectors.

Since balance can cancel itself in case of any error, we can gracefully
return errors to upper layers and let balance do the cancel job.

Reported-by: sash 
Signed-off-by: Liu Bo 
---
v2: - Initialize path with NULL.
- Show more information when we bail out.

 fs/btrfs/volumes.c | 30 ++
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 589f128..348a183 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3421,7 +3421,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
u64 size_to_free;
u64 chunk_type;
struct btrfs_chunk *chunk;
-   struct btrfs_path *path;
+   struct btrfs_path *path = NULL;
struct btrfs_key key;
struct btrfs_key found_key;
struct btrfs_trans_handle *trans;
@@ -3455,13 +3455,35 @@ static int __btrfs_balance(struct btrfs_fs_info 
*fs_info)
ret = btrfs_shrink_device(device, old_size - size_to_free);
if (ret == -ENOSPC)
break;
-   BUG_ON(ret);
+   if (ret) {
+   /* btrfs_shrink_device never returns ret > 0 */
+   WARN_ON(ret > 0);
+   goto error;
+   }
 
trans = btrfs_start_transaction(dev_root, 0);
-   BUG_ON(IS_ERR(trans));
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   btrfs_info(fs_info,
+"%s:%d fails on btrfs_start_transaction() right after 
shrinking devivce %s (original size is %llu new size is %llu",
+  __func__, __LINE__,
+  rcu_str_deref(device->name), old_size,
+  old_size - size_to_free);
+   goto error;
+   }
 
ret = btrfs_grow_device(trans, device, old_size);
-   BUG_ON(ret);
+   if (ret) {
+   btrfs_end_transaction(trans, dev_root);
+   /* btrfs_grow_device never returns ret > 0 */
+   WARN_ON(ret > 0);
+   btrfs_info(fs_info,
+"%s:%d fails on btrfs_grow_device() right after shrinking 
devivce %s (original size is %llu new size is %llu",
+  __func__, __LINE__,
+  rcu_str_deref(device->name), old_size,
+  old_size - size_to_free);
+   goto error;
+   }
 
btrfs_end_transaction(trans, dev_root);
}
-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: add option to run balance as daemon

2016-07-11 Thread Tomasz Torcz
On Mon, Jul 11, 2016 at 07:17:28AM -0400, Austin S. Hemmelgarn wrote:
> On 2016-07-11 03:26, Tomasz Torcz wrote:
> > On Tue, Jun 21, 2016 at 11:16:59AM -0400, Austin S. Hemmelgarn wrote:
> > > Currently, balance operations are run synchronously in the foreground.
> > > This is nice for interactive management, but is kind of crappy when you
> > > start looking at automation and similar things.
> > 
> >   It can be done with simplest systemd unit file:
> > btrfs-balance@.service:
> > ---
> > [Unit]
> > Description=btrfs balance for %I
> > 
> > [Service]
> > ExecStart=/usr/bin/btrfs balance start %I
> > ExecStop=/usr/bin/btrfs balance cancel %I
> > ---
> > 
> >   It automates quite nicely and needs no additional code.
> > 
> It's also entirely dependent on a couple of things:
> 1. You're running systemd (not everyone is, I'm certainly not).

  So instead of using widespread, tested code, you re-implement 
parts of it.  BTW, your patch for daemonizing does only 5 steps
out of 15 described in man 7 daemon.

> 2. You're only dealing with the local system.
> 
> The type of situation I'm thinking of is dealing with non-local systems.
> For example, running something like this:
> ssh user@remotehost btrfs balance start --background /
> Keeping the SSH connection open for the duration of the balance has issues
> for some people (may close without keep-alive set, uses network bandwidth
> with keep-alive set, many people who are hosted have bandwidth quotas
> still), and it's extremely useful to have the option to fire and forget.

  I don't get the local part.  Right now, when using above unit you can

ssh user@remotehost systemctl start btrfs-balance@-

(or even
systemctl -H user@remotehost start btrfs-balance@-)

 and balance for / runs in background on target host. With clean
environment, logs being captured, locking against multiple
startups and so on. Right now, without any additional code.

-- 
Tomasz   .. oo o.   oo o. .o   .o o. o. oo o.   ..
Torcz.. .o .o   .o .o oo   oo .o .. .. oo   oo
o.o.o.   .o .. o.   o. o. o.   o. o. oo .. ..   o.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v12.1 01/15] btrfs: expand cow_file_range() to support in-band dedup and subpage-blocksize

2016-07-11 Thread David Sterba
On Mon, Jul 11, 2016 at 11:05:29AM +0800, Qu Wenruo wrote:
> From: Wang Xiaoguang 
> 
> Extract cow_file_range() new parameters for both in-band dedupe and
> subpage sector size patchset.
> 
> This should make conflict of both patchset to minimal, and reduce the
> effort needed to rebase them.

Great, thanks. I did a test merge, there are still conflicts but they
seem to be resolvable more easily, picking the changes from the right
patchset. I haven't tested it so there's more work, but the point was to
get the conflict surface down.

There's another candidate, btrfs_set_extent_delalloc as it adds a
parameter, can you please send a similar patch for that?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] btrfs: fix free space calculation in dump_space_info()

2016-07-11 Thread Duncan
Wang Xiaoguang posted on Mon, 11 Jul 2016 19:30:04 +0800 as excerpted:

> In btrfs, btrfs_space_info's bytes_may_use is treated as fs used space,
> as what we do in reserve_metadata_bytes() or
> btrfs_alloc_data_chunk_ondemand(), so in dump_space_info(), when
> calculating free space, we should also minus btrfs_space_info's
> bytes_may_use.
> 
> Signed-off-by: Wang Xiaoguang 
> ---
>  fs/btrfs/extent-tree.c | 4 ++--

I'm not a dev so won't evaluate the patch itself.  However, on kernel 
related mailing lists it's standard practice to include a short, often 
one-line per revision, revision changelog.  It helps reviewers keep track 
of what changed since the last time they looked at the patch.  See pretty 
much any on-list v2+ patch as an example.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: stat() on btrfs reports the st_blocks with delay (data loss in archivers)

2016-07-11 Thread David Sterba
On Mon, Jul 11, 2016 at 11:00:55AM -0400, Chris Mason wrote:
> So, the real bug is that we're letting some delalloc stat hang around 
> after the truncate, probably related to IO in progress.  We do already 
> account for delalloc in what we return to stat, but there's a corner 
> case involving truncate where we screw it up.

So the original testcase:

> >> a) some "tool" creates sparse file
> >> b) that tool does not sync explicitly and exits ..
> >> c) tar is called immediately after that to archive the sparse file
> >> d) tar considers [2] the file is completely sparse (because st_blocks 
> >> is
> >>zero) and archives no data.  Here comes data loss.

will not happen. The application would basically have to mimick the
provided reproducer script and do the truncate/write loop and be lucky
enough to let tar hit the short race window.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: stat() on btrfs reports the st_blocks with delay (data loss in archivers)

2016-07-11 Thread Chris Mason



On 07/11/2016 10:41 AM, David Sterba wrote:

On Sat, Jul 02, 2016 at 09:18:07AM +0200, Pavel Raiskup wrote:

There are optimizations in archivers (tar, rsync, ...) that rely on up2date
st_blocks info.  For example, in GNU tar there is optimization check [1]
whether the 'st_size' reports more data than the 'st_blocks' can hold --> then
tar considers that file is sparse (and does additional steps).

It looks like btrfs doesn't show correct value in 'st_blocks' until the data
are synced.  ATM, there happens that:

a) some "tool" creates sparse file
b) that tool does not sync explicitly and exits ..
c) tar is called immediately after that to archive the sparse file
d) tar considers [2] the file is completely sparse (because st_blocks is
   zero) and archives no data.  Here comes data loss.

Because we fixed 'btrfs' to report non-zero 'st_blocks' when the file data is
small and is in-lined (no real data blocks) -- I consider this is too bug in
btrfs worth fixing.

[1] 
http://git.savannah.gnu.org/cgit/paxutils.git/tree/lib/system.h?id=ec72abd9dd63bbff4534ec77e97b1a6cadfc3cf8#n392
[2] 
http://git.savannah.gnu.org/cgit/tar.git/tree/src/sparse.c?id=ac065c57fdc1788a2769fb119ed0c8146e1b9dd6#n273

Tested on kernel:
kernel-4.5.7-300.fc24.x86_64

Originally reported here, reproducer available there:
https://bugzilla.redhat.com/show_bug.cgi?id=1352061


The reproducer works for me here. So far I found:

* the btrfs implementation of stat.st_blocks (btrfs_getattr) includes
  the 'delayed allocated' bytes, so there is not a problem in principle
  
(https://urldefense.proofpoint.com/v2/url?u=http-3A__lxr.free-2Delectrons.com_source_fs_btrfs_inode.c-23L9372=CwIBAg=5VD0RTtNlTh3ycd41b3MUw=9QPtTAxcitoznaWRKKHoEQ=Y07_PApD0zOC-eaM4Hq-oxAwNlktnY8bDo7LD2GbXRo=Is_ZqFiL7a4sVWAB1k1ZuAgbNMK-sZ1gcU7oLtyfKoY=
 )

* calling fsync on the sparsefile will produce the expected result

* a short delay between ./binary and 'stat' will also produce correct
  result, 0.5 seconds worked for me -- so it IMO proves it's a race
  between writing and reporting the data

* I'm not yet sure where the delay between write and synced
  'inode->delalloc_bytes' comes from

* I think that st_blocks accounting can be wrong anyway, if the file is
  mmap-ed and not msync-ed, I'm writing a reproducer for this case


On my test box running current linux git, things work fine if I run the 
reproducer once.  But if I leave it running in a loop long enough for 
writeback to kick in, I trigger it.


The reproducer has a loop in there where it is adding delalloc writes 
and truncating them away.  What should be happening is that we're 
leaving some delalloc bits set past EOF, which makes us skip bumping 
inode->delalloc_bytes during the new write.


I can kind of confirm this by changing the reproducer to stat directly 
after the write call.  Normally st_blocks is never zero.  But if I leave 
it running in a loop for 30 seconds or so, I eventually get st_block 
zero directly after the write().


If I change the C program to unlink the file on exit, running the binary 
over and over again works every time.


So, the real bug is that we're letting some delalloc stat hang around 
after the truncate, probably related to IO in progress.  We do already 
account for delalloc in what we return to stat, but there's a corner 
case involving truncate where we screw it up.


-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: stat() on btrfs reports the st_blocks with delay (data loss in archivers)

2016-07-11 Thread David Sterba
On Sat, Jul 02, 2016 at 09:18:07AM +0200, Pavel Raiskup wrote:
> There are optimizations in archivers (tar, rsync, ...) that rely on up2date
> st_blocks info.  For example, in GNU tar there is optimization check [1]
> whether the 'st_size' reports more data than the 'st_blocks' can hold --> then
> tar considers that file is sparse (and does additional steps).
> 
> It looks like btrfs doesn't show correct value in 'st_blocks' until the data
> are synced.  ATM, there happens that:
> 
> a) some "tool" creates sparse file
> b) that tool does not sync explicitly and exits ..
> c) tar is called immediately after that to archive the sparse file
> d) tar considers [2] the file is completely sparse (because st_blocks is
>zero) and archives no data.  Here comes data loss.
> 
> Because we fixed 'btrfs' to report non-zero 'st_blocks' when the file data is
> small and is in-lined (no real data blocks) -- I consider this is too bug in
> btrfs worth fixing.
> 
> [1] 
> http://git.savannah.gnu.org/cgit/paxutils.git/tree/lib/system.h?id=ec72abd9dd63bbff4534ec77e97b1a6cadfc3cf8#n392
> [2] 
> http://git.savannah.gnu.org/cgit/tar.git/tree/src/sparse.c?id=ac065c57fdc1788a2769fb119ed0c8146e1b9dd6#n273
> 
> Tested on kernel:
> kernel-4.5.7-300.fc24.x86_64
> 
> Originally reported here, reproducer available there:
> https://bugzilla.redhat.com/show_bug.cgi?id=1352061

The reproducer works for me here. So far I found:

* the btrfs implementation of stat.st_blocks (btrfs_getattr) includes
  the 'delayed allocated' bytes, so there is not a problem in principle
  (http://lxr.free-electrons.com/source/fs/btrfs/inode.c#L9372)

* calling fsync on the sparsefile will produce the expected result

* a short delay between ./binary and 'stat' will also produce correct
  result, 0.5 seconds worked for me -- so it IMO proves it's a race
  between writing and reporting the data

* I'm not yet sure where the delay between write and synced
  'inode->delalloc_bytes' comes from

* I think that st_blocks accounting can be wrong anyway, if the file is
  mmap-ed and not msync-ed, I'm writing a reproducer for this case
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs won't mount to /home

2016-07-11 Thread Tomasz Chmielewski

On 2016-07-11 22:56, Roman Mamedov wrote:

On Mon, 11 Jul 2016 22:45:13 +0900
Tomasz Chmielewski  wrote:


So, weird, isn't it?

What's wrong there?


Your systemd unmounts it immediately from /home, search the archives 
there's

been a funny story like that recently.


Yes, could be similar to this one: https://bugs.archlinux.org/task/44658

In my case, the entry did exist in /etc/fstab, but was different than at 
the time of booting the system.




Tomasz Chmielewski
https://wpkg.org
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] receive: strip root subvol path during process_clone

2016-07-11 Thread Benedikt Morbach
Hi David,

On Fri, Jun 17, 2016 at 7:18 PM, David Sterba  wrote:
> thanks for the fix. The mail thread holds enough information to
> reproduce the issue, I'll get to that on Monday. I'm not sure about the
> use of strstr to match the subvolume name so I want to write the tests
> first and craft some paths to see how it works.

any update on this? I've been running this patch for a few weeks now
with my hourly backups.
I could also take a stab at writing a test case but I think I'd need
some pointers for that. ;-)

I took a look at the send/receive patches that Filipe sent earlier and
afaict this isn't fixed in those.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs won't mount to /home

2016-07-11 Thread Roman Mamedov
On Mon, 11 Jul 2016 22:45:13 +0900
Tomasz Chmielewski  wrote:
 
> So, weird, isn't it?
>
> What's wrong there?

Your systemd unmounts it immediately from /home, search the archives there's
been a funny story like that recently.

-- 
With respect,
Roman


pgpUH09kQqplk.pgp
Description: OpenPGP digital signature


btrfs won't mount to /home

2016-07-11 Thread Tomasz Chmielewski

This is kind of strange (kernel 4.6.3, Ubuntu 16.04):

# mount -t btrfs /dev/sda4 /home

# echo $?
0

# dmesg -c
[382148.588847] BTRFS info (device sda4): disk space caching is enabled
[382148.588851] BTRFS: has skinny extents

So it worked?

# ls /home



# df | grep home

# mount | grep /home

# lsof -n | grep /home


All give no output.


So, weird, isn't it?

Now, let's try to mount to /home2:

# mkdir /home2

# mount /dev/sda4 /home2

# mount | grep home
/dev/sda4 on /home2 type btrfs 
(rw,relatime,space_cache,subvolid=5,subvol=/)


# dmesg -c
[382190.199363] BTRFS info (device sda4): disk space caching is enabled
[382190.199370] BTRFS: has skinny extents



What's wrong there?


Tomasz Chmielewski
https://wpkg.org
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate()

2016-07-11 Thread Wang Xiaoguang

hello,

Please ignore this patch, though this patch is correct to me, and pass
the fstests test. I have prepared a new common patch to fix this false
ENOSPC bug. Currently I'm doing fstests test, and will sent them
tomorrow, thanks.

Regards,
Xiaoguang Wang

On 07/06/2016 06:37 PM, Wang Xiaoguang wrote:

Below test scripts can reproduce this false ENOSPC:
#!/bin/bash
dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
dev=$(losetup --show -f fs.img)
mkfs.btrfs -f -M $dev
mkdir /tmp/mntpoint
mount /dev/loop0 /tmp/mntpoint
cd mntpoint
xfs_io -f -c "falloc 0 $((40*1024*1024))" testfile

Above fallocate(2) operation will fail for ENOSPC reason, but indeed
fs still has free space to satisfy this request. The reason is
btrfs_fallocate() dose not decrease btrfs_space_info's bytes_may_use
just in time, and it calls btrfs_free_reserved_data_space_noquota() in
the end of btrfs_fallocate(), which is too late and have already added
false unnecessary pressure to enospc system. See call graph:
btrfs_fallocate()
|-> btrfs_alloc_data_chunk_ondemand()
 It will add btrfs_space_info's bytes_may_use accordingly.
|-> btrfs_prealloc_file_range()
 It will call btrfs_reserve_extent(), but note that alloc type is
 RESERVE_ALLOC_NO_ACCOUNT, so btrfs_update_reserved_bytes() will
 only increase btrfs_space_info's bytes_reserved accordingly, but
 will not decrease btrfs_space_info's bytes_may_use, then obviously
 we have overestimated real needed disk space, and it'll impact
 other processes who do write(2) or fallocate(2) operations, also
 can impact metadata reservation in mixed mode, and bytes_max_use
 will only be decreased in the end of btrfs_fallocate(). To fix
 this false ENOSPC, we need to decrease btrfs_space_info's
 bytes_may_use in btrfs_prealloc_file_range() in time, as what we
 do in cow_file_range(),
 See call graph in :
 cow_file_range()
 |-> extent_clear_unlock_delalloc()
 |-> clear_extent_bit()
 |-> btrfs_clear_bit_hook()
 |-> btrfs_free_reserved_data_space_noquota()
 This function will decrease bytes_may_use accordingly.

So this patch choose to call btrfs_free_reserved_data_space() in
__btrfs_prealloc_file_range() for both successful and failed path.

Also this patch removes some old and useless comments.

Signed-off-by: Wang Xiaoguang 
---
  fs/btrfs/extent-tree.c |  1 -
  fs/btrfs/file.c| 23 ---
  fs/btrfs/inode-map.c   |  3 +--
  fs/btrfs/inode.c   | 12 
  fs/btrfs/relocation.c  | 10 +-
  5 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 82b912a..b0c86d2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3490,7 +3490,6 @@ again:
dcs = BTRFS_DC_SETUP;
else if (ret == -ENOSPC)
set_bit(BTRFS_TRANS_CACHE_ENOSPC, >transaction->flags);
-   btrfs_free_reserved_data_space(inode, 0, num_pages);
  
  out_put:

iput(inode);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 2234e88..f872113 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2669,6 +2669,7 @@ static long btrfs_fallocate(struct file *file, int mode,
  
  	alloc_start = round_down(offset, blocksize);

alloc_end = round_up(offset + len, blocksize);
+   cur_offset = alloc_start;
  
  	/* Make sure we aren't being give some crap mode */

if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
@@ -2761,7 +2762,6 @@ static long btrfs_fallocate(struct file *file, int mode,
  
  	/* First, check if we exceed the qgroup limit */

INIT_LIST_HEAD(_list);
-   cur_offset = alloc_start;
while (1) {
em = btrfs_get_extent(inode, NULL, 0, cur_offset,
  alloc_end - cur_offset, 0);
@@ -2788,6 +2788,14 @@ static long btrfs_fallocate(struct file *file, int mode,
last_byte - cur_offset);
if (ret < 0)
break;
+   } else {
+   /*
+* Do not need to reserve unwritten extent for this
+* range, free reserved data space first, otherwise
+* it'll result false ENOSPC error.
+*/
+   btrfs_free_reserved_data_space(inode, cur_offset,
+   last_byte - cur_offset);
}
free_extent_map(em);
cur_offset = last_byte;
@@ -2839,18 +2847,11 @@ out_unlock:
unlock_extent_cached(_I(inode)->io_tree, alloc_start, locked_end,
 _state, GFP_KERNEL);
  out:
-   /*
-* As we waited the extent range, the data_rsv_map must be empty
-* in the range, as 

[PATCH v2] btrfs: fix free space calculation in dump_space_info()

2016-07-11 Thread Wang Xiaoguang
In btrfs, btrfs_space_info's bytes_may_use is treated as fs used
space, as what we do in reserve_metadata_bytes() or
btrfs_alloc_data_chunk_ondemand(), so in dump_space_info(), when
calculating free space, we should also minus btrfs_space_info's
bytes_may_use.

Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/extent-tree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 81310ff..f1cc8b8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7694,8 +7694,8 @@ static void dump_space_info(struct btrfs_space_info 
*info, u64 bytes,
printk(KERN_INFO "BTRFS: space_info %llu has %llu free, is %sfull\n",
   info->flags,
   info->total_bytes - info->bytes_used - info->bytes_pinned -
-  info->bytes_reserved - info->bytes_readonly,
-  (info->full) ? "" : "not ");
+  info->bytes_reserved - info->bytes_readonly -
+  info->bytes_may_use, (info->full) ? "" : "not ");
printk(KERN_INFO "BTRFS: space_info total=%llu, used=%llu, pinned=%llu, 
"
   "reserved=%llu, may_use=%llu, readonly=%llu\n",
   info->total_bytes, info->bytes_used, info->bytes_pinned,
-- 
2.9.0



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] btrfs: fix free space calculation in dump_space_info()

2016-07-11 Thread Wang Xiaoguang

hello,

On 07/08/2016 10:28 PM, David Sterba wrote:

On Wed, Jul 06, 2016 at 06:16:06PM +0800, Wang Xiaoguang wrote:

hello,

On 07/05/2016 01:10 AM, David Sterba wrote:

On Wed, Jun 29, 2016 at 01:12:16PM +0800, Wang Xiaoguang wrote:

Can you please describe in more detail what is this patch fixing?

In original dump_space_info(), free space info is calculated by
info->total_bytes - info->bytes_used - info->bytes_pinned -
info->bytes_reserved - info->bytes_readonly,
but I think free space info should also minus info->bytes_may_use :)

Not really what I expected. The change is correct but the changelog
should say something "the 'used space' formula is missing the
bytes_may_used, that is used elsewhere eg. __reserve_metadata_bytes or
space_info_add_old_bytes". That way I have something to verify during
the review.

Sorry, later I'll try to make my patches more cleaner, thanks.

Regards,
Xiaoguang Wang








--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RESEND PATCH] btrfs: Handle uninitialised inode eviction

2016-07-11 Thread David Sterba
On Mon, Jul 11, 2016 at 01:57:10PM +0300, Nikolay Borisov wrote:
> 
> 
> On 07/11/2016 11:48 AM, David Sterba wrote:
> > On Mon, Jul 11, 2016 at 09:43:09AM +0300, Nikolay Borisov wrote:
> >> The code flow in btrfs_new_inode allows for btrfs_evict_inode to be
> >> called with not fully initialised inode (e.g. ->root member not
> >> being set). This can happen when btrfs_set_inode_index in
> >> btrfs_new_inode fails, which in turn would call iput for the newly
> >> allocated inode. This in turn leads to vfs calling into btrfs_evict_inode.
> >> This leads to null pointer dereference. To handle this situation check 
> >> whether
> >> the passed inode has root set and just free it in case it doesn't.
> >>
> >> Signed-off-by: Nikolay Borisov 
> >> Reviewed-by: Josef Bacik 
> >> ---
> >>  fs/btrfs/inode.c | 9 -
> >>  1 file changed, 8 insertions(+), 1 deletion(-)
> >>
> >> Hello, 
> >>
> >> I belive this is fixes the issue reported in 
> >> http://thread.gmane.org/gmane.comp.file-systems.btrfs/57809
> > 
> > There's some time left before 4.7 release, so I'll send another pull
> > request, including this patch.
> 
> Now that I think about it, shouldn't this also be queued for stable as well?

Yes. Marking patches with stable tags has been very inconsistent so we
send patches to stable separately.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] btrfs: test that send does not issue invalid rmdir operations

2016-07-11 Thread fdmanana
From: Filipe Manana 

Test that an incremental send operation does not prematurely issues rmdir
operations under a particular scenario (the rmdir operation is sent before
the target directory is empty).

This issue is fixed by the following patch for the linux kernel:

  "Btrfs: incremental send, fix premature rmdir operations"

Signed-off-by: Filipe Manana 
---
 tests/btrfs/126 | 123 
 tests/btrfs/126.out |   2 +
 tests/btrfs/group   |   1 +
 3 files changed, 126 insertions(+)
 create mode 100755 tests/btrfs/126
 create mode 100644 tests/btrfs/126.out

diff --git a/tests/btrfs/126 b/tests/btrfs/126
new file mode 100755
index 000..7d33549
--- /dev/null
+++ b/tests/btrfs/126
@@ -0,0 +1,123 @@
+#! /bin/bash
+# FS QA Test No. btrfs/126
+#
+# Test that an incremental send operation does not prematurely issues rmdir
+# operations under a particular scenario (the rmdir operation is sent before
+# the target directory is empty).
+#
+#---
+# Copyright (C) 2016 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana 
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   rm -fr $send_files_dir
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_fssum
+
+send_files_dir=$TEST_DIR/btrfs-test-$seq
+
+rm -f $seqres.full
+rm -fr $send_files_dir
+mkdir $send_files_dir
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+mkdir $SCRATCH_MNT/a
+mkdir $SCRATCH_MNT/tmp
+mkdir $SCRATCH_MNT/del
+mv $SCRATCH_MNT/tmp $SCRATCH_MNT/del
+mkdir $SCRATCH_MNT/a/c
+mkdir $SCRATCH_MNT/del/x
+
+# Filesystem looks like:
+#
+# . (ino 256)
+# |--- a/   (ino 257)
+# ||--- c/  (ino 260)
+# |
+# |--- del/ (ino 259)
+#   |--- tmp/   (ino 258)
+#   |--- x/ (ino 261)
+#
+_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
+
+# When inode 260 was processed, rename operations for it and for inode 258 were
+# issued (the rename for inode 260 must happen before the rename for inode 
258).
+# Then immediately after issuing the rename operation for inode 258, and before
+# inode 261 was processed, the send stream issued a rmdir operation for inode
+# 260, which would make the receiver fail with the error ENOTEMPTY because 
inode
+# 261 was not yet renamed, it was still a child of inode 260 at that time.
+#
+mv $SCRATCH_MNT/a/c $SCRATCH_MNT
+mv $SCRATCH_MNT/del/x $SCRATCH_MNT/a
+mv $SCRATCH_MNT/del/tmp $SCRATCH_MNT/c
+rmdir $SCRATCH_MNT/del
+
+# Filesystem now looks like:
+#
+# . (ino 256)
+# |--- a/   (ino 257)
+# ||--- x/  (ino 261)
+# |
+# |--- c/   (ino 260)
+#  |--- tmp/(ino 258)
+#
+_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
+
+run_check $FSSUM_PROG -A -f -w $send_files_dir/1.fssum $SCRATCH_MNT/mysnap1
+run_check $FSSUM_PROG -A -f -w $send_files_dir/2.fssum \
+   -x $SCRATCH_MNT/mysnap2/mysnap1 $SCRATCH_MNT/mysnap2
+
+_run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $send_files_dir/1.snap
+_run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
+   -f $send_files_dir/2.snap
+
+# Now recreate the filesystem by receiving both send streams and verify we get
+# the same content that the original filesystem had.
+_scratch_unmount
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+

[PATCH 2/3] btrfs: add test for incremental send after removing a directory

2016-07-11 Thread fdmanana
From: Filipe Manana 

Test that, under a particular scenario, an incremental send operation does
not leak memory (which used to emit a warning in dmesg/syslog).

This is a regression test for a btrfs kernel fix that has the title:
"Btrfs: send, fix warning due to late freeing of orphan_dir_info
structures".

Signed-off-by: Filipe Manana 
---
 tests/btrfs/125 | 127 
 tests/btrfs/125.out |   2 +
 tests/btrfs/group   |   1 +
 3 files changed, 130 insertions(+)
 create mode 100755 tests/btrfs/125
 create mode 100644 tests/btrfs/125.out

diff --git a/tests/btrfs/125 b/tests/btrfs/125
new file mode 100755
index 000..2ee39c7
--- /dev/null
+++ b/tests/btrfs/125
@@ -0,0 +1,127 @@
+#! /bin/bash
+# FS QA Test No. btrfs/125
+#
+# Test that, under a particular scenario, an incremental send operation does
+# not leak memory (which used to emit a warning in dmesg/syslog).
+#
+#---
+# Copyright (C) 2016 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana 
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   rm -fr $send_files_dir
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_fssum
+
+send_files_dir=$TEST_DIR/btrfs-test-$seq
+
+rm -f $seqres.full
+rm -fr $send_files_dir
+mkdir $send_files_dir
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+mkdir $SCRATCH_MNT/a
+mkdir $SCRATCH_MNT/tmp
+mkdir $SCRATCH_MNT/del
+mv $SCRATCH_MNT/tmp $SCRATCH_MNT/del
+mkdir $SCRATCH_MNT/a/c
+mkdir $SCRATCH_MNT/del/x
+mkdir $SCRATCH_MNT/del/y
+
+# Filesystem looks like:
+#
+# . (ino 256)
+# |--- a/   (ino 257)
+# ||--- c/  (ino 260)
+# |
+# |--- del/ (ino 259)
+#   |--- tmp/   (ino 258)
+#   |--- x/ (ino 261)
+#   |--- y/ (ino 262)
+#
+_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
+
+# It used to be that when attempting to issue an rmdir operation for inode 259,
+# the kernel allocated an orphan_dir_info structure so that later after doing
+# the delayed rename operation for inode 258 (which happened once inode 260 was
+# renamed) it would check if it could finally issue a rmdir instruction for
+# inode 259. If it couldn't, it would not release the previously allocated
+# orphan_dir_info structure immediately. Instead it would only release it once
+# it finished the send stream and it would emit a warning in dmesg/syslog.
+#
+mv $SCRATCH_MNT/a/c $SCRATCH_MNT
+mv $SCRATCH_MNT/del/x $SCRATCH_MNT/a
+mv $SCRATCH_MNT/del/y $SCRATCH_MNT/a
+mv $SCRATCH_MNT/del/tmp $SCRATCH_MNT/c
+rmdir $SCRATCH_MNT/del
+
+# Filesystem now looks like:
+#
+# . (ino 256)
+# |--- a/   (ino 257)
+# ||--- x/  (ino 261)
+# ||--- y/  (ino 262)
+# |
+# |--- c/   (ino 260)
+#  |--- tmp/(ino 258)
+#
+_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
+
+run_check $FSSUM_PROG -A -f -w $send_files_dir/1.fssum $SCRATCH_MNT/mysnap1
+run_check $FSSUM_PROG -A -f -w $send_files_dir/2.fssum \
+   -x $SCRATCH_MNT/mysnap2/mysnap1 $SCRATCH_MNT/mysnap2
+
+_run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $send_files_dir/1.snap
+_run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
+   -f $send_files_dir/2.snap
+

Re: [PATCH] btrfs-progs: add option to run balance as daemon

2016-07-11 Thread Austin S. Hemmelgarn

On 2016-07-11 03:26, Tomasz Torcz wrote:

On Tue, Jun 21, 2016 at 11:16:59AM -0400, Austin S. Hemmelgarn wrote:

Currently, balance operations are run synchronously in the foreground.
This is nice for interactive management, but is kind of crappy when you
start looking at automation and similar things.


  It can be done with simplest systemd unit file:
btrfs-balance@.service:
---
[Unit]
Description=btrfs balance for %I

[Service]
ExecStart=/usr/bin/btrfs balance start %I
ExecStop=/usr/bin/btrfs balance cancel %I
---

  It automates quite nicely and needs no additional code.


It's also entirely dependent on a couple of things:
1. You're running systemd (not everyone is, I'm certainly not).
2. You're only dealing with the local system.

The type of situation I'm thinking of is dealing with non-local systems. 
 For example, running something like this:

ssh user@remotehost btrfs balance start --background /
Keeping the SSH connection open for the duration of the balance has 
issues for some people (may close without keep-alive set, uses network 
bandwidth with keep-alive set, many people who are hosted have bandwidth 
quotas still), and it's extremely useful to have the option to fire and 
forget.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] btrfs: add test for an incremental send after moving directories around

2016-07-11 Thread fdmanana
From: Filipe Manana 

Test that an incremental send operation works after doing radical changes
in the directory hierarchy that involve switching the inode that directory
entries point to.

This test exercises scenarios used to fail in btrfs and are fixed by the
following patches for the linux kernel:

 "Btrfs: send, fix failure to move directories with the same name around"
 "Btrfs: incremental send, fix invalid paths for rename operations"

Signed-off-by: Filipe Manana 
---
 tests/btrfs/124 | 261 
 tests/btrfs/124.out |   2 +
 tests/btrfs/group   |   1 +
 3 files changed, 264 insertions(+)
 create mode 100755 tests/btrfs/124
 create mode 100644 tests/btrfs/124.out

diff --git a/tests/btrfs/124 b/tests/btrfs/124
new file mode 100755
index 000..38635a3
--- /dev/null
+++ b/tests/btrfs/124
@@ -0,0 +1,261 @@
+#! /bin/bash
+# FS QA Test No. btrfs/124
+#
+# Test that an incremental send operation works after doing radical changes
+# in the directory hierarchy that involve switching the inode that directory
+# entries point to.
+#
+#---
+# Copyright (C) 2016 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana 
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   rm -fr $send_files_dir
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_fssum
+
+send_files_dir=$TEST_DIR/btrfs-test-$seq
+
+rm -f $seqres.full
+rm -fr $send_files_dir
+mkdir $send_files_dir
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+# case 1
+mkdir -p $SCRATCH_MNT/case_1/d/p1
+mkdir $SCRATCH_MNT/case_1/p1
+
+# case 2
+mkdir -p $SCRATCH_MNT/case_2/a
+mkdir $SCRATCH_MNT/case_2/d
+mkdir $SCRATCH_MNT/case_2/e
+mkdir $SCRATCH_MNT/case_2/f
+mkdir $SCRATCH_MNT/case_2/ance
+mkdir $SCRATCH_MNT/case_2/d/ance
+mkdir $SCRATCH_MNT/case_2/a/c
+mv $SCRATCH_MNT/case_2/e $SCRATCH_MNT/case_2/d/ance
+mv $SCRATCH_MNT/case_2/f $SCRATCH_MNT/case_2/d/ance
+mv $SCRATCH_MNT/case_2/ance $SCRATCH_MNT/case_2/d/ance
+
+# case 3
+mkdir -p $SCRATCH_MNT/case_3/d
+mkdir $SCRATCH_MNT/case_3/a
+mkdir $SCRATCH_MNT/case_3/waiting_dir
+mkdir -p $SCRATCH_MNT/case_3/pre/ance
+mkdir $SCRATCH_MNT/case_3/d/ance
+mkdir $SCRATCH_MNT/case_3/a/c
+mv $SCRATCH_MNT/case_3/waiting_dir $SCRATCH_MNT/case_3/d/ance
+
+# case 4
+mkdir -p $SCRATCH_MNT/case_4/tmp
+mkdir $SCRATCH_MNT/case_4/below_ance
+mkdir -p $SCRATCH_MNT/case_4/pre/wait_dir
+mkdir $SCRATCH_MNT/case_4/desc
+mkdir $SCRATCH_MNT/case_4/ance
+mv $SCRATCH_MNT/case_4/below_ance $SCRATCH_MNT/case_4/ance
+mkdir $SCRATCH_MNT/case_4/other_dir
+
+# Filesystem looks like:
+#
+# .  (ino 256)
+# |--- case_1/   (ino 257)
+# |   | d/   (ino 258)
+# |   | |--- p1/ (ino 259)
+# |   |
+# |   | p1/  (ino 260)
+# |
+# |--- case_2/   (ino 261)
+# |   | a/   (ino 262)
+# |   | | c/ (ino 268)
+# |   |
+# |   | d/   (ino 263)
+# | | ance/  (ino 267)
+# | | e/ (ino 264)
+# | | f/ (ino 265)
+# | | ance/  (ino 266)
+# |
+# |--- case_3/   (ino 269)
+# |   | a/   (ino 271)
+# |   | | c/

[PATCH 5/7] Btrfs: send, fix warning due to late freeing of orphan_dir_info structures

2016-07-11 Thread fdmanana
From: Robbie Ko 

Under certain situations, when doing an incremental send, we can end up
not freeing orphan_dir_info structures as soon as they are no longer
needed. Instead we end up freeing them only after finishing the send
stream, which causes a warning to be emitted:

[282735.229200] [ cut here ]
[282735.229968] WARNING: CPU: 9 PID: 10588 at fs/btrfs/send.c:6298 
btrfs_ioctl_send+0xe2f/0xe51 [btrfs]
[282735.231282] Modules linked in: btrfs crc32c_generic xor raid6_pq 
acpi_cpufreq tpm_tis ppdev tpm parport_pc psmouse parport sg pcspkr i2c_piix4 
i2c_core evdev processor serio_raw button loop autofs4 ext4 crc16 jbd2 mbcache 
sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci 
virtio_ring virtio e1000 scsi_mod floppy [last unloaded: btrfs]
[282735.237130] CPU: 9 PID: 10588 Comm: btrfs Tainted: GW   
4.6.0-rc7-btrfs-next-31+ #1
[282735.239309] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by 
qemu-project.org 04/01/2014
[282735.240160]   880224273ca8 8126b42c 

[282735.240160]   880224273ce8 81052b14 
189a24273ac8
[282735.240160]  8802210c9800  0001 

[282735.240160] Call Trace:
[282735.240160]  [] dump_stack+0x67/0x90
[282735.240160]  [] __warn+0xc2/0xdd
[282735.240160]  [] warn_slowpath_null+0x1d/0x1f
[282735.240160]  [] btrfs_ioctl_send+0xe2f/0xe51 [btrfs]
[282735.240160]  [] btrfs_ioctl+0x14f/0x1f81 [btrfs]
[282735.240160]  [] ? arch_local_irq_save+0x9/0xc
[282735.240160]  [] vfs_ioctl+0x18/0x34
[282735.240160]  [] do_vfs_ioctl+0x550/0x5be
[282735.240160]  [] ? __fget+0x6b/0x77
[282735.240160]  [] ? __fget_light+0x62/0x71
[282735.240160]  [] SyS_ioctl+0x57/0x79
[282735.240160]  [] entry_SYSCALL_64_fastpath+0x18/0xa8
[282735.240160]  [] ? time_hardirqs_off+0x9/0x14
[282735.240160]  [] ? trace_hardirqs_off_caller+0x1f/0xaa
[282735.256343] ---[ end trace a4539270c8056f93 ]---

Consider the following example:

  Parent snapshot:

  . (ino 256)
  |--- a/   (ino 257)
  ||--- c/  (ino 260)
  |
  |--- del/ (ino 259)
|--- tmp/   (ino 258)
|--- x/ (ino 261)
|--- y/ (ino 262)

  Send snapshot:

  . (ino 256)
  |--- a/   (ino 257)
  ||--- x/  (ino 261)
  ||--- y/  (ino 262)
  |
  |--- c/   (ino 260)
   |--- tmp/(ino 258)

1) When processing inode 258, we end up delaying its rename operation
   because it has an ancestor (in the send snapshot) that has a higher
   inode number (inode 260) which was also renamed in the send snapshot,
   therefore we delay the rename of inode 258 so that it happens after
   inode 260 is renamed;

2) When processing inode 259, we end up delaying its deletion (rmdir
   operation) because it has a child inode (258) that has its rename
   operation delayed. At this point we allocate an orphan_dir_info
   structure and tag inode 258 so that we later attempt to see if we
   can delete (rmdir) inode 259 once inode 258 is renamed;

3) When we process inode 260, after renaming it we finally do the rename
   operation for inode 258. Once we issue the rename operation for inode
   258 we notice that this inode was tagged so that we attempt to see
   if at this point we can delete (rmdir) inode 259. But at this point
   we can not still delete inode 259 because it has 2 children, inodes
   261 and 262, that were not yet processed and therefore not yet
   moved (renamed) away from inode 259. We end up not freeing the
   orphan_dir_info structure allocated in step 2;

4) We process inodes 261 and 262, and once we move/rename inode 262
   we issue the rmdir operation for inode 260;

5) We finish the send stream and notice that red black tree that
   contains orphan_dir_info structures is not empty, so we emit
   a warning and then free any orphan_dir_structures left.

So fix this by freeing an orphan_dir_info structure once we try to
apply a pending rename operation if we can not delete yet the tagged
directory.

A test case for fstests follows soon.

Signed-off-by: Robbie Ko 
Signed-off-by: Filipe Manana 
[Modified changelog to be more detailed and easier to understand]

Signed-off-by: Filipe Manana 
---
 fs/btrfs/send.c | 4 
 1 

[PATCH 4/7] Btrfs: incremental send, fix premature rmdir operations

2016-07-11 Thread fdmanana
From: Robbie Ko 

Under certain situations, an incremental send operation can contain
a rmdir operation that will make the receiving end fail when attempting
to execute it, because the target directory is not yet empty.

Consider the following example:

  Parent snapshot:

  . (ino 256)
  |--- a/   (ino 257)
  ||--- c/  (ino 260)
  |
  |--- del/ (ino 259)
|--- tmp/   (ino 258)
|--- x/ (ino 261)

  Send snapshot:

  . (ino 256)
  |--- a/   (ino 257)
  ||--- x/  (ino 261)
  |
  |--- c/   (ino 260)
   |--- tmp/(ino 258)

1) When processing inode 258, we delay its rename operation because inode
   260 is its new parent in the send snapshot and it was not yet renamed
   (since 260 > 258, that is, beyond the current progress);

2) When processing inode 259, we realize we can not yet send an rmdir
   operation (against inode 259) because inode 258 was still not yet
   renamed/moved away from inode 259. Therefore we update data structures
   so that after inode 258 is renamed, we try again to see if we can
   finally send an rmdir operation for inode 259;

3) When we process inode 260, we send a rename operation for it followed
   by a rename operation for inode 258. Once we send the rename operation
   for inode 258 we then check if we can finally issue an rmdir for its
   previous parent, inode 259, by calling the can_rmdir() function with
   a value of sctx->cur_ino + 1 (260 + 1 = 261) for its "progress"
   argument. This makes can_rmdir() return true (value 1) because even
   though there's still a child inode of inode 259 that was not yet
   renamed/moved, which is inode 261, the given value of progress (261)
   is not lower then 261 (that is, not lower than the inode number of
   some child of inode 259). So we end up sending a rmdir operation for
   inode 259 before its child inode 261 is processed and renamed.

So fix this by passing the correct progress value to the call to
can_rmdir() from within apply_dir_move() (where we issue delayed rename
operations), which should match stcx->cur_ino (the number of the inode
currently being processed) and not sctx->cur_ino + 1.

A test case for fstests follows soon.

Signed-off-by: Robbie Ko 
Signed-off-by: Filipe Manana 
[Rewrote change log to be more detailed, clear and well formatted]

Signed-off-by: Filipe Manana 
---
 fs/btrfs/send.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 3d6a4dd..4115aba 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -3236,7 +3236,7 @@ static int apply_dir_move(struct send_ctx *sctx, struct 
pending_dir_move *pm)
/* already deleted */
goto finish;
}
-   ret = can_rmdir(sctx, rmdir_ino, odi->gen, sctx->cur_ino + 1);
+   ret = can_rmdir(sctx, rmdir_ino, odi->gen, sctx->cur_ino);
if (ret < 0)
goto out;
if (!ret)
-- 
2.7.0.rc3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/7] Btrfs: send, fix invalid leaf accesses due to incorrect utimes operations

2016-07-11 Thread fdmanana
From: Robbie Ko 

During an incremental send, if we have delayed rename operations for inodes
that were children of directories which were removed in the send snapshot,
we can end up accessing incorrect items in a leaf or accessing beyond the
last item of the leaf due to issuing utimes operations for the removed
inodes. Consider the following example:

  Parent snapshot:
  . (ino 256)
  |--- a/   (ino 257)
  ||--- c/  (ino 262)
  |
  |--- b/   (ino 258)
  ||--- d/  (ino 263)
  |
  |--- del/ (ino 261)
|--- x/ (ino 259)
|--- y/ (ino 260)

  Send snapshot:

  . (ino 256)
  |--- a/   (ino 257)
  |
  |--- b/   (ino 258)
  |
  |--- c/   (ino 262)
  ||--- y/  (ino 260)
  |
  |--- d/   (ino 263)
   |--- x/  (ino 259)

1) When processing inodes 259 and 260, we end up delaying their rename
   operations because their parents, inodes 263 and 262 respectively, were
   not yet processed and therefore not yet renamed;

2) When processing inode 262, its rename operation is issued and right
   after the rename operation for inode 260 is issued. However right after
   issuing the rename operation for inode 260, at send.c:apply_dir_move(),
   we issue utimes operations for all current and past parents of inode
   260. This means we try to send a utimes operation for its old parent,
   inode 261 (deleted in the send snapshot), which does not cause any
   immediate and deterministic failure, because when the target inode is
   not found in the send snapshot, the send.c:send_utimes() function
   ignores it and uses the leaf region pointed to by path->slots[0],
   which can be any unrelated item (belonging to other inode) or it can
   be a region outside the leaf boundaries, if the leaf is full and
   path->slots[0] matches the number of items in the leaf. So we end
   up either successfully sending a utimes operation, which is fine
   and irrelevant because the old parent (inode 261) will end up being
   deleted later, or we end up doing an invalid memory access tha
   crashes the kernel.

So fix this by making apply_dir_move() issue utimes operations only for
parents that still exist in the send snapshot. In a separate patch we
will make send_utimes() return an error (-ENOENT) if the given inode
does not exists in the send snapshot.

Signed-off-by: Robbie Ko 
Signed-off-by: Filipe Manana 
[Rewrote change log to be more detailed and better organized]

Signed-off-by: Filipe Manana 
---
 fs/btrfs/send.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index e552c33..8b65396 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -3270,8 +3270,18 @@ finish:
 * and old parent(s).
 */
list_for_each_entry(cur, >update_refs, list) {
-   if (cur->dir == rmdir_ino)
+   /*
+* The parent inode might have been deleted in the send snapshot
+*/
+   ret = get_inode_info(sctx->send_root, cur->dir, NULL,
+NULL, NULL, NULL, NULL, NULL);
+   if (ret == -ENOENT) {
+   ret = 0;
continue;
+   }
+   if (ret < 0)
+   goto out;
+
ret = send_utimes(sctx, cur->dir, cur->dir_gen);
if (ret < 0)
goto out;
-- 
2.7.0.rc3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 7/7] Btrfs: send, avoid incorrect leaf accesses when sending utimes operations

2016-07-11 Thread fdmanana
From: Filipe Manana 

The caller of send_utimes() is supposed to be sure that the inode number
it passes to this function does actually exists in the send snapshot.
However due to logic/algorithm bugs (such as the one fixed by the patch
titled "Btrfs: send, fix invalid leaf accesses due to incorrect utimes
operations"), this might not be the case and when that happens it makes
send_utimes() access use an unrelated leaf item as the target inode item
or access beyond a leaf's boundaries (when the leaf is full and
path->slots[0] matches the number of items in the leaf).

So if the call to btrfs_search_slot() done by send_utimes() does not find
the inode item, just make sure send_utimes() returns -ENOENT and does not
silently accesses unrelated leaf items or does invalid leaf accesses, also
allowing us to easialy and deterministically catch such algorithmic/logic
bugs.

Signed-off-by: Filipe Manana 
---
 fs/btrfs/send.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 8b65396..2db8dc8 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -2502,6 +2502,8 @@ verbose_printk("btrfs: send_utimes %llu\n", ino);
key.type = BTRFS_INODE_ITEM_KEY;
key.offset = 0;
ret = btrfs_search_slot(NULL, sctx->send_root, , path, 0, 0);
+   if (ret > 0)
+   ret = -ENOENT;
if (ret < 0)
goto out;
 
-- 
2.7.0.rc3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/7] Btrfs: send, add missing error check for calls to path_loop()

2016-07-11 Thread fdmanana
From: Filipe Manana 

The function path_loop() can return a negative integer, signaling an
error, 0 if there's no path loop and 1 if there's a path loop. We were
treating any non zero values as meaning that a path loop exists. Fix
this by explicitly checking for errors and gracefully return them to
user space.

Signed-off-by: Filipe Manana 
---
 fs/btrfs/send.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 993e1ba..0dc05bb 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -3200,6 +3200,8 @@ static int apply_dir_move(struct send_ctx *sctx, struct 
pending_dir_move *pm)
 
sctx->send_progress = sctx->cur_ino + 1;
ret = path_loop(sctx, name, pm->ino, pm->gen, );
+   if (ret < 0)
+   goto out;
if (ret) {
LIST_HEAD(deleted_refs);
ASSERT(ancestor > BTRFS_FIRST_FREE_OBJECTID);
-- 
2.7.0.rc3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/7] Btrfs: incremental send, fix invalid paths for rename operations

2016-07-11 Thread fdmanana
From: Filipe Manana 

Example scenario:

  Parent snapshot:

  .   (ino 277)
  | tmp/  (ino 278)
  | pre/  (ino 280)
  |  | wait_dir/  (ino 281)
  |
  | desc/ (ino 282)
  | ance/ (ino 283)
  |   | below_ance/   (ino 279)
  |
  | other_dir/(ino 284)

  Send snapshot:

  .   (ino 277)
  | tmp/  (ino 278)
 | other_dir/ (ino 284)
   | below_ance/  (ino 279)
   || pre/(ino 280)
   |
   | wait_dir/(ino 281)
  | desc/ (ino 282)
  | ance/ (ino 283)

While computing the send stream the following steps happen:

1) While processing inode 279 we end up delaying its rename operation
   because its new parent in the send snapshot, inode 284, was not
   yet processed and therefore not yet renamed;

2) Later when processing inode 280 we end up renaming it immediately to
   "ance/below_once/pre" and not delay its rename operation because its
   new parent (inode 279 in the send snapshot) has its rename operation
   delayed and inode 280 is not an encestor of inode 279 (its parent in
   the send snapshot) in the parent snapshot;

3) When processing inode 281 we end up delaying its rename operation
   because its new parent in the send snapshot, inode 284, was not yet
   processed and therefore not yet renamed;

4) When processing inode 282 we do not delay its rename operation because
   its parent in the send snapshot, inode 281, already has its own rename
   operation delayed and our current inode (282) is not an ancestor of
   inode 281 in the parent snapshot. Therefore inode 282 is renamed to
   "ance/below_ance/pre/wait_dir";

5) When processing inode 283 we realize that we can rename it because one
   of its ancestors in the send snapshot, inode 281, has its rename
   operation delayed and inode 283 is not an ancestor of inode 281 in the
   parent snapshot. So a rename operation to rename inode 283 to
   "ance/below_ance/pre/wait_dir/desc/ance" is issued. This path is
   invalid due to a missing path building loop that was undetected by
   the incremental send implementation, as inode 283 ends up getting
   included twice in the path (once with its path in the parent snapshot).
   Therefore its rename operation must wait before the ancestor inode 284
   is renamed.

Fix this by not terminating the rename dependency checks when we find an
ancestor, in the send snapshot, that has its rename operation delayed. So
that we continue doing the same checks if the current inode is not an
ancestor, in the parent snapshot, of an ancestor in the send snapshot we
are processing in the loop.

The problem and reproducer were reported by Robbie Ko, as part of a patch
titled "Btrfs: incremental send, avoid ancestor rename to descendant".
However the fix was unnecessarily complicated and can be addressed with
much less code and effort.

Reported-by: Robbie Ko 
Signed-off-by: Filipe Manana 
---
 fs/btrfs/send.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 0dc05bb..3d6a4dd 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -3534,7 +3534,8 @@ static int wait_for_parent_move(struct send_ctx *sctx,
ret = is_ancestor(sctx->parent_root,
  sctx->cur_ino, sctx->cur_inode_gen,
  ino, path_before);
-   break;
+   if (ret)
+   break;
}
 
fs_path_reset(path_before);
-- 
2.7.0.rc3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/7] Btrfs: send, fix failure to move directories with the same name around

2016-07-11 Thread fdmanana
From: Robbie Ko 

When doing an incremental send we can end up not moving directories that
have the same name. This happens when the same parent directory has
different child directories with the same name in the parent and send
snapshots.

For example, consider the following scenario:

  Parent snapshot:

  .   (ino 256)
  | d/(ino 257)
  | |--- p1/  (ino 258)
  |
  | p1/   (ino 259)

  Send snapshot:

  .(ino 256)
  |--- d/  (ino 257)
   |--- p1/(ino 259)
 |--- p1/  (ino 258)

The directory named "d" (inode 257) has in both snapshots an entry with
the name "p1" but it refers to different inodes in both snapshots (inode
258 in the parent snapshot and inode 259 in the send snapshot). When
attempting to move inode 258, the operation is delayed because its new
parent, inode 259, was not yet moved/renamed (as the stream is currently
processing inode 258). Then when processing inode 259, we also end up
delaying its move/rename operation so that it happens after inode 258 is
moved/renamed. This decision to delay the move/rename rename operation
of inode 259 is due to the fact that the new parent inode (257) still
has inode 258 as its child, which has the same name has inode 259. So
we end up with inode 258 move/rename operation waiting for inode's 259
move/rename operation, which in turn it waiting for inode's 258
move/rename. This results in ending the send stream without issuing
move/rename operations for inodes 258 and 259 and generating the
following warnings in syslog/dmesg:

[148402.979747] [ cut here ]
[148402.980588] WARNING: CPU: 14 PID: 4117 at fs/btrfs/send.c:6177 
btrfs_ioctl_send+0xe03/0xe51 [btrfs]
[148402.981928] Modules linked in: btrfs crc32c_generic xor raid6_pq 
acpi_cpufreq tpm_tis ppdev tpm parport_pc psmouse parport sg pcspkr i2c_piix4 
i2c_core evdev processor serio_raw button loop autofs4 ext4 crc16 jbd2 mbcache 
sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci 
virtio_ring virtio e1000 scsi_mod floppy [last unloaded: btrfs]
[148402.986999] CPU: 14 PID: 4117 Comm: btrfs Tainted: GW   
4.6.0-rc7-btrfs-next-31+ #1
[148402.988136] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by 
qemu-project.org 04/01/2014
[148402.988136]   88022139fca8 8126b42c 

[148402.988136]   88022139fce8 81052b14 
18212139fac8
[148402.988136]  88022b0db400  0001 

[148402.988136] Call Trace:
[148402.988136]  [] dump_stack+0x67/0x90
[148402.988136]  [] __warn+0xc2/0xdd
[148402.988136]  [] warn_slowpath_null+0x1d/0x1f
[148402.988136]  [] btrfs_ioctl_send+0xe03/0xe51 [btrfs]
[148402.988136]  [] btrfs_ioctl+0x14f/0x1f81 [btrfs]
[148402.988136]  [] ? arch_local_irq_save+0x9/0xc
[148402.988136]  [] ? __lock_is_held+0x3c/0x57
[148402.988136]  [] vfs_ioctl+0x18/0x34
[148402.988136]  [] do_vfs_ioctl+0x550/0x5be
[148402.988136]  [] ? __fget+0x6b/0x77
[148402.988136]  [] ? __fget_light+0x62/0x71
[148402.988136]  [] SyS_ioctl+0x57/0x79
[148402.988136]  [] entry_SYSCALL_64_fastpath+0x18/0xa8
[148402.988136]  [] ? trace_hardirqs_off_caller+0x3f/0xaa
[148403.011373] ---[ end trace a4539270c8056f8b ]---
[148403.012296] [ cut here ]
[148403.013071] WARNING: CPU: 14 PID: 4117 at fs/btrfs/send.c:6194 
btrfs_ioctl_send+0xe19/0xe51 [btrfs]
[148403.014447] Modules linked in: btrfs crc32c_generic xor raid6_pq 
acpi_cpufreq tpm_tis ppdev tpm parport_pc psmouse parport sg pcspkr i2c_piix4 
i2c_core evdev processor serio_raw button loop autofs4 ext4 crc16 jbd2 mbcache 
sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci 
virtio_ring virtio e1000 scsi_mod floppy [last unloaded: btrfs]
[148403.019708] CPU: 14 PID: 4117 Comm: btrfs Tainted: GW   
4.6.0-rc7-btrfs-next-31+ #1
[148403.020104] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by 
qemu-project.org 04/01/2014
[148403.020104]   88022139fca8 8126b42c 

[148403.020104]   88022139fce8 81052b14 
18322139fac8
[148403.020104]  88022b0db400  0001 

[148403.020104] Call Trace:
[148403.020104]  [] dump_stack+0x67/0x90
[148403.020104]  [] __warn+0xc2/0xdd
[148403.020104]  [] warn_slowpath_null+0x1d/0x1f
[148403.020104]  [] btrfs_ioctl_send+0xe19/0xe51 [btrfs]
[148403.020104]  [] btrfs_ioctl+0x14f/0x1f81 [btrfs]
[148403.020104]  [] ? arch_local_irq_save+0x9/0xc
[148403.020104]  [] ? __lock_is_held+0x3c/0x57
[148403.020104]  [] vfs_ioctl+0x18/0x34
[148403.020104]  [] do_vfs_ioctl+0x550/0x5be
[148403.020104]  [] ? __fget+0x6b/0x77
[148403.020104]  [] ? __fget_light+0x62/0x71
[148403.020104]  [] SyS_ioctl+0x57/0x79
[148403.020104]  [] entry_SYSCALL_64_fastpath+0x18/0xa8
[148403.020104]  [] ? 

Re: [RESEND PATCH] btrfs: Handle uninitialised inode eviction

2016-07-11 Thread Nikolay Borisov


On 07/11/2016 11:48 AM, David Sterba wrote:
> On Mon, Jul 11, 2016 at 09:43:09AM +0300, Nikolay Borisov wrote:
>> The code flow in btrfs_new_inode allows for btrfs_evict_inode to be
>> called with not fully initialised inode (e.g. ->root member not
>> being set). This can happen when btrfs_set_inode_index in
>> btrfs_new_inode fails, which in turn would call iput for the newly
>> allocated inode. This in turn leads to vfs calling into btrfs_evict_inode.
>> This leads to null pointer dereference. To handle this situation check 
>> whether
>> the passed inode has root set and just free it in case it doesn't.
>>
>> Signed-off-by: Nikolay Borisov 
>> Reviewed-by: Josef Bacik 
>> ---
>>  fs/btrfs/inode.c | 9 -
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> Hello, 
>>
>> I belive this is fixes the issue reported in 
>> http://thread.gmane.org/gmane.comp.file-systems.btrfs/57809
> 
> There's some time left before 4.7 release, so I'll send another pull
> request, including this patch.

Now that I think about it, shouldn't this also be queued for stable as well?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RESEND PATCH] btrfs: Fix slab accounting flags

2016-07-11 Thread David Sterba
On Mon, Jul 11, 2016 at 12:12:21AM +0300, Nikolay Borisov wrote:
> BTRFS is using a variety of slab caches to satisfy internal needs.
> Those slab caches are always allocated with the SLAB_RECLAIM_ACCOUNT,
> meaning allocations from the caches are going to be accounted as
> SReclaimable. At the same time btrfs is not registering any shrinkers
> whatsoever, thus preventing memory from the slabs to be shrunk. This
> means those caches are not in fact reclaimable.
> 
> To fix this remove the SLAB_RECLAIM_ACCOUNT on all caches apart from the
> inode cache, since this one is being freed by the generic VFS super_block
> shrinker. Also set the transaction related caches as SLAB_TEMPORARY,
> to better document the lifetime of the objects (it just translates
> to SLAB_RECLAIM_ACCOUNT).
> 
> Signed-off-by: Nikolay Borisov 
> Reviewed-by: David Sterba 

FYI, patch is queued for 4.8.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: A lot warnings in dmesg while running thunderbird

2016-07-11 Thread Chris Mason

On Mon, Jul 11, 2016 at 11:28:01AM +0530, Chandan Rajendra wrote:

On Friday, July 08, 2016 12:02:35 PM Chris Mason wrote:


Can you please run the attached test program:

gcc -o short-write short-write.c -lpthread
./short-write some-new-file-on-btrfs

I want to see if you're triggering the same problem we've tried to fix,
or something else.



Hi Chris,

I am able to reproduce the issue with the 'short-write' program. But before
the call trace associated with btrfs_destroy_inode(), I see the following call
trace ...

[ cut here ]
WARNING: CPU: 2 PID: 2311 at 
/home/chandan/repos/linux/fs/btrfs/extent-tree.c:4303 
btrfs_free_reserved_data_space_noquota+0xe8/0x100
Modules linked in:


[ ... ]



I will continue to debug and find out the root cause.


Thanks Chandan, I'm able to reproduce the same thing more easily by changing
short-write to have:

#define ROUNDS 4096

and by getting rid of the sync_file_range call.

Still nailing down where the accounting is going wrong, so any help is
appreciated.

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: can't use btrfs on USB-stick (write errors)

2016-07-11 Thread David Sterba
On Tue, Jun 21, 2016 at 11:54:57PM +0900, Tomasz Chmielewski wrote:
> I've tried to use btrfs on USB-stick, but unfortunately it fails with 
> write errors.
> 
> The below is for kernel 4.4; I've tried with 4.6.2, and it fails in a 
> similar way.
> 
> 
> I'm not sure how to reliably reproduce it, but it seems to me it has 
> something to do with:
> 
> - plenty of random writes
> 
> - USB stick sometimes "very slow to reply" (may be USB-stick dependent)
> 
> 
> The closest thing to reproduce pretty much within an hour was launching 
> ktorrent and downloading a couple of Linux isos.
> 
> ext4 does not fail under similar load. USB stick is brand new, writing 
> data with dd is successful, reading data with dd is successful (whether 
> it's ext4, btrfs or raw partition); it only seems to fail with btrfs and 
> plenty of random writes.

I think it's the write pattern that does not go well with the usb stick.
>From the logs provided, it's an EIO during transaction commit, that's
usually fatal so the filesystem switches to read-only.

The dd write pattern (and consecutive writes) do not stress the flash
translation layer or it's internal block allocation and reclaim. As you
say it's not easy to reproduce, I'm more inclined to blame the usb
stick.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: dedupe-inband enable/reconfigure: force option does not take argument

2016-07-11 Thread David Sterba
On Fri, Jul 08, 2016 at 03:11:42PM +0900, Satoru Takeuchi wrote:
> ---
> This patch can be applied to integration-20160704(2355a7e5dcdf122d1924)

I will not apply this separately, this should be folded to the proper
patchset.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: du: fix to skip not btrfs dir/file

2016-07-11 Thread David Sterba
On Wed, Jul 06, 2016 at 05:42:33PM +0200, Holger Hoffstätte wrote:
> >At the absolute minimum, I think that these messages should go to
> > stderr (like du does when it deosn't have permissions), and should go
> > away with -q. They're still irritating, but at least you can get rid
> > of them easily.
> 
> If anything this should require a --verbose, not the other way
> around. Maybe instead of breaking the output just indicate the
> special status via "-- --" values, or default to 0.00?

The '--' replacement values sound good to me (0.00 would be confusing).

> Still, we're explicitly only interested in btrfs stuff and not
> anything else, so printing non-information can only yield noise.

I think the extra lines would happen mostly in scenario where
directories are mountpoints and thus have a potentially different
filesystem. If we print just a single line for a mountpoint (and not
descend to it), I don't think it will create that much noise in the
output.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs module does not load on sparc64

2016-07-11 Thread David Sterba
On Thu, Jul 07, 2016 at 05:29:35PM +0300, Anatoly Pugachev wrote:
> Compiled linux kernel (git version 4.7.0-rc6+) using my own kernel
> config file, enabling :
> 
> CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y
> CONFIG_BTRFS_DEBUG=y
> CONFIG_BTRFS_ASSERT=y
> 
> and now I can't load btrfs module:
> 
> # modprobe btrfs
> modprobe: ERROR: could not insert 'btrfs': Invalid argument
> 
> 
> and in logs (and on console):
> 
> [1897399.942697] Btrfs loaded, crc32c=crc32c-generic, debug=on, assert=on
> [1897400.024645] BTRFS: selftest: sectorsize: 8192  nodesize: 8192
> [1897400.098089] BTRFS: selftest: Running btrfs free space cache tests
> [1897400.175863] BTRFS: selftest: Running extent only tests
> [1897400.241871] BTRFS: selftest: Running bitmap only tests
> [1897400.307877] BTRFS: selftest: Running bitmap and extent tests
> [1897400.380329] BTRFS: selftest: Running space stealing from bitmap to extent
> [1897400.470517] BTRFS: selftest: Free space cache tests finished
> [1897400.542875] BTRFS: selftest: Running extent buffer operation tests
> [1897400.621710] BTRFS: selftest: Running btrfs_split_item tests
> [1897400.692929] BTRFS: selftest: Running extent I/O tests
> [1897400.757459] BTRFS: selftest: Running find delalloc tests
> [1897401.082670] BTRFS: selftest: Running extent buffer bitmap tests
> [1897401.161223] BTRFS: selftest: Setting straddling pages failed

The sanity tests fail at some point with EINVAL, so the module will fail
to load in turn. Looking at the test itself (__test_eb_bitmaps), it does
no make any assumptions about page size etc. so this "should work".
Taking powerpc with 64k page size for another reference where the tests
work, I'm not sure what exactly could be wrong here.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RESEND PATCH] btrfs: Handle uninitialised inode eviction

2016-07-11 Thread David Sterba
On Mon, Jul 11, 2016 at 09:43:09AM +0300, Nikolay Borisov wrote:
> The code flow in btrfs_new_inode allows for btrfs_evict_inode to be
> called with not fully initialised inode (e.g. ->root member not
> being set). This can happen when btrfs_set_inode_index in
> btrfs_new_inode fails, which in turn would call iput for the newly
> allocated inode. This in turn leads to vfs calling into btrfs_evict_inode.
> This leads to null pointer dereference. To handle this situation check whether
> the passed inode has root set and just free it in case it doesn't.
> 
> Signed-off-by: Nikolay Borisov 
> Reviewed-by: Josef Bacik 
> ---
>  fs/btrfs/inode.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> Hello, 
> 
> I belive this is fixes the issue reported in 
> http://thread.gmane.org/gmane.comp.file-systems.btrfs/57809

There's some time left before 4.7 release, so I'll send another pull
request, including this patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: add option to run balance as daemon

2016-07-11 Thread Tomasz Torcz
On Tue, Jun 21, 2016 at 11:16:59AM -0400, Austin S. Hemmelgarn wrote:
> Currently, balance operations are run synchronously in the foreground.
> This is nice for interactive management, but is kind of crappy when you
> start looking at automation and similar things.
 
  It can be done with simplest systemd unit file:
btrfs-balance@.service:
---
[Unit]
Description=btrfs balance for %I

[Service]
ExecStart=/usr/bin/btrfs balance start %I
ExecStop=/usr/bin/btrfs balance cancel %I
---

  It automates quite nicely and needs no additional code.

-- 
Tomasz Torcz"Funeral in the morning, IDE hacking
xmpp: zdzich...@chrome.plin the afternoon and evening." - Alan Cox

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RESEND PATCH] btrfs: Handle uninitialised inode eviction

2016-07-11 Thread Nikolay Borisov
The code flow in btrfs_new_inode allows for btrfs_evict_inode to be
called with not fully initialised inode (e.g. ->root member not
being set). This can happen when btrfs_set_inode_index in
btrfs_new_inode fails, which in turn would call iput for the newly
allocated inode. This in turn leads to vfs calling into btrfs_evict_inode.
This leads to null pointer dereference. To handle this situation check whether
the passed inode has root set and just free it in case it doesn't.

Signed-off-by: Nikolay Borisov 
Reviewed-by: Josef Bacik 
---
 fs/btrfs/inode.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

Hello, 

I belive this is fixes the issue reported in 
http://thread.gmane.org/gmane.comp.file-systems.btrfs/57809

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4421954720b8..b51723811d01 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5159,11 +5159,18 @@ void btrfs_evict_inode(struct inode *inode)
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_block_rsv *rsv, *global_rsv;
int steal_from_global = 0;
-   u64 min_size = btrfs_calc_trunc_metadata_size(root, 1);
+   u64 min_size;
int ret;
 
trace_btrfs_inode_evict(inode);
 
+   if (!root) {
+   kmem_cache_free(btrfs_inode_cachep, BTRFS_I(inode));
+   return;
+   }
+
+   min_size = btrfs_calc_trunc_metadata_size(root, 1);
+
evict_inode_truncate_pages(inode);
 
if (inode->i_nlink &&
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html