Re: btrfs send extremely slow (almost stuck)

2016-08-30 Thread Qu Wenruo



At 08/31/2016 09:35 AM, Jeff Mahoney wrote:
> On 8/28/16 10:12 PM, Qu Wenruo wrote:
>>
>> At 08/29/2016 10:11 AM, Qu Wenruo wrote:
>>>
>>> At 08/28/2016 11:38 AM, Oliver Freyermuth wrote:
>>>> Dear btrfs experts,
>>>>
>>>> I just tried to make use of btrfs send / receive for incremental
>>>> backups (using btrbk to simplify the process).
>>>> It seems that on my two machines, btrfs send gets stuck after
>>>> transferring some GiB - it's not fully halted, but instead of making
>>>> full use of the available I/O, I get something < 500 kiB on average,
>>>> which are just some "full speed spikes" with many seconds / minutes of
>>>> no I/O in between.
>>>>
>>>> During this "halting", btrfs send eats one full CPU core.
>>>> A "perf top" shows this is spent in "find_parent_nodes" and
>>>> "__merge_refs" inside the kernel.
>>>> I am using btrfs-progs 4.7 and kernel 4.7.0.
>>>
>>> Unknown bug, while unfortunately no good idea to solve yet.
>>
>> Sorry, known bug, not unknown
>
> I'm working on a patch to replace the lists with a pair of trees that
> get merged after filling in the missing parents.


Wow, nice.
I was planning to do it but didn't get started yet.

The list is really causing the problem.
Converting to an rb_tree should at least reduce the O(n^3)~O(n^4)
complexity to O(n^2 log n).
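
To illustrate why the list hurts, here is a rough sketch. It is not the
kernel's __merge_refs code: the Ref tuple and both function names are made
up for this example, and a Python dict stands in for the rb_tree (average
O(1) lookups where a tree would give O(log n)).

```python
from collections import namedtuple

# Hypothetical stand-in for a backref entry; not the kernel's structure.
Ref = namedtuple("Ref", ["root", "objectid", "offset", "count"])


def merge_refs_list(refs):
    """Merge refs with equal keys by scanning a list: O(n^2) comparisons."""
    merged = []
    for ref in refs:
        for i, seen in enumerate(merged):
            if seen[:3] == ref[:3]:
                # Same (root, objectid, offset): fold the counts together.
                merged[i] = seen._replace(count=seen.count + ref.count)
                break
        else:
            merged.append(ref)
    return merged


def merge_refs_keyed(refs):
    """Merge refs with equal keys through a keyed lookup (dict as tree stand-in)."""
    merged = {}
    for ref in refs:
        key = ref[:3]
        if key in merged:
            old = merged[key]
            merged[key] = old._replace(count=old.count + ref.count)
        else:
            merged[key] = ref
    return list(merged.values())
```

With n refs, the list version may compare each new ref against every ref
already merged, while the keyed version does a single lookup per ref.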


Calling the backref walk inside the loop that iterates over every file
extent has never seemed like a good idea to me, so I'll still try to fix
this on the send side as an RFC patch too.


Thanks,
Qu


> The reflink xfstests don't complete, ever.  btrfs/130 triggers soft
> lockups but does complete eventually -- and that's only with ~4k list
> elements.
>
> -Jeff





--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs send extremely slow (almost stuck)

2016-08-30 Thread Jeff Mahoney
On 8/28/16 10:12 PM, Qu Wenruo wrote:
> 
> 
> At 08/29/2016 10:11 AM, Qu Wenruo wrote:
>>
>>
>> At 08/28/2016 11:38 AM, Oliver Freyermuth wrote:
>>> Dear btrfs experts,
>>>
>>> I just tried to make use of btrfs send / receive for incremental
>>> backups (using btrbk to simplify the process).
>>> It seems that on my two machines, btrfs send gets stuck after
>>> transferring some GiB - it's not fully halted, but instead of making
>>> full use of the available I/O, I get something < 500 kiB on average,
>>> which are just some "full speed spikes" with many seconds / minutes of
>>> no I/O in between.
>>>
>>> During this "halting", btrfs send eats one full CPU core.
>>> A "perf top" shows this is spent in "find_parent_nodes" and
>>> "__merge_refs" inside the kernel.
>>> I am using btrfs-progs 4.7 and kernel 4.7.0.
>>
>> Unknown bug, while unfortunately no good idea to solve yet.
> 
> Sorry, known bug, not unknown

I'm working on a patch to replace the lists with a pair of trees that
get merged after filling in the missing parents.

The reflink xfstests don't complete, ever.  btrfs/130 triggers soft
lockups but does complete eventually -- and that's only with ~4k list
elements.

-Jeff

-- 
Jeff Mahoney
SUSE Labs





[PATCH] btrfsprogs: only install udev rules for udev >= 190

2016-08-30 Thread Jeff Mahoney
Prior to udev v190, there was no btrfs builtin helper.  Installing it on
systems with an older udev will cause problems.

Signed-off-by: Jeff Mahoney 
---

 configure.ac |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 97e89f2..8fd8f42 100644
--- a/configure.ac
+++ b/configure.ac
@@ -161,7 +161,13 @@ PKG_STATIC(UUID_LIBS_STATIC, [uuid])
 PKG_CHECK_MODULES(ZLIB, [zlib])
 PKG_STATIC(ZLIB_LIBS_STATIC, [zlib])
 
-UDEVDIR="$(pkg-config udev --variable=udevdir)"
+# udev v190 introduced the btrfs builtin and a udev rule to use it.
+# Our udev rule gives us the friendly dm names but isn't required (or valid)
+# on earlier releases.
+UDEVDIR=
+if pkg-config udev --atleast-version 190; then
+   UDEVDIR="$(pkg-config udev --variable=udevdir)"
+fi
 AC_SUBST(UDEVDIR)
 
 dnl lzo library does not provide pkg-config, let use classic way

-- 
Jeff Mahoney
SUSE Labs


Re: [PATCH] btrfs-progs: fuzz-test: Add image for wrong chunk item in root tree

2016-08-30 Thread Qu Wenruo



At 08/30/2016 10:17 PM, David Sterba wrote:
> On Tue, Aug 30, 2016 at 10:15:50AM +0800, Qu Wenruo wrote:
>> From: Lukas Lueg 
>>
>> Reported by Lukas and the same image from him.
>>
>> DATA_RELOC tree's key type is modified to CHUNK_ITEM, causing btrfsck
>> to interpret it as CHUNK_ITEM and cause 0 num_stripes.
>>
>> Add the image to fuzz-test.
>>
>> Signed-off-by: Lukas Lueg 
>
> BTW I think you should put Reported-by here, that's the reporter's
> credit. The signed-off from you is for your contribution to the git
> repository (packing the image, documenting the origin etc). I've fixed
> that in the commit.



Thanks for the fix.

I'll keep this in mind.

Thanks,
Qu




Re: Recommendation on raid5 drive error resolution

2016-08-30 Thread Chris Murphy
On Tue, Aug 30, 2016 at 3:23 PM, Gareth Pye  wrote:
> On Wed, Aug 31, 2016 at 4:28 AM, Chris Murphy  wrote:
>> But I'd try a newer kernel before you
>> give up on it.
>
>
> Any recommendations on liveCDs that have recent kernels & btrfs tools?
> For no apparent reason system isn't booting normally either, and I'm
> reluctant to fix that before at least confirming the things I at least
> partially care about have a recent backup.

Fedora 25 Alpha released today with kernel 4.8rc2 and btrfs-progs 4.6.1.
https://getfedora.org/en/workstation/prerelease/

The top green "Download" button offers GNOME. If you want something
smaller, on the right hand side are netinstall images with the same
kernel and progs, but no GUI. You can choose the Troubleshooting menu,
and then the Rescue a Fedora System option. It boots, and then you're
at a text UI where you can just get to a shell, option 3.

The easiest way to create a USB stick is with dd and it'll boot
practically anything: BIOS, UEFI, even Macs. Not all wireless firmware
is included in these media, so if you have a wired connection it'll be
easier to get dmesg and the contents of btrfs check off. If you opt
for the larger image (GNOME), it's a bit easier to get the terminal
output into a file and either scp it to another computer or you can
also use fpaste  and it'll spit back a URL where it uploaded
the text.




-- 
Chris Murphy


Re: Recommendation on raid5 drive error resolution

2016-08-30 Thread Gareth Pye
Or I could just once again select the right boot device in the bios. I
think I want some new hardware :)

On Wed, Aug 31, 2016 at 7:23 AM, Gareth Pye  wrote:
> On Wed, Aug 31, 2016 at 4:28 AM, Chris Murphy  wrote:
>> But I'd try a newer kernel before you
>> give up on it.
>
>
> Any recommendations on liveCDs that have recent kernels & btrfs tools?
> For no apparent reason system isn't booting normally either, and I'm
> reluctant to fix that before at least confirming the things I at least
> partially care about have a recent backup.
>
> --
> Gareth Pye - blog.cerberos.id.au
> Level 2 MTG Judge, Melbourne, Australia



-- 
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia


Re: Recommendation on raid5 drive error resolution

2016-08-30 Thread Gareth Pye
On Wed, Aug 31, 2016 at 4:28 AM, Chris Murphy  wrote:
> But I'd try a newer kernel before you
> give up on it.


Any recommendations on liveCDs that have recent kernels & btrfs tools?
For no apparent reason system isn't booting normally either, and I'm
reluctant to fix that before at least confirming the things I at least
partially care about have a recent backup.

-- 
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia


Re: `btrfs dev del` fails with `No space left on device`

2016-08-30 Thread ojab //
On Tue, Aug 30, 2016 at 5:13 PM, Chris Murphy  wrote:
> On Tue, Aug 30, 2016 at 4:22 AM, ojab //  wrote:
>> On Mon, Aug 29, 2016 at 9:05 PM, Chris Murphy  
>> wrote:
>>> On Mon, Aug 29, 2016 at 10:04 AM, ojab //  wrote:
>>> What do you get for 'btrfs fi us '
>>
>> $ sudo btrfs fi us /mnt/xxx/
>> Overall:
>>     Device size:                   3.64TiB
>>     Device allocated:              1.82TiB
>>     Device unallocated:            1.82TiB
>>     Device missing:                  0.00B
>>     Used:                          1.81TiB
>>     Free (estimated):              1.83TiB      (min: 943.55GiB)
>>     Data ratio:                       1.00
>>     Metadata ratio:                   2.00
>>     Global reserve:              512.00MiB      (used: 0.00B)
>>
>> Data,RAID0: Size:1.81TiB, Used:1.80TiB
>>    /dev/sdb1     928.48GiB
>>    /dev/sdc1     928.48GiB
>>
>> Metadata,RAID1: Size:3.00GiB, Used:2.15GiB
>>    /dev/sdb1       3.00GiB
>>    /dev/sdc1       3.00GiB
>>
>> System,RAID1: Size:32.00MiB, Used:176.00KiB
>>    /dev/sdb1      32.00MiB
>>    /dev/sdc1      32.00MiB
>>
>> Unallocated:
>>    /dev/sdb1       1.01MiB
>>    /dev/sdc1       1.00MiB
>>    /dev/sdd1       1.82TiB
>
>
> The confusion is understandable because sdd1 is bigger than sdc1, so
> why can't everything on sdc1 be moved to sdd1? Well, dev add > dev del
> doesn't really do that, it's going to end up rewriting metadata to
> sdb1 also, and there isn't enough space. Yes, there's 800MiB of unused
> space in metadata chunks on sdb1 and sdc1, it should be enough (?) but
> clearly it wants more than this for whatever reason. You could argue
> it's a bug or some suboptimal behavior, but because this is a 99% full
> file system, I'm willing to bet it's a low priority bug. Because this
> is raid0 you really need to add two devices, not just one.
>
>> I don't quite understand what exactly btrfs is trying to do: I assume
>> that block groups should be relocated to the new/empty drive,
>
> There is a scant chance 'btrfs replace' will work better here. But
> still the real problem remains, even if you replace sdc1 with sdd1,
> sdb1 is still 99% full which in effect makes the file system 99% full
> because it can't do any more raid0 on sdb1, and it's not possible to do
> raid0 chunks on a single sdd1 device.
>
> If you can't add a 4th drive, you're going to have to convert to
> single profile. Keep all three drives attached, 'btrfs balance start
> -dconvert=single' and then once that's complete you should be able to
> remove /dev/sdc1, although this will take a while because first the
> conversion will use space on all three drives, and then the removal
> of sdc1 will have to copy chunks off before it can be removed.
>
>> but
>> during the delete `btrfs fi us` shows
>> Unallocated:
>> /dev/sdc1 16.00EiB
>
> Known bug, also happens when resizing and conversions.
>
>
>
>> so deleted partition is counted as maximum possible empty drive and
>> blocks are relocated to it instead of new/empty drive? (kernel-4.7.2 &
>> btrfs-progs-4.7.1 here)
>> Is there any way to see where and why block groups are relocated
>> during `delete`?
>
> The two reasons this isn't working are a.) it's 99% full already and
> b.) it's raid0, so merely adding one device isn't sufficient. It's
> probably too full even to do a 3 device balance to restripe raid0
> across 3 devices, which is still inefficient because it would leave
> 50% of the space on sdd as unusable. To do this with uneven devices
> and use all the space, you're going to have to use single profile.
>
>
>
> --
> Chris Murphy

Ah, thanks for the elaboration, it makes things much more meaningful now!

//wbr ojab


Re: [PATCH 3/3] ioctl_xfs_ioc_getfsmap.2: document XFS_IOC_GETFSMAP ioctl

2016-08-30 Thread Darrick J. Wong
[add a few more relevant lists to cc]

On Mon, Aug 29, 2016 at 03:34:11PM -0600, Andreas Dilger wrote:
> On Aug 25, 2016, at 5:26 PM, Darrick J. Wong  wrote:
> > 
> > Document the new XFS_IOC_GETFSMAP ioctl that returns the physical
> > layout of a (disk-based) filesystem.
> > 
> > Signed-off-by: Darrick J. Wong 
> > ---
> > man2/ioctl_xfs_ioc_getfsmap.2 |  294 +
> > 1 file changed, 294 insertions(+)
> > create mode 100644 man2/ioctl_xfs_ioc_getfsmap.2
> > 
> > 
> > diff --git a/man2/ioctl_xfs_ioc_getfsmap.2 b/man2/ioctl_xfs_ioc_getfsmap.2
> > new file mode 100644
> > index 0000000..0d9ed47
> > --- /dev/null
> > +++ b/man2/ioctl_xfs_ioc_getfsmap.2
> > @@ -0,0 +1,294 @@
> > +.\" Copyright (c) 2016, Oracle.  All rights reserved.
> > +.\"
> > +.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
> > +.\" This is free documentation; you can redistribute it and/or
> > +.\" modify it under the terms of the GNU General Public License as
> > +.\" published by the Free Software Foundation; either version 2 of
> > +.\" the License, or (at your option) any later version.
> > +.\"
> > +.\" The GNU General Public License's references to "object code"
> > +.\" and "executables" are to be interpreted as the output of any
> > +.\" document formatting or typesetting system, including
> > +.\" intermediate and printed output.
> > +.\"
> > +.\" This manual is distributed in the hope that it will be useful,
> > +.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +.\" GNU General Public License for more details.
> > +.\"
> > +.\" You should have received a copy of the GNU General Public
> > +.\" License along with this manual; if not, see
> > +.\" .
> > +.\" %%%LICENSE_END
> > +.TH IOCTL-XFS_IOC_GETFSMAP 2 2016-07-20 "Linux" "Linux Programmer's Manual"
> > +.SH NAME
> > +ioctl_xfs_ioc_getfsmap \- retrieve the physical layout of the filesystem
> > +.SH SYNOPSIS
> > +.br
> > +.B #include 
> > +.br
> > +.B #include 
> > +.sp
> > +.BI "int ioctl(int " fd ", XFS_IOC_GETFSMAP, struct getfsmap * " arg );
> > +.SH DESCRIPTION
> > +This
> > +.BR ioctl (2)
> > +retrieves physical extent mappings for a filesystem.
> > +This information can be used to discover which files are mapped to a physical
> > +block, examine free space, or find known bad blocks, among other things.
> > +
> > +The sole argument to this ioctl should be an array of the following
> > +structure:
> > +.in +4n
> > +.nf
> > +
> > +struct getfsmap {
> > +   __u32   fmv_device;     /* device id */
> > +   __u32   fmv_unused1;    /* future use, must be zero */
> > +   __u64   fmv_block;      /* starting block */
> > +   __u64   fmv_owner;      /* owner id */
> > +   __u64   fmv_offset;     /* file offset of segment */
> > +   __u64   fmv_length;     /* length of segment, blocks */
> > +   __u32   fmv_oflags;     /* mapping flags */
> > +   __u32   fmv_iflags;     /* control flags (1st structure) */
> > +   __u32   fmv_count;      /* # of entries in array incl. input */
> > +   __u32   fmv_entries;    /* # of entries filled in (output). */
> > +   __u64   fmv_unused2;    /* future use, must be zero */
> > +};
> > +
> > +.fi
> > +.in
> > +The array must contain at least two elements.
> > +The first two array elements specify the lowest and highest reverse-mapping
> > +keys, respectively, for which userspace would like physical mapping
> > +information.
> > +A reverse mapping key consists of the tuple (device, block, owner, offset).
> > +The owner and offset fields are part of the key because some filesystems
> > +support sharing physical blocks between multiple files and
> > +therefore may return multiple mappings for a given physical block.
> > +
> > +.SS Fields of struct getfsmap
> > +.PP
> > +The
> > +.I fmv_device
> > +field contains a 32-bit cookie to uniquely identify the underlying storage
> > +device.
> > +If the
> > +.B FMV_HOF_DEV_T
> > +flag is set in the header's
> > +.I fmv_oflags
> > +field, this field contains a dev_t from which major and minor numbers can
> > +be extracted.
> > +If the flag is not set, this field contains a value that must be unique
> > +for each storage device.
> > +
> > +.PP
> > +The
> > +.I fmv_unused1
> > +field must be zero in the first two array elements.
> > +
> > +.PP
> > +The
> > +.I fmv_block
> > +field contains the 512-byte sector address of the extent.
> 
> Why would you use 512-byte sectors in a new interface?

I started designing XFS GETFSMAP with the intent of making it feel
familiar to anyone who'd already used the XFS GETBMAP interface.
Hence you pass in an array of struct getfsmap[N] where the start of
the array are key fields and the rest are filled out by the kernel,
and the units are 512-byte blocks.  As a result, some things 

Re: Recommendation on raid5 drive error resolution

2016-08-30 Thread Chris Murphy
On Tue, Aug 30, 2016 at 12:04 PM, Chris Murphy  wrote:

> One of us would have to go look in source to see what causes "[
> 163.612313] BTRFS: failed to read the system array on sdd" to appear

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/fs/btrfs/disk-io.c?id=refs/tags/v4.7.2
line 2864

And btrfs_read_sys_array is found on line 6587 of this file:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/fs/btrfs/volumes.c?id=refs/tags/v4.7.2

And then comparing your 4.4.13 to 4.7.2
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/diff/fs/btrfs/disk-io.c?id=v4.7.2=v4.4.13

There are changes in these areas but looks like they're mainly
printk's becoming btrfs_err. But I'd try a newer kernel before you
give up on it.

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/diff/fs/btrfs/volumes.c?id=v4.7.2=v4.4.13
More changes here too.

I suggest using btrfs-progs 4.5.3 or 4.6.1. You could also try 4.7 but
I'm getting some weird unexplained errors that only progs 4.7
complains about (clean scrubs, clean mounts, completely working file
system, but a buncha backref complaints from 4.7's btrfs check). But I
think the super-recover -v output should be reliable with any version
in the last ~year.

-- 
Chris Murphy


Re: Recommendation on raid5 drive error resolution

2016-08-30 Thread Chris Murphy
On Tue, Aug 30, 2016 at 3:58 AM, Gareth Pye  wrote:
> Okay, things aren't looking good. The FS won't mount for me:
> http://pastebin.com/sEEdRxsN

Try to mount with -o ro,degraded. I have no idea which device it'll
end up dropping, but it might at least get you a read only mount so
you can get stuff off - if you want - without modifying the file
system.

One of us would have to go look in source to see what causes "[
163.612313] BTRFS: failed to read the system array on sdd" to appear
for each device. It's suspicious that every drive produces that
message, and there are no fixup messages at all ever. So it sounds
like it's not even getting far enough to figure out what's bad and
reconstruct from parity. And I don't even see csum errors either,
which is also suspicious. It's like the bootstrapping itself is
failing which kinda implicates superblocks?

What do you get for

btrfs rescue super-recover -v /dev/sdX ?



-- 
Chris Murphy


Re: Raid 5 to raid 1: balance hangs and scrub aborts. Is this salvageable?

2016-08-30 Thread Chris Murphy
Well, it looks like metadata corruption. If it were just a case of data
corruption there'd be a file name path associated with the checksum
mismatch error; and in that case you'd be able to just delete the file
(or first extract a copy of it with btrfs restore) and then it
wouldn't trigger csum errors anymore, and replace it with a confirmed
good copy later.

Also, btrfs check by itself doesn't check data extent csums, only
metadata csums. So again it sounds like there's a corrupt leaf and for
whatever reason it's not being fixed up by parity. And Btrfs is really
obstinate about continuing on, especially read-write mounted, once
there's a csum error in metadata that it can't fix from parity or
another copy. And even XFS now will also go read only in similar
circumstances. It's just that its repair is a lot older and has some
ways to infer metadata when it's missing or bad, and Btrfs check
hasn't gotten that far yet. I think one of the future tests for btrfs
check when it hits csum errors is the ability to test if the outcome
is plausible if it assumes the metadata is actually good and it's the
csum itself that's bad. Or to iterate plausible variations of metadata
that cause the csum to match. And if it all works out then to CoW the
rebuilt metadata.

Anyway, fixing this would be really tedious with the existing tools.
You'd need to iterate some portions of this leaf to get it to match
its csum; or just change the csum to match the metadata as it is right
now, and then see if things work again. It's a reasonably good idea
though to just delete any associated file, because that will cause its
metadata to go away also, and the source of this problem. But often a
leaf or node will contain references to more than just one file.


-- 
Chris Murphy


Re: [PATCH 0/5] Fuzzer test fix

2016-08-30 Thread Lukas Lueg
>> And special notes for the BUG_ON fix:
>> The fix just fixes a small corner, while tons of BUG_ON()/abort() are
>> still here and there.
>> We need quite a lot of boring work to handle them later.
>
> Yeah yeah, that's been neglected for a very long time. The kernel has
> the abort_transaction infrastructure, the userspace hasn't been updated
> in the same way. Long way to go, but every removed bug_on counts.

I've been holding back more images that reach abort(), as where they
come from is pretty clear and they don't actually need any fuzzing: every
code path that eventually leads to abort() will get executed sooner or
later. As of now, there are 50 unique code paths that reach abort().
Somebody has to bite the bullet and add some error paths :-)


Re: does btrfs-receive use/compare the checksums from the btrfs-send side?

2016-08-30 Thread Sean Greenslade
On Sun, Aug 28, 2016 at 10:25:32PM +0200, Christoph Anton Mitterer wrote:
> On Sun, 2016-08-28 at 22:19 +0200, Adam Borowski wrote:
> > Transports over which you're likely to send a filesystem stream
> > already
> > protect against corruption.
> Well... in some cases,... but not always... just consider a plain old
> netcat...

Netcat uses TCP by default, so there is error correction and a
guaranteed-correct stream transfer there.

--Sean


Re: `btrfs dev del` fails with `No space left on device`

2016-08-30 Thread Chris Murphy
On Tue, Aug 30, 2016 at 4:22 AM, ojab //  wrote:
> On Mon, Aug 29, 2016 at 9:05 PM, Chris Murphy  wrote:
>> On Mon, Aug 29, 2016 at 10:04 AM, ojab //  wrote:
>> What do you get for 'btrfs fi us '
>
> $ sudo btrfs fi us /mnt/xxx/
> Overall:
>     Device size:                   3.64TiB
>     Device allocated:              1.82TiB
>     Device unallocated:            1.82TiB
>     Device missing:                  0.00B
>     Used:                          1.81TiB
>     Free (estimated):              1.83TiB      (min: 943.55GiB)
>     Data ratio:                       1.00
>     Metadata ratio:                   2.00
>     Global reserve:              512.00MiB      (used: 0.00B)
>
> Data,RAID0: Size:1.81TiB, Used:1.80TiB
>    /dev/sdb1     928.48GiB
>    /dev/sdc1     928.48GiB
>
> Metadata,RAID1: Size:3.00GiB, Used:2.15GiB
>    /dev/sdb1       3.00GiB
>    /dev/sdc1       3.00GiB
>
> System,RAID1: Size:32.00MiB, Used:176.00KiB
>    /dev/sdb1      32.00MiB
>    /dev/sdc1      32.00MiB
>
> Unallocated:
>    /dev/sdb1       1.01MiB
>    /dev/sdc1       1.00MiB
>    /dev/sdd1       1.82TiB


The confusion is understandable because sdd1 is bigger than sdc1, so
why can't everything on sdc1 be moved to sdd1? Well, dev add > dev del
doesn't really do that, it's going to end up rewriting metadata to
sdb1 also, and there isn't enough space. Yes, there's 800MiB of unused
space in metadata chunks on sdb1 and sdc1, it should be enough (?) but
clearly it wants more than this for whatever reason. You could argue
it's a bug or some suboptimal behavior, but because this is a 99% full
file system, I'm willing to bet it's a low priority bug. Because this
is raid0 you really need to add two devices, not just one.

> I don't quite understand what exactly btrfs is trying to do: I assume
> that block groups should be relocated to the new/empty drive,

There is a scant chance 'btrfs replace' will work better here. But
still the real problem remains, even if you replace sdc1 with sdd1,
sdb1 is still 99% full which in effect makes the file system 99% full
because it can't do any more raid0 on sdb1, and it's not possible to do
raid0 chunks on a single sdd1 device.

If you can't add a 4th drive, you're going to have to convert to
single profile. Keep all three drives attached, 'btrfs balance start
-dconvert=single' and then once that's complete you should be able to
remove /dev/sdc1, although this will take a while because first the
conversion will use space on all three drives, and then the removal
of sdc1 will have to copy chunks off before it can be removed.

> but
> during the delete `btrfs fi us` shows
> Unallocated:
> /dev/sdc1 16.00EiB

Known bug, also happens when resizing and conversions.



> so deleted partition is counted as maximum possible empty drive and
> blocks are relocated to it instead of new/empty drive? (kernel-4.7.2 &
> btrfs-progs-4.7.1 here)
> Is there any way to see where and why block groups are relocated
> during `delete`?

The two reasons this isn't working are a.) it's 99% full already and
b.) it's raid0, so merely adding one device isn't sufficient. It's
probably too full even to do a 3 device balance to restripe raid0
across 3 devices, which is still inefficient because it would leave
50% of the space on sdd as unusable. To do this with uneven devices
and use all the space, you're going to have to use single profile.
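
A back-of-the-envelope sketch of that arithmetic, assuming raid0 stripes
each chunk across every device that still has free space and needs at
least two such devices. The function names are made up, and the sizes are
rounded to whole GiB from the 'btrfs fi us' output above; this is not how
the kernel allocator is written.

```python
def raid0_usable(free):
    """Space (GiB) a raid0 data profile could allocate from per-device free space.

    Assumption: every chunk is striped over all devices that still have
    room, and allocation stops once fewer than two devices have any left.
    """
    free = [f for f in free if f > 0]
    total = 0
    while len(free) >= 2:
        step = min(free)              # stripe until the smallest device fills
        total += step * len(free)
        free = [f - step for f in free if f > step]
    return total


def single_usable(free):
    """Space (GiB) a single profile could allocate: chunks go to any one device."""
    return sum(free)


# Approximate free space in GiB for sdb1, sdc1, sdd1.
current = [0, 0, 1863]         # sdb1 and sdc1 are full, sdd1 is empty
if_empty = [931, 931, 1863]    # the same three devices with nothing on them

print(raid0_usable(current))     # 0
print(raid0_usable(if_empty))    # 2793
print(single_usable(if_empty))   # 3725
```

Under these assumptions raid0 across the three devices strands about
932 GiB on sdd1, roughly half of it, while single can use everything.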



-- 
Chris Murphy


Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-08-30 Thread Ronan Arraes Jardim Chagas
On Tue, 2016-08-30 at 10:44 -0600, Chris Murphy wrote:
> It sounds related to read-only snapshots to me. I wonder if this
> system has something busy that's writing to a file, database, even
> maybe something just spamming journald, and then there's a read-only
> snapshot during the write, which then triggers the enospc.
> 

I saw the problem yesterday after lunch time (13:00) and the last
snapper snapshot was taken at 10:17:

snapper list
Type   | #  | Pre # | Date                         | User | Cleanup | Description           | Userdata
-------+----+-------+------------------------------+------+---------+-----------------------+--------------
single | 0  |       |                              | root |         | current               |
single | 1  |       | Tue 16 Aug 2016 15:07:25 BRT | root |         | first root filesystem |
single | 2  |       | Tue 16 Aug 2016 15:15:57 BRT | root | number  | after installation    | important=yes
pre    | 4  |       | Tue 16 Aug 2016 15:26:44 BRT | root | number  | zypp(y2base)          | important=yes
post   | 5  | 4     | Tue 16 Aug 2016 16:12:46 BRT | root | number  |                       | important=yes
pre    | 29 |       | Tue 16 Aug 2016 18:02:43 BRT | root | number  | zypp(zypper)          | important=yes
post   | 30 | 29    | Tue 16 Aug 2016 18:07:34 BRT | root | number  |                       | important=yes
pre    | 45 |       | Mon 22 Aug 2016 13:59:45 BRT | root | number  | zypp(zypper)          | important=yes
post   | 46 | 45    | Mon 22 Aug 2016 14:11:17 BRT | root | number  |                       | important=yes
pre    | 89 |       | Mon 29 Aug 2016 09:56:19 BRT | root | number  | yast sw_single        |
pre    | 90 |       | Mon 29 Aug 2016 10:00:00 BRT | root | number  | zypp(y2base)          | important=no
post   | 91 | 90    | Mon 29 Aug 2016 10:01:11 BRT | root | number  |                       | important=no
pre    | 92 |       | Mon 29 Aug 2016 10:07:01 BRT | root | number  | zypp(y2base)          | important=no
post   | 93 | 92    | Mon 29 Aug 2016 10:07:10 BRT | root | number  |                       | important=no
pre    | 94 |       | Mon 29 Aug 2016 10:12:32 BRT | root | number  | zypp(y2base)          | important=no
post   | 95 | 94    | Mon 29 Aug 2016 10:14:25 BRT | root | number  |                       | important=no
post   | 96 | 89    | Mon 29 Aug 2016 10:17:17 BRT | root | number  |                       |

> Ronan, if you're given a work around, then it's even less likely the
> bug gets fixed. But if you can disable snapper snapshots entirely and
> the problem doesn't happen; or if you can increase the frequency of
> snapper snapshots and the problem happens more often, that might help
> narrow it down to a point where it's more easily reproduced. If it's
> not related, that's still useful to know.

I agree with you. The problem is that, since this is a production
machine, it is rather problematic to have so many reboots occurring
at random.

I will install something using zypper, which will trigger snapper, and
see if the problem is triggered. I will be out of the office this
afternoon, so the machine will be idle.

Best regards,
Ronan Arraes


Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-08-30 Thread Chris Murphy
On Tue, Aug 30, 2016 at 6:50 AM, Ronan Arraes Jardim Chagas
 wrote:
> Hi!
>
> On Tue, 2016-08-30 at 10:12 +0800, Wang Xiaoguang wrote:
>> For metadata, "bytes_may_use" is about 80GB, it's very big,
>> I think this value is very abnormal.
>>
>> So this explains why, although you have huge unallocated space, you
>> still get the ENOSPC error. In kernel btrfs, there is a function,
>> should_alloc_chunk(), to determine whether to allocate new chunks
>> (new device space):
>>   num_bytes = total_bytes - bytes_readonly; it's 2147483648
>>   num_allocated = bytes_used + bytes_reserved; it's 977354752
>>
>> If num_allocated < num_bytes * 0.8, it will not allocate new device
>> space :) even if you have huge unallocated space.
>>
>> I think the root cause is that bytes_may_use has some computation
>> error and is not being converted to bytes_used or bytes_reserved.
>>
>> I have just explained from the code why you get the ENOSPC error even
>> with huge unallocated space :)
>>
>
> Thanks! At least we know why ENOSPC is happening.
>
>> Can you work out a reproducer for this ENOSPC error, then I can
>> dig into codes to figure out the true reason.
>
> Unfortunately I failed in every attempt to trigger the problem. It
> happens randomly and I have not yet figured out what is triggering it.
> At first, I thought it was related to a build process inside a chroot jail,
> but then I saw the problem happen after the computer had been idle for
> a long time (+- 1h). So, no clues yet :(
>
> Is there any workaround I can do?

It sounds related to read-only snapshots to me. I wonder if this
system has something busy that's writing to a file, database, even
maybe something just spamming journald, and then there's a read-only
snapshot during the write, which then triggers the enospc.

Ronan, if you're given a workaround, then it's even less likely the
bug gets fixed. But if you can disable snapper snapshots entirely and
the problem doesn't happen; or if you can increase the frequency of
snapper snapshots and the problem happens more often, that might help
narrow it down to a point where it's more easily reproduced. If it's
not related, that's still useful to know.

-- 
Chris Murphy


Re: [PATCH 0/5] Fuzzer test fix

2016-08-30 Thread David Sterba
On Tue, Aug 30, 2016 at 03:22:12PM +0800, Qu Wenruo wrote:
> Cc: Lukas Lueg 
> 
> Thanks for the fuzz test from Lukas, quite a lot of bugs are exposed.
> 
> The full fixes can be fetched from my github:
> https://github.com/adam900710/btrfs-progs/tree/fuzz_fix_160830
> 
> The branch has gone through fuzz and mkfs tests.
> 
> For full low-memory mode checker, I'll push it to David first, so for
> low-memory mode fuzzer test, it will need some time.
> 
> Test cases use the same images submitted by Lukas.
> Although all these root causes are pinned down, it still needs quite a lot of
> work to make corrupt-block able to create a minimal image.

It's not necessary to create a minimal image, but the extended
functionality of corrupt-block would help us to extend the testing.

> So I choose to directly use his images as test cases.
> 
> And special notes for the BUG_ON fix:
> The fix just fixes a small corner, while tons of BUG_ON()/abort() are
> still here and there.
> We need quite a lot of boring work to handle them later.

Yeah yeah, that's been neglected for a very long time. The kernel has
the abort_transaction infrastructure, the userspace hasn't been updated
in the same way. Long way to go, but every removed bug_on counts.

> While the good news is, new low memory mode(at least for extent and
> chunk tree check part) is quite safe against such things.
> I can't wait to see how the full low-memory mode works under fuzzer
> tests.
> 
> 
> Lukas Lueg (2):
>   btrfs-progs: fuzz-test: Add test case for invalid drop level
>   btrfs-progs: fuzz-test: Add test case for unaligned extent item
> 
> Qu Wenruo (3):
>   btrfs-progs: fsck: Check drop level before walking through fs tree
>   btrfs-progs: fsck: Check bytenr alignment for extent item
>   btrfs-progs: fsck: Avoid abort and BUG_ON in add_tree_backref

All applied, thanks.


Re: [PATCH 2/2] btrfs-progs: fuzz-test: Add image for unaligned tree block ptr

2016-08-30 Thread David Sterba
On Tue, Aug 30, 2016 at 11:29:33AM +0800, Qu Wenruo wrote:
> From: Lukas Lueg 
> 
> Add test case image for unaligned tree block ptr.
> It should lead to BUG_ON in free_extent_buffer().
> 
> Signed-off-by: Lukas Lueg 
> Signed-off-by: Qu Wenruo 

Both applied, thanks.


Re: [PATCH] btrfs-progs: fuzz-test: Add image for wrong chunk item in root tree

2016-08-30 Thread David Sterba
On Tue, Aug 30, 2016 at 10:15:50AM +0800, Qu Wenruo wrote:
> From: Lukas Lueg 
> 
> Reported by Lukas and the same image from him.
> 
> DATA_RELOC tree's key type is modified to CHUNK_ITEM, causing btrfsck
> to interpret it as CHUNK_ITEM, which results in 0 num_stripes.
> 
> Add the image to fuzz-test.
> 
> Signed-off-by: Lukas Lueg 

BTW I think you should put Reported-by here, that's the reporter's
credit. The signed-off from you is for your contribution to the git
repository (packing the image, documenting the origin etc). I've fixed
that in the commit.


Re: Raid 5 to raid 1: balance hangs and scrub aborts. Is this salvageable?

2016-08-30 Thread henkjan gersen
The problem with that strategy in my case is that I can't get a handle
on what the inode that triggers the problem is. As a result I don't
know which files/directories I would need to delete and restore from
backup later on to make the scrub/balance succeed. From the first post

>>> [root@quasar:~] # btrfs inspect-internal logical-resolve 99586523447296 /storage/
>>> ERROR: logical ino ioctl: No such file or directory

Trying with subvolumes instead of the main volume in that command
gives me the same output. It is almost as if it is trying to find an
inode that doesn't exist at all. The reason I think this is because a
full copy of all data to /dev/null succeeds without triggering an IO
error.

Thinking of it, it might be that the needed information is in the
dump-tree information for that logical position (see below). And
indeed the other logical address (=95053431504896) gives me a
filename. I'll try to delete that one and see what happens.

item 147 key (99586523447296 EXTENT_ITEM 16384) itemoff 8735 itemsize 51
extent refs 1 gen 914350 flags TREE_BLOCK
tree block key (EXTENT_CSUM EXTENT_CSUM 95053431504896) level 0
tree block backref root 7

key (EXTENT_CSUM EXTENT_CSUM 95053431504896) block 99586523447296
(6078279019) gen 914350
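
The dump above can also be read mechanically. As a hedged sketch (the
helper name and the small ID table are written out here purely for
illustration; the IDs are the well-known btrfs tree object IDs as far as
I know), the Python below extracts which tree owns the failing block.
Root 7 is the checksum tree, which would be consistent with
logical-resolve finding no inode for this address: the block holds
checksums rather than file data.

```python
import re

# Well-known btrfs tree object IDs (subset, listed here as an assumption).
TREE_NAMES = {
    1: "ROOT_TREE",
    2: "EXTENT_TREE",
    3: "CHUNK_TREE",
    4: "DEV_TREE",
    5: "FS_TREE",
    7: "CSUM_TREE",
}

# The dump-tree excerpt quoted in this message.
DUMP = """\
item 147 key (99586523447296 EXTENT_ITEM 16384) itemoff 8735 itemsize 51
extent refs 1 gen 914350 flags TREE_BLOCK
tree block key (EXTENT_CSUM EXTENT_CSUM 95053431504896) level 0
tree block backref root 7
"""


def owning_trees(dump_text):
    """Return the names of the trees that reference the dumped tree block."""
    ids = re.findall(r"tree block backref root (\d+)", dump_text)
    return [TREE_NAMES.get(int(i), f"root {i}") for i in ids]


print(owning_trees(DUMP))
```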

On 30 August 2016 at 12:37, Justin Kilpatrick  wrote:
> I ran this all off my personal machine, so whenever it locked up I
> just forced a power cycle; I did this probably more than a dozen
> times. I think the link below is the one I used to translate inodes
> into file names for me to delete and restore from backups (in addition
> to the files that were marked as corrupt by the end when I did
> scrub). I'm not sure there is a better strategy than just ramming your
> way through the conversion with little regard for whatever data ends
> up in your way, but if there is it probably involves digging into the
> balance code and adding in behavior that just kills whatever file is
> giving it trouble and lets you know to restore it later.
>
> http://serverfault.com/questions/746938/how-to-find-the-file-at-a-certain-btrfs-inode
>
> On Tue, Aug 30, 2016 at 2:57 AM, henkjan gersen  wrote:
>> Thanks for the response Justin. This is exactly what I tried before
>> posting to the list, but it doesn't seem to get me anywhere. The
>> moment I hit the logical address that is flagged up in btrfs check as
>> problematic the balancing operation just sits there and does nothing,
>> but the operation also can't be canceled. (scrub aborts at that same
>> logical address)
>>
>> For example:
>>
>> [root@quasar:~] # btrfs balance start -mconvert=raid1,soft /storage/
>>
>> The corresponding output in dmesg is below. Note that the line with
>> 455 extents doesn't repeat, which is where the process gets stuck.
>>
>> [  534.686123] BTRFS info (device sde): relocating block group
>> 135393234714624 flags 257
>> [  536.387826] BTRFS info (device sde): found 65 extents
>> [  537.871757] BTRFS info (device sde): found 65 extents
>> [  538.790607] BTRFS info (device sde): relocating block group
>> 95050853777408 flags 257
>> [  557.759729] BTRFS warning (device sde): sde checksum verify failed
>> on 99586523447296 wanted D883E9B found DF677297 level 0
>> [  557.759851] BTRFS warning (device sde): sde checksum verify failed
>> on 99586523447296 wanted D883E9B found DF677297 level 0
>> [  557.760084] BTRFS warning (device sde): sde checksum verify failed
>> on 99586523447296 wanted D883E9B found DF677297 level 0
>> [  557.760200] BTRFS warning (device sde): sde checksum verify failed
>> on 99586523447296 wanted D883E9B found DF677297 level 0
>> [  557.760391] BTRFS warning (device sde): sde checksum verify failed
>> on 99586523447296 wanted D883E9B found DF677297 level 0
>> [  557.760483] BTRFS warning (device sde): sde checksum verify failed
>> on 99586523447296 wanted D883E9B found DF677297 level 0
>> [  557.760662] BTRFS warning (device sde): sde checksum verify failed
>> on 99586523447296 wanted D883E9B found DF677297 level 0
>> [  557.760738] BTRFS warning (device sde): sde checksum verify failed
>> on 99586523447296 wanted D883E9B found DF677297 level 0
>> [  557.760951] BTRFS warning (device sde): sde checksum verify failed
>> on 99586523447296 wanted D883E9B found DF677297 level 0
>> [  557.761028] BTRFS warning (device sde): sde checksum verify failed
>> on 99586523447296 wanted D883E9B found DF677297 level 0
>> [  566.281448] BTRFS info (device sde): found 455 extents
>> [  566.837080] csum_tree_block: 8104 callbacks suppressed
>> [  566.837087] BTRFS warning (device sde): sde checksum verify failed
>> on 99586523447296 wanted D883E9B found DF677297 level 0
>> [  566.837228] BTRFS warning (device sde): sde checksum verify failed
>> on 99586523447296 wanted D883E9B found DF677297 level 0
>> [  584.440088] BTRFS info (device sde): relocating block group
>> 99586147418112 flags 132
>>
>> I can request to cancel the 

Re: [PATCH] btrfs-progs: fuzz-test: Add image for wrong chunk item in root tree

2016-08-30 Thread David Sterba
On Tue, Aug 30, 2016 at 10:15:50AM +0800, Qu Wenruo wrote:
> From: Lukas Lueg 
> 
> Reported by Lukas and the same image from him.
> 
> DATA_RELOC tree's key type is modified to CHUNK_ITEM, causing btrfsck
> to interpret it as CHUNK_ITEM, which results in 0 num_stripes.
> 
> Add the image to fuzz-test.
> 
> Signed-off-by: Lukas Lueg 
> Signed-off-by: Qu Wenruo 

Applied, thanks.


Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-08-30 Thread Ronan Arraes Jardim Chagas
Hi!

On Tue, 2016-08-30 at 10:12 +0800, Wang Xiaoguang wrote:
> For metadata, "bytes_may_use" is about 80GB. That is very big,
> and I think this value is abnormal.
>
> This explains why you still get ENOSPC errors even though you have
> huge unallocated space. In kernel btrfs there is a function,
> should_alloc_chunk(), that determines whether to allocate new chunks
> (new device space):
>   num_bytes = total_bytes - bytes_readonly; it's 2147483648
>   num_allocated = bytes_used + bytes_reserved; it's 977354752
>
> If num_allocated < num_bytes * 0.8, it will not allocate new device
> space :) even if you have huge unallocated space.
>
> I think the root cause is that bytes_may_use has some computation
> error and is not being converted to bytes_used or bytes_reserved.
>
> I am just explaining, from the code, why you get ENOSPC errors even
> with huge unallocated space :)
>

Thanks! At least we know why ENOSPC is happening.
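
To make the rule quoted above concrete, here is a minimal Python sketch
of the check as it is described in this thread. It is a simplified
model, not the kernel's should_alloc_chunk() itself, which has more
conditions; the function name chunk_alloc_allowed and the integer form
of the 0.8 threshold are assumptions made for illustration.

```python
def chunk_alloc_allowed(num_bytes, num_allocated):
    """Model of the 80% rule described above (hypothetical helper).

    num_bytes     = total_bytes - bytes_readonly
    num_allocated = bytes_used + bytes_reserved
    Note that bytes_may_use appears in neither term.
    """
    # Integer form of "num_allocated >= num_bytes * 0.8", avoiding floats.
    return num_allocated * 10 >= num_bytes * 8


# Values reported in this thread for the metadata space.
num_bytes = 2147483648
num_allocated = 977354752

print(chunk_alloc_allowed(num_bytes, num_allocated))
```

With the reported values the function returns False: num_allocated is
well under 80% of num_bytes, so under this rule no new chunk would be
allocated, however large bytes_may_use grows.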

> Can you work out a reproducer for this ENOSPC error, so that I can
> dig into the code to figure out the true reason?

Unfortunately, I failed in every attempt to trigger the problem. It
happens randomly, and I have not yet figured out what triggers it.
At first I thought it was related to a build process inside a chroot jail,
but then I saw the problem happen after the computer had been idle for
a long time (about 1h). So, no clues yet :(

Is there any workaround I can do?

Best regards,
Ronan Arraes




Re: [PATCH] btrfs-progs: check: skip shared node or leaf check for low_memory mode

2016-08-30 Thread David Sterba
On Tue, Aug 30, 2016 at 07:44:17PM +0800, Wang Xiaoguang wrote:
> Hi,
> 
> On 08/30/2016 07:32 PM, David Sterba wrote:
> > On Tue, Aug 30, 2016 at 09:50:20AM +0800, Qu Wenruo wrote:
> >>> Are they not? The low-memory patchset has been released in 4.7.1, the
> >>> devel branch is always on top of master branch. I see both branches
> >>> pushed to the public git repos so I don't see what you mean.
> >> Unfortunately, the low memory mode is not fully merged into devel branch.
> >>
> >> Only the first part (extent and chunk tree) is merged.
> >>
> >> The second part(fs tree) is not merged yet.
> >>
> >> Patches like the following is not in either devel/master branch:
> >> Lu Fengqi (13):
> >> btrfs-progs: move btrfs_extref_hash() to hash.h
> >> btrfs-progs: check: introduce function to find dir_item
> >> btrfs-progs: check: introduce function to check inode_ref
> >> btrfs-progs: check: introduce function to check inode_extref
> >> btrfs-progs: check: introduce function to find inode_ref
> >> btrfs-progs: check: introduce a function to check dir_item
> >> btrfs-progs: check: introduce function to check file extent
> >> btrfs-progs: check: introduce function to check inode item
> >> btrfs-progs: check: introduce function to check fs root
> >> btrfs-progs: check: introduce function to check root ref
> >> btrfs-progs: check: introduce low_memory mode fs_tree check
> >> btrfs-progs: check: fix the return value bug of cmd_check()
> >> btrfs-progs: check: fix false warning for check_extent_item()
> >>
> >> So Wang found it confusing and unable to apply his patch to devel branch.
> > He could have replied himself, I think the conversation would feel
> > better when we can talk directly :)
> Yes, I agree. I was on leave for a while this morning :)

Doh, I may be confused by the names, I mean that I've never seen a mail
from 'Lu Fengqi'.


Re: [PATCH] btrfs-progs: check: skip shared node or leaf check for low_memory mode

2016-08-30 Thread Wang Xiaoguang

Hi,

On 08/30/2016 07:32 PM, David Sterba wrote:
> On Tue, Aug 30, 2016 at 09:50:20AM +0800, Qu Wenruo wrote:
>>> Are they not? The low-memory patchset has been released in 4.7.1, the
>>> devel branch is always on top of master branch. I see both branches
>>> pushed to the public git repos so I don't see what you mean.
>> Unfortunately, the low memory mode is not fully merged into devel branch.
>>
>> Only the first part (extent and chunk tree) is merged.
>>
>> The second part(fs tree) is not merged yet.
>>
>> Patches like the following is not in either devel/master branch:
>> Lu Fengqi (13):
>> btrfs-progs: move btrfs_extref_hash() to hash.h
>> btrfs-progs: check: introduce function to find dir_item
>> btrfs-progs: check: introduce function to check inode_ref
>> btrfs-progs: check: introduce function to check inode_extref
>> btrfs-progs: check: introduce function to find inode_ref
>> btrfs-progs: check: introduce a function to check dir_item
>> btrfs-progs: check: introduce function to check file extent
>> btrfs-progs: check: introduce function to check inode item
>> btrfs-progs: check: introduce function to check fs root
>> btrfs-progs: check: introduce function to check root ref
>> btrfs-progs: check: introduce low_memory mode fs_tree check
>> btrfs-progs: check: fix the return value bug of cmd_check()
>> btrfs-progs: check: fix false warning for check_extent_item()
>>
>> So Wang found it confusing and unable to apply his patch to devel branch.
> He could have replied himself, I think the conversation would feel
> better when we can talk directly :)

Yes, I agree. I was on leave for a while this morning :)

Regards,
Xiaoguang Wang

> So the situation with the patchset is a bit messed up. I thought there
> was only one patchset for the low-memory mode as the subjects are hard
> to tell apart ("introduce something" 20 times). I should have spotted
> that, but you know how many patches float in the mailing list, mistakes
> happen.
>
> Now that I know where the problem is, I'll add the remaining patches to
> devel and release in next or next-next round, depending on the review.



Re: Raid 5 to raid 1: balance hangs and scrub aborts. Is this salvageable?

2016-08-30 Thread Justin Kilpatrick
I ran this all off my personal machine, so whenever it locked up I
just forced a power cycle; I did this probably more than a dozen
times. I think the link below is the one I used to translate inodes
into file names for me to delete and restore from backups (in addition
to the files that were marked as corrupt by the end when I did
scrub). I'm not sure there is a better strategy than just ramming your
way through the conversion with little regard for whatever data ends
up in your way, but if there is it probably involves digging into the
balance code and adding in behavior that just kills whatever file is
giving it trouble and lets you know to restore it later.

http://serverfault.com/questions/746938/how-to-find-the-file-at-a-certain-btrfs-inode

On Tue, Aug 30, 2016 at 2:57 AM, henkjan gersen  wrote:
> Thanks for the response Justin. This is exactly what I tried before
> posting to the list, but it doesn't seem to get me anywhere. The
> moment I hit the logical address that is flagged up in btrfs check as
> problematic the balancing operation just sits there and does nothing,
> but the operation also can't be canceled. (scrub aborts at that same
> logical address)
>
> For example:
>
> [root@quasar:~] # btrfs balance start -mconvert=raid1,soft /storage/
>
> The corresponding output in dmesg is below. Note that the line with
> 455 extents doesn't repeat, which is where the process gets stuck.
>
> [  534.686123] BTRFS info (device sde): relocating block group
> 135393234714624 flags 257
> [  536.387826] BTRFS info (device sde): found 65 extents
> [  537.871757] BTRFS info (device sde): found 65 extents
> [  538.790607] BTRFS info (device sde): relocating block group
> 95050853777408 flags 257
> [  557.759729] BTRFS warning (device sde): sde checksum verify failed
> on 99586523447296 wanted D883E9B found DF677297 level 0
> [  557.759851] BTRFS warning (device sde): sde checksum verify failed
> on 99586523447296 wanted D883E9B found DF677297 level 0
> [  557.760084] BTRFS warning (device sde): sde checksum verify failed
> on 99586523447296 wanted D883E9B found DF677297 level 0
> [  557.760200] BTRFS warning (device sde): sde checksum verify failed
> on 99586523447296 wanted D883E9B found DF677297 level 0
> [  557.760391] BTRFS warning (device sde): sde checksum verify failed
> on 99586523447296 wanted D883E9B found DF677297 level 0
> [  557.760483] BTRFS warning (device sde): sde checksum verify failed
> on 99586523447296 wanted D883E9B found DF677297 level 0
> [  557.760662] BTRFS warning (device sde): sde checksum verify failed
> on 99586523447296 wanted D883E9B found DF677297 level 0
> [  557.760738] BTRFS warning (device sde): sde checksum verify failed
> on 99586523447296 wanted D883E9B found DF677297 level 0
> [  557.760951] BTRFS warning (device sde): sde checksum verify failed
> on 99586523447296 wanted D883E9B found DF677297 level 0
> [  557.761028] BTRFS warning (device sde): sde checksum verify failed
> on 99586523447296 wanted D883E9B found DF677297 level 0
> [  566.281448] BTRFS info (device sde): found 455 extents
> [  566.837080] csum_tree_block: 8104 callbacks suppressed
> [  566.837087] BTRFS warning (device sde): sde checksum verify failed
> on 99586523447296 wanted D883E9B found DF677297 level 0
> [  566.837228] BTRFS warning (device sde): sde checksum verify failed
> on 99586523447296 wanted D883E9B found DF677297 level 0
> [  584.440088] BTRFS info (device sde): relocating block group
> 99586147418112 flags 132
>
> I can request to cancel the operation, which gets picked up. However
> the balancing doesn't actually stop, probably because it is in the
> process of relocating a block
>
> [root@quasar:~] # btrfs balance status /storage/
> Balance on '/storage/' is running, cancel requested
> 0 out of about 4 chunks balanced (9738 considered), 100% left
>
> This happens for both metadata and actual data and the only way out is
> forcing a hard reboot (=reset switch). My hope would be that I could
> delete the file corresponding to the offending logical address, but I
> can't find out what that logical address corresponds to.
>
>
> On 29 August 2016 at 12:41, Justin Kilpatrick  wrote:
>> I converted my significantly smaller raid 5 array to raid 1 a little
>> less than a year ago now and I encountered some similar issues.
>>
>> What i ended up doing was starting balance again and again with
>> slightly different arguments (usually thresholds for what blocks to
>> move) and eventually (a week or two, even with a small array) I
>> managed a full conversion with only some data loss, which I was able
>> to find and correct from backups with scrub.
>>
>> On Mon, Aug 29, 2016 at 5:57 AM, henkjan gersen  wrote:
>>> Following the recent posts on the mailing list I'm trying to convert a
>>> running raid5 system to raid1. This conversion  fails to complete with
>>> checksum verify failures. Running a scrub does not fix these checksum
>>> 

Re: [PATCH] btrfs-progs: check: skip shared node or leaf check for low_memory mode

2016-08-30 Thread David Sterba
On Tue, Aug 30, 2016 at 09:50:20AM +0800, Qu Wenruo wrote:
> > Are they not? The low-memory patchset has been released in 4.7.1, the
> > devel branch is always on top of master branch. I see both branches
> > pushed to the public git repos so I don't see what you mean.
> 
> Unfortunately, the low memory mode is not fully merged into devel branch.
> 
> Only the first part (extent and chunk tree) is merged.
> 
> The second part(fs tree) is not merged yet.
> 
> Patches like the following is not in either devel/master branch:
> Lu Fengqi (13):
>btrfs-progs: move btrfs_extref_hash() to hash.h
>btrfs-progs: check: introduce function to find dir_item
>btrfs-progs: check: introduce function to check inode_ref
>btrfs-progs: check: introduce function to check inode_extref
>btrfs-progs: check: introduce function to find inode_ref
>btrfs-progs: check: introduce a function to check dir_item
>btrfs-progs: check: introduce function to check file extent
>btrfs-progs: check: introduce function to check inode item
>btrfs-progs: check: introduce function to check fs root
>btrfs-progs: check: introduce function to check root ref
>btrfs-progs: check: introduce low_memory mode fs_tree check
>btrfs-progs: check: fix the return value bug of cmd_check()
>btrfs-progs: check: fix false warning for check_extent_item()
> 
> So Wang found it confusing and unable to apply his patch to devel branch.

He could have replied himself, I think the conversation would feel
better when we can talk directly :)

So the situation with the patchset is a bit messed up. I thought there
was only one patchset for the low-memory mode as the subjects are hard
to tell apart ("introduce something" 20 times). I should have spotted
that, but you know how many patches float in the mailing list, mistakes
happen.

Now that I know where the problem is, I'll add the remaining patches to
devel and release in next or next-next round, depending on the review.


Re: Multiple bugs found by fuzzing BTRFS

2016-08-30 Thread David Sterba
On Mon, Aug 29, 2016 at 08:47:10PM +0200, Lukas Lueg wrote:
> I'll report new issues to bz as they turn up from the current round
> only if they represent a yet unreported kind of problem (e.g. there
> are stack-based buffer over- and underruns lurking, I lost them due to
> a bug in my setup, though). The next round will be much faster as I've
> now vastly improved my automatic bug triage and fuzzing speed.
> 
> I lost interest once after bugs went unanswered - there are bugs still
> open and unanswered from 2015/04. I hope this won't be a problem this
> time.

Yeah, the lack of replies is unfortunate and happens. There's a
disproportion between the number of people who report bugs and the
number who go through them and fix them. I personally look out for the
fuzzing bugs as they usually come with an image and it's easy to create
a testcase from them; reproducible bugs also tend to get fixed faster.

I must have missed the bugs though, there are 3 fuzzed images, reported
by you in bugs 96971, 97191 and 97271. I see two more (97031 and 97021)
and will look into them.


Re: `btrfs dev del` fails with `No space left on device`

2016-08-30 Thread Hugo Mills
On Tue, Aug 30, 2016 at 10:22:24AM +, ojab // wrote:
> On Mon, Aug 29, 2016 at 9:05 PM, Chris Murphy  wrote:
> > On Mon, Aug 29, 2016 at 10:04 AM, ojab //  wrote:
> > What do you get for 'btrfs fi us '
> 
> $ sudo btrfs fi us /mnt/xxx/
> Overall:
> Device size:  3.64TiB
> Device allocated: 1.82TiB
> Device unallocated:   1.82TiB
> Device missing: 0.00B
> Used: 1.81TiB
> Free (estimated): 1.83TiB (min: 943.55GiB)
> Data ratio:  1.00
> Metadata ratio:  2.00
> Global reserve: 512.00MiB (used: 0.00B)
> 
> Data,RAID0: Size:1.81TiB, Used:1.80TiB
>/dev/sdb1  928.48GiB
>/dev/sdc1  928.48GiB
> 
> Metadata,RAID1: Size:3.00GiB, Used:2.15GiB
>/dev/sdb1  3.00GiB
>/dev/sdc1  3.00GiB
> 
> System,RAID1: Size:32.00MiB, Used:176.00KiB
>/dev/sdb1 32.00MiB
>/dev/sdc1 32.00MiB
> 
> Unallocated:
>/dev/sdb1  1.01MiB
>/dev/sdc1  1.00MiB
>/dev/sdd1  1.82TiB

   Basically, your FS is full. Note that replacing just one of the
devices in a RAID-0 array with a larger one is not going to help. What
you really need here is "single", not RAID-0. I would probably try
converting to single first (which should move quite a lot of it to the
new device anyway), and then the device delete:

 btrfs balance start -dconvert=single,soft /mountpoint
 btrfs dev delete /dev/sdc1 /mountpoint
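
   As a rough sketch of why RAID-0 cannot allocate a new chunk here
while single can (illustrative Python only; the one-stripe-per-device
model, the 1GiB stripe size, the minimum device counts and the helper
name are assumptions for kernels of this era, not btrfs's real
allocator):

```python
MIB, GIB, TIB = 1 << 20, 1 << 30, 1 << 40

# Assumed minimum number of devices that must have room, per profile.
MIN_DEVICES = {"single": 1, "raid0": 2, "raid1": 2}


def can_allocate_chunk(profile, unallocated, stripe_bytes=GIB):
    """True if enough devices have room for one stripe of a new chunk."""
    devices_with_room = [
        dev for dev, free in unallocated.items() if free >= stripe_bytes
    ]
    return len(devices_with_room) >= MIN_DEVICES[profile]


# Unallocated space per device, taken from the 'btrfs fi us' output above.
unallocated = {
    "/dev/sdb1": int(1.01 * MIB),
    "/dev/sdc1": int(1.00 * MIB),
    "/dev/sdd1": int(1.82 * TIB),
}

print(can_allocate_chunk("raid0", unallocated))
print(can_allocate_chunk("single", unallocated))
```

   Only /dev/sdd1 has room, so in this model the raid0 check fails and
the single check succeeds.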

   Hugo.

> > You can see what the state of block groups are with btrfs-debugfs
> > which is in kdave btrfs-progs git. Chances are you need a larger
> > value, -dusage=15 -musage=15 to free up space on devid 1 and 2. Then
> > maybe devid 3 can be removed.
> 
> btrfs-debugfs output:
> https://gist.github.com/ojab/a3c59983e8fb6679b8fdc0e88c0c9e60
> Before `delete` there was about 60GB of free space; it looks like it
> was filled during `delete` (I've seen similar behavior during `btrfs fi
> defrag`) and I should use `-dusage=69` and up.
> 
> I don't quite understand what exactly btrfs is trying to do: I assume
> that block groups should be relocated to the new/empty drive, but
> during the delete `btrfs fi us` shows
> Unallocated:
> /dev/sdc1 16.00EiB
> 
> so deleted partition is counted as maximum possible empty drive and
> blocks are relocated to it instead of new/empty drive? (kernel-4.7.2 &
> btrfs-progs-4.7.1 here)
> Is there any way to see where and why block groups are relocated
> during `delete`?
> 
> //wbr ojab

-- 
Hugo Mills | Everything simple is false. Everything which is
hugo@... carfax.org.uk | complex is unusable
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: `btrfs dev del` fails with `No space left on device`

2016-08-30 Thread ojab //
On Mon, Aug 29, 2016 at 9:05 PM, Chris Murphy  wrote:
> On Mon, Aug 29, 2016 at 10:04 AM, ojab //  wrote:
> What do you get for 'btrfs fi us '

$ sudo btrfs fi us /mnt/xxx/
Overall:
Device size:  3.64TiB
Device allocated: 1.82TiB
Device unallocated:   1.82TiB
Device missing: 0.00B
Used: 1.81TiB
Free (estimated): 1.83TiB (min: 943.55GiB)
Data ratio:  1.00
Metadata ratio:  2.00
Global reserve: 512.00MiB (used: 0.00B)

Data,RAID0: Size:1.81TiB, Used:1.80TiB
   /dev/sdb1  928.48GiB
   /dev/sdc1  928.48GiB

Metadata,RAID1: Size:3.00GiB, Used:2.15GiB
   /dev/sdb1  3.00GiB
   /dev/sdc1  3.00GiB

System,RAID1: Size:32.00MiB, Used:176.00KiB
   /dev/sdb1 32.00MiB
   /dev/sdc1 32.00MiB

Unallocated:
   /dev/sdb1  1.01MiB
   /dev/sdc1  1.00MiB
   /dev/sdd1  1.82TiB

>
> You can see what the state of block groups are with btrfs-debugfs
> which is in kdave btrfs-progs git. Chances are you need a larger
> value, -dusage=15 -musage=15 to free up space on devid 1 and 2. Then
> maybe devid 3 can be removed.

btrfs-debugfs output:
https://gist.github.com/ojab/a3c59983e8fb6679b8fdc0e88c0c9e60
Before `delete` there was about 60GB of free space; it looks like it
was filled during `delete` (I've seen similar behavior during `btrfs fi
defrag`) and I should use `-dusage=69` and up.

I don't quite understand what exactly btrfs is trying to do: I assume
that block groups should be relocated to the new/empty drive, but
during the delete `btrfs fi us` shows
Unallocated:
/dev/sdc1 16.00EiB

so deleted partition is counted as maximum possible empty drive and
blocks are relocated to it instead of new/empty drive? (kernel-4.7.2 &
btrfs-progs-4.7.1 here)
Is there any way to see where and why block groups are relocated
during `delete`?
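
One possible reading of the 16.00EiB figure, offered only as an
assumption and not confirmed anywhere in this thread: 16EiB is exactly
2^64 bytes, so a slightly negative byte count stored in an unsigned
64-bit field would be displayed this way. A small Python sketch of that
arithmetic (the helper names are hypothetical):

```python
def as_u64(value):
    """Reinterpret a (possibly negative) integer as an unsigned 64-bit value."""
    return value % (1 << 64)


def pretty_eib(num_bytes):
    """Format a byte count in EiB with two decimals."""
    return f"{num_bytes / (1 << 60):.2f}EiB"


# A hypothetical "unallocated" value that went 1MiB below zero.
print(pretty_eib(as_u64(-(1 << 20))))
```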

//wbr ojab


Re: Recommendation on raid5 drive error resolution

2016-08-30 Thread Gareth Pye
Okay, things aren't looking good. The FS won't mount for me:
http://pastebin.com/sEEdRxsN

On Tue, Aug 30, 2016 at 9:01 AM, Gareth Pye  wrote:
> When I can get this stupid box to boot from an external drive I'll
> have some idea of what is going on



-- 
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia


Re: btrfs and systemd

2016-08-30 Thread Stefan Priebe - Profihost AG
Am 29.08.2016 um 13:33 schrieb Timofey Titovets:
> Do you try: nofail,noauto,x-systemd.automount ?

Sure, but this fails too, as it has the same timeout in systemd.

Mr. Poettering has recommended that I do the following:
# mkdir -p /etc/systemd/system/$(systemd-escape --suffix=mount -p
/foo/bar/baz).d/
# cat > /etc/systemd/system/$(systemd-escape --suffix=mount -p
/foo/bar/baz).d/timeout.conf < 2016-08-29 9:28 GMT+03:00 Stefan Priebe - Profihost AG 
> :
>> Hi Qu,
>>
>> Am 29.08.2016 um 03:48 schrieb Qu Wenruo:
>>>
>>>
>>> At 08/29/2016 04:15 AM, Stefan Priebe - Profihost AG wrote:
>>>> Hi,
>>>>
>>>> i'm trying to get my 60TB btrfs volume to mount with systemd at boot.
>>>> But this always fails with: "mounting timed out. Stopping." after 90s.
>>>
>>> 60TB is quite large, and under most case it will already cause mount
>>> speed problem.
>>>
>>> In our test environment, filling a fs with 16K small files to 2T (just
>>> 128K files)will already slow the mount process to 10s.
>>>
>>> For larger fs, or more specifically, large extent tree, will slow the
>>> mount process obviously.
>>>
>>> The root fix will need a rework of extent tree.
>>> AFAIK Josef is working on the rework.
>>>
>>> So the btrfs fix will need some time.
>>
>> thanks but i've no problem with the long mount time (in my case 6
>> minutes) i'm just wondering how to live with it with systemd. As it
>> always cancels the mount process after 90s and i see no fstab option to
>> change this.
>>
>> Greets,
>> Stefan
>>
>>>
>>> Thanks,
>>> Qu

>>>> I can't find any fstab setting for systemd to higher this timeout.
>>>> There's just  the x-systemd.device-timeout but this controls how long to
>>>> wait for the device and not for the mount command.
>>>>
>>>> Is there any solution for big btrfs volumes and systemd?
>>>>
>>>> Greets,
>>>> Stefan
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>> the body of a message to majord...@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] btrfs-progs: fsck: Avoid abort and BUG_ON in add_tree_backref

2016-08-30 Thread Qu Wenruo
add_tree_backref() can trigger BUG_ON() and abort() in quite a few
cases, from ENOMEM to existing tree backref records.

Change all these BUG_ON() and abort() calls to return proper error values,
and modify all callers to handle such problems.

Reported-by: Lukas Lueg 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 75 ++--
 1 file changed, 53 insertions(+), 22 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index c56b176..ef3e3a1 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -4864,20 +4864,25 @@ static int add_tree_backref(struct cache_tree *extent_cache, u64 bytenr,
 
 		add_extent_rec_nolookup(extent_cache, &tmpl);
 
+		/* really a bug in cache_extent implement now */
 		cache = lookup_cache_extent(extent_cache, bytenr, 1);
 		if (!cache)
-			abort();
+			return -ENOENT;
 	}
 
 	rec = container_of(cache, struct extent_record, cache);
 	if (rec->start != bytenr) {
-		abort();
+		/*
+		 * Several cause, from unaligned bytenr to over lapping extents
+		 */
+		return -EEXIST;
 	}
 
 	back = find_tree_backref(rec, parent, root);
 	if (!back) {
 		back = alloc_tree_backref(rec, parent, root);
-		BUG_ON(!back);
+		if (!back)
+			return -ENOMEM;
 	}
 
 	if (found_ref) {
@@ -5154,16 +5159,18 @@ static int process_extent_ref_v0(struct cache_tree *extent_cache,
 {
 	struct btrfs_extent_ref_v0 *ref0;
 	struct btrfs_key key;
+	int ret;
 
 	btrfs_item_key_to_cpu(leaf, &key, slot);
 	ref0 = btrfs_item_ptr(leaf, slot, struct btrfs_extent_ref_v0);
 	if (btrfs_ref_objectid_v0(leaf, ref0) < BTRFS_FIRST_FREE_OBJECTID) {
-		add_tree_backref(extent_cache, key.objectid, key.offset, 0, 0);
+		ret = add_tree_backref(extent_cache, key.objectid, key.offset,
+				0, 0);
 	} else {
-		add_data_backref(extent_cache, key.objectid, key.offset, 0,
-				 0, 0, btrfs_ref_count_v0(leaf, ref0), 0, 0);
+		ret = add_data_backref(extent_cache, key.objectid, key.offset,
+				0, 0, 0, btrfs_ref_count_v0(leaf, ref0), 0, 0);
 	}
-	return 0;
+	return ret;
 }
 #endif
 
@@ -5406,6 +5413,7 @@ static int process_extent_item(struct btrfs_root *root,
 	struct extent_record tmpl;
 	unsigned long end;
 	unsigned long ptr;
+	int ret;
 	int type;
 	u32 item_size = btrfs_item_size_nr(eb, slot);
 	u64 refs = 0;
@@ -5485,12 +5493,18 @@ static int process_extent_item(struct btrfs_root *root,
 		offset = btrfs_extent_inline_ref_offset(eb, iref);
 		switch (type) {
 		case BTRFS_TREE_BLOCK_REF_KEY:
-			add_tree_backref(extent_cache, key.objectid,
-					 0, offset, 0);
+			ret = add_tree_backref(extent_cache, key.objectid,
+					0, offset, 0);
+			if (ret < 0)
+				error("add_tree_backref failed: %s",
+				      strerror(-ret));
 			break;
 		case BTRFS_SHARED_BLOCK_REF_KEY:
-			add_tree_backref(extent_cache, key.objectid,
-					 offset, 0, 0);
+			ret = add_tree_backref(extent_cache, key.objectid,
+					offset, 0, 0);
+			if (ret < 0)
+				error("add_tree_backref failed: %s",
+				      strerror(-ret));
 			break;
 		case BTRFS_EXTENT_DATA_REF_KEY:
 			dref = (struct btrfs_extent_data_ref *)(&iref->offset);
@@ -6413,13 +6427,19 @@ static int run_next_block(struct btrfs_root *root,
 			}
 
 			if (key.type == BTRFS_TREE_BLOCK_REF_KEY) {
-				add_tree_backref(extent_cache, key.objectid, 0,
-						 key.offset, 0);
+				ret = add_tree_backref(extent_cache,
+						key.objectid, 0, key.offset, 0);
+				if (ret < 0)
+					error("add_tree_backref failed: %s",
+					      strerror(-ret));
 				continue;
 			}
 			if (key.type == BTRFS_SHARED_BLOCK_REF_KEY) {
-				add_tree_backref(extent_cache, key.objectid,
-
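
For readers following the patch above, the change it makes (return an error code where the old code called abort() or BUG_ON()) can be sketched in isolation. This is only an illustration and not the btrfs-progs code: struct sketch_record, struct sketch_backref and sketch_add_backref() are made-up names, and the record layout is a simplifying assumption.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the extent record and its backref. */
struct sketch_backref {
	uint64_t parent;
	uint64_t root;
};

struct sketch_record {
	uint64_t start;
	struct sketch_backref *backref;
};

/*
 * Attach a backref to a record. Every failure that would previously have
 * aborted the program is reported to the caller as a negative errno value.
 */
int sketch_add_backref(struct sketch_record *rec, uint64_t bytenr,
		       uint64_t parent, uint64_t root)
{
	if (!rec)
		return -ENOENT;		/* lookup found nothing */
	if (rec->start != bytenr)
		return -EEXIST;		/* a different record covers bytenr */
	if (!rec->backref) {
		rec->backref = calloc(1, sizeof(*rec->backref));
		if (!rec->backref)
			return -ENOMEM;	/* allocation failed */
	}
	rec->backref->parent = parent;
	rec->backref->root = root;
	return 0;
}
```

A caller would then check for ret < 0 and print strerror(-ret), as the callers in the patch do, so that a corrupted image produces an error message instead of a crash.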

[PATCH 3/5] btrfs-progs: fsck: Check bytenr alignment for extent item

2016-08-30 Thread Qu Wenruo
Check bytenr alignment for extent item to filter invalid items early.

Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 2aa0a7b..c56b176 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -5422,6 +5422,11 @@ static int process_extent_item(struct btrfs_root *root,
 		num_bytes = key.offset;
 	}
 
+	if (!IS_ALIGNED(key.objectid, root->sectorsize)) {
+		error("ignoring invalid extent, bytenr %llu is not aligned to %u",
+		      key.objectid, root->sectorsize);
+		return -EIO;
+	}
 	if (item_size < sizeof(*ei)) {
 #ifdef BTRFS_COMPAT_EXTENT_TREE_V0
 		struct btrfs_extent_item_v0 *ei0;
@@ -5448,6 +5453,16 @@ static int process_extent_item(struct btrfs_root *root,
 		metadata = 1;
 	else
 		metadata = 0;
+	if (metadata && num_bytes != root->nodesize) {
+		error("ignore invalid metadata extent, length %llu does not equal to %u",
+		      num_bytes, root->nodesize);
+		return -EIO;
+	}
+	if (!metadata && !IS_ALIGNED(num_bytes, root->sectorsize)) {
+		error("ignore invalid data extent, length %llu is not aligned to %u",
+		      num_bytes, root->sectorsize);
+		return -EIO;
+	}
 
 	memset(&tmpl, 0, sizeof(tmpl));
 	tmpl.start = key.objectid;
-- 
2.9.3
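
The three checks added by this patch can be sketched as one standalone function. This is a minimal illustration, not the btrfs-progs implementation: sketch_check_extent_item() and SKETCH_IS_ALIGNED() are hypothetical names, and the sketch assumes the sector size is a power of two so that a bit mask can test alignment.

```c
#include <errno.h>
#include <stdint.h>

/* Alignment test for a power-of-two alignment 'a' (assumption of this sketch). */
#define SKETCH_IS_ALIGNED(x, a) ((((uint64_t)(x)) & ((uint64_t)(a) - 1)) == 0)

/*
 * Return 0 if the extent item looks sane, -EIO otherwise:
 *  - the start (bytenr) must be sector aligned,
 *  - a metadata extent must be exactly one node in size,
 *  - a data extent must have a sector-aligned length.
 */
int sketch_check_extent_item(uint64_t bytenr, uint64_t num_bytes, int metadata,
			     uint32_t sectorsize, uint32_t nodesize)
{
	if (!SKETCH_IS_ALIGNED(bytenr, sectorsize))
		return -EIO;
	if (metadata && num_bytes != nodesize)
		return -EIO;
	if (!metadata && !SKETCH_IS_ALIGNED(num_bytes, sectorsize))
		return -EIO;
	return 0;
}
```

With a 4096-byte sector size and a 16384-byte node size, for example, a metadata extent at bytenr 4097 is rejected by the first check, before any backref is added for it.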





[PATCH 2/5] btrfs-progs: fuzz-test: Add test case for invalid drop level

2016-08-30 Thread Qu Wenruo
From: Lukas Lueg 

Signed-off-by: Lukas Lueg 
Signed-off-by: Qu Wenruo 
---
 tests/fuzz-tests/images/invalid-drop-level.raw.txt |  30 +
 tests/fuzz-tests/images/invalid-drop-level.raw.xz  | Bin 0 -> 3788 bytes
 2 files changed, 30 insertions(+)
 create mode 100644 tests/fuzz-tests/images/invalid-drop-level.raw.txt
 create mode 100644 tests/fuzz-tests/images/invalid-drop-level.raw.xz

diff --git a/tests/fuzz-tests/images/invalid-drop-level.raw.txt b/tests/fuzz-tests/images/invalid-drop-level.raw.txt
new file mode 100644
index 0000000..dab91dc
--- /dev/null
+++ b/tests/fuzz-tests/images/invalid-drop-level.raw.txt
@@ -0,0 +1,30 @@
+URL: https://bugzilla.kernel.org/show_bug.cgi?id=154021
+Lukas Lueg 2016-08-26 22:53:42 UTC
+
+Created attachment 230361 [details]
+Image triggering btrfsck to segv
+
+The fuzzer hit again:
+
+==32522==ERROR: AddressSanitizer: SEGV on unknown address 0x00027fff801c (pc
+0x004a952e bp 0x7fff5222ce70 sp 0x7fff5222c600 T0)
+#0 0x4a952d in __asan_memcpy
+(/home/lukas/dev/btrfsfuzz/bin-asan/bin/btrfs+0x4a952d)
+#1 0x66a323 in read_extent_buffer
+/home/lukas/dev/btrfsfuzz/src-asan/extent_io.c:867:2
+#2 0x55ad25 in btrfs_node_key
+/home/lukas/dev/btrfsfuzz/src-asan/./ctree.h:1668:2
+#3 0x58573b in check_fs_root
+/home/lukas/dev/btrfsfuzz/src-asan/cmds-check.c:3748:3
+#4 0x544136 in check_fs_roots
+/home/lukas/dev/btrfsfuzz/src-asan/cmds-check.c:3896:10
+#5 0x53d8c5 in cmd_check
+/home/lukas/dev/btrfsfuzz/src-asan/cmds-check.c:11470:8
+#6 0x4f105f in main /home/lukas/dev/btrfsfuzz/src-asan/btrfs.c:243:8
+#7 0x7fea1bcb7730 in __libc_start_main (/lib64/libc.so.6+0x20730)
+#8 0x421238 in _start
+(/home/lukas/dev/btrfsfuzz/bin-asan/bin/btrfs+0x421238)
+
+
+See the attached image to reproduce using btrfs-progs btrfs-progs
+v4.7-42-g56e9586.
diff --git a/tests/fuzz-tests/images/invalid-drop-level.raw.xz b/tests/fuzz-tests/images/invalid-drop-level.raw.xz
new file mode 100644
index 0000000000000000000000000000000000000000..76c58dce433dc6939c35d25cd4c2f2165be3c94c
GIT binary patch
literal 3788
[base85-encoded binary data corrupted and truncated in the archive]

[PATCH 5/5] btrfs-progs: fuzz-test: Add test case for unaligned extent item

2016-08-30 Thread Qu Wenruo
From: Lukas Lueg 

Signed-off-by: Lukas Lueg 
Signed-off-by: Qu Wenruo 
---
 tests/fuzz-tests/images/unaligned-extent-item.raw.txt |   8 
 tests/fuzz-tests/images/unaligned-extent-item.raw.xz  | Bin 0 -> 3684 bytes
 2 files changed, 8 insertions(+)
 create mode 100644 tests/fuzz-tests/images/unaligned-extent-item.raw.txt
 create mode 100644 tests/fuzz-tests/images/unaligned-extent-item.raw.xz

diff --git a/tests/fuzz-tests/images/unaligned-extent-item.raw.txt b/tests/fuzz-tests/images/unaligned-extent-item.raw.txt
new file mode 100644
index 0000000..7f0b804
--- /dev/null
+++ b/tests/fuzz-tests/images/unaligned-extent-item.raw.txt
@@ -0,0 +1,8 @@
+URL: https://bugzilla.kernel.org/show_bug.cgi?id=155181
+Lukas Lueg 2016-08-28 10:52:32 UTC
+
+Created attachment 230891 [details]
+BTRFS-image that reaches abort() in btrfsck
+
+More news from the fuzzer. The attached image causes btrfsck to reach abort()
+in in cmds-check.c:add_tree_backref(); using btrfs-progs v4.7-42-g56e9586.
diff --git a/tests/fuzz-tests/images/unaligned-extent-item.raw.xz b/tests/fuzz-tests/images/unaligned-extent-item.raw.xz
new file mode 100644
index 0000000000000000000000000000000000000000..c401f2e575467fd33118e3e0dacc2e35636065f2
GIT binary patch
literal 3684
[base85-encoded binary data corrupted in the archive]

literal 0
HcmV?d1

-- 
2.9.3





[PATCH 1/5] btrfs-progs: fsck: Check drop level before walking through fs tree

2016-08-30 Thread Qu Wenruo
Exposed by a fuzzed image from Lukas that contains an invalid drop level
(16), causing a segfault when accessing path->nodes[drop_level].

This patch checks the drop level against the fs tree level and
BTRFS_MAX_LEVEL to avoid such problems.

Reported-by: Lukas Lueg 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 1e1f7c9..2aa0a7b 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -3742,6 +3742,11 @@ static int check_fs_root(struct btrfs_root *root,
 		btrfs_disk_key_to_cpu(&key, &root_item->drop_progress);
 		level = root_item->drop_level;
 		path.lowest_level = level;
+		if (level > btrfs_header_level(root->node) ||
+		    level >= BTRFS_MAX_LEVEL) {
+			error("ignoring invalid drop level: %u", level);
+			goto skip_walking;
+		}
 		wret = btrfs_search_slot(NULL, root, &key, &path, 0, 0);
 		if (wret < 0)
 			goto skip_walking;
-- 
2.9.3





[PATCH 0/5] Fuzzer test fix

2016-08-30 Thread Qu Wenruo
Cc: Lukas Lueg 

Thanks to the fuzz tests from Lukas, quite a lot of bugs have been exposed.

The full fixes can be fetched from my github:
https://github.com/adam900710/btrfs-progs/tree/fuzz_fix_160830

The branch has gone through the fuzz and mkfs tests.

As for the full low-memory mode checker, I'll push it to David first, so
fuzzer tests for low-memory mode will need some time.

The test cases use the same images submitted by Lukas.
Although all the root causes are pinned down, it would still take quite a
lot of work to make corrupt-block able to create minimal images.

So I chose to use his images directly as test cases.

A special note on the BUG_ON fix:
The fix only covers a small corner, while tons of BUG_ON()/abort() calls
remain here and there.
Handling them later will take quite a lot of boring work.

The good news is that the new low-memory mode (at least the extent and
chunk tree check part) is quite safe against such things.
I can't wait to see how the full low-memory mode works under fuzzer
tests.


Lukas Lueg (2):
  btrfs-progs: fuzz-test: Add test case for invalid drop level
  btrfs-progs: fuzz-test: Add test case for unaligned extent item

Qu Wenruo (3):
  btrfs-progs: fsck: Check drop level before walking through fs tree
  btrfs-progs: fsck: Check bytenr alignment for extent item
  btrfs-progs: fsck: Avoid abort and BUG_ON in add_tree_backref

 cmds-check.c   |  95 -
 tests/fuzz-tests/images/invalid-drop-level.raw.txt |  30 +++
 tests/fuzz-tests/images/invalid-drop-level.raw.xz  | Bin 0 -> 3788 bytes
 .../images/unaligned-extent-item.raw.txt   |   8 ++
 .../fuzz-tests/images/unaligned-extent-item.raw.xz | Bin 0 -> 3684 bytes
 5 files changed, 111 insertions(+), 22 deletions(-)
 create mode 100644 tests/fuzz-tests/images/invalid-drop-level.raw.txt
 create mode 100644 tests/fuzz-tests/images/invalid-drop-level.raw.xz
 create mode 100644 tests/fuzz-tests/images/unaligned-extent-item.raw.txt
 create mode 100644 tests/fuzz-tests/images/unaligned-extent-item.raw.xz

-- 
2.9.3





Re: Raid 5 to raid 1: balance hangs and scrub aborts. Is this salvageable?

2016-08-30 Thread henkjan gersen
Thanks for the response, Justin. This is exactly what I tried before
posting to the list, but it doesn't seem to get me anywhere. The
moment I hit the logical address that btrfs check flags as
problematic, the balance operation just sits there and does nothing,
and it can't be canceled either. (Scrub aborts at that same
logical address.)

For example:

[root@quasar:~] # btrfs balance start -mconvert=raid1,soft /storage/

The corresponding output in dmesg is below. Note that the line with
455 extents doesn't repeat, which is where the process gets stuck.

[  534.686123] BTRFS info (device sde): relocating block group 135393234714624 flags 257
[  536.387826] BTRFS info (device sde): found 65 extents
[  537.871757] BTRFS info (device sde): found 65 extents
[  538.790607] BTRFS info (device sde): relocating block group 95050853777408 flags 257
[  557.759729] BTRFS warning (device sde): sde checksum verify failed on 99586523447296 wanted D883E9B found DF677297 level 0
[  557.759851] BTRFS warning (device sde): sde checksum verify failed on 99586523447296 wanted D883E9B found DF677297 level 0
[  557.760084] BTRFS warning (device sde): sde checksum verify failed on 99586523447296 wanted D883E9B found DF677297 level 0
[  557.760200] BTRFS warning (device sde): sde checksum verify failed on 99586523447296 wanted D883E9B found DF677297 level 0
[  557.760391] BTRFS warning (device sde): sde checksum verify failed on 99586523447296 wanted D883E9B found DF677297 level 0
[  557.760483] BTRFS warning (device sde): sde checksum verify failed on 99586523447296 wanted D883E9B found DF677297 level 0
[  557.760662] BTRFS warning (device sde): sde checksum verify failed on 99586523447296 wanted D883E9B found DF677297 level 0
[  557.760738] BTRFS warning (device sde): sde checksum verify failed on 99586523447296 wanted D883E9B found DF677297 level 0
[  557.760951] BTRFS warning (device sde): sde checksum verify failed on 99586523447296 wanted D883E9B found DF677297 level 0
[  557.761028] BTRFS warning (device sde): sde checksum verify failed on 99586523447296 wanted D883E9B found DF677297 level 0
[  566.281448] BTRFS info (device sde): found 455 extents
[  566.837080] csum_tree_block: 8104 callbacks suppressed
[  566.837087] BTRFS warning (device sde): sde checksum verify failed on 99586523447296 wanted D883E9B found DF677297 level 0
[  566.837228] BTRFS warning (device sde): sde checksum verify failed on 99586523447296 wanted D883E9B found DF677297 level 0
[  584.440088] BTRFS info (device sde): relocating block group 99586147418112 flags 132

I can request to cancel the operation, and the request gets picked up.
However, the balance doesn't actually stop, probably because it is in
the middle of relocating a block group:

[root@quasar:~] # btrfs balance status /storage/
Balance on '/storage/' is running, cancel requested
0 out of about 4 chunks balanced (9738 considered), 100% left

This happens for both metadata and actual data and the only way out is
forcing a hard reboot (=reset switch). My hope would be that I could
delete the file corresponding to the offending logical address, but I
can't find out what that logical address corresponds to.


On 29 August 2016 at 12:41, Justin Kilpatrick  wrote:
> I converted my significantly smaller raid 5 array to raid 1 a little
> less than a year ago now and I encountered some similar issues.
>
> What i ended up doing was starting balance again and again with
> slightly different arguments (usually thresholds for what blocks to
> move) and eventually (a week or two, even with a small array) I
> managed a full conversion with only some data loss, which I was able
> to find and correct from backups with scrub.
>
> On Mon, Aug 29, 2016 at 5:57 AM, henkjan gersen  wrote:
>> Following the recent posts on the mailing list I'm trying to convert a
>> running raid5 system to raid1. This conversion  fails to complete with
>> checksum verify failures. Running a scrub does not fix these checksum
>> failures and moreover scrub itself aborts after ~9TB (despite repeated
>> tries).
>>
>> All disks in the array complete a long smartctl test without any
>> errors. Running a scrub after remounting the array with the
>> recovery-option also makes no difference, it still aborts. For
>> clarity:  I can mount the array without issues and copying all files
>> and directories to /dev/zero completes without any errors in the logs.
>>
>> Any suggestions on how to salvage the array would be highly
>> appreciated as I'm out of options/ideas for this. I do have a backup
>> of the important bits, but still restoring it will take time.
>>
>> The information of the system:
>>
>> --
>>
>> Linux-kernel: 4.4.6 (Slackware)
>> btrfs-progs v4.5.3
>>
>> [root@quasar:~] # btrfs fi show
>> Label: 'btr_pool2'  uuid: 7c9b2b91-1e89-45fe-8726-91a97663bb5c
>> Total devices 7 FS bytes used 9.97TiB
>> devid3 size 3.64TiB used 3.34TiB path /dev/sdh
>> devid4 size