Bug#910074: linux-image-4.9.0-8-amd64: BTRFS data loss - kernel BUG at .../linux-4.9.110/fs/btrfs/ctree.c:3178

2018-10-08 Nicholas D Steeves
On Wed, Oct 03, 2018 at 09:56:13AM +, Michael Firth wrote:
> > 
> > As far as I know, if 'btrfs check' is clean then you're in the clear for
> > any known issues involving the fs structure.  Of course, a 'btrfs scrub'
> > is necessary to check for data and metadata corruption...  BTW, if you're
> > using an ssd, make sure you're mounting with -o nossd, because as far as
> > I know linux-4.9.x still hasn't been patched.
> > P.S. that requires a full rebalance to take effect.  Make up-to-date
> > backups before running that rebalance...
> 
> That is good to know. I will run a "scrub" on the partition soon to check
> for any other issues. I am running on a VM on top of a hardware RAID array
> of spinning disks, so hopefully the SSD issue doesn't apply.

To find out:

  cat /sys/block/your_hardware_raid_block_device/queue/rotational

If it returns "0" then you're affected by the -o ssd bug, but if it
returns "1" then there is nothing to worry about. :-)  While this
issue will become obsolete when the fix is backported, I'm curious to
learn if hardware RAID registers as nonrotational, so please let me
know.  I suspect it will register as nonrotational, because then the
kernel will let the RAID controller merge and reorder IO as it sees
fit.
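For anyone following along, here is a minimal sketch that reads the same flag for every block device at once. It assumes a POSIX shell. The function name `report_rotational` is made up for this example, and the directory argument exists only so the function can be tried against a fake tree. btrfs decides based on what the kernel running the filesystem sees. Inside a VM (the kernel log in this report shows a VMware guest), the value that matters is therefore the one the guest reports for its virtual disk.

```shell
# report_rotational (hypothetical helper): print "<device> rotational" or
# "<device> nonrotational" for each block device under a sysfs "block"
# directory. The directory defaults to /sys/block; the argument is only
# there so the function can be tested against a fake tree.
report_rotational() {
    sysfs_block=${1:-/sys/block}
    for flag in "$sysfs_block"/*/queue/rotational; do
        [ -r "$flag" ] || continue      # glob matched nothing, or unreadable
        dev=${flag#"$sysfs_block"/}     # drop the leading directory
        dev=${dev%%/*}                  # keep only the device name
        case $(cat "$flag") in
            0) echo "$dev nonrotational" ;;
            1) echo "$dev rotational" ;;
            *) echo "$dev unknown" ;;
        esac
    done
}
```

Running `report_rotational` with no argument checks the live /sys/block. Take a single-device btrfs filesystem on a device printed as nonrotational: btrfs would enable ssd mode on its own there unless the filesystem is mounted with -o nossd.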

> 
> There was a BTRFS patch in the update that became available an hour after my
> crash:
> 
> linux (4.9.110-3+deb9u5) stretch-security; urgency=high
> .
> .
> .
>   * btrfs: relocation: Only remove reloc rb_trees if reloc control has been
>     initialized (CVE-2018-14609)
> 
> I'm not sure if that is a Debian-specific patch or whether Debian merged
> it from another version.

Oh, nice!  I'm really happy to see this.  Thank you Ben and kernel team!

> > I wasn't able to find status of the second one wrt linux-4.9.x.
> 
> Though the description is similar, I don't think patch 972419 is actually
> either of the two patches referenced from that mail. I think I will ask on
> the BTRFS mailing list what the current status of all of these patches is.

Thanks.

> > In principle I agree; although I think it would be safer to coordinate
> > with Greg Kroah-Hartman about getting them applied upstream before
> > importing them into Debian, since (afaik) we don't have any btrfs
> > specialists working on our kernel...people who would know if importing
> > one of these patches will introduce unintended side-effects or a rabbit
> > hole of patches.  Maybe it would be safer to look at the delta between
> > btrfs in 4.9.x and 4.14.x and ask for backported fixes from 4.14.x to
> > 4.9.x? (eg: more than six months of testing in 4.14.x, like the -o ssd
> > bug that is still present in 4.9.x)
> > 
> I agree with this, and with Hans's comment that because it isn't a
> Debian-specific issue it should be handled upstream. I guess the question
> comes partly from not knowing if/how/when upstream V4.9.X kernel changes
> are merged into the Debian Stretch kernel.

The version number is a hint.  Looking at the changelog for the
package, you'll see stretch released with 4.9.30-2, was updated with
security fixes in 4.9.30-2+deb9u1, was updated from upstream LTS in
4.9.47-1, etc, and is now at 4.9.110-3+deb9u6.
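Here is a rough illustration of reading that hint mechanically. The sketch assumes a POSIX shell plus GNU `sort -V`, and the helper names are hypothetical. It takes a Debian version string, keeps the upstream part before the last hyphen, and compares that against a given stable release such as 4.9.119. The result is only a lower bound: a security upload can carry individual fixes on top of the same upstream base, as 4.9.110-3+deb9u5 did for CVE-2018-14609.

```shell
# upstream_version (hypothetical helper): the upstream part of a Debian
# version is everything before the last "-". The remainder
# (e.g. "3+deb9u6") is the Debian revision.
upstream_version() {
    printf '%s\n' "${1%-*}"
}

# version_at_least (hypothetical helper): succeed if version $1 >= $2.
# GNU sort -V compares dotted versions numerically (4.9.30 < 4.9.119),
# which a plain string sort would get wrong.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_at_least "$(upstream_version 4.9.110-3+deb9u6)" 4.9.119` fails. That means the package had not yet been rebased onto 4.9.119. The running kernel's Debian version also appears in `uname -v`, as in this report's `#1 SMP Debian 4.9.110-3+deb9u4 (2018-08-21)`.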


Cheers,
Nicholas



2018-10-08 Nicholas D Steeves
Hi Hans,

On Wed, Oct 03, 2018 at 12:05:31AM +0200, Hans van Kranenburg wrote:
> Hi,
> 
> On 10/02/2018 10:08 PM, Nicholas D Steeves wrote:
> > Hi Michael,
> > 
> > On Tue, Oct 02, 2018 at 11:37:45AM +0100, Michael Firth wrote:
> >>
> >> BTRFS may not be a filesystem that everyone uses, but I feel if it is in
> >> the Debian kernel then bugs that can cause data loss should be fixed if
> >> a patch already exists.
> > 
> > In principle I agree; although I think it would be safer to coordinate
> > with Greg Kroah-Hartman about getting them applied upstream before
> > importing them into Debian, since (afaik)
> 
> > we don't have any btrfs
> > specialists working on our kernel...people who would know if importing
> > one of these patches will introduce unintended side-effects or a
> > rabbit hole of patches.
> 
> This is not a Debian-specific issue. The upstream btrfs team does not
> have enough work capacity to do this, and mainly focuses on going
> forward instead of looking back. And I don't think there's really
> someone who would know the things mentioned above except for the authors
> of the patches themselves (who tag them for stable if it's data
> corruption and if they know it will work (tm)), or the btrfs maintainer
> who knows which ones to put together in which order to prepare the next
> kernel release.

Agreed!  Also, acknowledging when issues aren't Debian-specific and
then working with upstream so everyone can benefit is one reason we
have a great reputation for giving back to the larger community :-)

> 
> >  Maybe it would be safer to look at the delta
> > between btrfs in 4.9.x and 4.14.x and ask for backported fixes from
> > 4.14.x to 4.9.x? (eg: more than six months of testing in 4.14.x, like
> > the -o ssd bug that is still present in 4.9.x)
> 
> For the -o ssd issue, in hindsight, it was a mistake to not get that
> into 4.9 earlier.
> 
> Every user who wants to try out btrfs on his/her computer with Stretch
> and uses it as the root filesystem on a disk which is not too large is
> still affected by this sub-optimal behaviour.
> 
> So I guess that's a TODO for me, to still get it done now. It's 951e7966
> and 583b723151 with a few small changes to make it apply. At least it
> has had enough testing, and the number of users with out-of-space
> filesystems has decreased notably in the last year in #btrfs IRC. :)

Thank you!  I updated our wiki page within a week of learning about
the patch, but in the future would you prefer if I file a bug?  I
don't imagine it will be more than two bugs a year ;-)  Btw, would you
please forward the bug to me for the 951e7966 and 583b723151 backport?

Sincerely,
Nicholas




2018-10-03 Michael Firth
Hi,

> -----Original Message-----
> 
> Hi Michael,
> 
> On Tue, Oct 02, 2018 at 11:37:45AM +0100, Michael Firth wrote:
> >
> > After this, there was a file that was errored on the filesystem (as
> > reported by 'btrfs check'), and it seems BTRFS doesn't have any tools
> > to resolve the error. Deleting the file at the reported inode has
> > cleared the error from 'btrfs check', but I am not 100% sure that will
> > have fixed all the corruption.
> >
> 
> As far as I know, if 'btrfs check' is clean then you're in the clear for
> any known issues involving the fs structure.  Of course, a 'btrfs scrub'
> is necessary to check for data and metadata corruption...  BTW, if you're
> using an ssd, make sure you're mounting with -o nossd, because as far as
> I know linux-4.9.x still hasn't been patched.
> P.S. that requires a full rebalance to take effect.  Make up-to-date
> backups before running that rebalance...

That is good to know. I will run a "scrub" on the partition soon to check
for any other issues. I am running on a VM on top of a hardware RAID array
of spinning disks, so hopefully the SSD issue doesn't apply.

> 
> > This issue looks very like the bug described at:
> >
> > https://www.spinics.net/lists/linux-btrfs/msg60984.html
> >
> > And in bug report #708509 for a much older kernel (from 2013)
> >
> > According to the BTRFS mailing list post above, there are patches
> > submitted to fix this issue (or one with the same symptoms).
> > Is there any way to easily determine if these patches are in the
> > Debian version of the V4.9.110 kernel?
> 
> The last time I checked I couldn't find any btrfs-specific ones in Debian;
> I used apt-get source and expected to find a quilt series.

There was a BTRFS patch in the update that became available an hour after my
crash:

linux (4.9.110-3+deb9u5) stretch-security; urgency=high
.
.
.
  * btrfs: relocation: Only remove reloc rb_trees if reloc control has been
    initialized (CVE-2018-14609)

I'm not sure if that is a Debian-specific patch or whether Debian merged
it from another version.
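One way to answer this without reading the whole changelog is to filter it for btrfs entries, labelled with the package version they arrived in. The sketch below assumes a POSIX shell and awk. `btrfs_changelog_entries` is a hypothetical helper that reads a Debian changelog on standard input and prints every line mentioning btrfs, prefixed with the version of the section it appears in. Continuation lines of an entry are printed only if they also mention btrfs.

```shell
# btrfs_changelog_entries (hypothetical helper): filter a Debian changelog
# (read on stdin) for lines mentioning btrfs, prefixed by section version.
btrfs_changelog_entries() {
    awk '
        # Section header, e.g. "linux (4.9.110-3+deb9u5) stretch-security; ..."
        /^[^ \t]/ && match($0, /\([^)]*\)/) {
            version = substr($0, RSTART + 1, RLENGTH - 2)
            next
        }
        # Any other line mentioning btrfs, case-insensitively
        tolower($0) ~ /btrfs/ {
            sub(/^[ \t]+/, "")
            print version ": " $0
        }
    '
}
```

Assuming the package name from this report, `apt-get changelog linux-image-4.9.0-8-amd64 | btrfs_changelog_entries` should list the CVE-2018-14609 entry under 4.9.110-3+deb9u5.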

> 
> > If not, what is the route to get these patches incorporated? Do I need
> > to talk to the BTRFS people about getting the patches in to the stock
> > V4.9 kernel, or is this something that the Debian team would apply
> > directly?
> 
> The first of the two patches from that 29 Nov 2016 linux-btrfs email appears
> to be queued for linux-4.9.119:
>   https://lore.kernel.org/patchwork/patch/972419/
> 

So I guess the related question that I should have asked is whether there is
information on how upstream changes are merged into the Debian kernel, and
what the likely delay is between (for example) the 4.9.119 upstream stable
kernel being released and the Debian version following it?

> I wasn't able to find status of the second one wrt linux-4.9.x.

Though the description is similar, I don't think patch 972419 is actually
either of the two patches referenced from that mail. I think I will ask on
the BTRFS mailing list what the current status of all of these patches is.

> 
> > Kernel bug report output included below, in case it is useful.
> >
> > BTRFS may not be a filesystem that everyone uses, but I feel if it is
> > in the Debian kernel then bugs that can cause data loss should be
> > fixed if a patch already exists.
> 
> In principle I agree; although I think it would be safer to coordinate
> with Greg Kroah-Hartman about getting them applied upstream before
> importing them into Debian, since (afaik) we don't have any btrfs
> specialists working on our kernel...people who would know if importing
> one of these patches will introduce unintended side-effects or a rabbit
> hole of patches.  Maybe it would be safer to look at the delta between
> btrfs in 4.9.x and 4.14.x and ask for backported fixes from 4.14.x to
> 4.9.x? (eg: more than six months of testing in 4.14.x, like the -o ssd
> bug that is still present in 4.9.x)
> 
I agree with this, and with Hans's comment that because it isn't a
Debian-specific issue it should be handled upstream. I guess the question
comes partly from not knowing if/how/when upstream V4.9.X kernel changes
are merged into the Debian Stretch kernel.

> Cheers,
> Nicholas


Regards

Michael



2018-10-02 Hans van Kranenburg
Hi,

On 10/02/2018 10:08 PM, Nicholas D Steeves wrote:
> Hi Michael,
> 
> On Tue, Oct 02, 2018 at 11:37:45AM +0100, Michael Firth wrote:
>>
>> After this, there was a file that was errored on the filesystem (as
>> reported by 'btrfs check'), and it seems BTRFS doesn't have any tools to
>> resolve the error. Deleting the file at the reported inode has cleared
>> the error from 'btrfs check', but I am not 100% sure that will have
>> fixed all the corruption.
>>
> 
> As far as I know, if 'btrfs check' is clean then you're in the clear
> for any known issues involving the fs structure.  Of course, a 'btrfs
> scrub' is necessary to check for data and metadata corruption...  BTW,
> if you're using an ssd, make sure you're mounting with -o nossd,
> because as far as I know linux-4.9.x still hasn't been patched.
> P.S. that requires a full rebalance to take effect.  Make up-to-date
> backups before running that rebalance...
> 
>> This issue looks very like the bug described at:
>>
>> https://www.spinics.net/lists/linux-btrfs/msg60984.html
>>
>> And in bug report #708509 for a much older kernel (from 2013)
>>
>> According to the BTRFS mailing list post above, there are patches
>> submitted to fix this issue (or one with the same symptoms).
>> Is there any way to easily determine if these patches are in the Debian
>> version of the V4.9.110 kernel?
> 
> The last time I checked I couldn't find any btrfs-specific ones in
> Debian; I used apt-get source and expected to find a quilt series.
> 
>> If not, what is the route to get these patches incorporated? Do I need
>> to talk to the BTRFS people about getting the patches in to the stock
>> V4.9 kernel, or is this something that the Debian team would apply
>> directly?
> 
> The first of the two patches from that 29 Nov 2016 linux-btrfs email
> appears to be queued for linux-4.9.119:
>   https://lore.kernel.org/patchwork/patch/972419/
> 
> I wasn't able to find status of the second one wrt linux-4.9.x.
> 
>> Kernel bug report output included below, in case it is useful.
>>
>> BTRFS may not be a filesystem that everyone uses, but I feel if it is in
>> the Debian kernel then bugs that can cause data loss should be fixed if
>> a patch already exists.
> 
> In principle I agree; although I think it would be safer to coordinate
> with Greg Kroah-Hartman about getting them applied upstream before
> importing them into Debian, since (afaik)

> we don't have any btrfs
> specialists working on our kernel...people who would know if importing
> one of these patches will introduce unintended side-effects or a
> rabbit hole of patches.

This is not a Debian-specific issue. The upstream btrfs team does not
have enough work capacity to do this, and mainly focuses on going
forward instead of looking back. And I don't think there's really
someone who would know the things mentioned above except for the authors
of the patches themselves (who tag them for stable if it's data
corruption and if they know it will work (tm)), or the btrfs maintainer
who knows which ones to put together in which order to prepare the next
kernel release.

>  Maybe it would be safer to look at the delta
> between btrfs in 4.9.x and 4.14.x and ask for backported fixes from
> 4.14.x to 4.9.x? (eg: more than six months of testing in 4.14.x, like
> the -o ssd bug that is still present in 4.9.x)

For the -o ssd issue, in hindsight, it was a mistake to not get that
into 4.9 earlier.

Every user who wants to try out btrfs on his/her computer with Stretch
and uses it as the root filesystem on a disk which is not too large is
still affected by this sub-optimal behaviour.

So I guess that's a TODO for me, to still get it done now. It's 951e7966
and 583b723151 with a few small changes to make it apply. At least it
has had enough testing, and the number of users with out-of-space
filesystems has decreased notably in the last year in #btrfs IRC. :)

Hans





2018-10-02 Nicholas D Steeves
Hi Michael,

On Tue, Oct 02, 2018 at 11:37:45AM +0100, Michael Firth wrote:
> 
> After this, there was a file that was errored on the filesystem (as
> reported by 'btrfs check'), and it seems BTRFS doesn't have any tools to
> resolve the error. Deleting the file at the reported inode has cleared
> the error from 'btrfs check', but I am not 100% sure that will have
> fixed all the corruption.
>

As far as I know, if 'btrfs check' is clean then you're in the clear
for any known issues involving the fs structure.  Of course, a 'btrfs
scrub' is necessary to check for data and metadata corruption...  BTW,
if you're using an ssd, make sure you're mounting with -o nossd,
because as far as I know linux-4.9.x still hasn't been patched.
P.S. that requires a full rebalance to take effect.  Make up-to-date
backups before running that rebalance...
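You can check whether a mounted btrfs filesystem is currently using the ssd allocator from the active options in /proc/self/mounts. The sketch below is only an illustration. It assumes a POSIX shell and awk, and `btrfs_ssd_mounts` is a hypothetical name. It relies on btrfs listing the mode as `ssd` (or `ssd_spread`) among the mount options.

```shell
# btrfs_ssd_mounts (hypothetical helper): print the mountpoint of every
# btrfs mount whose options include ssd or ssd_spread. Reads a table in
# /proc/mounts format (device, mountpoint, fstype, options, ...); the
# path defaults to /proc/self/mounts.
btrfs_ssd_mounts() {
    awk '$3 == "btrfs" {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ssd" || opts[i] == "ssd_spread") { print $2; break }
    }' "${1:-/proc/self/mounts}"
}
```

Any mountpoint it prints is a candidate for adding nossd to its /etc/fstab options. As noted above, the change only fully takes effect after a rebalance, so make backups first. Separately, `btrfs scrub start -B <mountpoint>` runs the scrub in the foreground and prints statistics when it finishes.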

> This issue looks very like the bug described at:
> 
> https://www.spinics.net/lists/linux-btrfs/msg60984.html
> 
> And in bug report #708509 for a much older kernel (from 2013)
> 
> According to the BTRFS mailing list post above, there are patches
> submitted to fix this issue (or one with the same symptoms).
> Is there any way to easily determine if these patches are in the Debian
> version of the V4.9.110 kernel?

The last time I checked I couldn't find any btrfs-specific ones in
Debian; I used apt-get source and expected to find a quilt series.
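Here is a sketch of that check. It assumes deb-src entries are configured so that `apt-get source linux` works, and that the source unpacks into a directory such as `linux-4.9.110` (the directory name is an assumption). `btrfs_quilt_patches` is a hypothetical helper that matches on patch file names only. Fixes that arrived through an upstream stable release are already part of the upstream source, so they will not show up as separate quilt patches.

```shell
# btrfs_quilt_patches (hypothetical helper): list quilt patches whose file
# names mention btrfs in an unpacked Debian kernel source tree.
# Exit status: 0 if matches were found, 1 if none, 2 if there is no series file.
btrfs_quilt_patches() {
    series="${1:-.}/debian/patches/series"
    [ -r "$series" ] || { echo "no quilt series at $series" >&2; return 2; }
    grep -i 'btrfs' "$series"
}
```

For example, `btrfs_quilt_patches linux-4.9.110` prints any matching lines from debian/patches/series.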

> If not, what is the route to get these patches incorporated? Do I need
> to talk to the BTRFS people about getting the patches in to the stock
> V4.9 kernel, or is this something that the Debian team would apply
> directly?

The first of the two patches from that 29 Nov 2016 linux-btrfs email
appears to be queued for linux-4.9.119:
  https://lore.kernel.org/patchwork/patch/972419/

I wasn't able to find status of the second one wrt linux-4.9.x.

> Kernel bug report output included below, in case it is useful.
> 
> BTRFS may not be a filesystem that everyone uses, but I feel if it is in
> the Debian kernel then bugs that can cause data loss should be fixed if
> a patch already exists.

In principle I agree; although I think it would be safer to coordinate
with Greg Kroah-Hartman about getting them applied upstream before
importing them into Debian, since (afaik) we don't have any btrfs
specialists working on our kernel...people who would know if importing
one of these patches will introduce unintended side-effects or a
rabbit hole of patches.  Maybe it would be safer to look at the delta
between btrfs in 4.9.x and 4.14.x and ask for backported fixes from
4.14.x to 4.9.x? (eg: more than six months of testing in 4.14.x, like
the -o ssd bug that is still present in 4.9.x)

Cheers,
Nicholas




2018-10-02 Michael Firth
Package: src:linux
Version: 4.9.110-3+deb9u4
Severity: important

Dear Maintainer,

While I was extracting files over NFS into a BTRFS file system, the kernel
reported a bug. I don't believe the bug is related to the fact that the
access was over NFS, but I mention it just in case.

After this, there was a file that was errored on the filesystem (as
reported by 'btrfs check'), and it seems BTRFS doesn't have any tools to
resolve the error. Deleting the file at the reported inode has cleared
the error from 'btrfs check', but I am not 100% sure that will have
fixed all the corruption.

This issue looks very like the bug described at:

https://www.spinics.net/lists/linux-btrfs/msg60984.html

And in bug report #708509 for a much older kernel (from 2013)

According to the BTRFS mailing list post above, there are patches
submitted to fix this issue (or one with the same symptoms).
Is there any way to easily determine if these patches are in the Debian
version of the V4.9.110 kernel?

If not, what is the route to get these patches incorporated? Do I need
to talk to the BTRFS people about getting the patches in to the stock
V4.9 kernel, or is this something that the Debian team would apply
directly?

Kernel bug report output included below, in case it is useful.

BTRFS may not be a filesystem that everyone uses, but I feel if it is in
the Debian kernel then bugs that can cause data loss should be fixed if
a patch already exists.

Thanks

Michael

-- Package-specific info:
** Version:
Linux version 4.9.0-8-amd64 (debian-ker...@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) ) #1 SMP Debian 4.9.110-3+deb9u4 (2018-08-21)

** Command line:
BOOT_IMAGE=/vmlinuz-4.9.0-8-amd64 root=/dev/mapper/SysVG-RootVol ro net.ifnames=0 biosdevname=0 quiet

** Not tainted

** Kernel log:

Oct  1 15:33:26 xxx kernel: [522324.107309] [ cut here ]
Oct  1 15:33:26 xxx kernel: [522324.108854] kernel BUG at /build/linux-AcJpTp/linux-4.9.110/fs/btrfs/ctree.c:3178!
Oct  1 15:33:26 xxx kernel: [522324.110432] invalid opcode:  [#1] SMP
Oct  1 15:33:26 xxx kernel: [522324.111945] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache evdev pcspkr joydev edac_core crct10dif_pclmul crc32_pclmul serio_raw ghash_clmulni_intel vmw_balloon intel_rapl_perf sg vmwgfx ttm shpchp drm_kms_helper drm ac button vmw_vsock_vmci_transport vsock vmw_vmci nfsd nfs_acl lockd grace drbd lru_cache libcrc32c auth_rpcgss oid_registry sunrpc ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache btrfs crc32c_generic xor raid6_pq hid_generic usbhid hid dm_mod sr_mod cdrom sd_mod ata_generic crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse ahci ehci_pci ata_piix libahci uhci_hcd libata vmw_pvscsi ehci_hcd vmxnet3 usbcore scsi_mod usb_common i2c_piix4
Oct  1 15:33:26 xxx kernel: [522324.120155] CPU: 1 PID: 835 Comm: nfsd Not tainted 4.9.0-8-amd64 #1 Debian 4.9.110-3+deb9u4
Oct  1 15:33:26 xxx kernel: [522324.120742] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/28/2017
Oct  1 15:33:26 xxx kernel: [522324.121906] task: 8dbc26c48440 task.stack: a7fac23e4000
Oct  1 15:33:26 xxx kernel: [522324.122485] RIP: 0010:[]  [] btrfs_set_item_key_safe+0x182/0x190 [btrfs]
Oct  1 15:33:26 xxx kernel: [522324.123667] RSP: 0018:a7fac23e7568  EFLAGS: 00010246
Oct  1 15:33:26 xxx kernel: [522324.124243] RAX:  RBX: 8dbaf6432070 RCX: 0001
Oct  1 15:33:26 xxx kernel: [522324.124818] RDX:  RSI: a7fac23e767e RDI: a7fac23e757f
Oct  1 15:33:26 xxx kernel: [522324.125382] RBP: a7fac23e756e R08: 8dba8aa280e0 R09: 1000
Oct  1 15:33:26 xxx kernel: [522324.125940] R10:  R11: 0003 R12: 8dbac17f2000
Oct  1 15:33:26 xxx kernel: [522324.126502] R13: 000f R14: 8dba8aa28040 R15: a7fac23e767e
Oct  1 15:33:26 xxx kernel: [522324.127052] FS: () GS:8dbc3fc4() knlGS:
Oct  1 15:33:26 xxx kernel: [522324.127585] CS:  0010 DS:  ES:  CR0: 80050033
Oct  1 15:33:26 xxx kernel: [522324.128049] CR2: 7fd54fd9eab4 CR3: 00041ae02000 CR4: 00760670
Oct  1 15:33:26 xxx kernel: [522324.128539] DR0:  DR1:  DR2: 
Oct  1 15:33:26 xxx kernel: [522324.129002] DR3:  DR6: fffe0ff0 DR7: 0400
Oct  1 15:33:26 xxx kernel: [522324.129451] PKRU: 5554
Oct  1 15:33:26 xxx kernel: [522324.129904] Stack:
Oct  1 15:33:26 xxx kernel: [522324.130334]  c850002d 006c0027 5100 6c0027c8
Oct  1 15:33:26 xxx kernel: [522324.130766]  0001 1d91fb7ce01d2bcc 8dbaf6432070 8dba8aa28040
Oct  1 15:33:26 xxx kernel: [522324.131196]  3a60 0001 00014000