Bug#910074: linux-image-4.9.0-8-amd64: BTRFS data loss - kernel BUG at .../linux-4.9.110/fs/btrfs/ctree.c:3178
On Wed, Oct 03, 2018 at 09:56:13AM +, Michael Firth wrote:
> > As far as I know, if 'btrfs check' is clean then you're in the clear for
> > any known issues involving the fs structure. Of course, a 'btrfs scrub' is
> > necessary to check for data and metadata corruption... BTW, if you're using
> > an ssd, make sure you're mounting with -o nossd, because as far as I know
> > linux-4.9.x still hasn't been patched.
> > P.S. that requires a full rebalance to take effect. Make up-to-date backups
> > before running that rebalance...
>
> That is good to know. I will run a "scrub" on the partition soon to check for
> any other issues. I am running on a VM on top of a hardware RAID array of
> spinning disks, so hopefully the SSD issue doesn't apply.

To find out:

    cat /sys/block/your_hardware_raid_block_device/queue/rotational

If it returns "0" then you're affected by the -o ssd bug, but if it returns "1"
then there is nothing to worry about. :-) While this issue will become obsolete
when the fix is backported I'm curious to learn if hardware RAID registers as
nonrotational, so please let me know. I suspect it will register as
nonrotational, because then the kernel will let the RAID controller merge and
reorder IO as it sees fit.

> There was a BTRFS patch in the update that became available an hour after my
> crash:
>
> linux (4.9.110-3+deb9u5) stretch-security; urgency=high
> .
> .
> .
>   * btrfs: relocation: Only remove reloc rb_trees if reloc control has been
>     initialized (CVE-2018-14609)
>
> I'm not sure if that is a Debian specific patch or whether it is a Debian
> specific merge from another version

Oh Nice! I'm really happy to see this. Thank you Ben and kernel team!

> > I wasn't able to find status of the second one wrt linux-4.9.x.
>
> Though the description is similar, I don't think patch 972419 is actually
> either of the two patches referenced from that mail. I think I will ask on
> the BTRFS mailing list what the current status of all of these patches is.

Thanks.

> > In principle I agree; although I think it would be safer to coordinate with
> > Greg Kroah-Hartman about getting them applied upstream before importing them
> > into Debian, since (afaik) we don't have any btrfs specialists working on
> > our kernel...people who would know if importing one of these patches will
> > introduce unintended side-effects or a rabbit hole of patches. Maybe it
> > would be safer to look at the delta between btrfs in 4.9.x and 4.14.x and
> > ask for backported fixes from 4.14.x to 4.9.x? (eg: more than six months of
> > testing in 4.14.x, like the -o ssd bug that is still present in 4.9.x)
>
> I agree with this, and with Hans's comment that because it isn't a Debian
> specific issue it should be handled upstream. I guess the question comes
> partly from not knowing if/how/when upstream V4.9.X kernel changes are merged
> into the Debian Stretch kernel.

The version number is a hint. Looking at the changelog for the package, you'll
see stretch released with 4.9.30-2, was updated with security fixes in
4.9.30-2+deb9u1, was updated from upstream LTS in 4.9.47-1, etc, and is now at
4.9.110-3+deb9u6.

Cheers,
Nicholas
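The rotational check described in the message above can also be scripted. What follows is only a minimal sketch: it reads the same sysfs file as the `cat` command and applies the rule of thumb from the message ("0" means affected, "1" means not). The function names and the `sysfs_root` parameter are hypothetical, introduced purely for illustration.

```python
from pathlib import Path


def parse_rotational(text: str) -> bool:
    """Interpret the contents of queue/rotational.

    Returns True when the kernel reports a rotational device ("1") and
    False when it reports a non-rotational one ("0").
    """
    value = text.strip()
    if value not in ("0", "1"):
        raise ValueError(f"unexpected rotational flag: {value!r}")
    return value == "1"


def ssd_bug_may_apply(text: str) -> bool:
    """Rule of thumb from the message: "0" means affected, "1" means not."""
    return not parse_rotational(text)


def read_rotational(device: str, sysfs_root: str = "/sys") -> str:
    """Read <sysfs_root>/block/<device>/queue/rotational.

    `sysfs_root` is an assumption added so the path can be redirected
    when experimenting; on a real system the default applies.
    """
    path = Path(sysfs_root, "block", device, "queue", "rotational")
    return path.read_text()
```

For example, `ssd_bug_may_apply(read_rotational("sda"))` would apply the rule to a device named `sda`; the device name is a placeholder, as in the original command.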
Bug#910074: linux-image-4.9.0-8-amd64: BTRFS data loss - kernel BUG at .../linux-4.9.110/fs/btrfs/ctree.c:3178
Hi Hans,

On Wed, Oct 03, 2018 at 12:05:31AM +0200, Hans van Kranenburg wrote:
> Hi,
>
> On 10/02/2018 10:08 PM, Nicholas D Steeves wrote:
> > Hi Michael,
> >
> > On Tue, Oct 02, 2018 at 11:37:45AM +0100, Michael Firth wrote:
> >>
> >> BTRFS may not be a filesystem that everyone uses, but I feel if it is in
> >> the Debian kernel then bugs that can cause data loss should be fixed if
> >> a patch already exists.
> >
> > In principle I agree; although I think it would be safer to coordinate
> > with Greg Kroah-Hartman about getting them applied upstream before
> > importing them into Debian, since (afaik) we don't have any btrfs
> > specialists working on our kernel...people who would know if importing
> > one of these patches will introduce unintended side-effects or a
> > rabbit hole of patches.
>
> This is not a debian specific issue. The upstream btrfs team does not
> have enough work capacity to do this, and mainly focuses on going
> forward instead of looking back. And I don't think there's really
> someone who would know the things mentioned above except for the authors
> of the patches themselves (who tag them for stable if it's data
> corruption and if they know it will work (tm)), or the btrfs maintainer
> who knows which ones to put together in which order to prepare the next
> kernel release.

Agreed! Also, acknowledging when issues aren't Debian-specific and then
working with upstream so everyone can benefit is one reason we have a great
reputation for giving back to the larger community :-)

> > Maybe it would be safer to look at the delta
> > between btrfs in 4.9.x and 4.14.x and ask for backported fixes from
> > 4.14.x to 4.9.x? (eg: more than six months of testing in 4.14.x, like
> > the -o ssd bug that is still present in 4.9.x)
>
> For the -o ssd issue, in hindsight, it was a mistake to not get that
> into 4.9 earlier.
>
> Every user who wants to try out btrfs on his/her computer with Stretch
> and uses it as the root filesystem on a disk which is not too large is
> still affected by this sub-optimal behaviour.
>
> So I guess that's a TODO for me, to still get it done now. It's 951e7966
> and 583b723151 with a few small changes to make it apply. At least it
> has had enough testing, and the amount of users with out-of-space
> filesystems has decreased notably in the last year in #btrfs IRC. :)

Thank you! I updated our wiki page within a week of learning about the patch,
but in the future would you prefer if I file a bug? I don't imagine it will be
more than two bugs a year ;-)

Btw, would you please forward the bug to me for the 951e7966 and 583b723151
backport?

Sincerely,
Nicholas
Bug#910074: linux-image-4.9.0-8-amd64: BTRFS data loss - kernel BUG at .../linux-4.9.110/fs/btrfs/ctree.c:3178
Hi,

> -----Original Message-----
>
> Hi Michael,
>
> On Tue, Oct 02, 2018 at 11:37:45AM +0100, Michael Firth wrote:
> >
> > After this, there was a file that was errored on the filesystem (as
> > reported by 'btrfs check'), and it seems BTRFS doesn't have any tools
> > to resolve the error. Deleting the file at the reported inode has
> > cleared the error from 'btrfs check', but I am not 100% sure that will
> > have fixed all the corruption.
> >
>
> As far as I know, if 'btrfs check' is clean then you're in the clear for any
> known issues involving the fs structure. Of course, a 'btrfs scrub' is
> necessary to check for data and metadata corruption... BTW, if you're using
> an ssd, make sure you're mounting with -o nossd, because as far as I know
> linux-4.9.x still hasn't been patched.
> P.S. that requires a full rebalance to take effect. Make up-to-date backups
> before running that rebalance...

That is good to know. I will run a "scrub" on the partition soon to check for
any other issues. I am running on a VM on top of a hardware RAID array of
spinning disks, so hopefully the SSD issue doesn't apply.

> > This issue looks very like the bug described at:
> >
> > https://www.spinics.net/lists/linux-btrfs/msg60984.html
> >
> > And in bug report #708509 for a much older kernel (from 2013)
> >
> > According to the BTRFS mailing list post above, there are patches
> > submitted to fix this issue (or one with the same symptoms).
> > Is there any way to easily determine if these patches are in the
> > Debian version of the V4.9.110 kernel?
>
> The last time I checked I couldn't find any btrfs-specific ones in Debian; I
> used apt-get source and expected to find a quilt series.

There was a BTRFS patch in the update that became available an hour after my
crash:

linux (4.9.110-3+deb9u5) stretch-security; urgency=high
.
.
.
  * btrfs: relocation: Only remove reloc rb_trees if reloc control has been
    initialized (CVE-2018-14609)

I'm not sure if that is a Debian specific patch or whether it is a Debian
specific merge from another version.

> > If not, what is the route to get these patches incorporated? Do I need
> > to talk to the BTRFS people about getting the patches in to the stock
> > V4.9 kernel, or is this something that the Debian team would apply
> > directly?
>
> The first of the two patches from that 29 Nov 2016 linux-btrfs email appears
> to be queued for linux-4.9.119:
> https://lore.kernel.org/patchwork/patch/972419/

So I guess the related question that I should have asked is whether there is
information on how upstream changes are merged into the Debian kernel, and
what the likely delay is between (for example) the 4.9.119 mainline kernel
being released and the Debian version following it?

> I wasn't able to find status of the second one wrt linux-4.9.x.

Though the description is similar, I don't think patch 972419 is actually
either of the two patches referenced from that mail. I think I will ask on the
BTRFS mailing list what the current status of all of these patches is.

> > Kernel bug report output included below, in case it is useful.
> >
> > BTRFS may not be a filesystem that everyone uses, but I feel if it is
> > in the Debian kernel then bugs that can cause data loss should be
> > fixed if a patch already exists.
>
> In principle I agree; although I think it would be safer to coordinate with
> Greg Kroah-Hartman about getting them applied upstream before importing them
> into Debian, since (afaik) we don't have any btrfs specialists working on our
> kernel...people who would know if importing one of these patches will
> introduce unintended side-effects or a rabbit hole of patches. Maybe it
> would be safer to look at the delta between btrfs in 4.9.x and 4.14.x and ask
> for backported fixes from 4.14.x to 4.9.x? (eg: more than six months of
> testing in 4.14.x, like the -o ssd bug that is still present in 4.9.x)

I agree with this, and with Hans's comment that because it isn't a Debian
specific issue it should be handled upstream. I guess the question comes
partly from not knowing if/how/when upstream V4.9.X kernel changes are merged
into the Debian Stretch kernel.

> Cheers,
> Nicholas

Regards

Michael
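On the question of which upstream release a given Debian kernel package is based on, one rough hint is the package version string itself. In Debian's version scheme, an optional epoch comes before the first colon and the Debian revision comes after the last hyphen, so what remains is the upstream version. The helper below is a minimal sketch of that rule; the function name is hypothetical, and it shows only the upstream base, not which individual patches were added on top.

```python
def upstream_version(debian_version: str) -> str:
    """Return the upstream part of a Debian package version string.

    Layout assumed: [epoch:]upstream_version[-debian_revision]
    """
    # Drop an optional "epoch:" prefix (everything up to the first colon).
    without_epoch = debian_version.split(":", 1)[-1]
    # Drop the Debian revision (everything after the last hyphen), if any.
    return without_epoch.rsplit("-", 1)[0]


# Versions mentioned in this bug report:
for version in ("4.9.110-3+deb9u4", "4.9.110-3+deb9u5"):
    print(version, "is based on upstream", upstream_version(version))
```

Both versions from this report reduce to 4.9.110, which suggests a fix first released in a later upstream 4.9.x would only arrive through a later rebase or a separately applied patch.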
Bug#910074: linux-image-4.9.0-8-amd64: BTRFS data loss - kernel BUG at .../linux-4.9.110/fs/btrfs/ctree.c:3178
Hi,

On 10/02/2018 10:08 PM, Nicholas D Steeves wrote:
> Hi Michael,
>
> On Tue, Oct 02, 2018 at 11:37:45AM +0100, Michael Firth wrote:
>>
>> After this, there was a file that was errored on the filesystem (as
>> reported by 'btrfs check'), and it seems BTRFS doesn't have any tools to
>> resolve the error. Deleting the file at the reported inode has cleared
>> the error from 'btrfs check', but I am not 100% sure that will have
>> fixed all the corruption.
>>
>
> As far as I know, if 'btrfs check' is clean then you're in the clear
> for any known issues involving the fs structure. Of course, a 'btrfs
> scrub' is necessary to check for data and metadata corruption... BTW,
> if you're using an ssd, make sure you're mounting with -o nossd,
> because as far as I know linux-4.9.x still hasn't been patched.
> P.S. that requires a full rebalance to take effect. Make up-to-date
> backups before running that rebalance...
>
>> This issue looks very like the bug described at:
>>
>> https://www.spinics.net/lists/linux-btrfs/msg60984.html
>>
>> And in bug report #708509 for a much older kernel (from 2013)
>>
>> According to the BTRFS mailing list post above, there are patches
>> submitted to fix this issue (or one with the same symptoms).
>> Is there any way to easily determine if these patches are in the Debian
>> version of the V4.9.110 kernel?
>
> The last time I checked I couldn't find any btrfs-specific ones in
> Debian; I used apt-get source and expected to find a quilt series.
>
>> If not, what is the route to get these patches incorporated? Do I need
>> to talk to the BTRFS people about getting the patches in to the stock
>> V4.9 kernel, or is this something that the Debian team would apply
>> directly?
>
> The first of the two patches from that 29 Nov 2016 linux-btrfs email
> appears to be queued for linux-4.9.119:
> https://lore.kernel.org/patchwork/patch/972419/
>
> I wasn't able to find status of the second one wrt linux-4.9.x.
>
>> Kernel bug report output included below, in case it is useful.
>>
>> BTRFS may not be a filesystem that everyone uses, but I feel if it is in
>> the Debian kernel then bugs that can cause data loss should be fixed if
>> a patch already exists.
>
> In principle I agree; although I think it would be safer to coordinate
> with Greg Kroah-Hartman about getting them applied upstream before
> importing them into Debian, since (afaik) we don't have any btrfs
> specialists working on our kernel...people who would know if importing
> one of these patches will introduce unintended side-effects or a
> rabbit hole of patches.

This is not a debian specific issue. The upstream btrfs team does not
have enough work capacity to do this, and mainly focuses on going
forward instead of looking back. And I don't think there's really
someone who would know the things mentioned above except for the authors
of the patches themselves (who tag them for stable if it's data
corruption and if they know it will work (tm)), or the btrfs maintainer
who knows which ones to put together in which order to prepare the next
kernel release.

> Maybe it would be safer to look at the delta
> between btrfs in 4.9.x and 4.14.x and ask for backported fixes from
> 4.14.x to 4.9.x? (eg: more than six months of testing in 4.14.x, like
> the -o ssd bug that is still present in 4.9.x)

For the -o ssd issue, in hindsight, it was a mistake to not get that
into 4.9 earlier.

Every user who wants to try out btrfs on his/her computer with Stretch
and uses it as the root filesystem on a disk which is not too large is
still affected by this sub-optimal behaviour.

So I guess that's a TODO for me, to still get it done now. It's 951e7966
and 583b723151 with a few small changes to make it apply. At least it
has had enough testing, and the amount of users with out-of-space
filesystems has decreased notably in the last year in #btrfs IRC. :)

Hans
Bug#910074: linux-image-4.9.0-8-amd64: BTRFS data loss - kernel BUG at .../linux-4.9.110/fs/btrfs/ctree.c:3178
Hi Michael,

On Tue, Oct 02, 2018 at 11:37:45AM +0100, Michael Firth wrote:
>
> After this, there was a file that was errored on the filesystem (as
> reported by 'btrfs check'), and it seems BTRFS doesn't have any tools to
> resolve the error. Deleting the file at the reported inode has cleared
> the error from 'btrfs check', but I am not 100% sure that will have
> fixed all the corruption.
>

As far as I know, if 'btrfs check' is clean then you're in the clear
for any known issues involving the fs structure. Of course, a 'btrfs
scrub' is necessary to check for data and metadata corruption... BTW,
if you're using an ssd, make sure you're mounting with -o nossd,
because as far as I know linux-4.9.x still hasn't been patched.
P.S. that requires a full rebalance to take effect. Make up-to-date
backups before running that rebalance...

> This issue looks very like the bug described at:
>
> https://www.spinics.net/lists/linux-btrfs/msg60984.html
>
> And in bug report #708509 for a much older kernel (from 2013)
>
> According to the BTRFS mailing list post above, there are patches
> submitted to fix this issue (or one with the same symptoms).
> Is there any way to easily determine if these patches are in the Debian
> version of the V4.9.110 kernel?

The last time I checked I couldn't find any btrfs-specific ones in
Debian; I used apt-get source and expected to find a quilt series.

> If not, what is the route to get these patches incorporated? Do I need
> to talk to the BTRFS people about getting the patches in to the stock
> V4.9 kernel, or is this something that the Debian team would apply
> directly?

The first of the two patches from that 29 Nov 2016 linux-btrfs email
appears to be queued for linux-4.9.119:
https://lore.kernel.org/patchwork/patch/972419/

I wasn't able to find status of the second one wrt linux-4.9.x.

> Kernel bug report output included below, in case it is useful.
>
> BTRFS may not be a filesystem that everyone uses, but I feel if it is in
> the Debian kernel then bugs that can cause data loss should be fixed if
> a patch already exists.

In principle I agree; although I think it would be safer to coordinate
with Greg Kroah-Hartman about getting them applied upstream before
importing them into Debian, since (afaik) we don't have any btrfs
specialists working on our kernel...people who would know if importing
one of these patches will introduce unintended side-effects or a
rabbit hole of patches. Maybe it would be safer to look at the delta
between btrfs in 4.9.x and 4.14.x and ask for backported fixes from
4.14.x to 4.9.x? (eg: more than six months of testing in 4.14.x, like
the -o ssd bug that is still present in 4.9.x)

Cheers,
Nicholas
Bug#910074: linux-image-4.9.0-8-amd64: BTRFS data loss - kernel BUG at .../linux-4.9.110/fs/btrfs/ctree.c:3178
Package: src:linux
Version: 4.9.110-3+deb9u4
Severity: important

Dear Maintainer,

While extracting files over NFS into a BTRFS file system a kernel bug report
was triggered. I don't believe the bug is related to the fact that the access
was over NFS, but should declare that just in case.

After this, there was a file that was errored on the filesystem (as reported
by 'btrfs check'), and it seems BTRFS doesn't have any tools to resolve the
error. Deleting the file at the reported inode has cleared the error from
'btrfs check', but I am not 100% sure that will have fixed all the corruption.

This issue looks very like the bug described at:

https://www.spinics.net/lists/linux-btrfs/msg60984.html

And in bug report #708509 for a much older kernel (from 2013)

According to the BTRFS mailing list post above, there are patches submitted to
fix this issue (or one with the same symptoms).
Is there any way to easily determine if these patches are in the Debian
version of the V4.9.110 kernel?

If not, what is the route to get these patches incorporated? Do I need to talk
to the BTRFS people about getting the patches in to the stock V4.9 kernel, or
is this something that the Debian team would apply directly?

Kernel bug report output included below, in case it is useful.

BTRFS may not be a filesystem that everyone uses, but I feel if it is in the
Debian kernel then bugs that can cause data loss should be fixed if a patch
already exists.

Thanks

Michael

-- Package-specific info:
** Version:
Linux version 4.9.0-8-amd64 (debian-ker...@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) ) #1 SMP Debian 4.9.110-3+deb9u4 (2018-08-21)

** Command line:
BOOT_IMAGE=/vmlinuz-4.9.0-8-amd64 root=/dev/mapper/SysVG-RootVol ro net.ifnames=0 biosdevname=0 quiet

** Not tainted

** Kernel log:
Oct 1 15:33:26 xxx kernel: [522324.107309] [ cut here ]
Oct 1 15:33:26 xxx kernel: [522324.108854] kernel BUG at /build/linux-AcJpTp/linux-4.9.110/fs/btrfs/ctree.c:3178!
Oct 1 15:33:26 xxx kernel: [522324.110432] invalid opcode: [#1] SMP
Oct 1 15:33:26 xxx kernel: [522324.111945] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache evdev pcspkr joydev edac_core crct10dif_pclmul crc32_pclmul serio_raw ghash_clmulni_intel vmw_balloon intel_rapl_perf sg vmwgfx ttm shpchp drm_kms_helper drm ac button vmw_vsock_vmci_transport vsock vmw_vmci nfsd nfs_acl lockd grace drbd lru_cache libcrc32c auth_rpcgss oid_registry sunrpc ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache btrfs crc32c_generic xor raid6_pq hid_generic usbhid hid dm_mod sr_mod cdrom sd_mod ata_generic crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse ahci ehci_pci ata_piix libahci uhci_hcd libata vmw_pvscsi ehci_hcd vmxnet3 usbcore scsi_mod usb_common i2c_piix4
Oct 1 15:33:26 xxx kernel: [522324.120155] CPU: 1 PID: 835 Comm: nfsd Not tainted 4.9.0-8-amd64 #1 Debian 4.9.110-3+deb9u4
Oct 1 15:33:26 xxx kernel: [522324.120742] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/28/2017
Oct 1 15:33:26 xxx kernel: [522324.121906] task: 8dbc26c48440 task.stack: a7fac23e4000
Oct 1 15:33:26 xxx kernel: [522324.122485] RIP: 0010:[] [] btrfs_set_item_key_safe+0x182/0x190 [btrfs]
Oct 1 15:33:26 xxx kernel: [522324.123667] RSP: 0018:a7fac23e7568 EFLAGS: 00010246
Oct 1 15:33:26 xxx kernel: [522324.124243] RAX: RBX: 8dbaf6432070 RCX: 0001
Oct 1 15:33:26 xxx kernel: [522324.124818] RDX: RSI: a7fac23e767e RDI: a7fac23e757f
Oct 1 15:33:26 xxx kernel: [522324.125382] RBP: a7fac23e756e R08: 8dba8aa280e0 R09: 1000
Oct 1 15:33:26 xxx kernel: [522324.125940] R10: R11: 0003 R12: 8dbac17f2000
Oct 1 15:33:26 xxx kernel: [522324.126502] R13: 000f R14: 8dba8aa28040 R15: a7fac23e767e
Oct 1 15:33:26 xxx kernel: [522324.127052] FS: () GS:8dbc3fc4() knlGS:
Oct 1 15:33:26 xxx kernel: [522324.127585] CS: 0010 DS: ES: CR0: 80050033
Oct 1 15:33:26 xxx kernel: [522324.128049] CR2: 7fd54fd9eab4 CR3: 00041ae02000 CR4: 00760670
Oct 1 15:33:26 xxx kernel: [522324.128539] DR0: DR1: DR2:
Oct 1 15:33:26 xxx kernel: [522324.129002] DR3: DR6: fffe0ff0 DR7: 0400
Oct 1 15:33:26 xxx kernel: [522324.129451] PKRU: 5554
Oct 1 15:33:26 xxx kernel: [522324.129904] Stack:
Oct 1 15:33:26 xxx kernel: [522324.130334] c850002d 006c0027 5100 6c0027c8
Oct 1 15:33:26 xxx kernel: [522324.130766] 0001 1d91fb7ce01d2bcc 8dbaf6432070 8dba8aa28040
Oct 1 15:33:26 xxx kernel: [522324.131196] 3a60 0001 00014000