Re: Oops in btrfs_recover_relocation, kernel 4.8.1

2017-03-12 Thread Qu Wenruo



At 03/10/2017 08:23 PM, Hugo Mills wrote:

   Does anyone recall seeing this oops before? Is it something that
can be fixed with a newer kernel? (I'm on a USB stick for this, so a
new kernel is a major undertaking, and I'd like some reasonable
expectation of success if I do it).


Yes, v4.10 has the fix for the bug.


   Background: I'm rebuilding a dead server. I needed to reduce the
device count on this FS to 6. Stupidly, I attached one device using an
external USB case, and the USB connection reset during the device
delete (within a few seconds). I can mount the FS -o ro,recovery, but
using -o recovery on its own causes the oops below. If I can recover
in-place, that would save me a *lot* of time in restoring backups.

   Also... qgroups, WTH? I've *never* enabled qgroups on this FS.


Do you use any automatic backup tool?
Quite a few of them enable qgroups.

Anyway, after mounting with a v4.10 kernel, you can easily check whether
qgroups are enabled.


Thanks,
Qu



   For what it's worth, the FS passes btrfs check --readonly with no
errors reported. (btrfs --version is 4.7.3).

   Hugo.

[  566.852589] BTRFS warning (device sdh1): 'recovery' is deprecated, use 'usebackuproot' instead
[  566.852591] BTRFS info (device sdh1): trying to use backup root at mount time
[  566.852592] BTRFS info (device sdh1): disk space caching is enabled
[  566.922803] BTRFS info (device sdh1): bdev /dev/sdh1 errs: wr 0, rd 20, flush 0, corrupt 0, gen 0
[  578.715616] BUG: unable to handle kernel paging request at fe50
[  578.715619] IP: [] qgroup_fix_relocated_data_extents+0x2b/0x2c0 [btrfs]
[  578.715638] PGD 2f400f067 PUD 2f4011067 PMD 0
[  578.715640] Oops:  [#1]
[  578.715642] Modules linked in: cpufreq_userspace(E) cpufreq_powersave(E) 
cpufreq_conservative(E) kvm_amd(E) kvm(E) irqbypass(E) crc32_pclmul(E) 
efi_pstore(E) ghash_clmulni_intel(E) pcspkr(E) serio_raw(E) efivars(E) 
fam15h_power(E) k10temp(E) btrfs(E) acpi_cpufreq(E) tpm_tis(E) tpm_tis_core(E) 
tpm(E) sp5100_tco(E) sg(E) snd_hda_codec_realtek(E) snd_hda_codec_hdmi(E) 
snd_hda_codec_generic(E) snd_hda_intel(E) 9p(E) snd_hda_codec(E) 9pnet(E) 
snd_hda_core(E) fscache(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) snd(E) 
soundcore(E) shpchp(E) ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) 
configfs(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) 
fuse(E) evdev(E) aoe(E) efivarfs(E) ip_tables(E) x_tables(E) autofs4(E) loop(E) 
overlay(E) nls_utf8(E) isofs(E) raid10(E) raid456(E) async_raid6_recov(E)
[  578.715663]  async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) 
raid6_pq(E) libcrc32c(E) crc32c_generic(E) raid0(E) multipath(E) linear(E) 
dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) raid1(E) md_mod(E) sd_mod(E) 
hid_generic(E) usbhid(E) hid(E) uas(E) usb_storage(E) crc32c_intel(E) 
aesni_intel(E) aes_x86_64(E) glue_helper(E) lrw(E) gf128mul(E) ablk_helper(E) 
cryptd(E) ohci_pci(E) ahci(E) libahci(E) sata_sil24(E) i2c_piix4(E) r8169(E) 
mii(E) ehci_pci(E) ohci_hcd(E) ehci_hcd(E) libata(E) scsi_mod(E) radeon(E) 
i2c_algo_bit(E) drm_kms_helper(E) xhci_pci(E) xhci_hcd(E) usbcore(E) 
usb_common(E) ttm(E) drm(E) button(E)
[  578.715684] CPU: 0 PID: 3532 Comm: mount Tainted: GE   4.8.0-1-grml-amd64 #1 Debian 4.8.15-1+grml.1
[  578.715684] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./970A-DS3P, BIOS FD 02/26/2016
[  578.715686] task: 8fd7ead82fc0 task.stack: 8fd7db15c000
[  578.715687] RIP: 0010:[]  [] qgroup_fix_relocated_data_extents+0x2b/0x2c0 [btrfs]
[  578.715699] RSP: 0018:8fd7db15fa08  EFLAGS: 00010246
[  578.715700] RAX: 8fd7db133800 RBX: 8fd7e30c79a0 RCX: 
[  578.715701] RDX: 8fd7ddb5cd10 RSI: 8fd7dc6b5000 RDI: 8fd7ddb5cc80
[  578.715701] RBP: 8fd7e7445000 R08:  R09: 8fd7ddb5cc80
[  578.715702] R10:  R11:  R12: 8fd7db15faa0
[  578.715703] R13: 8fd7dc6b5000 R14:  R15: 8fd7ddb5cc80
[  578.715704] FS:  7f23cd67f480() GS:bca35000() knlGS:
[  578.715705] CS:  0010 DS:  ES:  CR0: 80050033
[  578.715706] CR2: fe50 CR3: 00042302f000 CR4: 000406b0
[  578.715707] Stack:
[  578.715707]   0801 8fd7ddb5cc80 
0801
[  578.715710]  8fd7ddb5cc80 c0a24c44  

[  578.715711]  8f00db15fa40 36d3c76a 8fd7e30c79a0 
8fd7e7445000
[  578.715713] Call Trace:
[  578.715726]  [] ? start_transaction+0x94/0x4c0 [btrfs]
[  578.715738]  [] ? btrfs_recover_relocation+0x2e8/0x420 [btrfs]
[  578.715750]  [] ? open_ctree+0x2158/0x2680 [btrfs]
[  578.715752]  [] ? snprintf+0x49/0x60
[  578.715762]  [] ? btrfs_mount+0xd26/0xe70 [btrfs]
[  578.715765]  [] ? lookup_fast+0x52/0x300
[  578.715767]  [] ? mount_fs+0x36/0x170
[  578.715770]  [] ? kstrdup+0x45/0x50
[  578.715772]  [] ? 

Re: [PATCH v7 1/2] btrfs: Fix metadata underflow caused by btrfs_reloc_clone_csum error

2017-03-12 Thread Qu Wenruo



At 03/13/2017 04:49 AM, Stefan Priebe - Profihost AG wrote:

Hi Qu,

V5 ran fine against the openSUSE-42.2 kernel (based on v4.4).


Thanks for the test.



However, V7 results in an oops for me:
BUG: unable to handle kernel NULL pointer dereference at 01f0


This 0x1f0 matches offsetof(struct btrfs_root, fs_info), which is quite
a nice clue.



IP: [] __endio_write_update_ordered+0x33/0x140 [btrfs]


The IP points into this inlined helper:
---
static inline bool btrfs_is_free_space_inode(struct btrfs_inode *inode)
{
	struct btrfs_root *root = inode->root;		/* <- either here */

	if (root == root->fs_info->tree_root &&		/* <- or here */
	    btrfs_ino(inode) != BTRFS_BTREE_INODE_OBJECTID)

---

Given that offset, only the latter case is possible.


So here, we have a btrfs_inode whose @root is NULL.

This could easily be fixed by checking @root inside
btrfs_is_free_space_inode(), since the backtrace shows this only happens
for direct I/O, and direct I/O is never done on a free space cache inode.


But for a more accurate fix I'd rather understand how this happened;
otherwise we could hit other NULL pointer dereferences.


Do you have a reproducer for this?

Thanks,
Qu


PGD 14e18d4067 PUD 14e1868067 PMD 0
Oops:  [#1] SMP
Modules linked in: netconsole xt_multiport ipt_REJECT nf_reject_ipv4
xt_set iptable_filter ip_tables x_tables ip_set_hash_net ip_set
nfnetlink crc32_pclmul button loop btrfs xor usbhid raid6_pq ata_generic
virtio_blk virtio_net uhci_hcd ehci_hcd i2c_piix4 usbcore virtio_pci
i2c_core usb_common ata_piix floppy
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.52+112-ph #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.7.5-20140722_172050-sagunt 04/01/2014
task: b4e0f500 ti: b4e0 task.ti: b4e0
RIP: 0010:[] []
__endio_write_update_ordered+0x33/0x140 [btrfs]
RSP: 0018:8814eae03cd8 EFLAGS: 00010086
RAX:  RBX: 8814e8fd5aa8 RCX: 0001
RDX: 0010 RSI: 0010 RDI: 8814e45885c0
RBP: 8814eae03d10 R08: 8814e8334000 R09: 00018040003a
R10: ea00507d8d00 R11: 88141f634080 R12: 8814e45885c0
R13: 8814e125d700 R14: 0010 R15: 8800376c6a80
FS: () GS:8814eae0() knlGS:
CS: 0010 DS:  ES:  CR0: 80050033
CR2: 01f0 CR3: 0014e34c9000 CR4: 001406f0
Stack:
 0010 8814e8fd5aa8 8814e953f3c0
8814e125d700 0010 8800376c6a80 8814eae03d38
c03ddf67 8814e86b6a80 8814e8fd5aa8 0001
Call Trace:
[] btrfs_endio_direct_write+0x37/0x60 [btrfs]
[] bio_endio+0x57/0x60
[] btrfs_end_bio+0xa1/0x140 [btrfs]
[] bio_endio+0x57/0x60
[] blk_update_request+0x8b/0x330
[] blk_mq_end_request+0x1a/0x70
[] virtblk_request_done+0x3f/0x70 [virtio_blk]
[] __blk_mq_complete_request+0x78/0xe0
[] blk_mq_complete_request+0x1c/0x20
[] virtblk_done+0x64/0xe0 [virtio_blk]
[] vring_interrupt+0x3a/0x90
[] __handle_irq_event_percpu+0x89/0x1b0
[] handle_irq_event_percpu+0x23/0x60
[] handle_irq_event+0x3b/0x60
[] handle_edge_irq+0x6f/0x150
[] handle_irq+0x1d/0x30
[] do_IRQ+0x4b/0xd0
[] common_interrupt+0x8c/0x8c
DWARF2 unwinder stuck at ret_from_intr+0x0/0x1b
Leftover inexact backtrace:
2017-03-12 20:33:08 
2017-03-12 20:33:08  [] ? native_safe_halt+0x6/0x10
[] default_idle+0x1e/0xe0
[] arch_cpu_idle+0xf/0x20
[] default_idle_call+0x3b/0x40
[] cpu_startup_entry+0x29a/0x370
[] rest_init+0x7c/0x80
[] start_kernel+0x490/0x49d
[] ? early_idt_handler_array+0x120/0x120
[] x86_64_start_reservations+0x2a/0x2c
[] x86_64_start_kernel+0x13b/0x14a
Code: e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 ec 10 48 8b 87 70 fc
ff ff 4c 8b 87 38 fe ff ff 48 c7 45 c8 00 00 00 00 48 89 75 d0 <48> 8b
b8 f0 01 00 00 48 3b 47 28 49 8b 84 24 78 fc ff ff 0f 84
RIP [] __endio_write_update_ordered+0x33/0x140 [btrfs]
RSP 
CR2: 01f0
---[ end trace 7529a0652fd7873e ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x3300 from 0x8100 (relocation range:
0x8000-0xbfff)

Greets,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html







[PATCH v7 1/2] btrfs: Fix metadata underflow caused by btrfs_reloc_clone_csum error

2017-03-12 Thread Stefan Priebe - Profihost AG
Hi Qu,

V5 ran fine against the openSUSE-42.2 kernel (based on v4.4).

However, V7 results in an oops for me:
BUG: unable to handle kernel NULL pointer dereference at 01f0
IP: [] __endio_write_update_ordered+0x33/0x140 [btrfs]
PGD 14e18d4067 PUD 14e1868067 PMD 0
Oops:  [#1] SMP
Modules linked in: netconsole xt_multiport ipt_REJECT nf_reject_ipv4
xt_set iptable_filter ip_tables x_tables ip_set_hash_net ip_set
nfnetlink crc32_pclmul button loop btrfs xor usbhid raid6_pq ata_generic
virtio_blk virtio_net uhci_hcd ehci_hcd i2c_piix4 usbcore virtio_pci
i2c_core usb_common ata_piix floppy
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.52+112-ph #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.7.5-20140722_172050-sagunt 04/01/2014
task: b4e0f500 ti: b4e0 task.ti: b4e0
RIP: 0010:[] []
__endio_write_update_ordered+0x33/0x140 [btrfs]
RSP: 0018:8814eae03cd8 EFLAGS: 00010086
RAX:  RBX: 8814e8fd5aa8 RCX: 0001
RDX: 0010 RSI: 0010 RDI: 8814e45885c0
RBP: 8814eae03d10 R08: 8814e8334000 R09: 00018040003a
R10: ea00507d8d00 R11: 88141f634080 R12: 8814e45885c0
R13: 8814e125d700 R14: 0010 R15: 8800376c6a80
FS: () GS:8814eae0() knlGS:
CS: 0010 DS:  ES:  CR0: 80050033
CR2: 01f0 CR3: 0014e34c9000 CR4: 001406f0
Stack:
 0010 8814e8fd5aa8 8814e953f3c0
8814e125d700 0010 8800376c6a80 8814eae03d38
c03ddf67 8814e86b6a80 8814e8fd5aa8 0001
Call Trace:
[] btrfs_endio_direct_write+0x37/0x60 [btrfs]
[] bio_endio+0x57/0x60
[] btrfs_end_bio+0xa1/0x140 [btrfs]
[] bio_endio+0x57/0x60
[] blk_update_request+0x8b/0x330
[] blk_mq_end_request+0x1a/0x70
[] virtblk_request_done+0x3f/0x70 [virtio_blk]
[] __blk_mq_complete_request+0x78/0xe0
[] blk_mq_complete_request+0x1c/0x20
[] virtblk_done+0x64/0xe0 [virtio_blk]
[] vring_interrupt+0x3a/0x90
[] __handle_irq_event_percpu+0x89/0x1b0
[] handle_irq_event_percpu+0x23/0x60
[] handle_irq_event+0x3b/0x60
[] handle_edge_irq+0x6f/0x150
[] handle_irq+0x1d/0x30
[] do_IRQ+0x4b/0xd0
[] common_interrupt+0x8c/0x8c
DWARF2 unwinder stuck at ret_from_intr+0x0/0x1b
Leftover inexact backtrace:
2017-03-12 20:33:08 
2017-03-12 20:33:08  [] ? native_safe_halt+0x6/0x10
[] default_idle+0x1e/0xe0
[] arch_cpu_idle+0xf/0x20
[] default_idle_call+0x3b/0x40
[] cpu_startup_entry+0x29a/0x370
[] rest_init+0x7c/0x80
[] start_kernel+0x490/0x49d
[] ? early_idt_handler_array+0x120/0x120
[] x86_64_start_reservations+0x2a/0x2c
[] x86_64_start_kernel+0x13b/0x14a
Code: e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 ec 10 48 8b 87 70 fc
ff ff 4c 8b 87 38 fe ff ff 48 c7 45 c8 00 00 00 00 48 89 75 d0 <48> 8b
b8 f0 01 00 00 48 3b 47 28 49 8b 84 24 78 fc ff ff 0f 84
RIP [] __endio_write_update_ordered+0x33/0x140 [btrfs]
RSP 
CR2: 01f0
---[ end trace 7529a0652fd7873e ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x3300 from 0x8100 (relocation range:
0x8000-0xbfff)

Greets,
Stefan