Re: [PATCH v7 1/2] btrfs: Fix metadata underflow caused by btrfs_reloc_clone_csum error

2017-03-14 Thread Qu Wenruo



At 03/14/2017 05:06 PM, Stefan Priebe - Profihost AG wrote:

Thanks Qu, removing BTRFS_I from the inode fixes this issue to me.

Greets,
Stefan



Glad to hear that.

And a small tip is, when compiling kernel(at least btrfs module), any 
warning should be checked carefully.


Such type mismatch will cause a warning and can help us to avoid such 
problem.


Thanks,
Qu


Am 14.03.2017 um 03:50 schrieb Qu Wenruo:



At 03/13/2017 09:26 PM, Stefan Priebe - Profihost AG wrote:


Am 13.03.2017 um 08:39 schrieb Qu Wenruo:



At 03/13/2017 03:26 PM, Stefan Priebe - Profihost AG wrote:

Hi Qu,

Am 13.03.2017 um 02:16 schrieb Qu Wenruo:

But wasn't this part of the code identical in V5? Why does it only
happen with V7?


There are still difference, but just as you said, the related
part(checking if inode is free space cache inode) is identical across v5
and v7.


But if i boot v7 it always happens. If i boot v5 it always works. Have
done 5 repeatet tests.


I rechecked the code change between v7 and v5.

It turns out that, the code base may cause the problem.

In v7, the base is v4.11-rc1, which introduced quite a lot of
btrfs_inode cleanup.

One of the difference is the parameter for btrfs_is_free_space_inode().

In v7, the parameter @inode changed from struct inode to struct
btrfs_inode.

So in v7, we're passing BTRFS_I(inode) to btrfs_is_free_space_inode(),
other than plain inode.

That's the most possible cause for me here.

So would you please paste the final patch applied to your tree?
Git diff or git format-patch can both handle it.

Thanks,
Qu




I'm afraid that's a rare race leading to NULL btrfs_inode->root, which
could happen in both v5 and v7.

What's the difference between SUSE and mainline kernel?


A lot ;-) But i don't think anything related.


Maybe some mainline kernel commits have already fixed it?


May be no idea. But i haven't found any reason why v5 works.

Stefan



Thanks,
Qu






--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html





--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 1/2] btrfs: Fix metadata underflow caused by btrfs_reloc_clone_csum error

2017-03-14 Thread Stefan Priebe - Profihost AG
Thanks Qu, removing BTRFS_I from the inode fixes this issue to me.

Greets,
Stefan


Am 14.03.2017 um 03:50 schrieb Qu Wenruo:
> 
> 
> At 03/13/2017 09:26 PM, Stefan Priebe - Profihost AG wrote:
>>
>> Am 13.03.2017 um 08:39 schrieb Qu Wenruo:
>>>
>>>
>>> At 03/13/2017 03:26 PM, Stefan Priebe - Profihost AG wrote:
 Hi Qu,

 Am 13.03.2017 um 02:16 schrieb Qu Wenruo:

 But wasn't this part of the code identical in V5? Why does it only
 happen with V7?
>>>
>>> There are still difference, but just as you said, the related
>>> part(checking if inode is free space cache inode) is identical across v5
>>> and v7.
>>
>> But if i boot v7 it always happens. If i boot v5 it always works. Have
>> done 5 repeatet tests.
> 
> I rechecked the code change between v7 and v5.
> 
> It turns out that, the code base may cause the problem.
> 
> In v7, the base is v4.11-rc1, which introduced quite a lot of
> btrfs_inode cleanup.
> 
> One of the difference is the parameter for btrfs_is_free_space_inode().
> 
> In v7, the parameter @inode changed from struct inode to struct
> btrfs_inode.
> 
> So in v7, we're passing BTRFS_I(inode) to btrfs_is_free_space_inode(),
> other than plain inode.
> 
> That's the most possible cause for me here.
> 
> So would you please paste the final patch applied to your tree?
> Git diff or git format-patch can both handle it.
> 
> Thanks,
> Qu
> 
>>
>>> I'm afraid that's a rare race leading to NULL btrfs_inode->root, which
>>> could happen in both v5 and v7.
>>>
>>> What's the difference between SUSE and mainline kernel?
>>
>> A lot ;-) But i don't think anything related.
>>
>>> Maybe some mainline kernel commits have already fixed it?
>>
>> May be no idea. But i haven't found any reason why v5 works.
>>
>> Stefan
>>
>>>
>>> Thanks,
>>> Qu

> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 1/2] btrfs: Fix metadata underflow caused by btrfs_reloc_clone_csum error

2017-03-13 Thread Qu Wenruo



At 03/13/2017 09:26 PM, Stefan Priebe - Profihost AG wrote:


Am 13.03.2017 um 08:39 schrieb Qu Wenruo:



At 03/13/2017 03:26 PM, Stefan Priebe - Profihost AG wrote:

Hi Qu,

Am 13.03.2017 um 02:16 schrieb Qu Wenruo:

But wasn't this part of the code identical in V5? Why does it only
happen with V7?


There are still difference, but just as you said, the related
part(checking if inode is free space cache inode) is identical across v5
and v7.


But if i boot v7 it always happens. If i boot v5 it always works. Have
done 5 repeatet tests.


I rechecked the code change between v7 and v5.

It turns out that, the code base may cause the problem.

In v7, the base is v4.11-rc1, which introduced quite a lot of 
btrfs_inode cleanup.


One of the difference is the parameter for btrfs_is_free_space_inode().

In v7, the parameter @inode changed from struct inode to struct btrfs_inode.

So in v7, we're passing BTRFS_I(inode) to btrfs_is_free_space_inode(), 
other than plain inode.


That's the most possible cause for me here.

So would you please paste the final patch applied to your tree?
Git diff or git format-patch can both handle it.

Thanks,
Qu




I'm afraid that's a rare race leading to NULL btrfs_inode->root, which
could happen in both v5 and v7.

What's the difference between SUSE and mainline kernel?


A lot ;-) But i don't think anything related.


Maybe some mainline kernel commits have already fixed it?


May be no idea. But i haven't found any reason why v5 works.

Stefan



Thanks,
Qu





--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 1/2] btrfs: Fix metadata underflow caused by btrfs_reloc_clone_csum error

2017-03-13 Thread Qu Wenruo



At 03/13/2017 09:26 PM, Stefan Priebe - Profihost AG wrote:


Am 13.03.2017 um 08:39 schrieb Qu Wenruo:



At 03/13/2017 03:26 PM, Stefan Priebe - Profihost AG wrote:

Hi Qu,

Am 13.03.2017 um 02:16 schrieb Qu Wenruo:


At 03/13/2017 04:49 AM, Stefan Priebe - Profihost AG wrote:

Hi Qu,

while V5 was running fine against the openSUSE-42.2 kernel (based on
v4.4).


Thanks for the test.


V7 results in OOPS to me:
BUG: unable to handle kernel NULL pointer dereference at
01f0


This 0x1f0 is the same as offsetof(struct brrfs_root, fs_info), quite
nice clue.


IP: [] __endio_write_update_ordered+0x33/0x140
[btrfs]


IP points to:
---
static inline bool btrfs_is_free_space_inode(struct btrfs_inode *inode)
{
struct btrfs_root *root = inode->root; << Either here

if (root == root->fs_info->tree_root && << Or here
btrfs_ino(inode) != BTRFS_BTREE_INODE_OBJECTID)

---

Taking the above offset into consideration, it's only possible for later
case.

So here, we have a btrfs_inode whose @root is NULL.


But wasn't this part of the code identical in V5? Why does it only
happen with V7?


There are still difference, but just as you said, the related
part(checking if inode is free space cache inode) is identical across v5
and v7.


But if i boot v7 it always happens. If i boot v5 it always works. Have
done 5 repeatet tests.


That's a problem now, I'll dig it further to find out the difference.

Thanks,
Qu




I'm afraid that's a rare race leading to NULL btrfs_inode->root, which
could happen in both v5 and v7.

What's the difference between SUSE and mainline kernel?


A lot ;-) But i don't think anything related.


Maybe some mainline kernel commits have already fixed it?


May be no idea. But i haven't found any reason why v5 works.

Stefan



Thanks,
Qu



This can be fixed easily by checking @root inside
btrfs_is_free_space_inode(), as the backtrace shows that it's only
happening for DirectIO, and it won't happen for free space cache inode.

But I'm more curious how this happened for a more accurate fix, or we
could have other NULL pointer access.

Did you have any reproducer for this?


Sorry no - this is a production MariaDB Server running btrfs with
compress-force=zlib. But if i could test anything i'll do.

Greets,
Stefan



Thanks,
Qu


PGD 14e18d4067 PUD 14e1868067 PMD 0
Oops:  [#1] SMP
Modules linked in: netconsole xt_multiport ipt_REJECT nf_reject_ipv4
xt_set iptable_filter ip_tables x_tables ip_set_hash_net ip_set
nfnetlink crc32_pclmul button loop btrfs xor usbhid raid6_pq
ata_generic
virtio_blk virtio_net uhci_hcd ehci_hcd i2c_piix4 usbcore virtio_pci
i2c_core usb_common ata_piix floppy
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.52+112-ph #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.7.5-20140722_172050-sagunt 04/01/2014
task: b4e0f500 ti: b4e0 task.ti: b4e0
RIP: 0010:[] []
__endio_write_update_ordered+0x33/0x140 [btrfs]
RSP: 0018:8814eae03cd8 EFLAGS: 00010086
RAX:  RBX: 8814e8fd5aa8 RCX: 0001
RDX: 0010 RSI: 0010 RDI: 8814e45885c0
RBP: 8814eae03d10 R08: 8814e8334000 R09: 00018040003a
R10: ea00507d8d00 R11: 88141f634080 R12: 8814e45885c0
R13: 8814e125d700 R14: 0010 R15: 8800376c6a80
FS: () GS:8814eae0()
knlGS:
CS: 0010 DS:  ES:  CR0: 80050033
CR2: 01f0 CR3: 0014e34c9000 CR4: 001406f0Stack:
 0010 8814e8fd5aa8 8814e953f3c0
8814e125d700 0010 8800376c6a80 8814eae03d38
c03ddf67 8814e86b6a80 8814e8fd5aa8 0001
Call Trace:
[] btrfs_endio_direct_write+0x37/0x60 [btrfs]
[] bio_endio+0x57/0x60
[] btrfs_end_bio+0xa1/0x140 [btrfs]
[] bio_endio+0x57/0x60
[] blk_update_request+0x8b/0x330
[] blk_mq_end_request+0x1a/0x70
[] virtblk_request_done+0x3f/0x70 [virtio_blk]
[] __blk_mq_complete_request+0x78/0xe0
[] blk_mq_complete_request+0x1c/0x20
[] virtblk_done+0x64/0xe0 [virtio_blk]
[] vring_interrupt+0x3a/0x90
[] __handle_irq_event_percpu+0x89/0x1b0
[] handle_irq_event_percpu+0x23/0x60
[] handle_irq_event+0x3b/0x60
[] handle_edge_irq+0x6f/0x150
[] handle_irq+0x1d/0x30
[] do_IRQ+0x4b/0xd0
[] common_interrupt+0x8c/0x8c
DWARF2 unwinder stuck at ret_from_intr+0x0/0x1b
Leftover inexact backtrace:
2017-03-12 20:33:08 
2017-03-12 20:33:08  [] ?
native_safe_halt+0x6/0x10
[] default_idle+0x1e/0xe0
[] arch_cpu_idle+0xf/0x20
[] default_idle_call+0x3b/0x40
[] cpu_startup_entry+0x29a/0x370
[] rest_init+0x7c/0x80
[] start_kernel+0x490/0x49d
[] ? early_idt_handler_array+0x120/0x120
[] x86_64_start_reservations+0x2a/0x2c
[] x86_64_start_kernel+0x13b/0x14a
Code: e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 ec 10 48 8b 87 70 fc
ff ff 4c 8b 87 38 fe ff ff 48 c7 45 c8 00 00 00 00 48 89 75 d0 <48> 8b
b8 f0 01 00 00 48 3b 47 28 49 8b 84 24 78 fc ff ff 0f 84
RIP 

Re: [PATCH v7 1/2] btrfs: Fix metadata underflow caused by btrfs_reloc_clone_csum error

2017-03-13 Thread Stefan Priebe - Profihost AG

Am 13.03.2017 um 08:39 schrieb Qu Wenruo:
> 
> 
> At 03/13/2017 03:26 PM, Stefan Priebe - Profihost AG wrote:
>> Hi Qu,
>>
>> Am 13.03.2017 um 02:16 schrieb Qu Wenruo:
>>>
>>> At 03/13/2017 04:49 AM, Stefan Priebe - Profihost AG wrote:
 Hi Qu,

 while V5 was running fine against the openSUSE-42.2 kernel (based on
 v4.4).
>>>
>>> Thanks for the test.
>>>
 V7 results in OOPS to me:
 BUG: unable to handle kernel NULL pointer dereference at
 01f0
>>>
>>> This 0x1f0 is the same as offsetof(struct brrfs_root, fs_info), quite
>>> nice clue.
>>>
 IP: [] __endio_write_update_ordered+0x33/0x140
 [btrfs]
>>>
>>> IP points to:
>>> ---
>>> static inline bool btrfs_is_free_space_inode(struct btrfs_inode *inode)
>>> {
>>> struct btrfs_root *root = inode->root; << Either here
>>>
>>> if (root == root->fs_info->tree_root && << Or here
>>> btrfs_ino(inode) != BTRFS_BTREE_INODE_OBJECTID)
>>>
>>> ---
>>>
>>> Taking the above offset into consideration, it's only possible for later
>>> case.
>>>
>>> So here, we have a btrfs_inode whose @root is NULL.
>>
>> But wasn't this part of the code identical in V5? Why does it only
>> happen with V7?
> 
> There are still difference, but just as you said, the related
> part(checking if inode is free space cache inode) is identical across v5
> and v7.

But if i boot v7 it always happens. If i boot v5 it always works. Have
done 5 repeatet tests.

> I'm afraid that's a rare race leading to NULL btrfs_inode->root, which
> could happen in both v5 and v7.
> 
> What's the difference between SUSE and mainline kernel?

A lot ;-) But i don't think anything related.

> Maybe some mainline kernel commits have already fixed it?

May be no idea. But i haven't found any reason why v5 works.

Stefan

> 
> Thanks,
> Qu
>>
>>> This can be fixed easily by checking @root inside
>>> btrfs_is_free_space_inode(), as the backtrace shows that it's only
>>> happening for DirectIO, and it won't happen for free space cache inode.
>>>
>>> But I'm more curious how this happened for a more accurate fix, or we
>>> could have other NULL pointer access.
>>>
>>> Did you have any reproducer for this?
>>
>> Sorry no - this is a production MariaDB Server running btrfs with
>> compress-force=zlib. But if i could test anything i'll do.
>>
>> Greets,
>> Stefan
>>
>>>
>>> Thanks,
>>> Qu
>>>
 PGD 14e18d4067 PUD 14e1868067 PMD 0
 Oops:  [#1] SMP
 Modules linked in: netconsole xt_multiport ipt_REJECT nf_reject_ipv4
 xt_set iptable_filter ip_tables x_tables ip_set_hash_net ip_set
 nfnetlink crc32_pclmul button loop btrfs xor usbhid raid6_pq
 ata_generic
 virtio_blk virtio_net uhci_hcd ehci_hcd i2c_piix4 usbcore virtio_pci
 i2c_core usb_common ata_piix floppy
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.52+112-ph #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
 1.7.5-20140722_172050-sagunt 04/01/2014
 task: b4e0f500 ti: b4e0 task.ti: b4e0
 RIP: 0010:[] []
 __endio_write_update_ordered+0x33/0x140 [btrfs]
 RSP: 0018:8814eae03cd8 EFLAGS: 00010086
 RAX:  RBX: 8814e8fd5aa8 RCX: 0001
 RDX: 0010 RSI: 0010 RDI: 8814e45885c0
 RBP: 8814eae03d10 R08: 8814e8334000 R09: 00018040003a
 R10: ea00507d8d00 R11: 88141f634080 R12: 8814e45885c0
 R13: 8814e125d700 R14: 0010 R15: 8800376c6a80
 FS: () GS:8814eae0()
 knlGS:
 CS: 0010 DS:  ES:  CR0: 80050033
 CR2: 01f0 CR3: 0014e34c9000 CR4: 001406f0Stack:
  0010 8814e8fd5aa8 8814e953f3c0
 8814e125d700 0010 8800376c6a80 8814eae03d38
 c03ddf67 8814e86b6a80 8814e8fd5aa8 0001
 Call Trace:
 [] btrfs_endio_direct_write+0x37/0x60 [btrfs]
 [] bio_endio+0x57/0x60
 [] btrfs_end_bio+0xa1/0x140 [btrfs]
 [] bio_endio+0x57/0x60
 [] blk_update_request+0x8b/0x330
 [] blk_mq_end_request+0x1a/0x70
 [] virtblk_request_done+0x3f/0x70 [virtio_blk]
 [] __blk_mq_complete_request+0x78/0xe0
 [] blk_mq_complete_request+0x1c/0x20
 [] virtblk_done+0x64/0xe0 [virtio_blk]
 [] vring_interrupt+0x3a/0x90
 [] __handle_irq_event_percpu+0x89/0x1b0
 [] handle_irq_event_percpu+0x23/0x60
 [] handle_irq_event+0x3b/0x60
 [] handle_edge_irq+0x6f/0x150
 [] handle_irq+0x1d/0x30
 [] do_IRQ+0x4b/0xd0
 [] common_interrupt+0x8c/0x8c
 DWARF2 unwinder stuck at ret_from_intr+0x0/0x1b
 Leftover inexact backtrace:
 2017-03-12 20:33:08 
 2017-03-12 20:33:08  [] ?
 native_safe_halt+0x6/0x10
 [] default_idle+0x1e/0xe0
 [] arch_cpu_idle+0xf/0x20
 [] default_idle_call+0x3b/0x40
 [] cpu_startup_entry+0x29a/0x370
 [] 

Re: [PATCH v7 1/2] btrfs: Fix metadata underflow caused by btrfs_reloc_clone_csum error

2017-03-13 Thread Qu Wenruo



At 03/13/2017 03:26 PM, Stefan Priebe - Profihost AG wrote:

Hi Qu,

Am 13.03.2017 um 02:16 schrieb Qu Wenruo:


At 03/13/2017 04:49 AM, Stefan Priebe - Profihost AG wrote:

Hi Qu,

while V5 was running fine against the openSUSE-42.2 kernel (based on
v4.4).


Thanks for the test.


V7 results in OOPS to me:
BUG: unable to handle kernel NULL pointer dereference at 01f0


This 0x1f0 is the same as offsetof(struct brrfs_root, fs_info), quite
nice clue.


IP: [] __endio_write_update_ordered+0x33/0x140 [btrfs]


IP points to:
---
static inline bool btrfs_is_free_space_inode(struct btrfs_inode *inode)
{
struct btrfs_root *root = inode->root; << Either here

if (root == root->fs_info->tree_root && << Or here
btrfs_ino(inode) != BTRFS_BTREE_INODE_OBJECTID)

---

Taking the above offset into consideration, it's only possible for later
case.

So here, we have a btrfs_inode whose @root is NULL.


But wasn't this part of the code identical in V5? Why does it only
happen with V7?


There are still difference, but just as you said, the related 
part(checking if inode is free space cache inode) is identical across v5 
and v7.



I'm afraid that's a rare race leading to NULL btrfs_inode->root, which 
could happen in both v5 and v7.


What's the difference between SUSE and mainline kernel?
Maybe some mainline kernel commits have already fixed it?

Thanks,
Qu



This can be fixed easily by checking @root inside
btrfs_is_free_space_inode(), as the backtrace shows that it's only
happening for DirectIO, and it won't happen for free space cache inode.

But I'm more curious how this happened for a more accurate fix, or we
could have other NULL pointer access.

Did you have any reproducer for this?


Sorry no - this is a production MariaDB Server running btrfs with
compress-force=zlib. But if i could test anything i'll do.

Greets,
Stefan



Thanks,
Qu


PGD 14e18d4067 PUD 14e1868067 PMD 0
Oops:  [#1] SMP
Modules linked in: netconsole xt_multiport ipt_REJECT nf_reject_ipv4
xt_set iptable_filter ip_tables x_tables ip_set_hash_net ip_set
nfnetlink crc32_pclmul button loop btrfs xor usbhid raid6_pq ata_generic
virtio_blk virtio_net uhci_hcd ehci_hcd i2c_piix4 usbcore virtio_pci
i2c_core usb_common ata_piix floppy
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.52+112-ph #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.7.5-20140722_172050-sagunt 04/01/2014
task: b4e0f500 ti: b4e0 task.ti: b4e0
RIP: 0010:[] []
__endio_write_update_ordered+0x33/0x140 [btrfs]
RSP: 0018:8814eae03cd8 EFLAGS: 00010086
RAX:  RBX: 8814e8fd5aa8 RCX: 0001
RDX: 0010 RSI: 0010 RDI: 8814e45885c0
RBP: 8814eae03d10 R08: 8814e8334000 R09: 00018040003a
R10: ea00507d8d00 R11: 88141f634080 R12: 8814e45885c0
R13: 8814e125d700 R14: 0010 R15: 8800376c6a80
FS: () GS:8814eae0()
knlGS:
CS: 0010 DS:  ES:  CR0: 80050033
CR2: 01f0 CR3: 0014e34c9000 CR4: 001406f0Stack:
 0010 8814e8fd5aa8 8814e953f3c0
8814e125d700 0010 8800376c6a80 8814eae03d38
c03ddf67 8814e86b6a80 8814e8fd5aa8 0001
Call Trace:
[] btrfs_endio_direct_write+0x37/0x60 [btrfs]
[] bio_endio+0x57/0x60
[] btrfs_end_bio+0xa1/0x140 [btrfs]
[] bio_endio+0x57/0x60
[] blk_update_request+0x8b/0x330
[] blk_mq_end_request+0x1a/0x70
[] virtblk_request_done+0x3f/0x70 [virtio_blk]
[] __blk_mq_complete_request+0x78/0xe0
[] blk_mq_complete_request+0x1c/0x20
[] virtblk_done+0x64/0xe0 [virtio_blk]
[] vring_interrupt+0x3a/0x90
[] __handle_irq_event_percpu+0x89/0x1b0
[] handle_irq_event_percpu+0x23/0x60
[] handle_irq_event+0x3b/0x60
[] handle_edge_irq+0x6f/0x150
[] handle_irq+0x1d/0x30
[] do_IRQ+0x4b/0xd0
[] common_interrupt+0x8c/0x8c
DWARF2 unwinder stuck at ret_from_intr+0x0/0x1b
Leftover inexact backtrace:
2017-03-12 20:33:08 
2017-03-12 20:33:08  [] ? native_safe_halt+0x6/0x10
[] default_idle+0x1e/0xe0
[] arch_cpu_idle+0xf/0x20
[] default_idle_call+0x3b/0x40
[] cpu_startup_entry+0x29a/0x370
[] rest_init+0x7c/0x80
[] start_kernel+0x490/0x49d
[] ? early_idt_handler_array+0x120/0x120
[] x86_64_start_reservations+0x2a/0x2c
[] x86_64_start_kernel+0x13b/0x14a
Code: e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 ec 10 48 8b 87 70 fc
ff ff 4c 8b 87 38 fe ff ff 48 c7 45 c8 00 00 00 00 48 89 75 d0 <48> 8b
b8 f0 01 00 00 48 3b 47 28 49 8b 84 24 78 fc ff ff 0f 84
RIP [] __endio_write_update_ordered+0x33/0x140 [btrfs]
RSP 
CR2: 01f0
---[ end trace 7529a0652fd7873e ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x3300 from 0x8100 (relocation range:
0x8000-0xbfff)

Greets,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to 

Re: [PATCH v7 1/2] btrfs: Fix metadata underflow caused by btrfs_reloc_clone_csum error

2017-03-13 Thread Stefan Priebe - Profihost AG
Hi Qu,

Am 13.03.2017 um 02:16 schrieb Qu Wenruo:
> 
> At 03/13/2017 04:49 AM, Stefan Priebe - Profihost AG wrote:
>> Hi Qu,
>>
>> while V5 was running fine against the openSUSE-42.2 kernel (based on
>> v4.4).
> 
> Thanks for the test.
> 
>> V7 results in OOPS to me:
>> BUG: unable to handle kernel NULL pointer dereference at 01f0
> 
> This 0x1f0 is the same as offsetof(struct brrfs_root, fs_info), quite
> nice clue.
> 
>> IP: [] __endio_write_update_ordered+0x33/0x140 [btrfs]
> 
> IP points to:
> ---
> static inline bool btrfs_is_free_space_inode(struct btrfs_inode *inode)
> {
> struct btrfs_root *root = inode->root; << Either here
> 
> if (root == root->fs_info->tree_root && << Or here
> btrfs_ino(inode) != BTRFS_BTREE_INODE_OBJECTID)
> 
> ---
> 
> Taking the above offset into consideration, it's only possible for later
> case.
> 
> So here, we have a btrfs_inode whose @root is NULL.

But wasn't this part of the code identical in V5? Why does it only
happen with V7?

> This can be fixed easily by checking @root inside
> btrfs_is_free_space_inode(), as the backtrace shows that it's only
> happening for DirectIO, and it won't happen for free space cache inode.
> 
> But I'm more curious how this happened for a more accurate fix, or we
> could have other NULL pointer access.
> 
> Did you have any reproducer for this?

Sorry no - this is a production MariaDB Server running btrfs with
compress-force=zlib. But if i could test anything i'll do.

Greets,
Stefan

> 
> Thanks,
> Qu
> 
>> PGD 14e18d4067 PUD 14e1868067 PMD 0
>> Oops:  [#1] SMP
>> Modules linked in: netconsole xt_multiport ipt_REJECT nf_reject_ipv4
>> xt_set iptable_filter ip_tables x_tables ip_set_hash_net ip_set
>> nfnetlink crc32_pclmul button loop btrfs xor usbhid raid6_pq ata_generic
>> virtio_blk virtio_net uhci_hcd ehci_hcd i2c_piix4 usbcore virtio_pci
>> i2c_core usb_common ata_piix floppy
>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.52+112-ph #1
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> 1.7.5-20140722_172050-sagunt 04/01/2014
>> task: b4e0f500 ti: b4e0 task.ti: b4e0
>> RIP: 0010:[] []
>> __endio_write_update_ordered+0x33/0x140 [btrfs]
>> RSP: 0018:8814eae03cd8 EFLAGS: 00010086
>> RAX:  RBX: 8814e8fd5aa8 RCX: 0001
>> RDX: 0010 RSI: 0010 RDI: 8814e45885c0
>> RBP: 8814eae03d10 R08: 8814e8334000 R09: 00018040003a
>> R10: ea00507d8d00 R11: 88141f634080 R12: 8814e45885c0
>> R13: 8814e125d700 R14: 0010 R15: 8800376c6a80
>> FS: () GS:8814eae0()
>> knlGS:
>> CS: 0010 DS:  ES:  CR0: 80050033
>> CR2: 01f0 CR3: 0014e34c9000 CR4: 001406f0Stack:
>>  0010 8814e8fd5aa8 8814e953f3c0
>> 8814e125d700 0010 8800376c6a80 8814eae03d38
>> c03ddf67 8814e86b6a80 8814e8fd5aa8 0001
>> Call Trace:
>> [] btrfs_endio_direct_write+0x37/0x60 [btrfs]
>> [] bio_endio+0x57/0x60
>> [] btrfs_end_bio+0xa1/0x140 [btrfs]
>> [] bio_endio+0x57/0x60
>> [] blk_update_request+0x8b/0x330
>> [] blk_mq_end_request+0x1a/0x70
>> [] virtblk_request_done+0x3f/0x70 [virtio_blk]
>> [] __blk_mq_complete_request+0x78/0xe0
>> [] blk_mq_complete_request+0x1c/0x20
>> [] virtblk_done+0x64/0xe0 [virtio_blk]
>> [] vring_interrupt+0x3a/0x90
>> [] __handle_irq_event_percpu+0x89/0x1b0
>> [] handle_irq_event_percpu+0x23/0x60
>> [] handle_irq_event+0x3b/0x60
>> [] handle_edge_irq+0x6f/0x150
>> [] handle_irq+0x1d/0x30
>> [] do_IRQ+0x4b/0xd0
>> [] common_interrupt+0x8c/0x8c
>> DWARF2 unwinder stuck at ret_from_intr+0x0/0x1b
>> Leftover inexact backtrace:
>> 2017-03-12 20:33:08 
>> 2017-03-12 20:33:08  [] ? native_safe_halt+0x6/0x10
>> [] default_idle+0x1e/0xe0
>> [] arch_cpu_idle+0xf/0x20
>> [] default_idle_call+0x3b/0x40
>> [] cpu_startup_entry+0x29a/0x370
>> [] rest_init+0x7c/0x80
>> [] start_kernel+0x490/0x49d
>> [] ? early_idt_handler_array+0x120/0x120
>> [] x86_64_start_reservations+0x2a/0x2c
>> [] x86_64_start_kernel+0x13b/0x14a
>> Code: e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 ec 10 48 8b 87 70 fc
>> ff ff 4c 8b 87 38 fe ff ff 48 c7 45 c8 00 00 00 00 48 89 75 d0 <48> 8b
>> b8 f0 01 00 00 48 3b 47 28 49 8b 84 24 78 fc ff ff 0f 84
>> RIP [] __endio_write_update_ordered+0x33/0x140 [btrfs]
>> RSP 
>> CR2: 01f0
>> ---[ end trace 7529a0652fd7873e ]---
>> Kernel panic - not syncing: Fatal exception in interrupt
>> Kernel Offset: 0x3300 from 0x8100 (relocation range:
>> 0x8000-0xbfff)
>>
>> Greets,
>> Stefan
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
> 
> 
--
To unsubscribe from this list: send the 

Re: [PATCH v7 1/2] btrfs: Fix metadata underflow caused by btrfs_reloc_clone_csum error

2017-03-12 Thread Qu Wenruo



At 03/13/2017 04:49 AM, Stefan Priebe - Profihost AG wrote:

Hi Qu,

while V5 was running fine against the openSUSE-42.2 kernel (based on v4.4).


Thanks for the test.



V7 results in OOPS to me:
BUG: unable to handle kernel NULL pointer dereference at 01f0


This 0x1f0 is the same as offsetof(struct brrfs_root, fs_info), quite 
nice clue.



IP: [] __endio_write_update_ordered+0x33/0x140 [btrfs]


IP points to:
---
static inline bool btrfs_is_free_space_inode(struct btrfs_inode *inode)
{
struct btrfs_root *root = inode->root; << Either here

if (root == root->fs_info->tree_root && << Or here
btrfs_ino(inode) != BTRFS_BTREE_INODE_OBJECTID)

---

Taking the above offset into consideration, it's only possible for later 
case.


So here, we have a btrfs_inode whose @root is NULL.

This can be fixed easily by checking @root inside 
btrfs_is_free_space_inode(), as the backtrace shows that it's only 
happening for DirectIO, and it won't happen for free space cache inode.


But I'm more curious how this happened for a more accurate fix, or we 
could have other NULL pointer access.


Did you have any reproducer for this?

Thanks,
Qu


PGD 14e18d4067 PUD 14e1868067 PMD 0
Oops:  [#1] SMP
Modules linked in: netconsole xt_multiport ipt_REJECT nf_reject_ipv4
xt_set iptable_filter ip_tables x_tables ip_set_hash_net ip_set
nfnetlink crc32_pclmul button loop btrfs xor usbhid raid6_pq ata_generic
virtio_blk virtio_net uhci_hcd ehci_hcd i2c_piix4 usbcore virtio_pci
i2c_core usb_common ata_piix floppy
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.52+112-ph #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.7.5-20140722_172050-sagunt 04/01/2014
task: b4e0f500 ti: b4e0 task.ti: b4e0
RIP: 0010:[] []
__endio_write_update_ordered+0x33/0x140 [btrfs]
RSP: 0018:8814eae03cd8 EFLAGS: 00010086
RAX:  RBX: 8814e8fd5aa8 RCX: 0001
RDX: 0010 RSI: 0010 RDI: 8814e45885c0
RBP: 8814eae03d10 R08: 8814e8334000 R09: 00018040003a
R10: ea00507d8d00 R11: 88141f634080 R12: 8814e45885c0
R13: 8814e125d700 R14: 0010 R15: 8800376c6a80
FS: () GS:8814eae0() knlGS:
CS: 0010 DS:  ES:  CR0: 80050033
CR2: 01f0 CR3: 0014e34c9000 CR4: 001406f0Stack:
 0010 8814e8fd5aa8 8814e953f3c0
8814e125d700 0010 8800376c6a80 8814eae03d38
c03ddf67 8814e86b6a80 8814e8fd5aa8 0001
Call Trace:
[] btrfs_endio_direct_write+0x37/0x60 [btrfs]
[] bio_endio+0x57/0x60
[] btrfs_end_bio+0xa1/0x140 [btrfs]
[] bio_endio+0x57/0x60
[] blk_update_request+0x8b/0x330
[] blk_mq_end_request+0x1a/0x70
[] virtblk_request_done+0x3f/0x70 [virtio_blk]
[] __blk_mq_complete_request+0x78/0xe0
[] blk_mq_complete_request+0x1c/0x20
[] virtblk_done+0x64/0xe0 [virtio_blk]
[] vring_interrupt+0x3a/0x90
[] __handle_irq_event_percpu+0x89/0x1b0
[] handle_irq_event_percpu+0x23/0x60
[] handle_irq_event+0x3b/0x60
[] handle_edge_irq+0x6f/0x150
[] handle_irq+0x1d/0x30
[] do_IRQ+0x4b/0xd0
[] common_interrupt+0x8c/0x8c
DWARF2 unwinder stuck at ret_from_intr+0x0/0x1b
Leftover inexact backtrace:
2017-03-12 20:33:08 
2017-03-12 20:33:08  [] ? native_safe_halt+0x6/0x10
[] default_idle+0x1e/0xe0
[] arch_cpu_idle+0xf/0x20
[] default_idle_call+0x3b/0x40
[] cpu_startup_entry+0x29a/0x370
[] rest_init+0x7c/0x80
[] start_kernel+0x490/0x49d
[] ? early_idt_handler_array+0x120/0x120
[] x86_64_start_reservations+0x2a/0x2c
[] x86_64_start_kernel+0x13b/0x14a
Code: e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 ec 10 48 8b 87 70 fc
ff ff 4c 8b 87 38 fe ff ff 48 c7 45 c8 00 00 00 00 48 89 75 d0 <48> 8b
b8 f0 01 00 00 48 3b 47 28 49 8b 84 24 78 fc ff ff 0f 84
RIP [] __endio_write_update_ordered+0x33/0x140 [btrfs]
RSP 
CR2: 01f0
---[ end trace 7529a0652fd7873e ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x3300 from 0x8100 (relocation range:
0x8000-0xbfff)

Greets,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html





--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 1/2] btrfs: Fix metadata underflow caused by btrfs_reloc_clone_csum error

2017-03-09 Thread Liu Bo
On Wed, Mar 08, 2017 at 10:25:51AM +0800, Qu Wenruo wrote:
> [BUG]
> When btrfs_reloc_clone_csum() reports error, it can underflow metadata
> and leads to kernel assertion on outstanding extents in
> run_delalloc_nocow() and cow_file_range().
> 
>  BTRFS info (device vdb5): relocating block group 12582912 flags data
>  BTRFS info (device vdb5): found 1 extents
>  assertion failed: inode->outstanding_extents >= num_extents, file: 
> fs/btrfs//extent-tree.c, line: 5858
> 
> Currently, due to another bug blocking ordered extents, the bug is only
> reproducible under certain block group layout and using error injection.
> 
> a) Create one data block group with one 4K extent in it.
>To avoid the bug that hangs btrfs due to ordered extent which never
>finishes
> b) Make btrfs_reloc_clone_csum() always fail
> c) Relocate that block group
> 
> [CAUSE]
> run_delalloc_nocow() and cow_file_range() handles error from
> btrfs_reloc_clone_csum() wrongly:
> 
> (The ascii chart shows a more generic case of this bug other than the
> bug mentioned above)
> 
> |<-- delalloc range --->|
> | OE 1 | OE 2 | ... | OE n |
> |<--- cleanup range --->|
> |<---  --->|
>  \/
>  btrfs_finish_ordered_io() range
> 
> So error handler, which calls extent_clear_unlock_delalloc() with
> EXTENT_DELALLOC and EXTENT_DO_ACCOUNT bits, and btrfs_finish_ordered_io()
> will both cover OE n, and free its metadata, causing metadata under flow.
> 
> [Fix]
> The fix is to ensure after calling btrfs_add_ordered_extent(), we only
> call error handler after increasing the iteration offset, so that
> cleanup range won't cover any created ordered extent.
> 
> |<-- delalloc range --->|
> | OE 1 | OE 2 | ... | OE n |
> |<---  --->|<-- cleanup range ->|
>  \/
>  btrfs_finish_ordered_io() range

Reviewed-by: Liu Bo 

Thanks,

-liubo
> 
> Signed-off-by: Qu Wenruo 
> ---
> v6:
>   New, split from v5 patch, as this is a separate bug.
> v7:
>   Fix a wrong operator pointed out by Liu Bo.
>   Test case will follow soon.
> ---
>  fs/btrfs/inode.c | 51 +++
>  1 file changed, 39 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index c40060cc481f..fe588bf30d02 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -998,15 +998,24 @@ static noinline int cow_file_range(struct inode *inode,
>   BTRFS_DATA_RELOC_TREE_OBJECTID) {
>   ret = btrfs_reloc_clone_csums(inode, start,
> cur_alloc_size);
> + /*
> +  * Only drop cache here, and process as normal.
> +  *
> +  * We must not allow extent_clear_unlock_delalloc()
> +  * at out_unlock label to free meta of this ordered
> +  * extent, as its meta should be freed by
> +  * btrfs_finish_ordered_io().
> +  *
> +  * So we must continue until @start is increased to
> +  * skip current ordered extent.
> +  */
>   if (ret)
> - goto out_drop_extent_cache;
> + btrfs_drop_extent_cache(BTRFS_I(inode), start,
> + start + ram_size - 1, 0);
>   }
>  
>   btrfs_dec_block_group_reservations(fs_info, ins.objectid);
>  
> - if (disk_num_bytes < cur_alloc_size)
> - break;
> -
>   /* we're not doing compressed IO, don't unlock the first
>* page (which the caller expects to stay locked), don't
>* clear any dirty bits and don't set any writeback bits
> @@ -1022,10 +1031,21 @@ static noinline int cow_file_range(struct inode 
> *inode,
>delalloc_end, locked_page,
>EXTENT_LOCKED | EXTENT_DELALLOC,
>op);
> - disk_num_bytes -= cur_alloc_size;
> + if (disk_num_bytes < cur_alloc_size)
> + disk_num_bytes = 0;
> + else
> + disk_num_bytes -= cur_alloc_size;
>   num_bytes -= cur_alloc_size;
>   alloc_hint = ins.objectid + ins.offset;
>   start += cur_alloc_size;
> +
> + /*
> +  * btrfs_reloc_clone_csums() error, since start is increased
> +  * extent_clear_unlock_delalloc() at out_unlock label won't
> +  * free metadata of current ordered extent, we're OK to exit.
> +  */
> + if (ret)
> + 

Re: [PATCH v7 1/2] btrfs: Fix metadata underflow caused by btrfs_reloc_clone_csum error

2017-03-08 Thread Filipe Manana
On Wed, Mar 8, 2017 at 2:25 AM, Qu Wenruo  wrote:
> [BUG]
> When btrfs_reloc_clone_csum() reports error, it can underflow metadata
> and leads to kernel assertion on outstanding extents in
> run_delalloc_nocow() and cow_file_range().
>
>  BTRFS info (device vdb5): relocating block group 12582912 flags data
>  BTRFS info (device vdb5): found 1 extents
>  assertion failed: inode->outstanding_extents >= num_extents, file: 
> fs/btrfs//extent-tree.c, line: 5858
>
> Currently, due to another bug blocking ordered extents, the bug is only
> reproducible under certain block group layout and using error injection.
>
> a) Create one data block group with one 4K extent in it.
>To avoid the bug that hangs btrfs due to ordered extent which never
>finishes
> b) Make btrfs_reloc_clone_csum() always fail
> c) Relocate that block group
>
> [CAUSE]
> run_delalloc_nocow() and cow_file_range() handles error from
> btrfs_reloc_clone_csum() wrongly:
>
> (The ascii chart shows a more generic case of this bug other than the
> bug mentioned above)
>
> |<-- delalloc range --->|
> | OE 1 | OE 2 | ... | OE n |
> |<--- cleanup range --->|
> |<---  --->|
>  \/
>  btrfs_finish_ordered_io() range
>
> So error handler, which calls extent_clear_unlock_delalloc() with
> EXTENT_DELALLOC and EXTENT_DO_ACCOUNT bits, and btrfs_finish_ordered_io()
> will both cover OE n, and free its metadata, causing metadata under flow.
>
> [Fix]
> The fix is to ensure after calling btrfs_add_ordered_extent(), we only
> call error handler after increasing the iteration offset, so that
> cleanup range won't cover any created ordered extent.
>
> |<-- delalloc range --->|
> | OE 1 | OE 2 | ... | OE n |
> |<---  --->|<-- cleanup range ->|
>  \/
>  btrfs_finish_ordered_io() range
>
> Signed-off-by: Qu Wenruo 

Reviewed-by: Filipe Manana 

Thanks

> ---
> v6:
>   New, split from v5 patch, as this is a separate bug.
> v7:
>   Fix a wrong operator pointed out by Liu Bo.
>   Test case will follow soon.
> ---
>  fs/btrfs/inode.c | 51 +++
>  1 file changed, 39 insertions(+), 12 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index c40060cc481f..fe588bf30d02 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -998,15 +998,24 @@ static noinline int cow_file_range(struct inode *inode,
> BTRFS_DATA_RELOC_TREE_OBJECTID) {
> ret = btrfs_reloc_clone_csums(inode, start,
>   cur_alloc_size);
> +   /*
> +* Only drop cache here, and process as normal.
> +*
> +* We must not allow extent_clear_unlock_delalloc()
> +* at out_unlock label to free meta of this ordered
> +* extent, as its meta should be freed by
> +* btrfs_finish_ordered_io().
> +*
> +* So we must continue until @start is increased to
> +* skip current ordered extent.
> +*/
> if (ret)
> -   goto out_drop_extent_cache;
> +   btrfs_drop_extent_cache(BTRFS_I(inode), start,
> +   start + ram_size - 1, 0);
> }
>
> btrfs_dec_block_group_reservations(fs_info, ins.objectid);
>
> -   if (disk_num_bytes < cur_alloc_size)
> -   break;
> -
> /* we're not doing compressed IO, don't unlock the first
>  * page (which the caller expects to stay locked), don't
>  * clear any dirty bits and don't set any writeback bits
> @@ -1022,10 +1031,21 @@ static noinline int cow_file_range(struct inode 
> *inode,
>  delalloc_end, locked_page,
>  EXTENT_LOCKED | EXTENT_DELALLOC,
>  op);
> -   disk_num_bytes -= cur_alloc_size;
> +   if (disk_num_bytes < cur_alloc_size)
> +   disk_num_bytes = 0;
> +   else
> +   disk_num_bytes -= cur_alloc_size;
> num_bytes -= cur_alloc_size;
> alloc_hint = ins.objectid + ins.offset;
> start += cur_alloc_size;
> +
> +   /*
> +* btrfs_reloc_clone_csums() error, since start is increased
> +* extent_clear_unlock_delalloc() at out_unlock label won't
> +* free metadata of current ordered