Re: [PATCH v2] Btrfs: fix file/data loss caused by fsync after rename and new inode

2016-09-03 Thread Chris Mason

On 08/18/2016 11:03 AM, Chris Mason wrote:

On Wed, Mar 30, 2016 at 11:37:21PM +0100, fdman...@kernel.org wrote:

From: Filipe Manana 

If we rename an inode A (be it a file or a directory), create a new
inode B with the old name of inode A and under the same parent directory,
fsync inode B and then power fail, at log tree replay time we end up
removing inode A completely. If inode A is a directory then all its files
are gone too.


I bisected a crash with dbench down to this patch.  The reproduction was:

mkfs.btrfs -m single -f /dev/vdb
mount /dev/vdb /btrfs
cd /btrfs
mkdir clients
for x in `seq 0 100` ; do btrfs subvol create clients/client$x ; done
sync
dbench 100

In other words, run dbench with a subvol per dbench thread.  It crashes
immediately, most often with an invalid access in copy_from_user during
file_write.  The pattern of crashes and location just show general
memory corruption and the actual stack trace wasn't very useful.

With this patch reverted the runs last much much longer, but we still
hit a crash eventually.  It's not clear to me if this is two different
bugs or if Filipe's patch just makes the corruption much easier to hit.
I'm still digging through it all, but here's a common backtrace with
this patch reverted:



Lots of debugging later, we're leaving btrfs_log_ctx structures from the 
stack in lists without taking them out before btrfs_sync_file() or 
btrfs_sync_log() exit.  I'm still figuring out all of the corner cases 
where it happens, but at least things are starting to make sense again.


I should have a patch out the door on Tuesday.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Btrfs: fix file/data loss caused by fsync after rename and new inode

2016-08-18 Thread Chris Mason

On Wed, Mar 30, 2016 at 11:37:21PM +0100, fdman...@kernel.org wrote:

From: Filipe Manana 

If we rename an inode A (be it a file or a directory), create a new
inode B with the old name of inode A and under the same parent directory,
fsync inode B and then power fail, at log tree replay time we end up
removing inode A completely. If inode A is a directory then all its files
are gone too.


I bisected a crash with dbench down to this patch.  The reproduction 
was:


mkfs.btrfs -m single -f /dev/vdb
mount /dev/vdb /btrfs
cd /btrfs
mkdir clients
for x in `seq 0 100` ; do btrfs subvol create clients/client$x ; done
sync
dbench 100

In other words, run dbench with a subvol per dbench thread.  It crashes 
immediately, most often with an invalid access in copy_from_user during 
file_write.  The pattern of crashes and location just show general 
memory corruption and the actual stack trace wasn't very useful.


With this patch reverted the runs last much much longer, but we still 
hit a crash eventually.  It's not clear to me if this is two different 
bugs or if Filipe's patch just makes the corruption much easier to hit.  
I'm still digging through it all, but here's a common backtrace with

this patch reverted:

BUG: unable to handle kernel paging request at 00017298
IP: [] queued_spin_lock_slowpath+0x139/0x200
PGD 7df68a067 PUD 7df68b067 PMD 0
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: crc32c_intel i2c_piix4 aesni_intel i2c_core 
aes_x86_64 glue_helper virtio_net serio_raw lrw floppy pcspkr gf128mul 
ablk_helper button cryptd sch_fq_codel autofs4 virtio_blk

CPU: 6 PID: 1125 Comm: dbench Not tainted 4.7.0-1-g00cc018 #220
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.0-1.fc24 
04/01/2014

task: 88072c918e40 ti: 88072cf5 task.ti: 88072cf5
RIP: 0010:[]  [] 
queued_spin_lock_slowpath+0x139/0x200
RSP: 0018:8807eff83ac8  EFLAGS: 00010002
RAX: 263d RBX: 8807eff97290 RCX: 001d
RDX: 00017298 RSI: 8807eff83b58 RDI: 8807540702fc
RBP: 8807eff83b88 R08:  R09: 0001a228
R10: 88080fffad80 R11: 005a R12: 0001
R13:  R14: 8807eff83d48 R15: 0003
FS:  7fb2b8810700() GS:8807eff8() 
knlGS:

CS:  0010 DS:  ES:  CR0: 80050033
CR2: 00017298 CR3: 0007df689000 CR4: 000406e0
Stack:
 8807540702fc 8807ec37a918 0001
8807ec37a900 8807ec37a980 8807eff97290 810961da
003c 8800ba864dd8 880798f94189 8807ef89a000
Call Trace:

[] ? select_idle_sibling+0x2a/0x120
[] ? wake_up_process+0x15/0x20
[] ? wake_up_worker+0x30/0x40
[] ? insert_work+0x78/0xc0
[] _raw_spin_lock_irqsave+0x4e/0x50
[] try_to_wake_up+0x3c/0x410
[] ? _raw_spin_unlock+0x16/0x40
[] ? __queue_work+0x1b0/0x530
[] ? update_cfs_shares+0xcf/0x110
[] default_wake_function+0x12/0x20
[] autoremove_wake_function+0x16/0x40
[] wake_bit_function+0x34/0x40
[] __wake_up_common+0x56/0x90
[] __wake_up+0x48/0x70
[] __wake_up_bit+0x48/0x50
[] end_page_writeback+0x81/0xa0
[] end_bio_extent_writepage+0x79/0xe0
[] bio_endio+0x6b/0x80
[] btrfs_end_bio+0x102/0x190
[] bio_endio+0x6b/0x80
[] blk_update_request+0x1e8/0x330
[] blk_mq_end_request+0x1a/0x40
[] virtblk_request_done+0x71/0xe0 [virtio_blk]
[] ? blkdev_issue_zeroout+0x1d0/0x1d0
[] __blk_mq_complete_request_remote+0x13/0x20
[] flush_smp_call_function_queue+0x8b/0x180
[] ? debug_smp_processor_id+0x17/0x20
[] generic_smp_call_function_single_interrupt+0x13/0x20
[] smp_call_function_single_interrupt+0x27/0x40
[] call_function_single_interrupt+0x7f/0x90

[] ? queued_spin_lock_slowpath+0x14c/0x200
[] ? queued_spin_lock_slowpath+0xc1/0x200
[] ? _raw_spin_unlock_irqrestore+0xe/0x40
[] queued_write_lock_slowpath+0x95/0xa0
[] ? finish_wait+0x6f/0x90
[] ? preempt_count_add+0xb8/0xd0
[] _raw_write_lock+0x32/0x40
[] btrfs_tree_lock+0x146/0x2c0
[] ? woken_wake_function+0x20/0x20
[] ? preempt_count_add+0xb8/0xd0
[] ? _raw_read_lock+0x3e/0x40
[] ? btrfs_tree_read_lock+0x78/0x170
[] ? woken_wake_function+0x20/0x20
[] ? autoremove_wake_function+0x16/0x40
[] ? btrfs_root_node+0x4f/0x90
[] btrfs_lock_root_node+0x34/0x50
[] btrfs_search_slot+0x769/0x9c0
[] ? btrfs_tree_unlock+0x6c/0xd0
[] btrfs_del_csums+0x239/0x330
[] __btrfs_free_extent+0x73f/0xe00
[] ? kmem_cache_free+0x22d/0x240
[] __btrfs_run_delayed_refs+0xbc7/0x1300
[] ? find_get_pages_tag+0x18f/0x2f0
[] ? extent_write_cache_pages.clone.0+0x3dd/0x460
[] btrfs_run_delayed_refs+0x8a/0x2b0
[] btrfs_commit_transaction+0x51/0xcb0
[] ? __filemap_fdatawait_range+0x9b/0x170
[] ? preempt_count_add+0xb8/0xd0
[] ? _raw_spin_unlock_irq+0x17/0x40
[] ? btrfs_lookup_first_ordered_extent+0x97/0xd0
[] ? btrfs_wait_ordered_range+0xf1/0x130
[] btrfs_sync_file+0x3ce/0x4b0
[] ? __audit_syscall_entry+0xb0/0x110
[] vfs_fsync_range+0x4c/0xb0
[] ? 

Re: [PATCH v2] Btrfs: fix file/data loss caused by fsync after rename and new inode

2016-03-31 Thread Chris Mason
On Wed, Mar 30, 2016 at 11:37:21PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> If we rename an inode A (be it a file or a directory), create a new
> inode B with the old name of inode A and under the same parent directory,
> fsync inode B and then power fail, at log tree replay time we end up
> removing inode A completely. If inode A is a directory then all its files
> are gone too.
> 
> Example scenarios where this happens:
> This is reproducible with the following steps, taken from a couple of
> test cases written for fstests which are going to be submitted upstream
> soon:

Thanks Filipe!  Since this is an older bug, I won't rush it into
tomorrow's pull, but I'll test and get it into next week.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Btrfs: fix file/data loss caused by fsync after rename and new inode

2016-03-31 Thread Duncan
fdmanana posted on Wed, 30 Mar 2016 23:37:21 +0100 as excerpted:

> From: Filipe Manana 
> 
> If we rename an inode A (be it a file or a directory), create a new
> inode B with the old name of inode A and under the same parent
> directory, fsync inode B and then power fail, at log tree replay time
> we end up removing inode A completely. If inode A is a directory then
> all its files are gone too.

...

> V2: Node code changes, only updated the change log and the comment to
> be more clear about the problems solved by the new checks.

If there's a V3 anyway, apparent typo:

s/Node code/No code/

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html