BUG during send, cannot delete subvolume
Hi All,

I had a ctree.c error during a send/receive backup:

kernel BUG at fs/btrfs/ctree.c:1862

Nothing else seemed to go wrong on the file system. After I restarted the send, it completed, but I'm left with a subvolume I can't delete:

BTRFS warning (device sdb1): Attempt to delete subvolume 176188 during send

I don't see any zombie btrfs send processes lying around. Is there any way to delete this subvolume? Do I just need a reboot?

-Matt
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs-transacti hammering the system
Well, it's at zero now...

# btrfs fi df /export/
Data, single: total=30.45TiB, used=30.25TiB
System, DUP: total=32.00MiB, used=3.62MiB
Metadata, DUP: total=66.50GiB, used=65.16GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

On 01/12/17 16:47, Duncan wrote:
> Hans van Kranenburg posted on Fri, 01 Dec 2017 18:06:23 +0100 as excerpted:
>
>> On 12/01/2017 05:31 PM, Matt McKinnon wrote:
>>> Sorry, I missed your in-line reply:
>>>
>>>> 2) How big is this filesystem? What does your `btrfs fi df /mountpoint` say?
>>>
>>> # btrfs fi df /export/
>>> Data, single: total=30.45TiB, used=30.25TiB
>>> System, DUP: total=32.00MiB, used=3.62MiB
>>> Metadata, DUP: total=66.50GiB, used=65.08GiB
>>> GlobalReserve, single: total=512.00MiB, used=53.69MiB
>>
>> Multi-TiB filesystem, check. total/used ratio looks healthy.
>
> Not so healthy, from here. Data/metadata are healthy, yes, but...
>
> Any usage at all of global reserve is a red flag indicating that something in the filesystem thinks, or thought when it resorted to global reserve, that space is running out.
>
> Global reserve usage doesn't really hint what the problem is, but it's definitely a red flag that there /is/ a problem, and it's easily overlooked, as it apparently was here.
>
> It's likely indication of a bug, possibly one of the ones fixed right around 4.12/4.13. I'll let the devs and better experts take it from there, but I'd certainly be worried until global reserve drops to zero usage.
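The check Duncan describes (any non-zero GlobalReserve usage is worth a look) is easy to automate. Below is a minimal Python sketch, assuming the `btrfs fi df` output format quoted in this thread; the names `parse_fi_df` and `global_reserve_used` are hypothetical helpers, not part of btrfs-progs.

```python
import re

# Binary size suffixes as they appear in the `btrfs fi df` output quoted above.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3,
          "TiB": 1024**4, "PiB": 1024**5}

# Matches e.g. "GlobalReserve, single: total=512.00MiB, used=53.69MiB".
_LINE = re.compile(
    r"^(?P<kind>\w+),\s*(?P<profile>\w+):\s*"
    r"total=(?P<total>[\d.]+)(?P<tunit>(?:[KMGTP]i)?B),\s*"
    r"used=(?P<used>[\d.]+)(?P<uunit>(?:[KMGTP]i)?B)\s*$"
)


def parse_fi_df(text):
    """Return a list of (kind, profile, total_bytes, used_bytes) tuples."""
    rows = []
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if not m:
            continue  # skip the shell prompt line and anything unrecognised
        total = float(m.group("total")) * _UNITS[m.group("tunit")]
        used = float(m.group("used")) * _UNITS[m.group("uunit")]
        rows.append((m.group("kind"), m.group("profile"), int(total), int(used)))
    return rows


def global_reserve_used(text):
    """Return GlobalReserve usage in bytes, or None if the line is absent."""
    for kind, _profile, _total, used in parse_fi_df(text):
        if kind == "GlobalReserve":
            return used
    return None
```

Fed the two outputs from this thread, `global_reserve_used` returns a non-zero byte count for the earlier one (used=53.69MiB) and 0 for the later one (used=0.00B). The sizes are rounded by `btrfs fi df`, so the byte counts are approximate.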
Re: btrfs-transacti hammering the system
Right. The file system is 48T, with 17T available, so we're not quite pushing it yet.

So far so good on the space_cache=v2 mount. I'm surprised this isn't on the gotcha page in the wiki; it may end up making a world of difference to the users here.

Thanks again,
Matt

On 01/12/17 13:24, Hans van Kranenburg wrote:
> On 12/01/2017 06:57 PM, Holger Hoffstätte wrote:
>> On 12/01/17 18:34, Matt McKinnon wrote:
>>> Thanks, I'll give space_cache=v2 a shot.
>>
>> Yes, very much recommended.
>>
>>> My mount options are: rw,relatime,space_cache,autodefrag,subvolid=5,subvol=/
>>
>> Turn autodefrag off and use noatime instead of relatime.
>>
>> Your filesystem also seems very full,
>
> We don't know. btrfs fi df only displays allocated space. And that being full is good, it means not too much free space fragments everywhere.
>
>> that's bad with every filesystem but *especially* with btrfs because the allocator has to work really hard to find free space for COWing. Really consider deleting stuff or adding more space.
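The option changes suggested in this exchange (drop autodefrag, use noatime instead of relatime, switch to space_cache=v2) can be applied mechanically to an option string. A small Python sketch follows; it only rewrites the string and does not remount anything, and `recommended_options` is a hypothetical helper name chosen for illustration.

```python
def recommended_options(current):
    """Rewrite a comma-separated btrfs mount option string per the advice above."""
    out = []
    for opt in current.split(","):
        if opt == "autodefrag":
            continue                    # "Turn autodefrag off"
        if opt == "relatime":
            opt = "noatime"             # "use noatime instead of relatime"
        elif opt == "space_cache" or opt.startswith("space_cache="):
            opt = "space_cache=v2"      # free space cache v1 -> v2
        out.append(opt)
    return ",".join(out)


# The option string Matt posted in this thread.
print(recommended_options(
    "rw,relatime,space_cache,autodefrag,subvolid=5,subvol=/"))
# prints: rw,noatime,space_cache=v2,subvolid=5,subvol=/
```

Whether each change is right for a given system is a separate question; Hans's reply above already disputes the "very full" part of the advice.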
Re: btrfs-transacti hammering the system
Thanks, I'll give space_cache=v2 a shot.

My mount options are: rw,relatime,space_cache,autodefrag,subvolid=5,subvol=/
Re: btrfs-transacti hammering the system
Sorry, I missed your in-line reply:

> 1) The one right above, btrfs_write_out_cache, is the write-out of the free space cache v1. Do you see this for multiple seconds going on, and does it match the time when it's writing X MB/s to disk?

It seems to only last until the next watch update.

[] io_schedule+0x16/0x40
[] get_request+0x23e/0x720
[] blk_queue_bio+0xc1/0x3a0
[] generic_make_request+0xf8/0x2a0
[] submit_bio+0x75/0x150
[] btrfs_map_bio+0xe5/0x2f0 [btrfs]
[] btree_submit_bio_hook+0x8c/0xe0 [btrfs]
[] submit_one_bio+0x63/0xa0 [btrfs]
[] flush_epd_write_bio+0x3b/0x50 [btrfs]
[] flush_write_bio+0xe/0x10 [btrfs]
[] btree_write_cache_pages+0x379/0x450 [btrfs]
[] btree_writepages+0x5d/0x70 [btrfs]
[] do_writepages+0x1c/0x70
[] __filemap_fdatawrite_range+0xaa/0xe0
[] filemap_fdatawrite_range+0x13/0x20
[] btrfs_write_marked_extents+0xe9/0x110 [btrfs]
[] btrfs_write_and_wait_transaction.isra.22+0x3d/0x80 [btrfs]
[] btrfs_commit_transaction+0x665/0x900 [btrfs]
[] transaction_kthread+0x18a/0x1c0 [btrfs]
[] kthread+0x109/0x140
[] ret_from_fork+0x25/0x30

The last three lines will stick around for a while. Is switching to space cache v2 something that everyone should be doing? Something that would be a good test at least?

> 2) How big is this filesystem? What does your `btrfs fi df /mountpoint` say?

# btrfs fi df /export/
Data, single: total=30.45TiB, used=30.25TiB
System, DUP: total=32.00MiB, used=3.62MiB
Metadata, DUP: total=66.50GiB, used=65.08GiB
GlobalReserve, single: total=512.00MiB, used=53.69MiB

> 3) What kind of workload are you running? E.g. how can you describe it within a range from "big files which just sit there" to "small writes and deletes all over the place all the time"?

It's a pretty light workload most of the time. It's a file system that exports two NFS shares to a small lab group. I believe it is more small reads all over a large file (MRI imaging) rather than small writes.

> 4) What kernel version is this? `uname -a` output?

# uname -a
Linux machine_name 4.12.8-custom #1 SMP Tue Aug 22 10:15:01 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Re: btrfs-transacti hammering the system
These seem to come up most often:

[] transaction_kthread+0x133/0x1c0 [btrfs]
[] kthread+0x109/0x140
[] ret_from_fork+0x25/0x30
Re: btrfs-transacti hammering the system
Thanks for this. Here's what I get:

[] transaction_kthread+0x133/0x1c0 [btrfs]
[] kthread+0x109/0x140
[] ret_from_fork+0x25/0x30

...

[] io_schedule+0x16/0x40
[] get_request+0x23e/0x720
[] blk_queue_bio+0xc1/0x3a0
[] generic_make_request+0xf8/0x2a0
[] submit_bio+0x75/0x150
[] btrfs_map_bio+0xe5/0x2f0 [btrfs]
[] btree_submit_bio_hook+0x8c/0xe0 [btrfs]
[] submit_one_bio+0x63/0xa0 [btrfs]
[] flush_epd_write_bio+0x3b/0x50 [btrfs]
[] flush_write_bio+0xe/0x10 [btrfs]
[] btree_write_cache_pages+0x379/0x450 [btrfs]
[] btree_writepages+0x5d/0x70 [btrfs]
[] do_writepages+0x1c/0x70
[] __filemap_fdatawrite_range+0xaa/0xe0
[] filemap_fdatawrite_range+0x13/0x20
[] btrfs_write_marked_extents+0xe9/0x110 [btrfs]
[] btrfs_write_and_wait_transaction.isra.22+0x3d/0x80 [btrfs]
[] btrfs_commit_transaction+0x665/0x900 [btrfs]

...

[] io_schedule+0x16/0x40
[] wait_on_page_bit+0xe8/0x120
[] read_extent_buffer_pages+0x1cd/0x2e0 [btrfs]
[] btree_read_extent_buffer_pages+0x9f/0x100 [btrfs]
[] read_tree_block+0x32/0x50 [btrfs]
[] read_block_for_search.isra.32+0x120/0x2e0 [btrfs]
[] btrfs_next_old_leaf+0x215/0x400 [btrfs]
[] btrfs_next_leaf+0x10/0x20 [btrfs]
[] btrfs_lookup_csums_range+0x12e/0x410 [btrfs]
[] csum_exist_in_range.isra.49+0x2a/0x81 [btrfs]
[] run_delalloc_nocow+0x9b2/0xa10 [btrfs]
[] run_delalloc_range+0x68/0x340 [btrfs]
[] writepage_delalloc.isra.47+0xf0/0x140 [btrfs]
[] __extent_writepage+0xc7/0x290 [btrfs]
[] extent_write_cache_pages.constprop.53+0x2b5/0x450 [btrfs]
[] extent_writepages+0x4d/0x70 [btrfs]
[] btrfs_writepages+0x28/0x30 [btrfs]
[] do_writepages+0x1c/0x70
[] __filemap_fdatawrite_range+0xaa/0xe0
[] filemap_fdatawrite_range+0x13/0x20
[] btrfs_fdatawrite_range+0x20/0x50 [btrfs]
[] __btrfs_write_out_cache+0x3d9/0x420 [btrfs]
[] btrfs_write_out_cache+0x86/0x100 [btrfs]
[] btrfs_write_dirty_block_groups+0x261/0x390 [btrfs]
[] commit_cowonly_roots+0x1fb/0x290 [btrfs]
[] btrfs_commit_transaction+0x434/0x900 [btrfs]

...

[] tree_search_offset.isra.23+0x37/0x1d0 [btrfs]
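Eyeballing which trace "comes up most often" can be replaced by sampling and counting. The Python sketch below assumes the traces above were read from /proc/<pid>/stack of the btrfs-transacti kernel thread (the thread does not say how they were captured); reading that file normally needs root and a kernel that exposes it. All function names here are hypothetical.

```python
import os
import re
import time
from collections import Counter

# A frame looks like "[<address>] func+0x12/0x34 [module]". In the traces
# above the address was stripped, leaving "[] func+0x12/0x34 [module]".
_FRAME = re.compile(
    r"^\s*(?:\[[^\]]*\]\s+)?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def top_frame(stack_text):
    """Return the innermost function name of one stack sample, or None."""
    for line in stack_text.splitlines():
        m = _FRAME.match(line)
        if m:
            return m.group("func")
    return None


def tally(samples):
    """Count how often each innermost function appears across samples."""
    counts = Counter()
    for sample in samples:
        func = top_frame(sample)
        if func is not None:
            counts[func] += 1
    return counts


def find_pids(comm="btrfs-transacti"):
    """Return PIDs whose command name matches (one thread per mounted fs)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == comm:
                    pids.append(int(entry))
        except OSError:
            continue  # the process went away while we were scanning
    return pids


def sample_stacks(pid, count=60, interval=1.0):
    """Read /proc/<pid>/stack `count` times, `interval` seconds apart."""
    samples = []
    for _ in range(count):
        with open(f"/proc/{pid}/stack") as f:
            samples.append(f.read())
        time.sleep(interval)
    return samples
```

A typical use would be `tally(sample_stacks(pid)).most_common(5)` for each PID returned by `find_pids()`. Only the innermost frame is counted here; counting whole traces instead would separate the two different `btrfs_commit_transaction` paths seen above.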
btrfs-transacti hammering the system
Hi All,

Is there any way to figure out what exactly btrfs-transacti is chugging on? I have a few file systems that seem to get wedged for days on end with this process pegged around 100%. I've stopped all snapshots, made sure no quotas were enabled, turned on autodefrag in the mount options, and tried manual defragging and kernel upgrades, yet this still brings my system to a crawl.

Network I/O to the system seems very tiny. The only I/O I see to the disk is btrfs-transacti writing a couple of MB/s.

# time touch foo

real    2m54.303s
user    0m0.000s
sys     0m0.002s

# uname -r
4.12.8-custom

# btrfs --version
btrfs-progs v4.13.3

Yes, I know I'm a bit behind there...

-Matt
kernel BUG at fs/btrfs/ctree.c:3182
Hi All, Been having issues on one machine and I was wondering if I could get some help tracking the issue down. # uname -a Linux riperton 4.13.5-custom #1 SMP Sat Oct 7 18:28:16 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux # btrfs --version btrfs-progs v4.13.3 # btrfs fi show Label: none uuid: 8133a362-8e41-4da4-b607-a27832861157 Total devices 1 FS bytes used 41.64TiB devid1 size 50.93TiB used 41.88TiB path /dev/sda1 # btrfs fi df /export/ Data, single: total=41.70TiB, used=41.57TiB System, DUP: total=64.00MiB, used=4.56MiB Metadata, DUP: total=90.00GiB, used=72.30GiB Metadata, single: total=1.53GiB, used=0.00B GlobalReserve, single: total=512.00MiB, used=0.00B [617994.948036] [ cut here ] [617994.948040] kernel BUG at fs/btrfs/ctree.c:3182! [617994.952786] invalid opcode: [#1] SMP [617994.956896] Modules linked in: ipmi_devintf xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables intel_ra pl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel btrfs pcbc aesni_intel aes_ x86_64 crypto_simd glue_helper cryptd dm_multipath joydev lpc_ich mei_me mei nfsd ioatdma auth_rpcgss nfs_acl ipmi_si wmi nfs ipmi_msghandler lockd grace sunrp c fscache shpchp mac_hid lp parport ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov hid_generic async_memcpy async_pq usbhid async_xor hid as ync_tx xor igb raid6_pq libcrc32c i2c_algo_bit raid1 ahci dca raid0 libahci ptp megaraid_sas multipath pps_core linear dm_mirror dm_region_hash dm_log [617995.025316] CPU: 1 PID: 3191 Comm: nfsd Tainted: GW 4.13.5-custom #1 [617995.032965] Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014 [617995.042092] task: 996bac7d5a00 task.stack: bb7984b74000 [617995.048134] RIP: 0010:btrfs_set_item_key_safe+0x14e/0x160 [btrfs] [617995.054310] RSP: 0018:bb7984b77658 EFLAGS: 00010246 [617995.059622] RAX: RBX: 0037 RCX: 00018000 
[617995.066834] RDX: RSI: bb7984b7776e RDI: bb7984b77677 [617995.074051] RBP: bb7984b776b0 R08: bb7984b77677 R09: [617995.081263] R10: R11: 0003 R12: bb7984b77666 [617995.088483] R13: 99679cc00460 R14: bb7984b7776e R15: 9966184867a8 [617995.095705] FS: () GS:9967afc8() knlGS: [617995.103876] CS: 0010 DS: ES: CR0: 80050033 [617995.109707] CR2: 7fdbaad6 CR3: 00071fe09000 CR4: 001406e0 [617995.116921] Call Trace: [617995.119493] __btrfs_drop_extents+0x50c/0xdd0 [btrfs] [617995.124663] ? btrfs_encode_fh+0xd0/0xd0 [btrfs] [617995.129390] btrfs_log_changed_extents+0x31b/0x640 [btrfs] [617995.134990] ? free_extent_buffer+0x4b/0x90 [btrfs] [617995.139976] btrfs_log_inode+0x8de/0xb90 [btrfs] [617995.144686] ? dput+0xf1/0x1d0 [617995.147847] btrfs_log_inode_parent+0x21a/0x960 [btrfs] [617995.153164] ? kmem_cache_alloc+0x194/0x1a0 [617995.157459] ? start_transaction+0x120/0x440 [btrfs] [617995.162528] btrfs_log_dentry_safe+0x69/0x90 [btrfs] [617995.167599] btrfs_sync_file+0x2ab/0x3e0 [btrfs] [617995.172309] vfs_fsync_range+0x3d/0xb0 [617995.176168] btrfs_file_write_iter+0x45b/0x560 [btrfs] [617995.181396] do_iter_readv_writev+0xe2/0x130 [617995.185753] do_iter_write+0x7f/0x190 [617995.189506] vfs_iter_write+0x19/0x30 [617995.193271] nfsd_vfs_write+0xb1/0x310 [nfsd] [617995.197719] nfsd_write+0x134/0x1e0 [nfsd] [617995.201908] nfsd3_proc_write+0x92/0x110 [nfsd] [617995.206533] nfsd_dispatch+0xb9/0x250 [nfsd] [617995.210915] svc_process_common+0x36e/0x6f0 [sunrpc] [617995.215979] svc_process+0xfc/0x1c0 [sunrpc] [617995.220339] nfsd+0xe9/0x160 [nfsd] [617995.223918] kthread+0x109/0x140 [617995.227238] ? nfsd_destroy+0x60/0x60 [nfsd] [617995.231591] ? 
kthread_park+0x60/0x60 [617995.235348] ret_from_fork+0x25/0x30 [617995.239010] Code: 48 8b 45 bf 48 8d 7d c7 4c 89 f6 48 89 45 d0 0f b6 45 be 88 45 cf 48 8b 45 b6 48 89 45 c7 e8 aa f3 ff ff 85 c0 0f 8f 55 ff ff ff <0f> 0b 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 [617995.257983] RIP: btrfs_set_item_key_safe+0x14e/0x160 [btrfs] RSP: bb7984b77658 [617995.265696] ---[ end trace 41d8bb716a419cdd ]--- And after a reboot we come up with this warning: [ 112.712899] [ cut here ] [ 112.712943] WARNING: CPU: 5 PID: 505 at fs/btrfs/file.c:547 btrfs_drop_extent_cache+0x3c5/0x3d0 [btrfs] [ 112.712944] Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel xt_tcpudp kvm nf_conntrack_ipv4 nf_defrag_ipv4 irqbypass xt_conntrack crct10dif_pclmul nf_conntrack crc32_pclmul ghash_clmulni_intel pcbc iptable_filter ip_tables aesni_intel x_tables aes_x86_64 crypto_simd glue_helper cryptd dm_multipath
Re: Struggling with file system slowness
Those snapshots were created using Marc Merlin's script (thanks, Marc). They don't do anything except sit around on the file system for a week or so and then are removed. I'm now doing quarter-hourly snaps instead of nightly since I have nightly backups of the filesystem going off-site. So far the btrfs-transaction and memory spikes have not returned.

-Matt

On 05/09/2017 03:14 PM, Liu Bo wrote:
> On Fri, May 05, 2017 at 09:24:32AM -0400, Matt McKinnon wrote:
>>> Too little information. Is IO happening at the same time? Is compression on? Deduplicated? Lots of subvolumes? SSD? What kind of workload and file size/distribution profile?
>>
>> Only write IO during the load spikes. No compression, no deduplication. 12 volumes (including snapshots). Spinning disks. Medium workload; file sizes are all over the map since this holds about 30 user home directories.
>>
>> Interestingly enough, the problems which had persisted for many weeks went away when all snapshots were removed. btrfs-transaction spikes disappeared. Memory usage went from 30G to under 2G.
>
> Did those snapshots serve as backups? Could you please elaborate on how you create snapshots? We could probably hammer out a testcase to improve the situation.
>
> Thanks,
>
> -liubo
Struggling with file system slowness
Hi All,

I'm trying to pin down why one server has btrfs-transacti pegged at 100% CPU most of the time. I thought this might have to do with fragmentation, as mentioned on the Gotchas page in the wiki (though btrfs-endio-wri, which the wiki mentions, doesn't seem to be involved), but after running a full defrag of the file system and enabling the 'autodefrag' mount option, the problem persists.

What's the best way to figure out what btrfs is chugging away at here?

Kernel: 4.10.13-custom
btrfs-progs: v4.10.2

-Matt
Hard crash on 4.9.5, part 2
I'm hitting an error on this file system that I've seen in the distant past, where the mount fails with a "file exists" error. Running a btrfs check gives the following over and over again:

Found file extent holes:
    start: 0, len: 290816
root 257 inode 28472371 errors 1000, some csum missing
root 257 inode 28472416 errors 1000, some csum missing
root 257 inode 9182183 errors 1000, some csum missing
root 257 inode 9182186 errors 1000, some csum missing
root 257 inode 28419536 errors 1100, file extent discount, some csum missing
Found file extent holes:
    start: 0, len: 290816
root 257 inode 28472371 errors 1000, some csum missing
root 257 inode 28472416 errors 1000, some csum missing
root 257 inode 9182183 errors 1000, some csum missing
root 257 inode 9182186 errors 1000, some csum missing
root 257 inode 28419536 errors 1100, file extent discount, some csum missing

Are these reported once per subvolume snapshot, and will they eventually end?

Here is the crash after the mount (with recovery/usebackuproot):

[ 627.233213] BTRFS warning (device sda1): 'recovery' is deprecated, use 'usebackuproot' instead
[ 627.233216] BTRFS info (device sda1): trying to use backup root at mount time
[ 627.233218] BTRFS info (device sda1): disk space caching is enabled
[ 627.233220] BTRFS info (device sda1): has skinny extents
[ 709.234688] [ cut here ]
[ 709.234734] WARNING: CPU: 5 PID: 3468 at fs/btrfs/file.c:546 btrfs_drop_extent_cache+0x3e8/0x400 [btrfs]
[ 709.234735] Modules linked in: ipmi_devintf nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache lp parport intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel xt_tcpudp kvm nf_conntrack_ipv4 nf_defrag_ipv4 irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel xt_conntrack aesni_intel btrfs nf_conntrack aes_x86_64 lrw gf128mul iptable_filter glue_helper ip_tables ablk_helper cryptd x_tables dm_multipath joydev mei_me ioatdma mei lpc_ich wmi ipmi_si ipmi_msghandler shpchp mac_hid ses enclosure
scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic megarai d_sas raid6_pq ahci libcrc32c libahci igb usbhid raid1 hid i2c_algo_bit raid0 dca ptp multipath pps_core linear dm_mirror dm_region_ hash dm_log [ 709.234812] CPU: 5 PID: 3468 Comm: mount Not tainted 4.9.5-custom #1 [ 709.234813] Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014 [ 709.234816] bd3784bb7568 8e3c8e7c [ 709.234820] bd3784bb75a8 8e07d3d1 02220070 9e5f0ae4d150 [ 709.234823] 0002d000 9e5f0bc91f78 9e5f0bc91da8 0002c000 [ 709.234827] Call Trace: [ 709.234837] [] dump_stack+0x63/0x87 [ 709.234846] [] __warn+0xd1/0xf0 [ 709.234850] [] warn_slowpath_null+0x1d/0x20 [ 709.234874] [] btrfs_drop_extent_cache+0x3e8/0x400 [btrfs] [ 709.234895] [] __btrfs_drop_extents+0x5b2/0xd30 [btrfs] [ 709.234914] [] ? generic_bin_search.constprop.36+0x8b/0x1e0 [btrfs] [ 709.234931] [] ? btrfs_set_path_blocking+0x36/0x70 [btrfs] [ 709.234942] [] ? kmem_cache_alloc+0x194/0x1a0 [ 709.234958] [] ? btrfs_alloc_path+0x1a/0x20 [btrfs] [ 709.234977] [] btrfs_drop_extents+0x79/0xa0 [btrfs] [ 709.235002] [] replay_one_extent+0x414/0x7b0 [btrfs] [ 709.235007] [] ? autoremove_wake_function+0x40/0x40 [ 709.235030] [] replay_one_buffer+0x4cc/0x7c0 [btrfs] [ 709.235053] [] ? mark_extent_buffer_accessed+0x4f/0x70 [btrfs] [ 709.235074] [] walk_down_log_tree+0x1ba/0x3b0 [btrfs] [ 709.235094] [] walk_log_tree+0xb4/0x1a0 [btrfs] [ 709.235114] [] btrfs_recover_log_trees+0x20e/0x460 [btrfs] [ 709.235133] [] ? replay_one_extent+0x7b0/0x7b0 [btrfs] [ 709.235154] [] open_ctree+0x2640/0x27f0 [btrfs] [ 709.235171] [] btrfs_mount+0xca4/0xec0 [btrfs] [ 709.235176] [] ? find_next_zero_bit+0x1e/0x20 [ 709.235180] [] ? pcpu_next_unpop+0x3e/0x50 [ 709.235184] [] ? find_next_bit+0x19/0x20 [ 709.235190] [] mount_fs+0x39/0x160 [ 709.235193] [] ? 
__alloc_percpu+0x15/0x20 [ 709.235196] [] vfs_kern_mount+0x67/0x110 [ 709.235213] [] btrfs_mount+0x18b/0xec0 [btrfs] [ 709.235216] [] ? find_next_zero_bit+0x1e/0x20 [ 709.235220] [] mount_fs+0x39/0x160 [ 709.235223] [] ? __alloc_percpu+0x15/0x20 [ 709.235225] [] vfs_kern_mount+0x67/0x110 [ 709.235228] [] do_mount+0x1bb/0xc80 [ 709.235232] [] ? kmem_cache_alloc_trace+0x14b/0x1b0 [ 709.235235] [] SyS_mount+0x83/0xd0 [ 709.235240] [] entry_SYSCALL_64_fastpath+0x1e/0xad [ 709.235243] ---[ end trace d4e5dcddb432b7d3 ]--- [ 709.354972] BTRFS: error (device sda1) in btrfs_replay_log:2506: errno=-17 Object already exists (Failed to recover log tree) [ 709.355570] BTRFS error (device sda1): cleaner transaction attach returned -30 [ 709.548919] BTRFS error (device sda1): open_ctree failed -Matt -- To unsubscribe from this list: send the line "unsubscribe
Re: Hard crash on 4.9.5
This same file system (which crashed again with the same errors) is also giving this output during a metadata or data balance: Jan 27 19:42:47 my_machine kernel: [ 335.018123] BTRFS info (device sda1): no csum found for inode 28472371 start 2191360 Jan 27 19:42:47 my_machine kernel: [ 335.018128] BTRFS info (device sda1): no csum found for inode 28472371 start 2195456 Jan 27 19:42:47 my_machine kernel: [ 335.018491] BTRFS info (device sda1): no csum found for inode 28472371 start 4018176 Jan 27 19:42:47 my_machine kernel: [ 335.018496] BTRFS info (device sda1): no csum found for inode 28472371 start 4022272 Jan 27 19:42:47 my_machine kernel: [ 335.018499] BTRFS info (device sda1): no csum found for inode 28472371 start 4026368 Jan 27 19:42:47 my_machine kernel: [ 335.018502] BTRFS info (device sda1): no csum found for inode 28472371 start 4030464 Jan 27 19:42:47 my_machine kernel: [ 335.019443] BTRFS info (device sda1): no csum found for inode 28472371 start 6156288 Jan 27 19:42:47 my_machine kernel: [ 335.019688] BTRFS info (device sda1): no csum found for inode 28472371 start 7933952 Jan 27 19:42:47 my_machine kernel: [ 335.019693] BTRFS info (device sda1): no csum found for inode 28472371 start 7938048 Jan 27 19:42:47 my_machine kernel: [ 335.019754] BTRFS info (device sda1): no csum found for inode 28472371 start 8077312 Jan 27 19:42:47 my_machine kernel: [ 335.025485] BTRFS warning (device sda1): csum failed ino 28472371 off 2191360 csum 4031061501 expected csum 0 Jan 27 19:42:47 my_machine kernel: [ 335.025490] BTRFS warning (device sda1): csum failed ino 28472371 off 2195456 csum 2371784003 expected csum 0 Jan 27 19:42:47 my_machine kernel: [ 335.025526] BTRFS warning (device sda1): csum failed ino 28472371 off 4018176 csum 3812080098 expected csum 0 Jan 27 19:42:47 my_machine kernel: [ 335.025531] BTRFS warning (device sda1): csum failed ino 28472371 off 4022272 csum 2776681411 expected csum 0 Jan 27 19:42:47 my_machine kernel: [ 335.025534] BTRFS warning 
(device sda1): csum failed ino 28472371 off 4026368 csum 1179241675 expected csum 0 Jan 27 19:42:47 my_machine kernel: [ 335.025540] BTRFS warning (device sda1): csum failed ino 28472371 off 4030464 csum 1256914217 expected csum 0 Jan 27 19:42:47 my_machine kernel: [ 335.026142] BTRFS warning (device sda1): csum failed ino 28472371 off 7933952 csum 2695958066 expected csum 0 Jan 27 19:42:47 my_machine kernel: [ 335.026147] BTRFS warning (device sda1): csum failed ino 28472371 off 7938048 csum 3260800596 expected csum 0 Jan 27 19:42:47 my_machine kernel: [ 335.026934] BTRFS warning (device sda1): csum failed ino 28472371 off 6156288 csum 4293116449 expected csum 0 Jan 27 19:42:47 my_machine kernel: [ 335.033249] BTRFS warning (device sda1): csum failed ino 28472371 off 8077312 csum 4031878292 expected csum 0 Can these be ignored? On 01/25/2017 04:06 PM, Liu Bo wrote: On Mon, Jan 23, 2017 at 03:03:55PM -0500, Matt McKinnon wrote: Wondering what to do about this error which says 'reboot needed'. Has happened a three times in the past week: Well, I don't think btrfs's logic here is wrong, the following stack shows that a nfs client has sent a second unlink against the same inode while somehow the inode was not fully deleted by the first unlink. So it'd be good that you could add some debugging information to get us further. Thanks, -liubo Jan 23 14:16:17 my_machine kernel: [ 2568.595648] BTRFS error (device sda1): err add delayed dir index item(index: 23810) into the deletion tree of the delayed node(root id: 257, inode id: 2661433, errno: -17) Jan 23 14:16:17 my_machine kernel: [ 2568.611010] [ cut here ] Jan 23 14:16:17 my_machine kernel: [ 2568.615628] kernel BUG at fs/btrfs/delayed-inode.c:1557! 
Jan 23 14:16:17 my_machine kernel: [ 2568.620942] invalid opcode: [#1] SMP Jan 23 14:16:17 my_machine kernel: [ 2568.624960] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs ipt_REJECT nf_rej ect_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf nfsd au th_rpcgss nfs_acl nfs lockd grace sunrpc fscache intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_int el kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper crypt d dm_multipath joydev mei_me mei lpc_ich ioatdma wmi ipmi_si ipmi_msghandler btrfs shpchp mac_hid lp parport ses enclosure scsi_tran sport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c igb hid_generic i2c_algo_ bit raid1 dca usbhid ahci raid0 ptp megaraid_sas multipath Jan 23 14:16:17 my_machine kernel: [ 2568.697150] hid libahci pps_core linear dm_mirror dm_region_hash dm_log Jan 23 14:16:17 my_machine kernel: [ 2568.702689] CPU: 0 PID: 2440 Comm: nfsd Tainted: GW 4.9.5-custom #1 Jan 23 14:16:17
Hard crash on 4.9.5
Wondering what to do about this error which says 'reboot needed'. Has happened a three times in the past week: Jan 23 14:16:17 my_machine kernel: [ 2568.595648] BTRFS error (device sda1): err add delayed dir index item(index: 23810) into the deletion tree of the delayed node(root id: 257, inode id: 2661433, errno: -17) Jan 23 14:16:17 my_machine kernel: [ 2568.611010] [ cut here ] Jan 23 14:16:17 my_machine kernel: [ 2568.615628] kernel BUG at fs/btrfs/delayed-inode.c:1557! Jan 23 14:16:17 my_machine kernel: [ 2568.620942] invalid opcode: [#1] SMP Jan 23 14:16:17 my_machine kernel: [ 2568.624960] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs ipt_REJECT nf_rej ect_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf nfsd au th_rpcgss nfs_acl nfs lockd grace sunrpc fscache intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_int el kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper crypt d dm_multipath joydev mei_me mei lpc_ich ioatdma wmi ipmi_si ipmi_msghandler btrfs shpchp mac_hid lp parport ses enclosure scsi_tran sport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c igb hid_generic i2c_algo_ bit raid1 dca usbhid ahci raid0 ptp megaraid_sas multipath Jan 23 14:16:17 my_machine kernel: [ 2568.697150] hid libahci pps_core linear dm_mirror dm_region_hash dm_log Jan 23 14:16:17 my_machine kernel: [ 2568.702689] CPU: 0 PID: 2440 Comm: nfsd Tainted: GW 4.9.5-custom #1 Jan 23 14:16:17 my_machine kernel: [ 2568.710166] Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28 /2014 Jan 23 14:16:17 my_machine kernel: [ 2568.719207] task: 95a42addab80 task.stack: b9da8533 Jan 23 14:16:17 my_machine kernel: [ 2568.725124] RIP: 0010:[] [] btrfs_delete_delayed_dir_inde x+0x286/0x290 [btrfs] Jan 23 14:16:17 
my_machine kernel: [ 2568.735604] RSP: 0018:b9da85333be0 EFLAGS: 00010286 Jan 23 14:16:17 my_machine kernel: [ 2568.740917] RAX: RBX: 95a3b104b690 RCX: Jan 23 14:16:17 my_machine kernel: [ 2568.748048] RDX: 0001 RSI: 95a42fc0dcc8 RDI: 95a42fc0dcc8 Jan 23 14:16:17 my_machine kernel: [ 2568.755171] RBP: b9da85333c48 R08: 0491 R09: Jan 23 14:16:17 my_machine kernel: [ 2568.762297] R10: 0005 R11: 0006 R12: 95a3b104b6d8 Jan 23 14:16:17 my_machine kernel: [ 2568.769429] R13: 5d02 R14: 95a82953d800 R15: ffef Jan 23 14:16:17 my_machine kernel: [ 2568.776555] FS: () GS:95a42fc0() knlGS: Jan 23 14:16:17 my_machine kernel: [ 2568.784639] CS: 0010 DS: ES: CR0: 80050033 Jan 23 14:16:17 my_machine kernel: [ 2568.790377] CR2: 7f12ea376000 CR3: 0003e1e07000 CR4: 001406f0 Jan 23 14:16:17 my_machine kernel: [ 2568.797503] Stack: Jan 23 14:16:17 my_machine kernel: [ 2568.799524] 9b7fe5f2 95a3b104b560 0004 95a3f96b3e80 Jan 23 14:16:17 my_machine kernel: [ 2568.806983] 95a3f96b3e80 39ff95a814eeeb68 6000289c 5d02 Jan 23 14:16:17 my_machine kernel: [ 2568.814436] 95a3f7457c40 95a3bcb74138 95a814eeeb68 00289c39 Jan 23 14:16:17 my_machine kernel: [ 2568.821891] Call Trace: Jan 23 14:16:17 my_machine kernel: [ 2568.824343] [] ? mutex_lock+0x12/0x2f Jan 23 14:16:17 my_machine kernel: [ 2568.829671] [] __btrfs_unlink_inode+0x198/0x4c0 [btrfs] Jan 23 14:16:17 my_machine kernel: [ 2568.836555] [] btrfs_unlink_inode+0x1c/0x40 [btrfs] Jan 23 14:16:17 my_machine kernel: [ 2568.843086] [] btrfs_unlink+0x6b/0xb0 [btrfs] Jan 23 14:16:17 my_machine kernel: [ 2568.849091] [] vfs_unlink+0xda/0x190 Jan 23 14:16:17 my_machine kernel: [ 2568.854315] [] ? 
lookup_one_len+0xd3/0x130 Jan 23 14:16:17 my_machine kernel: [ 2568.860075] [] nfsd_unlink+0x16e/0x210 [nfsd] Jan 23 14:16:17 my_machine kernel: [ 2568.866084] [] nfsd3_proc_remove+0x7c/0x110 [nfsd] Jan 23 14:16:17 my_machine kernel: [ 2568.872529] [] nfsd_dispatch+0xb8/0x1f0 [nfsd] Jan 23 14:16:17 my_machine kernel: [ 2568.878641] [] svc_process_common+0x43f/0x700 [sunrpc] Jan 23 14:16:17 my_machine kernel: [ 2568.885432] [] svc_process+0xfc/0x1c0 [sunrpc] Jan 23 14:16:17 my_machine kernel: [ 2568.891528] [] nfsd+0xf0/0x160 [nfsd] Jan 23 14:16:17 my_machine kernel: [ 2568.896838] [] ? nfsd_destroy+0x60/0x60 [nfsd] Jan 23 14:16:17 my_machine kernel: [ 2568.902931] [] kthread+0xca/0xe0 Jan 23 14:16:17 my_machine kernel: [ 2568.907807] [] ? kthread_park+0x60/0x60 Jan 23 14:16:17 my_machine kernel: [ 2568.913296] [] ret_from_fork+0x25/0x30 Jan 23 14:16:17 my_machine kernel: [ 2568.918693] Code: ff ff 48 8b 43 10 49 8b
kernel crash after upgrading to 4.9
Hi All,

I seem to have an issue similar to one from a thread in December:

Subject: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another

In my case, it is triggered by rsync'ing large amounts of data over NFS to the server with the BTRFS file system. It did not show up with the previous kernel (4.7).

The poster mentioned some suggestions from Duncan here:

https://mail-archive.com/linux-btrfs@vger.kernel.org/msg60083.html

But those are not visible in the thread. What suggestions were given to help alleviate this pain?

-Matt
Re: BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists
t here ] [ 79.922000] WARNING: CPU: 6 PID: 2632 at fs/btrfs/file.c:546 btrfs_drop_extent_cache+0x3e8/0x400 [btrfs] [ 79.922002] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel btrfs aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd dm_multipath nfsd auth_rpcgss joydev nfs_acl mei_me nfs lpc_ich mei lockd wmi grace ipmi_si sunrpc ipmi_msghandler fscache shpchp ioatdma mac_hid lp parport ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic igb raid6_pq i2c_algo_bit libcrc32c dca usbhid raid1 ahci raid0 ptp megaraid_sas multipath hid libahci pps_core linear dm_mirror dm_region_hash dm_log [ 79.922063] CPU: 6 PID: 2632 Comm: mount Not tainted 4.7.0-custom #1 [ 79.922065] Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014 [ 79.922067] 88046ca1f538 813b816c [ 79.922071] 88046ca1f578 8107a321 02226ca1f5e0 [ 79.922074] 880841d19460 e000 880841e21290 880841e210c0 [ 79.922077] Call Trace: [ 79.922089] [] dump_stack+0x63/0x87 [ 79.922096] [] __warn+0xd1/0xf0 [ 79.922099] [] warn_slowpath_null+0x1d/0x20 [ 79.922117] [] btrfs_drop_extent_cache+0x3e8/0x400 [btrfs] [ 79.922133] [] __btrfs_drop_extents+0x5b2/0xd30 [btrfs] [ 79.922147] [] ? generic_bin_search.constprop.36+0x85/0x190 [btrfs] [ 79.922160] [] ? btrfs_set_path_blocking+0x36/0x70 [btrfs] [ 79.922173] [] ? btrfs_search_slot+0x438/0x970 [btrfs] [ 79.922178] [] ? kmem_cache_alloc+0x1d6/0x1f0 [ 79.922190] [] ? btrfs_alloc_path+0x1a/0x20 [btrfs] [ 79.922205] [] btrfs_drop_extents+0x79/0xa0 [btrfs] [ 79.94] [] replay_one_extent+0x419/0x750 [btrfs] [ 79.922241] [] replay_one_buffer+0x4db/0x7d0 [btrfs] [ 79.922258] [] ? 
mark_extent_buffer_accessed+0x4f/0x70 [btrfs] [ 79.922274] [] walk_down_log_tree+0x1cc/0x3d0 [btrfs] [ 79.922289] [] walk_log_tree+0xba/0x1a0 [btrfs] [ 79.922304] [] btrfs_recover_log_trees+0x213/0x470 [btrfs] [ 79.922318] [] ? replay_one_extent+0x750/0x750 [btrfs] [ 79.922335] [] open_ctree+0x264d/0x2760 [btrfs] [ 79.922348] [] btrfs_mount+0xc94/0xeb0 [btrfs] [ 79.922353] [] ? find_next_zero_bit+0x1e/0x20 [ 79.922358] [] ? pcpu_next_unpop+0x3e/0x50 [ 79.922362] [] ? find_next_bit+0x19/0x20 [ 79.922368] [] mount_fs+0x39/0x160 [ 79.922371] [] ? __alloc_percpu+0x15/0x20 [ 79.922375] [] vfs_kern_mount+0x67/0x110 [ 79.922387] [] btrfs_mount+0x18b/0xeb0 [btrfs] [ 79.922390] [] ? find_next_zero_bit+0x1e/0x20 [ 79.922394] [] mount_fs+0x39/0x160 [ 79.922397] [] ? __alloc_percpu+0x15/0x20 [ 79.922399] [] vfs_kern_mount+0x67/0x110 [ 79.922402] [] do_mount+0x22a/0xd90 [ 79.922406] [] ? __kmalloc_track_caller+0x1af/0x250 [ 79.922408] [] ? strndup_user+0x41/0x80 [ 79.922411] [] ? memdup_user+0x42/0x70 [ 79.922413] [] SyS_mount+0x83/0xd0 [ 79.922418] [] entry_SYSCALL_64_fastpath+0x1e/0xa8 [ 79.922436] ---[ end trace 0db3466cdad31dcf ]--- On 08/09/2016 10:25 PM, Chris Murphy wrote: On Tue, Aug 9, 2016 at 6:29 PM, Matt McKinnon <m...@techsquare.com> wrote: Spoke too soon. Do I need to continue to run with that mount option in place? It shouldn't be necessary. Something's still wrong for some reason, even with DUP metadata being CoW'd so someone else is going to have to speak up what the problem is. And that btrfs check not only doesn't come up clean but crashes suggests some confluence of things in kernel 4.3 and your hardware conspired to make the file system inconsistent in a way that isn't immediately recovering the usual way. That is, usebackuproots working suggests that there's a bug elsewhere in the storage stack because normally that shouldn't be necessary - something's happened out of order. 
devid 1 size 50.93TiB used 22.67TiB path /dev/sda1

What is the exact nature of this block device?

If getting this back up and running is urgent, I suggest asking on IRC what the next steps are. In the meantime I'd get a btrfs-image (which is probably going to be quite large given metadata is 60GiB). If that pukes, then try 'btrfs inspect-internal dump-tree /dev/sda1 > dumptree.log', which may also fail, but whatever it writes before it fails might contain something useful.

Obviously btrfs check shouldn't crash, so that's a bug already. What do you get for free -m? It's known that btrfs check needs a lot of memory and pretty much all the metadata needs to be read in, so if you have an SSD available it might make sense to set up a huge pile of swap on that SSD and rerun btrfs check.
Re: BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists
I performed a quick balance, which gave me:

[39020.030638] BTRFS info (device sda1): relocating block group 25428383236096 flags 1
[39020.206097] BTRFS warning (device sda1): block group 23113395863552 has wrong amount of free space
[39020.206101] BTRFS warning (device sda1): failed to load free space cache for block group 23113395863552, rebuilding it now

then a crash dump. I remounted with -o clear_cache,nospace_cache and the balance completed. Running a larger balance now. After that I will umount and remount with default options to see if that works.

-Matt

On 08/10/2016 03:09 AM, g6094...@freenet.de wrote:

Hi, from what I see you have an unfinished balance, since System and Metadata exist on disk in both DUP and single profiles. So you should (re)run a balance for this data.

sash

On 10.08.2016 at 02:17, Matt McKinnon wrote:

-o usebackuproot worked well. After the file system settled, I performed a sync and a clean umount, and a normal mount works now as well. Anything I should be doing going forward?

Thanks, Matt

On 08/09/2016 08:01 PM, Chris Murphy wrote:
On Tue, Aug 9, 2016 at 5:15 PM, Matt McKinnon <m...@techsquare.com> wrote:

Hello, Our server recently crashed and was rebooted. When it returned our BTRFS volume is mounting read-only:

What happens when you try mounting with -o usebackuproot ? If that fails, what output do you get for 'btrfs check' (without --repair)? If you only get some "errors 400, nbytes wrong" then --repair should fix the problem.
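sash's observation can be checked mechanically. The following is a minimal Python sketch, not part of the thread, that reads `btrfs fi df` output and reports every block group type that exists in more than one profile. The function name `mixed_profiles` and the regular expression are assumptions made for illustration; the sample text is the `btrfs fi df /export/` output from the original report.

```python
import re
from collections import defaultdict

# Matches lines such as "Metadata, DUP: total=70.50GiB, used=60.21GiB".
LINE_RE = re.compile(r"^(\w+), (\w+): total=(\S+), used=(\S+)$")


def mixed_profiles(fi_df_output):
    """Return {type: [profiles]} for block group types stored in several profiles."""
    profiles = defaultdict(list)
    for line in fi_df_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            kind, profile = match.group(1), match.group(2)
            profiles[kind].append(profile)
    # Keep only the types that appear with more than one profile.
    return {kind: found for kind, found in profiles.items() if len(found) > 1}


# Output quoted from the original report in this thread.
SAMPLE_FI_DF = """\
Data, single: total=22.53TiB, used=14.89TiB
System, DUP: total=40.00MiB, used=2.39MiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=70.50GiB, used=60.21GiB
Metadata, single: total=1.51GiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B
"""

if __name__ == "__main__":
    for kind, found in mixed_profiles(SAMPLE_FI_DF).items():
        print(f"{kind}: {', '.join(found)}")
```

The sketch only reports the mix of profiles. It cannot tell whether a balance was actually interrupted, so treat a hit as a prompt to look closer rather than as a diagnosis.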
Re: BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists
# btrfs check /dev/sda1
Checking filesystem on /dev/sda1
UUID: 33f9089e-acc7-4a39-8b83-b18bb182faaf
checking extents
ref mismatch on [958277767168 5894144] extent item 0, found 1
Backref 958277767168 root 257 owner 15799573 offset 750342144 num_refs 0 not found in extent tree
Incorrect local backref count on 958277767168 root 257 owner 15799573 offset 750342144 found 1 wanted 0 back 0x15d380f90
backpointer mismatch on [958277767168 5894144]
ref mismatch on [958298935296 9666560] extent item 0, found 2
Backref 958298935296 root 257 owner 15799573 offset 559185920 num_refs 0 not found in extent tree
Incorrect local backref count on 958298935296 root 257 owner 15799573 offset 559185920 found 2 wanted 0 back 0x15d3809a0
backpointer mismatch on [958298935296 9666560]

about 859 of those ... Then:

owner ref check failed [25737445867520 16384]
checking free space cache
There is no free space entry for 109105479680-109105496064
There is no free space entry for 109105479680-109551026176
cache appears valid but isn't 109014155264
There is no free space entry for 139709693952-139709710336
There is no free space entry for 139709693952-140152668160
cache appears valid but isn't 139615797248
Wanted offset 171291525120, found 171291426816
Wanted offset 171291525120, found 171291426816
cache appears valid but isn't 171291181056
Wanted offset 220146597888, found 220146532352
Wanted offset 220146597888, found 220146532352
cache appears valid but isn't 220146434048
btrfs: unable to add free space :-17
free-space-cache.c:824: btrfs_add_free_space: Assertion `ret == -EEXIST` failed.
btrfs[0x464af9]
btrfs(btrfs_add_free_space+0x154)[0x46531f]
btrfs(load_free_space_cache+0xab7)[0x465e36]
btrfs(cmd_check+0x22c7)[0x42db0e]
btrfs(main+0x155)[0x40a4fd]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7faad34cdf45]
btrfs[0x40a0f9]

and we crashed out of the check there.
-Matt

On 08/09/2016 08:06 PM, Chris Murphy wrote:
On Tue, Aug 9, 2016 at 6:01 PM, Chris Murphy <li...@colorremedies.com> wrote:
On Tue, Aug 9, 2016 at 5:15 PM, Matt McKinnon <m...@techsquare.com> wrote:

Hello, Our server recently crashed and was rebooted. When it returned our BTRFS volume is mounting read-only:

What happens when you try mounting with -o usebackuproot ? If that fails, what output do you get for 'btrfs check' (without --repair)? If you only get some "errors 400, nbytes wrong" then --repair should fix the problem.

This could also be a regression somewhere... https://bugzilla.kernel.org/show_bug.cgi?id=60522
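With roughly 859 repeated entries, the check output in this message is easier to discuss as counts per message kind. The Python sketch below is an illustration only: the function `summarize_check`, the labels, and the patterns are assumptions derived from the lines quoted above, and it assumes the tool prints one message per line.

```python
import re
from collections import Counter

# Patterns taken from the messages quoted in this thread; labels are made up here.
PATTERNS = {
    "ref mismatch": re.compile(r"^ref mismatch on \["),
    "backpointer mismatch": re.compile(r"^backpointer mismatch on \["),
    "missing free space entry": re.compile(r"^There is no free space entry for "),
    "invalid free space cache": re.compile(r"^cache appears valid but isn't "),
}


def summarize_check(output):
    """Count how often each known message kind occurs in saved check output."""
    counts = Counter()
    for line in output.splitlines():
        stripped = line.strip()
        for label, pattern in PATTERNS.items():
            if pattern.match(stripped):
                counts[label] += 1
    return counts


# A few lines copied from the check output above.
SAMPLE_CHECK = """\
ref mismatch on [958277767168 5894144] extent item 0, found 1
backpointer mismatch on [958277767168 5894144]
ref mismatch on [958298935296 9666560] extent item 0, found 2
backpointer mismatch on [958298935296 9666560]
There is no free space entry for 109105479680-109105496064
There is no free space entry for 109105479680-109551026176
cache appears valid but isn't 109014155264
"""

if __name__ == "__main__":
    for label, count in summarize_check(SAMPLE_CHECK).items():
        print(f"{count:5d}  {label}")
```

Saving the check output to a file and passing its contents to `summarize_check` gives a short summary that fits in a list post; any message kind not listed in `PATTERNS` is silently ignored.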
Re: BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists
Spoke too soon. Do I need to continue to run with that mount option in place? [ 83.775984] BTRFS warning (device sda1): block group 25741009879040 has wrong amount of free space [ 83.775989] BTRFS warning (device sda1): failed to load free space cache for block group 25741009879040, rebuilding it now [ 85.231748] BTRFS warning (device sda1): block group 25737721544704 has wrong amount of free space [ 85.231752] BTRFS warning (device sda1): failed to load free space cache for block group 25737721544704, rebuilding it now [ 98.913796] BTRFS info (device sda1): disk space caching is enabled [ 98.913803] BTRFS info (device sda1): has skinny extents [ 179.564408] BTRFS warning (device sda1): block group 78412513280 has wrong amount of free space [ 179.564414] BTRFS warning (device sda1): failed to load free space cache for block group 78412513280, rebuilding it now [ 667.106718] [ cut here ] [ 667.106772] WARNING: CPU: 0 PID: 2726 at fs/btrfs/extent-tree.c:2963 btrfs_run_delayed_refs+0x292/0x2d0 [btrfs] [ 667.106775] BTRFS: Transaction aborted (error -17) [ 667.106777] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel btrfs kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd dm_multipath joydev lpc_ich mei_me mei wmi ipmi_si ipmi_msghandler nfsd auth_rpcgss nfs_acl nfs lockd grace ioatdma sunrpc shpchp mac_hid fscache lp parport ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c hid_generic igb raid1 usbhid i2c_algo_bit ahci raid0 dca multipath ptp hid megaraid_sas libahci linear pps_core dm_mirror dm_region_hash dm_log [ 667.106859] CPU: 0 PID: 2726 Comm: btrfs-transacti Not tainted 4.7.0-custom #1 [ 667.106861] Hardware name: 
Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014 [ 667.106864] 880464e73c08 813b816c 880464e73c58 [ 667.106869] 880464e73c48 8107a321 0b936c3cc170 [ 667.106873] 880443191130 88046c3cc170 88046b43f000 [ 667.106878] Call Trace: [ 667.106889] [] dump_stack+0x63/0x87 [ 667.106896] [] __warn+0xd1/0xf0 [ 667.106901] [] warn_slowpath_fmt+0x4f/0x60 [ 667.106925] [] btrfs_run_delayed_refs+0x292/0x2d0 [btrfs] [ 667.106947] [] btrfs_write_dirty_block_groups+0x178/0x3b0 [btrfs] [ 667.106974] [] commit_cowonly_roots+0x23c/0x2e0 [btrfs] [ 667.106999] [] btrfs_commit_transaction+0x4fb/0xa80 [btrfs] [ 667.107021] [] transaction_kthread+0x1d2/0x200 [btrfs] [ 667.107042] [] ? btrfs_cleanup_transaction+0x580/0x580 [btrfs] [ 667.107047] [] kthread+0xc9/0xe0 [ 667.107053] [] ret_from_fork+0x1f/0x40 [ 667.107056] [] ? kthread_park+0x60/0x60 [ 667.107060] ---[ end trace 336c80ba4db66e78 ]--- [ 667.107065] BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists [ 667.116389] BTRFS info (device sda1): forced readonly [ 667.117081] BTRFS warning (device sda1): Skipping commit of aborted transaction. [ 667.117086] BTRFS: error (device sda1) in cleanup_transaction:1853: errno=-17 Object already exists On 08/09/2016 08:06 PM, Chris Murphy wrote: On Tue, Aug 9, 2016 at 6:01 PM, Chris Murphy <li...@colorremedies.com> wrote: On Tue, Aug 9, 2016 at 5:15 PM, Matt McKinnon <m...@techsquare.com> wrote: Hello, Our server recently crashed and was rebooted. When it returned our BTRFS volume is mounting read-only: What happens when you try mounting with -o usebackuproot ? If that fails, what output do you get for 'btrfs check' (without --repair)? If you only get some "errors 400, nbytes wrong" then --repair should fix the problem. This could also be a regression somewhere... 
https://bugzilla.kernel.org/show_bug.cgi?id=60522
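The lines that matter in a log like the one above are the "BTRFS: error" entries, which name the device, the function, the source line, and the errno. As a rough Python sketch (the function `parse_btrfs_errors` and the regular expression are assumptions based only on the two error lines quoted in this message), they can be pulled out of a saved dmesg dump like this:

```python
import errno
import re

# Shaped after "BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 ...".
BTRFS_ERROR_RE = re.compile(
    r"BTRFS: error \(device (?P<device>[^)]+)\) "
    r"in (?P<function>\w+):(?P<line>\d+): "
    r"errno=(?P<errno>-?\d+) (?P<message>.*)$"
)


def parse_btrfs_errors(log_text):
    """Return one dict per 'BTRFS: error' line found in the log text."""
    events = []
    for line in log_text.splitlines():
        match = BTRFS_ERROR_RE.search(line)
        if match is None:
            continue
        # The kernel prints the errno negated; look up its symbolic name.
        code = abs(int(match.group("errno")))
        events.append({
            "device": match.group("device"),
            "function": match.group("function"),
            "line": int(match.group("line")),
            "errno": errno.errorcode.get(code, "unknown"),
            "message": match.group("message").strip(),
        })
    return events


# Lines copied from the log in this message.
SAMPLE_LOG = """\
[ 667.107065] BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists
[ 667.116389] BTRFS info (device sda1): forced readonly
[ 667.117086] BTRFS warning (device sda1): Skipping commit of aborted transaction.
[ 667.117086] BTRFS: error (device sda1) in cleanup_transaction:1853: errno=-17 Object already exists
"""

if __name__ == "__main__":
    for event in parse_btrfs_errors(SAMPLE_LOG):
        print(event)
```

Here errno 17 resolves to EEXIST, the same code that appears in the failed assertion when btrfs check crashed.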
Re: BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists
-o usebackuproot worked well. After the file system settled, I performed a sync and a clean umount, and a normal mount works now as well. Anything I should be doing going forward?

Thanks, Matt

On 08/09/2016 08:01 PM, Chris Murphy wrote:
On Tue, Aug 9, 2016 at 5:15 PM, Matt McKinnon <m...@techsquare.com> wrote:

Hello, Our server recently crashed and was rebooted. When it returned our BTRFS volume is mounting read-only:

What happens when you try mounting with -o usebackuproot ? If that fails, what output do you get for 'btrfs check' (without --repair)? If you only get some "errors 400, nbytes wrong" then --repair should fix the problem.
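Chris's rule of thumb ("only errors 400, nbytes wrong") can be applied to saved check output before deciding on --repair. The Python sketch below is one possible reading of that remark, not a safety check. The names `inode_error_codes` and `only_nbytes_wrong` are made up for illustration, the root and inode numbers in the samples are hypothetical, and other kinds of check messages (for example ref mismatch lines) are not looked at.

```python
import re

# Matches the "errors <code>," part of lines such as "... errors 400, nbytes wrong".
ERRORS_RE = re.compile(r"\berrors (\w+),")


def inode_error_codes(check_output):
    """Return the set of distinct 'errors <code>' values found in the output."""
    return set(ERRORS_RE.findall(check_output))


def only_nbytes_wrong(check_output):
    """True if at least one 'errors' entry exists and every entry has code 400."""
    return inode_error_codes(check_output) == {"400"}


# Hypothetical samples; the root and inode numbers are invented for illustration.
SAMPLE_SIMPLE = """\
root 257 inode 1001 errors 400, nbytes wrong
root 257 inode 1002 errors 400, nbytes wrong
"""

SAMPLE_MIXED = """\
root 257 inode 1001 errors 400, nbytes wrong
root 257 inode 1003 errors 500, file extent discount, nbytes wrong
root 257 inode 1004 errors 2001, no inode item, link count wrong
"""

if __name__ == "__main__":
    print(only_nbytes_wrong(SAMPLE_SIMPLE))
    print(only_nbytes_wrong(SAMPLE_MIXED))
```

The codes are compared as text exactly as printed, so the sketch makes no claim about how the tool encodes them. A False result means the output contains something beyond the case Chris described, and the question should go back to the list before running --repair.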
BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists
Hello, Our server recently crashed and was rebooted. When it returned our BTRFS volume is mounting read-only: [ 142.395093] BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists [ 142.404418] BTRFS info (device sda1): forced readonly I tried upgrading the kernel from 4.3 to 4.7. Upgraded btrfs-progs to v4.7 as well. # uname -a Linux hostname 4.7.0-custom #1 SMP Tue Aug 9 11:16:28 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux # btrfs --version btrfs-progs v4.7 # btrfs fi show Label: none uuid: 33f9089e-acc7-4a39-8b83-b18bb182faaf Total devices 1 FS bytes used 14.95TiB devid1 size 50.93TiB used 22.67TiB path /dev/sda1 # btrfs fi df /export/ Data, single: total=22.53TiB, used=14.89TiB System, DUP: total=40.00MiB, used=2.39MiB System, single: total=4.00MiB, used=0.00B Metadata, DUP: total=70.50GiB, used=60.21GiB Metadata, single: total=1.51GiB, used=0.00B GlobalReserve, single: total=512.00MiB, used=0.00B # dmesg [ 142.394841] [ cut here ] [ 142.394874] WARNING: CPU: 6 PID: 269 at fs/btrfs/extent-tree.c:2963 btrfs_run_delayed_refs+0x292/0x2d0 [btrfs] [ 142.394876] BTRFS: Transaction aborted (error -17) [ 142.394878] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd dm_multipath joydev lpc_ich mei_me mei ioatdma wmi ipmi_si ipmi_msghandler shpchp mac_hid btrfs lp parport ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor igb raid6_pq libcrc32c i2c_algo_bit raid1 hid_generic dca usbhid raid0 ptp hid ahci megaraid_sas multipath libahci pps_core linear dm_mirror dm_region_hash dm_log [ 142.394942] CPU: 6 PID: 269 
Comm: kworker/u18:5 Not tainted 4.7.0-custom #1 [ 142.394944] Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014 [ 142.394966] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] [ 142.394969] 88086a057ca0 813b816c 88086a057cf0 [ 142.394972] 88086a057ce0 8107a321 0b9325288170 [ 142.394975] 8808519eb000 880825288170 88086b2c1000 0020 [ 142.394978] Call Trace: [ 142.394987] [] dump_stack+0x63/0x87 [ 142.394993] [] __warn+0xd1/0xf0 [ 142.394996] [] warn_slowpath_fmt+0x4f/0x60 [ 142.395012] [] btrfs_run_delayed_refs+0x292/0x2d0 [btrfs] [ 142.395025] [] delayed_ref_async_start+0x94/0xb0 [btrfs] [ 142.395044] [] normal_work_helper+0xc0/0x2d0 [btrfs] [ 142.395050] [] ? pwq_activate_delayed_work+0x42/0xb0 [ 142.395066] [] btrfs_extent_refs_helper+0x12/0x20 [btrfs] [ 142.395070] [] process_one_work+0x153/0x3f0 [ 142.395073] [] worker_thread+0x12b/0x4b0 [ 142.395076] [] ? rescuer_thread+0x340/0x340 [ 142.395079] [] kthread+0xc9/0xe0 [ 142.395085] [] ret_from_fork+0x1f/0x40 [ 142.395088] [] ? kthread_park+0x60/0x60 [ 142.395090] ---[ end trace e2b0b8dc37502011 ]--- [ 142.395093] BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists [ 142.404418] BTRFS info (device sda1): forced readonly -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
corruption, bad block, input/output errors - do i run --repair?
Hi All, I'm running into some corruption and I wanted to seek out advice on whether or not to run btrfs check --repair, or if I should fall back to my backup file server, or both. The system is mountable, and usable. # uname -a Linux cbmm-fs 3.17.2-custom #1 SMP Thu Oct 30 14:09:57 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux # btrfs --version Btrfs v3.14.2 # btrfs fi show Label: none uuid: 30c15060-8fb4-4926-87d4-f7d08c3033c5 Total devices 1 FS bytes used 58.92TiB devid1 size 76.40TiB used 59.05TiB path /dev/sda1 # btrfs fi df /home Data, single: total=58.75TiB, used=58.75TiB System, DUP: total=32.00MiB, used=2.66MiB System, single: total=4.00MiB, used=3.68MiB Metadata, DUP: total=119.00GiB, used=116.63GiB Metadata, single: total=64.01GiB, used=57.68GiB GlobalReserve, single: total=512.00MiB, used=0.00B I did run into some RO snapshot corruption which caused me to run btrfs check: parent transid verify failed on 20809493159936 wanted 4486137218058286914 found 390978 parent transid verify failed on 20809493159936 wanted 4486137218058286914 found 390978 Ignoring transid failure Checking filesystem on /dev/sda1 UUID: 30c15060-8fb4-4926-87d4-f7d08c3033c5 checking extents bad block 69290357067776 Errors found in extent allocation tree or chunk allocation checking free space cache checking fs roots ... dir isize wrong 1 error errors 500, file extent discount, nbytes wrong 14 errors errors 2001, no inode item, link count wrong 257302 errors ... 
found 185063071745 bytes used err is 1 total csum bytes: 8428 total tree bytes: 1889284096 total fs tree bytes: 962678784 total extent tree bytes: 159297536 btree space waste bytes: 340014684 file data blocks allocated: 57344 referenced 57344 Btrfs v3.14.2 Output of a scrub: ERROR: scrubbing /home failed for device id 1 (Input/output error) scrub canceled for 30c15060-8fb4-4926-87d4-f7d08c3033c5 scrub started at Mon Nov 3 06:43:58 2014 and was aborted after 7613 seconds data_extents_scrubbed: 248507555 tree_extents_scrubbed: 10870729 data_bytes_scrubbed: 15375990317056 tree_bytes_scrubbed: 44526505984 read_errors: 0 csum_errors: 0 verify_errors: 0 no_csum: 15712 csum_discards: 988018 super_errors: 0 malloc_errors: 0 uncorrectable_errors: 0 unverified_errors: 0 corrected_errors: 0 last_physical: 15425663205376 Output of a balance: ERROR: error during balancing '/home' - Input/output error There may be more info in syslog - try dmesg | tail [501087.506642] [ cut here ] [501087.543971] WARNING: CPU: 5 PID: 31885 at fs/btrfs/relocation.c:925 build_backref_tree+0x11f0/0x1230 [btrfs]() [501087.543991] Modules linked in: ipmi_devintf(E) autofs4(E) sb_edac(E) edac_core(E) joydev(E) mei_me(E) mei(E) lpc_ich(E) ioatdma(E) ipmi_si(E) wmi(E) mac_hid(E) bnep(E) rfcomm(E) bluetooth(E) lp(E) parport(E) nfsd(E) nfs_acl(E) auth_rpcgss(E) nfs(E) fscache(E) lockd(E) sunrpc(E) ses(E) enclosure(E) hid_generic(E) ahci(E) libahci(E) usbhid(E) hid(E) igb(E) dca(E) i2c_algo_bit(E) ptp(E) pps_core(E) megaraid_sas(E) btrfs(E) raid6_pq(E) xor(E) libcrc32c(E) [501087.543995] CPU: 5 PID: 31885 Comm: btrfs Tainted: G D E 3.17.2-custom #1 [501087.543997] Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0a 12/27/2013 [501087.543999] 039d 88000eadb808 8176733c 0282 [501087.544001] 88000eadb848 8107163c 1000 [501087.544003] 8801d0d9acf0 880497c70380 0001 0001 [501087.544004] Call Trace: [501087.544014] [8176733c] dump_stack+0x46/0x58 [501087.544022] [8107163c] 
warn_slowpath_common+0x8c/0xc0 [501087.544024] [8107168a] warn_slowpath_null+0x1a/0x20 [501087.544039] [a00b4020] build_backref_tree+0x11f0/0x1230 [btrfs] [501087.544052] [a00b4331] relocate_tree_blocks+0x2d1/0x690 [btrfs] [501087.544060] [811c1609] ? kmem_cache_alloc_trace+0x39/0x1f0 [501087.544072] [a00b54a2] relocate_block_group+0x202/0x5f0 [btrfs] [501087.544083] [a00b5a40] btrfs_relocate_block_group+0x1b0/0x2d0 [btrfs] [501087.544098] [a0088cf5] btrfs_relocate_chunk.isra.62+0x75/0x760 [btrfs] [501087.544111] [a0084d86] ? release_extent_buffer+0x36/0xe0 [btrfs] [501087.544124] [a0085281] ? free_extent_buffer+0x61/0xc0 [btrfs] [501087.544136] [a008d7db] btrfs_balance+0x8ab/0xf50 [btrfs] [501087.544150] [a00985ac] btrfs_ioctl_balance+0x1cc/0x530 [btrfs] [501087.544156] [811786eb] ? lru_cache_add_active_or_unevictable+0x2b/0xa0 [501087.544168] [a009aa82] btrfs_ioctl+0x562/0x1f00 [btrfs] [501087.544173] [811e9c0b] ? putname+0x2b/0x40 [501087.544176] [811ef193] ? user_path_at_empty+0x63/0xa0 [501087.544183] [8105f59c] ?