BUG during send, cannot delete subvolume
Hi All,

I had a ctree.c error during a send/receive backup:

  kernel BUG at fs/btrfs/ctree.c:1862

Nothing else seemed to go wrong on the file system. After restarting the send, it completed, but I'm left with a subvolume I can't delete:

  BTRFS warning (device sdb1): Attempt to delete subvolume 176188 during send

I don't see any zombie btrfs send processes lying around. Is there any way to delete this subvolume? Do I just need a reboot?

-Matt
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
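Not from the thread, but a sketch of the checks worth making before resorting to a reboot; /mnt and the subvolume path are placeholders for your setup:

```shell
# Sketch (placeholders: /mnt and the subvolume path). The "during send"
# refusal comes from an in-kernel flag on the subvolume root that should
# drop when the send ioctl returns, so first rule out a live sender.

# Any send process still attached anywhere?
ps aux | grep '[b]trfs send'

# Which path is subvolume ID 176188?
btrfs subvolume list /mnt | grep 176188

# Retry the delete; if this still warns with no sender alive, the flag
# is stale and a reboot is the practical way to clear it.
btrfs subvolume delete /mnt/path/to/subvol
```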
Re: btrfs-transacti hammering the system
Well, it's at zero now...

# btrfs fi df /export/
Data, single: total=30.45TiB, used=30.25TiB
System, DUP: total=32.00MiB, used=3.62MiB
Metadata, DUP: total=66.50GiB, used=65.16GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

On 01/12/17 16:47, Duncan wrote:
Hans van Kranenburg posted on Fri, 01 Dec 2017 18:06:23 +0100 as excerpted:
On 12/01/2017 05:31 PM, Matt McKinnon wrote:

Sorry, I missed your in-line reply:

2) How big is this filesystem? What does your `btrfs fi df /mountpoint` say?

# btrfs fi df /export/
Data, single: total=30.45TiB, used=30.25TiB
System, DUP: total=32.00MiB, used=3.62MiB
Metadata, DUP: total=66.50GiB, used=65.08GiB
GlobalReserve, single: total=512.00MiB, used=53.69MiB

Multi-TiB filesystem, check. total/used ratio looks healthy.

Not so healthy, from here. Data/metadata are healthy, yes, but... Any usage at all of global reserve is a red flag indicating that something in the filesystem thinks, or thought when it resorted to global reserve, that space is running out. Global reserve usage doesn't really hint at what the problem is, but it's definitely a red flag that there /is/ a problem, and it's easily overlooked, as it apparently was here. It's likely an indication of a bug, possibly one of the ones fixed right around 4.12/4.13. I'll let the devs and better experts take it from there, but I'd certainly be worried until global reserve drops to zero usage.
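Since the diagnosis above hinges on spotting non-zero GlobalReserve usage, here is a small way to pull that field out; the sample input is the `btrfs fi df` output quoted in this thread, and in practice you would pipe the live command in instead:

```shell
# Extract GlobalReserve usage from `btrfs fi df` output. The sample here
# mirrors the df output quoted above; on a live system, replace the echo
# with: btrfs fi df /export/ | awk -F'used=' '/GlobalReserve/ { print $2 }'
btrfs_df_sample='Data, single: total=30.45TiB, used=30.25TiB
System, DUP: total=32.00MiB, used=3.62MiB
Metadata, DUP: total=66.50GiB, used=65.08GiB
GlobalReserve, single: total=512.00MiB, used=53.69MiB'

echo "$btrfs_df_sample" | awk -F'used=' '/GlobalReserve/ { print $2 }'
```

Anything other than 0.00B here is the red flag Duncan describes.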
Re: btrfs-transacti hammering the system
Right. The file system is 48T, with 17T available, so we're not quite pushing it yet. So far so good on the space_cache=v2 mount. I'm surprised this isn't on the Gotchas page in the wiki; it may end up making a world of difference for the users here.

Thanks again,
Matt

On 01/12/17 13:24, Hans van Kranenburg wrote:
On 12/01/2017 06:57 PM, Holger Hoffstätte wrote:
On 12/01/17 18:34, Matt McKinnon wrote:

Thanks, I'll give space_cache=v2 a shot.

Yes, very much recommended.

My mount options are: rw,relatime,space_cache,autodefrag,subvolid=5,subvol=/

Turn autodefrag off and use noatime instead of relatime. Your filesystem also seems very full,

We don't know. btrfs fi df only displays allocated space. And that being full is good; it means there isn't too much free space fragmented everywhere.

that's bad with every filesystem but *especially* with btrfs because the allocator has to work really hard to find free space for COWing. Really consider deleting stuff or adding more space.
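For anyone else landing on this thread, the conversion itself is a single remount; a sketch, assuming the /export mount point used here, a placeholder device, and a kernel with free-space-tree support (4.5+):

```shell
# One-time switch to the v2 space cache (the free space tree).
# /dev/sdb1 is a placeholder device; adjust to your filesystem.
umount /export
mount -o space_cache=v2 /dev/sdb1 /export   # first mount builds the free space tree

# Confirm it stuck:
grep /export /proc/mounts            # should show space_cache=v2
dmesg | grep -i 'free space tree'
```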
Re: btrfs-transacti hammering the system
Thanks, I'll give space_cache=v2 a shot.

My mount options are:

  rw,relatime,space_cache,autodefrag,subvolid=5,subvol=/
Re: btrfs-transacti hammering the system
Sorry, I missed your in-line reply:

1) The one right above, btrfs_write_out_cache, is the write-out of the free space cache v1. Do you see this going on for multiple seconds, and does it match the time when it's writing X MB/s to disk?

It seems to only last until the next watch update.

[] io_schedule+0x16/0x40
[] get_request+0x23e/0x720
[] blk_queue_bio+0xc1/0x3a0
[] generic_make_request+0xf8/0x2a0
[] submit_bio+0x75/0x150
[] btrfs_map_bio+0xe5/0x2f0 [btrfs]
[] btree_submit_bio_hook+0x8c/0xe0 [btrfs]
[] submit_one_bio+0x63/0xa0 [btrfs]
[] flush_epd_write_bio+0x3b/0x50 [btrfs]
[] flush_write_bio+0xe/0x10 [btrfs]
[] btree_write_cache_pages+0x379/0x450 [btrfs]
[] btree_writepages+0x5d/0x70 [btrfs]
[] do_writepages+0x1c/0x70
[] __filemap_fdatawrite_range+0xaa/0xe0
[] filemap_fdatawrite_range+0x13/0x20
[] btrfs_write_marked_extents+0xe9/0x110 [btrfs]
[] btrfs_write_and_wait_transaction.isra.22+0x3d/0x80 [btrfs]
[] btrfs_commit_transaction+0x665/0x900 [btrfs]
[] transaction_kthread+0x18a/0x1c0 [btrfs]
[] kthread+0x109/0x140
[] ret_from_fork+0x25/0x30

The last three lines will stick around for a while. Is switching to space cache v2 something that everyone should be doing? Something that would be a good test at least?

2) How big is this filesystem? What does your `btrfs fi df /mountpoint` say?

# btrfs fi df /export/
Data, single: total=30.45TiB, used=30.25TiB
System, DUP: total=32.00MiB, used=3.62MiB
Metadata, DUP: total=66.50GiB, used=65.08GiB
GlobalReserve, single: total=512.00MiB, used=53.69MiB

3) What kind of workload are you running? E.g. how can you describe it within a range from "big files which just sit there" to "small writes and deletes all over the place all the time"?

It's a pretty light workload most of the time. It's a file system that exports two NFS shares to a small lab group. I believe it is more small reads all over a large file (MRI imaging) rather than small writes.

4) What kernel version is this? `uname -a` output?
# uname -a
Linux machine_name 4.12.8-custom #1 SMP Tue Aug 22 10:15:01 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
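For reference, the stack samples quoted upthread can be gathered like this; a sketch, assuming a single btrfs filesystem on the box (`pgrep -o` picks the oldest match):

```shell
# Sample the btrfs transaction kthread's kernel stack once a second.
# The kthread's comm shows up truncated as "btrfs-transacti" in ps.
pid=$(pgrep -o btrfs-transacti)
watch -n 1 "cat /proc/$pid/stack"
```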
Re: btrfs-transacti hammering the system
These seem to come up most often:

[] transaction_kthread+0x133/0x1c0 [btrfs]
[] kthread+0x109/0x140
[] ret_from_fork+0x25/0x30
Re: btrfs-transacti hammering the system
Thanks for this. Here's what I get:

[] transaction_kthread+0x133/0x1c0 [btrfs]
[] kthread+0x109/0x140
[] ret_from_fork+0x25/0x30

...

[] io_schedule+0x16/0x40
[] get_request+0x23e/0x720
[] blk_queue_bio+0xc1/0x3a0
[] generic_make_request+0xf8/0x2a0
[] submit_bio+0x75/0x150
[] btrfs_map_bio+0xe5/0x2f0 [btrfs]
[] btree_submit_bio_hook+0x8c/0xe0 [btrfs]
[] submit_one_bio+0x63/0xa0 [btrfs]
[] flush_epd_write_bio+0x3b/0x50 [btrfs]
[] flush_write_bio+0xe/0x10 [btrfs]
[] btree_write_cache_pages+0x379/0x450 [btrfs]
[] btree_writepages+0x5d/0x70 [btrfs]
[] do_writepages+0x1c/0x70
[] __filemap_fdatawrite_range+0xaa/0xe0
[] filemap_fdatawrite_range+0x13/0x20
[] btrfs_write_marked_extents+0xe9/0x110 [btrfs]
[] btrfs_write_and_wait_transaction.isra.22+0x3d/0x80 [btrfs]
[] btrfs_commit_transaction+0x665/0x900 [btrfs]

...

[] io_schedule+0x16/0x40
[] wait_on_page_bit+0xe8/0x120
[] read_extent_buffer_pages+0x1cd/0x2e0 [btrfs]
[] btree_read_extent_buffer_pages+0x9f/0x100 [btrfs]
[] read_tree_block+0x32/0x50 [btrfs]
[] read_block_for_search.isra.32+0x120/0x2e0 [btrfs]
[] btrfs_next_old_leaf+0x215/0x400 [btrfs]
[] btrfs_next_leaf+0x10/0x20 [btrfs]
[] btrfs_lookup_csums_range+0x12e/0x410 [btrfs]
[] csum_exist_in_range.isra.49+0x2a/0x81 [btrfs]
[] run_delalloc_nocow+0x9b2/0xa10 [btrfs]
[] run_delalloc_range+0x68/0x340 [btrfs]
[] writepage_delalloc.isra.47+0xf0/0x140 [btrfs]
[] __extent_writepage+0xc7/0x290 [btrfs]
[] extent_write_cache_pages.constprop.53+0x2b5/0x450 [btrfs]
[] extent_writepages+0x4d/0x70 [btrfs]
[] btrfs_writepages+0x28/0x30 [btrfs]
[] do_writepages+0x1c/0x70
[] __filemap_fdatawrite_range+0xaa/0xe0
[] filemap_fdatawrite_range+0x13/0x20
[] btrfs_fdatawrite_range+0x20/0x50 [btrfs]
[] __btrfs_write_out_cache+0x3d9/0x420 [btrfs]
[] btrfs_write_out_cache+0x86/0x100 [btrfs]
[] btrfs_write_dirty_block_groups+0x261/0x390 [btrfs]
[] commit_cowonly_roots+0x1fb/0x290 [btrfs]
[] btrfs_commit_transaction+0x434/0x900 [btrfs]

...
[] tree_search_offset.isra.23+0x37/0x1d0 [btrfs]
btrfs-transacti hammering the system
Hi All,

Is there any way to figure out what exactly btrfs-transacti is chugging on? I have a few file systems that seem to get wedged for days on end with this process pegged around 100%. I've stopped all snapshots, made sure no quotas were enabled, turned on autodefrag in the mount options, tried manual defragging and kernel upgrades, yet this still brings my system to a crawl. Network I/O to the system seems very tiny. The only I/O I see to the disk is btrfs-transacti writing a couple MB/s.

# time touch foo

real    2m54.303s
user    0m0.000s
sys     0m0.002s

# uname -r
4.12.8-custom

# btrfs --version
btrfs-progs v4.13.3

Yes, I know I'm a bit behind there...

-Matt
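A few ways to see where btrfs-transacti spends its time while a touch takes minutes; this is a sketch, and the tracepoint name in the last command is from memory of the btrfs trace events, so treat it as an assumption:

```shell
# Per-device utilization and write throughput, 1-second intervals:
iostat -x 1

# CPU profile of the transaction kthread (perf from linux-tools):
perf top -p "$(pgrep -o btrfs-transacti)"

# Count transaction commits over 10 seconds (tracepoint name assumed):
perf stat -e btrfs:btrfs_transaction_commit -a sleep 10
```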
kernel BUG at fs/btrfs/ctree.c:3182
Hi All,

Been having issues on one machine and I was wondering if I could get some help tracking the issue down.

# uname -a
Linux riperton 4.13.5-custom #1 SMP Sat Oct 7 18:28:16 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

# btrfs --version
btrfs-progs v4.13.3

# btrfs fi show
Label: none  uuid: 8133a362-8e41-4da4-b607-a27832861157
	Total devices 1 FS bytes used 41.64TiB
	devid 1 size 50.93TiB used 41.88TiB path /dev/sda1

# btrfs fi df /export/
Data, single: total=41.70TiB, used=41.57TiB
System, DUP: total=64.00MiB, used=4.56MiB
Metadata, DUP: total=90.00GiB, used=72.30GiB
Metadata, single: total=1.53GiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

[617994.948036] [ cut here ]
[617994.948040] kernel BUG at fs/btrfs/ctree.c:3182!
[617994.952786] invalid opcode: [#1] SMP
[617994.956896] Modules linked in: ipmi_devintf xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel btrfs pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd dm_multipath joydev lpc_ich mei_me mei nfsd ioatdma auth_rpcgss nfs_acl ipmi_si wmi nfs ipmi_msghandler lockd grace sunrpc fscache shpchp mac_hid lp parport ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov hid_generic async_memcpy async_pq usbhid async_xor hid async_tx xor igb raid6_pq libcrc32c i2c_algo_bit raid1 ahci dca raid0 libahci ptp megaraid_sas multipath pps_core linear dm_mirror dm_region_hash dm_log
[617995.025316] CPU: 1 PID: 3191 Comm: nfsd Tainted: G W 4.13.5-custom #1
[617995.032965] Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014
[617995.042092] task: 996bac7d5a00 task.stack: bb7984b74000
[617995.048134] RIP: 0010:btrfs_set_item_key_safe+0x14e/0x160 [btrfs]
[617995.054310] RSP: 0018:bb7984b77658 EFLAGS: 00010246
[617995.059622] RAX: RBX: 0037 RCX: 00018000
[617995.066834] RDX: RSI: bb7984b7776e RDI: bb7984b77677
[617995.074051] RBP: bb7984b776b0 R08: bb7984b77677 R09:
[617995.081263] R10: R11: 0003 R12: bb7984b77666
[617995.088483] R13: 99679cc00460 R14: bb7984b7776e R15: 9966184867a8
[617995.095705] FS: () GS:9967afc8() knlGS:
[617995.103876] CS: 0010 DS: ES: CR0: 80050033
[617995.109707] CR2: 7fdbaad6 CR3: 00071fe09000 CR4: 001406e0
[617995.116921] Call Trace:
[617995.119493] __btrfs_drop_extents+0x50c/0xdd0 [btrfs]
[617995.124663] ? btrfs_encode_fh+0xd0/0xd0 [btrfs]
[617995.129390] btrfs_log_changed_extents+0x31b/0x640 [btrfs]
[617995.134990] ? free_extent_buffer+0x4b/0x90 [btrfs]
[617995.139976] btrfs_log_inode+0x8de/0xb90 [btrfs]
[617995.144686] ? dput+0xf1/0x1d0
[617995.147847] btrfs_log_inode_parent+0x21a/0x960 [btrfs]
[617995.153164] ? kmem_cache_alloc+0x194/0x1a0
[617995.157459] ? start_transaction+0x120/0x440 [btrfs]
[617995.162528] btrfs_log_dentry_safe+0x69/0x90 [btrfs]
[617995.167599] btrfs_sync_file+0x2ab/0x3e0 [btrfs]
[617995.172309] vfs_fsync_range+0x3d/0xb0
[617995.176168] btrfs_file_write_iter+0x45b/0x560 [btrfs]
[617995.181396] do_iter_readv_writev+0xe2/0x130
[617995.185753] do_iter_write+0x7f/0x190
[617995.189506] vfs_iter_write+0x19/0x30
[617995.193271] nfsd_vfs_write+0xb1/0x310 [nfsd]
[617995.197719] nfsd_write+0x134/0x1e0 [nfsd]
[617995.201908] nfsd3_proc_write+0x92/0x110 [nfsd]
[617995.206533] nfsd_dispatch+0xb9/0x250 [nfsd]
[617995.210915] svc_process_common+0x36e/0x6f0 [sunrpc]
[617995.215979] svc_process+0xfc/0x1c0 [sunrpc]
[617995.220339] nfsd+0xe9/0x160 [nfsd]
[617995.223918] kthread+0x109/0x140
[617995.227238] ? nfsd_destroy+0x60/0x60 [nfsd]
[617995.231591] ? kthread_park+0x60/0x60
[617995.235348] ret_from_fork+0x25/0x30
[617995.239010] Code: 48 8b 45 bf 48 8d 7d c7 4c 89 f6 48 89 45 d0 0f b6 45 be 88 45 cf 48 8b 45 b6 48 89 45 c7 e8 aa f3 ff ff 85 c0 0f 8f 55 ff ff ff <0f> 0b 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
[617995.257983] RIP: btrfs_set_item_key_safe+0x14e/0x160 [btrfs] RSP: bb7984b77658
[617995.265696] ---[ end trace 41d8bb716a419cdd ]---

And after a reboot we come up with this warning:

[ 112.712899] [ cut here ]
[ 112.712943] WARNING: CPU: 5 PID: 505 at fs/btrfs/file.c:547 btrfs_drop_extent_cache+0x3c5/0x3d0 [btrfs]
[ 112.712944] Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel xt_tcpudp kvm nf_conntrack_ipv4 nf_defrag_ipv4 irqbypass xt_conntrack crct10dif_pclmul nf_conntrack crc32_pclmul ghash_clmulni_intel pcbc iptable_filter ip_tables aesni_intel x_tables aes_x86_64 crypto_simd glue_helper cryptd dm_multipath joy
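The BUG fires under btrfs_log_inode, i.e. the fsync tree-log fast path that NFS writes exercise heavily. As a diagnostic stopgap (my suggestion, not something from this thread), the tree log can be bypassed:

```shell
# Stopgap sketch, not a fix: -o notreelog makes fsync fall back to a
# full transaction commit, avoiding the btrfs_log_inode path seen in
# the trace above, at some cost in fsync latency.
mount -o remount,notreelog /export
grep /export /proc/mounts   # should now list notreelog
```

Worth trying only to confirm the diagnosis; the underlying bug still wants a report against the current kernel.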
Re: Struggling with file system slowness
Those snapshots were created using Marc Merlin's script (thanks, Marc). They don't do anything except sit around on the file system for a week or so and then get removed. I'm now doing quarter-hourly snaps instead of nightly, since I have nightly backups of the filesystem going off-site. So far the btrfs-transaction and memory spikes have not returned.

-Matt

On 05/09/2017 03:14 PM, Liu Bo wrote:
On Fri, May 05, 2017 at 09:24:32AM -0400, Matt McKinnon wrote:

> Too little information. Is IO happening at the same time? Is compression on? Deduplicated? Lots of subvolumes? SSD? What kind of workload and file size/distribution profile?

Only write IO during the load spikes. No compression, no deduplication. 12 volumes (including snapshots). Spinning disks. Medium workload; file sizes are all over the map since this holds about 30 user home directories. Interestingly enough, the problems which had persisted for many weeks went away when all snapshots were removed. btrfs-transaction spikes disappeared. Memory usage went from 30G to under 2G.

Were those snapshots served as backup? Could you please elaborate on how you create snapshots? We could probably hammer out a testcase to improve the situation.

Thanks,
-liubo
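The rotation described above can be sketched as follows; SNAPDIR, the naming scheme, and the mtime-based pruning are my assumptions, not Marc Merlin's actual script (which is more careful):

```shell
#!/bin/sh
# Hypothetical snapshot rotation: quarter-hourly read-only snapshots of
# /export, pruned after about a week. All names here are assumptions.
SNAPDIR=/export/.snapshots

btrfs subvolume snapshot -r /export "$SNAPDIR/export.$(date +%Y%m%d-%H%M)"

# Prune entries older than 7 days (mtime-based selection is approximate;
# parsing the timestamp embedded in the name would be more robust):
find "$SNAPDIR" -mindepth 1 -maxdepth 1 -name 'export.*' -mtime +7 \
    -exec btrfs subvolume delete {} \;
```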
Re: Struggling with file system slowness
> Too little information. Is IO happening at the same time? Is compression on? Deduplicated? Lots of subvolumes? SSD? What kind of workload and file size/distribution profile?

Only write IO during the load spikes. No compression, no deduplication. 12 volumes (including snapshots). Spinning disks. Medium workload; file sizes are all over the map since this holds about 30 user home directories.

Interestingly enough, the problems which had persisted for many weeks went away when all snapshots were removed. The btrfs-transaction spikes disappeared. Memory usage went from 30G to under 2G.

-Matt
Struggling with file system slowness
Hi All,

Trying to peg down why I have one server that has btrfs-transacti pegged at 100% CPU for most of the time. I thought this might have to do with fragmentation as mentioned on the Gotchas page in the wiki (btrfs-endio-wri doesn't seem to be involved as mentioned there), but after running a full defrag of the file system and also enabling the 'autodefrag' mount option, the problem still persists. What's the best way to figure out what btrfs is chugging away at here?

Kernel: 4.10.13-custom
btrfs-progs: v4.10.2

-Matt
Hard crash on 4.9.5, part 2
I'm seeing an error on this file system that I've also hit in the distant past, where the mount would fail with a "file exists" error. Running a btrfs check gives the following over and over again:

Found file extent holes:
	start: 0, len: 290816
root 257 inode 28472371 errors 1000, some csum missing
root 257 inode 28472416 errors 1000, some csum missing
root 257 inode 9182183 errors 1000, some csum missing
root 257 inode 9182186 errors 1000, some csum missing
root 257 inode 28419536 errors 1100, file extent discount, some csum missing
Found file extent holes:
	start: 0, len: 290816
root 257 inode 28472371 errors 1000, some csum missing
root 257 inode 28472416 errors 1000, some csum missing
root 257 inode 9182183 errors 1000, some csum missing
root 257 inode 9182186 errors 1000, some csum missing
root 257 inode 28419536 errors 1100, file extent discount, some csum missing

Are these reported once per snapshot subvolume I have, and will they eventually end?

Here is the crash after the mount (with recovery/usebackuproot):

[ 627.233213] BTRFS warning (device sda1): 'recovery' is deprecated, use 'usebackuproot' instead
[ 627.233216] BTRFS info (device sda1): trying to use backup root at mount time
[ 627.233218] BTRFS info (device sda1): disk space caching is enabled
[ 627.233220] BTRFS info (device sda1): has skinny extents
[ 709.234688] [ cut here ]
[ 709.234734] WARNING: CPU: 5 PID: 3468 at fs/btrfs/file.c:546 btrfs_drop_extent_cache+0x3e8/0x400 [btrfs]
[ 709.234735] Modules linked in: ipmi_devintf nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache lp parport intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel xt_tcpudp kvm nf_conntrack_ipv4 nf_defrag_ipv4 irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel xt_conntrack aesni_intel btrfs nf_conntrack aes_x86_64 lrw gf128mul iptable_filter glue_helper ip_tables ablk_helper cryptd x_tables dm_multipath joydev mei_me ioatdma mei lpc_ich wmi ipmi_si ipmi_msghandler shpchp mac_hid ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic megaraid_sas raid6_pq ahci libcrc32c libahci igb usbhid raid1 hid i2c_algo_bit raid0 dca ptp multipath pps_core linear dm_mirror dm_region_hash dm_log
[ 709.234812] CPU: 5 PID: 3468 Comm: mount Not tainted 4.9.5-custom #1
[ 709.234813] Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014
[ 709.234816] bd3784bb7568 8e3c8e7c
[ 709.234820] bd3784bb75a8 8e07d3d1 02220070 9e5f0ae4d150
[ 709.234823] 0002d000 9e5f0bc91f78 9e5f0bc91da8 0002c000
[ 709.234827] Call Trace:
[ 709.234837] [] dump_stack+0x63/0x87
[ 709.234846] [] __warn+0xd1/0xf0
[ 709.234850] [] warn_slowpath_null+0x1d/0x20
[ 709.234874] [] btrfs_drop_extent_cache+0x3e8/0x400 [btrfs]
[ 709.234895] [] __btrfs_drop_extents+0x5b2/0xd30 [btrfs]
[ 709.234914] [] ? generic_bin_search.constprop.36+0x8b/0x1e0 [btrfs]
[ 709.234931] [] ? btrfs_set_path_blocking+0x36/0x70 [btrfs]
[ 709.234942] [] ? kmem_cache_alloc+0x194/0x1a0
[ 709.234958] [] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[ 709.234977] [] btrfs_drop_extents+0x79/0xa0 [btrfs]
[ 709.235002] [] replay_one_extent+0x414/0x7b0 [btrfs]
[ 709.235007] [] ? autoremove_wake_function+0x40/0x40
[ 709.235030] [] replay_one_buffer+0x4cc/0x7c0 [btrfs]
[ 709.235053] [] ? mark_extent_buffer_accessed+0x4f/0x70 [btrfs]
[ 709.235074] [] walk_down_log_tree+0x1ba/0x3b0 [btrfs]
[ 709.235094] [] walk_log_tree+0xb4/0x1a0 [btrfs]
[ 709.235114] [] btrfs_recover_log_trees+0x20e/0x460 [btrfs]
[ 709.235133] [] ? replay_one_extent+0x7b0/0x7b0 [btrfs]
[ 709.235154] [] open_ctree+0x2640/0x27f0 [btrfs]
[ 709.235171] [] btrfs_mount+0xca4/0xec0 [btrfs]
[ 709.235176] [] ? find_next_zero_bit+0x1e/0x20
[ 709.235180] [] ? pcpu_next_unpop+0x3e/0x50
[ 709.235184] [] ? find_next_bit+0x19/0x20
[ 709.235190] [] mount_fs+0x39/0x160
[ 709.235193] [] ? __alloc_percpu+0x15/0x20
[ 709.235196] [] vfs_kern_mount+0x67/0x110
[ 709.235213] [] btrfs_mount+0x18b/0xec0 [btrfs]
[ 709.235216] [] ? find_next_zero_bit+0x1e/0x20
[ 709.235220] [] mount_fs+0x39/0x160
[ 709.235223] [] ? __alloc_percpu+0x15/0x20
[ 709.235225] [] vfs_kern_mount+0x67/0x110
[ 709.235228] [] do_mount+0x1bb/0xc80
[ 709.235232] [] ? kmem_cache_alloc_trace+0x14b/0x1b0
[ 709.235235] [] SyS_mount+0x83/0xd0
[ 709.235240] [] entry_SYSCALL_64_fastpath+0x1e/0xad
[ 709.235243] ---[ end trace d4e5dcddb432b7d3 ]---
[ 709.354972] BTRFS: error (device sda1) in btrfs_replay_log:2506: errno=-17 Object already exists (Failed to recover log tree)
[ 709.355570] BTRFS error (device sda1): cleaner transaction attach returned -30
[ 709.548919] BTRFS error (device sda1): open_ctree failed

-Matt
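The mount is dying while replaying the fsync log (btrfs_replay_log: errno=-17), and the usual suggestion for a corrupt log tree is to discard it; that loses only the last moments of fsync'd data, not the filesystem as a whole. A sketch, run with the filesystem unmounted (/mnt is a placeholder):

```shell
# Discard the (corrupt) fsync log so open_ctree can proceed, then mount.
btrfs rescue zero-log /dev/sda1
mount /dev/sda1 /mnt
```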
Re: Hard crash on 4.9.5
This same file system (which crashed again with the same errors) is also giving this output during a metadata or data balance:

Jan 27 19:42:47 my_machine kernel: [ 335.018123] BTRFS info (device sda1): no csum found for inode 28472371 start 2191360
Jan 27 19:42:47 my_machine kernel: [ 335.018128] BTRFS info (device sda1): no csum found for inode 28472371 start 2195456
Jan 27 19:42:47 my_machine kernel: [ 335.018491] BTRFS info (device sda1): no csum found for inode 28472371 start 4018176
Jan 27 19:42:47 my_machine kernel: [ 335.018496] BTRFS info (device sda1): no csum found for inode 28472371 start 4022272
Jan 27 19:42:47 my_machine kernel: [ 335.018499] BTRFS info (device sda1): no csum found for inode 28472371 start 4026368
Jan 27 19:42:47 my_machine kernel: [ 335.018502] BTRFS info (device sda1): no csum found for inode 28472371 start 4030464
Jan 27 19:42:47 my_machine kernel: [ 335.019443] BTRFS info (device sda1): no csum found for inode 28472371 start 6156288
Jan 27 19:42:47 my_machine kernel: [ 335.019688] BTRFS info (device sda1): no csum found for inode 28472371 start 7933952
Jan 27 19:42:47 my_machine kernel: [ 335.019693] BTRFS info (device sda1): no csum found for inode 28472371 start 7938048
Jan 27 19:42:47 my_machine kernel: [ 335.019754] BTRFS info (device sda1): no csum found for inode 28472371 start 8077312
Jan 27 19:42:47 my_machine kernel: [ 335.025485] BTRFS warning (device sda1): csum failed ino 28472371 off 2191360 csum 4031061501 expected csum 0
Jan 27 19:42:47 my_machine kernel: [ 335.025490] BTRFS warning (device sda1): csum failed ino 28472371 off 2195456 csum 2371784003 expected csum 0
Jan 27 19:42:47 my_machine kernel: [ 335.025526] BTRFS warning (device sda1): csum failed ino 28472371 off 4018176 csum 3812080098 expected csum 0
Jan 27 19:42:47 my_machine kernel: [ 335.025531] BTRFS warning (device sda1): csum failed ino 28472371 off 4022272 csum 2776681411 expected csum 0
Jan 27 19:42:47 my_machine kernel: [ 335.025534] BTRFS warning (device sda1): csum failed ino 28472371 off 4026368 csum 1179241675 expected csum 0
Jan 27 19:42:47 my_machine kernel: [ 335.025540] BTRFS warning (device sda1): csum failed ino 28472371 off 4030464 csum 1256914217 expected csum 0
Jan 27 19:42:47 my_machine kernel: [ 335.026142] BTRFS warning (device sda1): csum failed ino 28472371 off 7933952 csum 2695958066 expected csum 0
Jan 27 19:42:47 my_machine kernel: [ 335.026147] BTRFS warning (device sda1): csum failed ino 28472371 off 7938048 csum 3260800596 expected csum 0
Jan 27 19:42:47 my_machine kernel: [ 335.026934] BTRFS warning (device sda1): csum failed ino 28472371 off 6156288 csum 4293116449 expected csum 0
Jan 27 19:42:47 my_machine kernel: [ 335.033249] BTRFS warning (device sda1): csum failed ino 28472371 off 8077312 csum 4031878292 expected csum 0

Can these be ignored?

On 01/25/2017 04:06 PM, Liu Bo wrote:
On Mon, Jan 23, 2017 at 03:03:55PM -0500, Matt McKinnon wrote:

> Wondering what to do about this error which says 'reboot needed'. Has happened three times in the past week:

Well, I don't think btrfs's logic here is wrong; the following stack shows that an NFS client has sent a second unlink against the same inode while somehow the inode was not fully deleted by the first unlink. So it'd be good if you could add some debugging information to get us further.

Thanks,
-liubo

Jan 23 14:16:17 my_machine kernel: [ 2568.595648] BTRFS error (device sda1): err add delayed dir index item(index: 23810) into the deletion tree of the delayed node(root id: 257, inode id: 2661433, errno: -17)
Jan 23 14:16:17 my_machine kernel: [ 2568.611010] [ cut here ]
Jan 23 14:16:17 my_machine kernel: [ 2568.615628] kernel BUG at fs/btrfs/delayed-inode.c:1557!
Jan 23 14:16:17 my_machine kernel: [ 2568.620942] invalid opcode: [#1] SMP
Jan 23 14:16:17 my_machine kernel: [ 2568.624960] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd dm_multipath joydev mei_me mei lpc_ich ioatdma wmi ipmi_si ipmi_msghandler btrfs shpchp mac_hid lp parport ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c igb hid_generic i2c_algo_bit raid1 dca usbhid ahci raid0 ptp megaraid_sas multipath
Jan 23 14:16:17 my_machine kernel: [ 2568.697150] hid libahci pps_core linear dm_mirror dm_region_hash dm_log
Jan 23 14:16:17 my_machine kernel: [ 2568.702689] CPU: 0 PID: 2440 Comm: nfsd Tainted: G W 4.9.5-
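To judge whether the csum noise can be ignored, it helps to know which file inode 28472371 belongs to; a sketch, assuming /export is the relevant mount point:

```shell
# Map the inode from the warnings above back to a path (-v also prints
# the containing root). Run against the subvolume's mount point.
btrfs inspect-internal inode-resolve -v 28472371 /export
```

If the resolved file is restorable from backup, deleting and restoring it clears the bad csum items for that inode.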
Hard crash on 4.9.5
Wondering what to do about this error which says 'reboot needed'. Has happened three times in the past week:

Jan 23 14:16:17 my_machine kernel: [ 2568.595648] BTRFS error (device sda1): err add delayed dir index item(index: 23810) into the deletion tree of the delayed node(root id: 257, inode id: 2661433, errno: -17)
Jan 23 14:16:17 my_machine kernel: [ 2568.611010] [ cut here ]
Jan 23 14:16:17 my_machine kernel: [ 2568.615628] kernel BUG at fs/btrfs/delayed-inode.c:1557!
Jan 23 14:16:17 my_machine kernel: [ 2568.620942] invalid opcode: [#1] SMP
Jan 23 14:16:17 my_machine kernel: [ 2568.624960] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd dm_multipath joydev mei_me mei lpc_ich ioatdma wmi ipmi_si ipmi_msghandler btrfs shpchp mac_hid lp parport ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c igb hid_generic i2c_algo_bit raid1 dca usbhid ahci raid0 ptp megaraid_sas multipath
Jan 23 14:16:17 my_machine kernel: [ 2568.697150] hid libahci pps_core linear dm_mirror dm_region_hash dm_log
Jan 23 14:16:17 my_machine kernel: [ 2568.702689] CPU: 0 PID: 2440 Comm: nfsd Tainted: G W 4.9.5-custom #1
Jan 23 14:16:17 my_machine kernel: [ 2568.710166] Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014
Jan 23 14:16:17 my_machine kernel: [ 2568.719207] task: 95a42addab80 task.stack: b9da8533
Jan 23 14:16:17 my_machine kernel: [ 2568.725124] RIP: 0010:[] [] btrfs_delete_delayed_dir_index+0x286/0x290 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.735604] RSP: 0018:b9da85333be0 EFLAGS: 00010286
Jan 23 14:16:17 my_machine kernel: [ 2568.740917] RAX: RBX: 95a3b104b690 RCX:
Jan 23 14:16:17 my_machine kernel: [ 2568.748048] RDX: 0001 RSI: 95a42fc0dcc8 RDI: 95a42fc0dcc8
Jan 23 14:16:17 my_machine kernel: [ 2568.755171] RBP: b9da85333c48 R08: 0491 R09:
Jan 23 14:16:17 my_machine kernel: [ 2568.762297] R10: 0005 R11: 0006 R12: 95a3b104b6d8
Jan 23 14:16:17 my_machine kernel: [ 2568.769429] R13: 5d02 R14: 95a82953d800 R15: ffef
Jan 23 14:16:17 my_machine kernel: [ 2568.776555] FS: () GS:95a42fc0() knlGS:
Jan 23 14:16:17 my_machine kernel: [ 2568.784639] CS: 0010 DS: ES: CR0: 80050033
Jan 23 14:16:17 my_machine kernel: [ 2568.790377] CR2: 7f12ea376000 CR3: 0003e1e07000 CR4: 001406f0
Jan 23 14:16:17 my_machine kernel: [ 2568.797503] Stack:
Jan 23 14:16:17 my_machine kernel: [ 2568.799524] 9b7fe5f2 95a3b104b560 0004 95a3f96b3e80
Jan 23 14:16:17 my_machine kernel: [ 2568.806983] 95a3f96b3e80 39ff95a814eeeb68 6000289c 5d02
Jan 23 14:16:17 my_machine kernel: [ 2568.814436] 95a3f7457c40 95a3bcb74138 95a814eeeb68 00289c39
Jan 23 14:16:17 my_machine kernel: [ 2568.821891] Call Trace:
Jan 23 14:16:17 my_machine kernel: [ 2568.824343] [] ? mutex_lock+0x12/0x2f
Jan 23 14:16:17 my_machine kernel: [ 2568.829671] [] __btrfs_unlink_inode+0x198/0x4c0 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.836555] [] btrfs_unlink_inode+0x1c/0x40 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.843086] [] btrfs_unlink+0x6b/0xb0 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.849091] [] vfs_unlink+0xda/0x190
Jan 23 14:16:17 my_machine kernel: [ 2568.854315] [] ? lookup_one_len+0xd3/0x130
Jan 23 14:16:17 my_machine kernel: [ 2568.860075] [] nfsd_unlink+0x16e/0x210 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.866084] [] nfsd3_proc_remove+0x7c/0x110 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.872529] [] nfsd_dispatch+0xb8/0x1f0 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.878641] [] svc_process_common+0x43f/0x700 [sunrpc]
Jan 23 14:16:17 my_machine kernel: [ 2568.885432] [] svc_process+0xfc/0x1c0 [sunrpc]
Jan 23 14:16:17 my_machine kernel: [ 2568.891528] [] nfsd+0xf0/0x160 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.896838] [] ? nfsd_destroy+0x60/0x60 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.902931] [] kthread+0xca/0xe0
Jan 23 14:16:17 my_machine kernel: [ 2568.907807] [] ? kthread_park+0x60/0x60
Jan 23 14:16:17 my_machine kernel: [ 2568.913296] [] ret_from_fork+0x25/0x30
Jan 23 14:16:17 my_machine kernel: [ 2568.918693] Code: ff ff 48 8b 43 10 49 8b b
kernel crash after upgrading to 4.9
Hi All,

I seem to have a similar issue to a subject from December:

  Subject: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another

In my case, this is caused when rsync'ing large amounts of data over NFS to the server with the BTRFS file system. This was not apparent in the previous kernel (4.7). The poster mentioned some suggestions from Duncan here:

https://mail-archive.com/linux-btrfs@vger.kernel.org/msg60083.html

But those are not visible in the thread. What suggestions were given to help alleviate this pain?

-Matt
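Duncan's advice in threads from that era generally centered on capping the dirty-page backlog so writeback starts before multi-GiB stalls can build up; the values below are illustrative assumptions, not his exact numbers:

```shell
# Sketch: limit dirty memory so writeback can't accumulate huge backlogs
# during a big rsync. Values are illustrative; tune to RAM and disk speed.
sysctl -w vm.dirty_background_bytes=$((256 * 1024 * 1024))  # background writeback from 256 MiB
sysctl -w vm.dirty_bytes=$((1024 * 1024 * 1024))            # hard-throttle writers at 1 GiB
```

To persist across reboots, the same settings would go in a file under /etc/sysctl.d/.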
Re: BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists
-[ cut here ] [ 79.922000] WARNING: CPU: 6 PID: 2632 at fs/btrfs/file.c:546 btrfs_drop_extent_cache+0x3e8/0x400 [btrfs] [ 79.922002] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel btrfs aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd dm_multipath nfsd auth_rpcgss joydev nfs_acl mei_me nfs lpc_ich mei lockd wmi grace ipmi_si sunrpc ipmi_msghandler fscache shpchp ioatdma mac_hid lp parport ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic igb raid6_pq i2c_algo_bit libcrc32c dca usbhid raid1 ahci raid0 ptp megaraid_sas multipath hid libahci pps_core linear dm_mirror dm_region_hash dm_log [ 79.922063] CPU: 6 PID: 2632 Comm: mount Not tainted 4.7.0-custom #1 [ 79.922065] Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014 [ 79.922067] 88046ca1f538 813b816c [ 79.922071] 88046ca1f578 8107a321 02226ca1f5e0 [ 79.922074] 880841d19460 e000 880841e21290 880841e210c0 [ 79.922077] Call Trace: [ 79.922089] [] dump_stack+0x63/0x87 [ 79.922096] [] __warn+0xd1/0xf0 [ 79.922099] [] warn_slowpath_null+0x1d/0x20 [ 79.922117] [] btrfs_drop_extent_cache+0x3e8/0x400 [btrfs] [ 79.922133] [] __btrfs_drop_extents+0x5b2/0xd30 [btrfs] [ 79.922147] [] ? generic_bin_search.constprop.36+0x85/0x190 [btrfs] [ 79.922160] [] ? btrfs_set_path_blocking+0x36/0x70 [btrfs] [ 79.922173] [] ? btrfs_search_slot+0x438/0x970 [btrfs] [ 79.922178] [] ? kmem_cache_alloc+0x1d6/0x1f0 [ 79.922190] [] ? btrfs_alloc_path+0x1a/0x20 [btrfs] [ 79.922205] [] btrfs_drop_extents+0x79/0xa0 [btrfs] [ 79.94] [] replay_one_extent+0x419/0x750 [btrfs] [ 79.922241] [] replay_one_buffer+0x4db/0x7d0 [btrfs] [ 79.922258] [] ? 
mark_extent_buffer_accessed+0x4f/0x70 [btrfs] [ 79.922274] [] walk_down_log_tree+0x1cc/0x3d0 [btrfs] [ 79.922289] [] walk_log_tree+0xba/0x1a0 [btrfs] [ 79.922304] [] btrfs_recover_log_trees+0x213/0x470 [btrfs] [ 79.922318] [] ? replay_one_extent+0x750/0x750 [btrfs] [ 79.922335] [] open_ctree+0x264d/0x2760 [btrfs] [ 79.922348] [] btrfs_mount+0xc94/0xeb0 [btrfs] [ 79.922353] [] ? find_next_zero_bit+0x1e/0x20 [ 79.922358] [] ? pcpu_next_unpop+0x3e/0x50 [ 79.922362] [] ? find_next_bit+0x19/0x20 [ 79.922368] [] mount_fs+0x39/0x160 [ 79.922371] [] ? __alloc_percpu+0x15/0x20 [ 79.922375] [] vfs_kern_mount+0x67/0x110 [ 79.922387] [] btrfs_mount+0x18b/0xeb0 [btrfs] [ 79.922390] [] ? find_next_zero_bit+0x1e/0x20 [ 79.922394] [] mount_fs+0x39/0x160 [ 79.922397] [] ? __alloc_percpu+0x15/0x20 [ 79.922399] [] vfs_kern_mount+0x67/0x110 [ 79.922402] [] do_mount+0x22a/0xd90 [ 79.922406] [] ? __kmalloc_track_caller+0x1af/0x250 [ 79.922408] [] ? strndup_user+0x41/0x80 [ 79.922411] [] ? memdup_user+0x42/0x70 [ 79.922413] [] SyS_mount+0x83/0xd0 [ 79.922418] [] entry_SYSCALL_64_fastpath+0x1e/0xa8 [ 79.922436] ---[ end trace 0db3466cdad31dcf ]--- On 08/09/2016 10:25 PM, Chris Murphy wrote: On Tue, Aug 9, 2016 at 6:29 PM, Matt McKinnon wrote: Spoke too soon. Do I need to continue to run with that mount option in place? It shouldn't be necessary. Something's still wrong for some reason, even with DUP metadata being CoW'd so someone else is going to have to speak up what the problem is. And that btrfs check not only doesn't come up clean but crashes suggests some confluence of things in kernel 4.3 and your hardware conspired to make the file system inconsistent in a way that isn't immediately recovering the usual way. That is, usebackuproots working suggests that there's a bug elsewhere in the storage stack because normally that shouldn't be necessary - something's happened out of order. 1 size 50.93TiB used 22.67TiB path /dev/sda1 What is the exact nature of this block device? 
If getting this back up and running is urgent I suggest inquiring on IRC what the next steps are. In the meantime I'd get a btrfs-image (which is probably going to be quite large given metadata is 60GiB); if that pukes, see whether 'btrfs inspect-internal dump-tree /dev/sda1 > dumptree.log' works — it may also fail, but before it fails it might contain something useful. Obviously btrfs check shouldn't crash, so that's a bug already. What do you get for 'free -m'? It's known that btrfs check needs a lot of memory and pretty much all the metadata needs to be read in, so... if you have an SSD available it might make sense to set up a huge pile of swap on that SSD and rerun btrfs check.
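For the swap-on-SSD suggestion, the sequence is roughly the following — the device name is a placeholder, everything needs root, and note that at the time a swap file could not live on btrfs itself, so a raw SSD partition (or a non-btrfs filesystem on the SSD) is the safe choice:

```shell
# /dev/sdX1 is a hypothetical SSD partition dedicated to swap.
mkswap /dev/sdX1            # format the partition as swap
swapon -p 10 /dev/sdX1      # enable it at higher priority than existing swap
free -m                     # confirm the enlarged swap total
btrfs check /dev/sda1       # re-run the check with the extra headroom
```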
Re: BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists
I performed a quick balance which gave me: [39020.030638] BTRFS info (device sda1): relocating block group 25428383236096 flags 1 [39020.206097] BTRFS warning (device sda1): block group 23113395863552 has wrong amount of free space [39020.206101] BTRFS warning (device sda1): failed to load free space cache for block group 23113395863552, rebuilding it now then a crash dump. Remounted with -o clear_cache,nospace_cache and the balance completed. Running a larger balance now. Will umount and remount with default options to see if that works. -Matt On 08/10/2016 03:09 AM, g6094...@freenet.de wrote: Hi, from what I see you have an unfinished balance ongoing, since you have system and metadata in both DUP and single on disk. So you should (re)run a balance for this data. sash On 10.08.2016 at 02:17, Matt McKinnon wrote: -o usebackuproot worked well. After the file system settled, performing a sync and a clean umount, a normal mount works now as well. Anything I should be doing going forward? Thanks, Matt On 08/09/2016 08:01 PM, Chris Murphy wrote: On Tue, Aug 9, 2016 at 5:15 PM, Matt McKinnon wrote: Hello, Our server recently crashed and was rebooted. When it returned our BTRFS volume is mounting read-only: What happens when you try mounting with -o usebackuproot ? If that fails, what output do you get for 'btrfs check' (without --repair)? If you only get some "errors 400, nbytes wrong" then --repair should fix the problem.
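For anyone searching later, the remount-and-balance sequence described above is, schematically (mountpoint and device taken from the earlier messages in this thread):

```shell
umount /export
mount -o clear_cache,nospace_cache /dev/sda1 /export  # drop the v1 free-space cache and stop using it
btrfs balance start /export                           # full balance; -dusage=N filters can limit the scope
umount /export
mount /dev/sda1 /export                               # back to default mount options
```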
Re: BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists
# btrfs check /dev/sda1 Checking filesystem on /dev/sda1 UUID: 33f9089e-acc7-4a39-8b83-b18bb182faaf checking extents ref mismatch on [958277767168 5894144] extent item 0, found 1 Backref 958277767168 root 257 owner 15799573 offset 750342144 num_refs 0 not found in extent tree Incorrect local backref count on 958277767168 root 257 owner 15799573 offset 750342144 found 1 wanted 0 back 0x15d380f90 backpointer mismatch on [958277767168 5894144] ref mismatch on [958298935296 9666560] extent item 0, found 2 Backref 958298935296 root 257 owner 15799573 offset 559185920 num_refs 0 not found in extent tree Incorrect local backref count on 958298935296 root 257 owner 15799573 offset 559185920 found 2 wanted 0 back 0x15d3809a0 backpointer mismatch on [958298935296 9666560] about 859 of those ... Then: owner ref check failed [25737445867520 16384] checking free space cache There is no free space entry for 109105479680-109105496064 There is no free space entry for 109105479680-109551026176 cache appears valid but isn't 109014155264 There is no free space entry for 139709693952-139709710336 There is no free space entry for 139709693952-140152668160 cache appears valid but isn't 139615797248 Wanted offset 171291525120, found 171291426816 Wanted offset 171291525120, found 171291426816 cache appears valid but isn't 171291181056 Wanted offset 220146597888, found 220146532352 Wanted offset 220146597888, found 220146532352 cache appears valid but isn't 220146434048 btrfs: unable to add free space :-17 free-space-cache.c:824: btrfs_add_free_space: Assertion `ret == -EEXIST` failed. btrfs[0x464af9] btrfs(btrfs_add_free_space+0x154)[0x46531f] btrfs(load_free_space_cache+0xab7)[0x465e36] btrfs(cmd_check+0x22c7)[0x42db0e] btrfs(main+0x155)[0x40a4fd] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7faad34cdf45] btrfs[0x40a0f9] and we crashed out of the check there. 
-Matt On 08/09/2016 08:06 PM, Chris Murphy wrote: On Tue, Aug 9, 2016 at 6:01 PM, Chris Murphy wrote: On Tue, Aug 9, 2016 at 5:15 PM, Matt McKinnon wrote: Hello, Our server recently crashed and was rebooted. When it returned our BTRFS volume is mounting read-only: What happens when you try mounting with -o usebackuproot ? If that fails, what output do you get for 'btrfs check' (without --repair)? If you only get some "errors 400, nbytes wrong" then --repair should fix the problem. This could also be a regression somewhere... https://bugzilla.kernel.org/show_bug.cgi?id=60522
Re: BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists
Spoke too soon. Do I need to continue to run with that mount option in place? [ 83.775984] BTRFS warning (device sda1): block group 25741009879040 has wrong amount of free space [ 83.775989] BTRFS warning (device sda1): failed to load free space cache for block group 25741009879040, rebuilding it now [ 85.231748] BTRFS warning (device sda1): block group 25737721544704 has wrong amount of free space [ 85.231752] BTRFS warning (device sda1): failed to load free space cache for block group 25737721544704, rebuilding it now [ 98.913796] BTRFS info (device sda1): disk space caching is enabled [ 98.913803] BTRFS info (device sda1): has skinny extents [ 179.564408] BTRFS warning (device sda1): block group 78412513280 has wrong amount of free space [ 179.564414] BTRFS warning (device sda1): failed to load free space cache for block group 78412513280, rebuilding it now [ 667.106718] [ cut here ] [ 667.106772] WARNING: CPU: 0 PID: 2726 at fs/btrfs/extent-tree.c:2963 btrfs_run_delayed_refs+0x292/0x2d0 [btrfs] [ 667.106775] BTRFS: Transaction aborted (error -17) [ 667.106777] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel btrfs kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd dm_multipath joydev lpc_ich mei_me mei wmi ipmi_si ipmi_msghandler nfsd auth_rpcgss nfs_acl nfs lockd grace ioatdma sunrpc shpchp mac_hid fscache lp parport ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c hid_generic igb raid1 usbhid i2c_algo_bit ahci raid0 dca multipath ptp hid megaraid_sas libahci linear pps_core dm_mirror dm_region_hash dm_log [ 667.106859] CPU: 0 PID: 2726 Comm: btrfs-transacti Not tainted 4.7.0-custom #1 [ 667.106861] Hardware name: 
Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014 [ 667.106864] 880464e73c08 813b816c 880464e73c58 [ 667.106869] 880464e73c48 8107a321 0b936c3cc170 [ 667.106873] 880443191130 88046c3cc170 88046b43f000 [ 667.106878] Call Trace: [ 667.106889] [] dump_stack+0x63/0x87 [ 667.106896] [] __warn+0xd1/0xf0 [ 667.106901] [] warn_slowpath_fmt+0x4f/0x60 [ 667.106925] [] btrfs_run_delayed_refs+0x292/0x2d0 [btrfs] [ 667.106947] [] btrfs_write_dirty_block_groups+0x178/0x3b0 [btrfs] [ 667.106974] [] commit_cowonly_roots+0x23c/0x2e0 [btrfs] [ 667.106999] [] btrfs_commit_transaction+0x4fb/0xa80 [btrfs] [ 667.107021] [] transaction_kthread+0x1d2/0x200 [btrfs] [ 667.107042] [] ? btrfs_cleanup_transaction+0x580/0x580 [btrfs] [ 667.107047] [] kthread+0xc9/0xe0 [ 667.107053] [] ret_from_fork+0x1f/0x40 [ 667.107056] [] ? kthread_park+0x60/0x60 [ 667.107060] ---[ end trace 336c80ba4db66e78 ]--- [ 667.107065] BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists [ 667.116389] BTRFS info (device sda1): forced readonly [ 667.117081] BTRFS warning (device sda1): Skipping commit of aborted transaction. [ 667.117086] BTRFS: error (device sda1) in cleanup_transaction:1853: errno=-17 Object already exists On 08/09/2016 08:06 PM, Chris Murphy wrote: On Tue, Aug 9, 2016 at 6:01 PM, Chris Murphy wrote: On Tue, Aug 9, 2016 at 5:15 PM, Matt McKinnon wrote: Hello, Our server recently crashed and was rebooted. When it returned our BTRFS volume is mounting read-only: What happens when you try mounting with -o usebackuproot ? If that fails, what output do you get for 'btrfs check' (without --repair)? If you only get some "errors 400, nbytes wrong" then --repair should fix the problem. This could also be a regression somewhere... 
https://bugzilla.kernel.org/show_bug.cgi?id=60522
Re: BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists
-o usebackuproot worked well. After the file system settled, performing a sync and a clean umount, a normal mount works now as well. Anything I should be doing going forward? Thanks, Matt On 08/09/2016 08:01 PM, Chris Murphy wrote: On Tue, Aug 9, 2016 at 5:15 PM, Matt McKinnon wrote: Hello, Our server recently crashed and was rebooted. When it returned our BTRFS volume is mounting read-only: What happens when you try mounting with -o usebackuproot ? If that fails, what output do you get for 'btrfs check' (without --repair)? If you only get some "errors 400, nbytes wrong" then --repair should fix the problem.
BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists
Hello, Our server recently crashed and was rebooted. When it returned our BTRFS volume is mounting read-only: [ 142.395093] BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists [ 142.404418] BTRFS info (device sda1): forced readonly I tried upgrading the kernel from 4.3 to 4.7. Upgraded btrfs-progs to v4.7 as well. # uname -a Linux hostname 4.7.0-custom #1 SMP Tue Aug 9 11:16:28 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux # btrfs --version btrfs-progs v4.7 # btrfs fi show Label: none uuid: 33f9089e-acc7-4a39-8b83-b18bb182faaf Total devices 1 FS bytes used 14.95TiB devid1 size 50.93TiB used 22.67TiB path /dev/sda1 # btrfs fi df /export/ Data, single: total=22.53TiB, used=14.89TiB System, DUP: total=40.00MiB, used=2.39MiB System, single: total=4.00MiB, used=0.00B Metadata, DUP: total=70.50GiB, used=60.21GiB Metadata, single: total=1.51GiB, used=0.00B GlobalReserve, single: total=512.00MiB, used=0.00B # dmesg [ 142.394841] [ cut here ] [ 142.394874] WARNING: CPU: 6 PID: 269 at fs/btrfs/extent-tree.c:2963 btrfs_run_delayed_refs+0x292/0x2d0 [btrfs] [ 142.394876] BTRFS: Transaction aborted (error -17) [ 142.394878] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd dm_multipath joydev lpc_ich mei_me mei ioatdma wmi ipmi_si ipmi_msghandler shpchp mac_hid btrfs lp parport ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor igb raid6_pq libcrc32c i2c_algo_bit raid1 hid_generic dca usbhid raid0 ptp hid ahci megaraid_sas multipath libahci pps_core linear dm_mirror dm_region_hash dm_log [ 142.394942] CPU: 6 PID: 269 
Comm: kworker/u18:5 Not tainted 4.7.0-custom #1 [ 142.394944] Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014 [ 142.394966] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] [ 142.394969] 88086a057ca0 813b816c 88086a057cf0 [ 142.394972] 88086a057ce0 8107a321 0b9325288170 [ 142.394975] 8808519eb000 880825288170 88086b2c1000 0020 [ 142.394978] Call Trace: [ 142.394987] [] dump_stack+0x63/0x87 [ 142.394993] [] __warn+0xd1/0xf0 [ 142.394996] [] warn_slowpath_fmt+0x4f/0x60 [ 142.395012] [] btrfs_run_delayed_refs+0x292/0x2d0 [btrfs] [ 142.395025] [] delayed_ref_async_start+0x94/0xb0 [btrfs] [ 142.395044] [] normal_work_helper+0xc0/0x2d0 [btrfs] [ 142.395050] [] ? pwq_activate_delayed_work+0x42/0xb0 [ 142.395066] [] btrfs_extent_refs_helper+0x12/0x20 [btrfs] [ 142.395070] [] process_one_work+0x153/0x3f0 [ 142.395073] [] worker_thread+0x12b/0x4b0 [ 142.395076] [] ? rescuer_thread+0x340/0x340 [ 142.395079] [] kthread+0xc9/0xe0 [ 142.395085] [] ret_from_fork+0x1f/0x40 [ 142.395088] [] ? kthread_park+0x60/0x60 [ 142.395090] ---[ end trace e2b0b8dc37502011 ]--- [ 142.395093] BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists [ 142.404418] BTRFS info (device sda1): forced readonly -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Data recovery from a linear multi-disk btrfs file system
> On 15 Jul 2016, at 14:10, Austin S. Hemmelgarn wrote: > > On 2016-07-15 05:51, Matt wrote: >> Hello >> >> I glued together 6 disks in linear lvm fashion (no RAID) to obtain one large >> file system (see below). One of the 6 disks failed. What is the best way to >> recover from this? >> > The tool you want is `btrfs restore`. You'll need somewhere to put the files > from this too of course. That said, given that you had data in raid0 mode, > you're not likely to get much other than very small files back out of this, > and given other factors, you're not likely to get what you would consider > reasonable performance out of this either. Thanks so much for pointing me towards btrfs-restore. I surely will give it a try. Note that the FS is not RAID0 but a linear (“JBOD”) configuration, which is why it somehow did not occur to me to try btrfs-restore. The good news is that in this configuration the files are *not* striped across disks: we can read most of the files just fine. The failed disk was actually smaller than the other five, so we should be able to recover more than 5/6 of the data, shouldn’t we? My trouble is that the IO errors due to the missing disk cripple the transfer speed of both rsync and dd_rescue. > Your best bet to get a working filesystem again would be to just recreate it > from scratch, there's not much else that can be done when you've got a raid0 > profile and have lost a disk. This is what I plan to do if btrfs-restore turns out to be too slow and nobody on this list has a better idea. It will, however, require transferring >15TB across the Atlantic (this is where the “backups” reside). That can be tedious, which is why I would love to avoid it. Matt
Data recovery from a linear multi-disk btrfs file system
Hello I glued together 6 disks in linear lvm fashion (no RAID) to obtain one large file system (see below). One of the 6 disks failed. What is the best way to recover from this? Thanks to RAID1 of the metadata I can still access the data residing on the remaining 5 disks after mounting ro,force. What I would like to do now is to 1) Find out the names of all the files with missing data 2) Make the file system fully functional (rw) again. To achieve 2 I wanted to move the data off the disks. This, however, turns out to be rather difficult. - rsync does not provide an immediate time-out option in case of an IO error - Even when I set the time-out for dd_rescue to a minimum, the transfer speed is still way too low to move the data (> 15TB) off the file system. Both methods are too slow to move off the data within a reasonable time frame. Does anybody have a suggestion how to best recover from this? (Our backup is incomplete.) I am looking for either a tool to move off the data — something which gives up immediately in case of an IO error and logs the affected files. Alternatively I am looking for a btrfs command like “btrfs device delete missing” for a non-RAID multi-disk btrfs filesystem. Would some variant of "btrfs balance" do something helpful? Any help is appreciated!
Regards, Matt

# btrfs fi show
Label: none  uuid: d82fff2c-0232-47dd-a257-04c67141fc83
	Total devices 6 FS bytes used 16.83TiB
	devid    1 size 3.64TiB used 3.47TiB path /dev/sdc
	devid    2 size 3.64TiB used 3.47TiB path /dev/sdd
	devid    3 size 3.64TiB used 3.47TiB path /dev/sde
	devid    4 size 3.64TiB used 3.47TiB path /dev/sdf
	devid    5 size 1.82TiB used 1.82TiB path /dev/sdb
	*** Some devices missing

# btrfs fi df /work
Data, RAID0: total=18.31TiB, used=16.80TiB
Data, single: total=8.00MiB, used=8.00MiB
System, RAID1: total=8.00MiB, used=896.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=34.00GiB, used=30.18GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B
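There doesn't seem to be a stock tool that does exactly what Matt asks for, but the behaviour described — skip a file on the first I/O error and log its path — is easy enough to approximate with a small wrapper. This is only a sketch: the `timeout` guard papers over reads that hang rather than fail cleanly, and the source/destination paths are up to you:

```shell
# rescue_copy SRC DST LOG
# Copy every regular file from SRC into DST; on the first read error
# (or a read that hangs longer than 10 seconds) give up on that file
# and append its path to LOG instead of retrying.
rescue_copy() {
    src=$1 dst=$2 log=$3
    ( cd "$src" && find . -type f ) | while IFS= read -r f; do
        mkdir -p "$dst/$(dirname "$f")"
        if ! timeout 10 cp "$src/$f" "$dst/$f" 2>/dev/null; then
            printf '%s\n' "$f" >> "$log"
        fi
    done
}
```

Run as e.g. `rescue_copy /mnt/broken /mnt/rescue failed-files.txt`; the log file then doubles as the answer to question 1, since files whose data sat on the dead disk should fail with EIO on read and get logged.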
Re: Btrfs tragedy: lack of space for metadata leads to loss of fs.
On 2015-08-25 09:44, Miguel Negrão wrote: > Hi list, > > This weekend had my first btrfs horror story. > > system: 3.13.0-49-lowlatency, btrfs-progs v4.1.2 > > A disclaimer: I know 3.13 is very out of date, but the requirement of > keeping the kernel up to date clashes with my requirement of keeping a stable > system. At the moment I can't disturb my system as I'm doing important work; > upgrading the kernel requires upgrading Ubuntu, which will upgrade a lot of > packages and might lead to problems which I don't have time to fix. One > might argue that in the end I lost time anyway dealing with these btrfs > issues. When I'm done with this current work I will update the whole system, > which will update the kernel in the process. Hi- I have no useful advice about filesystem recovery, but would like to point out that newer kernels are backported to Ubuntu LTS versions and can be installed without any significant disruption of the system. The normal kernel backports are named 'linux-generic-lts-<release>', and the low-latency versions are 'linux-lowlatency-lts-<release>', so you could install kernel 3.16 (from 14.10 "utopic") by installing linux-lowlatency-lts-utopic, and kernel 3.19 (from 15.04 "vivid") by installing linux-lowlatency-lts-vivid. Kernel 4.1 will be available as linux-{generic,lowlatency}-lts-wily a bit after 15.10 is released. MMR...
[PATCH RESEND] btrfs: Align EOF length to block in extent_same
It is not currently possible to deduplicate the last block of files whose size is not a multiple of the block size, as the btrfs_extent_same ioctl returns -EINVAL if offset + size is greater than the file size or is not aligned to the fs block size. For example, with the default block size of 16K and two identical 1,000,000 byte files, calling the extent_same ioctl with offset 0 and length set to 1,000,000 the call fails with -EINVAL. The same call with a length of 999,424 will succeed, but the final 576 bytes can then not be shared. This seems to have a larger impact on the amount of space actually freed by the ioctl than would be expected - in my testing the amount of space freed was generally reduced by 50-100% for files sized from a few megabytes downwards which has a significant negative impact on the usefulness of the extent_same ioctl in some circumstances. To resolve this, this patch allows unaligned offset + length values to be passed to btrfs_ioctl_file_extent_same if offset + length is equal to the file size of both src and dest. This is implemented in the same way as in btrfs_ioctl_clone. To return to the earlier example 1,000,000 byte file - this patch would allow a length of 1,000,000 bytes to be passed as it is equal to the file lengths and would be internally extended to the end of the block (1,015,808), allowing one set of extents to be shared completely between the full length of both files. 
Signed-off-by: Matt Robinson
---
 fs/btrfs/ioctl.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index ca5d968..0588076 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2878,14 +2878,16 @@ static int btrfs_cmp_data(struct inode *src, u64 loff, struct inode *dst,
 	return ret;
 }
 
-static int extent_same_check_offsets(struct inode *inode, u64 off, u64 len)
+static int extent_same_check_offsets(struct inode *inode, u64 off, u64 len,
+				     u64 len_aligned)
 {
 	u64 bs = BTRFS_I(inode)->root->fs_info->sb->s_blocksize;
 
 	if (off + len > inode->i_size || off + len < off)
 		return -EINVAL;
+
 	/* Check that we are block aligned - btrfs_clone() requires this */
-	if (!IS_ALIGNED(off, bs) || !IS_ALIGNED(off + len, bs))
+	if (!IS_ALIGNED(off, bs) || !IS_ALIGNED(off + len_aligned, bs))
 		return -EINVAL;
 
 	return 0;
@@ -2895,6 +2897,8 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 len,
 			     struct inode *dst, u64 dst_loff)
 {
 	int ret;
+	u64 len_aligned = len;
+	u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize;
 
 	/*
 	 * btrfs_clone() can't handle extents in the same file
@@ -2909,11 +2913,15 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 len,
 
 	btrfs_double_lock(src, loff, dst, dst_loff, len);
 
-	ret = extent_same_check_offsets(src, loff, len);
+	/* if we extend to both eofs, continue to block boundaries */
+	if (loff + len == src->i_size && dst_loff + len == dst->i_size)
+		len_aligned = ALIGN(src->i_size, bs) - loff;
+
+	ret = extent_same_check_offsets(src, loff, len, len_aligned);
 	if (ret)
 		goto out_unlock;
 
-	ret = extent_same_check_offsets(dst, dst_loff, len);
+	ret = extent_same_check_offsets(dst, dst_loff, len, len_aligned);
 	if (ret)
 		goto out_unlock;
 
@@ -2926,7 +2934,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 len,
 
 	ret = btrfs_cmp_data(src, loff, dst, dst_loff, len);
 	if (ret == 0)
-		ret = btrfs_clone(src, dst, loff, len, len, dst_loff);
+		ret = btrfs_clone(src, dst, loff, len, len_aligned, dst_loff);
 
 out_unlock:
 	btrfs_double_unlock(src, loff, dst, dst_loff, len);
@@ -3172,8 +3180,7 @@ static void clone_update_extent_map(struct inode *inode,
  * @inode: Inode to clone to
  * @off: Offset within source to start clone from
  * @olen: Original length, passed by user, of range to clone
- * @olen_aligned: Block-aligned value of olen, extent_same uses
- *		  identical values here
+ * @olen_aligned: Block-aligned value of olen
  * @destoff: Offset within @inode to start clone
  */
 static int btrfs_clone(struct inode *src, struct inode *inode,
-- 
2.1.4
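To sanity-check the numbers in the commit message: the rounding the patch applies is the kernel's ALIGN() macro (round up to the next multiple of a power-of-two block size), which for the 1,000,000-byte example with 16K blocks does give 1,015,808:

```shell
# Shell reimplementation of the kernel's ALIGN(x, a) macro;
# a must be a power of two, as fs block sizes are.
align() { echo $(( ($1 + $2 - 1) & ~($2 - 1) )); }

align 1000000 16384   # -> 1015808, the block-aligned EOF from the commit message
align 999424 16384    # -> 999424, already aligned (61 * 16384)
```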
Btrfs-cleaner FS DoS issues
Hi! Whenever I delete a large snapshot this stalls all the processes on my system for 30 minutes plus, kernel v19.2. btrfs-cleaner takes 100% CPU while everything is stalled. Every few minutes it drops to, say, 99.8%, the stall abates, and processes/IO make progress. About to lodge a kernel bug about this as it is a serious issue. Could someone look at making the clean-up process more sensitive to when the system is idle? MD RAID is very good at this, and it should be possible to set this up. Best Regards, Matt Grant
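Until the cleaner itself is smarter about idle time, one workaround sometimes tried — hedged: btrfs-cleaner is a kernel thread, and I/O priorities only take effect with the CFQ scheduler, so this may do nothing for you — is to deprioritise it by hand:

```shell
# Needs root; assumes the CFQ I/O scheduler is in use on the disk.
pid=$(pgrep -x btrfs-cleaner | head -n1)
ionice -c3 -p "$pid"     # idle I/O class: only issue I/O when nothing else wants the disk
renice -n 19 -p "$pid"   # and lowest CPU priority
```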
Re: [PATCH 1/1] btrfs: Align EOF length to block in extent_same
Hi All, As David hasn't got back to me I'm guessing that he is too busy with other things at present. If anyone else is able to spare the time to review my patch and give me feedback that would be very much appreciated. Many Thanks, Matt On 3 March 2015 at 00:27, Zygo Blaxell wrote: > > I second this. I've seen the same behavior. > > Clone seems to have evolved a little further than extent-same knows about. > e.g. there is code in the extent-same ioctl that tries to avoid doing > a clone from within one inode to elsewhere in the same inode; however, > the clone ioctl (which extent-same calls) has no such restriction. > > As Matt mentioned, clone_range seems quite happy to accept a partial block > at EOF. cp --reflink would be much harder to use if it did not. > > On Mon, Mar 02, 2015 at 08:59:11PM +, Matt Robinson wrote: > > Hi David, > > > > Have you had a chance to look at this? Am very happy to answer > > further questions, adjust my implementation, provide a different kind > > of test case, etc. > > > > Many Thanks, > > > > Matt
Python pybtrfs df wrapper script to report btrfs metadata, block, space in df compatible output
Hi! We use this at work. Releasing it to the list to prompt design of output that is 100% df-format compatible for automated reporting and graphing. This is so BTRFS space statistics can be reported back to the existing graphing tool chains that many of you would have in place. Quite useful for munin, hint, hint :-) The URL for github is: https://github.com/grantma/pybtrfs.git There is also a shell script there that can be called from cron. Please get back to me if you have any questions. Standard no-warranty disclaimers apply to the code. It's GPLv3 licensed. Best Regards, Matt Grant
Re: [PATCH 1/1] btrfs: Align EOF length to block in extent_same
Hi David, Have you had a chance to look at this? Am very happy to answer further questions, adjust my implementation, provide a different kind of test case, etc. Many Thanks, Matt On 28 January 2015 at 19:46, Matt Robinson wrote: > On 28 January 2015 at 12:55, David Sterba wrote: >> On Mon, Jan 26, 2015 at 06:05:51PM +0000, Matt Robinson wrote: >>> It is not currently possible to deduplicate the last block of files >>> whose size is not a multiple of the block size, as the btrfs_extent_same >>> ioctl returns -EINVAL if offset + size is greater than the file size or >>> is not aligned to the fs block size. >> >> Do you have a reproducer for that? > > I've been using the (quick and dirty) bash script at the end of this > mail which uses btrfs-extent-same from > https://github.com/markfasheh/duperemove/ to call the ioctl. To > summarize: it creates a new filesystem, creates a file with a size > which is not a multiple of the block size, copies it, and then calls > the ioctl to ask firstly for all of the complete blocks (for > comparison) and then the entire files to be deduplicated. > > Running the script under a kernel compiled from > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git gives > a status of -22 from the second btrfs-extent-same call and the final > btrfs filesystem df shows: > Data, single: total=8.00MiB, used=1.91MiB > > However, running under the same kernel plus my patch shows this final > data usage: > Data, single: total=8.00MiB, used=980.00KiB > >> The alignment is required to let btrfs_clone and the extent dropping >> functions to work. [...] > > Which is why it is currently not possible to deduplicate a final > incomplete block of a file: > * Passing len + offset = the actual end of the file: Rejected as it is > not aligned > * Passing len + offset = the end of the block: Rejected as it exceeds > the actual end of the file. 
> > Please let me know if you need any further information, if my > implementation should be different or there is a better way I could > demonstrate the issue? > > Many Thanks, > > Matt > > --- > > #!/bin/bash -e > > if [[ $EUID -ne 0 ]]; then >echo "This script must be run as root" >exit 1 > fi > > loopback=$(losetup -f) > > echo "## Create new btrfs filesystem on a loopback device" > dd if=/dev/zero of=testfs bs=1048576 count=1500 > losetup $loopback testfs > mkfs.btrfs $loopback > mkdir testfsmnt > mount $loopback testfsmnt > > echo -e "\n## Create 100 byte random file" > dd if=/dev/urandom of=testfsmnt/test1 bs=100 count=1 > echo > btrfs filesystem sync testfsmnt > btrfs filesystem df testfsmnt > > echo -e "\n## Copy file" > cp testfsmnt/test1 testfsmnt/test2 > echo > btrfs filesystem sync testfsmnt > btrfs filesystem df testfsmnt > > echo -e "\n## Dedupe to end of last full block" > btrfs-extent-same 999424 testfsmnt/test1 0 testfsmnt/test2 0 > echo > btrfs filesystem sync testfsmnt > btrfs filesystem df testfsmnt > > echo -e "\n## Dedupe to end of file" > btrfs-extent-same 100 testfsmnt/test1 0 testfsmnt/test2 0 > echo > btrfs filesystem sync testfsmnt > btrfs filesystem df testfsmnt > > echo -e "\nClean up" > umount testfsmnt > rmdir testfsmnt > losetup -d $loopback > rm testfs -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Btrfs fixes, changes don't appear on git repo
On Thu, Feb 26, 2015 at 11:04 PM, Chris Mason wrote: > > > On Thu, Feb 26, 2015 at 4:49 PM, Matt wrote: >> >> Hi linux-btrfs list, >> >> Hi Chris, Hi Josef, >> >> >> it seemingly happened in the past and now it seems to happen again: >> >> after patches have been posted to the linux-btrfs mailing list and >> pulled by Linus, >> >> changes occured and additional pull-requests followed - the old >> commits don't appear to be anywhere accessible besides Linus' tree >> >> >> example: >> >> http://marc.info/?l=linux-btrfs&m=142203898505309&w=2 >> >> [GIT PULL] Btrfs fixes >> from January 23rd > > > Sorry for the confusion. What happens is that I send Linus pulls for the > things he's missing, and we have slightly parallel development branches. > > Before 3.19-rc1, I forked 3.18-rc5 and rebased my 3.19 merge window on top > of that. All of my commits for 3.19 went on top of this branch. > > I forked our tree for the 4.0 merge window at 3.19-rc5. This is where all > the 4.0 commits went. But, 3.19 kept rolling and we had additional fixes in > before 3.19-final. > > I use the same branch for every pull to Linus (for-linus), so during > 3.19-rc6 I sent him code on top of for-linus, which at the time was based on > 3.18-rc5 and had all my 3.19 code in it. > > Then the 4.0 merge window started and I switched to my 3.19-rc5 based merge > window tree, which was actually missing the commit you mentioned because > Linus took it after rc5. > > It all works for Linus because git merges things easily, and he actually > prefers that you don't merge in later releases unless you need some fix to > keep things stable. In other words, if my for-linus for the 4.0 merge > window has a merge with 3.19-final, he may push back. Thanks for the swift and elaborate explanation ! 
Yes, that's what confused me now and in the past - I'm sure I'm not the only one ;) > > In general, you can take my for-linus on top of the last released Linus > kernel and have all the current commits that are considered stable. That's the plan :) > > In the future, I'll keep a for-linus-xxyyzz for the last release to make > this less confusing. Wow, that would make things a lot clearer and make getting an overview much faster! Your repo would then probably resemble Paul E. McKenney's ( https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git/ ), but I like it that way - everything accessible and comprehensible. Surely a win-win for the devs & community. Thanks again > > -chris > > > Kind Regards Matt
Btrfs fixes, changes don't appear on git repo
Hi linux-btrfs list, Hi Chris, Hi Josef,

it seemingly happened in the past and now seems to be happening again: after patches were posted to the linux-btrfs mailing list and pulled by Linus, changes occurred and additional pull requests followed - and the old commits don't appear to be accessible anywhere besides Linus' tree.

Example: http://marc.info/?l=linux-btrfs&m=142203898505309&w=2 [GIT PULL] Btrfs fixes from January 23rd

I picked out a specific patch: http://marc.info/?l=linux-btrfs&m=142141473603234&w=2 Btrfs: fix race deleting block group from space_info->ro_bgs list

Since it might lead to lockups - I've seen some lockups in the past few days which involved Btrfs while playing with and adding some bleeding-edge patches to my custom kernel (3.19 based) - and in general to have Btrfs' latest and greatest code, I want to make sure that I have the latest stability-/bugfix-related patches.

When searching at https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/ for "Btrfs: fix race deleting block group from space_info->ro_bgs list", it winds up in Linus' repo with a merge/pull from January 24th: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=Btrfs%3A+fix+race+deleting+block+group+from+space_info-%3Ero_bgs+list

But when searching in the "integration" or current "for-linus" branch: http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/log/?h=integration http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/log/?h=for-linus http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/log/?h=for-linus&qt=grep&q=Btrfs%3A+fix+race+deleting+block+group+from+space_info-%3Ero_bgs+list http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/log/?h=integration&qt=grep&q=Btrfs%3A+fix+race+deleting+block+group+from+space_info-%3Ero_bgs+list there is no such commit - which I can't wrap my head around. The same result with git log --grep "foo ..." 
after having fetched the latest state of Chris' repo. Am I missing something? It would be really nice to have a repo where all of the latest Btrfs patches are stored and accessible - and a clear picture of why this weirdness happens. Sorry if this was already asked in the past; I'm not aware of such a report. Many thanks in advance for your answers. Kind Regards Matt
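A quicker check than grepping each branch's cgit log is `git branch --contains`, which answers "which branches actually have this commit?" directly in a local clone. A toy sketch — the throwaway repo here merely stands in for a fetched btrfs tree:

```shell
# Create a throwaway repo standing in for a fetched btrfs tree, commit the
# patch subject, then ask git which branches contain that commit.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git -c user.email=me@example.com -c user.name=me commit -q --allow-empty \
    -m 'Btrfs: fix race deleting block group from space_info->ro_bgs list'
sha=$(git log --format=%H --grep='fix race deleting block group' | head -n1)
# Against the real tree you would use remote branches, e.g.:
#   git branch -r --contains "$sha"
git branch --contains "$sha"
```

If a commit went to Linus from an older-based for-linus, `--contains` on the 4.0 merge-window branch will simply print nothing for it, which is exactly the situation described above.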
Re: [PATCH 1/1] btrfs: Align EOF length to block in extent_same
On 28 January 2015 at 12:55, David Sterba wrote: > On Mon, Jan 26, 2015 at 06:05:51PM +0000, Matt Robinson wrote: >> It is not currently possible to deduplicate the last block of files >> whose size is not a multiple of the block size, as the btrfs_extent_same >> ioctl returns -EINVAL if offset + size is greater than the file size or >> is not aligned to the fs block size. > > Do you have a reproducer for that? I've been using the (quick and dirty) bash script at the end of this mail which uses btrfs-extent-same from https://github.com/markfasheh/duperemove/ to call the ioctl. To summarize: it creates a new filesystem, creates a file with a size which is not a multiple of the block size, copies it, and then calls the ioctl to ask firstly for all of the complete blocks (for comparison) and then the entire files to be deduplicated. Running the script under a kernel compiled from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git gives a status of -22 from the second btrfs-extent-same call and the final btrfs filesystem df shows: Data, single: total=8.00MiB, used=1.91MiB However, running under the same kernel plus my patch shows this final data usage: Data, single: total=8.00MiB, used=980.00KiB > The alignment is required to let btrfs_clone and the extent dropping > functions to work. [...] Which is why it is currently not possible to deduplicate a final incomplete block of a file: * Passing len + offset = the actual end of the file: Rejected as it is not aligned * Passing len + offset = the end of the block: Rejected as it exceeds the actual end of the file. Please let me know if you need any further information, if my implementation should be different or there is a better way I could demonstrate the issue? 
Many Thanks, Matt --- #!/bin/bash -e if [[ $EUID -ne 0 ]]; then echo "This script must be run as root" exit 1 fi loopback=$(losetup -f) echo "## Create new btrfs filesystem on a loopback device" dd if=/dev/zero of=testfs bs=1048576 count=1500 losetup $loopback testfs mkfs.btrfs $loopback mkdir testfsmnt mount $loopback testfsmnt echo -e "\n## Create 100 byte random file" dd if=/dev/urandom of=testfsmnt/test1 bs=100 count=1 echo btrfs filesystem sync testfsmnt btrfs filesystem df testfsmnt echo -e "\n## Copy file" cp testfsmnt/test1 testfsmnt/test2 echo btrfs filesystem sync testfsmnt btrfs filesystem df testfsmnt echo -e "\n## Dedupe to end of last full block" btrfs-extent-same 999424 testfsmnt/test1 0 testfsmnt/test2 0 echo btrfs filesystem sync testfsmnt btrfs filesystem df testfsmnt echo -e "\n## Dedupe to end of file" btrfs-extent-same 100 testfsmnt/test1 0 testfsmnt/test2 0 echo btrfs filesystem sync testfsmnt btrfs filesystem df testfsmnt echo -e "\nClean up" umount testfsmnt rmdir testfsmnt losetup -d $loopback rm testfs -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/1] btrfs: Align EOF length to block in extent_same
It is not currently possible to deduplicate the last block of files
whose size is not a multiple of the block size, as the btrfs_extent_same
ioctl returns -EINVAL if offset + size is greater than the file size or
is not aligned to the fs block size. This prevents btrfs from freeing
up the last extent in the file, causing gains from deduplication to be
smaller than expected.

To resolve this, allow unaligned offset + length values to be passed to
btrfs_ioctl_file_extent_same if offset + length = file size for both src
and dest. This is implemented in the same way as in btrfs_ioctl_clone.

Signed-off-by: Matt Robinson
---
 fs/btrfs/ioctl.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index d49fe8a..a407d8a 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2871,14 +2871,16 @@ static int btrfs_cmp_data(struct inode *src, u64 loff, struct inode *dst,
 	return ret;
 }
 
-static int extent_same_check_offsets(struct inode *inode, u64 off, u64 len)
+static int extent_same_check_offsets(struct inode *inode, u64 off, u64 len,
+				     u64 len_aligned)
 {
 	u64 bs = BTRFS_I(inode)->root->fs_info->sb->s_blocksize;
 
 	if (off + len > inode->i_size || off + len < off)
 		return -EINVAL;
+
 	/* Check that we are block aligned - btrfs_clone() requires this */
-	if (!IS_ALIGNED(off, bs) || !IS_ALIGNED(off + len, bs))
+	if (!IS_ALIGNED(off, bs) || !IS_ALIGNED(off + len_aligned, bs))
 		return -EINVAL;
 
 	return 0;
@@ -2888,6 +2890,8 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 len,
 			     struct inode *dst, u64 dst_loff)
 {
 	int ret;
+	u64 len_aligned = len;
+	u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize;
 
 	/*
 	 * btrfs_clone() can't handle extents in the same file
@@ -2899,11 +2903,15 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 len,
 
 	btrfs_double_lock(src, loff, dst, dst_loff, len);
 
-	ret = extent_same_check_offsets(src, loff, len);
+	/* if we extend to both eofs, continue to block boundaries */
+	if (loff + len == src->i_size && dst_loff + len == dst->i_size)
+		len_aligned = ALIGN(src->i_size, bs) - loff;
+
+	ret = extent_same_check_offsets(src, loff, len, len_aligned);
 	if (ret)
 		goto out_unlock;
 
-	ret = extent_same_check_offsets(dst, dst_loff, len);
+	ret = extent_same_check_offsets(dst, dst_loff, len, len_aligned);
 	if (ret)
 		goto out_unlock;
 
@@ -2916,7 +2924,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 len,
 
 	ret = btrfs_cmp_data(src, loff, dst, dst_loff, len);
 	if (ret == 0)
-		ret = btrfs_clone(src, dst, loff, len, len, dst_loff);
+		ret = btrfs_clone(src, dst, loff, len, len_aligned, dst_loff);
 
 out_unlock:
 	btrfs_double_unlock(src, loff, dst, dst_loff, len);
@@ -3162,8 +3170,7 @@ static void clone_update_extent_map(struct inode *inode,
  * @inode: Inode to clone to
  * @off: Offset within source to start clone from
  * @olen: Original length, passed by user, of range to clone
- * @olen_aligned: Block-aligned value of olen, extent_same uses
- *		  identical values here
+ * @olen_aligned: Block-aligned value of olen
  * @destoff: Offset within @inode to start clone
 */
 static int btrfs_clone(struct inode *src, struct inode *inode,
--
2.1.0
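The round-up the patch performs with ALIGN(src->i_size, bs) can be sanity-checked in shell. The numbers below are illustrative, not from the patch: a 100-byte file on a 4096-byte-block fs, deduped from offset 0 (the loff + len == i_size case):

```shell
bs=4096     # fs block size (s_blocksize)
isize=100   # file size; we are in the loff + len == i_size case
loff=0      # dedupe offset in the source file
# ALIGN(isize, bs) rounds up to the next block boundary, then the offset
# is subtracted, mirroring: len_aligned = ALIGN(src->i_size, bs) - loff;
len_aligned=$(( ((isize + bs - 1) / bs) * bs - loff ))
echo "$len_aligned"   # prints 4096; loff + len_aligned is now block aligned
```

So the 100-byte tail is checked and cloned as one full block, which is what lets the -EINVAL alignment test pass while the comparison still uses the real len.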
corruption, bad block, input/output errors - do i run --repair?
esys+0xe1/0xe6
[501087.544199] ---[ end trace e2a77238816656f5 ]---
[501087.579519] parent transid verify failed on 20809493159936 wanted 4486137218058286914 found 390978

I have been sending incremental snapshot dumps over to an identical file server as backups. Everything checks out OK there. Do I try to run check with --repair first, and fall back to my backup if that fails? -Matt
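A hedged suggestion rather than a verdict on this particular corruption (the device name below is a placeholder): `btrfs check` without --repair only reads the filesystem, so it is the safe first step before choosing between --repair and restoring from the verified backups:

```shell
# The read-only check must run against the unmounted device.
umount /export                 # or wherever the damaged fs is mounted
btrfs check /dev/sdX           # read-only: reports problems, changes nothing
# Only if the report looks repairable - and with the backup confirmed good -
# escalate to the destructive variant:
# btrfs check --repair /dev/sdX
```

Since the incremental send/receive copies check out on the other server, the fall-back path is already in place either way.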
Re: Blocked tasks on 3.15.1
0a000 8806210f16b0 8806180abfd8 81e11500 [16388.319453] 8806210f16b0 0206 8113e6cc 88081ec135c0 [16388.319455] Call Trace: [16388.319460] [] ? delayacct_end+0x7c/0x90 [16388.319463] [] ? wait_on_page_read+0x60/0x60 [16388.319467] [] ? io_schedule+0x88/0xe0 [16388.319468] [] ? sleep_on_page+0x5/0x10 [16388.319469] [] ? __wait_on_bit_lock+0x3c/0x90 [16388.319471] [] ? __lock_page+0x65/0x70 [16388.319475] [] ? autoremove_wake_function+0x30/0x30 [16388.319477] [] ? __find_lock_page+0x44/0x70 [16388.319478] [] ? find_or_create_page+0x2a/0xa0 [16388.319481] [] ? io_ctl_prepare_pages+0x4f/0x150 [16388.319483] [] ? __load_free_space_cache+0x195/0x5d0 [16388.319485] [] ? load_free_space_cache+0xeb/0x1b0 [16388.319488] [] ? cache_block_group+0x191/0x390 [16388.319489] [] ? prepare_to_wait_event+0xf0/0xf0 [16388.319491] [] ? find_free_extent+0x95a/0xdb0 [16388.319493] [] ? btrfs_reserve_extent+0x69/0x150 [16388.319496] [] ? cow_file_range+0x136/0x420 [16388.319497] [] ? submit_compressed_extents+0x1f3/0x480 [16388.319499] [] ? submit_compressed_extents+0x480/0x480 [16388.319500] [] ? normal_work_helper+0x1ab/0x330 [16388.319503] [] ? process_one_work+0x16d/0x490 [16388.319504] [] ? worker_thread+0x12b/0x410 [16388.319505] [] ? manage_workers.isra.28+0x2c0/0x2c0 [16388.319507] [] ? kthread+0xca/0xe0 [16388.319508] [] ? kthread_create_on_node+0x180/0x180 [16388.319510] [] ? ret_from_fork+0x7c/0xb0 [16388.319511] [] ? kthread_create_on_node+0x180/0x180 [16388.319514] INFO: task btrfs-transacti:12042 blocked for more than 180 seconds. [16388.319515] Tainted: P O 3.14.13_btrfs+_BFS_test27_integration #2 [16388.319515] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [16388.319516] btrfs-transacti D 88081ec13540 0 12042 2 0x0008 [16388.319517] 88009c7adb20 0046 88040d84ca68 [16388.319519] a000 88061f284ba0 88009c7adfd8 81e11500 [16388.319520] 88061f284ba0 88061a21dea8 811b8c2d 8805fc919e00 [16388.319521] Call Trace: [16388.319524] [] ? 
kmem_cache_alloc_trace+0x14d/0x160 [16388.319526] [] ? cache_block_group+0x122/0x390 [16388.319527] [] ? prepare_to_wait_event+0xf0/0xf0 [16388.319529] [] ? find_free_extent+0x95a/0xdb0 [16388.319530] [] ? btrfs_reserve_extent+0x69/0x150 [16388.319532] [] ? __btrfs_prealloc_file_range+0xe8/0x380 [16388.319534] [] ? btrfs_write_dirty_block_groups+0x642/0x6d0 [16388.319535] [] ? commit_cowonly_roots+0x173/0x221 [16388.319537] [] ? btrfs_commit_transaction+0x509/0xa30 [16388.319538] [] ? start_transaction+0x8b/0x5b0 [16388.319539] [] ? transaction_kthread+0x1d5/0x240 [16388.319540] [] ? btrfs_cleanup_transaction+0x560/0x560 [16388.319541] [] ? kthread+0xca/0xe0 [16388.319543] [] ? kthread_create_on_node+0x180/0x180 [16388.319544] [] ? ret_from_fork+0x7c/0xb0 [16388.319545] [] ? kthread_create_on_node+0x180/0x180 but the previous error message I saw seemed related to http://www.spinics.net/lists/linux-btrfs/msg35145.html and http://www.spinics.net/lists/linux-btrfs/msg33628.html Be aware that this kernel is a highly patched up 3.14.13 with latest Btrfs integration/for-linus branch - up to commit abdd2e80a57e5f7278f47913315065f0a3d78d20 Author: Filipe Manana Date: Tue Jun 24 17:46:58 2014 +0100 Btrfs: fix crash when starting transaction except Btrfs: fix broken free space cache after the system crashed (commit e570fd27f2c5d7eac3876bccf99e9838d7f911a3) which doesn't seem to apply cleanly for me. So it's not really representative when looking for other kernel internals but should show almost 100% similar behavior like a 3.15+ kernel with latest integration/for-linus branch. Currently I have no reason and plans to migrate to 3.15 since I'm planning to wait for it to mature a little bit more. Root is on Btrfs with lzo compression on an Intel SSD. Last time this happened I had the partition formatted with zlib/gzip compression. This time it's with lzo and also happening. 
The problem is that rsync can't be killed off - so the load will increase over time, only option being to reboot via Magic SYSRQ Key: ps aux | grep rsync root 12233 0.1 0.0 33880 4776 pts/0D+ 22:20 0:03 rsync -aiP --delete --inplace --stats /home/matt/news/ /bak/matt/news/ root 12234 0.0 0.0 0 0 pts/0Z+ 22:20 0:00 [rsync] root 12579 0.0 0.0 30380 1376 pts/0S+ 23:20 0:00 rsync -ai --delete --inplace --stats /home/matt/.links/ /bak/matt/.links/ root 12580 0.0 0.0 30352 940 pts/0D+ 23:20 0:00 rsync -ai --delete --inplace --stats /home/matt/.links/ /bak/matt/.links/ root 12581 0.0 0.0 30352 280 pts/0S+ 23:20 0:00 rsync -ai --delete --inplace --stats /home/matt/.links/ /bak/matt/.links/ root 12583 0.0 0.0 18916 1000 pts/1S+ 23:21 0:00 grep --color=auto rsync /bak is a newly created partition which a few days ago just
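Before resorting to the Magic SysRq reboot it is worth capturing where the D-state rsyncs are actually stuck. A sketch — needs root, and assumes CONFIG_MAGIC_SYSRQ and a readable /proc/<pid>/stack:

```shell
# 'w' asks the kernel to dump all uninterruptible (D-state) tasks to dmesg.
echo w > /proc/sysrq-trigger
dmesg | tail -n 200            # the blocked-task backtraces land here

# Per-process view: kernel stack of every task currently in D state.
for pid in $(ps -eo pid=,stat= | awk '$2 ~ /^D/ { print $1 }'); do
    echo "== pid $pid =="
    cat "/proc/$pid/stack"
done
```

Those stacks attached to the report make it much easier to match the hang against known free-space-cache lockups like the ones linked above.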
Some impossible benchmark results with LUKS - what am I missing?
>Hey folks, >I have been experimenting with btrfs on a home NAS box, and have some >benchmark \ >results that I don't believe. I'm hoping someone here has some insight on what >I've \ >missed that would cause such a result. >The short version: I'm seeing roughly a 36% (on write) to 55% (on read) >performance \ >*improvement* when using btrfs on LUKS containers over btrfs on raw disks. >This \ >should not be the case! >The test setup: >My test file is a large file that I previously generated from /dev/urandom. I >saw \ >similar results using /dev/zero as input, as well as a repeated copy of the >whole \ >Ubuntu 14.04 ISO (i.e., real-ish data). >My calculated MB/s numbers are based on the 'real' time in the output. >Live-booted Ubuntu 14.04 (nightly from 3/11/14, kernel 3.13.5) >4x 4TB WD Red drives in standard (no RAID) configuration >i3-4130 CPU (has AES-NI for accelerated encryption) >Default BTRFS options from the disk gui, always raid10. >Tested configurations: >Raw: btrfs raid10 on 4x raw drives >Encrypted: btrfs raid10 on 4 separate LUKS containers on 4x raw drives >(default LUKS \ >options) >Read command: $ time sh -c "dd if=test.out of=/dev/null bs=4k" >Raw: >512+0 records in >512+0 records out >2097152 bytes (21 GB) copied, 149.841 s, 140 MB/s >real 2m29.849s >user 0m2.764s >sys 0m7.064s >= 133.467690809 MB/s >Encrypted: >$ time sh -c "dd if=test2.out of=/dev/null bs=4k" >512+0 records in >512+0 records out >2097152 bytes (21 GB) copied, 96.6127 s, 217 MB/s >real 1m36.627s >user 0m3.331s >sys 0m9.518s >= 206.981485506 MB/s >Read+Write: $ time sh -c "dd if=test2.out of=test20grand.out bs=4k && sync" >Raw: >512+0 records in >512+0 records out >2097152 bytes (21 GB) copied, 227.069 s, 92.4 MB/s >real 3m49.701s >user 0m2.854s >sys 0m15.936s >= 87.069712365 MB/s >Encrypted: >512+0 records in >512+0 records out >2097152 bytes (21 GB) copied, 167.823 s, 125 MB/s >real 2m48.784s >user 0m2.955s >sys 0m17.956s >= 118.494644042 MB/s >Any ideas what could explain this 
result? >One coworker suggested that perhaps the LUKS container was returning from >'sync' early, before actually finishing the write to disk. This would seem to violate >the assumptions of the 'sync' primitive, so I have my doubts. >I'm also interested in learning how I can reliably benchmark the real cost of >running full-disk encryption under btrfs on my system. >Thanks! >Evan Powell | Technical Lead >epow...@zenoss.com

Hi Evan, just to be sure: did you do an echo 3 > /proc/sys/vm/drop_caches before *each* test? Also try reversing the order of the tests (Encrypted first, then Raw) to see whether that makes a difference. It would also be interesting to see these fields from the output of cryptsetup luksDump:
* Version:
* Cipher name:
* Cipher mode:
* Hash spec:

Interesting find indeed! Thanks for sharing it. I'm currently using Btrfs on an encrypted system partition (on hardware without AES-NI support) and things already feel - and are - faster than with ext4. We need to find out what this magic is =) Kind Regards Thanks Matt -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
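For reference, the MB/s figures in the quoted mail are simply bytes / 'real' seconds / 2^20. The byte count below is inferred from dd's "21 GB" and the quoted 133.467690809 MB/s figure itself; e.g. the first raw-disk read (real time 2m29.849s):

```shell
# bytes / real-time seconds / 2^20 reproduces the quoted MB/s figure;
# 20971520000 bytes (dd's "21 GB") over real time 2m29.849s = 149.849 s.
bytes=20971520000
secs=149.849
awk -v b="$bytes" -v s="$secs" \
    'BEGIN { printf "%.3f MB/s\n", b / s / 1048576 }'
# prints 133.468 MB/s (the mail's 133.467690809 MB/s, rounded)
```

Using the 'real' time rather than dd's own throughput line also folds in the final sync, which is exactly what you want when comparing raw against LUKS.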
Re: One random read streaming is fast (~1200MB/s), but two or more are slower (~750MB/s)?
Hey Josef, Were you able to try this multi-thread test on any more drives? I did a test with 12, 6, 3, and 1 drive. And, it looks like I see the multi-thread speed reduces, as the number of drives in the raid goes up. Like this: - 50% speed reduction with 2 threads on 12 drives - 25% speed reduction with 2 threads on 6 drives - 10% speed reduction with 2 threads on 3 drives - 5% speed reduction with 2 threads on 1 drive I only have 12 slots on my HBA card, but I wonder if 24 drives would reduce the speed to 25% with 2 threads? Matt make btrfs fs... ___ 12 drives... mkfs.btrfs -f -d raid6 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl 6 drives... mkfs.btrfs -f -d raid6 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf 3 drives... mkfs.btrfs -f -d raid5 /dev/sda /dev/sdb /dev/sdc 1 drive... mkfs.btrfs -f /dev/sda mount /dev/sda /tmp/btrfs_test/ ___ make zero files... ___ kura1 ~ # for j in {1..2} ; do dd if=/dev/zero of=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M count=1 conv=fdatasync & done ___ === btrfs raid6 on 12 drives with 2 threads = ~650MB/s ___ kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done vm.drop_caches = 1 1048576 bytes (10 GB) copied, 31.0431 s, 338 MB/s 1048576 bytes (10 GB) copied, 31.2235 s, 336 MB/s kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done 1048576 bytes (10 GB) copied, 29.869 s, 351 MB/s 1048576 bytes (10 GB) copied, 30.5561 s, 343 MB/s ___ btrfs raid6 on 12 drives with 1 thread = ~1100MB/s ___ kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done 1048576 bytes (10 GB) copied, 9.69881 s, 1.1 GB/s kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd of=/dev/null 
if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done 1048576 bytes (10 GB) copied, 9.56475 s, 1.1 GB/s ___ == btrfs raid6 on 6 drives with 2 thread = ~500MB/s ___ kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done 1048576 bytes (10 GB) copied, 41.3899 s, 253 MB/s 1048576 bytes (10 GB) copied, 41.6916 s, 252 MB/s kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done 1048576 bytes (10 GB) copied, 40.3178 s, 260 MB/s 1048576 bytes (10 GB) copied, 41.4087 s, 253 MB/s ___ btrfs raid6 on 6 drives with 1 thread = ~600MB/s ___ kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done 1048576 bytes (10 GB) copied, 17.5686 s, 597 MB/s kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done 1048576 bytes (10 GB) copied, 17.5396 s, 598 MB/s ___ == btrfs raid5 on 3 drives with 2 thread = ~300MB/s ___ kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done 1048576 bytes (10 GB) copied, 67.636 s, 155 MB/s 1048576 bytes (10 GB) copied, 70.1783 s, 149 MB/s kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done 1048576 bytes (10 GB) copied, 69.4945 s, 151 MB/s 1048576 bytes (10 GB) copied, 70.8279 s, 148 MB/s ___ btrfs raid5 on 3 drives with 1 thread = ~319MB/s ___ kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done 1048576 bytes (10 GB) copied, 32.8559 s, 319 MB/s kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd of=/dev/null 
if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done 1048576 bytes (10 GB) copied, 32.8483 s, 319 MB/s ___ == btrfs (no raid) on 1 drive with 2 thread = ~155MB/s ___ kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done 1048576 bytes (10 GB) copied, 134.982 s, 77.7 MB/s 1048576 bytes (10 GB) copied, 135.237 s, 77.5 MB/s kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done 1048576 bytes (10 GB) copied, 134.549 s, 77.9 MB/s 1048576 bytes (10 GB) copied, 135.293 s, 77.5 MB/s ___ btrfs (no raid) on 1 drive with 1 thread = ~162MB/s ___ kura1 btrfs_test # sysctl vm.drop_caches
Re: One random read streaming is fast (~1200MB/s), but two or more are slower (~750MB/s)?
Hey Josef, Thanks for looking into this further! Those are about the same results that I was seeing, though I didn't test with just one drive - only with all 12 drives in my jbod. I will do a test with just one disk and see if I get the same results. Let me know if you also see the same results with multiple drives in your raid... Thanks, Matt On Thu, Apr 25, 2013 at 2:10 PM, Josef Bacik wrote: > On Thu, Apr 25, 2013 at 03:01:18PM -0600, Matt Pursley wrote: >> Ok, awesome, let me know how it goes.. I don't have the raid >> formatted to btrfs right now, but I could probably do that in about 30 >> minutes or so. >> > > Huh so I'm getting the full bandwidth, 120 mb/s with one thread and 60 mb/s with > two threads. These are just cheap sata drives tho, I'll try and dig up a box > with 3 fusion cards for something a little closer to the speeds you are seeing > and see if that makes a difference. Thanks, > > Josef
Re: One random read streaming is fast (~1200MB/s), but two or more are slower (~750MB/s)?
Ok, awesome, let me know how it goes.. I don't have the raid formatted to btrfs right now, but I could probably do that in about 30 minutes or so. Thanks Josef, Matt On Thu, Apr 25, 2013 at 1:39 PM, Josef Bacik wrote: > On Thu, Apr 25, 2013 at 01:52:44PM -0600, Matt Pursley wrote: >> Hey Josef, >> >> Were you able to look into this any further? >> It's still pretty reproducible on my machine... >> > > Nope I've been tracking down random problems, I'll try it now. Thanks, > > Josef
Re: One random read streaming is fast (~1200MB/s), but two or more are slower (~750MB/s)?
Hey Josef, Were you able to look into this any further? It's still pretty reproducible on my machine... Thanks, Matt On Thu, Apr 18, 2013 at 2:58 PM, Josef Bacik wrote: > This is strange, and I can't see any reason why this would happen. I'll try and > reproduce next week when I'm back from LSF. Thanks, > > Josef
Re: One random read streaming is fast (~1200MB/s), but two or more are slower (~750MB/s)?
Hey All, Here are the results of making and reading back a 13GB file on "mdraid6 + ext4", "mdraid6 + btrfs", and "btrfsraid6 + btrfs". Seems to show that: 1) "mdraid6 + ext4" can do ~1100 MB/s for these sequential reads with either one or two files at once. 2) "btrfsraid6 + btrfs" can do ~1100 MB/s for sequential reads with one file at a time, but only ~750 MB/s with two (or more). 3) "mdraid6 + btrfs" can only do ~750 MB/s for these sequential reads with either one or two files at once. So, it seems like the speed drop is related more to the btrfs file system than to the experimental raid. Although it is interesting that btrfs can only do the full ~1100 MB/s with a single file on the btrfsraid6, but not mdraid6. Anyway, just some more info and reproducible results. I have also opened a ticket in bugzilla.kernel.org for this issue here... https://bugzilla.kernel.org/show_bug.cgi?id=56771 Thanks, Matt ___ mdraid6 + ext4 ___ kura1 / # mount | grep -i /var/data /dev/md0 on /var/data type ext4 (rw) kura1 / # cat /proc/mdstat Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] [linear] [multipath] md0 : active raid6 sdm[11] sdl[10] sdk[9] sdj[8] sdi[7] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1] sdb[0] 29302650880 blocks super 1.2 level 6, 512k chunk, algorithm 2 [12/12] [] [>] resync = 0.0% (2731520/2930265088) finish=47268.1min speed=1031K/sec unused devices: ## Create two 13GB testfiles... kura1 / # sysctl vm.drop_caches=1 ; dd if=/dev/zero of=/var/data/persist/testfile1 bs=640k count=2 conv=fdatasync vm.drop_caches = 1 2+0 records in 2+0 records out 1310720 bytes (13 GB) copied, 47.27 s, 277 MB/s kura1 / # sysctl vm.drop_caches=1 ; dd if=/dev/zero of=/var/data/persist/testfile2 bs=640k count=2 conv=fdatasync vm.drop_caches = 1 2+0 records in 2+0 records out 1310720 bytes (13 GB) copied, 47.0237 s, 279 MB/s ## Read back one testfile... 
~1300 MB/s kura1 / # sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k vm.drop_caches = 1 2+0 records in 2+0 records out 1310720 bytes (13 GB) copied, 10.3469 s, 1.3 GB/s kura1 / # sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k vm.drop_caches = 1 2+0 records in 2+0 records out 1310720 bytes (13 GB) copied, 10.0073 s, 1.3 GB/s kura1 / # sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k vm.drop_caches = 1 2+0 records in 2+0 records out 1310720 bytes (13 GB) copied, 10.69 s, 1.2 GB/s ## Read back the two testfiles at the same time.. ~1100MB/s kura1 / # (sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k) & (sysctl vm.drop_caches=1 ; dd of=//dev/null if=/var/data/persist/testfile2 bs=640k) & wait vm.drop_caches = 1 vm.drop_caches = 1 2+0 records in 2+0 records out 1310720 bytes (13 GB) copied, 24.4988 s, 535 MB/s 2+0 records in 2+0 records out 1310720 bytes (13 GB) copied, 24.591 s, 533 MB/s kura1 / # (sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k) & (sysctl vm.drop_caches=1 ; dd of=//dev/null if=/var/data/persist/testfile2 bs=640k) & wait vm.drop_caches = 1 vm.drop_caches = 1 2+0 records in 2+0 records out 1310720 bytes (13 GB) copied, 24.7013 s, 531 MB/s 2+0 records in 2+0 records out 1310720 bytes (13 GB) copied, 24.7016 s, 531 MB/s kura1 / # (sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k) & (sysctl vm.drop_caches=1 ; dd of=//dev/null if=/var/data/persist/testfile2 bs=640k) & wait vm.drop_caches = 1 vm.drop_caches = 1 2+0 records in 2+0 records out 1310720 bytes (13 GB) copied, 24.5512 s, 534 MB/s 2+0 records in 2+0 records out 1310720 bytes (13 GB) copied, 24.8276 s, 528 MB/s ___ mdraid6 + btrfs ___ kura1 ~ # mount | grep -i /var/data /dev/md0 on /var/data type btrfs (rw,noatime) kura1 ~ # cat /proc/mdstat Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
[linear] [multipath] md0 : active raid6 sdm[11] sdl[10] sdk[9] sdj[8] sdi[7] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1] sdb[0] 29302650880 blocks super 1.2 level 6, 512k chunk, algorithm 2 [12/12] [] [>] resync = 0.0% (1917184/2930265088) finish=44415.7min speed=1098K/sec unused devices: kura1 ~ # btrfs filesystem show failed to open /dev/sr0: No medium found Label: none uuid: 5eb756b5-03a1-4d06-8e91-0f683a763a88 Total devices 1 FS bytes used 448.00KB devid1 size 27.29TB used 2.04GB path /dev/md0 Label: none uuid: 4546715c-8948-42b3-b529-a1c9cd175c2e Total devices 12 FS bytes used 80.74GB devid 12 size 2.73TB used 9.35GB path /dev/sdm devid 11 size 2.73TB used 9.35GB path /dev/sdl devid 10 si
Re: One random read streaming is fast (~1200MB/s), but two or more are slower (~750MB/s)?
On Tue, Apr 16, 2013 at 11:55 PM, Sander wrote: > Matt Pursley wrote (ao): >> I have an LSI HBA card (LSI SAS 9207-8i) with 12 7200rpm SAS drives >> attached. When it's formated with mdraid6+ext4 I get about 1200MB/s >> for multiple streaming random reads with iozone. With btrfs in >> 3.9.0-rc4 I can also get about 1200MB/s, but only with one stream at a >> time. > > Just curious, is that btrfs on top of mdraid6, or is this experimental > btrfs raid6 without md? This is the "experimental btrfs raid6 without md". But, I did do a "mdraid6 with btrfs" test last night... and with that setup I only get the ~750MB/s result.. even with just one thread/stream... I will flip the system back to "btrfsraid6+btrfs" today to verify that I still get the full 1200MB/s with one stream/thread and ~750MB/s with two or more streams/threads with that setup... Thanks, Matt ___ mdraid6+btrfs_64GBRam_80files # sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile bs=640k vm.drop_caches = 1 2+0 records in 2+0 records out 1310720 bytes (13 GB) copied, 18.2109 s, 720 MB/s ___ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
One random read streaming is fast (~1200MB/s), but two or more are slower (~750MB/s)?
Hi All, I have an LSI HBA card (LSI SAS 9207-8i) with 12 7200rpm SAS drives attached. When it's formatted with mdraid6+ext4 I get about 1200MB/s for multiple streaming random reads with iozone. With btrfs in 3.9.0-rc4 I can also get about 1200MB/s, but only with one stream at a time. As soon as I add a second (or more), the speed will drop to about 750MB/s. If I add more streams (10, 20, etc), the total throughput stays at around 750MB/s. I only see the full 1200MB/s in btrfs when I'm running a single read at a time (e.g. sequential reads with dd, random reads with iozone, etc). This feels like a bug or misconfiguration on my system, as if it can read at full speed, but only with one stream running at a time. The options I have tried varying are "-l 64k" with mkfs.btrfs, and "-o thread_pool=16" when mounting. But neither of those options seems to change the behaviour. Anyone know any reasons why I would see the speed drop when going from one to more than one stream at a time with btrfs raid6? We would like to use btrfs (mostly for snapshots), but we do need to get the full 1200MB/s streaming speeds too.. Thanks, Matt ___ Here's some example output.. Single thread = ~1.1GB/s _ kura1 persist # sysctl vm.drop_caches=1 ; dd if=/dev/zero of=/var/data/persist/testfile bs=640k count=2 vm.drop_caches = 1 2+0 records in 2+0 records out 1310720 bytes (13 GB) copied, 7.14139 s, 1.8 GB/s kura1 persist # sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile bs=640k vm.drop_caches = 1 2+0 records in 2+0 records out 1310720 bytes (13 GB) copied, 11.2666 s, 1.2 GB/s kura1 persist # sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile bs=640k vm.drop_caches = 1 2+0 records in 2+0 records out 1310720 bytes (13 GB) copied, 11.5005 s, 1.1 GB/s 1 thread = ~1000MB/s ... 
___ kura1 scripts # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd of=/dev/null if=/var/data/persist/testfile_$j bs=640k ; done vm.drop_caches = 1 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 6.52018 s, 1.0 GB/s kura1 scripts # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd of=/dev/null if=/var/data/persist/testfile_$j bs=640k ; done vm.drop_caches = 1 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 6.55731 s, 999 MB/s ___ 2 threads = ~750MB/s combined... ___ # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd of=/dev/null if=/var/data/persist/testfile_$j bs=640k & done vm.drop_caches = 1 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 17.5068 s, 374 MB/s 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 17.7599 s, 369 MB/s ___ 20 threads = ~750MB/s combined... ___ # sysctl vm.drop_caches=1 ; for j in {1..20} ; do dd of=/dev/null if=/var/data/persist/testfile_$j bs=640k & done vm.drop_caches = 1 kura1 scripts # 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 168.223 s, 39.0 MB/s 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 168.275 s, 38.9 MB/s 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 169.466 s, 38.7 MB/s 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 169.606 s, 38.6 MB/s 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 170.503 s, 38.4 MB/s 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 170.629 s, 38.4 MB/s 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 170.633 s, 38.4 MB/s 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 170.744 s, 38.4 MB/s 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 170.844 s, 38.4 MB/s 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 170.896 s, 38.3 MB/s 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 171.027 s, 38.3 MB/s 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 171.135 s, 38.3 MB/s 1+0 records in 1+0 records out 
655360 bytes (6.6 GB) copied, 171.389 s, 38.2 MB/s 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 171.414 s, 38.2 MB/s 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 171.674 s, 38.2 MB/s 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 171.897 s, 38.1 MB/s 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 171.956 s, 38.1 MB/s 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 171.995 s, 38.1 MB/s 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 172.044 s, 38.1 MB/s 1+0 records in 1+0 records out 655360 bytes (6.6 GB) copied, 172.08 s, 38.1 MB/s ### Similar results with random reads in iozone... 1 thread = ~1000MB/s _ kura1 scripts # for j in {1..1} ; do sy
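The test pattern above (drop the page cache, then read N files with dd in parallel) can be wrapped in a small script. This is only a sketch of the procedure from the thread: `DIR`, `NFILES`, and `SIZE_MB` are placeholders, and the defaults are kept tiny here (the original runs used /var/data/persist and ~13 GB files).

```shell
#!/bin/sh
# Sketch of the concurrent-read benchmark used in this thread.
# DIR, NFILES, SIZE_MB are placeholders -- point DIR at the filesystem under test.
DIR=${DIR:-$(mktemp -d)}
NFILES=${NFILES:-2}
SIZE_MB=${SIZE_MB:-8}

# Create the test files (conv=fdatasync so the data actually reaches the disk).
j=1
while [ "$j" -le "$NFILES" ]; do
    dd if=/dev/zero of="$DIR/testfile_$j" bs=1M count="$SIZE_MB" conv=fdatasync 2>/dev/null
    j=$((j + 1))
done

# Drop caches so reads come from disk (needs root; harmless no-op otherwise),
# then read all files in parallel and time the aggregate.
sysctl vm.drop_caches=1 >/dev/null 2>&1 || true
start=$(date +%s)
j=1
while [ "$j" -le "$NFILES" ]; do
    dd of=/dev/null if="$DIR/testfile_$j" bs=1M 2>/dev/null &
    j=$((j + 1))
done
wait
end=$(date +%s)
elapsed=$((end - start))
[ "$elapsed" -gt 0 ] || elapsed=1
echo "aggregate: $((NFILES * SIZE_MB / elapsed)) MB/s over ${elapsed}s"
```

Running it with NFILES=1 and then NFILES=2 against the same mount point reproduces the single-stream vs. multi-stream comparison above.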
Btrfs and more compression algorithms
Hi Chris, Hi Josef, Hi Btrfs-List and all other Btrfs-devs that I've forgotten, is there a chance we'll see xz file-compression support in Btrfs anytime soon? I'm sure folks have been waiting for additional compression support besides gzip and lzo (bzip2 seems out of the question due to its slowness; there's pbzip2, but that's not included in the kernel). This would be a really nice bonus now that processors are getting faster and SSD usage is more and more widespread - add an efficient implementation and we would have a fast, extremely efficient and feature-rich filesystem. My current situation is that several of my harddrives are almost completely full - even with forced gzip-compression - so I thought I'd ask whether there was any chance of this in the near future. There's fusecompress, but that probably wouldn't end up being as stable as a btrfs with xz/lzma-support. Thanks for your consideration and your work on Btrfs! It got significantly more stable compared to the past :) (I use it mainly for some small backup hdds; one troublesome usage, however, is still suspending-to-ram/to-disk regularly: after that, the partition [I have a dedicated partition for the portage-tarball of Gentoo Linux] seems to take some damage, where the filesystem can't be written to anymore via rsync (or other programs). The bash session hangs (and nothing gets written to the partition). Running scrub revealed no issues. I haven't had a chance to test it yet with the new btrfs-progs - I haven't suspended in the meantime.) Kind Regards Matt
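As a rough userspace illustration of the ratio trade-off the post is asking about, the `gzip` and `xz` command-line tools can stand in for the kernel-side algorithms. This only sketches the compression-ratio difference; it says nothing about how btrfs itself would hook up an xz compressor, and the sample data and level 6 settings are arbitrary choices.

```shell
# Compare gzip vs xz output size on ~1 MB of moderately compressible text.
command -v xz >/dev/null 2>&1 || { echo "xz not installed; skipping"; exit 0; }
sample=$(mktemp)
yes "btrfs compression sample line with some repetition" | head -n 20000 > "$sample"
orig_b=$(wc -c < "$sample")
gzip_b=$(gzip -c -6 "$sample" | wc -c)
xz_b=$(xz -c -6 "$sample" | wc -c)
# xz typically compresses tighter than gzip, at a higher CPU cost.
echo "original: $orig_b bytes, gzip -6: $gzip_b bytes, xz -6: $xz_b bytes"
rm -f "$sample"
```

On real, less repetitive data the gap between gzip and xz is smaller, but the direction of the trade-off (better ratio, more CPU) is the same.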
Re: [PATCH V2 0/3] drivers/staging: zcache: dynamic page cache/swap compression
On Wed, Feb 16, 2011 at 1:27 AM, Dan Magenheimer wrote: >> -Original Message- >> From: Matt [mailto:jackdac...@gmail.com] >> Sent: Tuesday, February 15, 2011 5:12 PM >> To: Minchan Kim >> Cc: Dan Magenheimer; gre...@suse.de; Chris Mason; linux- >> ker...@vger.kernel.org; linux...@kvack.org; ngu...@vflare.org; linux- >> bt...@vger.kernel.org; Josef Bacik; Dan Rosenberg; Yan Zheng; >> mi...@cn.fujitsu.com; Li Zefan >> Subject: Re: [PATCH V2 0/3] drivers/staging: zcache: dynamic page >> cache/swap compression >> >> On Mon, Feb 14, 2011 at 4:35 AM, Minchan Kim >> wrote: >> > On Mon, Feb 14, 2011 at 10:29 AM, Matt wrote: >> >> On Mon, Feb 14, 2011 at 1:24 AM, Matt wrote: >> >>> On Mon, Feb 14, 2011 at 12:08 AM, Matt >> wrote: >> >>>> On Wed, Feb 9, 2011 at 1:03 AM, Dan Magenheimer >> >>>> wrote: >> >>>> [snip] >> >>>>> >> >>>>> If I've missed anything important, please let me know! >> >>>>> >> >>>>> Thanks again! >> >>>>> Dan >> >>>>> >> >>>> >> >>>> Hi Dan, >> >>>> >> >>>> thank you so much for answering my email in such detail ! 
>> >>>> >> >>>> I shall pick up on that mail in my next email sending to the >> mailing list :) >> >>>> >> >>>> >> >>>> currently I've got a problem with btrfs which seems to get >> triggered >> >>>> by cleancache get-operations: >> >>>> >> >>>> >> >>>> Feb 14 00:37:19 lupus kernel: [ 2831.297377] device fsid >> >>>> 354120c992a00761-5fa07d400126a895 devid 1 transid 7 >> >>>> /dev/mapper/portage >> >>>> Feb 14 00:37:19 lupus kernel: [ 2831.297698] btrfs: enabling disk >> space caching >> >>>> Feb 14 00:37:19 lupus kernel: [ 2831.297700] btrfs: force lzo >> compression >> >>>> Feb 14 00:37:19 lupus kernel: [ 2831.315844] zcache: created >> ephemeral >> >>>> tmem pool, id=3 >> >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853188] BUG: unable to handle >> >>>> kernel paging request at 01400050 >> >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853219] IP: >> [] >> >>>> btrfs_encode_fh+0x2b/0x120 >> >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853242] PGD 0 >> >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853251] Oops: [#1] >> PREEMPT SMP >> >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853275] last sysfs file: >> >>>> /sys/devices/platform/coretemp.3/temp1_input >> >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853295] CPU 4 >> >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853303] Modules linked in: >> radeon >> >>>> ttm drm_kms_helper cfbcopyarea cfbimgblt cfbfillrect ipt_REJECT >> >>>> ipt_LOG xt_limit xt_tcpudp xt_state nf_nat_irc nf_conntrack_irc >> >>>> nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 >> nf_conntrack_ftp >> >>>> iptable_filter ipt_addrtype xt_DSCP xt_dscp xt_iprange ip_tables >> >>>> ip6table_filter xt_NFQUEUE xt_owner xt_hashlimit xt_conntrack >> xt_mark >> >>>> xt_multiport xt_connmark nf_conntrack xt_string ip6_tables >> x_tables >> >>>> it87 hwmon_vid coretemp snd_seq_dummy snd_seq_oss >> snd_seq_midi_event >> >>>> snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss >> snd_hda_codec_hdmi >> >>>> snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep >> snd_pcm >> >>>> 
snd_timer snd soundcore i2c_i801 wmi e1000e shpchp snd_page_alloc >> >>>> libphy e1000 scsi_wait_scan sl811_hcd ohci_hcd ssb usb_storage >> >>>> ehci_hcd [last unloaded: tg3] >> >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853682] >> >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853690] Pid: 11394, comm: >> >>>> btrfs-transacti Not tainted 2.6.37-plus_v16_zcache #4 FMP55/ipower >> >>>> G3710 >> >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853725] RIP: >> >>>> 0010:[] [] >> >>>> btrfs_encode_fh+0x2b/0x120 >> >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853751] RSP: >> >>>> 0018:880129a11b00 EFLAGS: 00010246 >> >>
Re: [PATCH V2 0/3] drivers/staging: zcache: dynamic page cache/swap compression
On Mon, Feb 14, 2011 at 4:35 AM, Minchan Kim wrote: > On Mon, Feb 14, 2011 at 10:29 AM, Matt wrote: >> On Mon, Feb 14, 2011 at 1:24 AM, Matt wrote: >>> On Mon, Feb 14, 2011 at 12:08 AM, Matt wrote: >>>> On Wed, Feb 9, 2011 at 1:03 AM, Dan Magenheimer >>>> wrote: >>>> [snip] >>>>> >>>>> If I've missed anything important, please let me know! >>>>> >>>>> Thanks again! >>>>> Dan >>>>> >>>> >>>> Hi Dan, >>>> >>>> thank you so much for answering my email in such detail ! >>>> >>>> I shall pick up on that mail in my next email sending to the mailing list >>>> :) >>>> >>>> >>>> currently I've got a problem with btrfs which seems to get triggered >>>> by cleancache get-operations: >>>> >>>> >>>> Feb 14 00:37:19 lupus kernel: [ 2831.297377] device fsid >>>> 354120c992a00761-5fa07d400126a895 devid 1 transid 7 >>>> /dev/mapper/portage >>>> Feb 14 00:37:19 lupus kernel: [ 2831.297698] btrfs: enabling disk space >>>> caching >>>> Feb 14 00:37:19 lupus kernel: [ 2831.297700] btrfs: force lzo compression >>>> Feb 14 00:37:19 lupus kernel: [ 2831.315844] zcache: created ephemeral >>>> tmem pool, id=3 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853188] BUG: unable to handle >>>> kernel paging request at 01400050 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853219] IP: [] >>>> btrfs_encode_fh+0x2b/0x120 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853242] PGD 0 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853251] Oops: [#1] PREEMPT SMP >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853275] last sysfs file: >>>> /sys/devices/platform/coretemp.3/temp1_input >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853295] CPU 4 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853303] Modules linked in: radeon >>>> ttm drm_kms_helper cfbcopyarea cfbimgblt cfbfillrect ipt_REJECT >>>> ipt_LOG xt_limit xt_tcpudp xt_state nf_nat_irc nf_conntrack_irc >>>> nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp >>>> iptable_filter ipt_addrtype xt_DSCP xt_dscp xt_iprange ip_tables >>>> ip6table_filter xt_NFQUEUE 
xt_owner xt_hashlimit xt_conntrack xt_mark >>>> xt_multiport xt_connmark nf_conntrack xt_string ip6_tables x_tables >>>> it87 hwmon_vid coretemp snd_seq_dummy snd_seq_oss snd_seq_midi_event >>>> snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_hda_codec_hdmi >>>> snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm >>>> snd_timer snd soundcore i2c_i801 wmi e1000e shpchp snd_page_alloc >>>> libphy e1000 scsi_wait_scan sl811_hcd ohci_hcd ssb usb_storage >>>> ehci_hcd [last unloaded: tg3] >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853682] >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853690] Pid: 11394, comm: >>>> btrfs-transacti Not tainted 2.6.37-plus_v16_zcache #4 FMP55/ipower >>>> G3710 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853725] RIP: >>>> 0010:[] [] >>>> btrfs_encode_fh+0x2b/0x120 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853751] RSP: >>>> 0018:880129a11b00 EFLAGS: 00010246 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853767] RAX: 00ff >>>> RBX: 88014a1ce628 RCX: >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853788] RDX: 880129a11b3c >>>> RSI: 880129a11b70 RDI: 0006 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853808] RBP: 0140 >>>> R08: 8133eef0 R09: 880129a11c68 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853829] R10: 0001 >>>> R11: 0001 R12: 88014a1ce780 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853849] R13: 88021fefc000 >>>> R14: 88021fef9000 R15: >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853870] FS: >>>> () GS:8800bf50() >>>> knlGS: >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853894] CS: 0010 DS: ES: >>>> CR0: 8005003b >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853911] CR2: 01400050 >>>> CR3: 01c27000 CR4: 06e0 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853932] DR0: >>>> DR1: 00
Re: [PATCH V2 0/3] drivers/staging: zcache: dynamic page cache/swap compression
On Mon, Feb 14, 2011 at 8:59 PM, Matt wrote: > On Mon, Feb 14, 2011 at 1:29 AM, Matt wrote: >> On Mon, Feb 14, 2011 at 1:24 AM, Matt wrote: >>> On Mon, Feb 14, 2011 at 12:08 AM, Matt wrote: >>>> On Wed, Feb 9, 2011 at 1:03 AM, Dan Magenheimer >>>> wrote: >>>> [snip] >>>>> >>>>> If I've missed anything important, please let me know! >>>>> >>>>> Thanks again! >>>>> Dan >>>>> >>>> >>>> Hi Dan, >>>> >>>> thank you so much for answering my email in such detail ! >>>> >>>> I shall pick up on that mail in my next email sending to the mailing list >>>> :) >>>> >>>> >>>> currently I've got a problem with btrfs which seems to get triggered >>>> by cleancache get-operations: >>>> >>>> >>>> Feb 14 00:37:19 lupus kernel: [ 2831.297377] device fsid >>>> 354120c992a00761-5fa07d400126a895 devid 1 transid 7 >>>> /dev/mapper/portage >>>> Feb 14 00:37:19 lupus kernel: [ 2831.297698] btrfs: enabling disk space >>>> caching >>>> Feb 14 00:37:19 lupus kernel: [ 2831.297700] btrfs: force lzo compression >>>> Feb 14 00:37:19 lupus kernel: [ 2831.315844] zcache: created ephemeral >>>> tmem pool, id=3 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853188] BUG: unable to handle >>>> kernel paging request at 01400050 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853219] IP: [] >>>> btrfs_encode_fh+0x2b/0x120 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853242] PGD 0 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853251] Oops: [#1] PREEMPT SMP >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853275] last sysfs file: >>>> /sys/devices/platform/coretemp.3/temp1_input >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853295] CPU 4 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853303] Modules linked in: radeon >>>> ttm drm_kms_helper cfbcopyarea cfbimgblt cfbfillrect ipt_REJECT >>>> ipt_LOG xt_limit xt_tcpudp xt_state nf_nat_irc nf_conntrack_irc >>>> nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp >>>> iptable_filter ipt_addrtype xt_DSCP xt_dscp xt_iprange ip_tables >>>> ip6table_filter xt_NFQUEUE xt_owner 
xt_hashlimit xt_conntrack xt_mark >>>> xt_multiport xt_connmark nf_conntrack xt_string ip6_tables x_tables >>>> it87 hwmon_vid coretemp snd_seq_dummy snd_seq_oss snd_seq_midi_event >>>> snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_hda_codec_hdmi >>>> snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm >>>> snd_timer snd soundcore i2c_i801 wmi e1000e shpchp snd_page_alloc >>>> libphy e1000 scsi_wait_scan sl811_hcd ohci_hcd ssb usb_storage >>>> ehci_hcd [last unloaded: tg3] >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853682] >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853690] Pid: 11394, comm: >>>> btrfs-transacti Not tainted 2.6.37-plus_v16_zcache #4 FMP55/ipower >>>> G3710 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853725] RIP: >>>> 0010:[] [] >>>> btrfs_encode_fh+0x2b/0x120 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853751] RSP: >>>> 0018:880129a11b00 EFLAGS: 00010246 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853767] RAX: 00ff >>>> RBX: 88014a1ce628 RCX: >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853788] RDX: 880129a11b3c >>>> RSI: 880129a11b70 RDI: 0006 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853808] RBP: 0140 >>>> R08: 8133eef0 R09: 880129a11c68 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853829] R10: 0001 >>>> R11: 0001 R12: 88014a1ce780 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853849] R13: 88021fefc000 >>>> R14: 88021fef9000 R15: >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853870] FS: >>>> () GS:8800bf50() >>>> knlGS: >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853894] CS: 0010 DS: ES: >>>> CR0: 8005003b >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853911] CR2: 01400050 >>>> CR3: 01c27000 CR4: 06e0 >>>> Feb 14 00:39:20 lupus kernel: [ 2951.853932] DR0: >>>> DR1:
Re: 2.6.37: Multi-second I/O latency while untarring
On Fri, Feb 11, 2011 at 3:08 PM, Andrew Lutomirski wrote: > As I type this, I have an ssh process running that's dumping data into > a fifo at high speed (maybe 500Mbps) and a tar process that's > untarring from the same fifo onto btrfs. The btrfs fs is mounted -o > space_cache,compress. This machine has 8GB ram, 8 logical cores, and > a fast (i7-2600) CPU, so it's not an issue with the machine struggling > under load. > > Every few tens of seconds, my system stalls for several seconds. > These stalls cause keyboard input to be lost, firefox to hang, etc. > > Setting tar's ionice priority to best effort / 7 or to idle makes no > difference. > > ionice idle and queue_depth = 1 on the disk (a slow 2TB WD) also makes > no difference. > > max_sectors_kb = 64 in addition to the above doesn't help either. > > latencytop shows regular instances of 2-7 *second* latency, variously > in sync_page, start_transaction, btrfs_start_ordered_extent, and > do_get_write_access (from jbd2 on my ext4 root partition). > > echo 3 >drop_caches gave me 7 GB free RAM. I still had stalls when > 4-5 GB were still free (so it shouldn't be a problem with important > pages being evicted). > > In case it matters, all of my partitions are on LVM on dm-crypt, but > this machine has AES-NI so the overhead from that should be minimal. > In fact, overall CPU usage is only about 10%. > > What gives? I thought this stuff was supposed to be better on modern kernels. > > --Andy > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Hi Andrew, you could try the following patch to speed up dm-crypt: https://patchwork.kernel.org/patch/365542/ I'm using it on top of a highly-patched 2.6.37 kernel not sure if exactly that version was included in 2.6.38 there are some additional handles to speed up dm: e.g. 
CONFIG_CRYPTO_PCRYPT=y Regards Matt
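A quick way to check whether pcrypt was enabled in a given kernel build is to grep its config. This is a generic sketch: /proc/config.gz only exists with CONFIG_IKCONFIG_PROC, and /boot/config-* is a distro convention, so the script degrades to a message rather than failing when neither is present.

```shell
# Look for pcrypt in the running kernel's build config.
found=""
for cfg in "/boot/config-$(uname -r)" /proc/config.gz; do
    [ -r "$cfg" ] || continue
    case "$cfg" in
        *.gz) hit=$(zcat "$cfg" | grep '^CONFIG_CRYPTO_PCRYPT=') ;;
        *)    hit=$(grep '^CONFIG_CRYPTO_PCRYPT=' "$cfg") ;;
    esac
    if [ -n "$hit" ]; then found=$hit; break; fi
done
msg=${found:-"CONFIG_CRYPTO_PCRYPT not found (config not exposed or option disabled)"}
echo "$msg"
```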
Re: btrfs BUG during Ceph cosd open() syscall
heavy writes as well Jan 5 16:56:46 linuscs101 kernel: [ 3666.496742] [ cut here ] Jan 5 16:56:46 linuscs101 kernel: [ 3666.496754] WARNING: at fs/btrfs/inode.c:2143 btrfs_orphan_commit_root+0xb0/0xc0() Jan 5 16:56:46 linuscs101 kernel: [ 3666.496756] Hardware name: ProLiant DL380 G5 Jan 5 16:56:46 linuscs101 kernel: [ 3666.496758] Modules linked in: nfsd exportfs nfs lockd nfs_acl auth_rpcgss bonding sunrpc radeon ttm drm_kms_helper drm bnx2 psmouse i5000_edac usbhid lp shpchp ipmi_si i2c_algo_bit hid edac_core parport ipmi_msghandler serio_raw i5k_amb hpilo cciss fbcon tileblit font bitblit softcursor Jan 5 16:56:46 linuscs101 kernel: [ 3666.496788] Pid: 2764, comm: cosd Not tainted 2.6.37-ceph-client #1 Jan 5 16:56:46 linuscs101 kernel: [ 3666.496790] Call Trace: Jan 5 16:56:46 linuscs101 kernel: [ 3666.496797] [] warn_slowpath_common+0x7f/0xc0 Jan 5 16:56:46 linuscs101 kernel: [ 3666.496800] [] warn_slowpath_null+0x1a/0x20 Jan 5 16:56:46 linuscs101 kernel: [ 3666.496804] [] btrfs_orphan_commit_root+0xb0/0xc0 Jan 5 16:56:46 linuscs101 kernel: [ 3666.496807] [] commit_fs_roots+0xa1/0x140 Jan 5 16:56:46 linuscs101 kernel: [ 3666.496810] [] btrfs_commit_transaction+0x350/0x730 Jan 5 16:56:46 linuscs101 kernel: [ 3666.496816] [] ? autoremove_wake_function+0x0/0x40 Jan 5 16:56:46 linuscs101 kernel: [ 3666.496820] [] btrfs_mksubvol+0x363/0x380 Jan 5 16:56:46 linuscs101 kernel: [ 3666.496823] [] btrfs_ioctl_snap_create_transid+0xed/0x140 Jan 5 16:56:46 linuscs101 kernel: [ 3666.496826] [] btrfs_ioctl_snap_create+0xf7/0x140 Jan 5 16:56:46 linuscs101 kernel: [ 3666.496830] [] btrfs_ioctl+0x61f/0xa20 Jan 5 16:56:46 linuscs101 kernel: [ 3666.496834] [] ? 
fsnotify+0x1ea/0x320 Jan 5 16:56:46 linuscs101 kernel: [ 3666.496839] [] do_vfs_ioctl+0xa9/0x5a0 Jan 5 16:56:46 linuscs101 kernel: [ 3666.496842] [] sys_ioctl+0x81/0xa0 Jan 5 16:56:46 linuscs101 kernel: [ 3666.496847] [] system_call_fastpath+0x16/0x1b Jan 5 16:56:46 linuscs101 kernel: [ 3666.496850] ---[ end trace 2a6c3f752cfb5f1b ]--- Jan 5 17:07:45 linuscs101 kernel: [ 4325.723170] CPU 1 Jan 5 17:07:45 linuscs101 kernel: [ 4325.723210] Modules linked in: nfsd exportfs nfs lockd nfs_acl auth_rpcgss bonding sunrpc radeon ttm drm_kms_helper drm bnx2 psmouse i5000_edac usbhid lp shpchp ipmi_si i2c_algo_bit hid edac_core parport ipmi_msghandler serio_raw i5k_amb hpilo cciss fbcon tileblit font bitblit softcursor Jan 5 17:07:45 linuscs101 kernel: [ 4325.724006] Jan 5 17:07:45 linuscs101 kernel: [ 4325.724041] Pid: 2766, comm: cosd Tainted: GW 2.6.37-ceph-client #1 /ProLiant DL380 G5 Jan 5 17:07:45 linuscs101 kernel: [ 4325.724169] RIP: 0010:[] [] btrfs_truncate+0x510/0x530 Jan 5 17:07:45 linuscs101 kernel: [ 4325.724318] RSP: 0018:8803d7e1bd48 EFLAGS: 00010286 Jan 5 17:07:45 linuscs101 kernel: [ 4325.724397] RAX: ffe4 RBX: 8803dfaf1800 RCX: 880406ce7090 Jan 5 17:07:45 linuscs101 kernel: [ 4325.724493] RDX: RSI: ea000e17d288 RDI: 0206 Jan 5 17:07:45 linuscs101 kernel: [ 4325.724592] RBP: 8803d7e1bdd8 R08: 0783 R09: 8803d7e1bb28 Jan 5 17:07:45 linuscs101 kernel: [ 4325.724691] R10: ffe4 R11: 0001 R12: 8803dee49f00 Jan 5 17:07:45 linuscs101 kernel: [ 4325.724793] R13: 8803d5369c10 R14: 8803d5369a78 R15: 8803d5369d38 Jan 5 17:07:45 linuscs101 kernel: [ 4325.724899] FS: 7f77acfb6710() GS:8800cfc4() knlGS: Jan 5 17:07:45 linuscs101 kernel: [ 4325.725019] CS: 0010 DS: ES: CR0: 80050033 Jan 5 17:07:45 linuscs101 kernel: [ 4325.725096] CR2: 7f81cd5b8000 CR3: 0003dfad3000 CR4: 06e0 Jan 5 17:07:45 linuscs101 kernel: [ 4325.725195] DR0: DR1: DR2: Jan 5 17:07:45 linuscs101 kernel: [ 4325.725293] DR3: DR6: 0ff0 DR7: 0400 Jan 5 17:07:45 linuscs101 kernel: [ 4325.725392] Process cosd 
(pid: 2766, threadinfo 8803d7e1a000, task 8803dfaf8000) Jan 5 17:07:45 linuscs101 kernel: [ 4325.725549] 8803d5369d78 01da Jan 5 17:07:45 linuscs101 kernel: [ 4325.725695] 0fff d5369d38 1000 Jan 5 17:07:45 linuscs101 kernel: [ 4325.725841] 8803d5369aa8 8803d5369c10 8803d7e1bdc8 Jan 5 17:07:45 linuscs101 kernel: [ 4325.726039] [] vmtruncate+0x56/0x70 Jan 5 17:07:45 linuscs101 kernel: [ 4325.726113] [] btrfs_setattr+0x13e/0x2a0 Jan 5 17:07:45 linuscs101 kernel: [ 4325.726202] [] notify_change+0x170/0x2e0 Jan 5 17:07:45 linuscs101 kernel: [ 4325.726292] [] do_truncate+0x64/0xa0 Jan 5 17:07:45 linuscs101 kernel: [ 4325.726370] [] ? generic_permission+0x23/0xc0 Jan 5 17:07:45 linuscs101 kernel: [
Re: hunt for 2.6.37 dm-crypt+ext4 corruption?
On Thu, Jan 6, 2011 at 4:56 PM, Heinz Diehl wrote: > On 05.12.2010, Milan Broz wrote: > >> It still seems to like dmcrypt with its parallel processing is just >> trigger to another bug in 37-rc. > > To come back to this: my 3 systems (XFS filesystem) running the latest > dm-crypt-scale-to-multiple-cpus patch from Andi Kleen/Milan Broz have > not showed a single problem since 2.6.37-rc6 and above. No corruption any > longer, no freezes, nothing. The patch applies cleanly to 2.6.37, too, > and runs just fine. > > I blindly guess that my data corruption problem was related to something else > in the > 2.6.37-rc series up to -rc4/5. > > Since this patch is a significant improvement: any chance that it finally gets > merged into mainline/stable? > > Hi Heinz, I've been using this patch since 2.6.37-rc6+ with ext4 and xfs filesystems and haven't seen any corruptions since then (ext4 got "fixed" since 2.6.37-rc6, xfs showed no problems from the start) http://git.eu.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1449032be17abb69116dbc393f67ceb8bd034f92 (is the actual temporary fix for ext4) Regards Matt
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Wed, Dec 15, 2010 at 8:25 PM, Matt wrote: > On Wed, Dec 15, 2010 at 8:16 PM, Andi Kleen wrote: >>> I have a question though: the deactivation of multiple page-io >>> submission support most likely only would affect bigger systems or >>> also desktop systems (like mine) ? >> >> I think this is not a final fix, just a workaround. >> The problem with the other path still really needs to be tracked down. >> >> -Andi >> >> -- >> a...@linux.intel.com -- Speaking for myself only. >> > > ok, > > thanks for the clarification > > Regards > > Matt > Sorry to spam the mailing lists again; make that a Reported-and-Tested-by: Matthias Bayer (hope that's the correct way to write it) Regards Matt
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Wed, Dec 15, 2010 at 8:16 PM, Andi Kleen wrote: >> I have a question though: the deactivation of multiple page-io >> submission support most likely only would affect bigger systems or >> also desktop systems (like mine) ? > > I think this is not a final fix, just a workaround. > The problem with the other path still really needs to be tracked down. > > -Andi > > -- > a...@linux.intel.com -- Speaking for myself only. > ok, thanks for the clarification Regards Matt
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Mon, Dec 13, 2010 at 7:56 PM, Jon Nelson wrote: > On Sun, Dec 12, 2010 at 8:06 PM, Ted Ts'o wrote: >> On Sun, Dec 12, 2010 at 07:11:28AM -0600, Jon Nelson wrote: >>> I'm glad you've been able to reproduce the problem! If you should need >>> any further assistance, please do not hesitate to ask. >> >> This patch seems to fix the problem for me. (Unless the partition is >> mounted with mblk_io_submit.) >> >> Could you confirm that it fixes it for you as well? > > I believe I have applied the (relevant) inode.c changes to > bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc, rebuilt and begun testing. > Now at 28 passes without error, I think I can say that the patch > appears to resolve the issue. > > -- > Jon > Confirmed! I'm running my box for 5+ hours right now with your patch applied in addition to Andi's/Milan's patch (http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-crypt-scale-to-multiple-CPUs.patch), Ted, and can't see any indications of corruption so far (while doing an emerge -e system) and doing everyday stuff. My /home partition (with ext4) is also still intact [which of course has a backup], so it seems to fix it for me, too; the corruption I was seeing was similar in a way to that of Jon. You can add a Tested-by: Matthias Bayer. Thanks a lot to everyone for your support! :) I have a question though: would the deactivation of multiple page-io submission support most likely only affect bigger systems, or also desktop systems (like mine)? Regards Matt
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Fri, Dec 10, 2010 at 2:38 AM, Chris Mason wrote: > Excerpts from Andi Kleen's message of 2010-12-09 18:16:16 -0500: >> > 512MB. >> > >> > 'free' reports 75MB, 419MB free. >> > >> > I originally noticed the problem on really real hardware (thinkpad >> > T61p), however. >> >> If you can easily reproduce it could you try a git bisect? > > Do we have a known good kernel? I looked back through the thread and > didn't see any reports where the postgres test on ext4 passed in this > config. > > -chris > Try a kernel before 5a87b7a5da250c9be6d757758425dfeaf8ed3179 from the tests I've done that one showed the least or no corruption if you count the empty /etc/env.d/03opengl as an artefact (I tested 3 commits in total) 1) 5a87b7a5da250c9be6d757758425dfeaf8ed3179 2) 1de3e3df917459422cb2aecac440febc8879d410 3) bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc 1 -> 3 (earlier -> later) Regards Matt
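The bisect suggested above can be automated with `git bisect run`. The following is only a toy demonstration of the mechanics on a throwaway repository: in the real case the endpoints would be the kernel commits listed in the thread (which of them is "good" is itself what the thread is debating), and the test script would rebuild the kernel and run the postgres/corruption reproducer instead of reading a file.

```shell
# Toy demonstration of `git bisect run` on a scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email tester@example.invalid   # placeholder identity
git config user.name tester
for i in 1 2 3 4 5 6 7 8; do
    echo "$i" > state.txt
    git add state.txt
    git commit -qm "commit $i"
done
first=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$first"          # bad = newest commit, good = oldest
# Pretend the bug appeared when state.txt first exceeded 5; the test
# command exits 0 (good) or non-zero (bad) and bisect does the rest.
result=$(git bisect run sh -c '[ "$(cat state.txt)" -le 5 ]')
echo "$result" | grep "is the first bad commit"
git bisect reset >/dev/null
```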
Re: [dm-devel] hunt for 2.6.37 dm-crypt+ext4 corruption?
On Sun, Dec 5, 2010 at 2:24 PM, Theodore Tso wrote: > > On Dec 5, 2010, at 5:21 AM, Milan Broz wrote: >> >> Which kernel? 2.6.37-rc? >> >> Anyone seen this with 2.6.36 and the same dmcrypt patch? >> (All info I had is that is is stable with here.) >> >> It still seems to like dmcrypt with its parallel processing is just >> trigger to another bug in 37-rc. > > I've been using a kernel which is between 2.6.37-rc2 and -rc3 with a LUKS / > dm-crypt / LVM / ext4 setup for my primary file systems, and I haven't > observed any corruption for the last two weeks or so. It's on my todo list > to upgrade to top of Linus's tree, but perhaps this is a useful data point. > > As another thought, what version of GCC are people using who are having > difficulty? Could this perhaps be a compiler-related issue? > > -- Ted > > Hi Ted, to quote its output: gcc -v Using built-in specs. COLLECT_GCC=/usr/x86_64-pc-linux-gnu/gcc-bin/4.5.1/gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/4.5.1/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /var/tmp/portage/sys-devel/gcc-4.5.1-r1/work/gcc-4.5.1/configure --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.5.1 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.1/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.5.1 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.5.1/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.5.1/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.1/include/g++-v4 --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-altivec --disable-fixed-point --with-ppl --with-cloog --enable-lto --enable-nls --without-included-gettext --with-system-zlib --disable-werror --enable-secureplt --enable-multilib --enable-libmudflap --disable-libssp --enable-esp --enable-libgomp --enable-cld --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/4.5.1/python --enable-checking=release --enable-java-awt=gtk --enable-objc-gc --enable-languages=c,c++,java,objc,obj-c++,fortran 
--enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --with-bugurl=http://bugs.gentoo.org/ --with-pkgversion='Gentoo Hardened 4.5.1-r1 p1.4, pie-0.4.5' Thread model: posix gcc version 4.5.1 (Gentoo Hardened 4.5.1-r1 p1.4, pie-0.4.5) output of emerge -p gcc: These are the packages that would be merged, in order: Calculating dependencies ... done! [ebuild R ] sys-devel/gcc-4.5.1-r1 USE="fortran gcj graphite gtk hardened lto mudflap (multilib) multislot nls nptl objc objc++ objc-gc openmp (-altivec) -bootstrap -build -doc (-fixed-point) (-libffi) (-n32) (-n64) -nocxx -nopie -nossp -test -vanilla" 0 kB and to be precise it's gcc 4.5.1 with some gentoo-specific fixes and fixes from upstream (4.5.2) [take a look at patchset 1.4], in my case it also has the --enable-esp functionality [hardened] which should include something like -D_FORTIFY_SOURCE=2, -fstack-protector-all and for linking/ldd: -Wl,-z,now -Wl,-z,relro (I don't know if the part with the linker and the fstack-protector is accurate) I'm adding below the output of mount of the system-partition of the system I was running the kernel on - where the [more observable] corruption was observed (checkout bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc) -> this output got generated while I mounted it from my working (no corruption observed) system with 2.6.36 kernel - I don't know if it's useful - just in case you might need it [forgot to post this in my last email] Thanks & Regards Matt [ 607.849644] EXT4-fs (dm-7): INFO: recovery required on readonly filesystem [ 607.849651] EXT4-fs (dm-7): write access will be enabled during recovery [ 609.559363] EXT4-fs (dm-7): orphan cleanup on readonly fs [ 609.559375] EXT4-fs (dm-7): ext4_orphan_cleanup: truncating inode 2238873 to 0 bytes [ 609.559493] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2231865 [ 609.559531] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2231870 [ 609.559553] EXT4-fs (dm-7): 
ext4_orphan_cleanup: deleting unreferenced inode 2396001 [ 609.559588] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2396036 [ 609.559610] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2395699 [ 609.559675] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2231859 [ 609.559695] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2231868 [ 609.559715] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2396696 [ 609.559736] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2396697 [ 609.559755] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2396699 [ 609.559775] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2395948 [ 609.559809
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Sat, Dec 4, 2010 at 8:38 PM, Mike Snitzer wrote: > On Sat, Dec 04 2010 at 2:18pm -0500, > Matt wrote: > >> On Wed, Dec 1, 2010 at 10:23 PM, Mike Snitzer wrote: >> > Matt and Jon, >> > >> > If you'd be up to it: could you try testing your dm-crypt+ext4 >> > corruption reproducers against the following two 2.6.37-rc commits: >> > >> > 1) 1de3e3df917459422cb2aecac440febc8879d410 >> > then >> > 2) bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc >> > >> > Then, depending on results of no corruption for those commits, bonus >> > points for testing the same commits but with Andi and Milan's latest >> > dm-crypt cpu scalability patch applied too: >> > https://patchwork.kernel.org/patch/365542/ >> > >> > Thanks! >> > Mike >> > >> >> Hi Mike, >> >> it seems like there isn't even much testing to do: >> >> I tested all 3 commits / checkouts by re-compiling gcc which was/is >> the 2nd easy way to trigger this "corruption", compiling google's >> chromium (v9) and looking at the output/existance of gcc, g++ and >> eselect opengl list > > Can you be a bit more precise about what you're doing to reproduce? > What sequence? What (if any) builds are going in parallel? Etc. > >> so far everything went fine >> >> After that I used the new patch (v6 or pre-v6), before that I had to >> >> replace WQ_MEM_RECLAIM with WQ_RESCUER >> >> and, re-compiled the kernels >> >> shortly after I had booted up the system with the first kernel >> (http://git.eu.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5a87b7a5da250c9be6d757758425dfeaf8ed3179) >> the output of 'eselect opengl list' did show no opengl backend >> selected >> >> so it seems to manifest itself even earlier (ext4: call >> mpage_da_submit_io() from mpage_da_map_blocks()) even if only subtly >> and over time - >> I'm still currently running that kernel and posting from it & having tests >> run > > OK. 
> >> I'm not sure if it's even a problem with ext4 - I haven't had the time >> to test with XFS yet - maybe it's also happening with that so it more >> likely would be dm-core, like Milan suspected >> (http://marc.info/?l=linux-kernel&m=129123636223477&w=2) :( > > It'd be interesting to try to reproduce with that same kernel but using > XFS. I'll check with Milan on what he thinks would be the best next > steps. Ideally we'll be able to reproduce your results to aid in > pinpointing the issue. I think Milan will be trying to do so shortly > (if he hasn't started already -- using gentoo emerge, etc). > >> even though most of the time it's compiling I don't need to do much - >> I need the box for work so if my time allows next tests would be next >> weekend and I'm back to my other partition >> >> I really do hope that this bugger can be nailed down ASAP - I like the >> improvements made in 2.6.37 but without the dm-crypt multi-cpu patch >> it's only half the "fun" ;) > > Sure, we'll need to get to the bottom of this before we can have > confidence sending the dm-crypt cpu scalability patch upstream. > > Thanks for your testing, > Mike > OK, before bed time I found some kind of corruption: running kernel is from commit: bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc the messages might be overseen - so they're difficult to notice: steps: 1) bootup 2) (might need to re-install graphics driver due to driver switch, in this case magic properties [or what's its name] didn't change so the kernel module still worked) 3) firing up 2 xterms, xload, xclock, gksu -> terminal -> firefox, nautilus --no-desktop, gnome-mplayer (playing mp3) 4) emerge -1 sys-devel/gcc (from one of the xterms) after emerge -1 sys-devel/gcc finished it displayed: >>> Auto-cleaning packages... 
portage: COUNTER for sys-devel/patch-2.6.1 was corrupted; resetting to value of 0 portage: COUNTER for sys-devel/patch-2.6.1 was corrupted; resetting to value of 0 (the COUNTER file should normally hold a value, e.g. cat /var/db/pkg/sys-devel/gcc-4.5.1-r1/COUNTER prints 20560) but in this case it's empty: cat /var/db/pkg/sys-devel/patch-2.6.1/COUNTER (shows nothing) reference thread: http://forums.gentoo.org/viewtopic-t-836605-start-0.html It's fixable by reinstalling, but for unrecoverable files (e.g. personal files) this kind of corruption would be critical.
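[Editor's note] The empty-COUNTER symptom described above is easy to scan for across the whole Portage VDB. A minimal sketch, demonstrated here on a throwaway directory; on a real Gentoo box the root would be /var/db/pkg:

```shell
# Build a tiny stand-in for /var/db/pkg with one healthy and one
# corrupted (zero-length) COUNTER file, then find the empties.
vdb=$(mktemp -d)
mkdir -p "$vdb/sys-devel/gcc-4.5.1-r1" "$vdb/sys-devel/patch-2.6.1"
echo 20560 > "$vdb/sys-devel/gcc-4.5.1-r1/COUNTER"  # healthy: holds a number
: > "$vdb/sys-devel/patch-2.6.1/COUNTER"            # corrupted: empty

# Lists only the zero-length COUNTER files.
find "$vdb" -name COUNTER -empty
```

Running `find /var/db/pkg -name COUNTER -empty` after a suspect kernel session would show at a glance how many packages were hit, rather than waiting for portage to complain one package at a time.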
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Sat, Dec 4, 2010 at 8:38 PM, Mike Snitzer wrote: > On Sat, Dec 04 2010 at 2:18pm -0500, > Matt wrote: > >> On Wed, Dec 1, 2010 at 10:23 PM, Mike Snitzer wrote: >> > Matt and Jon, >> > >> > If you'd be up to it: could you try testing your dm-crypt+ext4 >> > corruption reproducers against the following two 2.6.37-rc commits: >> > >> > 1) 1de3e3df917459422cb2aecac440febc8879d410 >> > then >> > 2) bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc >> > >> > Then, depending on results of no corruption for those commits, bonus >> > points for testing the same commits but with Andi and Milan's latest >> > dm-crypt cpu scalability patch applied too: >> > https://patchwork.kernel.org/patch/365542/ >> > >> > Thanks! >> > Mike >> > >> >> Hi Mike, >> >> it seems like there isn't even much testing to do: >> >> I tested all 3 commits / checkouts by re-compiling gcc which was/is >> the 2nd easy way to trigger this "corruption", compiling google's >> chromium (v9) and looking at the output/existance of gcc, g++ and >> eselect opengl list > > Can you be a bit more precise about what you're doing to reproduce? > What sequence? What (if any) builds are going in parallel? Etc. > >> so far everything went fine >> >> After that I used the new patch (v6 or pre-v6), before that I had to >> >> replace WQ_MEM_RECLAIM with WQ_RESCUER >> >> and, re-compiled the kernels >> >> shortly after I had booted up the system with the first kernel >> (http://git.eu.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5a87b7a5da250c9be6d757758425dfeaf8ed3179) >> the output of 'eselect opengl list' did show no opengl backend >> selected >> >> so it seems to manifest itself even earlier (ext4: call >> mpage_da_submit_io() from mpage_da_map_blocks()) even if only subtly >> and over time - >> I'm still currently running that kernel and posting from it & having tests >> run > > OK. 
> >> I'm not sure if it's even a problem with ext4 - I haven't had the time >> to test with XFS yet - maybe it's also happening with that so it more >> likely would be dm-core, like Milan suspected >> (http://marc.info/?l=linux-kernel&m=129123636223477&w=2) :( > > It'd be interesting to try to reproduce with that same kernel but using > XFS. I'll check with Milan on what he thinks would be the best next > steps. Ideally we'll be able to reproduce your results to aid in > pinpointing the issue. I think Milan will be trying to do so shortly > (if he hasn't started already -- using gentoo emerge, etc). > >> even though most of the time it's compiling I don't need to do much - >> I need the box for work so if my time allows next tests would be next >> weekend and I'm back to my other partition >> >> I really do hope that this bugger can be nailed down ASAP - I like the >> improvements made in 2.6.37 but without the dm-crypt multi-cpu patch >> it's only half the "fun" ;) > > Sure, we'll need to get to the bottom of this before we can have > confidence sending the dm-crypt cpu scalability patch upstream. > > Thanks for your testing, > Mike > I should have made it clear that the results I'm reporting are observed when running the kernels/checkouts *with* the dm-crypt multi-CPU patch; without the patch I didn't see those kinds of problems (hardlocks, missing files, etc.).
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Wed, Dec 1, 2010 at 10:23 PM, Mike Snitzer wrote: > On Wed, Dec 01 2010 at 3:45pm -0500, > Milan Broz wrote: > >> >> On 12/01/2010 08:34 PM, Jon Nelson wrote: >> > Perhaps this is useful: for myself, I found that when I started using >> > 2.6.37rc3 that postgresql starting having a *lot* of problems with >> > corruption. Specifically, I noted zeroed pages, corruption in headers, >> > all sorts of stuff on /newly created/ tables, especially during index >> > creation. I had a fairly high hit rate of failure. I backed off to >> > 2.6.34.7 and have *zero* problems (in fact, prior to 2.6.37rc3, I had >> > never had a corruption issue with postgresql). I ran on 2.6.36 for a >> > few weeks as well, without issue. >> > >> > I am using kcrypt with lvm on top of that, and ext4 on top of that. >> >> With unpatched dmcrypt (IOW with Linus' git)? Then it must be ext4 or >> dm-core problem because there were no patches for dm-crypt... > > Matt and Jon, > > If you'd be up to it: could you try testing your dm-crypt+ext4 > corruption reproducers against the following two 2.6.37-rc commits: > > 1) 1de3e3df917459422cb2aecac440febc8879d410 > then > 2) bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc > > Then, depending on results of no corruption for those commits, bonus > points for testing the same commits but with Andi and Milan's latest > dm-crypt cpu scalability patch applied too: > https://patchwork.kernel.org/patch/365542/ > > Thanks! 
> Mike > Hi Mike, it seems like there isn't even much testing to do: I tested all 3 commits / checkouts by re-compiling gcc, which was/is the 2nd easiest way to trigger this "corruption", compiling Google's Chromium (v9), and checking the output/existence of gcc, g++ and eselect opengl list; so far everything went fine. After that I used the new patch (v6 or pre-v6); before that I had to replace WQ_MEM_RECLAIM with WQ_RESCUER, and re-compiled the kernels. Shortly after I had booted the system with the first kernel (http://git.eu.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5a87b7a5da250c9be6d757758425dfeaf8ed3179), the output of 'eselect opengl list' showed no opengl backend selected, so the problem seems to manifest itself even earlier (ext4: call mpage_da_submit_io() from mpage_da_map_blocks()), if only subtly and over time - I'm still currently running that kernel, posting from it, and have tests running. I'm not sure it's even a problem in ext4 - I haven't had the time to test with XFS yet - maybe it's also happening with that, in which case it would more likely be dm-core, like Milan suspected (http://marc.info/?l=linux-kernel&m=129123636223477&w=2) :( @Jon, have you had time to do some tests meanwhile? What did you find out? Even though most of the time it's compiling and I don't need to do much, I need the box for work, so if my time allows, the next tests would be next weekend and then I'm back to my other partition. I really do hope that this bugger can be nailed down ASAP - I like the improvements made in 2.6.37, but without the dm-crypt multi-cpu patch it's only half the "fun" ;) Thanks & Regards Matt
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Wed, Dec 1, 2010 at 10:23 PM, Mike Snitzer wrote: > On Wed, Dec 01 2010 at 3:45pm -0500, > Milan Broz wrote: > >> >> On 12/01/2010 08:34 PM, Jon Nelson wrote: >> > Perhaps this is useful: for myself, I found that when I started using >> > 2.6.37rc3 that postgresql starting having a *lot* of problems with >> > corruption. Specifically, I noted zeroed pages, corruption in headers, >> > all sorts of stuff on /newly created/ tables, especially during index >> > creation. I had a fairly high hit rate of failure. I backed off to >> > 2.6.34.7 and have *zero* problems (in fact, prior to 2.6.37rc3, I had >> > never had a corruption issue with postgresql). I ran on 2.6.36 for a >> > few weeks as well, without issue. >> > >> > I am using kcrypt with lvm on top of that, and ext4 on top of that. >> >> With unpatched dmcrypt (IOW with Linus' git)? Then it must be ext4 or >> dm-core problem because there were no patches for dm-crypt... > > Matt and Jon, > > If you'd be up to it: could you try testing your dm-crypt+ext4 > corruption reproducers against the following two 2.6.37-rc commits: > > 1) 1de3e3df917459422cb2aecac440febc8879d410 > then > 2) bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc > > Then, depending on results of no corruption for those commits, bonus > points for testing the same commits but with Andi and Milan's latest > dm-crypt cpu scalability patch applied too: > https://patchwork.kernel.org/patch/365542/ > > Thanks! > Mike > Yeah, sure. I'll have to set up another testing system of its own (on a separate partition / volume group), so that will take some time; the first tests will probably run over the weekend. Thanks for those pointers! I took a look at git-web - do you think 5a87b7a5da250c9be6d757758425dfeaf8ed3179 might be relevant, too?
The others seem rather minor compared to those you posted. AFAIK, the last time I ran vanilla 2.6.37-rc* (which was probably around rc1) I saw no corruption at all, but I'll give it a test run without the dm-crypt patch anyway. Thanks & Regards Matt
Re: dm-crypt barrier support is effective
On Wed, Dec 1, 2010 at 5:52 PM, Mike Snitzer wrote: > On Wed, Dec 01 2010 at 11:05am -0500, > Matt wrote: > >> On Mon, Nov 15, 2010 at 12:24 AM, Matt wrote: >> > On Sun, Nov 14, 2010 at 10:54 PM, Milan Broz wrote: >> >> On 11/14/2010 10:49 PM, Matt wrote: >> >>> only with the dm-crypt scaling patch I could observe the data-corruption >> >> >> >> even with v5 I sent on Friday? >> >> >> >> Are you sure that it is not related to some fs problem in 2.6.37-rc1? >> >> >> >> If it works on 2.6.36 without problems, it is probably problems somewhere >> >> else (flush/fua conversion was trivial here - DM is still doing full flush >> >> and there are no other changes in code IMHO.) >> >> >> >> Milan >> >> >> > >> > Hi Milan, >> > >> > I'm aware of your new v5 patch (which should include several >> > improvements (or potential fixes in my case) over the v3 patch) >> > >> > as I already wrote my schedule unfortunately currently doesn't allow >> > me to test it >> > >> > * in the case of no corruption it would be nice to have 2.6.37-rc* running >> > :) >> > >> > * in the case of data corruption that would mean restoring my system - >> > since it's my production box and right now I don't have a fallback at >> > reach >> > at earliest I could give it a shot at the beginning of December. Then >> > I could also test reiserfs and ext4 as a system partition to rule out >> > that it's >> > a ext4-specific thing (currently I'm running reiserfs on my >> > system-partition). >> > >> > Thanks ! >> > >> > Matt >> > >> >> >> OK guys, >> >> I've updated my system to latest glibc 2.12.1-r3 (on gentoo) and gcc >> hardened 4.5.1-r1 with 1.4 patchset which also uses pie (that one >> should fix problems with graphite) >> >> not much system changes besides that, >> >> with those it worked fine with 2.6.36 and I couldn't observe any >> filesystem corruption > > So dm-crypt cpu scalability v5 with 2.6.36 worked fine. > >> the bad news is: I'm again seeing corruption (!) 
[on ext4, on the / >> (root) partition]: > > ... > >> ===> so the No.1 trigger of this kind of corruption where files are >> empty, missing or the content gets corrupted (at least for me) is >> compiling software which is part of the system (e.g. emerge -e >> system); >> >> the system is Gentoo ~amd64; with binutils 2.20.51.0.12 (afaik this >> one has changed from 2.20.51.0.10 to 2.20.51.0.12 from my last >> report); gcc 4.5.1 (Gentoo Hardened 4.5.1-r1 p1.4, pie-0.4.5) <-- >> works fine with 2.6.36 and 2.6.36.1 >> >> I'm not sure whether benchmarks would have the same "impact" > > Seems this emerge is a good test if it reliably enduces the corruption. > >> the kernel currently running is 2.6.37-rc4 with the [PATCH v5] dm >> crypt: scale to multiple CPUs >> >> besides that additional patchsets are applied (I apologize that it's >> not only plain vanilla with the dm-crypt patch): >> * Prevent kswapd dumping excessive amounts of memory in response to >> high-order allocation >> * ext4: coordinate data-only flush requests sent by fsync >> * vmscan: protect executable page from inactive list scan >> * writeback livelock fixes v2 > > Have you actually experienced any of the issues the above patches are > meant to address? Seems you're applying patches guessing/hoping > that they'll fix the dm-crypt corruption. > >> I originally had hoped that the mentioned patch in "ext4: coordinate >> data-only flush requests sent by fsync", namely: "md: Call >> blk_queue_flush() to establish flush/fua" and additional changes & >> fixes to 2.6.37-rc4 would once and for all fix problems but it didn't > > That md patch doesn't help DM at all. And the ext4 coordination patch > is completely bleeding and actually broken (especially as it relates to > DM -- but that breakage is ony a concern for request-based DM, > e.g. 
> DM-mpath), anyway see: > https://www.redhat.com/archives/dm-devel/2010-November/msg00185.html > > I'm not sure which patches you're using for the ext4 fsync changes but > please don't use them at all. It is purely an optimization for > extremely heavy fsync workloads and is only getting in the way at this > point. > >> I'm also u
Re: dm-crypt barrier support is effective
On Mon, Nov 15, 2010 at 12:24 AM, Matt wrote: > On Sun, Nov 14, 2010 at 10:54 PM, Milan Broz wrote: >> On 11/14/2010 10:49 PM, Matt wrote: >>> only with the dm-crypt scaling patch I could observe the data-corruption >> >> even with v5 I sent on Friday? >> >> Are you sure that it is not related to some fs problem in 2.6.37-rc1? >> >> If it works on 2.6.36 without problems, it is probably problems somewhere >> else (flush/fua conversion was trivial here - DM is still doing full flush >> and there are no other changes in code IMHO.) >> >> Milan >> > > Hi Milan, > > I'm aware of your new v5 patch (which should include several > improvements (or potential fixes in my case) over the v3 patch) > > as I already wrote my schedule unfortunately currently doesn't allow > me to test it > > * in the case of no corruption it would be nice to have 2.6.37-rc* running :) > > * in the case of data corruption that would mean restoring my system - > since it's my production box and right now I don't have a fallback at > reach > at earliest I could give it a shot at the beginning of December. Then > I could also test reiserfs and ext4 as a system partition to rule out > that it's > a ext4-specific thing (currently I'm running reiserfs on my system-partition). > > Thanks ! > > Matt > OK guys, I've updated my system to latest glibc 2.12.1-r3 (on gentoo) and gcc hardened 4.5.1-r1 with 1.4 patchset which also uses pie (that one should fix problems with graphite) not much system changes besides that, with those it worked fine with 2.6.36 and I couldn't observe any filesystem corruption the bad news is: I'm again seeing corruption (!) 
[on ext4, on the / (root) partition]: I was re-emerging/re-installing stuff - pretty trivial stuff actually (which worked fine in the past): emerging gnome-base programs (gconf, librsvg, nautilus, gnome-mount, gnome-vfs, gvfs, imagemagick, xine-lib) and some others: terminal (from xfce), vtwm, rman, vala (library), xclock, xload, atk, gtk+, vte. During that I noticed some corruption, and programs kept failing to configure/compile, saying that g++ was missing; I re-extracted gcc (of which I had previously made a backup tarball), and that seemed to help for some time, until programs again failed with some corrupted files from gcc. So I re-emerged gcc (compiling it), and after it had finished, the same error occurred that I had already written about in a previous email: the content of /etc/env.d/03opengl got corrupted - but NOT the whole file. Normally it's: # Configuration file for eselect # This file has been automatically generated. LDPATH= OPENGL_PROFILE= <-- where the path to the graphics drivers and the opengl profile is written; in this instance of the corruption it contained only garbage symbols. I have no clue how this file could be connected with gcc. ===> so the No.1 trigger of this kind of corruption, where files are empty, missing, or have their content corrupted (at least for me), is compiling software which is part of the system (e.g. 
emerge -e system); the system is Gentoo ~amd64; with binutils 2.20.51.0.12 (afaik this one has changed from 2.20.51.0.10 to 2.20.51.0.12 since my last report); gcc 4.5.1 (Gentoo Hardened 4.5.1-r1 p1.4, pie-0.4.5) <-- works fine with 2.6.36 and 2.6.36.1. I'm not sure whether benchmarks would have the same "impact". The kernel currently running is 2.6.37-rc4 with the [PATCH v5] dm crypt: scale to multiple CPUs. Besides that, additional patchsets are applied (I apologize that it's not only plain vanilla with the dm-crypt patch): * Prevent kswapd dumping excessive amounts of memory in response to high-order allocation * ext4: coordinate data-only flush requests sent by fsync * vmscan: protect executable page from inactive list scan * writeback livelock fixes v2 I originally had hoped that the patch mentioned in "ext4: coordinate data-only flush requests sent by fsync", namely "md: Call blk_queue_flush() to establish flush/fua", and the additional changes & fixes to 2.6.37-rc4 would fix the problems once and for all, but they didn't. I'm also using the writeback livelock fixes and the dm-crypt scale to multiple CPUs patch with 2.6.36, so those generally work fine; so it has to be something that changed from 2.6.36->2.6.37, within dm-crypt or in other parts that get stressed and break during usage of the "[PATCH v5] dm crypt: scale to multiple CPUs" patch. The other included patches surely won't be the cause for that (100%). Filesystem corruption only seems to occur on the / (root) where the system resides - fortunately I haven't encountered any corruption on my /home partition, which also uses ext4, nor during rsync'ing from /home to other data partitions with ext4 and xfs (I don't want to try t
Re: dm-crypt barrier support is effective
On Sun, Nov 14, 2010 at 10:54 PM, Milan Broz wrote: > On 11/14/2010 10:49 PM, Matt wrote: >> only with the dm-crypt scaling patch I could observe the data-corruption > > even with v5 I sent on Friday? > > Are you sure that it is not related to some fs problem in 2.6.37-rc1? > > If it works on 2.6.36 without problems, it is probably problems somewhere > else (flush/fua conversion was trivial here - DM is still doing full flush > and there are no other changes in code IMHO.) > > Milan > Hi Milan, I'm aware of your new v5 patch (which should include several improvements, or potential fixes in my case, over the v3 patch). As I already wrote, my schedule unfortunately doesn't currently allow me to test it: * in the case of no corruption, it would be nice to have 2.6.37-rc* running :) * in the case of data corruption, that would mean restoring my system - since it's my production box and right now I don't have a fallback within reach, the earliest I could give it a shot is the beginning of December. Then I could also test reiserfs and ext4 as a system partition, to rule out that it's an ext4-specific thing (currently I'm running reiserfs on my system partition). Thanks! Matt
Re: dm-crypt barrier support is effective (was: Re: DM-CRYPT: Scale to multiple CPUs v3 on 2.6.37-rc* ?)
ic private pointer in per-cpu patches, recompiled the kernel and rebooted into that new environment, it seemingly caused corruption right from the start (the mentioned corruption of /etc/env.d/02opengl being the most obvious candidate, and probably more), with the corruption becoming more pronounced over longer uptime and heavy use patterns (such as re-compiling the whole system). I don't know if the new multi-cpu scaling patch for dm-crypt ([PATCH v5] dm crypt: scale to multiple CPUs) makes a difference, since I can't test it right now due to a busy schedule. I do have a request, however: could you guys please take this patch through a "battery of heavy tests" before it's included in mainline, so that you can spot any issues (races, BUGs, etc.) which might be inherent in, or triggered by, current dm-crypt, and my reported corruption might be prevented in the future? Again: the vanilla kernel and dm-crypt are perfectly stable! Only with the dm-crypt scaling patch could I observe the data corruption. Thanks! Matt
[PATCH] Btrfs: checkpatch fixes in various files
From: Matt Lupfer Fixes innocuous style issues identified by the checkpatch script. Signed-off-by: Matt Lupfer Reviewed-by: Ben Chociej Reviewed-by: Conor Scott Reviewed-by: Steve French --- fs/btrfs/async-thread.c |2 +- fs/btrfs/disk-io.c |4 ++-- fs/btrfs/export.c |3 ++- fs/btrfs/extent-tree.c |8 fs/btrfs/extent_io.h|4 ++-- fs/btrfs/extent_map.h |8 fs/btrfs/free-space-cache.c | 20 +--- fs/btrfs/inode.c| 27 +++ fs/btrfs/ioctl.c| 19 +++ fs/btrfs/locking.c |4 ++-- fs/btrfs/ordered-data.c |3 +-- fs/btrfs/ordered-data.h |7 --- fs/btrfs/tree-log.c |4 ++-- fs/btrfs/tree-log.h |3 ++- fs/btrfs/volumes.c |4 ++-- 15 files changed, 63 insertions(+), 57 deletions(-) diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c index 7ec1409..e142da3 100644 --- a/fs/btrfs/async-thread.c +++ b/fs/btrfs/async-thread.c @@ -260,7 +260,7 @@ static struct btrfs_work *get_next_work(struct btrfs_worker_thread *worker, struct btrfs_work *work = NULL; struct list_head *cur = NULL; - if(!list_empty(prio_head)) + if (!list_empty(prio_head)) cur = prio_head->next; smp_mb(); diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 34f7c37..4513eaf 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -243,8 +243,8 @@ static int csum_tree_block(struct btrfs_root *root, struct extent_buffer *buf, "failed on %llu wanted %X found %X " "level %d\n", root->fs_info->sb->s_id, - (unsigned long long)buf->start, val, found, - btrfs_header_level(buf)); + (unsigned long long)buf->start, val, + found, btrfs_header_level(buf)); } if (result != (char *)&inline_result) kfree(result); diff --git a/fs/btrfs/export.c b/fs/btrfs/export.c index 951ef09..e7e5463 100644 --- a/fs/btrfs/export.c +++ b/fs/btrfs/export.c @@ -223,7 +223,8 @@ static struct dentry *btrfs_get_parent(struct dentry *child) key.type = BTRFS_INODE_ITEM_KEY; key.offset = 0; - dentry = d_obtain_alias(btrfs_iget(root->fs_info->sb, &key, root, NULL)); + dentry = d_obtain_alias(btrfs_iget(root->fs_info->sb, &key, root, + NULL)); if 
(!IS_ERR(dentry)) dentry->d_op = &btrfs_dentry_operations; return dentry; diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index a46b64d..1298500 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4578,9 +4578,8 @@ static noinline int find_free_extent(struct btrfs_trans_handle *trans, empty_cluster = 64 * 1024; } - if ((data & BTRFS_BLOCK_GROUP_DATA) && btrfs_test_opt(root, SSD)) { + if ((data & BTRFS_BLOCK_GROUP_DATA) && btrfs_test_opt(root, SSD)) last_ptr = &root->fs_info->data_alloc_cluster; - } if (last_ptr) { spin_lock(&last_ptr->lock); @@ -4642,7 +4641,8 @@ have_block_group: if (unlikely(block_group->cached == BTRFS_CACHE_NO)) { u64 free_percent; - free_percent = btrfs_block_group_used(&block_group->item); + free_percent = btrfs_block_group_used( + &block_group->item); free_percent *= 100; free_percent = div64_u64(free_percent, block_group->key.offset); @@ -7862,7 +7862,7 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info) release_global_block_rsv(info); - while(!list_empty(&info->space_info)) { + while (!list_empty(&info->space_info)) { space_info = list_entry(info->space_info.next, struct btrfs_space_info, list); diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 5691c7b..2ebfef0 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -184,8 +184,8 @@ int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end, int clear_extent_bits(struct extent_io_tree *tree, u64 start, u64 end, int bits, gfp_t mask); int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, -int bits, int wake, int delete, struct extent_state **cached, -
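For readers unfamiliar with the tool: checkpatch.pl lives in the kernel tree under scripts/ and flags exactly the kind of issues fixed here. As a rough illustration only (this is not checkpatch.pl itself, and the file name is made up), one of the patterns it complains about is the missing space after if/while/for seen in the async-thread.c hunk above, which even a crude grep can spot:

```shell
# Create a throwaway C file containing the pre-patch style error.
f=$(mktemp --suffix=.c)
printf 'if(!list_empty(prio_head))\n\tsmp_mb();\n' > "$f"

# Flag keyword-immediately-followed-by-paren, the style checkpatch rejects.
grep -nE '(if|while|for)\(' "$f"
```

In the real workflow one would run scripts/checkpatch.pl against the patch file before posting, which also catches the `while(...)` instance in extent-tree.c.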
Re: Copy/move btrfs volume
On 07/01/2010 05:33 AM, Lubos Kolouch wrote:
> Daniel J Blueman, Thu, 01 Jul 2010 12:26:10 +0100:
>>> What is the correct way to do this?
>>
>> The only way to do this preserving duplication is to use hardlinks
>> between duplicated files (which reference counts the inode), and use
>> 'rsync -H'.
>>
>> Dan

Hello,

With backed-up files consisting of hard links, I usually use dd to copy
the file systems at the block level

  # dd if=/dev/sda of=/dev/sdb bs=20M

and then expand the file system. I do this because I found that tools
like rsync, while usually fast, are extremely slow when dealing with
millions of hard-linked files. The same approach could be used for btrfs
to keep its snapshots.

> A scenario - I have raid5 of say, 1TB HDDs. It contains many snapshots.
> Then, few years later, new machine is bought and there are, say, 5TB
> discs.
> ...
> Lubos

In my case, I had to copy BackupPC hard-linked files from a full disk to
a smaller disk, both using ext4, so I could not use dd. What should
normally have taken an hour instead took almost a week. (Yes, I wanted
to use btrfs, but it had a hard link limit of 255 - I don't know if it
still does.)

It would be nice to have a btrfs command that could rapidly copy over
the file system, snapshots, and all other file system info. But what
benefit would a native btrfs 'copy/rsync' command have over the
dd/resize option?

Pros
- Files will be immediately checksummed on the new disks, though this
  may matter less once a checksum/verify command is implemented.
- A great 'feature' for copying files to new drives while keeping
  snapshots. It could even be used to export snapshots.
- One command instead of many (dd -> resize -> verify).

Cons
- I believe compressed files would have to be uncompressed and
  recompressed, depending on when the file is checksummed. (I may be
  wrong on this one.) That would hurt slow and/or heavily loaded
  machines.
- The file system would still have to be unmounted, or at least
  read-only, as I doubt the command would have rsync's update or delete
  abilities. But maybe it could.

Questionable
- May be faster than dd/resize, or may be just as slow as rsync is with
  hard links. And I am talking about dozens to thousands of snapshots,
  and millions to billions of files.

Matt
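A quick demonstration of why `rsync -H` is expensive, as discussed above: a hard link is just a second name for the same inode, so the kernel keeps a link count rather than duplicating data, and a preserving copy must rediscover which names share an inode. A minimal sketch using a temporary directory:

```shell
# Two directory entries pointing at one inode.
d=$(mktemp -d)
echo data > "$d/a"
ln "$d/a" "$d/b"               # hard link: link count goes to 2

stat -c '%i %h' "$d/a"         # inode number and link count
stat -c '%i %h' "$d/b"         # identical output: same inode, count 2
```

To preserve this on a destination, rsync must remember every multiply-linked inode it has seen and re-link instead of re-copy, which is why millions of hard links make it so slow, while dd never looks at names or inodes at all.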
Mounting raid without a btrfsctl scan
Hi,

Would it be possible and feasible to support mounting btrfs
raid/multi-device filesystems without having to run 'btrfsctl -a'?

Currently, as you may know, if one wants to attach a btrfs raid
filesystem to a system (usb, hotswap, reboot, etc.), the user or program
has to run:

  btrfsctl -a   (or similar)
  mount /dev/sdb1 /mount/point

While this works, it requires patching the various subsystems involved
with managing disks, such as udev, mkinitrd, dracut, hal, and others.
Each one has to know to scan, then mount.

For example, I have a system with a btrfs raid1 as root, and I had to
patch the boot loader (dracut) so that during boot it would scan just
before mounting the root filesystem. I filed a bug with dracut, but the
more I think about it, the more it seems that either mount.btrfs or some
other part of btrfs should be taking care of this.

Any thoughts or plans on the matter?

Thanks,
Matt
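For context on what the scan step actually does: the kernel has to learn which devices belong to the multi-device filesystem before any one of them can be mounted, and it does so by probing each candidate device for the btrfs superblock magic ("_BHRfS_M", 64 bytes into the superblock that sits at the 64KiB offset). A hedged sketch of that probe on a scratch file image rather than a real disk (the offsets come from the btrfs on-disk format, not from this mail):

```shell
# Build a fake device image carrying the btrfs superblock magic.
img=$(mktemp)
truncate -s 1M "$img"
printf '_BHRfS_M' | dd of="$img" bs=1 seek=65600 conv=notrunc 2>/dev/null

# The probe a scanner performs: read 8 bytes at 64KiB + 0x40 and
# compare against the magic.
dd if="$img" bs=1 skip=65600 count=8 2>/dev/null && echo
```

Teaching mount.btrfs (or a udev rule) to run this probe automatically is essentially what the mail is asking for, so that no separate 'btrfsctl -a' step is needed.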