atime updates stuck under btrfs_dirty_inode
Hi all, I'm seeing processes block (for minutes or longer) in directory access on btrfs. I'm running commit ccda756d13f13a9276edc8d53d05aaecbf3c4baa Merge: 0c21e3a 65e5341 Author: Andy Isaacson a...@hexapodia.org Date: Sun Jan 9 12:37:23 2011 -0800 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable on a quad core Core i7. There seem to be a few failure cases. My most recent ls has been stuck like this for several hours, on a directory that should have somewhere between 6 and 50 entries: % ps u 15028 USER PID %CPU %MEMVSZ RSS TTY STAT START TIME COMMAND adi 15028 0.0 0.0 27032 1204 pts/8S+ 22:18 0:00 ls -l /btr/foo % cat /proc/15028/stack [a013af78] shrink_delalloc+0x112/0x147 [btrfs] [a013b09f] reserve_metadata_bytes+0xf2/0x1a8 [btrfs] [a013b2d0] btrfs_block_rsv_add+0x28/0x49 [btrfs] [a013b33e] btrfs_trans_reserve_metadata+0x4d/0x75 [btrfs] [a014b4d4] start_transaction+0x13e/0x1d8 [btrfs] [a014b5c6] btrfs_start_transaction+0x13/0x15 [btrfs] [a014f9db] btrfs_dirty_inode+0x6c/0xfd [btrfs] [8111d82f] __mark_inode_dirty+0x2b/0x1ab [8111297a] touch_atime+0x10e/0x131 [8110d49d] vfs_readdir+0x79/0xa0 [8110d60d] sys_getdents+0x81/0xd1 [81009e97] tracesys+0xd9/0xde [] 0x Sometimes, if I poll fast enough I see this stack instead: % cat /proc/15028/stack [a0149b2d] wait_for_commit+0x99/0xe2 [btrfs] [a014ac20] btrfs_commit_transaction+0xfe/0x612 [btrfs] [a014b4e7] start_transaction+0x151/0x1d8 [btrfs] [a014b5c6] btrfs_start_transaction+0x13/0x15 [btrfs] [a014f9db] btrfs_dirty_inode+0x6c/0xfd [btrfs] [8111d82f] __mark_inode_dirty+0x2b/0x1ab [8111297a] touch_atime+0x10e/0x131 [8110d49d] vfs_readdir+0x79/0xa0 [8110d60d] sys_getdents+0x81/0xd1 [81009e97] tracesys+0xd9/0xde [] 0x Another ls has been accumulating runtime for several days: 7406 pts/3R+ 2973:56 | \_ ls -F the directory it's reading has around 200 files in it, I think. When I catch it in kernel mode its stack looks like this: % cat /proc/7406/stack [8111d293] writeback_inodes_sb_nr+0x76/0x7d [8111d6b0] writeback_inodes_sb_nr_if_idle+0x41/0x55 [a013af15] shrink_delalloc+0xaf/0x147 [btrfs] [a013b09f] reserve_metadata_bytes+0xf2/0x1a8 [btrfs] [a013b2d0] btrfs_block_rsv_add+0x28/0x49 [btrfs] [a013b33e] btrfs_trans_reserve_metadata+0x4d/0x75 [btrfs] [a014b4d4] start_transaction+0x13e/0x1d8 [btrfs] [a014b5c6] btrfs_start_transaction+0x13/0x15 [btrfs] [a014f9db] btrfs_dirty_inode+0x6c/0xfd [btrfs] [8111d82f] __mark_inode_dirty+0x2b/0x1ab [8111297a] touch_atime+0x10e/0x131 [8110d49d] vfs_readdir+0x79/0xa0 [8110d60d] sys_getdents+0x81/0xd1 [81009c82] system_call_fastpath+0x16/0x1b [] 0x vmstat is rather odd: % vmstat 1 procs ---memory-- ---swap-- -io -system-- cpu r b swpd free buff cache si sobibo in cs us sy id wa 2 0 0 124348 177388 1127622800 1572430 2 10 87 1 1 0 0 122108 177388 1127854000 536 112 2870 813309 2 13 83 2 2 0 0 119628 177388 1128085200 704 0 2938 887219 2 11 87 0 2 0 0 117768 177388 1128268400 800 0 2936 883427 2 15 82 1 2 0 0 114872 177388 1128546400 1104 0 2819 883498 1 8 90 0 1 0 0 112704 177388 1128738800 612 0 2961 880957 2 14 84 0 ^C One of the three block devices on this btrfs is reading a few hundred blocks per second, but is nowhere near 100% busy; the CS count is extremely high; and we have less than a full CPU spent in system time (even though there are two ls processes blocked). If anyone can point me to a good place to start tracking down the root cause, I'd appreciate it. Thanks, -andy -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2.6.37 BUG at inode.c:1616 (was Re: 2.6.37: Bug on btrfs while umount)
On Thu, Jan 06, 2011 at 08:29:12PM -0500, Chris Mason wrote: [50010.838804] [ cut here ] [50010.838931] kernel BUG at fs/btrfs/inode.c:1616! [50010.839053] invalid opcode: [#1] PREEMPT SMP [snip] [50010.839653] Pid: 1681, comm: btrfs-endio-wri Not tainted 2.6.37 #1 Could you please pull from the master branch of the btrfs unstable tree. We had a late fix that is related to this. I saw BUG at inode.c:1616 while running 2.6.37-rc6-11882-g55ec86f, I saw your message and upgraded to Linus tip (0c21e3a) + btrfs-unstable tip (65e5341), and I just saw it again. Including both BUG traces below. The machine is a Core i7 with 12GB, with btrfs spanning three volumes: Label: btr uuid: 1271de53-b3d2-4d68-9d48-b19487e1c982 Total devices 3 FS bytes used 735.97GB devid1 size 18.65GB used 18.64GB path /dev/sda2 devid2 size 512.00GB used 511.88GB path /dev/sdb1 devid3 size 512.00GB used 225.26GB path /dev/sdc1 The primary writer to the filesystem is rtorrent; normally I have ffmpeg writing to the filesystem at about 100 kbyte/sec as well, but it wasn't running in this latest crash. [ 9275.240027] [ cut here ] [ 9275.249991] kernel BUG at fs/btrfs/inode.c:1616! [ 9275.259914] invalid opcode: [#1] SMP [ 9275.269794] last sysfs file: /sys/devices/pci:00/:00:1a.7/usb1/1-4/1-4:1.0/host8/target8:0:0/8:0:0:0/block/sdd/stat [ 9275.280066] CPU 0 [ 9275.280127] Modules linked in: tun ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc kvm_intel kvm xfs exportfs loop snd_hda_codec_hdmi snd_hda_codec_realtek radeon ttm drm_kms_helper drm snd_hda_intel snd_hda_codec i2c_algo_bit snd_usb_audio uvcvideo snd_hwdep i2c_i801 snd_usbmidi_lib snd_pcm snd_rawmidi snd_timer videodev snd_seq_device snd v4l2_compat_ioctl32 pcspkr i2c_core serio_raw soundcore snd_page_alloc processor tpm_tis tpm tpm_bios evdev shpchp button thermal_sys ext3 jbd mbcache dm_mod btrfs zlib_deflate crc32c libcrc32c usb_storage uas sd_mod crc_t10dif ehci_hcd usbcore ahci libahci libata r8169 scsi_mod mii nls_base [last unloaded: scsi_wait_scan] [ 9275.358450] [ 9275.369821] Pid: 3654, comm: btrfs-endio-wri Not tainted 2.6.37-03739-gccda756 #73 MSI X58 Pro-E (MS-7522)/MS-7522 [ 9275.381570] RIP: 0010:[a0152824] [a0152824] T.1234+0x76/0x201 [btrfs] [ 9275.393380] RSP: 0018:88025f275c30 EFLAGS: 00010286 [ 9275.405100] RAX: ffe4 RBX: 88032b596b40 RCX: 88032b596c60 [ 9275.416865] RDX: RSI: ea000b17b8d0 RDI: fff4 [ 9275.428666] RBP: 88025f275cc0 R08: 0005 R09: 88025f2759a0 [ 9275.440522] R10: 88025f275970 R11: dead00100100 R12: 880083a7e888 [ 9275.452374] R13: 06e0c000 R14: 880331d7c800 R15: 8800bb38d880 [ 9275.464146] FS: () GS:8800bf40() knlGS: [ 9275.475923] CS: 0010 DS: ES: CR0: 8005003b [ 9275.487673] CR2: 7f3eada57000 CR3: 01603000 CR4: 26e0 [ 9275.499547] DR0: DR1: DR2: [ 9275.511395] DR3: DR6: 0ff0 DR7: 0400 [ 9275.523192] Process btrfs-endio-wri (pid: 3654, threadinfo 88025f274000, task 88032554b020) [ 9275.535168] Stack: [ 9275.546852] 06e0c000 1000 00b8c1376000 1000 [ 9275.558616] 880331d7c800 0001 8800bb38d880 880331d7c800 [ 9275.570416] 88025f275cb0 a014b53f 88025f275ce0 8802e21ad7f0 [ 9275.582149] Call Trace: [ 9275.593731] [a014b53f] ? start_transaction+0x1a9/0x1d8 [btrfs] [ 9275.605513] [a0152e1e] btrfs_finish_ordered_io+0x1e6/0x2c2 [btrfs] [ 9275.617426] [a0152f14] btrfs_writepage_end_io_hook+0x1a/0x1c [btrfs] [ 9275.629403] [a0166871] end_bio_extent_writepage+0xae/0x159 [btrfs] [ 9275.641463] [81125947] bio_endio+0x2d/0x2f [ 9275.653462] [a01470a0] end_workqueue_fn+0x111/0x120 [btrfs] [ 9275.665484] [a016ecc2] worker_loop+0x195/0x4c4 [btrfs] [ 9275.677451] [a016eb2d] ? worker_loop+0x0/0x4c4 [btrfs] [ 9275.689317] [a016eb2d] ? worker_loop+0x0/0x4c4 [btrfs] [ 9275.701079] [81061a8b] kthread+0x82/0x8a [ 9275.712839] [8100aaa4] kernel_thread_helper+0x4/0x10 [ 9275.724455] [81061a09] ? kthread+0x0/0x8a [ 9275.735873] [8100aaa0] ? kernel_thread_helper+0x0/0x10 [ 9275.747329] Code: 0f 0b eb fe 80 88 88 00 00 00 08 45 31 c9 48 8b 4d 88 4c 8d 45 c0 4c 01 e9 4c 89 ea 4c 89 e6 4c 89 ff e8 7c 4c 00 00 85 c0 74 04 0f 0b eb fe 49 8b 84 24 a8 00 00 00 4c 89 6d a9 48 89 45 a0 c6 [ 9275.771177] RIP [a0152824] T.1234+0x76/0x201 [btrfs] [ 9275.782973] RSP 88025f275c30 [
Re: btrfs: 21 minutes to read 1.2M file directory
On Tue, Dec 21, 2010 at 03:07:33AM +0200, Felipe Contreras wrote: On Tue, Dec 21, 2010 at 12:24 AM, Andy Isaacson a...@hexapodia.org wrote: I have a directory with 1.2M files in it, which makes readdir very slow on btrfs with cold caches (although it's reasonably fast with hot caches as in the first example below): Sounds like: Bug 21562 - btrfs is dead slow due to fragmentation https://bugzilla.kernel.org/show_bug.cgi?id=21562 Hmmm, how do I look at the btree layout for a given inode? btrfs-image for this filesystem is 1.7GiB .bz2, so I'm afraid it's not reasonable to publish it. -andy -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs: 21 minutes to read 1.2M file directory
Sigh, wrong btrfs address on the original. Apologies for the double-post. On Mon, Dec 20, 2010 at 02:24:46PM -0800, Andy Isaacson wrote: I have a directory with 1.2M files in it, which makes readdir very slow on btrfs with cold caches (although it's reasonably fast with hot caches as in the first example below): % time find /btr/foo /btr/foo.list find /btr/foo /btr/foo.list 4.10s user 7.97s system 36% cpu 33.275 total % head /btr/foo.list /btr/foo /btr/foo/1281373625.777.fg.jpg /btr/foo/1281373625.777.bg.jpg /btr/foo/1281373625.948.fg.jpg /btr/foo/1281373625.948.bg.jpg /btr/foo/1281373626.096.fg.jpg /btr/foo/1281373626.096.bg.jpg /btr/foo/1281373626.218.fg.jpg /btr/foo/1281373626.218.bg.jpg /btr/foo/1281373626.350.fg.jpg % wc !$ wc /btr/foo.list 1216 1216 401499940 /btr/foo.list % wc -l /btr/foo.list 1216 /btr/foo.list % sudo sysctl -w vm.drop_caches=3 vm.drop_caches=0 vm.drop_caches = 3 vm.drop_caches = 0 % time find /btr/foo /btr/foo.list.2 find /btr/foo /btr/foo.list.2 5.62s user 24.54s system 2% cpu 21:40.90 total % uname -a Linux pyron 2.6.36-rc7-00149-g29979aa #71 SMP Wed Oct 13 09:42:57 PDT 2010 x86_64 GNU/Linux Interestingly, while readdir is busy I'm only seeing IO on sdb even though the btrfs is on 3 targets: Label: btr uuid: 1271de53-b3d2-4d68-9d48-b19487e1c982 Total devices 3 FS bytes used 555.13GB devid1 size 18.65GB used 18.64GB path /dev/sda2 devid3 size 512.00GB used 44.13GB path /dev/sdc1 devid2 size 512.00GB used 511.76GB path /dev/sdb1 iostat -k 1 | grep sdb tells me: Device:tpskB_read/skB_wrtn/skB_readkB_wrtn sdb 173.00 692.00 0.00692 0 sdb 185.00 740.00 0.00740 0 sdb 198.00 792.00 0.00792 0 sdb 177.00 712.00 0.00712 0 I updated to a recent git and it's still slow (my test hasn't completed yet 19 minutes in): Linux pyron 2.6.37-rc6-11882-g55ec86f #72 SMP Mon Dec 20 13:34:38 PST 2010 x86_64 GNU/Linux The devices are: [1.834527] ata1.00: ATA-7: INTEL SSDSA2M040G2GC, 2CV102HD, max UDMA/133 [1.834816] ata1.00: 78165360 sectors, multi 1: LBA48 NCQ (depth 31/32) [1.835369] ata1.00: configured for UDMA/133 [1.835776] scsi 0:0:0:0: Direct-Access ATA INTEL SSDSA2M040 2CV1 PQ: 0 ANSI: 5 ... [2.904919] ata3.00: ATA-8: ST31500341AS, CC1H, max UDMA/133 [2.905206] ata3.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32) [2.947393] ata3.00: configured for UDMA/133 [2.947850] scsi 2:0:0:0: Direct-Access ATA ST31500341AS CC1H PQ: 0 ANSI: 5 ... [3.989664] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [4.018524] ata5.00: ATA-8: ST31500341AS, CC1H, max UDMA/133 [4.018811] ata5.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32) [4.060838] ata5.00: configured for UDMA/133 [4.061205] scsi 4:0:0:0: Direct-Access ATA ST31500341AS CC1H PQ: 0 ANSI: 5 The host is a Intel(R) Core(TM) i7 CPU 930 @2.80GHz with 12GB RAM. Thanks, -andy -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html