atime updates stuck under btrfs_dirty_inode

2011-01-16 Thread Andy Isaacson
Hi all,

I'm seeing processes block (for minutes or longer) in directory access
on btrfs.  I'm running

commit ccda756d13f13a9276edc8d53d05aaecbf3c4baa
Merge: 0c21e3a 65e5341
Author: Andy Isaacson a...@hexapodia.org
Date:   Sun Jan 9 12:37:23 2011 -0800

Merge branch 'master' of
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

on a quad core Core i7.

There seem to be a few failure cases.  My most recent ls has been stuck
like this for several hours, on a directory that should have somewhere
between 6 and 50 entries:

% ps u 15028
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
adi      15028  0.0  0.0  27032  1204 pts/8    S+   22:18   0:00 ls -l /btr/foo
% cat /proc/15028/stack
[a013af78] shrink_delalloc+0x112/0x147 [btrfs]
[a013b09f] reserve_metadata_bytes+0xf2/0x1a8 [btrfs]
[a013b2d0] btrfs_block_rsv_add+0x28/0x49 [btrfs]
[a013b33e] btrfs_trans_reserve_metadata+0x4d/0x75 [btrfs]
[a014b4d4] start_transaction+0x13e/0x1d8 [btrfs]
[a014b5c6] btrfs_start_transaction+0x13/0x15 [btrfs]
[a014f9db] btrfs_dirty_inode+0x6c/0xfd [btrfs]
[8111d82f] __mark_inode_dirty+0x2b/0x1ab
[8111297a] touch_atime+0x10e/0x131
[8110d49d] vfs_readdir+0x79/0xa0
[8110d60d] sys_getdents+0x81/0xd1
[81009e97] tracesys+0xd9/0xde
[] 0x

Sometimes, if I poll fast enough I see this stack instead:

% cat /proc/15028/stack
[a0149b2d] wait_for_commit+0x99/0xe2 [btrfs]
[a014ac20] btrfs_commit_transaction+0xfe/0x612 [btrfs]
[a014b4e7] start_transaction+0x151/0x1d8 [btrfs]
[a014b5c6] btrfs_start_transaction+0x13/0x15 [btrfs]
[a014f9db] btrfs_dirty_inode+0x6c/0xfd [btrfs]
[8111d82f] __mark_inode_dirty+0x2b/0x1ab
[8111297a] touch_atime+0x10e/0x131
[8110d49d] vfs_readdir+0x79/0xa0
[8110d60d] sys_getdents+0x81/0xd1
[81009e97] tracesys+0xd9/0xde
[] 0x
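
In case it helps anyone reproduce this, the sampling above is nothing fancier
than a loop like the following (just a sketch; 15028 is the pid of the stuck
ls and the interval is arbitrary):

% while sleep 0.1; do date +%T; cat /proc/15028/stack; done | tee /tmp/ls-stacks.log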

Another ls has been accumulating runtime for several days:

 7406 pts/3    R+   2973:56  |   \_ ls -F

The directory it's reading has around 200 files in it, I think.

When I catch it in kernel mode its stack looks like this:

% cat /proc/7406/stack
[8111d293] writeback_inodes_sb_nr+0x76/0x7d
[8111d6b0] writeback_inodes_sb_nr_if_idle+0x41/0x55
[a013af15] shrink_delalloc+0xaf/0x147 [btrfs]
[a013b09f] reserve_metadata_bytes+0xf2/0x1a8 [btrfs]
[a013b2d0] btrfs_block_rsv_add+0x28/0x49 [btrfs]
[a013b33e] btrfs_trans_reserve_metadata+0x4d/0x75 [btrfs]
[a014b4d4] start_transaction+0x13e/0x1d8 [btrfs]
[a014b5c6] btrfs_start_transaction+0x13/0x15 [btrfs]
[a014f9db] btrfs_dirty_inode+0x6c/0xfd [btrfs]
[8111d82f] __mark_inode_dirty+0x2b/0x1ab
[8111297a] touch_atime+0x10e/0x131
[8110d49d] vfs_readdir+0x79/0xa0
[8110d60d] sys_getdents+0x81/0xd1
[81009c82] system_call_fastpath+0x16/0x1b
[] 0x

vmstat is rather odd:

% vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache    si   so    bi    bo   in     cs us sy id wa
 2  0      0 124348 177388 11276228    0    0   1572430            2 10 87  1
 1  0      0 122108 177388 11278540    0    0   536   112 2870 813309  2 13 83  2
 2  0      0 119628 177388 11280852    0    0   704     0 2938 887219  2 11 87  0
 2  0      0 117768 177388 11282684    0    0   800     0 2936 883427  2 15 82  1
 2  0      0 114872 177388 11285464    0    0  1104     0 2819 883498  1  8 90  0
 1  0      0 112704 177388 11287388    0    0   612     0 2961 880957  2 14 84  0
^C

One of the three block devices backing this btrfs is reading a few hundred
blocks per second but is nowhere near 100% busy; the context-switch (cs)
count is extremely high, around 880,000 per second; and less than a full
CPU is being spent in system time (even though two ls processes are
blocked).
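
For completeness, this is roughly how I'm checking per-device utilization and
which tasks the context switches are charged to (both tools are from sysstat;
pidstat -w needs a reasonably recent version):

% iostat -xk sda sdb sdc 1     # watch the %util column per device
% pidstat -w 1                 # voluntary/involuntary context switches per task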

If anyone can point me to a good place to start tracking down the root
cause, I'd appreciate it.
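
My own next step, unless someone has a better idea, is to watch
shrink_delalloc with the function graph tracer, roughly as below.  This is
only a sketch and assumes shrink_delalloc is visible in
available_filter_functions on this kernel:

# as root, with debugfs mounted at /sys/kernel/debug
cd /sys/kernel/debug/tracing
grep shrink_delalloc available_filter_functions   # confirm the symbol is traceable
echo shrink_delalloc > set_graph_function
echo function_graph > current_tracer
cat trace_pipe > /tmp/shrink_delalloc.trace       # capture while an ls is stuck, Ctrl-C when done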

Thanks,
-andy


2.6.37 BUG at inode.c:1616 (was Re: 2.6.37: Bug on btrfs while umount)

2011-01-10 Thread Andy Isaacson
On Thu, Jan 06, 2011 at 08:29:12PM -0500, Chris Mason wrote:
  [50010.838804] [ cut here ]
  [50010.838931] kernel BUG at fs/btrfs/inode.c:1616!
  [50010.839053] invalid opcode:  [#1] PREEMPT SMP
[snip]
  [50010.839653] Pid: 1681, comm: btrfs-endio-wri Not tainted 2.6.37 #1
 
 Could you please pull from the master branch of the btrfs unstable tree.
 We had a late fix that is related to this.

I saw BUG at inode.c:1616 while running 2.6.37-rc6-11882-g55ec86f; after
seeing your message I upgraded to Linus's tip (0c21e3a) plus the
btrfs-unstable tip (65e5341), and I have just seen it again.  Both BUG
traces are included below.

The machine is a Core i7 with 12GB of RAM, with a btrfs filesystem spanning three devices:

Label: btr  uuid: 1271de53-b3d2-4d68-9d48-b19487e1c982
Total devices 3 FS bytes used 735.97GB
devid    1 size 18.65GB used 18.64GB path /dev/sda2
devid    2 size 512.00GB used 511.88GB path /dev/sdb1
devid    3 size 512.00GB used 225.26GB path /dev/sdc1

The primary writer to the filesystem is rtorrent; normally ffmpeg is also
writing to the filesystem at about 100 kbyte/sec, but it wasn't running
during this latest crash.

[ 9275.240027] [ cut here ]
[ 9275.249991] kernel BUG at fs/btrfs/inode.c:1616!
[ 9275.259914] invalid opcode:  [#1] SMP 
[ 9275.269794] last sysfs file: /sys/devices/pci:00/:00:1a.7/usb1/1-4/1-4:1.0/host8/target8:0:0/8:0:0:0/block/sdd/stat
[ 9275.280066] CPU 0 
[ 9275.280127] Modules linked in: tun ebtable_nat ebtables ipt_MASQUERADE 
iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack 
ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc kvm_intel 
kvm xfs exportfs loop snd_hda_codec_hdmi snd_hda_codec_realtek radeon ttm 
drm_kms_helper drm snd_hda_intel snd_hda_codec i2c_algo_bit snd_usb_audio 
uvcvideo snd_hwdep i2c_i801 snd_usbmidi_lib snd_pcm snd_rawmidi snd_timer 
videodev snd_seq_device snd v4l2_compat_ioctl32 pcspkr i2c_core serio_raw 
soundcore snd_page_alloc processor tpm_tis tpm tpm_bios evdev shpchp button 
thermal_sys ext3 jbd mbcache dm_mod btrfs zlib_deflate crc32c libcrc32c 
usb_storage uas sd_mod crc_t10dif ehci_hcd usbcore ahci libahci libata r8169 
scsi_mod mii nls_base [last unloaded: scsi_wait_scan]
[ 9275.358450] 
[ 9275.369821] Pid: 3654, comm: btrfs-endio-wri Not tainted 2.6.37-03739-gccda756 #73 MSI X58 Pro-E (MS-7522)/MS-7522
[ 9275.381570] RIP: 0010:[a0152824]  [a0152824] T.1234+0x76/0x201 [btrfs]
[ 9275.393380] RSP: 0018:88025f275c30  EFLAGS: 00010286
[ 9275.405100] RAX: ffe4 RBX: 88032b596b40 RCX: 88032b596c60
[ 9275.416865] RDX:  RSI: ea000b17b8d0 RDI: fff4
[ 9275.428666] RBP: 88025f275cc0 R08: 0005 R09: 88025f2759a0
[ 9275.440522] R10: 88025f275970 R11: dead00100100 R12: 880083a7e888
[ 9275.452374] R13: 06e0c000 R14: 880331d7c800 R15: 8800bb38d880
[ 9275.464146] FS:  () GS:8800bf40() knlGS:
[ 9275.475923] CS:  0010 DS:  ES:  CR0: 8005003b
[ 9275.487673] CR2: 7f3eada57000 CR3: 01603000 CR4: 26e0
[ 9275.499547] DR0:  DR1:  DR2: 
[ 9275.511395] DR3:  DR6: 0ff0 DR7: 0400
[ 9275.523192] Process btrfs-endio-wri (pid: 3654, threadinfo 88025f274000, task 88032554b020)
[ 9275.535168] Stack:
[ 9275.546852]  06e0c000 1000 00b8c1376000 1000
[ 9275.558616]  880331d7c800 0001 8800bb38d880 880331d7c800
[ 9275.570416]  88025f275cb0 a014b53f 88025f275ce0 8802e21ad7f0
[ 9275.582149] Call Trace:
[ 9275.593731]  [a014b53f] ? start_transaction+0x1a9/0x1d8 [btrfs]
[ 9275.605513]  [a0152e1e] btrfs_finish_ordered_io+0x1e6/0x2c2 [btrfs]
[ 9275.617426]  [a0152f14] btrfs_writepage_end_io_hook+0x1a/0x1c [btrfs]
[ 9275.629403]  [a0166871] end_bio_extent_writepage+0xae/0x159 [btrfs]
[ 9275.641463]  [81125947] bio_endio+0x2d/0x2f
[ 9275.653462]  [a01470a0] end_workqueue_fn+0x111/0x120 [btrfs]
[ 9275.665484]  [a016ecc2] worker_loop+0x195/0x4c4 [btrfs]
[ 9275.677451]  [a016eb2d] ? worker_loop+0x0/0x4c4 [btrfs]
[ 9275.689317]  [a016eb2d] ? worker_loop+0x0/0x4c4 [btrfs]
[ 9275.701079]  [81061a8b] kthread+0x82/0x8a
[ 9275.712839]  [8100aaa4] kernel_thread_helper+0x4/0x10
[ 9275.724455]  [81061a09] ? kthread+0x0/0x8a
[ 9275.735873]  [8100aaa0] ? kernel_thread_helper+0x0/0x10
[ 9275.747329] Code: 0f 0b eb fe 80 88 88 00 00 00 08 45 31 c9 48 8b 4d 88 4c 8d 45 c0 4c 01 e9 4c 89 ea 4c 89 e6 4c 89 ff e8 7c 4c 00 00 85 c0 74 04 0f 0b eb fe 49 8b 84 24 a8 00 00 00 4c 89 6d a9 48 89 45 a0 c6
[ 9275.771177] RIP  [a0152824] T.1234+0x76/0x201 [btrfs]
[ 9275.782973]  RSP 88025f275c30
[ 

Re: btrfs: 21 minutes to read 1.2M file directory

2010-12-22 Thread Andy Isaacson
On Tue, Dec 21, 2010 at 03:07:33AM +0200, Felipe Contreras wrote:
 On Tue, Dec 21, 2010 at 12:24 AM, Andy Isaacson a...@hexapodia.org wrote:
  I have a directory with 1.2M files in it, which makes readdir very slow
  on btrfs with cold caches (although it's reasonably fast with hot caches
  as in the first example below):
 
 Sounds like:
 
 Bug 21562 - btrfs is dead slow due to fragmentation
 https://bugzilla.kernel.org/show_bug.cgi?id=21562

Hmmm, how do I look at the btree layout for a given inode?

A btrfs-image of this filesystem is a 1.7GiB .bz2, so I'm afraid it's not
reasonable to publish it.
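
The closest thing I've found so far is dumping the metadata with
btrfs-debug-tree and grepping for the directory's inode number, something
like the sketch below (the dump is huge on a filesystem this size, and I'm
not certain this is what you meant by the btree layout):

% ino=$(stat -c %i /btr/foo)                    # inode number of the big directory
% sudo btrfs-debug-tree /dev/sdb1 > /tmp/tree.dump
% grep "key ($ino " /tmp/tree.dump | head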

-andy


Re: btrfs: 21 minutes to read 1.2M file directory

2010-12-20 Thread Andy Isaacson
Sigh, wrong btrfs address on the original.  Apologies for the
double-post.

On Mon, Dec 20, 2010 at 02:24:46PM -0800, Andy Isaacson wrote:
 I have a directory with 1.2M files in it, which makes readdir very slow
 on btrfs with cold caches (although it's reasonably fast with hot caches
 as in the first example below):
 
 % time find /btr/foo > /btr/foo.list
 find /btr/foo > /btr/foo.list  4.10s user 7.97s system 36% cpu 33.275 total
 % head /btr/foo.list
 /btr/foo
 /btr/foo/1281373625.777.fg.jpg
 /btr/foo/1281373625.777.bg.jpg
 /btr/foo/1281373625.948.fg.jpg
 /btr/foo/1281373625.948.bg.jpg
 /btr/foo/1281373626.096.fg.jpg
 /btr/foo/1281373626.096.bg.jpg
 /btr/foo/1281373626.218.fg.jpg
 /btr/foo/1281373626.218.bg.jpg
 /btr/foo/1281373626.350.fg.jpg
 % wc !$
 wc /btr/foo.list
  1216  1216 401499940 /btr/foo.list
 % wc -l /btr/foo.list
 1216 /btr/foo.list
 % sudo sysctl -w vm.drop_caches=3 vm.drop_caches=0
 vm.drop_caches = 3
 vm.drop_caches = 0
 % time find /btr/foo > /btr/foo.list.2
 find /btr/foo > /btr/foo.list.2  5.62s user 24.54s system 2% cpu 21:40.90 total
 % uname -a
 Linux pyron 2.6.36-rc7-00149-g29979aa #71 SMP Wed Oct 13 09:42:57 PDT 2010 x86_64 GNU/Linux
 
 Interestingly, while readdir is busy I'm only seeing IO on sdb even
 though the btrfs is on 3 targets:
 
 Label: btr  uuid: 1271de53-b3d2-4d68-9d48-b19487e1c982
 Total devices 3 FS bytes used 555.13GB
 devid    1 size 18.65GB used 18.64GB path /dev/sda2
 devid    3 size 512.00GB used 44.13GB path /dev/sdc1
 devid    2 size 512.00GB used 511.76GB path /dev/sdb1
 
 iostat -k 1 | grep sdb tells me:
 
 Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
 
 sdb             173.00       692.00         0.00        692          0
 sdb             185.00       740.00         0.00        740          0
 sdb             198.00       792.00         0.00        792          0
 sdb             177.00       712.00         0.00        712          0
 
 I updated to a recent git and it's still slow (my test hasn't completed
 yet, 19 minutes in):
 
 Linux pyron 2.6.37-rc6-11882-g55ec86f #72 SMP Mon Dec 20 13:34:38 PST 2010 x86_64 GNU/Linux
 
 The devices are:
 
 [1.834527] ata1.00: ATA-7: INTEL SSDSA2M040G2GC, 2CV102HD, max UDMA/133
 [1.834816] ata1.00: 78165360 sectors, multi 1: LBA48 NCQ (depth 31/32)
 [1.835369] ata1.00: configured for UDMA/133
 [1.835776] scsi 0:0:0:0: Direct-Access ATA  INTEL SSDSA2M040 2CV1 PQ: 0 ANSI: 5
 ...
 [2.904919] ata3.00: ATA-8: ST31500341AS, CC1H, max UDMA/133
 [2.905206] ata3.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32)
 [2.947393] ata3.00: configured for UDMA/133
 [2.947850] scsi 2:0:0:0: Direct-Access ATA  ST31500341AS CC1H PQ: 0 ANSI: 5
 ...
 [3.989664] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
 [4.018524] ata5.00: ATA-8: ST31500341AS, CC1H, max UDMA/133
 [4.018811] ata5.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32)
 [4.060838] ata5.00: configured for UDMA/133
 [4.061205] scsi 4:0:0:0: Direct-Access ATA  ST31500341AS CC1H PQ: 0 ANSI: 5
 
 The host is an Intel(R) Core(TM) i7 CPU 930 @ 2.80GHz with 12GB RAM.
 
 Thanks,
 -andy